Introduction to Perl
Bioinformatics
What is Perl?
- Practical Extraction and Report Language
- A scripting language
- Components
  - an interpreter
  - scripts: text files created by the user, describing a sequence of steps to be performed by the interpreter
Installation
- Create a Perl directory under C:\
- Either download AP.msi from the course website (http://curry.ateneo.net/~jpv/BioInf07/) and execute it (installs into the C:\Perl directory)
- Or download AP.zip and unzip it into C:\Perl
- Reset the path variable first (or edit C:\autoexec.bat) so that you can execute scripts from MS-DOS
    C> path=%path%;c:\Perl\bin
Writing and Running Perl Scripts
- Create/edit script (extension: .pl)
    C> edit first.pl
- Execute script
    C> perl first.pl
- Tip: place your scripts in a separate work directory
    # my first script
    print "Hello World";
    print "this is my first script";
Perl Features
- Statements
- Strings
- Numbers and Computation
- Variables and Interpolation
- Input and Output
- Files
- Conditions and Loops
- Pattern Matching
- Arrays and Lists
Statements
- A Perl script is a sequence of statements
- Examples of statements
    print "Type in a value";
    $value = <>;
    $square = $value * $value;
    print "The square is ", $square, "\n";
Comments
- Lines that start with # are ignored by the Perl interpreter
    # this is a comment line
- In a line, characters that follow # are also ignored
    $count = $count + 1; # increment $count
Strings
- String
  - Sequence of characters
  - Text
- In Perl, strings should be surrounded by quotes
    'I am a string'
    "I am a string"
- Special characters are specified through escape sequences (preceded by a \ )
    "a newline\n and a tab\t"
Numbers
- Integers are specified as a sequence of digits
    6
    453
- Decimal numbers:
    33.2
    6.04E24 (scientific notation)
Variables
- Variable: named storage for values (such as strings and numbers)
- Names are preceded by a $
- Sample use:
    $count = 5;          # assignment statement
    $message = "Hello";  # another assignment
    print $count;        # print the value of a variable
Computation
- Fundamental arithmetic operations: + - * /
- Others
  - ** exponentiation
  - () grouping
- Example (try this out as a Perl script)
    $x = 4;
    $y = 2;
    $z = (3 + $x) ** $y;
    print $z, "\n";
Interpolation
- Given the following script:
    $x = "Smith";
    print "Good morning, Mr. $x";
    print 'Good morning, Mr. $x';
- Strings quoted with "" perform expansions on variables
- Escape characters like \n are also interpreted when strings are quoted with "" but not when they are quoted with ''
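A minimal sketch of the difference between the two quoting styles, using the same $x as the slide. The `use strict`, `use warnings`, and `my` declarations are additions not covered in the slides, and the variable names $double and $single are made up for illustration.

```perl
use strict;
use warnings;

my $x = "Smith";

# Double quotes: $x is expanded and \n becomes a newline.
my $double = "Good morning, Mr. $x\n";

# Single quotes: $x and \n are kept exactly as typed.
my $single = 'Good morning, Mr. $x\n';

print $double;
print $single, "\n";
```

Running it should show the name on the first line and the literal text `$x\n` on the second.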
Input and Output
- Output
  - print function
  - Escape characters
  - Interpolation
- Input
  - Bracket operator (e.g., $line = <>; )
  - Not typed (takes in strings or numbers)
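To illustrate "not typed", here is a small sketch in which the same input line is used both as a number and as a string. So that it runs unattended, it reads from an in-memory string instead of the keyboard; that stand-in, the `my` declarations, and the names $typed, $square and $label are assumptions for this example, not part of the slides.

```perl
use strict;
use warnings;

# Stand-in for keyboard input; in an interactive script this
# would simply be:  $line = <>;
my $typed = "12\n";
open(my $in, '<', \$typed) or die "cannot open in-memory input: $!";
my $line = <$in>;
close($in);

chomp $line;                        # drop the trailing newline

my $square = $line * $line;         # the input used as a number
my $label  = "You typed " . $line;  # the same input used as a string

print "$label; its square is $square\n";
```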
Input Files
- Opening a file
    open INFILE, 'data.txt';
- Input
    $line = <INFILE>;
- Closing a file
    close INFILE;
Output Files
- Opening
    open OUTFILE, '>result.txt';
  Or,
    open OUTFILE, '>>result.txt'; # append
- Output
    print OUTFILE "Hello";
- Closing files
    close OUTFILE;
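The file operations from the input and output slides can be combined into one round trip: write a file, append to it, then read it back line by line. This is only a sketch. It uses the three-argument form of `open` with `or die` for error checking, and a temporary directory from the core File::Temp module so that no real file is overwritten; neither appears in the slides.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory that is removed when the script ends.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/result.txt";

# '>' creates the file (or empties an existing one).
open(OUTFILE, '>', $file) or die "cannot write $file: $!";
print OUTFILE "Hello\n";
close OUTFILE;

# '>>' appends to the end of the file.
open(OUTFILE, '>>', $file) or die "cannot append to $file: $!";
print OUTFILE "Hello again\n";
close OUTFILE;

# Read the file back, one line per loop iteration.
open(INFILE, '<', $file) or die "cannot read $file: $!";
my $count = 0;
while (defined(my $line = <INFILE>)) {
    chomp $line;
    $count = $count + 1;
    print "line $count: $line\n";
}
close INFILE;
```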
Conditions
- Can execute statements conditionally
- Syntax:
    if ( condition )
    {
        statement
        statement
        …
    }
- Example:
    if ( $num > 1000 )
    {
        print "Large";
    }
If - Else
    $num = <>;
    if ( $num > 1000 )
    {
        print "Large number\n";
    }
    else
    {
        print "Small number\n";
    }
    print "Thanks\n";
Loops
- Repetitive execution
- Syntax:
    while ( condition )
    {
        statement
        statement
        …
    }
- Example:
    $count = 0;
    while ( $count < 10 )
    {
        print "counting-", $count;
        $count = $count + 1;
    }
Conditions ( expr symbol expr )
- Numbers
  - == equal
  - != not equal
  - < less than
  - > greater than
  - <= less than or equal
  - >= greater than or equal
- Strings
  - eq ne lt gt le ge
  - =~ pattern match
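The numeric and string operators can give different answers for the same values, which is why Perl has both sets. A small sketch, with variable names chosen only for illustration (`my`, `use strict` and `use warnings` are additions to the slides' style):

```perl
use strict;
use warnings;

my $first  = "10";
my $second = "9";

# Numeric comparison: 10 is greater than 9.
my $numeric_result = "not greater";
if ( $first > $second )
{
    $numeric_result = "greater";
}

# String comparison goes character by character: "1" sorts before "9".
my $string_result = "not greater";
if ( $first gt $second )
{
    $string_result = "greater";
}

print "using >  : $numeric_result\n";
print "using gt : $string_result\n";
```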
Functions
- length $str
  returns the number of characters in $str
- defined $str
  tests if $str holds a defined value (useful for testing if $line=<>; succeeded)
- chomp $str
  removes the trailing newline from $str, if there is one (useful because $line=<>; includes the newline character)
- print $var
  displays $var on the output device
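A short sketch of how these functions typically work together in a read loop: `defined` ends the loop when input runs out, `chomp` strips the newline, and `length` counts what is left. The two sample lines and the in-memory filehandle (used so that the example needs no external file) are assumptions for illustration.

```perl
use strict;
use warnings;

# Two made-up input lines, read from a string instead of a file.
my $text = "ACGT\nGGCCTA\n";
open(my $in, '<', \$text) or die "cannot open in-memory input: $!";

my $total = 0;
while (defined(my $line = <$in>)) {   # undefined at end of input
    chomp $line;                      # remove the trailing newline
    $total = $total + length($line);
    print length($line), " characters: $line\n";
}
close($in);

print "total: $total\n";
```

Without the `chomp`, each length would be one larger because the newline would be counted.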
Pattern Matching
- <string> =~ <pattern> is a condition that checks if a string matches a pattern
- Simplest case: <pattern> specifies a search substring
- Example: if ( $s =~ /bio/ ) …
  - holds TRUE if $s is "molecular biology", "bioinformatics", "the bionic man"
  - FALSE if $s is "chemistry", "bicycle", "a BiOpsy" (matching is case-sensitive)
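The substring example can be checked directly. This sketch wraps the match in a small helper, contains_bio, which is a hypothetical name introduced here; subroutines and `foreach` are not covered in the slides.

```perl
use strict;
use warnings;

# Returns 1 when the string contains the lowercase substring "bio".
sub contains_bio {
    my ($s) = @_;
    if ( $s =~ /bio/ ) {
        return 1;
    }
    return 0;
}

foreach my $s ("molecular biology", "bioinformatics", "the bionic man",
               "chemistry", "bicycle", "a BiOpsy") {
    print "$s: ", contains_bio($s) ? "TRUE" : "FALSE", "\n";
}
```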
Special pattern matching characters
- \w word character (letter, digit, or underscore)
- \d digit
- \s space character (space, tab, \n)
- Example: if ( $s =~ /\w\w\s\d\d\d/ ) …
  - holds TRUE for "CS 123 course", "Take Ma 101 today"
  - FALSE for "Only 1 number here"
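A quick way to try the pattern against the three sample strings. The helper name has_course_code is made up for this sketch, as is the use of a subroutine.

```perl
use strict;
use warnings;

# Two word characters, one whitespace character, then three digits,
# anywhere in the string.
sub has_course_code {
    my ($s) = @_;
    return ( $s =~ /\w\w\s\d\d\d/ ) ? 1 : 0;
}

foreach my $s ("CS 123 course", "Take Ma 101 today", "Only 1 number here") {
    print "$s: ", has_course_code($s) ? "TRUE" : "FALSE", "\n";
}
```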
Special pattern matching characters
- . any character
- ^ beginning of string/line
- $ end of string or line
- Example: if ( $s =~ /^\d\d\d\ss..r/ ) …
  - holds TRUE for "300 spartans"
  - FALSE for "all 100 stars"
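To see what the ^ anchor contributes, this sketch runs the slide's pattern with and without it. "all 100 stars" contains "100 star", so it only fails because the digits are not at the start of the string. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Pattern from the slide: must begin at the start of the string.
sub matches_anchored {
    my ($s) = @_;
    return ( $s =~ /^\d\d\d\ss..r/ ) ? 1 : 0;
}

# Same pattern without ^: may begin anywhere in the string.
sub matches_unanchored {
    my ($s) = @_;
    return ( $s =~ /\d\d\d\ss..r/ ) ? 1 : 0;
}

foreach my $s ("300 spartans", "all 100 stars") {
    print "$s: anchored=", matches_anchored($s),
          " unanchored=", matches_unanchored($s), "\n";
}
```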
Groups and Quantifiers
- [xyz] character set
- | alternatives
- * zero or more
- + 1 or more
- ? 0 or 1
- {M} exactly M
- {M,N} between M and N characters
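As an illustration of sets, alternatives and quantifiers on sequence data, the sketch below defines two hypothetical checks: is_dna accepts strings made only of A, C, G, T, and looks_like_orf accepts a start codon (ATG), any number of three-letter codons, and one of the three standard stop codons. The second check is deliberately rough; it does not reject a stop codon that appears earlier in frame.

```perl
use strict;
use warnings;

# [ACGT]+ : one or more characters from the set, and nothing else.
sub is_dna {
    my ($s) = @_;
    return ( $s =~ /^[ACGT]+$/ ) ? 1 : 0;
}

# ATG, then zero or more groups of exactly 3 bases, then a stop codon.
sub looks_like_orf {
    my ($s) = @_;
    return ( $s =~ /^ATG([ACGT]{3})*(TAA|TAG|TGA)$/ ) ? 1 : 0;
}

foreach my $s ("GATTACA", "GATXACA", "ATGGCCTAA", "ATGGCTAA") {
    print "$s: dna=", is_dna($s), " orf=", looks_like_orf($s), "\n";
}
```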
NCBI file Example
    /VERSION\s+(\S+)\s+GI:(\S+)/
- Matches a version line
- Parentheses group characters for future retrieval
- $1 holds the version identifier (the first group); $2 holds the number after "GI:"
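A minimal sketch of the pattern in use. The sample line is an assumption, written in the layout the pattern expects (a VERSION keyword, an identifier, then a GI: number); a real script would read such lines from an NCBI file instead.

```perl
use strict;
use warnings;

# Sample line for illustration only.
my $line = "VERSION     U49845.1  GI:1293613";

my $version;
my $gi;
if ( $line =~ /VERSION\s+(\S+)\s+GI:(\S+)/ ) {
    $version = $1;   # text captured by the first pair of parentheses
    $gi      = $2;   # text captured by the second pair
    print "version: $version\n";
    print "GI: $gi\n";
}
```

Copying $1 and $2 into named variables right after the match is a common habit, since the next successful match overwrites them.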