Bioperl modules

Post on 21-Dec-2015


Object Oriented Programming in Perl (1)

• Defining a class
  – A class is simply a package with subroutines that function as methods.

#!/usr/local/bin/perl
package Cat;
sub new {…}
sub meow {…}

Object Oriented Programming in Perl (2)

Perl object: to instantiate an object from a class, call the class’s “new” method.

$new_object = new ClassName;

Using methods: to call a method on an object, use the “->” operator.

$cat->meow();
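The two slides above leave the bodies of new and meow elided. As a minimal sketch of what such a class could look like, the example below fills them in with made-up content: the "name" attribute and the returned string are assumptions for illustration, not part of the original slides. It also uses the ClassName->new form, which does the same thing as "new ClassName".

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

# Constructor: build a hash reference, then bless it into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || 'Cat' };   # 'name' is a hypothetical attribute
    return bless $self, $class;
}

# Method: the object itself arrives as the first argument.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package main;

my $cat = Cat->new(name => 'Tom');   # same effect as: new Cat(name => 'Tom')
print $cat->meow(), "\n";
```

The arrow form is generally easier to read when the constructor takes arguments, which is why it is used here.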

Object Oriented Programming in Perl (3)

• Inheritance
  – Declare a class array called @ISA.
    • This array stores the name(s) of the parent class(es) of the new class.

package NorthAmericanCat;
@NorthAmericanCat::ISA = ("Cat");
sub new { …}
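To see what @ISA buys you, here is a small sketch under a few assumptions: the method bodies and the describe method are invented for illustration. The child class inherits new and describe from Cat and overrides only meow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Cat;

sub new {
    my ($class) = @_;
    return bless {}, $class;          # blesses into whichever class was called
}

sub meow { return "meow" }

# Calls meow through the object, so an overriding child method is used.
sub describe {
    my ($self) = @_;
    return ref($self) . " says " . $self->meow();
}

package NorthAmericanCat;

@NorthAmericanCat::ISA = ("Cat");     # Cat is the parent class

sub meow { return "MEOW" }            # override the parent's method

package main;

my $cat = NorthAmericanCat->new();    # new() is found in Cat via @ISA
print $cat->describe(), "\n";
```

Perl looks for a method in the object's own package first and then walks the classes listed in @ISA.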

Perl Modules

A Perl module is a reusable package defined in a library file whose name is the same as the name of the package.

Names of Perl modules

• Each Perl module has a unique name.
• To minimize name space collisions, Perl provides a hierarchical name space for modules.
  – Components of a module name are separated by double colons (::).
  – For example,
    • Math::Complex
    • Math::Approx
    • String::BitCount
    • String::Approx

Module files

• Each module is contained in a single file.

• Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy.

• All module files have an extension of .pm.

Module            Is stored in
Config            Config.pm
Math::Complex     Math/Complex.pm
String::Approx    String/Approx.pm
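The mapping in the table can be expressed in a couple of lines. The helper name module_to_path is made up for this sketch; it only mirrors the rule described above (replace "::" with "/" and append ".pm").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a module name into the relative path of its file.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;   # Math::Complex -> Math/Complex
    return "$path.pm";
}

print "$_ is stored in ", module_to_path($_), "\n"
    for qw(Config Math::Complex String::Approx);
```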

Module libraries

• The Perl interpreter has a list of directories in which it searches for modules.

• This list is held in the global array @INC.

>perl -V

@INC:

/usr/local/lib/perl5/5.00503/sun4-solaris

/usr/local/lib/perl5/5.00503

/usr/local/lib/perl5/site_perl/5.005/sun4-solaris

/usr/local/lib/perl5/site_perl/5.005
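The same list can be inspected from inside a script, and "use lib" adds a directory to the front of it. In this sketch the directory /tmp/my_perl_lib is a hypothetical path chosen for illustration; it does not need to exist for the script to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Prepend a private module directory to the search path (hypothetical path).
use lib '/tmp/my_perl_lib';

# Print every directory Perl will search, in search order.
print "$_\n" for @INC;
```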

Using Modules

• A module can be loaded with the use statement.

use Foo;

bar( "a" ); # calling bar, a function provided by Foo

blat( "b" ); # calling blat, a function provided by Foo
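Foo, bar and blat are placeholders, so that snippet cannot be run as it stands. A runnable sketch of the same idea, swapping in the core module List::Util (an assumption made only so that the example works on any Perl installation), looks like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the module and import two of its functions by name.
use List::Util qw(sum max);

my @lengths = (120, 45, 300);   # made-up sequence lengths

print "total:    ", sum(@lengths), "\n";
print "longest:  ", max(@lengths), "\n";

# Functions that were not imported can still be called by their full name.
print "shortest: ", List::Util::min(@lengths), "\n";
```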

Bioperl toolkit

• Core package (bioperl-live)
  – THE basic package; it is required by all the other packages
• Run package (bioperl-run)
  – Provides wrappers for executing some 60 common bioinformatics applications
• DB package (bioperl-db)
  – Subproject to store sequence and annotation data in a BioSQL relational database
• Network package (bioperl-network)
  – Parses and analyzes protein-protein interaction data
• Dev package (bioperl-dev)
  – New and exploratory bioperl development

Bioperl Object-Oriented

• Bioperl takes advantage of object-oriented design to create a consistent, well-documented object model for interacting with biological data in the life sciences.

• Bioperl name space: the Bioperl package installs everything in the Bio:: namespace.

(Where are the packages stored?)
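One way to answer that question on your own machine is to load a module and look it up in %INC, which records the file each loaded module came from. This is a sketch: locate_module is a made-up helper name, and the script simply reports "not installed" when Bioperl is absent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
sub locate_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;      # Bio::Seq -> Bio/Seq.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::Seq Bio::SeqIO)) {
    my $path = locate_module($module);
    print "$module: ", (defined $path ? $path : 'not installed'), "\n";
}
```

The directory printed will be one of the entries in @INC, followed by Bio/Seq.pm or Bio/SeqIO.pm.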

Bioperl Objects

• Sequence handling objects
  – Sequence objects
  – Alignment objects
  – Location objects

• Other objects: 3D structure objects, tree objects and phylogenetic trees, map objects, bibliographic objects and graphics objects

Sequence handling

• Typical sequence handling tasks:
  – Access the sequence
  – Format the sequence
  – Sequence alignment and comparison
    • Search for similar sequences
    • Pairwise comparisons
    • Multiple alignment

Sequence Annotation

• Bio::SeqFeature: a sequence object can have multiple sequence feature (SeqFeature) objects (e.g. Gene, Exon, or Promoter objects) associated with it.

• Bio::Annotation: a Seq object can also have an Annotation object (used to store database links, literature references and comments) associated with it.
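A short sketch of both kinds of attachment, assuming Bioperl is installed. The sequence, the feature coordinates and the comment text are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

# A made-up DNA sequence to hang the feature and annotation on.
my $seq = Bio::Seq->new(
    -display_id => 'demo1',
    -seq        => 'ATGAAACCCGGGTTTTAA',
    -alphabet   => 'dna',
);

# Attach a feature: a hypothetical exon covering bases 1..9.
my $exon = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 9,
    -strand      => 1,
    -primary_tag => 'exon',
);
$seq->add_SeqFeature($exon);

# Attach an annotation: a free-text comment.
$seq->annotation->add_Annotation(
    'comment',
    Bio::Annotation::Comment->new(-text => 'made-up example record'),
);

for my $feat ($seq->get_SeqFeatures) {
    printf "%s %d..%d\n", $feat->primary_tag, $feat->start, $feat->end;
}
for my $comment ($seq->annotation->get_Annotations('comment')) {
    print "comment: ", $comment->text, "\n";
}
```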

Sequence Input/Output

The Bio::SeqIO system was designed to make getting and storing sequences to and from the myriad of formats as easy as possible.

Accessing sequence data

– Bioperl supports accessing remote databases as well as local databases.

– Bioperl currently supports sequence data retrieval from the GenBank, GenPept, RefSeq, SwissProt, and EMBL databases.

Format the sequences

• A SeqIO object can read a stream of sequences in one format (Fasta, EMBL, GenBank, Swissprot, PIR, GCG, SCF, phd/phred, Ace, or raw, i.e. plain sequence) and then write them to another file in another format.
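The read-then-write pattern can be sketched as follows, assuming Bioperl is installed. To keep the example self-contained it reads two made-up FASTA records from an in-memory string instead of a file and writes GenBank format to standard output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up records in FASTA format, held in memory for the demo.
my $fasta = ">seq1 first made-up record\nATGAAACCCGGG\n"
          . ">seq2 second made-up record\nTTTTAAGGCC\n";
open my $in_fh, '<', \$fasta or die "Cannot open in-memory input: $!";

my $in  = Bio::SeqIO->new(-fh => $in_fh,    -format => 'fasta');
my $out = Bio::SeqIO->new(-fh => \*STDOUT,  -format => 'genbank');

# Read one sequence at a time and write it out in the other format.
my $count = 0;
while (my $seq = $in->next_seq) {
    $out->write_seq($seq);
    $count++;
}
print STDERR "Converted $count sequence(s)\n";
```

With real files, -fh is replaced by -file => 'name' for input and -file => '>name' for output.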

Manipulating sequence data

$seqobj->display_id()   # the human readable id of the sequence
$seqobj->subseq(5,10)   # part of the sequence as a string
$seqobj->desc()         # a description of the sequence
$seqobj->trunc(5,10)    # truncation from 5 to 10 as new object
$seqobj->revcom         # reverse complements sequence
$seqobj->translate      # translation of the sequence
…
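To try these methods, a sequence object is needed first. The sketch below, which assumes Bioperl is installed, builds one from a made-up 18-base sequence and calls each method listed above. Note that subseq returns a plain string, while trunc, revcom and translate return new sequence objects whose string is read with seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;

# A made-up sequence: ATG AAA CCC GGG TTT TAA
my $seqobj = Bio::Seq->new(
    -display_id => 'demo1',
    -desc       => 'made-up example sequence',
    -seq        => 'ATGAAACCCGGGTTTTAA',
    -alphabet   => 'dna',
);

print "id:        ", $seqobj->display_id(), "\n";
print "desc:      ", $seqobj->desc(), "\n";
print "subseq:    ", $seqobj->subseq(5, 10), "\n";       # string, bases 5 to 10
print "trunc:     ", $seqobj->trunc(5, 10)->seq, "\n";   # new object, same bases
print "revcom:    ", $seqobj->revcom->seq, "\n";         # reverse complement
print "translate: ", $seqobj->translate->seq, "\n";      # protein, '*' marks the stop
```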

Search result parsing

The Bio::SearchIO system was designed for parsing sequence database searches (BLAST, sim4, waba, FASTA, HMMER, exonerate, etc.)
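A SearchIO parser is typically walked in three nested levels: results (one per query), hits (one per matching database sequence) and HSPs (the individual aligned segments). The sketch below assumes Bioperl is installed; summarize_blast is a made-up helper name, and the report file is whatever BLAST output you pass on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Return one tab-separated line per HSP: query, hit, e-value, alignment length.
sub summarize_blast {
    my ($file) = @_;
    my $searchio = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my @rows;
    while (my $result = $searchio->next_result) {        # one per query
        while (my $hit = $result->next_hit) {            # one per database match
            while (my $hsp = $hit->next_hsp) {           # one per aligned segment
                push @rows, join("\t",
                    $result->query_name, $hit->name,
                    $hsp->evalue, $hsp->length('total'));
            }
        }
    }
    return @rows;
}

if (@ARGV) {
    print "$_\n" for summarize_blast($ARGV[0]);
}
else {
    print "Usage: perl $0 <blast report file>\n";
}
```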

Manipulating alignment

The Bio::AlignIO system was designed for manipulating the alignment objects in different formats including aln, phylip, fasta, etc.
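AlignIO follows the same read-then-write pattern as SeqIO, with next_aln and write_aln in place of next_seq and write_seq. This sketch assumes Bioperl is installed and uses a made-up two-sequence alignment held in memory; it reads aligned FASTA and writes ClustalW format.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# A made-up alignment in aligned-FASTA format ('-' marks a gap).
my $fasta_aln = ">seqA\nATG-AACC\n>seqB\nATGCAA-C\n";
open my $in_fh, '<', \$fasta_aln or die "Cannot open in-memory input: $!";

my $in  = Bio::AlignIO->new(-fh => $in_fh,   -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'clustalw');

my $aln  = $in->next_aln;            # the input holds a single alignment
my @seqs = $aln->each_seq;           # the aligned sequences it contains

printf STDERR "%d sequences, %d columns\n", scalar @seqs, $aln->length;
$out->write_aln($aln);               # same alignment, different format
```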

Example: Format the sequences

Example: using “seq_formating.pl” to convert “sequences.gb” to another format

Copy the files to the current directory

Check whether the files are executable

Now, let’s look at the genbank file.

The home directory on a Windows system.

If you have Notepad++ installed, click “Edit with Notepad++”.

If not, open “sequences.gb” with the Notepad program.


The format of the input sequences.

The perl script file

If no arguments are supplied, the script prints usage information with instructions.

The arguments, in order: program name, input file, format of the input sequences, output file, format of the output sequences; then press <enter>.

Program succeeded!

Now it’s time to look at the file generated.

Use the ‘command prompt’ to run the script

Type: cd<space>c:\BioDownload
to enter the BioDownload folder.

Type: dir
to display the files in the current folder (NOT ls).

You should have the following files in the folder (you may have other files, but that’s fine):
(1) seq_formating.pl
(2) sequences.gb.txt

Type: perl<space>seq_formating.pl<space>sequences.gb.txt<space>genbank<space>sequences.fasta<space>fasta

Output file

The format of the output sequences.
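The source of seq_formating.pl is not reproduced in these slides. As a hedged sketch of what a script with this command line could look like (not the actual course script), the code below takes the same four arguments, prints usage information when they are missing, and converts the file with Bio::SeqIO. It assumes Bioperl is installed; the convert subroutine and the exact messages are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Copy every sequence from the input file to the output file,
# changing the format on the way. Returns the number of sequences.
sub convert {
    my ($in_file, $in_format, $out_file, $out_format) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => $out_format);
    my $count = 0;
    while (my $seq = $in->next_seq) {
        $out->write_seq($seq);
        $count++;
    }
    return $count;
}

if (@ARGV != 4) {
    # No (or too few) arguments: show how the script is meant to be called.
    print "Usage: perl $0 <input file> <input format> <output file> <output format>\n";
}
else {
    my $count = convert(@ARGV);
    print "Program succeeded! Converted $count sequence(s).\n";
}
```

Called as in the slide, the four arguments would be sequences.gb.txt, genbank, sequences.fasta and fasta.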

What’s next: Parsing the BLAST output