GFP Workshop
Undergraduate Bioinformatics Club (UBIC) at UCSD
Alexander Niema Moshiri
Green Fluorescent Protein:
Origins
Green Fluorescent Protein (GFP) is a naturally-occurring
protein in a species of jellyfish, Aequorea victoria
When excited by blue or ultraviolet light, GFP
fluoresces green
A fluorescent Aequorea victoria
Green Fluorescent Protein:
A Brief History of wtGFP
GFP has been studied since the 1960s
However, its utility for molecular biologists was
not realized until the 1990s
In 1992, Douglas Prasher cloned and
sequenced the wild-type GFP (wtGFP) gene
“Wild-type” = Natural
Prasher proposed using GFP as a biochemical
tracer that allows us to look at the inner
workings of cells
Douglas Prasher
Green Fluorescent Protein:
Recombination of wtGFP
The lab of Martin Chalfie expressed wtGFP in E. coli and
C. elegans
To their surprise, wtGFP was able to glow in both
species without needing any jellyfish cofactors
C. elegans expressing wtGFP
Green Fluorescent Protein:
Bioengineered
In 1995, by changing a single amino
acid, Roger Tsien engineered the first
improved mutant of GFP with
increased fluorescence and
photostability
Tsien was awarded the 2008 Nobel
Prize in chemistry for his GFP work
He is currently a professor at UCSD
Further improvements to GFP were
made over the next few years
Roger Tsien
Green Fluorescent Protein:
Current State of Mutants
Today, many more derivatives have
been created from GFP and dsRed (a
red fluorescent protein)
Researchers have access to a range
of colors, including green, yellow,
orange, red, violet, blue, and cyan
An illustration of a San Diego beach scene
drawn using 8 colors of FPs
Rainbow of FPs from the Tsien lab
Green Fluorescent Protein:
Experimental Uses
We mentioned before that FPs can be used to track
cellular processes
Researchers can simply attach an FP to some object of
interest and then they can visually follow the object
Mice expressing GFP next to normal mice
GFP-expressing neurons
Protein Data Bank:
A Brief Overview
The Protein Data Bank (PDB) is a
repository of 3D structural data
of large biological molecules
(e.g. proteins and nucleic acids)
This structural data can be
downloaded and used to render
a 3D image of the molecule of
interest
3D rendering of GFP from PDB data
Protein Data Bank:
Step 1: Querying the PDB
Open Mozilla Firefox and navigate to www.rcsb.org
The search box on the top of the page allows you to
“Search by PDB ID, author, macromolecule, sequence,
or ligands”
Search for the term Green Fluorescent Protein and hit
“Go”
Scroll down and click on entry 4KW4: “Crystal Structure
of Green Fluorescent Protein”
Protein Data Bank:
Step 2: Questions About Results
Who are the authors of the primary citation for 4KW4?
What organism is this protein from?
How long (in amino acids) is this protein?
What method was used to produce this entry’s data?
What is the resolution in Angstroms (Å)?
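Two of the questions above (method and resolution) can also be answered from the entry's PDB-format text file instead of the web page. The sketch below is a minimal illustration, assuming the file follows the usual PDB layout with an EXPDTA record and a "REMARK   2 RESOLUTION." record. The function name parse_pdb_header and the sample header lines are made up for this example and are not the real 4KW4 values.

```python
def parse_pdb_header(pdb_text):
    """Pull the experimental method and resolution out of PDB-format text."""
    info = {"method": None, "resolution": None}
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA") and info["method"] is None:
            # The method name starts after the 10-character record label.
            info["method"] = line[10:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            # Expected fields: REMARK, 2, RESOLUTION., <number>, ANGSTROMS.
            fields = line.split()
            try:
                info["resolution"] = float(fields[3])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE" for entries without a resolution
    return info


# Hypothetical header lines for illustration only (NOT the values for 4KW4).
sample = (
    "EXPDTA    X-RAY DIFFRACTION\n"
    "REMARK   2\n"
    "REMARK   2 RESOLUTION.    2.00 ANGSTROMS.\n"
)

print(parse_pdb_header(sample))
```

To check your answers for 4KW4, you would pass the text of the downloaded entry file to the same function.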
Protein Data Bank:
Step 3: Rendering 3D Structure
Return to the PDB homepage: www.rcsb.org
In the left-column panel, click “Visualize”
In the box that says “Enter a PDB ID”, enter 4KW4 and
click “View Jmol”
You should see a 3D rendering of GFP
You can click and drag the 3D render to rotate it
Protein Data Bank:
Step 4: Display Customization
Under “Select Display Mode,” click “Custom View”
Cycle through the different Style options and choose
your favorite
My personal favorite is the default, Cartoon
Cycle through the different Color options
You can also change the color(s) by Right-Clicking on the
3D render, going to Color, then Structures, then Cartoon
(assuming you’re still in Cartoon style), and choosing a
color
You can also go to Color > Structures > Cartoon > By
Scheme and choose one of those options
Protein Data Bank:
Step 5: Exporting 3D Image
Finish customizing the 3D image to your liking
Feel free to play with the other options in the menu that
pops up when you Right-Click on the 3D image
If you want to revert to the original settings, just refresh
the page and it will reload with the default settings
When you are ready to export the final image, just click
the blue “Export 3D Image” button, specify a
destination, and click “Save”
Enjoy your cool 3D image of GFP!
Multiple Sequence Alignment:
The FASTA Format
The FASTA format is a text-based format for
representing DNA, RNA, or Protein sequences
A sequence in the FASTA format begins with a single-line
description (beginning with the ‘>’ character), followed
by line(s) of sequence data
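The format described above is simple enough to read with a few lines of Python. This is a minimal sketch, not a full parser: the function name parse_fasta and the two toy sequences are invented for illustration, and it assumes every description line is unique.

```python
def parse_fasta(text):
    """Return a dict mapping each description line to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; drop the '>' itself.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence data may span several lines; collect them all.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Made-up sequences, only to show the layout of a FASTA file.
example = """>toy1 first made-up protein
ACDEFGHIKL
MNPQRSTVWY
>toy2 second made-up protein
ACDEYGHIKL
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), sequence)
```

The same function could be applied to the contents of protein_sequences.fasta once you have downloaded it.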
Multiple Sequence Alignment:
Sequence Alignment
A sequence alignment is a way of arranging biological
sequences (DNA, RNA, or Protein) to identify regions of
similarity between the sequences
Gaps can be inserted between characters in the
sequences so that identical or similar characters can be
aligned in the same column
An example multiple sequence alignment
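To make the idea of inserting gaps concrete, here is a small sketch of a classic pairwise global alignment (Needleman-Wunsch dynamic programming). It aligns only two sequences and uses simple made-up scores (+1 match, -1 mismatch, -1 gap), so it is an illustration of the principle rather than what a multiple sequence alignment tool does internally. The function name global_align is our own.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two strings; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]


top, bottom, total = global_align("GATTACA", "GATACA")
print(top)
print(bottom)
print("score:", total)
```

The two printed rows have the same length, and a "-" marks where a gap was inserted so that the remaining characters line up in columns.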
Multiple Sequence Alignment:
GFP and its Derivatives
In the following activity, we will
align the sequences of GFP and some
of its derivative fluorescent proteins
These proteins’ sequences are
provided in the file named
protein_sequences.fasta
Using the results from the multiple
sequence alignment, we will be able
to construct a phylogenetic tree
This tree will provide us information
about the pairwise “closeness”
between the protein sequences
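One simple way to put a number on pairwise "closeness" is the fraction of aligned columns in which two sequences differ. The sketch below assumes the sequences are already aligned (same length, "-" for gaps) and ignores any column where either sequence has a gap. The function names and the three toy rows are invented for illustration, and the alignment tool may compute its distances differently.

```python
from itertools import combinations


def p_distance(row_a, row_b):
    """Fraction of gap-free aligned columns where the two rows differ."""
    compared = 0
    differing = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_table(aligned):
    """Return {(name1, name2): distance} for every pair of aligned rows."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Made-up aligned rows (not real fluorescent protein sequences).
toy_alignment = {
    "A": "ACDEFG",
    "B": "ACDEYG",
    "C": "AC-KYG",
}

for pair, distance in distance_table(toy_alignment).items():
    print(pair, round(distance, 3))
```

A smaller distance means the two sequences are more similar, which is the information a tree-building method starts from.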
Multiple Sequence Alignment:
ClustalW2
ClustalW2 is a popular multiple sequence alignment tool
Download protein_sequences.fasta from:
http://ubic.ucsd.edu/gfp/
Go to the ClustalW2 website:
http://www.ebi.ac.uk/Tools/msa/clustalw2/
Under “STEP 1 – Enter your input sequences”, upload protein_sequences.fasta by clicking “Choose File”
Under “STEP 4 – Submit your job”, click “Submit”
Multiple Sequence Alignment:
ClustalW2 (Continued)
After you click Submit, ClustalW2 will redirect you to
the results of the multiple sequence alignment
The IDs of the sequences are to the left of the alignment,
and each row of the alignment corresponds to a single
sequence (e.g. the first row of every chunk is
“GFP(4KW4)”)
If the alignment doesn’t make sense to you, be sure to
ask one of the UBIC officers any questions you have!
Evolutionary Relationships:
Phylogenetic Tree
A phylogenetic tree is a branching diagram (or “tree”) that shows relationships of “closeness” between different biological species or other entities
Elements that are closer together on the tree have “closer” (more similar) sequences
In the ClustalW2 results page, click “Send to ClustalW2_Phylogeny”
On the resulting page, under “STEP 3 – Submit your job”, click “Submit”
Draw out the phylogenetic tree (questions will be asked about it on the Extra Credit assignment)
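To see how a tree can be built from pairwise distances, here is a sketch of one simple clustering method, UPGMA (average linkage): repeatedly join the two closest groups and average their distances to everything else. This is an illustration only. The web tool may use a different method or settings, and the function name upgma and the toy distances below are made up.

```python
def upgma(distances):
    """Cluster labels by average linkage.

    `distances` maps (label_a, label_b) tuples to a distance.
    Returns a Newick-style string without branch lengths.
    """
    dist = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {}  # cluster label -> number of original sequences inside it
    for pair in dist:
        for label in pair:
            sizes[label] = 1
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken alphabetically.
        closest = min(dist, key=lambda pair: (dist[pair], sorted(pair)))
        left, right = sorted(closest)
        merged = "(" + left + "," + right + ")"
        merged_size = sizes[left] + sizes[right]
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in closest:
                continue
            d_left = dist.pop(frozenset((left, other)))
            d_right = dist.pop(frozenset((right, other)))
            dist[frozenset((merged, other))] = (
                d_left * sizes[left] + d_right * sizes[right]
            ) / merged_size
        del dist[closest]
        del sizes[left], sizes[right]
        sizes[merged] = merged_size
    return next(iter(sizes)) + ";"


# Made-up distances between four hypothetical sequences.
toy_distances = {
    ("A", "B"): 0.1,
    ("C", "D"): 0.2,
    ("A", "C"): 0.5,
    ("A", "D"): 0.6,
    ("B", "C"): 0.5,
    ("B", "D"): 0.6,
}

print(upgma(toy_distances))
```

In the printed string, labels inside the same pair of parentheses were joined first, so they sit closest together on the tree.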
Phylogenetic Trees:
Biological Importance
The information provided by phylogenetic trees is extremely valuable and is even applicable to medicine
In 1994, Richard Schmidt, an American physician, injected his ex-lover and former colleague, Janice Trahan, with a blood sample from one of his HIV-positive patients, infecting her with HIV
HIV DNA was collected from the victim, from the putative patient source, and from thirty-two other unrelated, HIV-positive individuals
Scientists concluded that, of all the samples tested, the viral DNA from the victim and from the patient were more closely related to each other than to any of the other samples, despite HIV's ability to mutate very rapidly
Phylogenetic Tree from the
HIV Court Case
GFP Workshop:
Summary
Congratulations on finishing the GFP Workshop!
Throughout the workshop, you learned the following:
GFP’s history and uses
How to use the PDB (and render 3D protein structures)
Multiple Sequence Alignment using ClustalW2
Phylogenetic Tree Construction from a Multiple Sequence
Alignment using ClustalW2_Phylogeny
We hope you enjoyed the workshop, and we hope you
have found an interest in the field of Bioinformatics!