TINY GENOME Activity Guide v1.7 2019 - Garvan


TINY GENOME
An activity about medical applications of DNA sequencing
Teacher Guide v1.7 2019

Overview
This activity introduces students to the processes involved in DNA testing and analysis, one of the most relevant applications of medical genomics today. This is a paper-based activity that can be completed in pairs. Students sequence the genome of a hypothetical creature, identify variants, and determine the cause of the patient’s symptoms. The first part of the activity models processes of DNA sequencing and revises polypeptide synthesis and mutation. The second part of the activity involves data analysis, interpretation of research, problem solving and critical thinking. A worksheet covering different types of mutations is also included.

Activity Outcomes
Students:
-model the processing of DNA sequencing data
-assess the effect of mutation on genotype and phenotype
-investigate genetic causes of non-infectious disease
-evaluate the importance of international databases of genetic information
-interpret data to predict effect of mutations, comparing coding and non-coding regions
-modify hypotheses to reflect new evidence
-select and extract information and data from primary and secondary sources to make predictions and solve problems

Assumed knowledge
• The central dogma of biology: DNA → mRNA → protein. Transcription and translation.
• Mutations: types and effects
• Optional: how DNA sequencing works.

Materials Needed
Per student:
-Variant Types Worksheet (optional)
-Student Activity Guide (can be double sided, can be black and white)
Per pair:
-Task 1 worksheet, ideally printed A3 and in colour
-one set of “short reads” (print same size as task 1 worksheet) and scissors (or provide pre-cut)
-Additional handouts, printed A4, single sided (ideally uncollated)
For teacher/class:
-Teacher Guide with answers
-PowerPoint with answers to tasks and questions
These are available only via our secure Dropbox. Teachers can email [email protected] for access.

Application Notes
This activity is divided into tasks. Tasks 1-4 take students through the process of DNA sequencing, finding mutations and making sense of them. Depending on the student group, this could fill a 30-45 min lesson (allow longer if starting with the quiz and variant worksheet).


Tasks 5 and 6 involve application, critical thinking and problem solving. Task 5 introduces the concept of genetic databases, and task 6 involves interpreting a scientific abstract. These tasks could be used as a second lesson or given as an extension sheet. They could also be completed as a whole group discussion.

NSW HSC Syllabus links

Heredity
• model the process of polypeptide synthesis, including:
  – transcription and translation
  – analysing the function and importance of polypeptide synthesis
  – assessing how genes and environment affect phenotypic expression
• collect, record and present data to represent frequencies of characteristics in a population, in order to identify trends, patterns, relationships and limitations in data, for example:
  – examining frequency data
  – analysing single nucleotide polymorphism (SNP)
• investigate the use of technologies to determine inheritance patterns in a population using, for example:
  – DNA sequencing and profiling
• investigate the use of data analysis from a large-scale collaborative project to identify trends, patterns and relationships, for example:
  – population genetics studies used to determine the inheritance of a disease or disorder

Genetic Change
• compare the causes, processes and effects of different types of mutation, including but not limited to:
  – point mutation
  – chromosomal mutation
• assess the significance of ‘coding’ and ‘non-coding’ DNA segments in the process of mutation
• evaluate the benefits of using genetic technologies in agricultural, medical and industrial applications

Questioning and Predicting
• modify questions and hypotheses to reflect new evidence

Analysing Data and Information
• derive trends, patterns and relationships in data and information
• assess error, uncertainty and limitations in data

Problem Solving
• use modelling (including mathematical examples) to explain phenomena, make predictions and solve problems using evidence from primary and secondary sources
• use scientific evidence and critical thinking skills to solve problems

QUIZ
You may like to use this quick quiz to make sure that students have the required background knowledge before starting this activity. It requires a codon wheel or table (a codon wheel can be found on the back of the student guide).

1) Select the correct flow of information
a. mRNA → DNA → protein
b. DNA → protein → tRNA
c. DNA → mRNA → protein
d. Protein → DNA → mRNA

2) Which of the following is not like the others?
a. variant
b. mutation
c. change
d. sequence

3) What amino acid does GCT code for?
a. Serine
b. Valine
c. Methionine
d. Alanine

4) A deletion of a single nucleotide in the coding region of a gene will usually result in:
a. Frameshift mutation
b. Balanced insertion
c. Stop codon
d. Silent mutation


5) Challenge Question: What would be the effect of a G to A mutation in the third position of a codon that usually codes for tryptophan?
a. silent mutation
b. missense mutation
c. nonsense mutation
d. frameshift mutation

Note that question 5 involves four steps. Describing these steps to students after the quiz can help them understand strategies for answering multiple-choice questions (MCQ).
1. Finding the codon for tryptophan (TGG)
2. Mutating it as directed (TGA)
3. Interpreting this as a stop codon
4. Knowing that a premature stop codon is a nonsense mutation
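For teachers who want to show the same four steps computationally, the following is a minimal Python sketch. It assumes the standard genetic code written with DNA letters (T in place of U); the function name `classify_substitution` is an illustrative choice and is not part of the activity materials.

```python
# Standard genetic code for DNA codons, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Swap one base (1-based position) of a sense codon and classify the effect."""
    mutated = codon[:position - 1] + new_base + codon[position:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        kind = "silent"      # same amino acid
    elif after == "*":
        kind = "nonsense"    # premature stop codon
    else:
        kind = "missense"    # different amino acid
    return mutated, kind


# Step 1: tryptophan (W) is encoded by TGG. Steps 2-4: mutate and interpret.
mutated, kind = classify_substitution("TGG", 3, "A")
print(mutated, kind)  # TGA nonsense
```

The sketch only handles single-base substitutions in a codon that normally codes for an amino acid; insertions and deletions need the whole reading frame, not one codon.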

Variant terms worksheet
The webpage for this activity on our teacher portal includes a variant terminology worksheet in a PowerPoint file. It could be used prior to or alongside the tiny genome activity. The worksheet introduces students to some of the terminology used to describe variants in the field of medical genetics. It also relates these terms to the syllabus terms “point mutation” and “chromosomal mutation”.

Additional resources
Here are some suggested resources for students who need additional revision before moving on to this activity.

Activity: “Mutate a DNA sequence” from Teach.Genetics
http://teach.genetics.utah.edu/content/dna/#item2
This paper-based activity follows the impact of a mutation through transcription and translation.

Flash-based Interactive: “Transcribe and translate a gene” from Learn.Genetics
http://learn.genetics.utah.edu/content/basics/transcribe/
Perform transcription and translation on a short stretch of DNA.

Video: “The different types of mutations” from Khan Academy
https://www.khanacademy.org/test-prep/mcat/biomolecules/genetic-mutations/v/the-different-types-of-mutations
This 5 min video covers the different types and effects of point mutations.

Credits
This activity was written by Lauren McKnight and Bronwyn Terrill
Education and Communication Team
Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research
With kind acknowledgement of input from John Duggan and the staff and students of Newtown Performing Arts High School and Roseville College.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License


TINY GENOME- Student Activity Guide

Imagine you are a geneticist who runs a clinic for Teegos. Teegos are small, green, friendly, blob-like creatures that eat metal and live to approximately 50 years old. An unwell adult teego comes into your genomics centre. It is tired and turning yellow. You believe that genome sequencing may give you the information you need to help your patient.

Unlike humans, teegos have very small genomes: 70 base pairs contained on a single chromosome. Teegos are monoploid, which means they have only one copy of their chromosome in each cell.

Thanks to the “Teego Genome Project”, you have access to a “reference genome” for the Teego species: the sequence of all 70 DNA bases of a randomly chosen healthy Teego. By comparing your patient’s genome to this reference, you may be able to find differences that explain the symptoms you can see.

The first step is to send your patient’s DNA to the lab for whole genome sequencing.

Making the link… Teego vs Human DNA Sequencing Facts

                                            Teego             Human
Number of chromosomes                       1 (haploid)       46 (diploid, i.e. 23 pairs)
Base pairs in genome                        70                6 billion
Average number of variants per individual   3-5               Millions
Length of fragments for DNA sequencing      20bp              200-500bp
Average “read depth” (see below)            3                 30+
Method of alignment                         Cut and compare   Complex computer algorithms
Number of protein-coding genes              2                 ~20,000
Amino acids per protein                     6                 450

DNA sequencing
The lab uses a technique called “massively parallel sequencing” (sometimes called “next generation sequencing”), following this method:
⇒ extract DNA (multiple copies of genome)
⇒ randomly break DNA into fragments ~20 bases long (fragments will overlap)
⇒ attach fragments to a glass surface and make a cluster of identical copies
⇒ read order of bases by adding fluorescent nucleotides one at a time in a specially designed sequencing machine.
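The fragmentation step can be imitated in a few lines of Python. This is a rough sketch under simplifying assumptions: the genome is a made-up random 70-letter string (not the teego sequence), each copy is cut at a random first position and then every 20 bases, and `simulate_reads` is a hypothetical helper name. Real fragmentation is random at every cut.

```python
import random


def simulate_reads(genome, read_length=20, copies=3, seed=0):
    """Cut several copies of a genome into fragments of at most read_length bases."""
    rng = random.Random(seed)
    reads = []
    for _ in range(copies):
        # A different first cut in each copy makes fragments from different copies overlap.
        start = 0
        end = rng.randint(1, read_length)
        while start < len(genome):
            reads.append(genome[start:end])
            start, end = end, end + read_length
    return reads


# Hypothetical 70-base genome of random letters, for illustration only.
letters = random.Random(1)
genome = "".join(letters.choice("ACGT") for _ in range(70))

reads = simulate_reads(genome)
print(len(reads), "reads; longest is", max(len(r) for r in reads), "bases")
```

Because three copies are cut in different places, most positions end up inside about three different reads, which matches the teego “read depth” of 3 in the table above.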


The lab then sends you the data as “sequencing reads”. These represent the order of DNA bases in each fragment of your patient’s DNA. Remember these will overlap, but should cover the whole 70 bases of your patient’s genome. Your job is to line these up with the reference genome to work out which part of the genome they cover, and to look for any variants. Variants are any difference between the DNA sequence of an individual and the DNA sequence of the reference genome. They could be one-letter differences (single nucleotide variants), deletions or additions (larger changes are not included in this exercise).

TASK 1: Mapping

1) Working in pairs, team up with another pair to divide the workload. All pairs should do reads 1-2, one pair can do reads 3-5 and the other can do reads 6-8.

2) Cut out the required “sequencing reads” from the bottom of the task 1 worksheet. These represent sequence data from the fragments of your teego’s DNA.

3) Notice that read 1 has already been marked down. Place the cut-out of read 1 underneath the reference genome at this position to see that it lines up. Notice that there is a T instead of an A at position 56, which has been noted.

4) Now start with read 2. Try to find the place on the reference genome that this fragment matches. Remember it might not match perfectly- it’s like a find-a-word where there might be spelling mistakes!

5) Mark down the position of read 2 (it will fit on the same line as read 1), noting any differences. You can mark deletions with “-” and insertions using “^”.

6) Repeat for the other reads assigned to your pair. Reads that overlap will need to be placed on new lines.
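The “find-a-word with spelling mistakes” idea behind the mapping steps can be sketched in Python as follows. This is a simplified illustration, not how real alignment software works: it slides the read along the reference without gaps and keeps the position with the fewest mismatches, so it finds single-letter differences but not insertions or deletions. The sequences and the name `map_read` are invented for the example.

```python
def map_read(reference, read):
    """Return (start, mismatches) for the best ungapped placement of a read.

    start is 1-based; mismatches lists (position, reference_base, read_base).
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        diffs = [
            (offset + i + 1, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        # Keep the first placement with the fewest differences.
        if best is None or len(diffs) < len(best[1]):
            best = (offset + 1, diffs)
    return best


# Hypothetical sequences, not the teego reference genome or reads.
reference = "GATTACAGGGCTTAACCGTA"
read = "CAGGGCTAAACC"
print(map_read(reference, read))  # (6, [(13, 'T', 'A')])
```

In this made-up example the read maps to position 6 and differs from the reference at position 13, where the reference has a T and the read has an A.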

[Figure: DNA sequencing. “Mapping” means aligning short fragments (“reads”) to a reference genome. This allows differences (“variants”) to be seen. Labels: reference genome, sequencing reads, A>C single nucleotide variant.]

[Figure: example of recording a read, with rows labelled “Reference genome”, “Sequencing read” and “What to write”.]


Hints:
-You are not looking for “base pairing”: the letters in the reference genome will be the same as the letters in the fragments
-Start your search using a recognisable string of letters such as GGG
-Think about what effect additions or deletions would have on the mapping process

When you’re done, share your answers with the other pair and then check your alignment with the answer sheet.

Task 1 Questions
1. Read 2 starts with a G. What is the position number of this G in the reference genome? (Use the ruler above the reference genome.)

2. Sometimes the ends and centromeres (middle) of a chromosome can be difficult to sequence. Are there any positions that were not covered by the sequencing reads? Write the position numbers below. You can also note this with a question mark underneath the reference sequence.

3. The number of times a section of DNA is sequenced is called the ‘coverage’ or ‘read depth’. Look at the way the lines overlap- what is the highest read depth in your genome sequence?
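Read depth and uncovered positions can also be counted by a short program once each read has a start and end position. The sketch below is illustrative only: the placements are invented for a hypothetical 30-base genome, and `read_depth` is not a function from the activity materials.

```python
def read_depth(genome_length, placements):
    """Count how many reads cover each position (1-based, inclusive ranges)."""
    depth = [0] * genome_length
    for start, end in placements:
        for position in range(start, end + 1):
            depth[position - 1] += 1
    return depth


# Hypothetical (start, end) placements of three reads on a 30-base genome.
placements = [(2, 15), (10, 25), (12, 28)]
depth = read_depth(30, placements)

uncovered = [position for position, d in enumerate(depth, start=1) if d == 0]
print(max(depth), uncovered)  # 3 [1, 29, 30]
```

Here positions 12 to 15 are covered by all three reads, and the two ends of the made-up genome are not covered at all.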


Task 2: Variants
Now that you have aligned the teego’s DNA sequence data, you can focus on the variants. Variants are what make everyone’s genome unique. Most variants are harmless; however, some can contribute to disease. We call these “pathogenic variants”.

When looking for pathogenic variants, we first start with a list of all variants. You should have found four variants during the mapping process. Find their position numbers and write them below using “standard teego variant notation”.

Task 2 Questions
1. Can you be sure you have found all of the variants in your teego? What are the limitations in your data?

2. Does the presence of variants (which are also known as mutations) mean that your patient has a genetic disease?

Standard Teego Variant Notation
E.g. a change from a T to a C at position 2 would be: TG02T>C
TG = teego genome
02 = position of change
T = base in reference genome
C = base in patient’s genome
Deletions/Insertions: e.g. TG51del or TG16insC
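As an optional illustration of why a strict notation is useful, the following Python sketch reads and writes variant names in the format described above. It assumes two-digit positions (the genome has 70 bases), a deletion written without a base, and an insertion followed by the inserted base or bases; the function names are hypothetical.

```python
import re

# TG + two-digit position + one of: REF>ALT, del, ins followed by the inserted base(s).
VARIANT = re.compile(
    r"TG(?P<pos>\d{2})"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|(?P<deleted>del)|ins(?P<inserted>[ACGT]+))"
)


def parse_variant(text):
    """Turn a name such as 'TG02T>C' into a dictionary describing the change."""
    match = VARIANT.fullmatch(text)
    if match is None:
        raise ValueError(f"not teego variant notation: {text!r}")
    position = int(match["pos"])
    if match["ref"]:
        return {"position": position, "type": "substitution",
                "ref": match["ref"], "alt": match["alt"]}
    if match["deleted"]:
        return {"position": position, "type": "deletion"}
    return {"position": position, "type": "insertion", "inserted": match["inserted"]}


def format_substitution(position, ref, alt):
    """Write a single-base change in the notation, padding the position to two digits."""
    return f"TG{position:02d}{ref}>{alt}"


print(format_substitution(2, "T", "C"))  # TG02T>C
print(parse_variant("TG16insC"))
```

A fixed format like this is what lets a variant found in one patient be matched against the same variant reported by another clinic.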


Task 3: Teego Genes
How will we find out which variants are likely to be harmful? One way is to look to see what impact they might have on protein production. To do that, we first need to find which regions of the genome are protein-coding genes.

Finding genes:
1) Using the reference genome, read along until you find a start codon, ATG (this codes for methionine and is the first codon of every protein-coding gene). Circle the start codon.
2) Continue along the sequence, marking off each codon (groups of 3 bases) using a / to define the “reading frame” of the gene.
3) Keep going until you come to a stop codon, TAA, TAG, or TGA. (Note that you might find these letters together but they only form a “stop” as a complete codon).
4) Once you have a start and stop codon, you can highlight the whole gene. Then you can start looking for the next start codon (this can be in any “reading frame”).
You should find two complete genes in the teego reference genome.

Translating genes
Now that you have the sequence of the genes, you can find the order of amino acids of the proteins they encode (use the codon wheel at the end of this guide). Translate the genes using the three-letter abbreviations for the amino acids (eg Met Pro Ser…). You can team up with another pair again and translate one gene each.

1st gene:

2nd gene:

Task 3 Questions
1) Which of your teego’s variants are found within protein-coding genes?

2) How many DNA bases would you need to encode a protein that was 400 amino acids long?
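The gene-finding steps of this task (find ATG, mark off codons, stop at TAA, TAG or TGA, then look for the next start) can be sketched in Python. This is a minimal illustration on an invented sequence, not the teego reference genome, and `find_genes` is a hypothetical name; real gene-finding software is considerably more involved.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_genes(genome):
    """Return (start, end, codons) for each complete gene, scanning left to right.

    Positions are 1-based and inclusive; the stop codon is counted as part of the gene.
    """
    genes = []
    i = 0
    while i <= len(genome) - 3:
        if genome[i:i + 3] != "ATG":
            i += 1
            continue
        codons = []
        # Read in steps of three from the start codon: this fixes the reading frame.
        for j in range(i, len(genome) - 2, 3):
            codon = genome[j:j + 3]
            codons.append(codon)
            if codon in STOP_CODONS:
                genes.append((i + 1, j + 3, codons))
                i = j + 3  # look for the next start codon, in any reading frame
                break
        else:
            i += 1  # no in-frame stop codon, so this ATG does not begin a complete gene
    return genes


# Hypothetical sequence for illustration only.
for start, end, codons in find_genes("CCATGGCTTGGTAACGATGAAATAGTT"):
    print(start, end, "/".join(codons))
```

In the second made-up gene the letters TGA appear straddling two codons (ATG/AAA), which shows why a stop only counts when it falls on a complete codon in the reading frame.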


Task 4- predicting the effect of variants
Now that we have found the genes, we can predict the effect of the variants that we found. Again, scientists have standard notation for changes to proteins. For example, if the 5th amino acid in a protein was a histidine, and it was changed to an alanine, you would write His5Ala.

For the two variants that you found within genes, work out the resulting change in the protein sequence, writing your answer in the table under “protein change”. Also write down the type of change.

Point mutations can be classified as:
-Silent: no change to the amino acid sequence
-Missense: a change to one amino acid
-Nonsense: a change to the whole protein (eg a premature stop codon or a frameshift mutation)

Variant          Protein Change          Type of change

Task 4 Questions
1) What type of mutation would result from a deletion at position 16?

2) At this point, which variant do you suspect is the cause of your patient’s symptoms?
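The protein-change notation can be produced by translating a gene before and after a change and comparing the two proteins. The Python sketch below assumes the standard genetic code, handles only single-base substitutions (not insertions, deletions or frameshifts), and uses an invented gene; the label “Stop” and the function names are illustrative choices rather than official notation.

```python
# Standard genetic code for DNA codons, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(gene):
    """Translate a gene (starting at ATG) up to and including the first stop codon."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        residue = THREE_LETTER[CODON_TABLE[gene[i:i + 3]]]
        protein.append(residue)
        if residue == "Stop":
            break
    return protein


def describe_substitution(reference_gene, variant_gene):
    """Return (protein change, type) for a gene that differs by one substituted base."""
    ref, var = translate(reference_gene), translate(variant_gene)
    for number, (before, after) in enumerate(zip(ref, var), start=1):
        if before != after:
            kind = "nonsense" if after == "Stop" else "missense"
            return f"{before}{number}{after}", kind
    return None, "silent"


# Hypothetical gene: Met Pro Ser Trp His Stop.
gene = "ATGCCTTCATGGCATTAA"
print(describe_substitution(gene, "ATGCCTTCATGGCGTTAA"))  # ('His5Arg', 'missense')
print(describe_substitution(gene, "ATGCCTTCATGACATTAA"))  # ('Trp4Stop', 'nonsense')
print(describe_substitution(gene, "ATGCCCTCATGGCATTAA"))  # (None, 'silent')
```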


Task 5- Variant Databases
Although we can make predictions about the effect of variants, we cannot confirm whether they are harmful until we validate the findings in other individuals. The worldwide community of teego researchers and clinicians has collaborated to form the Teego Genome Database (TGDB), a list of all known teego variants. An extract is found on the handout for task 5.

The database includes information from sequencing over 1000 teego genomes. This allows the minor allele frequency (MAF) to be calculated. For teegos this is the same as the proportion of individuals who have the allele (as a decimal).

The database also classifies variants according to their clinical significance. Most variants are harmless or “benign”; others are “pathogenic”, which means they are associated with or cause a disease. Sometimes there is not enough information to know for sure, so the variant might be classified as either “suspected pathogenic” or “unknown significance”.

Search for all four of your teego’s variants in the database and highlight any you find.

Task 5 Questions
1) What are the names of the two genes you identified in task 3? What proteins do they code for?

2) Were all of your teego’s variants found in the database? Why do you think that might be?

3) Did the information in the TGDB confirm your hypothesis about the disease-causing variant in your patient? Is there a variant that fits better?

5) Is this the only explanation for your patient’s symptoms? Describe the limitations in your data and suggest alternative explanations.
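The allele frequency calculation described in this task is a simple proportion. The Python sketch below uses invented counts and a hypothetical function name; the `ploidy` parameter is included only to show why the shortcut “proportion of individuals” works for monoploid teegos but not for diploid humans, who carry two copies of each chromosome.

```python
def allele_frequency(allele_copies, individuals, ploidy=1):
    """Frequency of an allele as a decimal: copies observed / chromosomes sequenced."""
    if individuals <= 0 or ploidy <= 0:
        raise ValueError("individuals and ploidy must be positive")
    return allele_copies / (individuals * ploidy)


# Hypothetical counts: an allele seen 25 times among 1000 sequenced individuals.
print(allele_frequency(25, 1000))            # monoploid, one chromosome copy each
print(allele_frequency(25, 1000, ploidy=2))  # diploid, two chromosome copies each
```

With one chromosome copy per individual, 25 carriers in 1000 genomes gives a frequency of 0.025; the same 25 copies spread over 2000 chromosomes gives half that value.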


Task 6- Taking action on variants
Now that you have a suspected variant causing your patient’s symptoms, you can use this information to guide the treatment and management of the condition. Using another database, PubMed, you find a research paper about the TG08G>A variant. Take a look at the “Greeglob Abstract” and see what information you can extract.

Task 6 questions
1) In your own words, what does the greeglob protein do, and what happens when this function is disrupted?

2) What is the function of the non-coding DNA around position 8 of the teego genome? What might be the effect of a variant in this region?

3) Based on this paper, what treatment might you suggest for your patient?

4) What recommendation would you make to the teego healthcare system for the use of genome sequencing for the prevention of this disease?

5) Someone suggests that only sequencing the regions of the genome that contain genes (whole exome sequencing) be used because it is cheaper, and there is evidence that most known disease-causing variants are found in protein-coding DNA. Do you agree that this is the best plan for the teego population? Justify your response.


Further extension questions:
E1. Why do you think the variant at position 22 didn’t cause disease? Look up the structure of amino acids.
E2. Looking at the TGDB, what is the relationship between MAF and clinical significance? Why do you think this might be?
E3. Practice exam question: Explain how variants in non-coding DNA can contribute to the function of an organism.
E4. What sort of protein change would the TG57T>A and TG64A>C mutations result in?
E5. From the phenotypes listed, can you propose a function for the metect protein?
E6. Analysis of the TGDB revealed that there were no nonsense mutations. Suggest a possible explanation for this finding.
E7. Your teego responded well to the suggested treatment and is now completely symptom free. Write a 200 word news article reporting on this story.
E8. Would you classify this disorder as genetic or nutritional?
E9. A real-life database similar to the TGDB, but for humans, is called ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/). It collects information about hundreds of thousands of listed variants and their associated phenotypes. They are submitted, reviewed and verified by hundreds of research centres all over the world. Complete the following practice exam question:
Assess the importance of public databases for the sharing of large data sets to determine the genetic basis of disease.


Tiny Genome Codon Wheel
A codon is three letters (bases). The first letter corresponds to the inner circle, the second letter to the next layer out from the centre, and the third letter to the third layer. The outer layer gives the name of the amino acid as well as its three-letter abbreviation and its one-letter abbreviation. The structure and category of each amino acid are included for interest.


Tiny Genome Worksheet for Task 1: Read Alignment
[Worksheet graphic: Teego Reference Genome GRCt13 with a position ruler (1, 10, 20, 30, 40, 50, 60, 70), read 1 already marked with a T noted at position 56, and sequencing reads 1-8 to cut out.]

Tiny Genome Reads to print and cut for Task 1 (5 sets)
[Printable sheet: five copies of sequencing reads 1-8.]


Tiny Genome Information for task 5- Teego Genome Database Extract

Variant ID   Gene   Protein    MAF     Clinical significance   Phenotype
TG01del      -      -          0.002   unknown significance    -
TG08G>A      -      -          0.025   suspected pathogenic    fatigue, shape change, yellow tone
TG22G>C      GGB    greeglob   0.24    benign                  -
TG31T>A      GGB    greeglob   0.005   pathogenic              yellow infant syndrome. Fatal in first year of life
TG34T>A      GGB    greeglob   0.018   pathogenic              greeglob syndrome. Severe yellowness and fatigue
TG39T>G      -      -          0.07    suspected pathogenic    fatigue
TG41insG     -      -          0.29    benign                  -
TG44T>C      -      -          0.01    unknown significance    unconfirmed reports of yellowness
TG49G>A      -      -          0.31    benign                  -
TG56A>T      MTC    metect     0.13    benign                  -
TG57T>A      MTC    metect     0.021   suspected pathogenic    below average ability to locate metals
TG64A>C      MTC    metect     0.009   pathogenic              metect syndrome - opaque eyes, small size, failure to locate metals

MAF - minor allele frequency
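Searching a database for a list of variants is the kind of job computers do for real genomes. The Python sketch below copies the rows of the extract into a dictionary and looks up a short list of variants. The variant list is hypothetical (TG12C>T is invented so that one lookup fails), and the names `TGDB` and `look_up` are illustrative only.

```python
# Rows copied from the extract: variant -> (gene, protein, MAF, significance, phenotype).
TGDB = {
    "TG01del": (None, None, 0.002, "unknown significance", None),
    "TG08G>A": (None, None, 0.025, "suspected pathogenic", "fatigue, shape change, yellow tone"),
    "TG22G>C": ("GGB", "greeglob", 0.24, "benign", None),
    "TG31T>A": ("GGB", "greeglob", 0.005, "pathogenic", "yellow infant syndrome. Fatal in first year of life"),
    "TG34T>A": ("GGB", "greeglob", 0.018, "pathogenic", "greeglob syndrome. Severe yellowness and fatigue"),
    "TG39T>G": (None, None, 0.07, "suspected pathogenic", "fatigue"),
    "TG41insG": (None, None, 0.29, "benign", None),
    "TG44T>C": (None, None, 0.01, "unknown significance", "unconfirmed reports of yellowness"),
    "TG49G>A": (None, None, 0.31, "benign", None),
    "TG56A>T": ("MTC", "metect", 0.13, "benign", None),
    "TG57T>A": ("MTC", "metect", 0.021, "suspected pathogenic", "below average ability to locate metals"),
    "TG64A>C": ("MTC", "metect", 0.009, "pathogenic", "metect syndrome - opaque eyes, small size, failure to locate metals"),
}


def look_up(variants, database=TGDB):
    """Split a list of variants into database hits and variants that are not listed."""
    found = {v: database[v] for v in variants if v in database}
    missing = [v for v in variants if v not in database]
    return found, missing


# Hypothetical variant list; TG12C>T is invented and does not appear in the extract.
found, missing = look_up(["TG08G>A", "TG56A>T", "TG12C>T"])
for variant, (gene, protein, maf, significance, phenotype) in found.items():
    print(variant, gene or "-", maf, significance)
print("Not in database:", missing)
```

A variant that is missing from the database is not necessarily harmless; it may simply never have been seen in the genomes sequenced so far.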

ATCGCAGGTCG CGCCCGGGTCA ACTCATCTGAT ATTCTCGAATG TCTAGCCATTAA

CC

TGDB


Tiny Genome Information for task 6- Greeglob Abstract

ORIGINAL RESEARCH
Teego Mol. Genet. (2018) 132:107-113
DOI 10.840/z4234-013-1249-9

Identification of a novel regulatory element for the teego GGB gene essential for greeglob expression in the presence of ferrous metals

Maria A. Smallcot – Jefferson Mehinski – Yin Su Xin – John J. Smithson – Juan Paltero

Received: 3 April 2018 / Accepted: 15 May 2018 / Published online: 3 Jun 2018
© The Author(s) 2018. This article is published with open access

Abstract
Greeglob is an essential, green protein involved in teego nutrient conversion. It is associated with diseases such as greeglob syndrome and yellow infant syndrome, which involve fatigue and yellowness. Variants at position 08 of the teego genome have previously been linked with an adult-onset form of greeglob syndrome. This position is directly adjacent to the GGB gene, which encodes greeglob. We used CRISPR/Cas9 to introduce a TG08G>A variant to teego tissue in culture. Proteomic studies revealed reduced greeglob expression when the culture medium contained ferrous metals, but not when an iron-free medium was used. We demonstrate that position 08 is within a regulatory element for the GGB gene that enables greeglob expression in the presence of iron. This finding has implications for the understanding of teego gene regulation and also suggests new approaches to treating adult-onset greeglob syndrome.