![Page 1: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/1.jpg)
BARCODE Quality Assessment:Frequency Matrix Approach
Mark Y Stoeckle, Rockefeller UniversityKevin C R Kerr, Royal Ontario Museum
PLoS ONE August 2012 e43992
![Page 2: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/2.jpg)
Potential BARCODE errors
• Taxonomic mislabeling• Pseudogenes• Sequencing Error
![Page 3: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/3.jpg)
Hypothesis:
SequencingErrors
= RareVariants
![Page 4: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/4.jpg)
Avian BARCODEs: large, representative dataset
11K records/2.7K species (27% known)
![Page 5: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/5.jpg)
Most 1st, 2nd positions >99.9% conserved
![Page 6: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/6.jpg)
Most aa positions >99.9% conserved
![Page 7: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/7.jpg)
Working definitionsfor rare variants:
• VERY LOW FREQUENCY VARIANT (VLF): nt, aa in <0.1% seqs at a given position
• SINGLETON VLF: in 1 indiv/species
• SHARED VLF: in ≥2 indiv/species
![Page 8: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/8.jpg)
• Singleton VLFs: (mostly) seq error• Shared VLFs: (mostly) biological
Concentrated at ends of segment
Relatively evenly distributed
Spatial distribution of VLFs
![Page 9: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/9.jpg)
Sliding window analysis nVLFs
![Page 10: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/10.jpg)
Calculating Error Rate
• Most (94%) 2nd positions >99.9% conserved• (Nearly) all 2nd position seq errors are VLFs
187 2nd pos si-nVLFs (probable errors) . 216 2nd pos/BC x 10,760 BC
= 8 x 10-5 errors/base pair
>
![Page 11: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/11.jpg)
Limitations, Observations
• First seq error assessment for BARCODEs
• Some singleton nVLFs likely biological—calc rate is upper limit
• ~3% BARCODEs ≥ 1 error (av 1.7/BARCODE)
• Seq errors unlikely to affect species ID
• Increased apparent intraspecific variation
![Page 12: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/12.jpg)
Applications: 1. Compare database quality
Error bars=95% CI
![Page 13: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/13.jpg)
2. Highlight BARCODEs w probable errors--annotate?
Annotated Homer Simpson
Portrait
![Page 14: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/14.jpg)
3. BONUS: cryptic pseudogenesflagged by multiple SHARED VLFs
Alder flycatcher (Empidonax alnorum)
![Page 15: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/15.jpg)
Fuscous flycatcher (Cnemotriccus fuscatus) Canada goose (Branta canadensis)
Cryptic pseudogenes uncommon: 0.1% BARCODEs
![Page 16: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/16.jpg)
Conclusions
• Avian BARCODE sequence quality high and improving
• Frequency matrix potential utility for sequence database QA
• Caution on studies involving rare variants
![Page 17: BARCODE Quality Assessment: Frequency Matrix Approach](https://reader035.vdocuments.site/reader035/viewer/2022062813/56816625550346895dd97ff4/html5/thumbnails/17.jpg)
Acknowledgments
Natural Sciences and Engineering Research Council of Canada
Jesse Ausubel
Alan Baker Jan LifjeldArild Johnsen Per EricsonCarla Dove Gary GravesPablo Tubaro Dario Litjmaer