NGS data format and General Quality Control
Data format “Flowchart”
Sequencer → raw data → Fastq → SAM/BAM
Fastq file
• Used to record raw reads coming off the sequencers
• Each record contains four lines
• Parameters such as read length are usually set by the sequencer
Fastq file
• Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
• Line 2 is the raw sequence letters. The read length is the length of the string.
• Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
• Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
http://en.wikipedia.org/wiki/FASTQ_format
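The four-line layout described above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names `FastqRecord` and `read_fastq` are hypothetical, and the code assumes every record occupies exactly four lines (no wrapped sequence lines).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # text after '@' on line 1 (identifier plus optional description)
    sequence: str    # line 2: raw sequence letters
    quality: str     # line 4: one quality symbol per sequence letter


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        header = header.rstrip("\n")
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")

        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError("line 4 must be as long as line 2")

        yield FastqRecord(header[1:], sequence, quality)


# Made-up example record, used only to exercise the parser.
example = io.StringIO("@read1 demo\nGATTACA\n+\nIIIIIII\n")
records = list(read_fastq(example))
read_length = len(records[0].sequence)  # the read length is the length of line 2
```

The length check mirrors the rule that line 4 must contain the same number of symbols as line 2.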
General quality control of raw reads
• Using FASTQC
– A tool that implements some general rules
– Basic Statistics
– Per base sequence quality
– Per sequence quality scores
– Per base sequence content
– Per base GC content
– Per sequence GC content
– Per base N content
– Sequence Length Distribution
– Sequence Duplication Levels
– Overrepresented sequences
– Kmer Content
Quality scores
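As a rough illustration of how the quality line maps to numeric scores, the sketch below decodes quality symbols and averages them per read position. It assumes the Phred+33 (Sanger) encoding, which the slides do not state, and older Illumina files may use an offset of 64 instead. The function names are hypothetical, and this is not how FASTQC itself is implemented.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Phred scores (assumes Phred+33 by default)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_quality_per_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Average Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in quality_strings:
        for i, q in enumerate(phred_scores(quality, offset)):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


scores = phred_scores("I5!")                   # 'I' -> 40, '5' -> 20, '!' -> 0
means = mean_quality_per_position(["II", "5"])  # made-up quality strings
```

A score of 20 corresponds to an estimated error probability of 1 in 100, and a score of 40 to 1 in 10,000.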
Per base “N” percentage
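The per-position "N" percentage can be computed directly from the sequence lines. The sketch below is one simple way to do it; `n_percentage_per_position` is a hypothetical name, and the code is not taken from FASTQC.

```python
from typing import Iterable, List


def n_percentage_per_position(sequences: Iterable[str]) -> List[float]:
    """Percentage of reads with an 'N' (no call) at each read position."""
    n_counts: List[int] = []
    totals: List[int] = []
    for sequence in sequences:
        for i, base in enumerate(sequence):
            if i == len(totals):  # first read that reaches this position
                totals.append(0)
                n_counts.append(0)
            totals[i] += 1
            if base in "Nn":
                n_counts[i] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


# Made-up reads: position 1 has one N out of two reads, position 4 has two.
percentages = n_percentage_per_position(["ACGN", "NCGN"])
```

Positions covered by fewer reads (when read lengths differ) are averaged over only the reads that reach them.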
Sample FASTQC reports
Good quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc/fastqc_report.html
Bad quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc/fastqc_report.html
Data format “Flowchart”
Sequencer → Fastq → SAM/BAM
SAM/BAM
• SAM stands for Sequence Alignment/Map
• BAM is the binary form of SAM
• Used for mapped/aligned reads
• Generated by NGS mappers/aligners
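To make the text form concrete, the sketch below splits one SAM alignment line into the eleven mandatory tab-separated fields defined by the SAM specification and checks two flag bits. The class and function names are hypothetical and the example line is made up. Header lines, which start with '@', are assumed to have been skipped by the caller. BAM is binary and cannot be read this way; a dedicated library or tool is needed for it.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities, as in the Fastq quality line
    tags: List[str]  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single alignment line (not a header line) of a SAM file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


def is_unmapped(record: SamRecord) -> bool:
    return bool(record.flag & 0x4)   # flag bit 0x4: segment unmapped


def is_reverse_strand(record: SamRecord) -> bool:
    return bool(record.flag & 0x10)  # flag bit 0x10: reverse complemented


# Made-up alignment line for illustration only.
line = "read1\t16\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
record = parse_sam_line(line)
```

The sequence and quality columns carry the same information as lines 2 and 4 of the Fastq record, with the mapping result added around them.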
SAM
BAM