IGV: Integrated Genome Viewer - Institut Pasteur
Amel Ghouila, Claudia Chica, Emna Achouri & Fatma Guerfali
C3BI Hands-on NGS course – IPP – 23rd Nov 2016
IGV: Integrated Genome Viewer
IGV
GFF and BED formats
GFF3 Format
The two most widely used formats for representing genome features are the BED and GFF formats.
GFF (Generic Feature Format) is a standard file format for storing genomic features in a text file.
Careful: GFF has several versions; the most recent is GFF3.
GFF coordinates are 1-based: a feature starts at a 1-based position and ends at a 1-based position (the end is included).
► 1 line for 1 feature
► tab-delimited file (tab-separated columns)
► 9 columns + optional additional information
http://gmod.org/wiki/GFF3
http://www.ensembl.org/
Columns: SEQ-ID, SOURCE, TYPE, START, END, SCORE, STRAND, PHASE, ATTRIBUTES
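As a minimal sketch of the nine-column layout described above, the following Python snippet splits one GFF3 feature line into named fields. The `GffFeature` type, the `parse_gff3_line` helper and the sample line are hypothetical and made up for illustration; they are not part of the course material or of any GFF library.

```python
from typing import Dict, NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int                 # 1-based, inclusive
    end: int                   # 1-based, inclusive
    score: Optional[float]     # None when the column holds "."
    strand: str
    phase: Optional[int]       # None when the column holds "."
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Parse one tab-delimited GFF3 feature line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields

    # Column 9 holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()

    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Made-up feature line, for illustration only.
example = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo"
feature = parse_gff3_line(example)
print(feature.seqid, feature.start, feature.end, feature.attributes["ID"])
```

Because both ends are 1-based and inclusive, the feature length is `end - start + 1`.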
BED Format
► other useful file formats
BED = Browser Extensible Data (http://genome.ucsc.edu/FAQ/FAQformat)
BED starts are zero-based and BED ends are one-based
► Tab-delimited files
► 3 first fields required, others optional
File format (BEDPE): describes disjoint genome features, such as structural variations or paired-end sequence alignments.
Columns: CHR, START, END, FEATURE NAME, SCORE*, STRAND
*Any string could be used (p-value, enrichment score…)
7. thickStart - The starting position at which the feature is drawn thickly.
8. thickEnd - The ending position at which the feature is drawn thickly.
9. itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0).
10. blockCount - The number of blocks (exons) in the BED line.
11. blockSizes - A comma-separated list of the block sizes.
12. blockStarts - A comma-separated list of block starts.
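The difference between the BED and GFF coordinate systems is easy to get wrong, so here is a small sketch of reading a BED line and converting its interval to 1-based inclusive coordinates. The helper names (`parse_bed_line`, `bed_to_one_based`, `one_based_to_bed`) and the sample line are hypothetical, chosen only for illustration.

```python
from typing import Dict, Tuple, Union

# Names of the optional BED columns 4-12, in file order.
OPTIONAL_FIELDS = ["name", "score", "strand", "thickStart", "thickEnd",
                   "itemRgb", "blockCount", "blockSizes", "blockStarts"]


def parse_bed_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse one tab-delimited BED line; only the first 3 fields are required."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    record: Dict[str, Union[str, int]] = {
        "chrom": fields[0],
        "start": int(fields[1]),   # 0-based
        "end": int(fields[2]),     # 1-based, i.e. end-exclusive in 0-based terms
    }
    # Optional columns are kept as raw strings in this sketch.
    record.update(zip(OPTIONAL_FIELDS, fields[3:]))
    return record


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """BED (0-based start, 1-based end) -> 1-based inclusive, as used by GFF."""
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> Tuple[int, int]:
    """1-based inclusive (GFF-style) -> BED (0-based start, 1-based end)."""
    return start - 1, end


# Made-up BED line, for illustration only.
record = parse_bed_line("chr1\t999\t2000\tdemo\t0\t+")
print(record["chrom"], bed_to_one_based(record["start"], record["end"]))
```

With BED coordinates the feature length is simply `end - start`, which is one reason the half-open convention is popular for interval arithmetic.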
Different BED Formats
BED Format
► BEDTools supports a wide range of operations for interrogating and manipulating genomic features.
intersectBed: returns overlapping features between two BED/GFF/VCF files.
genomeCoverageBed: histogram or a "per base" report of genome coverage.
…
► Coverage
The number of times each nucleotide is "read" → Fold Coverage (a number followed by X, e.g. 30X)
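To make the notion of coverage concrete, the sketch below counts how many reads cover each position of a short region and then averages those counts to obtain a fold coverage. It is an illustration only, not how genomeCoverageBed is implemented; the function names and the toy read intervals are made up, and reads are assumed to be given as BED-style intervals (0-based start, exclusive end) on a single sequence.

```python
from typing import List, Tuple


def per_base_coverage(reads: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Return the number of reads covering each position of the region."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (region_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, region_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the differences gives the depth at each position.
    coverage = []
    depth = 0
    for delta in diff[:region_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def mean_fold_coverage(coverage: List[int]) -> float:
    """Average depth over the region, i.e. the 'X' in '30X'."""
    return sum(coverage) / len(coverage)


# Toy reads on a 10-base region, for illustration only.
reads = [(0, 4), (2, 6), (5, 8)]
coverage = per_base_coverage(reads, 10)
print(coverage)                      # [1, 1, 2, 2, 1, 2, 1, 1, 0, 0]
print(mean_fold_coverage(coverage))  # 1.1
```

The mean equals the total number of sequenced bases (4 + 4 + 3 = 11) divided by the region length (10), which is the usual back-of-the-envelope definition of fold coverage.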
http://wiki.bits.vib.be/index.php/NGS-formats
VCF Format
coordinates are 1-based
Variant calling and VCF format
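Since VCF positions are 1-based while BED starts are 0-based, converting a variant to a BED interval needs an explicit shift. The sketch below assumes the interval should span the reference allele; the helper name `vcf_to_bed_interval` and the sample record are hypothetical, made up for illustration.

```python
from typing import Tuple


def vcf_to_bed_interval(pos: int, ref: str) -> Tuple[int, int]:
    """Map a VCF POS (1-based) and REF allele to a BED interval.

    The returned start is 0-based and the end is exclusive, so the
    interval covers exactly the bases of the reference allele.
    """
    start = pos - 1
    end = start + len(ref)
    return start, end


# Made-up VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).
line = "chr1\t1000\t.\tA\tG\t.\t.\t."
chrom, pos, _id, ref, alt = line.split("\t")[:5]
bed_start, bed_end = vcf_to_bed_interval(int(pos), ref)
print(chrom, bed_start, bed_end)
```

For a single-base substitution the BED interval is one base long; for a multi-base reference allele (for example a deletion) it is `len(ref)` bases long.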