Barbera van Schaik, Bioinformatics Laboratory, AMC, b.d.vanschaik@amc.uva.nl

Posted on 31-Mar-2015


Current sequencing projects

- Neurogenetics laboratory: somatic mutation detection (Frank Baas, Marja Jakobs)
- Laboratory of Experimental Virology: virus discovery (Michel de Vries)
- Dept. Clinical Immunology & Rheumatology: TCR-beta variant detection (Niek de Vries, Paul Klarenbeek)
- Clinical Virology: Hepatitis C (Richard Molenkamp)
- Erasmus MC – Rotterdam: re-sequencing tumor samples (Ernie de Boer, Michael Moorhouse)

TCR-beta

Recombination of gene segments:

Starting from the germline:

- One variant is selected from each gene segment
- These variants are joined together

Only functional recombinations lead to functional T cells

Paul Klarenbeek

Total theoretical variation (in vivo this turns out to be much lower)
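Because exactly one variant is picked from each gene segment, the combinatorial part of the theoretical variation is the product of the number of variants per segment. A minimal Python sketch; the segment counts in the usage line are hypothetical placeholders, not the real TRB gene numbers, and junctional trimming and insertions are deliberately left out.

```python
from math import prod
from typing import Mapping


def combinatorial_diversity(variants_per_segment: Mapping[str, int]) -> int:
    """Number of recombinations when exactly one variant is chosen per gene segment.

    Only the combinatorial part is counted; deletions and insertions at the
    junctions (which add far more variation) are not modelled here.
    """
    return prod(variants_per_segment.values())


# Hypothetical counts, for illustration only (not the real TRB repertoire).
example = {"V": 10, "D": 2, "J": 5}
print(combinatorial_diversity(example))  # 100
```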


CDR3 region

Unique for each clonal expansion

How to identify clonal expansions?

(Figure: germline DNA, mRNA, thymocytes)


Dept. Clinical Immunology & Rheumatology

Goal: identify and enumerate TCR-beta variants

(Figure: amplicon design with the Vn, CDR3, J and C segments, Primer A, Primer B and a sample barcode, shown for two constructs.)


Roche (454) sequencing

- Run 07-07-2008: 110,509 sequences, 26,445,844 nucleotides in total
- Run 30-09-2008: 106,234 sequences, 24,983,646 nucleotides in total
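Dividing total nucleotides by the number of sequences gives the average read length per run, a quick sanity check against the length distributions. A small Python sketch using only the two runs listed above; `mean_read_length` is a hypothetical helper name.

```python
# Read counts and total nucleotides per run, copied from the run summary.
RUNS = {
    "07-07-2008": (110_509, 26_445_844),
    "30-09-2008": (106_234, 24_983_646),
}


def mean_read_length(n_sequences: int, n_nucleotides: int) -> float:
    """Average number of nucleotides per read."""
    return n_nucleotides / n_sequences


for run, (n_seq, n_nt) in RUNS.items():
    print(f"{run}: {mean_read_length(n_seq, n_nt):.1f} nt per read")
```

Running it prints 239.3 nt per read for the first run and 235.2 for the second.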

Sequence lengths

(Figure: frequency vs. sequence length for runs 20080707 and 20080930.)

Pipeline Rheumatology

- Convert SFF to FASTA + quality scores
- Chop sequences into: MID, fragment
- Divide sequences based on MID and region
- Identify the V, J and C segments
- Locate highly variable area
- Count variants; quality control

Tools: Perl scripts, Roche software, BLAT, Access/Excel
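After the conversion step each run exists as a FASTA file plus a matching quality file. A minimal Python sketch of loading such a pair, assuming the usual layout in which a quality record carries the same read id as its FASTA record and lists one whitespace-separated integer score per base. The function names are hypothetical, and the SFF conversion itself (done with Roche software) is not shown.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (read id, body lines) for every '>' record in FASTA-style text."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            # The read id is the first word of the header line.
            name, body = (line[1:].split() or [""])[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def read_fasta_qual(fasta: Iterable[str], qual: Iterable[str]) -> Dict[str, Tuple[str, List[int]]]:
    """Pair each sequence with its per-base quality scores, matched by read id."""
    seqs = {name: "".join(body) for name, body in read_records(fasta)}
    quals = {
        name: [int(q) for row in body for q in row.split()]
        for name, body in read_records(qual)
    }
    paired = {}
    for name, seq in seqs.items():
        scores = quals.get(name)
        if scores is None or len(scores) != len(seq):
            raise ValueError(f"missing or mismatched quality scores for {name}")
        paired[name] = (seq, scores)
    return paired
```

Open file handles can be passed directly, e.g. `read_fasta_qual(open("run.fasta"), open("run.qual"))` with hypothetical file names; failing loudly on a length mismatch keeps later quality control honest.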

Recognize MID and primer

- MID = barcode for sample
- Primer = primer for J or C segment

    # MIDs for the 3 regions that are sequenced with primer for J segment:
    MID=(TAGT|ACTA|CGAC|CTCG)
    # Primer for J segment (first) or C segment (last two):
    Primer=(CTTACCTACAACGGTTAACCTGGTC|AGCTCAAACACAGCGACCTC|GGAACACCTTGTTCAGGTC)
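A minimal Python sketch of this recognition step with the same alternations. It assumes the MID sits at the very start of the read, directly followed by the primer; reads without a match are binned as `nomatch`. `chop` and `bin_reads` are hypothetical names, not the original Perl scripts.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# MID (sample barcode) and primer alternatives copied from the slide.
MID = "(TAGT|ACTA|CGAC|CTCG)"
PRIMER = "(CTTACCTACAACGGTTAACCTGGTC|AGCTCAAACACAGCGACCTC|GGAACACCTTGTTCAGGTC)"
# Assumption: MID first, primer immediately after it; .match() anchors at read start.
READ_START = re.compile(MID + PRIMER)


def chop(seq: str) -> Tuple[str, Optional[str], str]:
    """Split a read into (mid, primer, fragment); mid is 'nomatch' when nothing matches."""
    m = READ_START.match(seq)
    if m is None:
        return "nomatch", None, seq
    return m.group(1), m.group(2), seq[m.end():]


def bin_reads(reads: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, Optional[str]], List[Tuple[str, str]]]:
    """Group (read id, fragment) pairs by MID and primer, i.e. by sample and region."""
    bins = defaultdict(list)
    for read_id, seq in reads:
        mid, primer, fragment = chop(seq)
        bins[(mid, primer)].append((read_id, fragment))
    return bins
```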


Identify V, J and C segment

- IMGT website: reference sequences
- BLAT all Roche sequences against 3 references
- Selection: only store first hit per reference

Read layouts: MID, C, J, CDR3, V and MID, J, CDR3, V
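A sketch of the selection rule "only store first hit per reference", assuming BLAT wrote its default PSL output (21 tab-separated columns, possibly preceded by header lines) and that each of the 3 references is searched in a separate run, so the function is applied once per output file. Column indices follow the PSL layout (qName at index 9, tName at 13, 0-based half-open coordinates); "first" means first in file order, not best score. `first_hits` is a hypothetical name.

```python
from typing import Dict, Iterable


def first_hits(psl_lines: Iterable[str]) -> Dict[str, dict]:
    """Keep only the first PSL hit for each query (read) in one BLAT output file."""
    hits: Dict[str, dict] = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header, column titles, dashes and blank lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        q_name = fields[9]
        if q_name in hits:
            continue  # a hit for this read was already stored
        hits[q_name] = {
            "target": fields[13],
            "q_start": int(fields[11]),
            "q_end": int(fields[12]),
            "t_size": int(fields[14]),
            "t_start": int(fields[15]),
            "t_end": int(fields[16]),
        }
    return hits
```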

Locate highly variable region (CDR3)

- Get CDR3 sequence -> count
- Determine deletions from V and J segment
- Determine reading frame

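A sketch of the CDR3 and reading-frame steps under assumptions the slides do not state: the read has been reverse-complemented so it runs from V to J, and the CDR3 is taken as the read stretch between the end of the V hit and the start of the J hit. The frame check is a hypothetical rule: two anchor codons, one in V and one in J, whose read positions are assumed known from the reference alignment, must be a whole number of codons apart with no stop codon in between.

```python
# Standard genetic code; bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cdr3_between(read: str, v_query_end: int, j_query_start: int) -> str:
    """Read stretch between the V hit end and the J hit start (0-based, read oriented V -> J)."""
    if j_query_start < v_query_end:
        return ""  # V and J alignments overlap: nothing left in between
    return read[v_query_end:j_query_start]


def translate(nt: str) -> str:
    """Translate complete codons; codons with N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def in_frame(read: str, v_anchor: int, j_anchor: int) -> bool:
    """Hypothetical check: anchors a whole number of codons apart, no stop codon between."""
    distance = j_anchor - v_anchor
    if distance < 0 or distance % 3 != 0:
        return False
    return "*" not in translate(read[v_anchor:j_anchor])
```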


Count variants: BLAT hits

(Figure: BLAT hits against the TRBV reference, frequency per region/MID/hit for region 1. Hits shown include X07192|TRBV12-3*01, X07223|TRBV12-5*01, M14264|TRBV12-4*02 and X74844|TRBV7-9*06; MIDs shown are ACTA, CGAC, CTCG, TAGT and nomatch.)

Count CDR3 variants

region  cdr3_sequence             freq
1       (empty)                   767
1       CTCCCTTTTTGGGGG           32
1       CCCTCAGGATTTCAGGG         30
1       TGGGTC                    29
1       CAAACATGA                 23
1       GGGACGGAGA                20
1       GGACAGT                   20
1       CCCAGACAGG                19
1       AACCGGA                   18
1       CTCCACTGGACACGT           18
1       ACGGG                     18
1       CGCCCGGGACAGGGCCCTTCGGGG  18
1       CCACCCCGCGGCAGGAGGG       17
1       CCCACCGGGACAGGGGCGTC      17
1       GGTATACGGGCAGCGG          16
1       GACCTTGTGGTC              16
1       ACAGGGGGAG                16
1       GCGGG                     16

Example of CDR3 sequences
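The table is a tally of identical (region, CDR3 sequence) pairs, most frequent first. A Python sketch with hypothetical function names; the slides list Access/Excel among the tools, so this is an alternative way to produce the same kind of table, not a reconstruction of the original step.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_cdr3(records: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, int]]:
    """Count identical (region, CDR3) pairs; most frequent first, ties in first-seen order."""
    counts = Counter(records)
    return [(region, cdr3, freq) for (region, cdr3), freq in counts.most_common()]


def print_table(rows: List[Tuple[int, str, int]]) -> None:
    """Print rows in the same column order as the table (region, cdr3_sequence, freq)."""
    print("region\tcdr3_sequence\tfreq")
    for region, cdr3, freq in rows:
        print(f"{region}\t{cdr3}\t{freq}")
```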

(Figure: V segment deletions, frequency vs. number of deleted nt, 0 to 32.)

Count deleted nt of the V and J segments: check where the alignment stops with respect to the reference.

(Figure: J segment deletions, frequency vs. number of deleted nt, 0 to 22.)
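In reference coordinates, V is trimmed at its 3' end and J at its 5' end, so the deletion counts follow from where each alignment stops or starts on the reference. A sketch assuming plus-strand PSL target coordinates (0-based, half-open) on reads oriented V to J; the helper names are hypothetical. A mismatch close to the segment end can also make BLAT stop early, so some of these counts may reflect sequencing errors rather than true deletions.

```python
from collections import Counter
from typing import Dict, Iterable


def v_deletion(t_end: int, t_size: int) -> int:
    """nt trimmed from the 3' end of V: reference bases after the alignment stops."""
    return t_size - t_end


def j_deletion(t_start: int) -> int:
    """nt trimmed from the 5' end of J: reference bases before the alignment starts."""
    return t_start


def deletion_histogram(deletions: Iterable[int]) -> Dict[int, int]:
    """Frequency per number of deleted nt, ordered by deletion length."""
    return dict(sorted(Counter(deletions).items()))
```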


To do next

- Determine reading frame
- Quality control

Future plans

- Detection of all TCR-beta variants
- TCR-alpha receptor variants
- Same procedure for B-cells
- Screen patients for receptor variations
