Targeted genomic enrichment and SMRT sequencing of immune-related gene complexes John Hammond ([email protected])

Post on 26-Jan-2021


  • Targeted genomic enrichment and SMRT sequencing of immune-related gene complexes

    John Hammond ([email protected])

  • Innate immune gene variation: germline-encoded NK cell receptors and MHC class I

    Haplotype variation, high polymorphism and variegated expression create NK cell subsets with different specificities and functions.

    [Figure labels: LRC, NKC, MHC, CD8 T cells]

    Genetically defined and inbred animals are key to dissecting genomic function.

    This arm of the immune system is critical in controlling and resolving viral infection.

    Diverse NK cell receptor systems are rapidly evolving under intense selection pressure from rapidly evolving pathogens.

  • Highly repetitive regions are difficult to sequence with short-read technology

    • Reference genome not well resolved in these regions

    • Poor SNP coverage in hard-to-assemble repetitive regions

    • Reference genome presents only one haplotype of many

  • Human MHC class I is highly diverse but haplotypes do not vary in gene content

    Gene      A      B      C      E   F   G
    Alleles   4,200  5,091  3,854  27  30  60
    Proteins  2,923  3,664  2,644  8   5   19
    Nulls     200    150    144    1   0   3

    A greater degree of structural variation in cattle

  • Sequence identity comparison of two cattle class I genomic haplotypes

  • MHC class II

    The current HD SNP chip does not interrogate MHC variation

  • The cattle KIR complex has expanded and demonstrates all the features of a functional immune complex

    Key properties of KIR loci (human KIR vs. cattle KIR)

    • Inhibitory and activating

    • Activating genes disarmed

    • Functionally variable haplotypes (?)

    • Polymorphic

    • Paired activating and inhibitory receptors

    [Figure label: Identity]

  • The cattle NKC is largely correct in the reference assembly, as determined by BAC clones and a new cattle reference assembly

    Schwartz et al. 2017. Immunogenetics.

  • The cattle natural killer complex

    [Figure: distance between SNPs plotted against SNP position in the genome]

    ~280 kb; 8 SNPs and 17 genes

    • Missing SNP variation over the most diverse region
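
    As a rough worked example of the density quoted on this slide: 8 SNPs over ~280 kb is, on average, one SNP per 35 kb. The sketch below computes that figure and the gaps between neighbouring SNPs. The function names and the example positions are illustrative assumptions, not values from the SNP chip.

```python
def mean_region_per_snp_kb(region_kb, n_snps):
    """Average length of region covered per SNP, in kb."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return region_kb / n_snps


def inter_snp_gaps(positions):
    """Distances between neighbouring SNPs after sorting by position."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Figures from the slide: ~280 kb region, 8 SNPs.
print(mean_region_per_snp_kb(280, 8))  # 35.0

# Hypothetical positions in bp, for illustration only.
print(inter_snp_gaps([120_000, 10_000, 50_000]))  # [40000, 70000]
```

    An average of this kind hides clustering: a plot of the individual gaps is what shows whether the most diverse part of the region carries any markers at all.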

  • The identity between genes and gene blocks is too high to map short reads over the KLRC region

    [Figure: % of reads different from UMD3.1 plotted against location on BTA5]

    Illumina 250 bp PE reads
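
    One way to see why near-identical gene copies defeat short reads is to count how many read-length windows from one copy also occur, base for base, in another copy; a read drawn from such a window cannot be placed uniquely. The sketch below does this for two toy sequences. The function name and the sequences are hypothetical, and exact matching is a simplification of what a real aligner tolerates.

```python
def ambiguous_window_fraction(copy_a, copy_b, read_len):
    """Fraction of read_len-sized windows of copy_a found exactly in copy_b."""
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    n_windows = len(copy_a) - read_len + 1
    if n_windows <= 0:
        return 0.0
    windows_b = {
        copy_b[i:i + read_len] for i in range(len(copy_b) - read_len + 1)
    }
    shared = sum(
        1 for i in range(n_windows) if copy_a[i:i + read_len] in windows_b
    )
    return shared / n_windows


# Toy paralogues that share their first half (illustrative only).
gene_copy_1 = "AAAACCCC"
gene_copy_2 = "AAAAGGGG"

print(ambiguous_window_fraction(gene_copy_1, gene_copy_2, 4))  # 0.2
print(ambiguous_window_fraction(gene_copy_1, gene_copy_2, 8))  # 0.0
```

    In the toy case, windows long enough to span the point where the copies diverge are no longer ambiguous, which is the motivation for moving to long reads over these regions.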

  • Enrichment and de novo assembly of immune-related gene complexes in cattle for SNP discovery.

    Cattle are arguably the most important livestock species: they provide humans with meat, milk, hides, traction, manure, status and security.

    Reducing the burden of disease can have an enormous positive impact on food security and welfare.

    Complex immune traits are phenotypically diverse, making breeding/selection processes challenging, but there are many opportunities!

  • Where are the immune genes known to be involved in bTB?

    Prof Liz Glass, Roslin.

  • Used the Roche (NimbleGen) SeqCap EZ system

    Targeted enrichment of immune-related gene clusters with Roche NimbleGen probes

    Library prep needed considerable optimisation

    Average pull-down fragment was 5.5 kb


  • Four rounds of probe design and optimisation

    Illumina set

    First design to use “masking” as a way to deal with multiple variant targets for a single genomic region, thus reducing over-capture of non-variant subregions.

    NG1-Pilot set ~ 9Mb with 50 known matches

    Update included an increased number of target regions and an increase in the number of variant inputs per region. Match level stretched to 50 for coverage.

    NG2 ~ 5Mb with 50 known matches

    Similar strategy to 151221_BTAU_TPI_NiGen2_EZ_HX3. Size of NKC target decreased, and MHCIIb, TPI, RP and IG regions removed.

    NG3 ~ 5Mb. Main aim is to reduce off-target mapping of 70-80 %

    Redesign of 170131_BTAU_DH_TPI_EZ_HX3. Match levels reduced to 3, mapping targets against reference included, efficiencies estimated, probes in NKC region replicated 3x and MHC replicated 2x.

    2 animals

    PacBio sequencing; 23 animals

  • NG1 probe set: good enrichment but still much off-target

    [Figure panels: custom chromosome, NKC, MHC]

  • NG2 probe set: better enrichment but NKC dropped out

    [Figure panels: custom chromosome, NKC, MHC]

  • Probe performance NKC (NG2)

    [Figure: chr5/NKC; axis from 0 to 120,000,000]

    Blue = nucleotides binding to whole chromosome

    Green = nucleotides binding to target area

  • Probe performance LRC (NG2)

    [Figure: chr18/LRC; axis from 0 to 120,000,000]

    Blue = nucleotides binding to whole chromosome

    Green = nucleotides binding to target area

  • Probe performance MHC (NG2)

    [Figure: chr23/MHC; axis from 0 to 250,000,000]

    Blue = nucleotides binding to whole chromosome

    Green = nucleotides binding to target area
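
    The blue/green comparison in the probe performance plots amounts to an on-target fraction: aligned nucleotides that fall inside the target area divided by all aligned nucleotides on the chromosome. A minimal sketch of that calculation follows, assuming alignments are given as half-open (start, end) intervals on one chromosome. The function name, the interval representation and the example coordinates are assumptions for illustration, not the pipeline used in the talk.

```python
def on_target_fraction(alignments, target_start, target_end):
    """Share of aligned bases that overlap [target_start, target_end)."""
    total_bases = 0
    on_target_bases = 0
    for start, end in alignments:
        if end <= start:
            raise ValueError("alignment intervals must have end > start")
        total_bases += end - start
        # Overlap length between this alignment and the target window.
        overlap = min(end, target_end) - max(start, target_start)
        if overlap > 0:
            on_target_bases += overlap
    return on_target_bases / total_bases if total_bases else 0.0


# Hypothetical alignments and target window, in bp.
example_alignments = [(0, 100), (150, 250), (900, 1000)]
print(round(on_target_fraction(example_alignments, 100, 200), 3))  # 0.167
```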

  • Probe design summary

    • On-target efficiency was sacrificed for overlapping probe coverage, which was not entirely necessary

    • Many off-target regions are unsupported by probe sequence: multimapping and polymorphism/variation

    • Masking does not adequately reduce the redundancy of the inputs, but it does allow probes with similar sequences to hybridize to similar haplotype sequences, resulting in greater depth of coverage

  • Enrichment and de novo assembly

    • Subreads from at least 2 SMRT cells combined for de novo assembly with Canu

    • Filtered subreads as input

    • Default parameters (minReadLength > 1 kb)

    • For MHC, minReadLength > 3 kb improves assembly

    • For the LRC this does not improve the assemblies

    • GFA file output by Canu screened with Bandage for contigs that contain MHC or KIR genes/haplotypes, which were then extracted and mapped to known MHC/KIR haplotypes
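
    Two of the assembly steps can be sketched in a few lines: applying a minimum read length before assembly, and listing contig lengths from a GFA file before inspecting them in a viewer. The code below is a hedged illustration only. The function names and example data are hypothetical, reads are held in a plain dictionary rather than read from a subread file, and the parser handles only GFA 1 segment (S) lines with an optional LN:i: tag.

```python
def filter_by_length(reads, min_len):
    """Keep reads whose sequence is at least min_len bases long."""
    return {name: seq for name, seq in reads.items() if len(seq) >= min_len}


def gfa_segment_lengths(gfa_text):
    """Map segment (contig) name to length from GFA 1 'S' lines."""
    lengths = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S" or len(fields) < 3:
            continue
        name, sequence = fields[1], fields[2]
        length = None
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                length = int(tag[len("LN:i:"):])
        # Fall back to the sequence itself when no LN tag is present.
        if length is None and sequence != "*":
            length = len(sequence)
        if length is not None:
            lengths[name] = length
    return lengths


# Hypothetical reads: only the 3.5 kb read survives a 3 kb cutoff.
reads = {"r1": "A" * 500, "r2": "A" * 1000, "r3": "A" * 3500}
print(sorted(filter_by_length(reads, 1000)))  # ['r2', 'r3']
print(sorted(filter_by_length(reads, 3000)))  # ['r3']

# Hypothetical GFA content with two contigs and one link.
gfa = "H\tVN:Z:1.0\nS\ttig1\t*\tLN:i:170000\nS\ttig2\tACGT\nL\ttig1\t+\ttig2\t-\t0M\n"
print(gfa_segment_lengths(gfa))  # {'tig1': 170000, 'tig2': 4}
```

    Raising the cutoff trades read count for read length, which is consistent with the slide's observation that the 3 kb threshold helps the MHC but not the LRC.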

  • A18: “gene 6” reconstructed haplotype

    Animal    Figure labels                Contigs
    252       NC1, TRIM26, Gene6 6*01301   2 contigs: 170kb, 53kb
    105991    NC1, TRIM26, Gene6 6*01301   9 contigs
    705983T   NC1, TRIM26, Gene6 6*01301   7 contigs

  • A31: “gene 1+2” reconstructed haplotype

    Animal    Figure labels                             Contigs
    200005    NC1, TRIM26, Gene1 *02101, Gene2 *02201   8 contigs
    604652    NC1, TRIM26, Gene1 *02101, Gene2 *02201   8 contigs

  • Heterozygous A18/A31 (mixed reads from 252 and 200005 as input for de novo assembly)

    • Shared regions assemble into contigs that are more similar to the haplotype with more reads

    • Alleles for gene 6, gene 2 and gene 1 identical to the previous assemblies

    • FALCON

  • MHC class I full-length bovine haplotypes

    Haplotype   Gene order (figure labels)
    A14         P3 NC1 1 4 2 6 T
    ARS14       P3 NC1 5 2 T
    A11         3 2 T
    Angus       P3 NC1 3 2 6 TP6
    Brahman     P3 3 2 T3
    A18         P3 NC1 6 T
    A31         P3 NC1 1 2 T

    [Figure labels: TRIM26; 35kb, 62kb, 67kb, 69kb, 58kb, 103kb; 20kb]

  • Haplotype columns, in order where given: known haplotype, de novo haplotype, allele haplotype

    Breed     ID           Haplotypes         Alleles
    Hereford  Dominette    ?                  02*07001; 05*07201
    Friesian  252          A18 A18 A18        06*01301
    Friesian  200005       A31 A31 A31        01*02101; 02*02201
    Friesian  105991       A18 A18 A18        06*01301
    Friesian  Herman       A14/? A14?         01*02301; 04*02401; 02*02501; no 06*
    Friesian  505204       A14 A14 A14?       01*02301; 04*02401; 02*02501; no 06*
    Friesian  604652       A31 A31 A31        01*02101; 02*02201
    Hereford  Domino       ?                  02*06001; 06*04001; 05*07201
    Friesian  705983T      A18 A18 A18        06*01301
    Highland  8052         ? new?             01*03102; new 02*?
    Sahiwal   83H          ? new?             new 03*?
    Friesian  206818       ? A14/A14 A14      01*02301; 04*02401; 02*02501; 06*04001
    Friesian  706886       ? A14?             new 01*; 02*02501; 04*02401
    Friesian  206846       ? new?             01*01901; 02*02501
    Friesian  206853       ? A14/het? A14/?   01*02301; 04*02401; 02*02501; 06*04001
    Friesian  504805       A10/A14            01*02301; 04*02401; 03*00201; new 02*
    Friesian  504805       A10/A14            01*02301; 04*02401; 02*02501; 03*00201
    Friesian  dried706823  ? new?             01*02101; 04*02401; 02*02501
    Friesian  306812       ? new?             new 01*; 02*00801; 04*02401
    Friesian  159          A31 A31            01*02101; 02*02201

    Legend: most likely haplotype based on alleles; 1-3bp differences to allele
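
    The "most likely haplotype based on alleles" call can be thought of as a set comparison: a known haplotype is supported when all of its defining alleles are among those recovered from an animal. The sketch below shows the idea, assuming the allele sets for A18, A31 and A14 as they appear in the table rows for animals with a known haplotype. The function name and the all-alleles-present rule are illustrative assumptions, not the procedure used by the authors.

```python
# Allele sets read off the table for animals with a known haplotype.
KNOWN_HAPLOTYPES = {
    "A18": {"06*01301"},
    "A31": {"01*02101", "02*02201"},
    "A14": {"01*02301", "04*02401", "02*02501"},
}


def call_haplotypes(observed_alleles, known=KNOWN_HAPLOTYPES):
    """Return (supported haplotypes, alleles not explained by them)."""
    observed = set(observed_alleles)
    # A haplotype is supported only if every defining allele was observed.
    calls = sorted(name for name, alleles in known.items() if alleles <= observed)
    explained = set().union(*(known[name] for name in calls))
    return calls, sorted(observed - explained)


# Animal 252 in the table.
print(call_haplotypes(["06*01301"]))
# (['A18'], [])

# Animal 206818 in the table: A14 alleles plus one extra allele.
print(call_haplotypes(["01*02301", "04*02401", "02*02501", "06*04001"]))
# (['A14'], ['06*04001'])

# Animal 206846 in the table: no known haplotype is fully supported.
print(call_haplotypes(["01*01901", "02*02501"]))
# ([], ['01*01901', '02*02501'])
```

    Unexplained alleles are the interesting output here: they flag either a second, unknown haplotype in a heterozygous animal or a variant of a known one.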

  • KIR haplotype contains block A and B

    200005_NG1+Sequel

    • 200005: reads from 2 SMRT cells* (>940,000 subreads, >1kb length)

    • 200005: reads from 2 SMRT cells* (>625,000 subreads, >3kb length)

    *includes one Sequel run

  • KIR haplotype from 252 missing block B?

    252_NG1

    • 252: reads from 4 SMRT cells* (>863,000 subreads with >3kb length): 8 contigs, longest 118kb

    • 252: reads from 1 SMRT cell (>486,000 subreads >1kb length): 8 contigs

    One contig

    *includes one Sequel run

  • De novo assembly of KIR from two other A18 animals

    705983T_NG2+2rep

    105991_NG2+2rep

    • 705983T – reads from 2 SMRTcells (> 776,000 subreads >1kb length)

    • 105991 – reads from 2 SMRTcells (> 606,000 subreads >1kb length)

    • Also missing block B?

  • SNP selection using de novo assembled haplotypes

    - Illumina reads from 125 Holstein bulls mapped to immune-related gene cluster haplotypes (BWA)

    - NKC, LRC, MHC

    - SNPs called with x variant caller

    - Filtered SNPs: QUAL > 900, strictly biallelic (no INDELs)

      o Called for all individuals; alternative allele frequencies between 5% and 95% (only NKC)

    - Selected SNPs 10-15kb apart across the region, based on representation in the mapping data

    - Checked whether the 50bp flanking region is repeated within the haplotype

      o Transferability to other haplotypes (according to SNP coordinate) checked

      o SNPs checked for haplotype specificity, and gene specificity (MHC only)
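
    The filtering and spacing criteria on this slide can be sketched as two small functions: one applying the per-SNP filters (QUAL > 900, biallelic single-nucleotide change, called in all individuals, alternative allele frequency between 5% and 95%) and one thinning the survivors to a minimum spacing. The `Snp` record, the function names, the greedy left-to-right thinning and the example values are all assumptions for illustration. The slide does not name the variant caller or describe how spacing was enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    pos: int          # coordinate on the assembled haplotype (bp)
    ref: str          # reference allele
    alts: tuple       # alternative alleles
    qual: float       # variant quality score
    alt_freq: float   # alternative allele frequency in the cohort
    n_called: int     # number of individuals with a genotype call


def passes_filters(snp, n_individuals, min_qual=900.0,
                   min_af=0.05, max_af=0.95, check_af=True):
    """Apply the slide's filters; check_af=False skips the frequency bounds."""
    biallelic_snv = (
        len(snp.alts) == 1 and len(snp.ref) == 1 and len(snp.alts[0]) == 1
    )
    af_ok = (min_af <= snp.alt_freq <= max_af) if check_af else True
    return (
        snp.qual > min_qual
        and biallelic_snv
        and snp.n_called == n_individuals
        and af_ok
    )


def thin_by_spacing(snps, min_gap=10_000):
    """Greedy thinning: keep a SNP if it is min_gap or more past the last kept."""
    selected = []
    for snp in sorted(snps, key=lambda s: s.pos):
        if not selected or snp.pos - selected[-1].pos >= min_gap:
            selected.append(snp)
    return selected


# Hypothetical candidates for a cohort of 125 animals.
candidates = [
    Snp(1_000, "A", ("G",), 950.0, 0.30, 125),       # passes
    Snp(5_000, "A", ("G",), 950.0, 0.30, 125),       # passes, too close
    Snp(12_000, "C", ("T",), 800.0, 0.30, 125),      # low QUAL
    Snp(13_000, "C", ("CT",), 990.0, 0.30, 125),     # INDEL
    Snp(14_000, "G", ("A", "T"), 990.0, 0.30, 125),  # multiallelic
    Snp(15_000, "G", ("A",), 990.0, 0.02, 125),      # rare allele
    Snp(16_000, "T", ("C",), 990.0, 0.50, 120),      # missing calls
    Snp(26_000, "T", ("C",), 990.0, 0.50, 125),      # passes
]
kept = [s for s in candidates if passes_filters(s, 125)]
print([s.pos for s in thin_by_spacing(kept)])  # [1000, 26000]
```

    The `check_af` switch reflects the note that the frequency bounds were applied to the NKC only.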

  • New SNP panel over 3 different gene complexes being used to increase the power of GWAS for complex disease traits

    The single SNP on the current Illumina SNP chip

    The new cattle LRC SNP panel

  • MHC haplotype SNP selection

    [Figure tracks: UMD3.1, A14, A11, A18, A31; legend: pink SNPs, blue SNPs, orange SNPs]

  • NKC haplotype SNP selection

    [Figure: ~280 kb region; gene labels MAGOHB, KLRA, KLRJ, KLRC1-3]

  • First round of SNP selection successful (~70 % success).

    Established segregating markers in a cohort of 1500 extreme-phenotype bTB-resistant cattle; GWAS currently under way

  • Acknowledgments

    Immunogenetics

    Nick Sanderson

    Alasdair Allan

    Mark Gibson

    John Schwartz

    Rebecca Philp

    Clare Grant

    Karen Billington

    Paul Norman

    Libby Guethlein

    Peter Parham

    Farbod Babrzadeh

    John Young

    Richard Borne

    Doro Harrison

    Elizabeth Morecroft

    Kevan Hanson

    William Mwangi

    Giuseppe Maccari

    Derek Bickhart

    Timothy Smith

    William Thompson

    Juan Medrano

    Denise Raterman

    Cynthia Moehlenkamp

    George Mayhew

    BB/M027155/1, BB/J006211/1

    GCRF Databases and Resources

    Liz Glass