Nucleic Acid Sequence Analysis and Structure Prediction. Lecturer: Zhang Jun, Department of Cell Biology and Genetics


Upload: mercia

Post on 14-Jan-2016



DESCRIPTION

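Nucleic Acid Sequence Analysis and Structure Prediction. Lecturer: Zhang Jun, Department of Cell Biology and Genetics. Section 1: Data forms of nucleic acid sequences. 1. A string is an ordered arrangement of symbols or characters drawn from the finite set {A, T, G, C}. Sequence and string are the same concept. s = ATTGCATATG; the length of the string is |s|; the character at a given position of s is written s_i, 1 ≤ i ≤ |s|. In particular, a string of length 0 is called the empty string, denoted ε. 2. Substring and subsequence are not the same concept. Substrings and superstrings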

TRANSCRIPT

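  • 1. A string is an ordered arrangement of symbols or characters drawn from the finite set {A, T, G, C}. Sequence and string are the same concept.

    s = ATTGCATATG; the length of the string is |s|; the character at position i of s is written s_i, 1 ≤ i ≤ |s|.

    A string of length 0 is called the empty string, denoted ε.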

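  • 2. Substring and subsequence are not the same concept.

    s = ATGCGGTA; t = TGCGG; t is a substring of s, and s is a superstring of t.

    s = ATGCGGTA; t = TGTA; t is a subsequence of s, but not a substring.

    Interval

    s = ATGCGGTACGTATACG; u = CG; every occurrence of u in s is an interval s[i, i+1].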
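The difference between a substring, a subsequence, and an interval can be checked mechanically. The following is a minimal Python sketch using the slide's example strings; the function names `is_substring`, `is_subsequence`, and `occurrences` are illustrative choices, not part of the lecture.

```python
def is_substring(t: str, s: str) -> bool:
    """True if t occurs in s as one contiguous block."""
    return t in s


def is_subsequence(t: str, s: str) -> bool:
    """True if t can be obtained from s by deleting characters, keeping order."""
    remaining = iter(s)
    # `ch in remaining` consumes the iterator up to the first hit,
    # so each character of t must be found after the previous one.
    return all(ch in remaining for ch in t)


def occurrences(u: str, s: str) -> list[tuple[int, int]]:
    """All 1-based intervals [i, j] such that s[i..j] equals u."""
    hits = []
    start = s.find(u)
    while start != -1:
        hits.append((start + 1, start + len(u)))
        start = s.find(u, start + 1)
    return hits


print(is_substring("TGCGG", "ATGCGGTA"))      # True
print(is_substring("TGTA", "ATGCGGTA"))       # False
print(is_subsequence("TGTA", "ATGCGGTA"))     # True
print(occurrences("CG", "ATGCGGTACGTATACG"))  # [(4, 5), (9, 10), (15, 16)]
```

Every substring is also a subsequence, but the TGTA example shows that the converse does not hold.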

  • 3. The concatenation of u and w is written uw.

    s = ATGCGGTA; t = TGCGG; st = ATGCGGTATGCGG; ts = TGCGGATGCGGTA

    s = AT; sss = ATATAT = s^3

    Prefix

    s = ATGCGGTAGC; prefix(s, 3) = ATG; prefix(s, 0) = ε

    If s = tu for some string u, then t is a prefix of s.

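  • Suffix

    s = ATGCGGTAGC

    suffix(s, 3) = AGC; suffix(s, 2) = GC; suffix(s, 0) = ε

    If s = ut for some string u, then t is a suffix of s.

    A killer agent κ removes 1 character from the end of the string it is concatenated to.

    |κ| = -1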

  • s = ATGCGGTAGC

    κs = TGCGGTAGC; sκ = ATGCGGTAG. ATGC κ GGTAG = ? (ATGCκ)GGTAG ≠ ATGC(κGGTAG)

    stu=(st)u=s(tu); |s| -1, |t| -1, |u| -1

    |st| = |s| + |t| ,st

  • 1

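    With κ denoting the killer agent:

    $s[i..j] = \kappa^{\,i-1}\, s\, \kappa^{\,|s|-j}$

    $\mathrm{prefix}(s, k) = s\, \kappa^{\,|s|-k}$

    $\mathrm{suffix}(s, k) = \kappa^{\,|s|-k}\, s$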
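These operations map directly onto string slicing. The sketch below is one possible Python rendering; the helper names `kill_left` and `kill_right` are hypothetical stand-ins for concatenating a killer agent on the left or on the right, and indices are 1-based as on the slides.

```python
def prefix(s: str, k: int) -> str:
    """First k characters of s (|s| - k characters removed on the right)."""
    return s[:k]


def suffix(s: str, k: int) -> str:
    """Last k characters of s (|s| - k characters removed on the left)."""
    # s[-k:] would return the whole string for k = 0, so index from the left.
    return s[len(s) - k:]


def interval(s: str, i: int, j: int) -> str:
    """s[i..j] with 1-based, inclusive bounds."""
    return s[i - 1:j]


def kill_left(s: str) -> str:
    """Killer agent applied on the left: drop the first character."""
    return s[1:]


def kill_right(s: str) -> str:
    """Killer agent applied on the right: drop the last character."""
    return s[:-1]


s = "ATGCGGTAGC"
print(prefix(s, 3), suffix(s, 3), suffix(s, 2))  # ATG AGC GC
print(repr(prefix(s, 0)), repr(suffix(s, 0)))    # '' ''
print(kill_left(s), kill_right(s))               # TGCGGTAGC ATGCGGTAG

# Grouping matters when a killer agent sits between two strings:
print(kill_right("ATGC") + "GGTAG")  # ATGGGTAG
print("ATGC" + kill_left("GGTAG"))   # ATGCGTAG
```

The last two lines illustrate why concatenation involving a killer agent is not associative in general.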

  • Homology: orthologous and paralogous

    Similarity

  • Orthologous: a1 in species I and a1 in species II. Paralogous: a1 and a2 in species I.

  • Alignment of s = GACGGATTAG and t = GATCGGAATAG

    Alignment 1:
    GACGGATTAG
    GATCGGAATAG

    Alignment 2:
    GA-CGGATTAG
    GATCGGAATAG

  • ()

    4-letter DNA alphabet: {A, C, G, T}

    IUPAC

  • IUPAC

    Code  Bases             Meaning
    G     G                 Guanine
    A     A                 Adenine
    T     T                 Thymine
    C     C                 Cytosine
    R     G or A            Purine
    Y     T or C            Pyrimidine
    M     A or C            Amino
    K     G or T            Keto
    S     G or C            Strong interaction (3 H bonds)
    W     A or T            Weak interaction (2 H bonds)
    H     A or C or T       Not-G
    B     G or T or C       Not-A
    V     G or C or A       Not-T (not-U)
    D     G or A or T       Not-C
    N     G or A or T or C  Any
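A table like this is often turned into a lookup so that degenerate patterns can be searched in a sequence. The sketch below is one way to do that in Python, by translating each IUPAC code into a regular-expression character class. The names `IUPAC`, `iupac_to_regex`, and `find_matches`, and the test sequence, are illustrative assumptions.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern such as 'TNSNC' into a regex string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError on an unknown code
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_matches(pattern: str, seq: str) -> list[tuple[int, str]]:
    """1-based start positions and matched text, overlapping hits included."""
    # A lookahead makes each match zero-width, so overlapping hits are kept.
    lookahead = re.compile(f"(?=({iupac_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq.upper())]


print(iupac_to_regex("TNSNC"))  # T[GATC][GC][GATC]C
print(find_matches("TNSNC", "TTATGATATATACGCTTGTCTCCAC"))
# [(11, 'TACGC'), (16, 'TTGTC'), (21, 'TCCAC')]
```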

  • DNA

  • 1

    2

    3

    4

    54

  • Global alignment

    s = ATTGCATATG, t = ATTGATATC

    s = ATTGCATATG
    t = ATTG-ATATC

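  • Scoring: match +1, mismatch -1, gap -2

    For two strings s and t, $\mathrm{sim}(s, t) = \max_i \{\mathrm{score}_i\}$, the maximum score over all alignments i of s and t.

    s = ATTGCATATG
    t = ATTG-ATATC
    8 matches, 1 gap, 1 mismatch: 8 + (-2) + (-1) = 5

    A poorer alignment of the same pair with 4 matches, 1 gap, and 5 mismatches scores 4 + (-2) + (-1) × 5 = -3.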
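The scoring rule and the definition of sim(s, t) can be made concrete with a short program. The sketch below assumes the scheme implied by the worked example (match +1, mismatch -1, gap -2) and computes the maximum by standard dynamic programming in the Needleman-Wunsch style; the slides do not state which algorithm the lecture used, so treat that choice as an assumption.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # scheme inferred from the worked example


def score_alignment(row_s: str, row_t: str) -> int:
    """Score one given alignment; '-' marks a gap."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


def sim(s: str, t: str) -> int:
    """Maximum score over all global alignments of s and t."""
    prev = [j * GAP for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [i * GAP]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (MATCH if s[i - 1] == t[j - 1] else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


print(score_alignment("ATTGCATATG", "ATTG-ATATC"))  # 5
print(score_alignment("ATTGCATATG", "ATTGATATC-"))  # -3
print(sim("ATTGCATATG", "ATTGATATC"))               # 5
```

Only two rows of the table are kept because the score alone is needed; recovering the alignment itself would require the full table and a traceback.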

  • 2. Local alignment

    A local alignment aligns a substring of s with a substring of t.

    s = AATTGCATATG, t = ATTGT

    s(2,3,4,5) = t(1,2,3,4) = ATTG
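A best-scoring local alignment can be found with dynamic programming in the Smith-Waterman style. The sketch below is illustrative only: the scoring values (match +1, mismatch -1, gap -2) and the function name `local_alignment` are assumptions, and ties between equally good alignments are resolved arbitrarily.

```python
def local_alignment(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, (i1, i2), (j1, j2)): the best local score and the
    1-based inclusive intervals of s and t that achieve it."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # The 0 lets an alignment start fresh at any cell.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i + 1, end_i), (j + 1, end_j)


print(local_alignment("AATTGCATATG", "ATTGT"))  # (4, (2, 5), (1, 4))
```

The result reproduces the slide's example: positions 2 to 5 of s align with positions 1 to 4 of t.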

  • 3.

    st

    s = AATTGCATATG, t = ATTGT

    s = AATTGCATATG        s = AATTGCATATG
    t = -ATTGT-----        t = A-TTG--T---

    2

  • For two strings s and t, $\mathrm{sim}(s, t) = \max_i \{\mathrm{score}_i\}$

    Transforming s into t; t = AGCTT; s = TTA: TTA → --TTA → AGTTA → AGCTA → AGCTT

  • 1 2

    (cost)

    $\mathrm{dist}(s, t) = \min_i \{\mathrm{cost}_i\}$
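The minimum-cost definition corresponds to the classic edit-distance computation. The sketch below assumes unit costs for substitution, insertion, and deletion; the costs used in the lecture did not survive extraction, so the defaults here are an assumption, not the author's values.

```python
def dist(s: str, t: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Minimum total cost of edit operations that transform s into t."""
    prev = [j * indel_cost for j in range(len(t) + 1)]  # empty prefix of s
    for i in range(1, len(s) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(t) + 1):
            substitute = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else sub_cost)
            delete = prev[j] + indel_cost      # drop s[i]
            insert = curr[j - 1] + indel_cost  # add t[j]
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


print(dist("TTA", "AGCTT"))  # 4
```

Under unit costs, the transformation TTA → AGTTA → AGCTA → AGCTT uses two insertions and two substitutions, which matches the computed minimum of 4.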

  • ACCGACAATATGCATA
    ATAGGTATAACAGTCA

    ACCGACAATATGCATA
    ACTGACAATATGGATA

  • RNA

  • 1 2

  • 1 1

  • 108108

  • Figure: (a) Homo sapiens and Pongo pygmaeus; (b) 108

  • DNA

  • DNA

    DNA

    (fragment assembly)

    fragment

  • Fragments: ATTGGGCA; CGATT; TGGGCAGA

    --ATTGGGCA--
    CGATT-------
    ----TGGGCAGA

    CGATTGGGCAGA

  • (shortest common superstring)

    (reconstruction)

    (multicontig)
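For the shortest common superstring model, a common heuristic is to repeatedly merge the two fragments with the largest suffix-prefix overlap. The sketch below shows that greedy idea on the three fragments from the slide. It is a simplification under stated assumptions: error-free fragments, a single strand, and no guarantee that the greedy result is the shortest possible superstring. The function names are illustrative.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy merge: join the pair with the largest overlap until one string is left."""
    # Drop duplicates and fragments already contained in another fragment.
    frags = [f for f in dict.fromkeys(fragments)
             if not any(f != g and f in g for g in fragments)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


print(greedy_superstring(["ATTGGGCA", "CGATT", "TGGGCAGA"]))  # CGATTGGGCAGA
```

Here the first merge joins ATTGGGCA and TGGGCAGA (overlap 6), and the second prepends CGATT (overlap 3), giving the same consensus as the layout on the slide.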

  • DNA DNA

  • DNADNADNADNARNA

    DNA: promoter, terminator sequence, splice site

  • DNA

  • Training set, control set

  • Training set, control set

  • Sn, Sp

    Tp, Tn, Fn, Fp
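The formulas that relate Sn and Sp to these four counts did not survive extraction. The sketch below uses the standard statistical definitions, Sn = Tp / (Tp + Fn) and Sp = Tn / (Tn + Fp), which is an assumption about what the slide showed. Gene-prediction papers often report Tp / (Tp + Fp) under the name "specificity" instead, so that variant is included as `positive_predictive_value`. The counts in the example are made up for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sn: fraction of true sites that were predicted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Sp: fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predictions that are correct (often reported as Sp in gene finding)."""
    return tp / (tp + fp)


# Hypothetical counts from evaluating a predictor on a control set.
tp, tn, fn, fp = 80, 90, 20, 10
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
print(round(positive_predictive_value(tp, fp), 3))  # 0.889
```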

  • Functional site, functional sequence, motif, signal

  • functional region

  • A common consensus : NTATN

  • 1

    215

  • GGAATTCC

    R = G or A; Y = T or C; M = A or C; K = G or T; S = G or C (3 H bonds); W = A or T (2 H bonds); H = A or C or T (not G); B = G or T or C (not A); V = G or C or A (not T/U); D = G or A or T (not C); N = G or A or T or C

  • : (1) N(2) (3) 54(4) 2(5) N1

  • Sequences: TTATG ATATA TACGC TTGTC TCCAC

    Steps [1] to [4]: TNNNN matches tTATG, tACGC, tTGTC, tCCAC; TNNNC matches tACGc, tTGTc, tCCAc

    Step [5]: TNSNC

    Consensus 1: TNSNC (TACGC, TTGTC, TCCAC)

    Consensus 2: NTATN (TTATG, ATATA)
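For comparison, a consensus can also be computed column by column, choosing the IUPAC code that covers exactly the bases observed at each position. This is not the stepwise procedure on the slide, whose details were lost; it is a simpler, stricter alternative shown as a sketch, and the function name `iupac_consensus` is an illustrative choice.

```python
# IUPAC codes and the bases each one covers.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}
# Reverse lookup: set of bases -> code.
CODE_FOR_SET = {frozenset(bases): code for code, bases in CODES.items()}


def iupac_consensus(seqs: list[str]) -> str:
    """Most specific IUPAC code per column of equal-length sequences."""
    seqs = [seq.upper() for seq in seqs]
    if not seqs or len({len(seq) for seq in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(CODE_FOR_SET[frozenset(column)] for column in zip(*seqs))


print(iupac_consensus(["TACGC", "TTGTC", "TCCAC"]))  # THSDC
print(iupac_consensus(["TTATG", "ATATA"]))           # WTATR
```

Both results are compatible with the slide's consensus strings TNSNC and NTATN, which use N wherever this strict version reports a more specific code.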

  • DNA

  • B

    4n 4n

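  • $M(a_j, j)$ is the weight of base $a_j$ at position j, with $a_j \in \{A, T, G, C\}$

  • $s = a_1 a_2 \ldots a_n$

    $W_s = \sum_{j=1}^{n} M(a_j, j)$. For S = ATTGCA: $W_s = 1 + 6 + 14 - 5 + 8 + 19 = 43$. $W_s$ is then compared with a threshold T to decide whether S is classified as a site.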
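The weight-matrix score is the sum of one matrix entry per position. In the sketch below the matrix `M` is hypothetical: only the six entries used by ATTGCA are taken from the slide's sum (1, 6, 14, -5, 8, 19), and every other value is invented for illustration. Calling a window a hit when its score is at least the threshold is also an assumption, since the exact rule was lost.

```python
# Hypothetical 4 x 6 weight matrix: M[base][j] is the weight of `base`
# at position j (0-based). Only the ATTGCA entries come from the slide.
M = {
    "A": [1, -3, -8, 2, -4, 19],
    "T": [-2, 6, 14, -7, -1, -9],
    "G": [0, -5, -6, -5, -3, -2],
    "C": [-4, -1, -2, 3, 8, -6],
}


def weight_score(seq: str, matrix: dict[str, list[int]]) -> int:
    """W_s: sum of matrix[a_j][j] over all positions j of seq."""
    width = len(next(iter(matrix.values())))
    if len(seq) != width:
        raise ValueError(f"sequence must have length {width}")
    return sum(matrix[base][j] for j, base in enumerate(seq.upper()))


def scan(seq: str, matrix: dict[str, list[int]], threshold: int):
    """Slide the matrix along seq; report (1-based start, window, score) for hits."""
    width = len(next(iter(matrix.values())))
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = weight_score(window, matrix)
        if score >= threshold:  # assumed decision rule
            hits.append((start + 1, window, score))
    return hits


print(weight_score("ATTGCA", M))    # 43
print(scan("GGATTGCAGG", M, 40))    # [(3, 'ATTGCA', 43)]
```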

  • MA+ A- 1M 23-6 3SiSi A+ 4Si A-5 4WSi TMSiM16 5WSi TMSiM16 67 7M2

  • MM

  • DNA

  • ORF (open reading frame)
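A simple way to locate ORFs is to read each frame codon by codon, opening an ORF at ATG and closing it at the first in-frame stop codon. The sketch below scans only the three forward frames and uses the standard stop codons TAA, TAG, and TGA; the reverse strand, alternative start codons, and the `min_codons` cutoff are simplifications or assumptions of this example rather than anything stated on the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1):
    """ORFs in the three forward frames as (frame, start, end, sequence).

    Positions are 1-based and inclusive; the stop codon is part of the ORF.
    min_codons counts the codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start + 1, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGGCTTAAGG"))  # [(3, 3, 11, 'ATGGCTTAA')]
```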

  • ()

    21 64/3

  • DNAORFORFORF

  • () 641

    DNA36:4:1

    DNA

  • DNAORFORFORF

    ORF

  • fabcabca1,b1,c1, a2,b2,c2,, an+1,bn+1a1b1c1n

  • n

  • i

    3nPiPi

  • ()

  • () :

  • Sensitivity (Sn), specificity (Sp)

  • () EST

  • () 53

  • () ,e1, i1, , in-1, en , ATG-1n-UAG

  • Donor: gt; acceptor: ag
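The GT-AG rule can be used to list candidate introns: any span that begins with a donor GT and ends with an acceptor AG. The sketch below enumerates such spans. It is deliberately naive, since the two dinucleotides occur by chance very often and real predictors score the surrounding signal as well. The `min_len` cutoff and the function name are assumptions of this example.

```python
def candidate_introns(dna: str, min_len: int = 4):
    """All (start, end) spans, 1-based inclusive, that start with GT and end with AG."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            # The span runs from the G of GT to the G of AG.
            if a + 2 - d >= min_len:
                spans.append((d + 1, a + 2))
    return spans


dna = "ATGGTACCAGTT"
spans = candidate_introns(dna)
print(spans)                                   # [(4, 10)]
print([dna[s - 1:e] for s, e in spans])        # ['GTACCAG']
```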

  • gene A

  • $i_0, e_1, i_1, \ldots, e_n, i_n$; $i_j$, $0 \le j \le n$; $e_l$, $1 \le l \le n$; $i_0$, $i_n$

  • DNA 13

    2

    3-i0, e1-en, in

  • Source, sink

  • () DNARNA

  • HPESE-mailWSwebCL/EXSC

  • RNA

    RNA

  • GCG (Genetics Computer Group)

    140

  • GCG nucleic acid databases: GenBank, EMBL. GCG protein databases: PIR, SWISS-PROT, SP-TrEMBL

  • 1. Gap; BestFit; FrameAlign; Compare; DotPlot; GapShow; ProfileGap

  • 2. PileUp; HmmerAlign; PlotSimilarity; Pretty; PrettyBox; MEME; HmmerBuild; HmmerCalibrate; ProfileMake; ProfileGap; Overlap; NoOverlap; OldDistances

  • 3. LookUp

    StringSearch

    Names

  • 4. BLAST; NetBLAST; FastA; Ssearch; TFastA/TfastX/FastX; FrameSearch; MotifSearch; HmmerSearch; ProfileSearch; ProfileSegments; FindPatterns; Motifs; WordSearch; HmmerPfam; Segments

  • 5. DNA/RNA: Mfold (DNA, RNA); PlotFold (Mfold); StemLoop

  • 6. PAUPSearch; PAUPDisplay; Distances; Diverge

  • 7. GelStart; GelEnter; GelMerge; GelAssemble; GelView; GelDisassemble

  • 8. TestCode; CodonPreference; Frames; Repeat; Composition; CodonFrequency; Correspond

  • 9. Map; MapPlot; MapSort; PeptideMap; PlasmidMap; PeptideSort

  • 10. Prime; PrimePair; MeltTemp

  • 11. ProfileScan; CoilScan; HTHScan; SPScan; Isoelectric; PepPlot; PeptideStructure; PlotStructure

  • 12. Reverse; Shuffle; Corrupt; Sample; DataSet; GCGToBLAST