Positional Association Rules
Dr. Bernard Chen, Ph.D., University of Central Arkansas
Central Dogma of Molecular Biology
Amino Acids, the subunits of proteins
Protein Primary, Secondary, and Tertiary Structure
Protein 3D Structure
Protein Sequence Motif
Although there are 20 amino acids, protein primary structure is not built by choosing among them at random.
Sequence motifs: a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.
Protein Sequence Motif
These biologically significant regions or residues are usually:
  Enzyme catalytic sites
  Prosthetic group attachment sites (heme, pyridoxal phosphate, biotin…)
  Amino acids involved in binding a metal ion
  Cysteines involved in disulfide bonds
  Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)
HSSP-BLOSUM62 Measure
Part 1: Bioinformatics Knowledge and Dataset Collection
Part 2: Discovering Protein Sequence Motifs
Part 3: Motif Information Extraction
Part 4: Mining the Relations between Motifs and Motifs
Part 5: Protein Local Tertiary Structure Prediction
Future Works
Motivation
To obtain DNA or protein sequence motif information, it is usually necessary to fix the length of the sequence segments.
Because of the fixed size, the results may contain a number of similar motifs that are simply shifted by several bases or that include mismatches.
Example
If a biological sequence motif has length 12 and we set the window size to 9, we are very likely to discover two similar sequence motifs: one covering the front part of the biological sequence motif and the other covering the rear part.
Positional Association Rules
The basic association rule gives information of the form A => B.
However, when the order in which items appear matters, the basic association rule is not powerful enough.
We introduce another parameter, called "distance assurance", to help identify frequent itemsets that also occur at a frequent distance.
Positional Association Rules
Pseudocode of Positional Association Rule with the Apriori Concept
Algorithm: Positional Association Rule with the Apriori Concept
Input: Database D (protein sequences as transactions and sequence motifs as items), min_support, min_confidence, and min_distance_assurance
Output: P, the positional association rules in D
Method:
(1)  L = find_frequent_itemsets(D, min_support)
(2)  S = find_strong_association_rules(L, min_confidence)
(3)  for (k = 2; S_k ≠ Ø; k++)
(4)      for each strong association rule r ∈ S_k
(5)          antecedent_motif = Apriori_Motif_Construct(r_ant)
(6)          consequent_motif = Apriori_Motif_Construct(r_con)
(7)          if antecedent_motif == NULL or consequent_motif == NULL:
(8)              goto Step (4)
(9)          for each protein sequence ps ∈ D
(10)             for (ant_position = 1; ant_position ≤ |ps|; ant_position++)
(11)                 if antecedent_motif starts at ps[ant_position]:
(12)                     r_ant_count++
(13)                     for (con_position = 1; con_position ≤ |ps|; con_position++)
(14)                         if consequent_motif starts at ps[con_position]:
(15)                             distance = ant_position - con_position
(16)                             r_distance++
(17)         P_k = { r_distance | r_distance > min_distance_assurance * r_ant_count }

Apriori_Motif_Construct(itemset)
    if |itemset| == 1:
        return itemset
    else:
        for each positional association rule in P_|itemset|
            if all items in the itemset appear in the positional association rule:
                return the new motif constructed by the positional association rule
        return NULL
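The distance-counting loop of the pseudocode can be sketched in Python for a single rule. This is a minimal sketch under stated assumptions, not the author's implementation: the function name `positional_distances`, the representation of each sequence as a dictionary from motif label to start positions, and the positions themselves are all hypothetical. The positions were chosen only so that the counts resemble the D=>B case of the worked example.

```python
from collections import Counter


def positional_distances(sequences, antecedent, consequent, min_distance_assurance):
    """Count antecedent/consequent distances for one rule.

    sequences: list of dicts mapping a motif label to the list of positions
    where that motif starts (a hypothetical representation).
    Returns (passing, distance_counts, ant_count).
    """
    ant_count = 0
    distance_counts = Counter()
    for occurrences in sequences:
        for ant_pos in occurrences.get(antecedent, []):
            ant_count += 1
            for con_pos in occurrences.get(consequent, []):
                # Same sign convention as the pseudocode: antecedent minus consequent.
                distance_counts[ant_pos - con_pos] += 1
    threshold = min_distance_assurance * ant_count
    # Keep only distances whose count is strictly above the threshold.
    passing = {d: c for d, c in distance_counts.items() if c > threshold}
    return passing, distance_counts, ant_count


# Hypothetical motif positions, not taken from the slides.
sequences = [
    {"B": [2], "D": [5]},
    {"B": [2], "D": [5]},
    {"B": [10], "D": [13]},
    {"B": [0, 1], "D": [20]},
]

passing, counts, ant_count = positional_distances(sequences, "D", "B", 0.6)
print(passing, ant_count)
```

With these made-up positions, D occurs four times and distance 3 is seen three times, so only distance 3 clears the 60% threshold. Because every antecedent occurrence is paired with every consequent occurrence in the same sequence, the counts can sum to more than the antecedent count.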
Positional Association Rules Example
minimum support = 60%, minimum confidence = 80%, minimum distance assurance = 60%
Scan for C1
A: 3/5, B: 5/5, C: 2/5, D: 4/5, E: 1/5
=> frequent items: A, B, D
=> candidates for C2: AB, AD, BD
Scan for C2
AB: 3/5, AD: 3/5, BD: 4/5
=> frequent itemsets: AB, AD, BD
=> candidate for C3: ABD
Scan for C3
ABD: 3/5
=> frequent itemset: ABD
=> no C4
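The three scans above follow the level-wise Apriori search. The sketch below reproduces the same support counts. It is an illustration under assumptions: the slides' transaction table did not survive in this transcript, so `TRANSACTIONS` is a hypothetical database chosen only because it yields the supports listed above, and the function names are made up for this example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical transactions: NOT from the slides, chosen only so that the
# supports match the worked example (A 3/5, B 5/5, C 2/5, D 4/5, E 1/5).
TRANSACTIONS = [
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"B", "D", "E"},
    {"B", "C"},
]
MIN_SUPPORT = Fraction(60, 100)


def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for t in TRANSACTIONS if itemset <= t)
    return Fraction(hits, len(TRANSACTIONS))


def frequent_itemsets():
    """Level-wise Apriori search; returns {itemset: support}."""
    items = sorted(set().union(*TRANSACTIONS))
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        kept = {c: support(c) for c in level if support(c) >= MIN_SUPPORT}
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


result = frequent_itemsets()
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

Exact fractions are used instead of floats so that a support of exactly 60% compares cleanly against the threshold.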
Therefore, the itemsets that pass the minimum support are {AB, AD, BD, ABD}.
Next, we need to compute their confidence.
First, we work on the 2-itemsets {AB, AD, BD}:
A=>B: 3/3, B=>A: 3/5
A=>D: 3/3, D=>A: 3/4
B=>D: 4/5, D=>B: 4/4
Then, we work on the 3-itemset {ABD}:
A=>BD: 3/3, B=>AD: 3/5, D=>AB: 3/4
AB=>D: 3/3, AD=>B: 3/3, BD=>A: 3/4
Thus, the strong association rules we have are:
2-itemset: A=>B, A=>D, B=>D, D=>B
3-itemset: A=>BD, AB=>D, AD=>B
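The confidence step can be checked with a short Python sketch. As before, this is an illustration under assumptions: `TRANSACTIONS` is a hypothetical database (the slides' table is not in this transcript) picked so that the counts agree with the fractions above, and `count` and `strong_rules` are names invented for this example. Confidence is computed as count(itemset) / count(antecedent), and a rule is kept when its confidence is at least the minimum confidence, which is how B=>D at 4/5 passes in the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical transactions, NOT from the slides; they only reproduce the
# counts used in the worked example.
TRANSACTIONS = [
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"B", "D", "E"},
    {"B", "C"},
]
MIN_CONFIDENCE = Fraction(80, 100)


def count(items):
    """Number of transactions containing all of the given items."""
    return sum(1 for t in TRANSACTIONS if set(items) <= t)


def strong_rules(itemsets):
    """Return {"antecedent=>consequent": confidence} for rules that pass."""
    rules = {}
    for itemset in itemsets:
        ordered = sorted(itemset)
        for size in range(1, len(ordered)):
            for ant in combinations(ordered, size):
                con = tuple(i for i in ordered if i not in ant)
                confidence = Fraction(count(ordered), count(ant))
                if confidence >= MIN_CONFIDENCE:
                    rules["".join(ant) + "=>" + "".join(con)] = confidence
    return rules


rules = strong_rules(["AB", "AD", "BD", "ABD"])
for name, confidence in rules.items():
    print(name, confidence)
```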
Next, we work on Positional Association rules…
Positional Association Rules D=>B, minimum distance assurance = 60%
1. D =>(3) B = 3/4
2. D =>(19) B = 1/4
3. D =>(20) B = 1/4
Only distance 3 exceeds the 60% threshold (3/4 = 75%).
Positional Association Rules B=>D, minimum distance assurance = 60%
1. B =>(3) D = 3/6
2. B =>(17) D = 1/6
3. B =>(19) D = 1/6
No distance exceeds the 60% threshold (3/6 = 50%).
Positional Association Rules A=>B, minimum distance assurance = 60%
1. A =>(2) B = 2/4
2. A =>(22) B = 1/4
3. A =>(25) B = 1/4
4. A =>(24) B = 1/4
No distance exceeds the 60% threshold (2/4 = 50%).
Positional Association Rules A=>D, minimum distance assurance = 60%
1. A =>(5) D = 3/4
2. A =>(28) D = 1/4
Only distance 5 exceeds the 60% threshold (3/4 = 75%).
Positional Association Rules AD=>B, minimum distance assurance = 60%
1. [A =>(5) D] =>(2) B = 2/3
2. [A =>(5) D] =>(24) B = 1/3
Only distance 2 exceeds the 60% threshold (2/3 ≈ 67%).
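Positional Association Rules AB=>D, minimum distance assurance = 60%
No positional association rules on AB: A=>B produced no positional association rule, so Apriori_Motif_Construct returns NULL for AB and this rule is skipped.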
Positional Association Rules A=>BD, minimum distance assurance = 60%
1. A =>(2) [D =>(3) B] = 2/4
2. A =>(25) [D =>(3) B] = 1/4
No distance exceeds the 60% threshold (2/4 = 50%).
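The threshold check across all of the rules in this example can be summarized in a few lines of Python. This is only an arithmetic sketch: the fractions are the largest distance fraction listed for each rule on the slides, the variable names are hypothetical, and the strict ">" comparison follows the pseudocode. AB=>D is left out because no motif can be constructed for AB.

```python
from fractions import Fraction

MIN_DISTANCE_ASSURANCE = Fraction(60, 100)

# Largest distance fraction reported for each rule in the worked example.
best_fraction = {
    "D=>B": Fraction(3, 4),
    "B=>D": Fraction(3, 6),
    "A=>B": Fraction(2, 4),
    "A=>D": Fraction(3, 4),
    "AD=>B": Fraction(2, 3),
    "A=>BD": Fraction(2, 4),
}

# A rule yields a positional association rule only if its best distance
# fraction is strictly above the minimum distance assurance.
surviving = [rule for rule, frac in best_fraction.items()
             if frac > MIN_DISTANCE_ASSURANCE]
print(surviving)  # ['D=>B', 'A=>D', 'AD=>B']
```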