Readability
Combinatorial Pattern Matching (CPM), June 29, 2015
Rayan Chikhi, CNRS Lille
Sofya Raskhodnikova, Penn State
Paul Medvedev, Penn State
Martin Milanič, University of Primorska
Overlap Digraph (definition)
• A string $x$ overlaps a string $y$ if there is a suffix of $x$ that is equal to a prefix of $y$.
• They overlap properly if, in addition, the suffix and prefix are both proper.
• The overlap digraph of a set of strings $S$ is a digraph where each string is a vertex and there is an edge $(x, y)$ if and only if $x$ properly overlaps $y$.
• Several variants of overlap graphs are used in bioinformatics applications.
[Figure: example overlap digraph on the strings ACGTA, GTAAC, CCCCT, GGACT]
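The definition of a proper overlap can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `properly_overlaps` and `overlap_digraph` are hypothetical, the empty overlap is assumed not to count, and self-loops are left out for simplicity.

```python
from itertools import permutations


def properly_overlaps(x: str, y: str) -> bool:
    """Return True if a nonempty proper suffix of x equals a proper prefix of y."""
    # Assumption: an overlap must be nonempty. "Proper" means shorter than the
    # whole string, so candidate lengths run from 1 to min(len(x), len(y)) - 1.
    longest = min(len(x), len(y)) - 1
    return any(x[-k:] == y[:k] for k in range(1, longest + 1))


def overlap_digraph(strings):
    """Return the edge set {(x, y)} over ordered pairs of distinct strings."""
    # Assumption: self-loops (x, x) are omitted to keep the sketch small.
    return {(x, y) for x, y in permutations(strings, 2) if properly_overlaps(x, y)}


edges = overlap_digraph(["ACGTA", "GTAAC"])
```

Here ACGTA properly overlaps GTAAC through GTA, and GTAAC properly overlaps ACGTA through AC, so both directed edges are present.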
Questions
• Do overlap digraphs have any properties or structure that can be exploited?
– Given a graph, Braga and Meidanis (2002) showed how to label the vertices so that the graph is an overlap graph.
• How does the set of graphs generated depend on the string length?
– The BM labeling used strings of length […].
– Limiting the string length limits the graphs that can be generated.
Readability in the digraph model
• A labeling is an assignment of strings to vertices.
• Let $D = (V, E)$ be a directed graph.
• An overlap labeling is a labeling such that $(x, y)$ is an edge if and only if the string of $x$ properly overlaps the string of $y$.
• The readability of a digraph $D$, denoted $r(D)$, is the smallest nonnegative integer $r$ such that there exists an injective overlap labeling of $D$ with strings of length $r$.
[Figure: the example digraph $G$ with vertex labels ACGTA, GTAAC, CCCCT, GGACT; its readability satisfies $2 < r(G) \le 5$]
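Checking a candidate labeling against the readability definition can be sketched as follows. This is an illustrative checker under assumptions, not a method from the talk: the function names are hypothetical, the empty overlap is assumed not to count, and only ordered pairs of distinct vertices are tested.

```python
from itertools import permutations


def properly_overlaps(x: str, y: str) -> bool:
    """True if a nonempty proper suffix of x equals a proper prefix of y."""
    longest = min(len(x), len(y)) - 1
    return any(x[-k:] == y[:k] for k in range(1, longest + 1))


def is_injective_overlap_labeling(vertices, edges, labeling, length):
    """Check that `labeling` is an injective overlap labeling with strings of `length`."""
    labels = [labeling[v] for v in vertices]
    if any(len(s) != length for s in labels):
        return False  # every string must have the prescribed length
    if len(set(labels)) != len(labels):
        return False  # injective: no two vertices share a string
    # Assumption: self-pairs (x, x) are not checked in this sketch.
    return all(
        ((x, y) in edges) == properly_overlaps(labeling[x], labeling[y])
        for x, y in permutations(vertices, 2)
    )


vertices = ["a", "b"]
labeling = {"a": "ACGTA", "b": "GTAAC"}
ok = is_injective_overlap_labeling(vertices, {("a", "b"), ("b", "a")}, labeling, 5)
bad = is_injective_overlap_labeling(vertices, {("a", "b")}, labeling, 5)
```

A successful check with strings of length $r$ only shows that the readability is at most $r$; showing that no shorter labeling exists needs a separate argument.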
Readability in the bipartite graph model
• Let $G$ be a bipartite graph.
• An overlap labeling is a labeling such that $(x, y)$ is an edge if and only if the string of $x$ properly overlaps the string of $y$.
• The readability of a bipartite graph $G$, denoted $r(G)$, is the smallest nonnegative integer $r$ such that there exists an injective overlap labeling of $G$ with strings of length $r$.
• Thm: There exists a bijection between the set of bipartite graphs with $n$ nodes in each part and the set of all digraphs with $n$ nodes such that […] holds for every graph.
[Figure: example bipartite graph with vertex labels ACA, CAC, AGA, CAT]
Examples
• Complete bipartite graph on […] vertices ([…])
• Even cycle on […] vertices ([…])
[Figure: even cycle with vertex labels 41, 12, 12, 23, 23, 34, 34, 41]
Is there a simple and useful string-free formulation of readability?
[Figure: graph on vertices $u_1$, $u_2$, $v_1$, $v_2$ with edges $e_1$, $e_2$, $e_3$]
P4-rule and P4 Lemma
• A decomposition of size $k$ is a weight function $w : E \to \{1, \dots, k\}$.
• Given an overlap labeling $\ell$, the $\ell$-decomposition is a decomposition assigning each edge $(u, v)$ the length of the minimum overlap between $\ell(u)$ and $\ell(v)$.
• P4 Lemma: If $\ell$ is an overlap labeling, then the $\ell$-decomposition satisfies the following (called the P4-rule):
– For every induced $P_4$ with edges $e_1, e_2, e_3$, if the middle edge $e_2$ has the maximum weight, then $w(e_2) \ge w(e_1) + w(e_3)$.
[Figure: the strings $\ell(u_1)$, $\ell(v_1)$, $\ell(u_2)$, $\ell(v_2)$ aligned along an induced $P_4$, with overlap lengths $w(e_1)$, $w(e_2)$, $w(e_3)$]
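A check of the P4-rule on a weighted bipartite graph can be sketched as below. Everything here is an assumption for illustration: the function name `satisfies_p4_rule` is hypothetical, edges are stored as pairs `(u, v)` with `u` and `v` in different parts, "maximum weight" is read as "at least as large as both neighbouring edges", and the rule's conclusion is taken to be $w(e_2) \ge w(e_1) + w(e_3)$.

```python
def satisfies_p4_rule(edges, w):
    """Check the P4-rule for a bipartite graph.

    edges: set of (u, v) pairs, with u from one part and v from the other.
    w: dict mapping each edge to a positive integer weight.
    """
    for (u, v) in edges:                  # candidate middle edge e2 = (u, v)
        for (u1, v1) in edges:            # e1 = (u, v1) shares u with e2
            if u1 != u or v1 == v:
                continue
            for (u2, v2) in edges:        # e3 = (u2, v) shares v with e2
                if v2 != v or u2 == u:
                    continue
                if (u2, v1) in edges:
                    continue              # endpoints adjacent: path not induced
                e1, e2, e3 = (u, v1), (u, v), (u2, v)
                # Assumed form of the rule: a heaviest middle edge must weigh
                # at least as much as the two outer edges combined.
                if w[e2] >= max(w[e1], w[e3]) and w[e2] < w[e1] + w[e3]:
                    return False
    return True


path = {("u1", "v1"), ("u1", "v2"), ("u2", "v2")}  # v1 - u1 - v2 - u2
good = {("u1", "v1"): 1, ("u1", "v2"): 2, ("u2", "v2"): 1}
flat = {e: 1 for e in path}
```

With these inputs, `good` passes because the middle edge has weight 2 = 1 + 1, while `flat` fails because the middle edge ties for the maximum with weight 1 < 1 + 1.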
Trees
• Given a decomposition $w$, we say that a labeling $\ell$ achieves $w$ if it is an overlap labeling and $w$ is the $\ell$-decomposition.
• Let $T$ be a tree. Theorem: $r(T)$ equals the minimum size of a decomposition of $T$ that satisfies the P4-rule.
• The P4 Lemma implies […].
• Claim: if $w$ satisfies the P4-rule, then there exists a labeling achieving $w$.
• Order edges by non-decreasing weight, and define […].
• Inductively construct a labeling $\ell_j$ for […]. Let […].
– Note that […], because of the P4-rule and […] is […]-free.
– Relabel $u$ and $v$ with […],
– where $A$ has length […] and is composed of new, non-repeating characters.
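[Figure: vertices $u$ and $v$ joined by an edge of weight $w(u, v)$, each shown with the string $\ell_j(v)\, A\, \ell_j(u)$; the lengths $|\ell_j(u)|$ and $|\ell_j(v)|$ are marked]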
Proof of claim (key idea)
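Case […]:
[Figure: vertices $u$, $v$, $u'$ with edge weight $w(u, v)$; $u$ and $v$ are shown with the string $\ell_j(v)\, A\, \ell_j(u)$]
Case […]:
[Figure: vertices $u$, $v$, $u'$ with edge weight $w(u, v)$ and the strings $\ell_j(u)$, $\ell_j(v)$, $\ell_j(u')$]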
For cycles, the theorem is not true
[Figure: counterexample cycle; surviving numeric labels: 2, 4, 2, 3, 1, 2, 3]
[…]-free bipartite graphs
• The strict P4-rule is:
– For every induced $P_4$, if the middle edge has the maximum weight, then […].
• Theorem: For a […]-free bipartite graph, […].
• For graphs with […], the theorem is not true.
[Figure: counterexample; surviving numeric labels: 4, 2, 3, 3, 1, 1, 1]
General bipartite graphs
• Let […] be the subgraph of $G$ including only edges with weight […].
• Define […] as the size of the smallest decomposition satisfying the HUB-rule: for all […],
– bicliques: […] is a disjoint union of bicliques
– hierarchical: if […] and […] have the same neighborhoods in […], then they have the same neighborhoods in […] for […].
• Thm: […]
[Figure: vertices $u_1$, $u_2$, $v_1$, $v_2$ with edges labeled $i$, and the strings $\ell(u_1)$, $\ell(v_1)$, $\ell(u_2)$ with an overlap labeled $i$]
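The "disjoint union of bicliques" condition in the HUB-rule can be tested directly on an edge set. The sketch below is only an illustration: `is_disjoint_union_of_bicliques` and `edges_with_weight` are hypothetical names, and because the exact weight condition defining the subgraph is not recoverable here, the helper takes it as a caller-supplied predicate.

```python
from collections import defaultdict


def is_disjoint_union_of_bicliques(edges):
    """True if every connected component of the bipartite graph is complete bipartite.

    edges: set of (u, v) pairs, with u from one part and v from the other.
    """
    nbrs_of_left = defaultdict(set)
    nbrs_of_right = defaultdict(set)
    for u, v in edges:
        nbrs_of_left[u].add(v)
        nbrs_of_right[v].add(u)
    # Two vertices of the first part that share a neighbor must have
    # identical neighborhoods; this forces each component to be a biclique.
    return all(
        nbrs_of_left[u2] == nbrs_of_left[u]
        for u, v in edges
        for u2 in nbrs_of_right[v]
    )


def edges_with_weight(edges, w, keep):
    """Hypothetical helper: keep the edges whose weight satisfies `keep`."""
    return {e for e in edges if keep(w[e])}


two_bicliques = {("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v3")}
path = {("u1", "v1"), ("u1", "v2"), ("u2", "v2")}
```

For example, `edges_with_weight(edges, w, lambda x: x == i)` selects the edges of weight exactly `i`, and `lambda x: x <= i` selects those of weight at most `i`.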
How large can readability be?
• Theorem: Almost all graphs have readability […]
– via a counting argument
Distinctness
• The distinctness of two vertices in the same part of the bipartition is the number of vertices in one neighborhood and not the other (taking the max of the two values).
• The distinctness of $G$ is the minimum distinctness over all pairs.
• Thm: […]
– Consider the decomposition of an optimal labeling.
– Case 1: every […] is a matching
• Adding a matching can increase the distinctness by at most one.
– Case 2: Let […] be the last one that is not a matching
• Using the fact that the decomposition satisfies the HUB-rule
[Figure: vertices $u_1$ and $u_2$ with edges labeled $j$; annotation: $\ge D(G)$]
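The distinctness of a bipartite graph, as defined in the bullets above, can be computed with a short sketch. The function name `distinctness` and the input format (two vertex lists plus an edge set, with vertex names unique across both parts) are assumptions for illustration; the sketch also assumes at least one part has two or more vertices.

```python
from itertools import combinations


def distinctness(left, right, edges):
    """Minimum, over pairs of vertices in the same part, of the pair's distinctness."""
    nbrs = {x: set() for x in list(left) + list(right)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def pair_value(a, b):
        # Vertices in one neighborhood and not the other, taking the larger count.
        return max(len(nbrs[a] - nbrs[b]), len(nbrs[b] - nbrs[a]))

    pairs = list(combinations(left, 2)) + list(combinations(right, 2))
    return min(pair_value(a, b) for a, b in pairs)


left, right = ["u1", "u2"], ["v1", "v2"]
complete = {(u, v) for u in left for v in right}
matching = {("u1", "v1"), ("u2", "v2")}
```

In the complete bipartite graph both vertices of a part have the same neighborhood, so the value is 0; in the perfect matching each vertex has one private neighbor, so the value is 1.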
Hadamard Graphs
• $H_k$: bipartite graph
– vertices assigned $k$-long binary codewords
– edge if the inner product of the codewords is odd
[Figure: Hadamard graph example with panel labels $H_3$ and $H_2$; the vertices on each side are labeled 00, 01, 10, 11]
• Theorem: […]
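The construction described in the bullets can be sketched as follows. This assumes, beyond what the slide states, that every one of the $2^k$ binary codewords appears once on each side; codewords are encoded as integers, and `hadamard_graph` is a hypothetical name.

```python
def hadamard_graph(k):
    """Edges (x, y) between k-bit codewords whose inner product is odd.

    x indexes a vertex of one part and y a vertex of the other; each integer
    in range(2**k) stands for the k-bit codeword given by its binary digits.
    """
    n = 2 ** k
    return {
        (x, y)
        for x in range(n)
        for y in range(n)
        # The inner product of two 0/1 vectors is the number of positions
        # where both are 1, i.e. the popcount of the bitwise AND.
        if bin(x & y).count("1") % 2 == 1
    }


h2 = hadamard_graph(2)
```

For $k = 2$ the codeword 00 is isolated on both sides, and 11 is adjacent to 01 and 10 but not to 11.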
Trees
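[Figure: a tree with edge weights 1, 1, 2, 2, 2, 3, 3, 3, 3]
• Thm:
– For all trees $T$, […]
– For the full $k$-ary tree of height $k$, […]
• Assume, for the sake of contradiction, that there exists an optimal decomposition of size […].
• A path from root to leaf with distinct edge weights, with […] values, with […] edges
[Figure: the case $k = 3$, with edge weights $w_1$, $w_2 > w_1$, $w_3 > w_2$ along the tree]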
Conclusions
Results
• A string-free formulation of readability that is
– exactly equivalent for trees
– asymptotically equivalent for […]-free bipartite graphs
– “weakly” equivalent for general graphs
• Existence of a graph family with readability of […]
Open problems
• Find other rules that an $\ell$-decomposition must satisfy to close the gap: […]
• Let […]
– We know […]
– Do there exist graphs with […]?
• Complexity
• Understand graphs that have poly-logarithmic readability
The end
General graphs
• Define […] for […] as the subgraph of $G$ including only edges with weight at most […].
• Lem: An $\ell$-decomposition satisfies the following (HUB-rule), for all […]:
– […] is a disjoint union of bicliques
– If […] and […] have the same neighborhoods in […], then they have the same neighborhoods in […] for […].
[Figure: vertices $u_1$, $u_2$, $v_1$, $v_2$ with edges labeled $i$, and the strings $\ell(u_1)$, $\ell(v_1)$, $\ell(u_2)$ with an overlap labeled $i$]
• Define […] as the size of the smallest decomposition satisfying the HUB-rule.
• Thm: […]
Questions/Results
• Do there exist graphs with readability […]?
Almost all graphs have readability […]
• Counting argument
– There are […] bipartite graphs with […] vertices.
– There are at most […] labellings of length […].
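One way a counting argument of this shape can go is sketched below. The specific counts are assumptions, not the figures from the slide: graphs are taken to be bipartite with $n$ labeled vertices in each part, and a labeling with $2n$ strings of length $r$ is taken to use at most $2nr$ distinct characters.

```latex
% Assumed setting: n labeled vertices per part, strings of length r,
% alphabet restricted to the at most 2nr characters that actually occur.
% Each labeling is an overlap labeling of exactly one graph.
\begin{align*}
\#\{\text{bipartite graphs}\} &= 2^{n^2}, \\
\#\{\text{labelings with strings of length } r\} &\le (2nr)^{2nr}, \\
\#\{\text{graphs with readability at most } r\} &\le (r+1)\,(2nr)^{2nr}
  \qquad (r \ge 1).
\end{align*}
% If a constant fraction of all graphs has readability at most r, then
\[
\log_2(r+1) + 2nr\,\log_2(2nr) \;\ge\; n^2 - O(1)
\quad\Longrightarrow\quad
r = \Omega\!\left(\frac{n}{\log n}\right).
\]
```

Under these assumptions, any $r$ below $c\,n/\log n$ for a small enough constant $c$ leaves too few labelings to cover more than a vanishing fraction of the $2^{n^2}$ graphs.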