lecture: graph-based genome scaffolding - igmigm.univ-mlv.fr/~mweller/scaf_lec.pdf · sanger...
TRANSCRIPT
DNA
- double strand
- inside nucleus (safe)
RNA
- single strand
- outside nucleus
- transfers genetic code
- Thymine (T) → Uracil (U)
Polymerase
2/33
DNA
- double strand
- inside nucleus (safe)
RNA
- single strand
- outside nucleus
- transfers genetic code
- Thymine (T) → Uracil (U)
Polymerase
2/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAATGGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAATGGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTAGGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*
GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*
GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Sanger Sequencing [Sanger et al ’77]
CCTGGACGGGTCAGACATGACAGTGGCCCCAAGATTCACAAGATCGTATCTCAATACAGTAAACGAGCAAT
GGACCTGCCCAGTCTGTACTGTCACCGGGGTTCTAAGTGTTCTAGCATAGAGTTATGTCATTTGCTCGTTA
GGA*GGACCTGCCCA*GGACCTGCCCAGTCTGTA*
Sanger Sequencing
1. split helix & create thousands of copies
2. add polymerase & floating bases: A C G T3. add a special base: A* (polymerase cannot extend)
4. stir & let polymerase act
5. measure the length of each fragment
each length is the position of a T in the template
Problem
unreliable after a couple hundred bp
chop up DNA into pieces and read those
3/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGTCTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTC
TGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGTCTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTC
TGGTACTCA......ACCTCTCAG
TGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGTCTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTC
TGGAGAGTC
TGAGTACCA
ACTCA......ACCTC
TGGTACTCA......ACCTCTCAG
TGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGTCTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTC
TGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAG
TGGTACTCA......ACCTCTCAG
ACCTCTCAG
ACTCATGGTCTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTC
TGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAG
ACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTC
TGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAG
ACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Next Generation Sequencing ( )
ACCA
AGTCTGGAGAGTC
TGAGTACCA
ACTCA......ACCTCTGGTACTCA......ACCTCTCAGTGGTACTCA......ACCTCTCAGACCTCTCAG
ACTCATGGT
CTGAGAGGT......TGAGTACCA
TGGTACTCA......ACCTCTCAG
1. chop DNA into smaller pieces
2. add anchors to each end of each piece
3. “flow cell” containing anchor places
4. strand anchors its two ends to two anchor places
5. polymerase completes the strand into double-strand
6. double strand is denaturized into single strands
7. rinse, repeat (last 3 steps) until flow chip is “full”
8. read all strands from their anchor points outwards
Paired-End reads (distance between reads = “insert size”)
4/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTCACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA
TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA
TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA
TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 1: parts of the sequence might not be covered by reads
sequence with “high coverage”
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACA
TCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 1: parts of the sequence might not be covered by reads
sequence with “high coverage”
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 1: parts of the sequence might not be covered by reads
sequence with “high coverage”
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 1: parts of the sequence might not be covered by reads
sequence with “high coverage”
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
1. produce best pairwise overlaps
2. layout the reads according to the overlaps
3. for each position, compute consensus base
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTC CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
1. produce best pairwise overlaps
2. layout the reads according to the overlaps
3. for each position, compute consensus base
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find path using all overlaps
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find Eulerian path
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCT
CCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find Eulerian path
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCTCCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 2: Shortest Common Superstring is NP-hard
“Overlap-Layout-Consensus” assemblers
Problem: Θ(n2) too slow in practice
DeBruijn-graph based assembly
1. chop all reads into “k-mers”
2. builds overlap graph
(“DeBruijn graph”)
3. find Eulerian path
k = 4
GAAC
AACT
ACTTCTTC
TTCG
TCGC
CGCTCCTT
CTTG
TTGG
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Sequence Assembly: Overview
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTT CGACACTCCTTGGGTTTT CTAGGCCATTGATTGCGGGTC
ACTTCGC GGTTCTCT GGTCCAGGTGCTGTCAACGACATCGCTAGGGTTCTCTAACGA TTTACGTCGCGG CGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGC CGACACTCCTTGGGTTTT GGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGTTCTCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTACTTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGG CTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
TTTGCCCCTGAACTTCGCTAGGGTTCTCTAACGACACTCCTTGGGTTTTTACGTCGCGGNNNNCTAGGCCATTGATTGCGGGTCCAGGTGCTGTCAACGACACTCCTTGGGTTTTTAC
Goal: reconstruct sequence
Idea: overlap reads
Problem 3: repeats (common in DNA) make assembly ambiguous
end product is a set of “contiguous regions”
Problem: “contig soup” not very useful
But: we have paired-end information!
5/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
I removes reads in high-coverage area (likely repeats)I orientation step (heuristic) + ordering step (heuristic)I coded in Pearl (!!!)I (observed sparse contig graph)
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
I heuristic contig extensionI “reasonable time”
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
I np+O(1) time (p =#edge-deletions (≥ feedback edge set))I most work done by a heuristic “graph contraction”
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
I Mixed-Integer Quadratic ProgrammingI deals with uncertain data (slack variables)
“intractable even for small # of contigs”
I heuristic workaround:I solve relaxed formulation & use slack values ILP
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
I orientation step: use FPT algo for Odd Cycle TransersalI ordering step: heuristic
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Genome Scaffolding: Previous Work
Goal: order & orient contigs
Idea: use pairing information on reads to “link” contigs together
- SOPRA [Dayarian, Michael, Sengupta, BMC Bioinf. 11, ’10]
- SSPACE [Boetzer & al., Bioinf. 27(4), ’11]
- OPERA [Gao, Sung, Ngaraja, JCB. 18(11), ’11]
- GRASS [Gritsenko & al., Bioinf. 28(11), ’12]
- SCARPA [Donmez, Brudno, Bioinf. 29(4), ’13]
- . . . [Huson & al., JACM, ’02][Nieuwerburgh & al., NAR, ’12]
6/33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGG
GTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
2
5
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
Strategy
1. map reads into contigs
2. pair contigs according to read-pairing (weighted)
3. cover “scaffold graph” with (heavy) alternating paths
each path corresponds to a chromosome
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp ∈ N
Question: Can G be covered by
≤σp alternating paths
σc alternating cycles
of total weight ≥ k?
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp, σc ∈ N
Question: Can G be covered by
≤σp alternating paths &
≤σc alternating cycles
of total weight ≥ k?
7 /33
Graph-Based Scaffolding
AACGACAC
TCCTTGGG
TTTTTACG
TCGCGG
CTAGGCCATTGATTGCGGGTCCAGGTGCTG
GTTAATGTCCGAGCATAAAACTCTGGTTGGC
GTACTGAACTTGGGTTCCATAGGACCCAGA
AGAGCTTGACAGTAACACATTTAGGAGCACGCG
CGTCGCGG
ACTTGGGGTTTTTAC
CTACTGA
25
10
9
14
3
613
10
14
3
6
Exact ScaffoldingInput: Graph G , perfect matching M, weights ω, k, σp, σc ∈ N
Question: Can G be covered by
σp alternating paths &
σc alternating cycles
of total weight ≥ k?
7 /33
19
10
71
26
22
32
59
71
78
2 60
85
1
38
1
17
19
7
25
8/33
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
8/33
1
63
94
25
9
28
29
55
2
17
1
67
62
1
5
71
26
24
47
83
71
34
71
68
72
1
51
19
79
54
151
67
6
66
34
68
59
2
23
11
12
1
5
1
2
6
1
49
13
71
28
71
5
6
11
33
67
24
88
82
2
42
25
1
39
14
65
5
2
70
24
89
46
1
1
1
80
1
73
64
36
72
33
73
67
59
1
14
1
1
70
3
1
33
53
2
2
1
135
4
2980
74
80
84
11
31
1
1
62
73
38
31
33
77
108
28
13
1 4
1
70
8
5
1
47
3
31
2
1
18
56
49
45
2
2
8/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Construction
Given a directed graph D .
1. make a copy of D
2. duplicate all vertices M3. “slide” down all arrow tips & ignore directions
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Construction
Given a directed graph D .
1. make a copy of D2. duplicate all vertices M
3. “slide” down all arrow tips & ignore directions
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Construction
Given a directed graph D .
1. make a copy of D2. duplicate all vertices M3. “slide” down all arrow tips & ignore directions
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
“⇒”: replace each v in the Hamiltonian path by vup → vlow .
alternating X covers M X
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
“⇒”: replace each v in the Hamiltonian path by vup → vlow .
alternating X covers M X
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
“⇐”: contract each matching edge in the covering alternating path.
hits all vertices exactly once X is valid directed path X
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Lemma
D admits a directed Hamiltonian path ⇔ M can be covered with a
single alternating path in G .
“⇐”: contract each matching edge in the covering alternating path.
hits all vertices exactly once X is valid directed path X
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Theorem
Scaffolding is NP-hard, even restricted to
• bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0}.
Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Theorem
Scaffolding is NP-hard, even restricted to
• supergraphs of bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0, 1}.
Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Theorem
Scaffolding is NP-hard, even restricted to
• supergraphs of bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0, 1}.
Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.9/33
Hardness Warm up: Hamiltonian Path
Recall: Scaffolding
Input: Graph G , perfect matching M, weights ω, k, σp, σc ∈ NQuestion: Can G be covered by ≤ σp alternating paths &
≤ σc alternating cycles of total weight ≥ k?
Theorem
Exact Scaffolding is NP-hard, even restricted to
• supergraphs of bipartite graphs
• (σp, σc) ∈ {(0, 1), (1, 0)} and
• ω : E → {0, 1}.
Corollary
Exact Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.9/33
Scaffolding in Co-Bipartites
???
?
Wait, what?
Recap: Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
Unweighted!
10 /33
Scaffolding in Co-Bipartites
???
?
Wait, what?
Recap: Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
Unweighted!
10 /33
Scaffolding in Co-Bipartites
???
?
Wait, what?
Recap: Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
Unweighted!
10 /33
Scaffolding in Co-Bipartites
???
?
Wait, what?
Recap: Corollary
Scaffolding with 2 weights is NP-hard in any
sufficiently dense graph class.
Unweighted!
10 /33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
no edges between X & Y need 2 objects (paths/cycles)
otherwise can always cover G with 1 path
TODO
decide if we can cover with 1 cycle
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X
#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X
#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X
#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X
#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y
#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y
#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y
#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Observation
∃ alternating cycle with non-matching edge X extend to cover all M in G [X ]
Observation
#matching edges between X & Y even (and > 0) X#matching edges between X & Y odd
find any non-matching edge between X & Y#matching edges between X & Y is 0
all other cases are X (tedious case analysis)
11 / 33
Scaffolding in Unweighted Co-Bipartites
X Y
forbidden!
Theorem
Scaffolding can be solved in O(n + m) time on co-bipartite graphs
11 / 33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `
M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `
M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g
delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce k
or g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u
u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below”
∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−`
take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Case 2parent g of p is matched “above”
either p is the only child of g delete ` & g and reduce kor g has another child u u matched “below” ∃ “clone” of g−p−` take p−`
12 /33
Scaffolding in Unweighted Trees
Observation
no alternating cycles in a tree
Observation
consider a lowest leaf `M is perfect ` matched
parent p of ` has only 1 child
Case 1parent g of p is matched “below”
g is matched to a leaf `′
always take `−p−g−`′
g
p
`
Theorem
Scaffolding can be solved in O(n) time on unweighted trees
12 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[p, x ]v = max. weight collected
below v with p finished paths
“under x ”
up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[p, x ]v = max. weight collected
below v with p finished paths
“under x ”
up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[p, x ]v = max. weight collected
below v with p finished paths
“under x ”
up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .[p,X]v := max
p1, p2, . . . , pc∑pi = p
∑1≤i≤c
maxx∈{X,X}
[pi , x ]vi
BAD IDEA!
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[p, x ]v = max. weight collected
below v with p finished paths
“under x ”
up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .[p,X]v := max
p1, p2, . . . , pc∑pi = p
∑1≤i≤c
maxx∈{X,X}
[pi , x ]vi
BAD IDEA!
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈M
ω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M
{[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
v
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 0
1 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
Scaffolding in Weighted Trees
Dynamic Programming Idea
bottom-up traversal; in each
vertex v , need to remember:
• #paths used below v• v incident with non-matching?
Semantics
[j , p, x ]v = max. weight collected
below v with p finished paths
“under x ” up to vj(abbrev: last child [p, x ]v)
v
v0
v1 v2 v3v4
v5
vj
125
9 6
3
1 1 X 0
1 X 0
1 X 01 X 0
1 X 01 0 X 9
1 1 X 0
2 1 X 9
2 2 X 0
1 X 9
2 X 0
1 1 X 0
2 1 X 3
2 2 X 0
1 X 3
2 X 0
1 2 X 9
1 3 X 0
1 2 X 9
1 3 X 0
2 3 X 12
2 3 X 21
2 4 X 9
2 4 X 12
2 5 X 0
. . . .
Recurrence
Let v1, v2, . . . , vc be the children of v .
[0, 0,X]v :=0
[j , p, x ]v :=maxpj≤p
max{[pj ,X]vj , [pj ,X]vj }+ [j − 1, p − pj , x ]v if vvj /∈Mω(vvj ) + [pj + 1,X]vj + [j − 1, p − pj ,X]v if x = X& vvj /∈M{
[pj − 1,X]vj[pj − 1,X]vj
}+ [j − 1, p − pj , x ]v if vvj ∈M
13 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding
1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Proof
Result S∗ is a valid solution X
Note: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them
3ω(S∗) ≥ OPT
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Proof
Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them
3ω(S∗) ≥ OPT
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Proof
Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them
3ω(S∗) ≥ OPT
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Proof
Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them
3ω(S∗) ≥ OPT
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Proof
Result S∗ is a valid solution XNote: taking an edge forbids ≤ 3 OPT edges
mark the ≤ 3 OPT-edges when taking an edge e e is heaviest among them
3ω(S∗) ≥ OPT
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Theorem
Scaffolding in complete graphs can be
3-approximated in O(|V | log |V |) time.
Remark
For Exact Scaffolding, we have to keep an eye on the number of
components too.
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Theorem
Scaffolding in complete (bipartite) graphs can be
3-approximated in O(|V | log |V |) time.
Remark
For Exact Scaffolding, we have to keep an eye on the number of
components too.
14 /33
3-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Scaffolding1. sort all edges by weight
2. repeatedly take heaviest edge, if
possible
Theorem
Scaffolding in complete (bipartite) graphs can be
3-approximated in O(|V | log |V |) time.
Remark
For Exact Scaffolding, we have to keep an eye on the number of
components too.
14 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding
1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remain
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remainProof
Result S∗ is a valid solution X
ω(S∗) ≥ ω(fix) ≥ ω(S)/2 ≥ OPT/2
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remainProof
Result S∗ is a valid solution Xω(S∗) ≥ ω(fix) ≥ ω(S)/2 ≥ OPT/2
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remainTheorem
Exact Scaffolding in complete graphs can be
2-approximated in O(|V |2) time.
Remark
For Scaffolding, replace Step 3 by either merging cycles or
removing lightest edge, whatever looses less weight
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remainTheorem
Exact Scaffolding in complete (bipartite) graphs can be
2-approximated in O(|V |2) time.
Remark
For Scaffolding, replace Step 3 by either merging cycles or
removing lightest edge, whatever looses less weight
15 /33
2-Approximation in Dense Graphs
σp = 1, σc = 1?
Approximate Exact Scaffolding1. compute max.-weight perfect
matching S S ∪M is collection of cycles
2. “fix” all but lightest edge per cycle
3. repeatedly flip any lightest non-fix
4-cycle intersecting 2 cycles
until at most σc + σp cycles remain
4. repeatedly remove lightest non-fix
cycle-edge
until at most σc cycles remainTheorem
Exact Scaffolding in complete (bipartite) graphs can be
2-approximated in O(|V |2) time.
Remark
For Scaffolding, replace Step 3 by either merging cycles or
removing lightest edge, whatever looses less weight
15 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{
[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{
[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2
j ii − 2j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{
[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2
j ii − 2
j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2
j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2
j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2
j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(n!!)
16 /33
Exact Algorithms I: Brute Force
j ii − 2j ii − 2
j ii − 2
[p, c , j ]i :=max. weight collectible before vi with p & cpaths/cycles plus one path starting at vj
[p, c , j ]i =[p, c , j ]i−2 + ω(vi−2vi−1) if j < i − 2 & vi−2vi−1 ∈ E
[p, c , i − 1]i = maxj<i−2j even
{[p − 1, c , j ]i−2
[p, c − 1, j ]i−2 + ω(vjvi−2) if vjvi−2 ∈ E
Observation
An ordering of V (G ) certifies YES-instances of Scaffolding.
try all O(n!) certificates
contigs force every other vertex O(√
2n · n/2!)
16 /33
Exact Algorithms II: Dynamic Programming
S − xy
x yw
u
Semantics
[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v
Computation
Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and
[S , p, c , u, y ] := maxw∈G [S−xy ]
u 6=w
[S − xy , p, c , u,w ] + ω(wx)
[S , p, c , x , y ] :=
maxu,w∈G [S−xy ]
{
[S − xy , p − 1, c , u,w ]
[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M
17 /33
Exact Algorithms II: Dynamic Programming
S − xy
x ywu
Semantics
[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v
Computation
Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and
[S , p, c , u, y ] := maxw∈G [S−xy ]
u 6=w
[S − xy , p, c , u,w ] + ω(wx)
[S , p, c , x , y ] :=
maxu,w∈G [S−xy ]
{
[S − xy , p − 1, c , u,w ]
[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M
17 /33
Exact Algorithms II: Dynamic Programming
S − xy
x ywu
Semantics
[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v
Computation
Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and
[S , p, c , u, y ] := maxw∈G [S−xy ]
u 6=w
[S − xy , p, c , u,w ] + ω(wx)
[S , p, c , x , y ] := maxu,w∈G [S−xy ]
{[S − xy , p − 1, c , u,w ]
[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M
17 /33
Exact Algorithms II: Dynamic Programming
S − xy
x ywu
Semantics
[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v
Computation
Let xy ∈M. Then, [{xy}, 0, 0, x , y ] := 0 and
[S , p, c , u, y ] := maxw∈G [S−xy ]
u 6=w
[S − xy , p, c , u,w ] + ω(wx)
[S , p, c , x , y ] := maxu,w∈G [S−xy ]
{[S − xy , p − 1, c , u,w ]
[S − xy , p, c − 1, u,w ] + ω(wu) if wu ∈ E (G ) \M
17 /33
Exact Algorithms II: Dynamic Programming
S − xy
x ywu
Semantics
[S , p, c , u, v ] = max. weight collectible in G [S ] by p alt. paths, calt. cycles and an alt. path starting at u & ending at v
Theorem
Scaffolding can be solved in O(√
2nn3σpσc) time.
17 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both?
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if unweighted, can we take both? X
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!
remove all non-matching edges from parent u, except uv
u
v
Observation
• v in path & u in cycle 1 path X• v in path & u in path 2 paths X
unless it’s the same path!
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!
remove all non-matching edges from parent u, except uv
Corollary
Scaffolding can be solved in O(n) on quasi-forests if σp = 0.
Scaffolding can be solved in O(n2σp+1) in quasi-forests.
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!
remove all non-matching edges from parent u, except uv
Corollary
Scaffolding can be solved in O(n) on quasi-forests if σp = 0.Scaffolding can be solved in O(n2σp+1) in quasi-forests.
But is it even NP-hard?
18 /33
Sparse Graphs: Quasi-Forest
Recall
• Scaffolding is hard in any sufficiently
dense graph class
• Scaffolding is easy in trees
A Shot at Sparsity
G is Quasi-forest ⇔ G −M is forest
Observation
Each leaf v of G −M has degree 2 in G if σp = 0, we have to take both!
remove all non-matching edges from parent u, except uv
Corollary
Scaffolding can be solved in O(n) on quasi-forests if σp = 0.Scaffolding can be solved in O(n2σp+1) in quasi-forests.
But is it even NP-hard?18 /33
Sparse Graphs: Quasi-Forest
Weighted 2-SATInput: ϕ on X in 2-CNF, weights ω : X × {0, 1} → N, k ∈ N
Question: is there a satisfying assignment for ϕ of weight ≤ k?
Remark
Independent Set is special case of Weighted 2-SAT
19 /33
Sparse Graphs: Quasi-Forest
Weighted 2-SATInput: ϕ on X in 2-CNF, weights ω : X × {0, 1} → N, k ∈ N
Question: is there a satisfying assignment for ϕ of weight ≤ k?
Remark
Independent Set is special case of Weighted 2-SAT
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Sparse Graphs: Quasi-Forest
Observation
∃ weight-k satisfying assignment
⇔
∃ weight-k cover with ≤ nalternating paths
Theorem
Scaffolding is NP-hard even if G −Mis a collection of paths with weights
0/1
Corollary
no 2o(n+m)-time algorithm (ETH)
no no(k)-time algorithm (FPT 6=W[t])
19 /33
Other Forms of Tree-Likeness
Tree Decompositions
tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.
1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree
treewidth tw = size of largest bag - 1
Hope
Practical instances of Scaffolding have low treewidth (they
originate from linear structure)
Nice Decompositions
Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj
(each edge introduced exactly once)
Join: i has 2 children j and ` and Xi = Xj = X`
20/33
Other Forms of Tree-Likeness
Tree Decompositions
tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.
1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree
treewidth tw = size of largest bag - 1
Hope
Practical instances of Scaffolding have low treewidth (they
originate from linear structure)
Nice Decompositions
Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj
(each edge introduced exactly once)
Join: i has 2 children j and ` and Xi = Xj = X`
20/33
Other Forms of Tree-Likeness
Tree Decompositions
tree T , each vertex i associated to some Xi ⊆ V (G ) s.t.
1. ∀ e ∈ E (G ), there is some i ∈ V (T ) with e ∈ Xi2. ∀ v ∈ V (G ), bags containing v induce a connected subtree
treewidth tw = size of largest bag - 1
Hope
Practical instances of Scaffolding have low treewidth (they
originate from linear structure)
Nice Decompositions
Leaf: X = ∅Introduce v : i has single child j and Xi \ Xj = {v}Forget v : i has single child j and Xj \ Xi = {v}Introduce uv : i has single child j and uv ⊆ Xi = Xj
(each edge introduced exactly once)
Join: i has 2 children j and ` and Xi = Xj = X`
20/33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X
#matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X
#matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X
#matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X
#matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X
#matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Semantics
[d ,P, p, c]i = max. weight of any S with M∩ E (Gi ) ⊆ S ⊆ E (Gi ) and
1. each vertex v ∈ Xi has degree d(v) in Gi [S ],2. for each uv ∈ P , Gi [S ] contains an alternating path. . .
u = v : . . . from u avoiding d−1(1)u 6= v : . . . from u to v
3. Gi [S ] contains p alt. paths & c alt. cycles avoiding d−1(1)
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Leaf Bag
[∅,∅, 0, 0]i = 0
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Introduce v (single child j)[d ,P, p, c]i = [d |v→⊥,P, p, c]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Introduce uv (single child j)Case 1: d(u) = d(v) = 2
u v
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Introduce uv (single child j)Case 1: d(u) = d(v) = 2
u v [d ,P, p, c]i = [d |u→1,v→1,P + uv , p, c − 1]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Introduce uv (single child j)Case 1: d(u) = d(v) = 2
u v u v u v u v
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Forget v (single child j)
u
[d ,P, p, c]i = max
[d |v→1,P + vv , p − 1, c]j
maxuu∈P
[d |v→1, (P − uu) + uv , p, c]j
maxx∈{0,2}
[d |v→x ,P, p, c]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Forget v (single child j)
u
v
[d ,P, p, c]i = max
[d |v→1,P + vv , p − 1, c]j
maxuu∈P
[d |v→1, (P − uu) + uv , p, c]j
maxx∈{0,2}
[d |v→x ,P, p, c]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Forget v (single child j)u v
[d ,P, p, c]i = max
[d |v→1,P + vv , p − 1, c]j
maxuu∈P
[d |v→1, (P − uu) + uv , p, c]j
maxx∈{0,2}
[d |v→x ,P, p, c]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Forget v (single child j)
u
v
[d ,P, p, c]i = max
[d |v→1,P + vv , p − 1, c]j
maxuu∈P
[d |v→1, (P − uu) + uv , p, c]j
maxx∈{0,2}
[d |v→x ,P, p, c]j
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Join Bag (children j & `)
[d ,P, p, c]i = maxdj ,Pj ,pj ,cj
maxP`
Pj t P` = P
[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`
O(3tw · twtw /2 ·σp · σc) table entries
and O((tw +2)tw · σp · σc · n) time
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Join Bag (children j & `)
[d ,P, p, c]i = maxdj ,Pj ,pj ,cj
maxP`
Pj t P` = P
[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`
O(3tw · twtw /2 ·σp · σc) table entries
and O((tw +2)tw · σp · σc · n) time
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Join Bag (children j & `)
[d ,P, p, c]i = maxdj ,Pj ,pj ,cj
maxP`
Pj t P` = P
[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`
O(2tw · twtw /2 ·σp · σc) table entries
and O((tw +2)tw · σp · σc · n) time
21 /33
How do Solutions Interact with Bags?
Xi
Gi [S ]
Ingredients
• degree-function d : X → {0, 1, 2}• “pairing” ⊆
(X2
)∪ X #matchings possibilities O(|X ||X |/2)
• #paths and #cycles completed “below the bag”
Join Bag (children j & `)
[d ,P, p, c]i = maxdj ,Pj ,pj ,cj
maxP`
Pj t P` = P
[dj ,Pj , pj , cj ]j + [d − dj ,P`, p − pj , c − cj ]`
O(2tw · twtw /2 ·σp · σc) table entries and O((tw +2)tw · σp · σc · n) time
21 /33
Too slow
in prac-
tice!
22/33
Integer Linear Program Formulation
s
tc
t
p- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s
tc
t
p
- chromosomes = disjoint s-t-paths
- bin. variables yuv = 1⇔ u → v usedx{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s
tc
t
p
- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ
- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s
tc
t
p
- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s
tc
t
p
- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-t-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,t
∑v yvu =
∑v yuv
- path bounds:∑
v yvt ≤ σ- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
23/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
24/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
24/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
24/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
24/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
Jump Mechanicsfor each non-contig uv ,
1. introduce a variable zuv
2. construct “jump network” between u and v that fits in the gap
3. add zuv to x{u,v}extra: preprocess instance to finish incomplete jumps
25/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu+zuv + zvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
Jump Mechanicsfor each non-contig uv ,
1. introduce a variable zuv
2. construct “jump network” between u and v that fits in the gap
3. add zuv to x{u,v}extra: preprocess instance to finish incomplete jumps
25/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
26/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
26/33
Extension: Contig Jumps
Mean read length: 70bp
Mean insert size: 472bp
140
262
149
119
336
116
397
72
1
75
14
5
100
1
21
16
10
37
45
1
22
1
7
61
78
69
1
15
51
40
65
2
61
26/33
ILP Extension: Multiplicities
GGTGCGAGAGAGGTCATGGATTGCAACGA
GGTGCGAGAGGCCACTCCAATTGCAACGA
×2
×1
×1
×2
27/33
ILP Extension: Multiplicities
GGTGCGAGAGAGGTCATGGATTGCAACGA
GGTGCGAGAGGCCACTCCAATTGCAACGA
×2
×1
×1
×2
27/33
ILP Extension: Multiplicities
GGTGCGAGAGAGGTCATGGATTGCAACGA
GGTGCGAGAGGCCACTCCAATTGCAACGA
×2
×1
×1
×2
27/33
ILP Extension: Multiplicities
70
5
47
67
70
135
49
3931
5
49
56
80
80
34
6
11
2489
19
25
83
73
82
71
9
14
63
72
64
28
29
33
66
33
68
94
62
62
11
47
18
67
8
6
74
59
12
34
46
88
42
79
77
13
17
26
24
70
65
45
33
73
67
23
71
108
67
36
13
55
5
38
72
51
71
5
28
73
31
71
59
80
54
84
28
29
71
151
24
5
6
33
68
53
25
×2
28/33
ILP Extension: Multiplicities
70
5
47
67
70
135
49
3931
5
49
56
80
80
34
6
11
2489
19
25
83
73
82
71
9
14
63
72
64
28
29
33
66
33
68
94
62
62
11
47
18
67
8
6
74
59
12
34
46
88
42
79
77
13
17
26
24
70
65
45
33
73
67
23
71
108
67
36
13
55
5
38
72
51
71
5
28
73
31
71
59
80
54
84
28
29
71
151
24
5
6
33
68
53
25
×2
28/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- bin. variables yuv = 1⇔ u → v used
x{u,v} = yuv + yvu+zuv + zvu
- force contigs: ∀uv∈Mxuv = 1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈C(yuv−yutc ) < |C |
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics
!!!
Multiplicities1. make yuv , x{u,v} integers in domain [0,m({u, v})]2. change callback
29/33
Integer Linear Program Formulation
s tc
tp- chromosomes = disjoint s-{tp, tc}-paths- int. variables yuv = `⇔ u → v used ` times
x{u,v} = yuv + yvu+zuv + zvu
- force contigs: ∀uv∈Mxuv≥1- path preservation: ∀u 6=s,tp ,tc
∑v yvu =
∑v yuv
- path & cycle bounds:∑
v yvt{p,c} ≤ σ{p,c}- forbid cycles (row generation via callback):
∀ cycle C :∑
uv∈Cyuv ≤ |C | ·mmax ·
∑u∈C ,v /∈C
yuv
- objective: max∑e∈E
x{u,v} · ω(e)
- cycle consistency: ∀uyutc ≤ ysu
- jump mechanics !!!Multiplicities1. make yuv , x{u,v} integers in domain [0,m({u, v})]2. change callback
29/33
Linearization of Solutions
Problem
no unique chromosome-configuration explaining solution
Proof
30/33
Linearization of Solutions
Problem
no unique chromosome-configuration explaining solution
Proof
30/33
Linearization of Solutions
Problem
no unique chromosome-configuration explaining solution
Proof
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
3 2 2 1 3 3 5 5 71
2 1 2 1
Proof
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇒”: contraposition; let p = ambigous path
2 2 21
11
(G ,M,m) not uniquely linearizable
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇒”: contraposition; let p = ambigous path
2 2 21
11
(G ,M,m) not uniquely linearizable
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇒”: contraposition; let p = ambigous path
2 2 21
11
(G ,M,m) not uniquely linearizable
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇒”: contraposition; let p = ambigous path
2 2 21
11
(G ,M,m) not uniquely linearizable
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
5
2
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
3 3 3
5
2
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
3
5
2
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
3
5
2
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
3
3
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proof
“⇐”: let (G ,M,m) be free of ambigous paths
Reduction (does not decrease number of linearizations):
3
3
22
result is collection of alternating paths & cycles
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily
missassembly
2. isolate each ambiguity
information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily
missassembly
2. isolate each ambiguity
information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity
information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity
information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible
computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible
computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
Multiplicitiesone &
#non-matching adj. to contig
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
Multiplicitiesone &
#non-matching adj. to contig
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
Multiplicitiesone &
#non-matching adj. to contig
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
Multiplicitiesone &
#non-matching adj. to contig
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Linearization of Solutions
Theorem
(G ,M,m) uniquely linearizable ⇔ no “ambigous paths”
(=alt. path of uniform multiplicity µ & each end incident to non-contig < µ)
Proposals
1. decide arbitrarily missassembly
2. isolate each ambiguity information loss
3. cut as few ends as possible computationally hard
4. cut as few multiplicities as possible computationally hard
30/33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
What we saw
- 3-step sequencing technique:
1. produce paired-end reads
2. assemble reads to contigs
3. scaffold contigs to chromosomes using read-pairings
- computationally hard problem for dense graphs with weights 0/1
- no constant-factor approx or subexponential-time algorithm
for linear quasi trees with weights 0/1
- O(n2) time on unweighted cliques/co-bipartite/split
- O(n · σp · σc) time for constant treewidth
- 2-approximable in cliques/complete bipartite in O(n3) time
- O(√
2n
poly(n)) time exact algorithm
- ILP formulation with contig jumps & multiplicities
- Linearization problem raised by multiplicities in solution
31 /33
Conclusion
Outlook
- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone
correction using small reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic
information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
Conclusion
Outlook
- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone
correction using small reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic
information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
Conclusion
Outlook
- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone
correction using small reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic
information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
Conclusion
Outlook
- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone
correction using small reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic
information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
Conclusion
Outlook
- 3rd generation sequencing: PacBio, Oxford Nanoporeproduces long reads (10-15kbp), but error-prone
correction using small reads?
- generally: multi-library scaffolding
- other sources for contig-connections (phylogenetic
information?)
- better parameters for Scaffolding and Scaffold Linearization
analyze practical instances
- approximation/heuristics for Scaffold Linearization
32/33
Fin!
33/33