combinatorial optimization for text layout

1 Combinatorial Optimization for Text Layout Richard Anderson University of Washington Microsoft Research, Beijing, September 6, 2000 http://www.cs.washington.edu/homes/anderson/msrcn.ppt



1

Combinatorial Optimization for Text Layout

Richard Anderson

University of Washington

Microsoft Research, Beijing, September 6, 2000

http://www.cs.washington.edu/homes/anderson/msrcn.ppt

2

Biography

Background
Education: PhD Stanford (1985), Post Doc MSRI, Berkeley
Experience: University of Washington, since 1986. Associate Chair for outreach. Visiting prof. IISc, Bangalore, 1993-1994

Professional Interests
Algorithms: Parallel algorithms, N-Body Simulation, Model Checking for Software, Text Layout
Distance Learning: Tutored Video Instruction, Professional Master’s Program

3

Optimization for Text Layout
Express text placement as a geometric optimization problem.

Why???
Generate best layouts
Body of algorithmic research to build on, as well as high performance hardware
Problem specification and formalization
Flexibility via parameterization

4

TeX [Knuth]

Typography as optimization
Optimal paragraphing via dynamic programming algorithm

Flexibility
Tradeoff between uneven lines and hyphenation frequency
Penalty: weighted sum of whitespace and hyphenation penalties
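A minimal sketch of optimal paragraphing by dynamic programming, under simplifying assumptions that are not on the slide: fixed-width characters, single spaces between words, no hyphenation, and a cost equal to the squared trailing whitespace of every line except the last. The name `break_lines` is hypothetical, and this is far simpler than TeX's actual penalty model.

```python
def break_lines(words, width):
    """Split words into lines of at most `width` characters, minimizing the
    sum of squared trailing whitespace over all lines except the last."""
    n = len(words)
    INF = float("inf")
    best = [INF] * (n + 1)   # best[i] = minimum cost of setting words[i:]
    nxt = [n] * (n + 1)      # nxt[i] = index of the first word of the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1          # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width:
                break
            slack = width - length
            cost = 0 if j == n - 1 else slack * slack
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    if best[0] == INF:
        raise ValueError("a word is longer than the line width")
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

On the words `aaa bb cc ddddd` at width 6, a greedy breaker would produce `aaa bb / cc / ddddd` (cost 16), while this sketch prefers the more even `aaa / bb cc / ddddd` (cost 10).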

5

Outline

Survey of problems studied
1) Generating all paragraphs of text
2) Picture layout with anchors to text
3) Optimal table layout
4) Customized content compression

6

Paragraphing problem
Given geometric constraints, find line breaks

Fixed width, find minimum height: Greedy Algorithm

Fixed height, find minimum width
Only need to consider n^2 widths: O(n^3) algorithm.
Most practical approach: binary search on width. O(n log W) algorithm
Theoretical O(n) algorithm
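The two practical cases above can be sketched as follows, assuming fixed-width characters, single spaces between words, and a non-empty word list. The function names `greedy_height` and `min_width` are hypothetical. Binary search is valid here because the greedy line count never increases as the width grows.

```python
def greedy_height(words, width):
    """Number of lines used by greedy (first-fit) breaking at `width`,
    or infinity if some word is wider than the line."""
    lines, used = 0, 0
    for w in words:
        if len(w) > width:
            return float("inf")
        if lines == 0 or used + 1 + len(w) > width:
            lines += 1                 # start a new line with this word
            used = len(w)
        else:
            used += 1 + len(w)         # append to the current line
    return lines


def min_width(words, height):
    """Smallest width at which the greedy layout needs at most `height` lines."""
    lo = max(len(w) for w in words)                      # longest word must fit
    hi = sum(len(w) for w in words) + len(words) - 1     # everything on one line
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy_height(words, mid) <= height:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Each probe costs O(n), and the search makes O(log W) probes, where W is the width of the text set on a single line.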

7

All minimal paragraph sizes
Find minimum width paragraph for a given height.
Solve for each height: best known: O(n^{3/2})

Example paragraph (repeated three times on the slide):
Malfoy couldn’t believe his eyes when he saw that Harry and Ron were still at Hogwarts the next day, looking tired but perfectly cheerful.

8

All minimal paragraph sizes

Motivation
Placement of floating text
Formatting tables with text entries

Basic approach
Break into segments of roughly n^{1/2} words each
Compute possibilities for these, and then combine

Much work still to do on this problem
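As a point of comparison only, here is a brute-force baseline that lists every minimal (height, width) pair by trying each candidate width in turn. It is not the segment-based approach described above and makes no attempt at the O(n^{3/2}) bound. It assumes fixed-width characters, single spaces, and a non-empty word list; `minimal_sizes` is a hypothetical name.

```python
def minimal_sizes(words):
    """Return the (height, width) pairs such that `width` is the smallest
    width at which the text fits in `height` lines (brute-force baseline)."""
    def lines_needed(width):
        lines, used = 0, 0
        for w in words:
            if lines == 0 or used + 1 + len(w) > width:
                lines, used = lines + 1, len(w)
            else:
                used += 1 + len(w)
        return lines

    narrowest = max(len(w) for w in words)
    widest = sum(len(w) for w in words) + len(words) - 1
    sizes = {}
    for width in range(narrowest, widest + 1):
        # widths are visited in increasing order, so the first width seen
        # for a given line count is the minimum width for that height
        sizes.setdefault(lines_needed(width), width)
    return sorted(sizes.items())
```

Heights that bring no reduction in width (for example a fourth line when three already reach the narrowest width) are left out, so the result contains only the non-dominated sizes.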

9

Placement of text and pictures

Given text with embedded pictures and tables

Place pictures close to their references (anchors)

This is a major headache when using LaTeX!

Further complications
Multi-column layouts
Partial column width pictures
Typographic considerations for text and headings
Other graphical layout considerations

10

Placement of text and pictures
Given text and pictures, where each picture has a location in the text, find a layout which minimizes the sum of the text-anchor distances
Single page and multi page problems
Horizontal placement of pictures fixed with respect to column boundaries
May require that picture order is consistent with text order

11

12

Results
2-d bin packing problem: do the pictures fit on the page?
May not be the problem of interest. Simpler cases: pictures fit in columns, align with text rows, fixed horizontal position in columns.
Easy for one column.
NP-complete for three or more columns.
NP-complete even if picture area is very small.

13

Fixed horizontal bin packing
Two-d bin packing, except that rectangles have fixed horizontal positions
Motivated by picture placement
Best known result: 3-approximation algorithm
Problem arises in memory allocation
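To make the problem statement concrete, the sketch below places rectangles with fixed horizontal positions by dropping them one at a time onto a skyline. This is a simple illustrative heuristic, not the 3-approximation algorithm mentioned above. It assumes integer coordinates, widths of at least 1, and rectangles that lie inside the strip; `drop_pack` is a hypothetical name.

```python
def drop_pack(rects, strip_width):
    """Place rectangles (x, w, h) with fixed x in the given order; each one
    is dropped onto the current skyline. Returns (y positions, total height)."""
    skyline = [0] * strip_width          # current top of each unit column
    ys = []
    for x, w, h in rects:
        y = max(skyline[x:x + w])        # rest on the tallest column spanned
        for c in range(x, x + w):
            skyline[c] = y + h
        ys.append(y)
    return ys, max(skyline, default=0)
```

The result depends on the order in which rectangles are dropped, which is where the optimization (or a search such as branch and bound) would come in.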

14

Practical results
The number of pictures and columns is small (columns <= 5, pictures <= 10).
Enumeration works well for pictures <= 3.
Branch and bound works well for pictures <= 6.
Heuristics + B&B work well for given range.
Prototypes developed, including typography and aesthetic considerations.
Very interesting layouts generated

15

Tables

General Problem
Given a set of configurations for each cell, find the maximum value table that satisfies size constraints

Special Cases
Layout Problem: No values, minimize table height for fixed width
Compression Problem: Configurations for a cell satisfy nesting property. Value decreases with size

16

Layout Problem (with S. Sobti)

NP complete
Restricted instances: {(1,2), (2,1)}, {(1,1)}

Example table (repeated twice on the slide):
Divination. Sybill Trelawney
Defense against dark arts. R. J. Lupin
Potions. Severus Snape
Care of magical creatures. Rubeus Hagrid

17

Layout Problem: results

Fixed W, minimize H, NP complete

Minimize W+H solvable with mincut algorithm

Compute convex hull of feasible table configurations

Heuristic algorithm
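A brute-force sketch of the fixed-width, minimum-height layout problem may help fix the definitions. It assumes each cell offers a list of (width, height) options, a column is as wide as its widest cell, and a row is as tall as its tallest cell. The name `min_height_table` is hypothetical, and the running time is exponential in the number of cells, in line with the NP-completeness result.

```python
from itertools import product


def min_height_table(configs, max_width):
    """configs[r][c] is a list of (width, height) options for cell (r, c).
    Try every combination; return the smallest table height whose total
    width is at most max_width, or None if no combination fits."""
    rows, cols = len(configs), len(configs[0])
    cells = [configs[r][c] for r in range(rows) for c in range(cols)]
    best = None
    for choice in product(*cells):
        # column width = widest chosen cell in that column
        col_w = [max(choice[r * cols + c][0] for r in range(rows))
                 for c in range(cols)]
        # row height = tallest chosen cell in that row
        row_h = [max(choice[r * cols + c][1] for c in range(cols))
                 for r in range(rows)]
        if sum(col_w) <= max_width:
            height = sum(row_h)
            if best is None or height < best:
                best = height
    return best
```

For a 2 x 2 table with flexible cells {(1,2), (2,1)} on the diagonal and fixed cells {(1,1)} elsewhere, the minimum height is 2 at width 4, 3 at width 3, and 4 at width 2.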

18

Table compression problem
Display a table in less than the required area, with a penalty for shrinking cells

Example: one table at increasing levels of compression, one version per line:
Divination. Sybill Trelawney | Defense against dark arts. R. J. Lupin | Potions. Severus Snape | Care of magical creatures. Rubeus Hagrid
Divin. Sybill T. | Defense against dark arts. Lupin | Potions. Severus Snape | Care of magical creatures. Hagrid
Divin. Sybill T. | Def. dark arts. Lupin | Potions. Severus Snape | Care of magical critters. Hagrid
Divin. Sybill T. | Def. dark arts. Lupin | Potions. S. Snape | Care of creatures. Hagrid
Divin. Sybill T. | Dark arts. Lupin | Potions. S. Snape | Critr care. Hagrid
Div | D. arts. Lupin | Pot | Critters. Hagrid

19

Compression Problem
NP complete for simple case
Choice cells: 1 x 1 (value 1), 0 x 0 (value 0)
Dummy cells: 0 x 0 (value 0)
Maximize the number of full size choice cells when an n x n table is compressed to n/2 x n/2.
Reduction from clique problem
Incidence matrix reduction

20

Attacking the 0-1 problem

[Figure: bipartite graph with vertices numbered 1 to 4 on each side]

Choose n/2 vertices from each side to maximize the number of edges between chosen vertices

Equivalent problem: maximum density (n/2,n/2)-subgraph of a (n,n)-bipartite graph

21

Greedy Algorithm
Find MDS of G=(X,Y,E)

Choose X’, the set of n/2 vertices of highest degree w.r.t. Y

Choose Y’, the set of n/2 vertices of highest degree w.r.t. X’

Claim: (X’,Y’) is a 1/2 approximation of the MDS

Proof: (X’,Y) has at least as many edges as the MDS.

(X’,Y’) has at least half as many edges as (X’,Y)
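The two greedy steps above can be sketched directly. The code assumes both sides are numbered 0 to n-1, that `edges` is a list of distinct (x, y) pairs, and that ties are broken by lowest index; `greedy_dense_subgraph` is a hypothetical name.

```python
def greedy_dense_subgraph(n, edges):
    """Bipartite graph with sides X = Y = {0..n-1} and edges (x, y).
    Pick n//2 vertices on each side following the two greedy steps."""
    k = n // 2
    deg_x = [0] * n
    for x, _ in edges:
        deg_x[x] += 1
    # X': the k vertices of X with highest degree into Y
    x_sel = set(sorted(range(n), key=lambda x: -deg_x[x])[:k])
    deg_y = [0] * n
    for x, y in edges:
        if x in x_sel:
            deg_y[y] += 1
    # Y': the k vertices of Y with highest degree into X'
    y_sel = set(sorted(range(n), key=lambda y: -deg_y[y])[:k])
    count = sum(1 for x, y in edges if x in x_sel and y in y_sel)
    return x_sel, y_sel, count
```

The second step is what the proof relies on: the n/2 vertices of Y with the most edges into X' carry at least half of the edges between X' and Y.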

22

Greedy Algorithms

Non-bipartite graphs
Add vertices of maximum degree, starting with empty graph
Remove vertices of minimum degree, starting with full graph
4/9 approximation algorithm (Asahiro et al.)

Open problem: generalize and analyze greedy algorithms for tables
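A minimal sketch of the removal variant for non-bipartite graphs: start with the full graph and repeatedly delete a vertex of minimum degree until k vertices remain. Vertices are assumed to be numbered 0 to n-1, self-loops are ignored, and ties go to the lowest-numbered vertex; `peel_to_k` is a hypothetical name, and no approximation guarantee is claimed for this particular code.

```python
def peel_to_k(n, edges, k):
    """Repeatedly delete a vertex of minimum degree until k vertices remain.
    Returns (remaining vertices, number of edges among them)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    while len(adj) > k:
        # minimum current degree; ties broken by lowest vertex number
        victim = min(adj, key=lambda v: (len(adj[v]), v))
        for nb in adj.pop(victim):
            adj[nb].discard(victim)
    kept = set(adj)
    edge_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    return kept, edge_count
```

On a triangle 0-1-2 with a tail 2-3-4 and k = 3, the tail is peeled away and the triangle remains.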

23

Semidefinite programming
Maxcut problem: divide vertices of a graph into two sets to maximize number of edges between the sets.

Goemans-Williamson SDP result:
Improved approximation bound from 0.5 to 0.878
Introduced new technique to the field
Idea: solve the problem on an n-dimensional sphere, use a random projection to divide vertices.

MDS problem can also be attacked with SDP. Technical problems with bipartiteness and equal division lead to a weak result.
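For reference, the Maxcut relaxation behind the 0.878 bound can be written as follows. The notation is not from the slides: G = (V, E) is the graph, v_i is a unit vector assigned to vertex i, and r is a random unit vector used for rounding.

```latex
\max \sum_{(i,j) \in E} \frac{1 - v_i \cdot v_j}{2}
\quad \text{subject to} \quad \|v_i\| = 1,\; v_i \in \mathbb{R}^n \ \text{for all } i \in V

S = \{\, i \in V : v_i \cdot r \ge 0 \,\}, \qquad
\Pr[(i,j) \text{ is cut}] = \frac{\arccos(v_i \cdot v_j)}{\pi}
\;\ge\; 0.878 \cdot \frac{1 - v_i \cdot v_j}{2}
```

Summing the last inequality over all edges shows that the expected size of the rounded cut is at least 0.878 times the relaxation optimum, which is itself at least the maximum cut.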

24

Research directions

Can semidefinite programming beat the greedy algorithm on the 0-1 problem?

Develop greedy algorithms for the general case.

Linear programming: fractional solution to table problems has a natural interpretation. Results on rounding? Combinatorial algorithms for the fractional problem.

Develop/analyze fast heuristic algorithms

25

Content Choice
If information does not fit, allow substitutions

The Dark Forces: A Guide to Self-Protection, Quenton Trimble, Hogwarts Academic Press, Hogsmeade, 1999, 2nd Edition, 238 pages, Albus Dumbledore editor.

The Dark Forces: A Guide to Self-Protection, Quenton Trimble, Hogwarts, Hogsmeade, 1999, 2nd Ed., 238 pp.

The Dark Forces: A Guide to Self-Protection, Quenton Trimble, Hogwarts Ac. Press, Hogsmeade, 1999, 2nd Edition, 238 pages

The Dark Forces: A Guide to Self-Protection, Quenton Trimble, Hogwarts Ac. Press, Hogsmeade, 1999, 2nd Ed., 238 pp, Albus Dumbledore ed.

26

The Dark Forces: A Guide to Self-Protection, Q. Trimble, HAP, Hogs., `99, 2nd, 238 pp.

The Dark Forces, Q. Trimble, HAP, Hogs., 1999, 2nd, 238 pp.

The Dark Forces: Self-Protection, Q. Trimble, HAP, 1999, 2nd, 238 pp.

The Dark Forces Q. Trimble, HAP, `99, 2nd, 238 pp.

Dark Forces, Q. Trimble, HAP, `99, 2nd.

Dark Forces, Q. Trimble, HAP, 1999.

Dk. Forces, Q. Trimble, HAP, 1999.

Dark Forces, Trimble.

27

Source representation

<text>
  <choice>
    <fragment val=90> The Dark Forces: A Guide to Self-Protection </fragment>
    <fragment val=50> The Dark Forces: Self-Protection </fragment>
    <fragment val=30> The Dark Forces</fragment>
    <fragment val=20> Dark Forces</fragment>
    <fragment val=10> Dk. Forces</fragment>
  </choice>
  <choice>
    <fragment val=30> Hogwarts Academic Press </fragment>
    <fragment val=20> Hogwarts Ac. Press </fragment>
    <fragment val=15> Hogwarts </fragment>
    <fragment val=10> HAP </fragment>
    <fragment val=0> </fragment>
  </choice>
  . . .
</text>

28

Typography with content choice

Problem 1: Given a fixed area for the text, find the optimal choice of content

Problem 2: Find the set of all maximal configurations

Problem 3: Find a good approximation to the set of all maximal configurations
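One way to read Problem 2, assuming that "maximal" means not dominated (no other configuration has at most the same size and at least the same value, with one of the two strictly better), is as a Pareto-frontier computation. The sketch below simplifies each configuration to a single size number, here called `area`, paired with a value; both that simplification and the name `maximal_configurations` are assumptions.

```python
def maximal_configurations(configs):
    """configs: iterable of (area, value). Keep the configurations that are
    not dominated by another one with area <= and value >=."""
    frontier = []
    best_value = float("-inf")
    # sort by area ascending; for equal areas, higher value first
    for area, value in sorted(configs, key=lambda c: (c[0], -c[1])):
        if value > best_value:
            frontier.append((area, value))
            best_value = value
    return frontier
```

Given the frontier sorted by area, Problem 1 for a fixed area reduces to picking the last entry that still fits.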

29

Content Choice

Algorithmic choice: rectangles with values. Place one rectangle from each set to maximize value.

[Figure: rectangles labeled with values 40, 40, 25, 20, 15]

30

Warm up problem: Lists
Optimally display the list for a fixed height
Set of configurations for each list item: (height, value)

Solvable with knapsack dynamic programming algorithm
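A minimal sketch of the knapsack-style dynamic program for lists, assuming integer heights and exactly one configuration chosen per item (a multiple-choice knapsack). The function name `best_list_display` and the sample heights in the usage note are hypothetical.

```python
def best_list_display(items, max_height):
    """items[i] is a list of (height, value) configurations for list item i.
    Choose exactly one configuration per item with total height at most
    max_height, maximizing total value. Returns (value, chosen indices),
    or None if nothing fits."""
    NEG = float("-inf")
    dp = [0] + [NEG] * max_height        # dp[h]: best value at total height h
    picks = []                           # picks[i][h]: config used for item i
    for configs in items:
        new = [NEG] * (max_height + 1)
        pick = [None] * (max_height + 1)
        for h in range(max_height + 1):
            for k, (ch, cv) in enumerate(configs):
                if ch <= h and dp[h - ch] + cv > new[h]:
                    new[h] = dp[h - ch] + cv
                    pick[h] = k
        dp = new
        picks.append(pick)
    h = max(range(max_height + 1), key=lambda t: dp[t])
    if dp[h] == NEG:
        return None
    value, chosen = dp[h], []
    for i in range(len(items) - 1, -1, -1):   # walk back through the choices
        k = picks[i][h]
        chosen.append(k)
        h -= items[i][k][0]
    return value, chosen[::-1]
```

With two items offering configurations `[(3, 90), (2, 50), (1, 20)]` and `[(2, 30), (1, 10), (0, 0)]`, a height budget of 3 keeps the first item at full size and drops the second, while a budget of 5 shows both in full.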

31

List compression

Harry Potter and the Prisoner of Azkaban ~ J. K. Rowling / Hardcover / Published 1999 Our Price: $9.98
Harry Potter and the Sorcerer's Stone J. K. Rowling / Hardcover / Published 1998 Our Price: $8.98
Harry Potter and the Chamber of Secrets J. K. Rowling / Hardcover / Published 1999 Our Price: $8.98

Harry Potter and the Prisoner of Azkaban ~ Usually ships in 24 hours J. K. Rowling / Hardcover / Published 1999 Our Price: $9.98 ~ You Save: $9.97 (50%)
Harry Potter and the Sorcerer's Stone ~ Usually ships in 24 hours J. K. Rowling / Hardcover / Published 1998 Our Price: $8.98 ~ You Save: $8.97 (50%)
Harry Potter and the Chamber of Secrets J. K. Rowling / Hardcover / Published 1999 Our Price: $8.98 ~ You Save: $8.97 (50%)

Harry Potter and the Prisoner of Azkaban ~ J. K. Rowling / HC / Publ 1999 Our Price: $9.98
Harry Potter and the Sorcerer's Stone J. K. Rowling / HC / 1998 $8.98
Harry Potter and the Chamber of Secrets J. K. Rowling / HC / 1999 $8.98

Harry Potter and the Prisoner of Azkaban J. K. Rowling $9.98
Harry Potter and the Sorcerer's Stone Rowling
HP : Chamber of Secrets

32

Implementation goal

Real time resizing of lists
Maintain optimal display as window size changes.
Recompute at refresh rate
Knapsack/dynamic programming algorithm
http://www.cs.washington.edu/homes/anderson/demo2/Page1.htm

33

Customization

Choice-content generation
Generate choices for fields
Automatic abbreviations
Dictionary lookup

Assign weights
Based on compression and component
Based on user profile

34

Browsing applications

Browsing book lists
User sets degree of compression
Issues query
Source gives default weights: Value of field, Strength of match, Value of item
Weights modified based on user profile
Optimal list display done for given compression factor

35

Display of 2-d time tables

Show most likely routes and times at highest precision

Based on user profile and travel data

Memory of user interactions (expanding items)

36

Summary
Graphical layout as geometric optimization
Theoretical background: Basic algorithms for rectangle placement
Algorithm implementation: Performance requirements are significant
Application: Do these techniques work for universal, customized display?