algorithms on secondary storage...

33
Worcester Polytechnic Institute Algorithms on Secondary Storage “Sorting” Chapter 13. Professor E. Rundensteiner CS542

Upload: others

Post on 25-Jan-2021

10 views

Category:

Documents


0 download

TRANSCRIPT

  • Worcester Polytechnic Institute

    Algorithms on Secondary Storage“Sorting”

    Chapter 13.

    Professor E. Rundensteiner

    CS542

  • Worcester Polytechnic Institute

    Lesson: Using secondary storage effectively

    ─ Data too large to “live” in memory

    ─ “Regular” algorithms on small scale only

    ─ Must design algorithms for “large data”

    ─ I/O costs dominate – must reduce I/O

  • Worcester Polytechnic Institute

    Why Sort?

    • A classic problem in computer science!

    • Data requested in sorted order via SQL:

    ─e.g., find students in increasing gpa order

    • Sorting is first step in bulk loading B+ tree index.

    • Sorting useful for eliminating duplicate copies in a collection of records

    • Sort-merge join algorithm involves sorting.

    • Problem: sort 1Gb of data with 1Mb of RAM.

    ─why not virtual memory?

  • Worcester Polytechnic Institute

    2-Way Sort: Requires 3 Buffers

    • PREPARE:

    1 pass:

    Read a page,

    sort it, write it.

    • MERGE:

    Many passes:

    Merge smaller

    runs into

    larger runs. Main memory buffer

    INPUT 1

    INPUT 2

    OUTPUT

    DiskDisk

    Main memory buffer

    INPUT /

    OUTPUT

    DiskDisk

  • Worcester Polytechnic Institute

    Two-Way External Merge Sort

  • Worcester Polytechnic Institute

    • Idea: Divide and conquer: sort subfiles and merge into larger sorts

    Input file

    1-page runs

    2-page runs

    4-page runs

    8-page runs

    PASS 0

    PASS 1

    PASS 2

    PASS 3

    9

    3,4 6,2 9,4 8,7 5,6 3,1 2

    3,4 5,62,6 4,9 7,8 1,3 2

    2,3

    4,6

    4,7

    8,9

    1,3

    5,6 2

    2,3

    4,4

    6,7

    8,9

    1,2

    3,5

    6

    1,2

    2,3

    3,4

    4,5

    6,6

    7,8

  • Worcester Polytechnic Institute

    Input file

    1-page runs

    2-page runs

    4-page runs

    8-page runs

    PASS 0

    PASS 1

    PASS 2

    PASS 3

    9

    3,4 6,2 9,4 8,7 5,6 3,1 2

    3,4 5,62,6 4,9 7,8 1,3 2

    2,3

    4,6

    4,7

    8,9

    1,3

    5,6 2

    2,3

    4,4

    6,7

    8,9

    1,2

    3,5

    6

    1,2

    2,3

    3,4

    4,5

    6,6

    7,8

    Cost ?

  • Worcester Polytechnic Institute

    Input file

    1-page runs

    2-page runs

    4-page runs

    8-page runs

    PASS 0

    PASS 1

    PASS 2

    PASS 3

    9

    3,4 6,2 9,4 8,7 5,6 3,1 2

    3,4 5,62,6 4,9 7,8 1,3 2

    2,3

    4,6

    4,7

    8,9

    1,3

    5,6 2

    2,3

    4,4

    6,7

    8,9

    1,2

    3,5

    6

    1,2

    2,3

    3,4

    4,5

    6,6

    7,8

    Costs for pass :all pages

    # of passes :height of tree

    Total cost : product of above

  • Worcester Polytechnic Institute

    Input file

    1-page runs

    2-page runs

    4-page runs

    8-page runs

    PASS 0

    PASS 1

    PASS 2

    PASS 3

    9

    3,4 6,2 9,4 8,7 5,6 3,1 2

    3,4 5,62,6 4,9 7,8 1,3 2

    2,3

    4,6

    4,7

    8,9

    1,3

    5,6 2

    2,3

    4,4

    6,7

    8,9

    1,2

    3,5

    6

    1,2

    2,3

    3,4

    4,5

    6,6

    7,8

    • Each pass reads + writes each page in file.

    • N pages in file => 2N

    • Number of passes

    • So total cost is:

    log2 1N

    2 12N Nlog

  • Worcester Polytechnic Institute

    External Merge Sort

    •What if we had more buffer pages?

    •How do we utilize them wisely ?

    - Two main ideas !

  • Worcester Polytechnic Institute

    Phase 1 : Prepare

    B Main memory buffers

    INPUT 1

    INPUT B

    DiskDisk

    INPUT 2

    . . .. . .

    • Construct as large as possible starter lists.

  • Worcester Polytechnic Institute

    Phase 2 : Merge

    Compose as many sorted sublistsinto one long sorted list.

    B Main memory buffers

    INPUT 1

    INPUT B-1

    OUTPUT

    DiskDisk

    INPUT 2

    . . .. . .

  • Worcester Polytechnic Institute

    Cost of External Merge Sort

    •Cost = 2N * (# of passes)

    •# of passes: 1 1 log /B N B

  • Worcester Polytechnic Institute

    Example

    • Buffer : with 5 buffer pages

    • File to sort : 108 pages

    ─Pass 0:

    ▪ Size of each run?

    ▪ Number of runs?

    ─Pass 1:

    ▪ Size of each run?

    ▪ Number of runs?

    ─ etc ???

  • Worcester Polytechnic Institute

    Example

    • Buffer : with 5 buffer pages

    • File to sort : 108 pages

    ─Pass 0: = 22 sorted runs of 5 pages each (last run is only 3 pages)

    ─Pass 1: = 6 sorted runs of 20 pages each (last run is only 8 pages)

    ─Pass 2: 2 sorted runs, 80 pages and 28 pages

    ─Pass 3: Sorted file of 108 pages

    • Total I/O costs: ? 2*N * (4 passes)

    108 5/

    22 4/

  • Worcester Polytechnic Institute

    Example

    Buffer : with 5 buffer pages ; File to sort : 108 pages

    Total IO costs

    = 2 * N ( # of passes )

    = 2N * (

    = 2 * 108 * (1 + )

    = 2 * 108 * (1 + log_4 ( 22 ))

    = 2 * 108 * 4

    5/1084log_

    1 1 log /B N B

  • Worcester Polytechnic Institute

    Number of Passes of External Sort

    N B=3 B=5 B=9 B=17 B=129 B=257

    100 7 4 3 2 1 1

    1,000 10 5 4 3 2 2

    10,000 13 7 5 4 2 2

    100,000 17 9 6 5 3 3

    1,000,000 20 10 7 5 3 3

    10,000,000 23 12 8 6 4 3

    100,000,000 26 14 9 7 4 4

    1,000,000,000 30 15 10 8 5 4

    - gain of utilizing all available buffers

    - importance of a high fan-in during merging

  • Worcester Polytechnic Institute

    Optimizing External Sorting

    •Cost metric ?

    ─I/O only (until now)

    ─CPU is nontrivial, worth reducing

  • Worcester Polytechnic Institute

    Optimizing Internal Sorting: Phase 1

    . . .12

    4

    28

    103

    5

    CURRENT SETINPUT

    OUTPUT

    ➢1 input, 1 output, B-2 current set buffer

    ➢Main idea: repeatedly pick tuple in current set with smallest k

    value that is still greater than largest k value in output buffer and

    append it to output buffer

  • Worcester Polytechnic Institute

    Internal Sort Algorithm

    . . .124

    2910

    3

    5

    CURRENT SETINPUT

    OUTPUT

    8

  • Worcester Polytechnic Institute

    Internal Sort Algorithm

    . . .12

    4

    2

    4

    10

    3

    5

    CURRENT SETINPUT

    OUTPUT

    9

    8

  • Worcester Polytechnic Institute

    Internal Sort Algorithm

    . . .12

    4

    2

    8

    103

    5

    CURRENT SETINPUT

    OUTPUT

    ➢Input & Output? new input page is read in, if it is fully consumed.

    Output is written out, when it is full

    ➢When terminate current run? When all tuples in current set are

    smaller than largest tuple in output buffer.

  • Worcester Polytechnic Institute

    Analysis of Heapsort

    • Fact: Average length of a run in heapsort is 2B

    ─The “snowplow” analogy

    • Worst-Case:

    ─What is min length of a run?

    ─How does this arise?

    • Best-Case:

    ─What is max length of a run?

    ─How does this arise?

    • Quicksort is faster, but ...

    B

  • Worcester Polytechnic Institute

    Optimizing External Sorting: Phase 2

    • Further optimization for external sorting:

    ─Blocked I/O

    ─Double buffering

  • Worcester Polytechnic Institute

    Blocking of Buffer Pages: Ext. Sort

    OUTPUT OUTPUT'

    Disk

    INPUT 1

    INPUT k

    INPUT 2

    INPUT 1

    INPUT 2

    block size b

    INPUT k

    B main memory buffers

    Disk

    block size b

    • Thus far : Do 1 I/O a page at a time

    • But : Reading a block of pages sequentially is cheaper

    • Make input/output a block of b pages

  • Worcester Polytechnic Institute

    Example: Blocking for Merge Sort

    • Idea : buffer blocks = b pages; 1 buffer block of b pages for input,

    1 buffer block of b pages for outputthen we can merge |B-b/b| runs in that first pass

    Example : 10 buffer pages

    if one-page input and output buffer blocks,

    then we produce 9 runs at a time

    if two-page input and output buffer blocks,

    then we produce 4 runs at a time

  • Worcester Polytechnic Institute

    Blocking of Buffer Pages

    OUTPUT OUTPUT'

    Disk

    INPUT 1

    INPUT k

    INPUT 2

    INPUT 1

    INPUT 2

    block size b

    INPUT k

    B main memory buffers, k-way merge

    Disk

    block size b

    Pro: Save total costs of “IOs”Cons: Reduce fan-out during merge passes

    In practice, most files still sorted in 2-3 passes.

  • Worcester Polytechnic Institute

    Double Buffering – Overlap CPU and I/O

    • To reduce wait time for I/O request to complete, prefetch into `shadow block’.

    OUTPUT

    OUTPUT'

    Disk Disk

    INPUT 1

    INPUT k

    INPUT 2

    INPUT 1'

    INPUT 2'

    INPUT k'

    block sizeb

    B main memory buffers, k-way merge

    Potentially, more passesIn practice, most files still sorted in 2-3 passes.

  • Worcester Polytechnic Institute

    Using B+ Trees for Sorting

    • Scenario: Table to be sorted has B+ tree index on sorting column(s).

    • Idea: Retrieve records in order by traversing leaf pages.

    • Is this a good idea?

    • Cases to consider:

    ─B+ tree is clustered Good idea!

    ─B+ tree is not clustered Bad idea!

  • Worcester Polytechnic Institute

    Clustered B+ Tree Used for Sorting

    • Traversal of tree from root to left-most leaf,

    • then if Alt. 1, retrieve all leaf pages in order

    • If Alt. 2, additional cost of retrieving actual data records

    Always better than external sorting!

    (Directs search)

    Data Records

    Index

    Data Entries("Sequence set")

  • Worcester Polytechnic Institute

    Unclustered B+ Tree Used for Sorting

    • Alternative (2) for data; each data entry contains rid of a data record. In general, one I/O per data record!

    (Directs search)

    Data Records

    Index

    Data Entries("Sequence set")

  • Worcester Polytechnic Institute

    External Sorting vs. Unclustered Index

    N Sorting p=1 p=10 p=100

    100 200 100 1,000 10,000

    1,000 2,000 1,000 10,000 100,000

    10,000 40,000 10,000 100,000 1,000,000

    100,000 600,000 100,000 1,000,000 10,000,000

    1,000,000 8,000,000 1,000,000 10,000,000 100,000,000

    10,000,000 80,000,000 10,000,000 100,000,000 1,000,000,000

    p: # of records per page (p=100 realistic)B=1,000 and block size=32 for sorting

  • Worcester Polytechnic Institute

    Summary

    • External sorting is important; DBMS may dedicate part of buffer pool for sorting!

    • External merge sort minimizes disk I/O costs:

    ─In practice, # of phases rarely more than 3.

    • Choice of internal sort algorithm may matter.

    • Best sorts are wildly fast:

    ─Despite 40+ years of research, still improving!

    • Clustered B+ tree is good for sorting; unclusteredtree is usually very bad.