query suspend and resume - duke ?· a better solution for query suspend and resume. in search for...

Download Query Suspend and Resume - Duke ?· a better solution for query suspend and resume. In search for this…

Post on 01-May-2019




0 download

Embed Size (px)


Query Suspend and Resume

Badrish ChandramouliDuke University


Christopher N. Bond

Google Inc.chrisbond@google.com,

Shivnath BabuDuke University


Jun YangDuke University


ABSTRACTSuppose a long-running analytical query is executing on a databaseserver and has been allocated a large amount of physical mem-ory. A high-priority task comes in and we need to run it imme-diately with all available resources. We have several choices. Wecould swap out the old query to disk, but writing out a large execu-tion state may take too much time. Another option is to terminatethe old query and restart it after the new task completes, but wewould waste all the work already performed by the old query. Yetanother alternative is to periodically checkpoint the query duringexecution, but traditional synchronous checkpointing carries highoverhead. In this paper, we advocate a database-centric approachto implementing query suspension and resumption, with negligi-ble execution overhead, bounded suspension cost, and efficient re-sumption. The basic idea is to let each physical query operatorperform lightweight checkpointing according to its own seman-tics, and coordinate asynchronous checkpoints among operatorsthrough a novel contracting mechanism. At the time of suspen-sion, we find an optimized suspend plan for the query, which mayinvolve a combination of dumping current state to disk and goingback to previous checkpoints. The plan seeks to minimize the sus-pend/resume overhead while observing the constraint on suspen-sion time. Our approach requires only small changes to the iteratorinterface, which we have implemented in the PREDATOR databasesystem. Experiments with our implementation demonstrate signif-icant advantages of our approach over traditional alternatives.Categories and Subject Descriptors: H.2.4 [Database Manage-ment]: Query ProcessingGeneral Terms: Design, Experimentation, PerformanceKeywords: Query, Suspend, Resume, Processing, Optimization

1 IntroductionConsider a database management system (DBMS) processing along-running memory-intensive analytical query Qlo . Suppose an-other memory-intensive query Qhi with much higher priority isThis work is supported by an NSF CAREER Award (IIS-0238386) and by IBM Faculty Awards.This work commenced when the author was at Duke University.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SIGMOD07, June 1114, 2007, Beijing, China.Copyright 2007 ACM 978-1-59593-686-8/07/0006 ...$5.00.

now submitted to the DBMS. The result of Qhi may be needed tomake a real-time decision, so we need to process Qhi as quickly aspossible and with all available resources. Ideally, the DBMS shouldsuspend the execution of Qlo , quickly release all resources held byQlo , and start Qhi using all resources. Query Qlo can be resumedonce Qhi finishes execution, ideally without losing any significantfraction of the work that Qlo has done prior to suspend. In thispaper, we address the problem of implementing query suspend andresume efficiently in a DBMS.

Query suspend and resume are important in many settings: Queries with different priorities: Today, DBMSs are often used

for mixed workloads, with both high-priority, real-time decision-making queries and low-priority, non-customer-facing analyti-cal queries (e.g., OLAP, mining). Typically, DBMSs run low-priority queries whenever high-priority queries are not running.However, with 24 7 operations, the window of time a DBMScan devote fully to low-priority resource-intensive queries be-comes smaller and increasingly fragmented. To maximize re-source utilization in this setting, it is important for the DBMSto be able to quickly suspend long-running, low-priority querieswhen high-priority queries arrive; the suspended queries can beresumed when the high-priority ones have completed.

Utility and Grid settings: Queries are now run frequently oncomputational utilities (e.g., Condor [4]) and Grids [5] com-posed of autonomous resources. When the owner of resourceswants to use them, queries running on these resources must re-lease control quickly, and migrate to other resources.

Software rejuvenation: Benefits of software rejuvenation [7],the practice of rebooting enterprise computing systems regu-larly, are now recognized widely. Reboot is critical when perfor-mance degrades due to resource exhaustion caused by resourceleaks. Query suspend and resume is important in this settingbecause (1) the challenge of predicting query completion timesaccurately (e.g., see [16]) makes it difficult to schedule querycompletion to match a rejuvenation schedule; (2) when perfor-mance degrades, it may not be cost-effective to wait for all run-ning queries to complete before rebooting the system.

DBMS maintenance: Various maintenance utilitiese.g., statis-tics update, index rebuild, data repartitioningneed to be runregularly on a DBMS to ensure good performance. Support forsuspending and resuming long-running queries gives adminis-trators and self-tuning DBMSs more flexibility in schedulingmaintenance utilities.

Query suspend and resume can be handled in different ways. Thequery Qlo to be suspended can be killed, and restarted during re-sume. Killing and restarting Qlo wastes time and resources, par-ticularly if Qlo has already performed a lot of work. Furthermore,

a kill-and-restart approach can lead to starvation of the query in asetting where suspend requests are common (e.g., in a shared Gridrunning on autonomous resources).

Query suspend can be partially achieved in some DBMSs by re-ducing the priority of the process running Qlo , e.g., using the renicecommand in UNIX. Usually, such approaches affect only the CPUutilization directly, and are ineffective or take a long time to releaseresources held by memory, I/O, or network-intensive plans. SomeDBMSs and operating systems (OSs) provide utilities (e.g., IBMDB2s Query Patroller [10]) to limit the resources allocated to low-priority queries so that high-priority queries will always find freeresources. None of these techniques provides a suspend/resumefunctionality that meets the needs of plan migration, software reju-venation, and DBMS maintenance.

Another technique to suspend Qlo is to write out its entire in-memory execution state to disk, like an OS process swap underheavy memory contention. This technique can have high overheadbecause queries can easily use gigabytes of memory in modern sys-tems with large memory. For instance, writing 1 GB of data to diskin 1-KB chunks can take up to 90 seconds in a modern system.Writing through storage managers (e.g., SHORE [2]) often involvesadditional system overhead. To make matters worse, in a Grid set-ting where Qlo needs to be migrated elsewhere, dumping state overthe network can have an order of magnitude higher overhead.Challenges and Contributions We have discussed why we needa better solution for query suspend and resume. In search for thissolution, we first point out two well-known ideas on how to dealwith the execution state of a data flow for suspend and resume. The entire state can be dumped to disk on suspend, and read

back on resume, like an OS-style process swap. While dumpingis simple to implement, it can cause high overhead during bothsuspend and resume.

The state can be checkpointed at selected points during execu-tion. Suspension keeps track of the exact suspend point; re-sumption starts from the last checkpoint, rolls forward to thesuspend point, and then continues execution. Checkpointingminimizes suspend-time overhead and avoids redoing the workperformed up to the last checkpoint. However, checkpointing(1) incurs overhead during execution, and (2) has to redo thework done since the last checkpoint.

Although the general ideas of dumping and checkpointing havebeen studied in various systems (e.g., [14]), unique challenges andopportunities arise when suspending and resuming DBMS queries,which are complex data flows consisting of physical query oper-ators with well-understood behavior. We next illustrate some ofthese challenges and opportunities.

Example 1 (Running example). Figure 1 shows a simple execu-tion plan for R ./ S ./ T consisting of two block-based nested loopjoins (NLJ for short) [6] and three table scans. We will use thisoperator tree as a running example in this paper. During each it-eration of an NLJ outer loop, the operator first reads output tuplesfrom the outer child to fill a large in-memory outer buffer, and thenperforms joins while reading tuples from the inner child. The con-tent of the outer buffer constitutes the in-memory heap state of theNLJ. Figure 2 shows how the amount of heap state changes withtime for the two NLJs. The child NLJ first fills up its outer buffer,then its total state plateaus while it reads its inner child and pro-duces R ./ S tuples to fill the parent NLJs outer buffer. When theparent NLJs outer buffer is full, this operator reads from its innerchild and produces R./S ./T tuples. When the inner child has ex-hausted its output, the NLJ discards its state and begins rebuildingits outer buffer with the next batch of child tuples.




ScanR ScanS







Checkpoint here

maps to very old

checkpoint of



Figure 1: Execution plan and outer


View more >