Query Processing and Networking Infrastructures
Day 2 of 2
Joe HellersteinUC Berkeley
September 27, 2002
Outline
Day 1: Query Processing Crash Course
Intro
Queries as indirection
How do relational databases run queries?
How do search engines run queries?
Scaling up: cluster parallelism and distribution
Day 2: Research Synergies w/ Networking
Queries as indirection, revisited
Useful (?) analogies to networking research
Some of our recent research at the seams
Some of your research?
Directions and collective discussion
Indirections
Standard: Spatial Indirection
Allows the referent to move without changes to referrers
Doesn’t matter where the object is, we find it
Alternative: copying
Works if updates are managed carefully, or don’t exist
Temporal Indirection
Asynchronous communication is indirection in time
Doesn’t matter when the object arrives, you find it
Analogy to space: Sender is the referrer, Recipient the referent
Generalizing
Indirection in Space: x-to-one or x-to-many? Physical or logical mapping?
Indirection in Time: Persistence model: storage or re-xmission. Persistence role: sender or receiver
Indirection in Space, Redux
One-to-one, one-to-many, many-to-many?
Standard relational issue
E.g. virtual address is many-to-one
E.g. email distribution list is one-to-many
Physical or logical? Mapping table?
Physical: e.g. page tables, mailing lists, DNS, multicast group lists
Logical: e.g. queries, subscriptions, interests
Indirection in Time, Redux
Persistence model: storage or re-xmission
Storage: e.g. DB, heap, stack, NW buffer, mailqueue
Re-xmission: e.g. polling, retries. “Joe is so persistent”
Persistence of put or get
Put: e.g. DB insert, email, retry
Get: e.g. subscription, polling
Examples: Storage Systems
Virtual Memory System: Space: 1-to-1, physical; Time: synchronous (no indirection)
Database System: Space: many-to-many, logical; Time: synchronous (no indirection)
Broadcast Disks: Space: 1-to-1; Time: re-xmitted put
Examples: Split-Phase APIs
Polling: Space: no indirection; Time: re-xmitted get
Callbacks: Space: no indirection; Time: stored get
Active Messages: Space: no indirection; Time: stored get
The app stores a get with the putter, which tags it on messages
Examples: Communication
Email: Space: one-to-many, physical
Mapping is one-to-many, delivery is one-to-one (copies)
Time: stored put
Multicast: Space: one-to-many, physical
Both mapping and delivery are one-to-many
Time: roughly synchronous?
Examples: Distributed APIs
RPC: Space: 1-to-1, physical (can be 1-to-many); Time: synchronous (no indirection)
Messaging systems: Space: 1-to-1, physical (often 1-to-many); Time: depends!
Transactional messaging is stored put: exactly-once transmission guaranteed
Other schemes are re-xmitted put: at-least-once transmission. Idempotency of messages becomes important!
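A minimal sketch of why idempotency matters under at-least-once delivery: if the network may redeliver a message, a non-idempotent handler double-applies it, while deduplicating on a message id makes redelivery harmless. The names (Account, apply_naive, apply_idempotent) are illustrative, not from any real messaging system.

```python
# Sketch: at-least-once delivery forces idempotent message application.
# All names here are hypothetical, for illustration only.

class Account:
    def __init__(self):
        self.balance = 0
        self.seen = set()  # message ids already applied

    def apply_naive(self, msg_id, amount):
        # Non-idempotent: a retransmitted put double-applies the deposit.
        self.balance += amount

    def apply_idempotent(self, msg_id, amount):
        # Deduplicate by message id: redelivery becomes a no-op.
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.balance += amount

a, b = Account(), Account()
for _ in range(2):  # simulate one deposit message delivered twice
    a.apply_naive("m1", 100)
    b.apply_idempotent("m1", 100)
# a.balance is wrongly 200; b.balance is correctly 100
```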
Examples: Logic-based APIs
Publish-Subscribe: Space: one-to-many, logical; Time: stored receiver
Tuplespaces: Space: one-to-many, logical; Time: stored sender
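The stored-receiver vs. stored-sender distinction can be sketched in a few lines: pub/sub persists the receiver's get (the subscription), so a publish with no subscriber is lost; a tuplespace persists the sender's put, so a tuple waits for a later get. The tiny APIs below are illustrative assumptions, not any real middleware.

```python
# Toy contrast of the two persistence roles. Hypothetical, minimal APIs.

class PubSub:
    """Stored get: subscriptions persist; each put is matched and forwarded."""
    def __init__(self):
        self.subs = []  # (predicate, inbox) pairs held for receivers

    def subscribe(self, pred, inbox):
        self.subs.append((pred, inbox))

    def publish(self, item):
        for pred, inbox in self.subs:
            if pred(item):
                inbox.append(item)

class TupleSpace:
    """Stored put: tuples persist; each get matches against stored data."""
    def __init__(self):
        self.tuples = []

    def put(self, item):
        self.tuples.append(item)

    def get(self, pred):
        for i, t in enumerate(self.tuples):
            if pred(t):
                return self.tuples.pop(i)
        return None

ps = PubSub()
ps.publish({"topic": "a"})                       # no subscriber yet: dropped
inbox = []
ps.subscribe(lambda t: t["topic"] == "a", inbox)
ps.publish({"topic": "a"})                       # delivered to the stored get

ts = TupleSpace()
ts.put({"topic": "a"})                           # stored until someone gets it
hit = ts.get(lambda t: t["topic"] == "a")
```

So neither mechanism dominates: they differ in which side's persistence survives a missed rendezvous.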
Indirection Summary
2 binary indirection variables for space, 2 for time
Can have indirection in one without the other
Leads to 24 indirection options: 16 joint space/time indirections, 4 space-only, 4 time-only
And few lessons about the tradeoffs!
Note: issues here in performance and SW engineering and …
E.g. “Are tuplespaces better than pub/sub?” Not a unidimensional question!
Rendezvous
Indirection on both sender and receiver side
In time and/or space on each side
Most general: neither sender nor receiver knows where or when rendezvous will happen!
Each chases a reference for where
Each must persist for when
Join as Rendezvous
Recall pipelining hash join: combine all blue and gray tuples that match
A batch rendezvous
In space: the data items were not stored in a fixed location, copied into HTs
In time: both sides put-persist in the join algorithm via storage
A hint of things to come:
In parallel DBs, the hash table is content-addressed (via the exchange routing function)
What if the hash table is distributed?
If a tuple in the join is doing “get”, is there a distinction between sender/recipient? Between query and data?
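The pipelining hash join recalled above can be sketched as a symmetric hash join: each arriving tuple is put-persisted into its side's hash table (indirection in time) and immediately probed against the other side's (rendezvous), so neither input needs to arrive first. A minimal sketch, with made-up relation names R and S:

```python
# Symmetric (pipelining) hash join: insert each arrival into its own
# hash table, then probe the other side's table for past arrivals.
from collections import defaultdict

def symmetric_hash_join(stream):
    """stream yields (side, key, value) events; side is 'R' or 'S'."""
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    for side, key, val in stream:
        other = "S" if side == "R" else "R"
        tables[side][key].append(val)        # put-persist for future arrivals
        for match in tables[other][key]:     # rendezvous with past arrivals
            yield (key, val, match) if side == "R" else (key, match, val)

# Tuples from R and S arrive interleaved, in any order.
events = [("R", 1, "r1"), ("S", 1, "s1"), ("S", 2, "s2"), ("R", 2, "r2")]
results = list(symmetric_hash_join(events))
# results == [(1, "r1", "s1"), (2, "r2", "s2")]
```

Note there is no sender/receiver asymmetry here: both sides store and both sides probe, which is exactly the rendezvous framing above.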
Some resonances
We said that query systems are an indirection mechanism: logical, many-to-many, but synchronous query-response
And some dataflow techniques inside query engines seem to provide useful indirection mechanisms
If we add a network into the picture, life gets very interesting
Indirection in space is very useful
Indirection in time is critical
Rendezvous is a basic operation
More Resonance
More Interaction: CS262 Experiment w/ Eric Brewer
Merged OS & DBMS grad class, over a year
Eric/Joe, point/counterpoint
Some tie-ins were obvious: memory mgmt, storage, scheduling, concurrency
Surprising: QP and networks go well side by side
E.g. eddies and TCP congestion control
Both use back-pressure and simple control theory to “learn” in an unpredictable dataflow environment
[Figure 3: Example Router Graph]
Scout
Paths are the key to a comm-centric OS
“Making Paths Explicit in the Scout Operating System”, David Mosberger and Larry L. Peterson, OSDI ’96
CLICK
A NW router is a query plan! With a twist: flow-based context
An opportunity for “autonomous” query optimization
Revisiting a NW Classic with DB Goggles
Clark & Tennenhouse, SIGCOMM ‘90
“Architectural Considerations for a New Generation of Protocols”
Love it for two reasons:
Tries to capture the essence of what networks do
Great for people who need the 10,000-foot view! I’m a fan of doing this (witness last week)
Tries to move the community up the food chain
Resonances everywhere!!
C&T Overview (for amateurs like me)
Core function of protocols: data xfer
Data Manipulation: buffer, checksum, encryption, xfer to/from app space, presentation
Transfer Control: flow/congestion ctl, detecting transmission problems, acks, muxing, timestamps, framing
Exchange! Data Modeling! Query Opt!
Thesis: nets are good at xfer control, not so good at data manipulation
Some C&T wacky ideas for better data manipulation:
Xfer semantic units, not packets (ALF)
Auto-rewrite layers to flatten them (ILP)
Minimize cross-layer ordering constraints
Control delivery in parallel via packet content
C & T’s Wacky Ideas
DB People Should Be Experts!
BUT… remember the basic Internet assumption:
“a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations” (Van Jacobson)
Spoils the whole optimize-then-execute architecture of query optimization
What happens when d(environment)/dt < query length??
What about the competing conversations? How do we handle the unknown topology? What about partial failure?
Ideally, we’d like: the semantics and optimization of DB dataflow, with the agility and efficiency of NW dataflow
The Cosmic Convergence
NETWORKING RESEARCH (adaptivity, federated control, geo-scalability): XML routing, router toolkits, content addressing and DHTs, directed diffusion
DATABASE RESEARCH (data models, query opt, data scalability): adaptive query processing, continuous queries/streams, P2P query engines, sensor query engines
What does the QP perspective add?
In terms of high-level languages?
In terms of a reusable set of operators?
In terms of optimization opportunities?
In terms of batch-I/O tricks?
In terms of approximate answers?
A “safe” route to Active Networks?
Not computationally complete
Optimizable and reconfigurable -- data independence applies
Fun to be had here! Addressing a few fronts at Berkeley…
Some of our work at the seams
Starting with a centralized engine for remote data sets and streams
Telegraph: eddies, SteMs, FLuX
“Deep Web”, filesharing systems, sensor streams
More recently, querying sensor networks
TinyDB/TAG: in-network queries
And DHT-based overlay networks
PIER
Telegraph Overview
Telegraph: An Adaptive Dataflow System
Themes: Adaptivity and Sharing
Adaptivity encapsulated in operators
Eddies for order of operations
State Modules (SteMs) for transient state
FLuX for parallel load-balance and availability
Work- and state-sharing across flows
Unlike traditional relational schemes, try to share physical structures
Franklin, Hellerstein, Hong and students (to follow)
Telegraph Architecture
[Architecture diagram: request parsing and metadata (SQL, explicit dataflows, XML catalog) feed an adaptive routing and optimization layer (Eddy, FLuX, Juggle) over modules (Join, Select, Project, Group, Aggregate, Transitive Closure, DupElim, SteM) and ingress operators (File Reader, Sensor Proxy, P2P Proxy), with TeSS storage, inter-module communication and scheduling (Fjords), and online query processing.]
Continuous Adaptivity: Eddies
A little more state per tuple
Ready/done bits (extensible a la Volcano/Starburst)
Minimal state in the Eddy itself
Queue + parameters being learned
Decisions: which tuple in the queue goes to which operator
Query processing = dataflow routing!!
Ron Avnur
Two Key Observations
Break the set-oriented boundary
Usual DB model: algebra expressions: (R ⋈ S) ⋈ T
Common DB implementation: pipelining operators!
Subexpressions needn’t be materialized
Typical implementation is more flexible than the algebra
We can reorder in-flight operators
Don’t rewrite the graph; impose a router
Graph edge = absence of a routing constraint
Observe operator consumption/production rates
Consumption: cost. Production: cost × selectivity
Could break these down per values of tuples
So fun! Simple, incremental, general Brings all of query optimization online
And hence a bridge to ML, Control Theory, Queuing Theory
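The routing idea above can be sketched with a toy eddy over commutative filters: the router observes per-operator pass rates online and sends each tuple to the most selective eligible operator first. This is a deliberately simplified policy (real eddies use richer schemes such as lottery scheduling); all names below are illustrative.

```python
# Toy eddy: route tuples through commutative filters, learning operator
# selectivities online and visiting the most selective operator first.

def eddy(tuples, ops):
    """ops: dict name -> predicate. Returns tuples that pass all ops."""
    seen = {name: 1.0 for name in ops}     # tuples routed to each op
    passed = {name: 1.0 for name in ops}   # tuples that survived it
    out = []
    for t in tuples:
        done = set()
        alive = True
        while alive and len(done) < len(ops):
            # Route to the eligible op with the lowest observed pass rate.
            name = min((n for n in ops if n not in done),
                       key=lambda n: passed[n] / seen[n])
            seen[name] += 1
            if ops[name](t):
                passed[name] += 1
                done.add(name)
            else:
                alive = False              # tuple filtered out: stop routing
        if alive:
            out.append(t)
    return out

ops = {"gt10": lambda x: x > 10, "even": lambda x: x % 2 == 0}
result = eddy(range(20), ops)
# result == [12, 14, 16, 18]
```

Because the graph is never rewritten, the "plan" can drift tuple by tuple as the observed rates change, which is the sense in which query optimization moves online.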
State Modules (SteMs)
Goal: further adaptivity through competition
Multiple mirrored sources (AMs): handle rate changes, failures, parallelism
Multiple alternate operators
Join = Routing + State; the SteM operator manages the tradeoffs
State Module: unifies caches, rendezvous buffers, join state
Competitive sources/operators share building/probing SteMs
Join algorithm hybridization!
Eddies + SteMs tackle the full (single-site) query optimization problem online
Vijayshankar Raman, Amol Deshpande
[Figure: static dataflows vs. eddy vs. eddy + SteMs]
FLuX: Routing Across a Cluster
Fault-tolerant, Load-balancing eXchange
Continuous/long-running flows need high availability
Big flows need parallelism
Adaptive load-balancing req’d
FLuX operator: Exchange plus…
Adaptive flow partitioning (River)
Transient state replication & migration
Replication & checkpointing for SteMs
Note: set-based, not sequence-based!
Needs to be extensible to different ops: content-sensitivity, history-sensitivity
Dataflow semantics: optimize based on edge semantics
Networking tie-in again: at-least-once delivery? Exactly-once delivery? In/out of order?
Mehul Shah
Continuously Adaptive Continuous Queries (CACQ)
Continuous queries clearly need all this stuff!
Natural application of Telegraph infrastructure
4 ideas in CACQ:
Use eddies to allow reordering of ops, but one eddy serves all queries
Queries are data: join with a Grouped Filter
A la stored get! This idea extended in PSoup (Chandrasekaran & Franklin)
Explicit tuple lineage
Mark each tuple with per-op ready/done bits
Mark each tuple with per-query completed bits
Joins via SteMs, shared across all queries
Note: mixed-lineage tuples in a SteM, i.e. shared state is not shared algebraic expressions!
Delete a tuple from the flow only if it matches no query
Sam Madden, Mehul Shah, Vijayshankar Raman, Sirish Chandrasekaran
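The "queries are data" idea can be sketched as a grouped filter: the predicates of many continuous queries are stored as data (a stored get), and each arriving tuple is joined against all of them at once. This is a simplified sketch of the concept; the class and method names are hypothetical, and real grouped filters index the predicates for sub-linear matching.

```python
# Sketch of a grouped filter: many queries' predicates stored as data,
# each arriving tuple matched against all of them in one pass.
from collections import defaultdict

class GroupedFilter:
    def __init__(self):
        # Group range predicates over the same attribute together,
        # so registered queries are evaluated side by side.
        self.lower_bounds = defaultdict(dict)  # attr -> {query_id: bound}

    def add_query(self, query_id, attr, lower_bound):
        """Register a continuous query of the form: attr > lower_bound."""
        self.lower_bounds[attr][query_id] = lower_bound

    def matches(self, tuple_):
        """Return the set of query ids this tuple satisfies."""
        hits = set()
        for attr, bounds in self.lower_bounds.items():
            for qid, lb in bounds.items():
                if tuple_.get(attr, float("-inf")) > lb:
                    hits.add(qid)
        return hits

gf = GroupedFilter()
gf.add_query("q1", "temp", 30)   # e.g. SELECT ... WHERE temp > 30
gf.add_query("q2", "temp", 50)   # e.g. SELECT ... WHERE temp > 50
hits = gf.matches({"temp": 40})
# hits == {"q1"}
```

The per-query hit set is exactly what the completed-bits lineage above carries along with each tuple.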
Sensor QP: TinyDB/TAG
A spectrum of devices, from Smart Dust motes (TinyOS) to Palm devices (Linux)
Varying degrees of power and network constraints
The fun is on the small side!
Our current platform: Mica and TinyOS
4 MHz Atmel CPU, 4 KB RAM, 40 kbit radio, 512K EEPROM, 128K Flash
Sensors: temp, light, accelerometer, magnetometer, mic, etc.
Wireless, single-ported, multi-hop ad hoc network
Spanning-tree communication through “root”
TinyDB
A query/trigger engine for motes
Declarative (SQL-like) language for optimizability
Data independence arguments in spades here!
Non-programmers can deal with it
Lots of challenges at the seams of queries and routing
Query plans over a dynamic multi-hop network
With power and bandwidth consumption as key metrics
Sam Madden (w/Hellerstein, Hong, Franklin)
[Chart: number of messages vs. aggregation function, comparing EXTERNAL, MAX, AVERAGE, COUNT, and MEDIAN; y-axis 0 to 100,000 messages]
Focus: Hierarchical Aggregation
Aggregation is natural in sensornets
The “big picture” is typically interesting
Aggregation can smooth noise and loss
E.g. signal-processing aggs like wavelets
Provides data reduction
Power/network reduction: in-network aggregation
Hierarchical version of parallel aggregation
Tricky design space: power vs. quality, topology selection, value-based routing; a dynamic environment requires adaptivity
TinyDB Sample Apps
Habitat Monitoring: what is the average humidity in the populated petrel burrows on Great Duck Island right now?
Smart Office: find me the conference rooms that have been reserved but unoccupied for 5 minutes.
Home Automation: lower blinds when light intensity is above a threshold.
Performance in SensorNets
Power consumption
Communication >> computation; METRIC: radio wake time
Send > receive; METRIC: messages generated
“Run for 5 years” vs. “burn power for critical events” vs. “run my experiment”
Bandwidth constraints
Internal >> external (volume >> surface area)
Result Quality Noisy sensors Discrete sampling of continuous phenomena Lossy communication channel
TinyDB
SQL-like language for specifying continuous queries and triggers
Schema management, etc.
Proxy on desktop, small query engine per mote
Plug and play (query snooping)
To keep the engine “tiny”, use an eddy-style architecture
One explicit copy of each iterator’s code image
Adaptive dataflow in the network
Alpha available for download on SourceForge
Some of the Optimization Issues
Extensible aggregation API: Init(), Iter(), SplitFlow(), Close()
Properties: amount of intermediate state, duplicate sensitivity, monotonicity, exemplary vs. summary
Hypothesis testing
Snooping and suppression
Compression, presumption, interpolation
Generally, QP and NW issues intertwine!
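In-network aggregation in the spirit of the API above can be sketched for AVERAGE: each mote carries a small partial state record (sum, count), merges its children's records as they flow up the spanning tree, and only the root closes the aggregate. The function names below (avg_init, avg_merge, avg_close) are illustrative stand-ins; TAG's actual API and signatures differ.

```python
# Sketch of in-network AVERAGE: small partial state records merged up
# a spanning tree instead of shipping raw readings. Names are hypothetical.

def avg_init(value):
    """Initialize a partial state record from one sensor reading."""
    return (value, 1)                        # (sum, count)

def avg_merge(a, b):
    """Combine two partial state records (the in-network step)."""
    return (a[0] + b[0], a[1] + b[1])

def avg_close(state):
    """Evaluate the final aggregate at the root."""
    s, c = state
    return s / c

# A parent node merges its own reading with its children's records and
# forwards a single (sum, count) pair up the tree.
children = [avg_init(10), avg_init(20)]      # partial states from children
state = avg_init(30)                         # parent's own reading
for c in children:
    state = avg_merge(state, c)
result = avg_close(state)
# result == 20.0
```

The properties listed above fall out of this shape: AVERAGE has constant intermediate state, is duplicate-sensitive (merging the same record twice inflates the count), and is a summary rather than exemplary aggregate.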
PIER: Querying the Internet
Querying the Internet
As opposed to querying over the Internet
Have to deal with Internet realities: scale, dynamics, federated admin, partial failure, etc.
Standard distributed DBs won’t work
Applications
Start with real-time, distributed network monitoring
Traffic monitoring, intrusion/spam detection, software deployment detection (e.g. via TBIT), etc.
Use PIER’s SQL as a workload generator for networks?
Virtual “tables” determine the load produced by each site
“Queries” become a way of specifying site-to-site communication
Move to infect the network more deeply?
E.g. indirection schemes like i3, rendezvous mechanisms, etc. Overlays only?
And p2p QP, Obviously
Gnutella done right. And it’s so easy! :-)
Crawler-free web search
Bring WYGIWIGY queries to the people: ranking, recommenders, etc.
Got to be more fun here
If p2p takes off in a big way, queries have to be a big piece
Why p2p DB, anyway? No good reason I can think of! :-)
Focus on the grassroots nature of p2p
Schema integration and transactions and … ?? No! Work with what you’ve got! Query the data that’s out there
Nothing complicated for users will fly
Avoid the “DB” word: P2P QP, not P2P DB
Approach: Leverage DHTs
“Distributed Hash Tables”
Family of distributed content-routing schemes: CAN, Chord, Pastry, Tapestry, etc.
Internet-scale “hash table”
A la a wide-area, adaptive Exchange routing table, with some notion of storage
Leverage DHTs aggressively
As distributed indexes on stored data
As state modules for query processing
E.g. use DHTs as the hash tables in a hash join
As rendezvous points for exchanging info
E.g. Bloom filters
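Using a DHT as the hash table in a hash join can be sketched as follows: each tuple from either relation is shipped to the node responsible for hash(key), so matching tuples rendezvous by content and each node can join locally. The toy node ring below is an illustrative stand-in for CAN/Chord-style routing, not a real DHT implementation.

```python
# Sketch: DHT as the distributed hash table in a hash join.
# Matching keys are co-located by content-based routing, so each
# node joins locally. The "DHT" here is a toy four-node stand-in.
import hashlib

NODES = ["node0", "node1", "node2", "node3"]

def responsible_node(key):
    """Map a key to its responsible node via a stable hash."""
    h = int(hashlib.sha1(str(key).encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

store = {n: [] for n in NODES}   # per-node storage

def dht_put(rel, key, val):
    """Publish a tuple to the node responsible for its join key."""
    store[responsible_node(key)].append((rel, key, val))

for k, v in [(1, "r1"), (2, "r2")]:
    dht_put("R", k, v)
for k, v in [(1, "s1"), (3, "s3")]:
    dht_put("S", k, v)

# Each node performs a purely local join over its co-located tuples.
joined = []
for n in NODES:
    rs = [(k, v) for rel, k, v in store[n] if rel == "R"]
    ss = [(k, v) for rel, k, v in store[n] if rel == "S"]
    joined += [(k1, v1, v2) for k1, v1 in rs for k2, v2 in ss if k1 == k2]
# joined == [(1, "r1", "s1")]
```

This is the exchange routing function of parallel DBs reborn at Internet scale: the DHT plays both the role of the routing table and the role of the stored hash-table state.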
PIER: P2p Information Exchange and Retrieval
Relational-style query executor
With front-ends for SQL and catalogs
Standard and continuous queries
With access to DHT APIs
Currently CAN and Chord, working on Tapestry; a common DHT API would help
Currently simulating queries running on tens of thousands of nodes. Look ma, it scales!
Widest-scale relational engine ever looks feasible
Most of the simulator code will live on in the implementation
On Millennium and PlanetLab this fall/winter
Ryan Huebsch and Boon Thau Loo (w/Hellerstein, Shenker, Stoica)
PIER Challenges
How does this batch workload stress DHTs?
How does republishing of soft state interact with dataflow? And with the semantics of query answers?
Materialization/precomputation/caching: physical tuning meets SteMs meets materialized views
How to do query optimization in this context? Distributed eddies!
Partial failure is a reality: at storage nodes? At query execution nodes? Impact on results, mitigation
What about aggregation? Similarities/differences with TAG? With Astrolabe [Birman et al.]?
The “usual” CQ and data-stream query issues, distributed
Analogous to work in Telegraph, and at Brown, Wisconsin, Stanford…
All together now?
I thought about changing the names: Telegraph*, Teletiny…? The group didn’t like the branding
Teletubby!
Seriously: integration? It’s a plausible need
Sensor data + map data + historical sensor logs + …
Filesharing + Web
We have done both of these cheesily, but there are fun questions of doing it right
E.g. pushing predicates and data into the sensor net or not?
References & Resources
Database Texts
Undergrad textbooks
Ramakrishnan & Gehrke, Database Management Systems
Silberschatz, Korth, Sudarshan, Database System Concepts
Garcia-Molina, Ullman, Widom, Database Systems: The Complete Book
O’Neil & O’Neil, Database: Principles, Programming, and Performance
Abiteboul, Hull, Vianu, Foundations of Databases
Graduate texts
Stonebraker & Hellerstein, Readings in Database Systems (a.k.a. “The Red Book”)
Brewer & Hellerstein, Readings book (e-book?) in progress. Fall 2003?
Research Links
DB group at Berkeley: db.cs.berkeley.edu
GiST: gist.cs.berkeley.edu
Telegraph: telegraph.cs.berkeley.edu
TinyDB: telegraph.cs.berkeley.edu/tinydb, berkeley.intel-research.net/tinydb
Red Book: redbook.cs.berkeley.edu