Network Computing Laboratory
HiFi Systems: Network-Centric Query Processing for the Physical World
Michael J. Franklin, Shawn R. Jeffrey, et al.
UC Berkeley TelegraphCQ Team
2nd CIDR Conf. 2005
Korea Advanced Institute of Science and Technology
Table of Contents
One-Line Comment
Motivating Scenario
HiFi System with CSAVA Processing Stages
Internal Architecture of a HiFi Node
Critiques
New Idea - 1, 2
One-Line Comment
This is preliminary work describing the group's vision of distributing their TelegraphCQ system across a hierarchical network.
Motivating Scenario – Supply Chain Management
“Smart Shelves” continuously monitor item addition and removal.
Info is sent back through the supply chain.
High Fan-In System
[Figure: system diagram. Labels: Ursa-Minor (TinyDB-based); Stargate Mid-tier Processing Node; Ursa-Major (TelegraphCQ w/ Archiving)]
Characteristics of HiFi Systems
High fan-in, globally distributed architecture
Large data volumes generated at the edges
  Filtering and cleaning must be done there
Successive aggregation as you move inwards
  Summaries/anomalies continually, details later
Strong temporal focus
Strong spatial/geographic focus
Streaming data and stored data
Integration within and across enterprises
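The "successive aggregation as you move inwards" point can be illustrated with a small sketch. This is not the HiFi implementation; the `summarize` helper, the two-warehouse layout, and the sample item counts are assumptions made up for the example.

```python
from collections import Counter

def summarize(child_summaries):
    """Merge per-child item counts into one summary for the parent level."""
    total = Counter()
    for summary in child_summaries:
        total.update(summary)  # Counter.update adds counts, it does not overwrite
    return total

# Hypothetical edge data: item counts seen by each reader at two warehouses.
warehouse_a = [Counter({"soap": 3, "milk": 1}), Counter({"soap": 2})]
warehouse_b = [Counter({"milk": 4})]

# Each level forwards only its summary, so less data travels as it moves inwards.
regional = summarize([summarize(warehouse_a), summarize(warehouse_b)])
print(dict(regional))
```

The regional node sees one merged summary per warehouse rather than every reader's raw counts, which is the fan-in effect the slide describes.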
A View on This Example
[Figure: processing tasks plotted against geographic scope, from local to global. Levels: Several Readers, Regional Centers, Central Office. Tasks: Filtering, Cleaning, Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution)]
High fan-in system levels with associated CSAVA processing stages
Levels: Headquarters, Regional Centers, Warehouse, Warehouse Doors, Receptor (RFID)
CSAVA stages:
  Clean: remove anomalies
  Smooth: interpolate for lost/garbled readings
  Arbitrate: remove duplicates
  Validate: correlate with business rules
  Analyze: tactical decision support
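As a rough sketch of the Arbitrate stage listed above (removing duplicates when several receptors report the same tag), the following assigns each tag to a single receptor. The "most readings wins" rule, the `arbitrate` function name, and the sample data are assumptions for illustration; the slide only says "remove duplicates".

```python
from collections import Counter

def arbitrate(readings):
    """Keep one receptor per tag: the receptor that reported the tag most often.

    `readings` is an iterable of (receptor_id, tag_id) pairs. On a tie the
    receptor seen first wins, because only a strictly larger count replaces
    the current choice.
    """
    counts = Counter(readings)
    best = {}
    for (receptor_id, tag_id), n in counts.items():
        if tag_id not in best or n > best[tag_id][1]:
            best[tag_id] = (receptor_id, n)
    return {tag_id: receptor_id for tag_id, (receptor_id, _) in best.items()}

# Hypothetical readings: tag t1 is picked up by two neighbouring shelves.
readings = [("shelf1", "t1"), ("shelf1", "t1"), ("shelf2", "t1"), ("shelf2", "t2")]
owners = arbitrate(readings)
print(owners)
```

Other arbitration rules (for example, strongest signal) would fit the same shape; only the comparison inside the loop changes.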
Internal Architecture of a HiFi Node
[Figure: node architecture. Components: Metadata Repository, Data Stream Processor, Cache Manager, Data Listener, Resource Manager, Query Dispatcher, Local View Manager, Query Placement Service, Query Listener, Control Manager, Data Disseminator, Query Planner, DSP Manager, Archive Manager, Logical Query Planner, Physical Query Planner, HiFi Glue. Arrows: Data Flow, Query Flow, Control Flow]
Critiques
Strong Points
  They classify and formulate five distinct data processing stages
  They developed a prototype system (in VLDB 05)
Weak Points
  Designing the MDR (Metadata Repository) is critical, but no initial effort has been made
  No new system requirements
  Solutions are not technically deep
New Idea - 1
[Figure: Data Source, SPAccel, CQ engine, Web Server, Clients. SPAccel actions: Filtered out, By-passing, Buffering]
New Idea - related to SPAccel
Designing a front-end component (Cache??)
  Filtering out unwanted input data
  By-passing data matching query predicates
  Buffering data for windowed queries (views) or distributed queries
  Buffering query results
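The three front-end actions listed above (filter out, by-pass, buffer) might be sketched as a small router in front of the CQ engine. Everything here is hypothetical: the `FrontEnd` class, its predicates, and the sample tuples are assumptions used only to make the idea concrete, not a design from the slides.

```python
from collections import deque

class FrontEnd:
    """Hypothetical front-end that routes each tuple before the CQ engine."""

    def __init__(self, wanted, bypass, window_size):
        self.wanted = wanted    # predicate: is any registered query interested?
        self.bypass = bypass    # predicate: does the tuple already match a query?
        self.window = deque(maxlen=window_size)  # recent tuples for windowed queries
        self.results = []       # tuples delivered without engine work

    def route(self, item):
        if not self.wanted(item):
            return "filtered"           # dropped before reaching the engine
        if self.bypass(item):
            self.results.append(item)   # sent straight to the result buffer
            return "bypassed"
        self.window.append(item)        # oldest tuple is evicted when full
        return "buffered"

fe = FrontEnd(
    wanted=lambda r: r["strength"] >= 10,
    bypass=lambda r: r["tag"] == "t1",
    window_size=2,
)
stream = [
    {"tag": "t1", "strength": 5},
    {"tag": "t1", "strength": 20},
    {"tag": "t2", "strength": 15},
    {"tag": "t3", "strength": 30},
    {"tag": "t4", "strength": 12},
]
routes = [fe.route(r) for r in stream]
print(routes)
```

A bounded `deque` keeps the buffer size fixed, which is one simple answer to "how much to buffer"; a time-based window would need timestamps on the tuples instead.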
Issues expected
Cache replacement mechanism
How to index cached elements
What to cache? How much?
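To make the replacement and indexing questions above concrete, here is a minimal sketch of one possible choice: least-recently-used (LRU) replacement with a hash index on the cache key. The slides do not pick a policy; LRU, the `LRUCache` class, and the query/result keys are assumptions for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # hash-indexed, ordered from oldest to newest use

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("q1", "result-1")
cache.put("q2", "result-2")
cache.get("q1")              # q1 is now more recent than q2
cache.put("q3", "result-3")  # evicts q2
print(list(cache.items))
```

For stream data, a policy based on window expiry or query popularity may suit better than pure recency; the structure above would stay the same with a different eviction rule.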
New Idea - 2: Processing stream data for OLAP queries
                    OLTP                          OLAP
Users               Clerk, IT professional        Knowledge worker
Function            Day-to-day operations         Decision support
DB design           Application-oriented          Subject-oriented
Data                Current, up-to-date;          Historical, summarized;
                    detailed, flat relational;    multidimensional;
                    isolated                      integrated, consolidated
Usage               Repetitive                    Ad-hoc
Access              Read/write;                   Lots of scans
                    index/hash on prim. key
Unit of work        Short, simple transaction     Complex query
# Records accessed  Tens                          Millions
# Users             Thousands                     Hundreds
DB size             100MB-GB                      100GB-TB
Metric              Transaction throughput        Query throughput/response
A Sample Data Cube
[Figure: data cube with dimensions Date (1Q, 2Q, 3Q, 4Q), Product (CD, video, camera), and Country (USA, Canada, Mexico), with "sum" cells along each dimension]
New Idea - 2
Stream data in terms of the OLAP domain
OLAP queries
  Are inherently multidimensional
  Span a long time
  Need data from multiple sources
Processing OLAP queries is
  Memory intensive
  Computation intensive
Naïve Solution
Pre-computing popular computation paths
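One way to read "pre-computing popular computation paths" is to keep running sums for a few frequently requested group-bys as tuples arrive, so that those OLAP queries become lookups. The sketch below assumes this reading; the `PrecomputedCube` class, the chosen groupings, and the sample rows are hypothetical.

```python
from collections import defaultdict

class PrecomputedCube:
    """Maintain running sums for a fixed set of popular group-by combinations."""

    def __init__(self, popular_groupings):
        # One table of running sums per grouping, e.g. ("country", "quarter").
        self.sums = {dims: defaultdict(float) for dims in popular_groupings}

    def insert(self, row):
        # Update every precomputed grouping incrementally as the tuple streams in.
        for dims, table in self.sums.items():
            key = tuple(row[d] for d in dims)
            table[key] += row["amount"]

    def query(self, dims, key):
        # Answered from the precomputed table; unseen keys sum to zero.
        return self.sums[dims].get(key, 0.0)

cube = PrecomputedCube([("country",), ("country", "quarter")])
for row in [
    {"country": "USA", "quarter": "1Q", "product": "CD", "amount": 10},
    {"country": "USA", "quarter": "2Q", "product": "CD", "amount": 5},
    {"country": "Canada", "quarter": "1Q", "product": "video", "amount": 7},
]:
    cube.insert(row)

print(cube.query(("country",), ("USA",)))
print(cube.query(("country", "quarter"), ("USA", "1Q")))
```

Memory grows with the number of groupings and distinct keys, which is why only popular paths are precomputed; a grouping that was not chosen up front cannot be answered from these tables.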
Supplementary Slide
Cleaning
CREATE VIEW cleaned_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM rfid_stream rs
   WHERE read_strength >= strength_T)
Smoothing
CREATE VIEW smoothed_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM cleaned_rfid_stream
   GROUP BY receptor_id, tag_id
   HAVING count(*) >= count_T)
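For readers who want to trace what the two views compute, here is a small Python sketch of the same logic applied to one batch of readings. It assumes the batch stands for a single window of the stream (the views above show no window clause), and the sample readings and threshold values are invented for the example.

```python
from collections import Counter

def clean(readings, strength_t):
    """Drop readings whose signal strength is below the threshold (Cleaning view)."""
    return [
        (receptor_id, tag_id)
        for (receptor_id, tag_id, read_strength) in readings
        if read_strength >= strength_t
    ]

def smooth(cleaned, count_t):
    """Keep (receptor_id, tag_id) pairs seen at least count_t times (Smoothing view)."""
    counts = Counter(cleaned)
    return sorted(pair for pair, n in counts.items() if n >= count_t)

# Hypothetical window of (receptor_id, tag_id, read_strength) tuples.
window = [
    ("r1", "t1", 50),
    ("r1", "t1", 60),
    ("r1", "t2", 70),
    ("r1", "t1", 5),   # too weak, removed by clean()
]
smoothed = smooth(clean(window, strength_t=10), count_t=2)
print(smoothed)
```

Tag t2 passes cleaning but is dropped by smoothing because it appears only once in the window, mirroring the HAVING clause.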