Network Computing Laboratory

HiFi Systems: Network-Centric Query Processing for the Physical World

Michael J. Franklin, Shawn R. Jeffrey, et al., UC Berkeley TelegraphCQ Team

2nd CIDR Conf. 2005



Korea Advanced Institute of Science and Technology

Table of Contents

One-Line Comment

Motivating Scenario

HiFi System with CSAVA Processing Stages

Internal Architecture of a HiFi Node

Critiques

New Ideas 1 and 2


One-Line Comment

This is preliminary work describing the group's vision of distributing their TelegraphCQ system across a hierarchical network.


Motivating Scenario – Supply Chain Management

“Smart Shelves” continuously monitor item addition and removal.

This information is sent back through the supply chain.


High Fan-In System

[Figure: components of the high fan-in system: Ursa-Minor (TinyDB-based), Stargate mid-tier processing nodes, and Ursa-Major (TelegraphCQ with archiving).]


Characteristics of HiFi Systems

High fan-in, globally distributed architecture

Large data volumes generated at the edges
    Filtering and cleaning must be done there

Successive aggregation as you move inwards
    Summaries/anomalies continually, details later

Strong temporal focus

Strong spatial/geographic focus

Streaming data and stored data

Integration within and across enterprises
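The idea of filtering at the edges and aggregating successively on the way inwards can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function names, the (tag, strength) tuple layout, and the threshold value are hypothetical and do not come from the HiFi system.

```python
from collections import Counter

def edge_summary(readings, min_strength):
    """Edge node: drop weak reads, then summarize as per-tag counts."""
    return Counter(tag for tag, strength in readings if strength >= min_strength)

def merge_summaries(summaries):
    """Interior node: combine the summaries of its children into one."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

# Made-up readings from two shelves: (tag_id, read_strength).
shelf_a = [("tag1", 0.9), ("tag2", 0.2), ("tag1", 0.8)]
shelf_b = [("tag2", 0.7), ("tag3", 0.95)]

# Only the compact summaries travel inwards, not the raw readings.
store = merge_summaries([edge_summary(shelf_a, 0.5), edge_summary(shelf_b, 0.5)])
print(dict(store))
```

Each level forwards a summary that is smaller than its input, which is the point of a high fan-in design: raw detail stays near the edge and can be fetched later if needed.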


A View on This Example

[Figure: processing tasks arranged by geographic scope, from local to global. Tasks: filtering, cleaning, alerts; monitoring, time-series; data mining (recent history); archiving (provenance and schema evolution). Locations: several readers, regional centers, central office.]


High fan-in system levels with associated CSAVA processing stages

[Figure: system levels, from top to bottom: Headquarters, Regional Centers, Warehouse, Warehouse Doors, Receptor (RFID).]

Clean: remove anomalies

Smooth: interpolate for lost/garbled readings

Arbitrate: remove duplicates

Validate: correlate with business rules

Analyze: tactical decision support
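The five CSAVA stages can be read as a chain of small functions, each consuming the output of the previous one. The sketch below is a minimal illustration, not the authors' implementation: all function names, thresholds, sample readings, and the "manifest" business rule are assumptions, and smoothing is approximated with a simple count threshold rather than interpolation.

```python
from collections import Counter

# Made-up readings: (receptor_id, tag_id, read_strength).
READINGS = [
    ("door1", "tagA", 0.9), ("door1", "tagA", 0.8), ("door1", "tagA", 0.7),
    ("door2", "tagA", 0.6), ("door2", "tagA", 0.6),
    ("door2", "tagB", 0.9), ("door2", "tagB", 0.8),
    ("door1", "tagC", 0.1),   # weak read, treated as an anomaly
    ("door1", "tagD", 0.9),   # seen only once
]

def clean(readings, strength_t):
    """Clean: drop readings below a signal-strength threshold."""
    return [r for r in readings if r[2] >= strength_t]

def smooth(readings, count_t):
    """Smooth: keep (receptor, tag) pairs seen at least count_t times."""
    counts = Counter((rec, tag) for rec, tag, _ in readings)
    return {key: n for key, n in counts.items() if n >= count_t}

def arbitrate(counts):
    """Arbitrate: assign each tag to the one receptor that read it most."""
    best = {}
    for (rec, tag), n in counts.items():
        if tag not in best or n > best[tag][1]:
            best[tag] = (rec, n)
    return {tag: rec for tag, (rec, _) in best.items()}

def validate(assignment, expected_tags):
    """Validate: hypothetical business rule, accept only tags on a manifest."""
    return {tag: rec for tag, rec in assignment.items() if tag in expected_tags}

def analyze(assignment):
    """Analyze: count validated tags per receptor."""
    return Counter(assignment.values())

counts = smooth(clean(READINGS, strength_t=0.5), count_t=2)
located = validate(arbitrate(counts), expected_tags={"tagA", "tagB"})
report = analyze(located)
print(located, dict(report))
```

In a high fan-in deployment each stage would run at a different level of the hierarchy; here they run in one process only to make the data flow visible.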


Internal Architecture of a HiFi Node

[Figure: internal architecture of a HiFi node. Components: Metadata Repository, Data Stream Processor, DSP Manager, Cache Manager, Archive Manager, Data Listener, Data Disseminator, Query Listener, Query Dispatcher, Query Planner, Logical Query Planner, Physical Query Planner, Query Placement Service, Local View Manager, Resource Manager, Control Manager, and HiFi Glue. Arrows distinguish data flow, query flow, and control flow.]


Critiques

Strong Points
    They classify and formulate five distinct data processing stages.
    They developed a prototype system (in VLDB 05).

Weak Points
    Designing the MDR (Metadata Repository) is critical, but no initial effort has been made.
    No new system requirements are identified.
    The solutions are not technically deep.


New Idea 1

[Figure: components: Data Source, SPAccel, CQ Engine, Web Server, Clients. Labeled paths: filtered out, by-passing, buffering.]


New Idea 1: Related to SPAccel

Designing a front-end component (a cache?)
    Filtering out unwanted input data
    By-passing data that matches query predicates
    Buffering data for windowed queries (views) or distributed queries
    Buffering query results
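The three front-end behaviors listed above (filter out, by-pass, buffer) can be sketched in a few lines. This is a hypothetical illustration only: the class, its method names, and the tuple layout are assumptions and do not describe an actual SPAccel interface.

```python
from collections import deque

class FrontEnd:
    """Hypothetical front-end placed before a continuous-query engine."""

    def __init__(self, predicates, window_size):
        self.predicates = predicates              # one predicate per registered query
        self.window = deque(maxlen=window_size)   # buffer of the last N matching tuples
        self.dropped = 0                          # how many tuples were filtered out

    def offer(self, item):
        """Return the item if it should be passed on, or None if filtered out."""
        if not any(pred(item) for pred in self.predicates):
            self.dropped += 1        # no registered query wants this tuple
            return None
        self.window.append(item)     # keep it for windowed queries
        return item                  # by-pass: forward it unchanged

fe = FrontEnd(predicates=[lambda r: r["strength"] >= 0.5], window_size=2)
stream = [("a", 0.9), ("b", 0.1), ("c", 0.7), ("d", 0.8)]
passed = [fe.offer({"tag": t, "strength": s}) for t, s in stream]
print(fe.dropped, [r["tag"] for r in fe.window])
```

Using a bounded deque means the buffer evicts its oldest entry automatically, which sidesteps (rather than answers) the replacement question for this simple case.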


Issues Expected

Cache replacement mechanism

How to index cached elements

What to cache? How much?
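To make the replacement and indexing questions concrete, here is one well-known candidate policy, least-recently-used (LRU) eviction over a hash index. The slides do not choose a policy; LRU is an assumption made purely for illustration, and the class and key names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # hash index that also remembers access order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("q1", [1])
cache.put("q2", [2])
cache.get("q1")          # q1 is now more recent than q2
cache.put("q3", [3])     # capacity exceeded, so q2 is evicted
print(list(cache.items))
```

Whether recency is the right signal for stream data is exactly the open issue: a window-aware policy might instead evict tuples that have slid out of every active window.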


New Idea 2: Processing Stream Data for OLAP Queries

                     OLTP                             OLAP
Users                Clerk, IT professional           Knowledge worker
Function             Day-to-day operations            Decision support
DB design            Application-oriented             Subject-oriented
Data                 Current, up-to-date,             Historical, summarized,
                     detailed, flat relational,       multidimensional,
                     isolated                         integrated, consolidated
Usage                Repetitive                       Ad hoc
Access               Read/write,                      Lots of scans
                     index/hash on primary key
Unit of work         Short, simple transaction        Complex query
# Records accessed   Tens                             Millions
# Users              Thousands                        Hundreds
DB size              100MB-GB                         100GB-TB
Metric               Transaction throughput           Query throughput/response


A Sample Data Cube

[Figure: a data cube with dimensions Country (USA, Canada, Mexico), Date (1Q, 2Q, 3Q, 4Q), and Product (CD, video, camera), with a sum along each dimension.]
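A data cube of this shape can be computed by adding every fact to each combination of "specific value" and "all values" across the three dimensions. The sketch below uses made-up sales figures; the `ALL` marker and the function name are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"   # marker meaning "summed over this dimension"

# Made-up facts: (country, quarter, product, amount).
FACTS = [
    ("USA", "1Q", "CD", 10),
    ("USA", "2Q", "video", 20),
    ("Canada", "1Q", "CD", 5),
    ("Mexico", "4Q", "camera", 7),
]

def build_cube(facts):
    """Return sums for all 2**3 groupings of (country, quarter, product)."""
    cube = defaultdict(int)
    for country, quarter, item, amount in facts:
        # Each fact contributes to 8 cells: every mix of its own value and ALL.
        for key in product((country, ALL), (quarter, ALL), (item, ALL)):
            cube[key] += amount
    return cube

cube = build_cube(FACTS)
print(cube[("USA", ALL, ALL)])    # total for USA over all quarters and products
print(cube[(ALL, "1Q", "CD")])    # CD sales in 1Q over all countries
print(cube[(ALL, ALL, ALL)])      # grand total
```

Every fact touches eight cells here, and the number of cells per fact doubles with each added dimension, which is why full cube maintenance over a stream is memory and computation intensive.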


New Idea 2

Stream data in terms of the OLAP domain

OLAP queries
    Are inherently multidimensional
    Span a long time
    Need data from multiple sources

Processing OLAP queries is
    Memory intensive
    Computation intensive


Naïve Solution

Pre-compute popular computation paths
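One way to read this naive solution is to maintain running aggregates only for the groupings along a single chosen roll-up path, updating them as each stream tuple arrives. The sketch below assumes a hypothetical path and made-up data; the class and its methods are illustrative, not part of any system described in the slides.

```python
from collections import defaultdict

# Hypothetical popular path, from the finest grouping to the grand total.
PATH = [("country", "quarter", "product"), ("country", "quarter"), ("country",), ()]

class PathAggregates:
    """Running sums for each grouping on one pre-selected roll-up path."""

    def __init__(self, path):
        self.path = path
        self.sums = {dims: defaultdict(int) for dims in path}

    def insert(self, row, amount):
        """Update every grouping on the path for one arriving tuple."""
        for dims in self.path:
            key = tuple(row[d] for d in dims)
            self.sums[dims][key] += amount

    def query(self, dims, key):
        """Answer a query on the path directly from the stored sums."""
        return self.sums[dims].get(key, 0)

agg = PathAggregates(PATH)
agg.insert({"country": "USA", "quarter": "1Q", "product": "CD"}, 10)
agg.insert({"country": "USA", "quarter": "2Q", "product": "video"}, 20)
agg.insert({"country": "Canada", "quarter": "1Q", "product": "CD"}, 5)
print(agg.query(("country",), ("USA",)), agg.query((), ()))
```

Queries on the path are answered without touching raw data; queries off the path (for example, totals per product) would need another path or a fallback to stored detail.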


Supplementary Slide

Cleaning

-- Keep only readings whose signal strength reaches the threshold strength_T.
CREATE VIEW cleaned_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM rfid_stream rs
   WHERE read_strength >= strength_T)

Smoothing

-- Keep only (receptor, tag) pairs read at least count_T times.
CREATE VIEW smoothed_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM cleaned_rfid_stream
   GROUP BY receptor_id, tag_id
   HAVING count(*) >= count_T)
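The two views can be tried out on a static table with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the threshold values and sample rows are made up, the stream is replaced by an ordinary table (so no windowing takes place), and the thresholds are inlined as constants because SQLite views cannot take parameters.

```python
import sqlite3

STRENGTH_T = 0.5   # hypothetical value for strength_T
COUNT_T = 2        # hypothetical value for count_T

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rfid_stream (receptor_id TEXT, tag_id TEXT, read_strength REAL)"
)
conn.executemany(
    "INSERT INTO rfid_stream VALUES (?, ?, ?)",
    [
        ("r1", "t1", 0.9), ("r1", "t1", 0.8),   # strong and repeated: survives
        ("r1", "t2", 0.9),                      # strong but seen once: smoothed away
        ("r1", "t3", 0.1), ("r1", "t3", 0.2),   # repeated but weak: cleaned away
    ],
)

# Cleaning view, as on the slide, with the threshold inlined.
conn.execute(f"""
    CREATE VIEW cleaned_rfid_stream AS
    SELECT receptor_id, tag_id
    FROM rfid_stream
    WHERE read_strength >= {STRENGTH_T}
""")

# Smoothing view, built on top of the cleaned view.
conn.execute(f"""
    CREATE VIEW smoothed_rfid_stream AS
    SELECT receptor_id, tag_id
    FROM cleaned_rfid_stream
    GROUP BY receptor_id, tag_id
    HAVING count(*) >= {COUNT_T}
""")

rows = conn.execute("SELECT receptor_id, tag_id FROM smoothed_rfid_stream").fetchall()
print(rows)
```

Because the smoothing view reads from the cleaning view, the stages compose the same way they would across levels of the hierarchy.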