course project ideas

26
Course Project Ideas Yanlei Diao University of Massachusetts Amherst

Upload: orli

Post on 04-Jan-2016

44 views

Category:

Documents


1 download

DESCRIPTION

Course Project Ideas. Yanlei Diao University of Massachusetts Amherst. New Directions for DB Research. Sensor data : new architecture XML : new data model Streams : new execution model Data quality and lineage : new services …. Querying in Sensor Networks. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Course Project Ideas

Course Project Ideas

Yanlei DiaoUniversity of Massachusetts

Amherst

Page 2: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

New Directions for DB Research

Sensor data: new architecture

XML: new data model

Streams: new execution model

Data quality and lineage: new services

Page 3: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Querying in Sensor Networks

Acoustic stream

• Store data locally at sensors and push queries into the sensor network– Flash memory energy-

efficiency.– Limited capabilities of sensor

platforms.

Internet

Gateway

Image stream

Flash Memory

Push query to sensors

Page 4: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Optimize for Flash and Limited RAM

• Flash Memory Constraints– Data cannot be over-written, only

erased– Pages can often only be erased in

blocks (16-64KB)– Unlike magnetic disks, cannot

modify in-place

• Challenges:– Energy: Organize data on flash to

minimize read/write/erase operations

– Memory: Minimize use of memory for flash database.

1. 1. Load block 2. Into Memory

3. Save block back

Eraseblock

Memory

2. Modify in-memory

~16-64 KB

~4-10 KB

Page 5: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

StonesDB: System Operation

Image Retrieval: Return images taken last month with at least

two birds one of which is a bird of type A.

• Identify “best” sensors to forward query.

• Provide hints to reduce search complexity at sensor.

Proxy Cache of Image Summaries

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

Page 6: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

StonesDB: System Operation

Image Retrieval: Return images taken last month with at least two birds one of which is a bird of type

A.

Query Engine

Partitioned Access Methods

Page 7: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Research Issues in StonesDB

• Local Database Layer– Reduce updates for indexing and aging.– New cost models for self-tuning sensor databases.– Energy-optimized query processing.– Query processing over aged data.

• Distributed Database Layer– What summaries are relevant to queries?– What remainder queries to send to sensors?– What resolution of summaries to cache?

Page 8: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

XML (Extensible Markup Language)

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML: a tagging mechanism to describe content.

Page 9: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

XML Data Model (Graph)

bookb1

b2

title authorauthor

author

pcdata

Com plete... P rincip les...Cham berlin Bernste in Newcom er

pcdata pcdata pcdata pcdata

publisher

nam e state

CAM organ...

pcdata pcdata

pub pub

db

m kp

#1 #2 #3 #4 #5 #6 #7

#0

book

title

Main structure: ordered, labeled tree

References between node: becoming a graph

Page 10: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

XQuery: XML Query Language

• A declarative language for querying XML data

• XPath: path expressions– Patterns to be matched against an XML graph– /bib/paper[author/lastname=‘Croft’]/title

• FLOWR expressions– Combining matching and restructuring of XML data– For $p in distinct(document("bib.xml")//publisher)

Let $b := document("bib.xml")/book[publisher = $p]

Where count($b) > 100

Order by $p/name

Return $p

Page 11: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Metadata Management using XML

• File systems for large-scale scientific simulations– File systems: petabytes or even more– Directory tree (metadata): large, can’t fit in memory– Links between files: steps in a simulation, data derivation

• File Searches– all the files generated on Oct 1, 2005– all the files whose name is like ‘*simu*.txt’– all the files that were generated from the file ‘basic-measures.txt’

Build an XML store to manage directory trees!– XML data model– XML Query language– XML Indices

Page 12: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

XML Document Processing

Multi-hierarchical XML markup of text documents– Multi-hierarchies: part-of-speech, page-line – Features in different hierarchies overlap in scope– Need a query language & querying mechanism – References [Nakov et al., 2005; Iacob & Dekhtyar, 2005]

Querying and ranking of XML data– XML fragments returned as results– Fuzzy matches– Ranking of matches– References [Amer-Yahia et al., 2005; Luo et al., 2003]

• Well-defined problems identify your contributions!

Page 13: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Data Stream Management

Queries, RulesQueries, Rules

Event Specs,Event Specs,

SubscriptionsSubscriptions

Results Results

•Data in motion, unending

•Continuous, long-running queries

•Data-driven execution

Data

Traditional Database

Attr1 Attr2 Attr3Query

Data Stream Processor

•Data at rest

•One-shot or periodic queries

•Query-driven execution

Page 14: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

• XML is becoming the wire format for data• In-network XML processing

– Authentication

– Authorization

– Routing

– Transformation

– Pattern matching

• XPath widely used for in-network XML processing• Applied directly to streaming XML data• Line-speed performance

In-Network XML Processing

Expedite traffic

Enhance security

Real-time monitoring & diagnosis

Page 15: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Research Issues

Gigabit rate XPath processing– Take one look, process XPath, buffer data for future use if

necessary

– Processing needs to be gigabit rate

– Memory usage needs to be minimized

• Time/space complexity of XPath stream processing– Theoretical analysis for common features of XPath

• Minimizing memory usage of YFilter technolgy– YFilter: state-of-the-art for multi-XPath processing

Page 16: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

RFID Technology

• RFID technology

01.01298.6EF.0A

01.01267.60D.01

04.0768E.001.F0

reader_id,tag_id,timestamp

Page 17: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

RFID Stream Processing<pml > <tag>01.01298.6EF.0A</tag> <time>00129038</time> <location>shelf 2</location> </pml> +

<pml> <tag>01.01298.6EF.0A</tag> <time>02183947</time> <location>exit1</location> </pml>

RFID tag RFID reader

Page 18: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Example Queries

• Shoplifting: an item was taken out of store without being checked out.

• Out of stocks: the number of items of product X on shelf ≤ 3.

• Misplacement: an item was moved from Shelf A to Shelf B without being purchased or put back.

• …

Page 19: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

RFID Processing: Global Tracking

+

<pml> <epc>01.001298.6EF.0A</epc> <ts type=“begin”> <date>…</date> </ts> <entity type=“maker”> <name type=“legal”>X Ltd. </name> </entity> …

<pml> <epc>01.001298.6EF.0A</epc> <ts><date>…</date></ts> <location>…</location> <msr label=“temperature” max=2>90</msr> …

<pml> <epc>01.001298.6EF.0A</epc> <ts><date>…</date></ts> <location>…</location> <msr label=“temperature” max=5>95</msr> …

<pml> <epc>01.001298.6EF.0A</epc> <ts><date>…</date></ts> <location>…</location> <msr label=“temperature” max=2>80</msr> …

<pml> <epc>01.001298.6EF.0A</epc> <ts><date>…</date></ts> <location>…</location> <msr label=“temperature” max=2>85</msr> …

<pml> <epc>01.001298.6EF.0A</epc> <ts type=“end”> <date>…</date></ts> <entity type=“retailer”> <name type=“legal”>CVS </name> </entity> …

Page 20: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Example Queries

• Counterfeit drugs: a bottle is accepted at the retailer if it came from a legal manufacturer and followed all necessary steps in the distribution network

• Expired/spoiled drugs: a bottle is accepted at the retailer if it went through the distribution network in less than 3 months and was never exposed to temperature > 96 F

• Missing pallet, expected case, illegally cloned tags…

Page 21: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Challenges in RFID Management

• Data-Information Mismatch– RFID raw data: (tag id, reader id, timestamp) – Meaningful information: shoplifting, misplaced inventory, out-of-

stocks; expired drugs, spoiled drugs…

• Incomplete, inaccurate data– Readers miss tags– Readers can pick up tags from overlapping areas

• High-volume data – Readers read constantly, from all tags in range, without line-of-sight– Can create up to millions of terabytes of data in a single day

• Low-latency processing– Up-to-the-second information, time-critical actions

Page 22: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Research Issues

• Real-time event stream processing– Handling duplicate readings/results

– Data cleaning

– Data compression

• Handling incomplete readings– Inferences in event databases

– Inferences over event streams

• Distributed processing– Real time anomaly detection

– Distributed inferences

Page 23: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Adaptive Sensing of Atmosphere

• Environmental monitoring: real-time processing of huge-volume meteorological data

• Challenges– Large volume but limited bandwidth– Real-time processing– Uncertain data– Data archiving and querying the

history

Sense Sense

Send Send

MergeMerge

Detection

Prediction

Page 24: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Managing Uncertain Data

• Sources of data uncertainty1)Sensing noise and partial scanning

2)Data compression3)Lossy wireless links

4) Incomplete merging

• Managing uncertain data– Model sources of data uncertainty– Develop uncertainty calculus to

combine the effects of these sources– Augment results with confidence

values

(1) (1)

(2) (2)

(3) (3)

MergeMerge(4)

Tornado Detection

Prediction (confidence?)

Page 25: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Managing Uncertain Data

• Sources of data uncertainty1)Sensing noise and partial scanning2)Data compression3)Lossy wireless links4) Incomplete merging

• Self diagnosis and tuning– Compare predication at t with

observation at t+1 (no ground truth?!)

– System diagnosis when confidence value is low

– Automatically tune the system

(1) (1)

(2) (2)

(3) (3)

MergeMerge(4)

Tornado Detection

Prediction (confidence?)

Page 26: Course Project Ideas

Yanlei Diao, University of Massachusetts Amherst 04/20/23

Questions