enabling signal processing over stream data

17
Enabling Signal Processing over Stream Data Milos Nikolic * , University of Oxford Badrish Chandramouli, Microsoft Research Jonathan Goldstein, Microsoft Research * Work performed during internship at MSR

Upload: others

Post on 01-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enabling Signal Processing over Stream Data

EnablingSignalProcessingoverStreamDataMilosNikolic*,UniversityofOxfordBadrish Chandramouli,MicrosoftResearchJonathanGoldstein,MicrosoftResearch

*WorkperformedduringinternshipatMSR

Page 2: Enabling Signal Processing over Stream Data

SignalsinStreams

• Lotsof“signals”instreamdata• Internet-of-thingsdevices,apptelemetry(e.g.,adclicks)

• IoT workflowscombinerelational&signallogic• Ex:Real-timeapp

M

Group-byID

U

UnionID Time Value0 0:42:19 671 0:42:22 802 0:42:22 850 0:42:23 692 0:42:24 85

RemovenoiseInterpolatemissingdataFindperiodicity

DiscardinvaliddataCorrelatelivedataw/history

σ ⋈ DSP

DSPσ ⋈2

Whichtoolstousetobuildsuchapps?

Page 3: Enabling Signal Processing over Stream Data

Dataprocessingexpert

Digitalsignalprocessingexpert

Engines:streamengines,DBMS,MPPsystems

Datamodel:(tempo)-relational

Language:declarative(SQL,LINQ,functional)

Scenarios: real-time,offline,progressive

Engines:MATLAB,R

Datamodel:array

Language:imperative(arraylanguages,C)

Scenarios:mostlyoffline,real-time

3

How to reconciletwo worlds?

Our solution: • high-performance (2 OOM faster)• one query language• familiar abstractions to both worlds

Page 4: Enabling Signal Processing over Stream Data

TypicalDSPWorkflow

Equally-spacedsamplesstoredinarray

1. Window• windowsize&hopsize

2. Perwindow:pipelineDSPops• arraytoarray• Example:spectralanalysis

FFT➞ user-definedfunction➞ IFFT

3. Unwindow• sumoverlappingsegments

x[n]

x2

y[n]

x0x1

y0

y1

y2

Perdevice+

+

4

Page 5: Enabling Signal Processing over Stream Data

LooseSystemsIntegrationStreamProcessingEngine+R

• Streamengineforrelationalqueries• Per-groupcomputation,windowing,joins,etc.

• Rforhighly-optimizedDSPoperations

• Problem: impedancemismatch• Highcommunicationoverhead(upto95%)• Impracticalforreal-timeanalysis• Disparatequerylanguages

x2

++

x0x1

y0

y1

y2

R

STREAMPROCESSINGSYSTEM

5

Page 6: Enabling Signal Processing over Stream Data

• Performance• 2-4OOMfasterthantoday’sSPE

• Querymodel• Basedontemporalquerymodel(relationalwithtime)

• Real-time,offline,progressivequeries

• Languageintegration• Builtas.NETlibrary• WorkswitharbitraryC#data-types

• Unifiedquerymodel• Non-uniform&uniformsignals• Type-safemixofstream&signaloperators

• Array-basedextensibilityframework• DSPoperatorwriterseesarrays• Supportsincrementalcomputation

• “Walledgarden”ontopofTrill• Nochangesindatamodel• InheritsTrill’sefficientprocessingcapability(e.g.,groupedcomputation)

TRILL DSP

Trill:FastStreamingAnalyticsEngine DSPLibrary

[VLDB2014paper]

7

Page 7: Enabling Signal Processing over Stream Data

Tempo-RelationalModel

• Uniformlyrepresentsofflineandonlinedatasetsasstreamdata

Logicaltime

e1e2

e3

e4

e5

Tempo-RelationalModelRelationalModelt1t2t3t4

snapshotsINPUT

Q=COUNT(*)

4

Logicaltime

1 1 1 1212

OUTPUT

Q Q

8

Page 8: Enabling Signal Processing over Stream Data

TrillExample(Simplified)• Defineeventdata-typeinC#

struct SensorReading { long SensorId; long Time; double Value; }

• Defineingressvar str = Network.ToStream(e => e.Time);

• Writequery(inC#app)var query = str.Where(e => e.Value < 100)

.Select(e => e.Value)

• Subscribetoresultquery.Subscribe(e => Console.Write(e)); // write results to console

9

Page 9: Enabling Signal Processing over Stream Data

Signal=streamw/ooverlappingevents

Time

Inputevents e1

e2

e3

e4

e5 Time

Aggregatedevents

1 1 1 1212

STREAMABLE SIGNALSTREAMABLE

var signal = stream.Where(e => e.Value < 100).Count()STREAMS

SIGNALS

• Transitiontosignaldomain• E.g.,resultofanaggregatequery

• Usingstreamoperatorstobuildsignaloperators• E.g.,addingtwosignalsasatemporaljoinoftwostreamsleft.Join(right, (l, r) => l + r)

Type-safeoperations

10

Page 10: Enabling Signal Processing over Stream Data

Uniformly-sampledsignals

• Samplingwithinterpolation

TimeInputevents

misalignedmissing

30 60 90 120 150 180 210

TimeOutputevents

30 60 90 120 150 180 210

interpolated

var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60));

Interpolationwindow

STREAMS

SIGNALS

UNIFORM

11

Page 11: Enabling Signal Processing over Stream Data

BringingArrayAbstractionstoDSPUsers

• Initialidea:Window&Unwindow sampleoperators• Window() createsastreamofarrays

• Unwindow() projectsarraysbackintime

• Performanceproblems• Createsdependenciesbetweenwindowsemanticsandsystemperformance• Nodatasharingacrossoverlappingarrays

• Unclearlanguagesemantics• e.g.,streamofarrays:isitasignalornot?

Time

var s = uniformSignal.Window(5,3).FFT()…

Window=5samplesHop=3samples

12

Page 12: Enabling Signal Processing over Stream Data

• Exposearraysonly inside thewindowingoperator

WindowingOperatorforDSPUsers

var query = uniformSignal.Window(512, 256,

w => w.FFT().Select(a => f(a)).IFFT(),a => a.Sum())

)

Uniformsignal UniformsignalUNWINAGGFFT f IFFTWIN

• DSPpipeline&arraysinstantiatedonlyonce➞ betterdatamanagement13

Page 13: Enabling Signal Processing over Stream Data

User-DefinedOperatorFramework

• DSPexpertswritearray-arrayoperators• Matchestheirexpectations• Allowsoptimizedarray-basedlogic(e.g.,SIMD)

• IncrementalDSPoperators• Frameworkusescirculararraystoavoiddatacopyingwithhoppingwindows

• New&olddataavailableforincrementalcomputation

OLD NEW

WindowHop

FFT f IFFT

14

Page 14: Enabling Signal Processing over Stream Data

GroupedComputation

• Group-awareoperators• Onlineprocessingofintertwinedsignals• Onestatepereachgroup

• E.g.,interpolatorkeepsahistoryofsamplesforeachgroup

• StreamingMapReduceinTrill• Parallelexecutiononeachsub-streamcorrespondingtoadistinctgroupingkey

var q = signal.Map(s => s.Select(e => e.Value), e => e.SensorId) .Reduce(s => s.Window(512, 256,

w => w.FFT().Select(a => f(a)).IFFT(), a => a.Sum()))

15

Page 15: Enabling Signal Processing over Stream Data

Performance:FFTwithtumblingwindow

0

2

4

6

8

10

12

128 256 512 1024 2048WINDOWSIZE

TrillDSP WaveScope MATLAB R

Window➞ FFT➞ Unwindow

RUNNINGTIME(secs)

Pre-loadeddatasetsinmemory

PureDSPtask• TrillDSP usesFFTWlibrary

ComparabletobestDSPtools

16

Page 16: Enabling Signal Processing over Stream Data

4

8

16

32

64

128

256 230 179 128 76 25

HOPSIZE

TrillDSP(1core) MATLABSparkR(16cores) SciDB-R(16cores)

Performance:Grouping+DSPPersensor:WindowedFFT➞ Function➞ InverseFFT➞ Unwindow

NORMALIZEDTIMETOTRILLDSPON16CORESPre-loadeddatasetsinmemory• 100groupsinstream

Upto2OOMfasterthanothersPerformancebenefitsfrom:• Efficientgroupprocessing,group-awareDSPwindowing

• Usingcirculararraystomanageoverlappingwindows

• TrillDSP usesFFTWlibrary

17

Page 17: Enabling Signal Processing over Stream Data

Conclusion

• Appsmixrelational&signallogic• Perdevice:findperiodicityinsignals,interpolatemissingdata,recovernoisydata

• Differentdatamodels:relationalvs.array

• ExistingqueryprocessorsintegratedwithR• Impedancemismatch➞ highperformanceoverhead➞ notsuitableforreal-time

• TrillDSP =Relationalprocessing+Signalprocessing• Unifiedquerymodelforrelationalandsignaldata,forbothreal-timeandoffline

• Givesuserstheviewtheyarecomfortablewith• Avoidsimpedancemismatchbetweencomponents

18

Upto2OOM fasterthansystemsintegratedw/R