enabling signal processing over stream data
Post on 01-May-2022
2 Views
Preview:
TRANSCRIPT
EnablingSignalProcessingoverStreamDataMilosNikolic*,UniversityofOxfordBadrish Chandramouli,MicrosoftResearchJonathanGoldstein,MicrosoftResearch
*WorkperformedduringinternshipatMSR
SignalsinStreams
• Lotsof“signals”instreamdata• Internet-of-thingsdevices,apptelemetry(e.g.,adclicks)
• IoT workflowscombinerelational&signallogic• Ex:Real-timeapp
M
Group-byID
U
UnionID Time Value0 0:42:19 671 0:42:22 802 0:42:22 850 0:42:23 692 0:42:24 85
RemovenoiseInterpolatemissingdataFindperiodicity
DiscardinvaliddataCorrelatelivedataw/history
σ ⋈ DSP
DSPσ ⋈2
Whichtoolstousetobuildsuchapps?
Dataprocessingexpert
Digitalsignalprocessingexpert
Engines:streamengines,DBMS,MPPsystems
Datamodel:(tempo)-relational
Language:declarative(SQL,LINQ,functional)
Scenarios: real-time,offline,progressive
Engines:MATLAB,R
Datamodel:array
Language:imperative(arraylanguages,C)
Scenarios:mostlyoffline,real-time
3
How to reconciletwo worlds?
Our solution: • high-performance (2 OOM faster)• one query language• familiar abstractions to both worlds
TypicalDSPWorkflow
Equally-spacedsamplesstoredinarray
1. Window• windowsize&hopsize
2. Perwindow:pipelineDSPops• arraytoarray• Example:spectralanalysis
FFT➞ user-definedfunction➞ IFFT
3. Unwindow• sumoverlappingsegments
x[n]
x2
y[n]
x0x1
y0
y1
y2
Perdevice+
+
4
LooseSystemsIntegrationStreamProcessingEngine+R
• Streamengineforrelationalqueries• Per-groupcomputation,windowing,joins,etc.
• Rforhighly-optimizedDSPoperations
• Problem: impedancemismatch• Highcommunicationoverhead(upto95%)• Impracticalforreal-timeanalysis• Disparatequerylanguages
x2
++
x0x1
y0
y1
y2
R
STREAMPROCESSINGSYSTEM
5
• Performance• 2-4OOMfasterthantoday’sSPE
• Querymodel• Basedontemporalquerymodel(relationalwithtime)
• Real-time,offline,progressivequeries
• Languageintegration• Builtas.NETlibrary• WorkswitharbitraryC#data-types
• Unifiedquerymodel• Non-uniform&uniformsignals• Type-safemixofstream&signaloperators
• Array-basedextensibilityframework• DSPoperatorwriterseesarrays• Supportsincrementalcomputation
• “Walledgarden”ontopofTrill• Nochangesindatamodel• InheritsTrill’sefficientprocessingcapability(e.g.,groupedcomputation)
TRILL DSP
Trill:FastStreamingAnalyticsEngine DSPLibrary
[VLDB2014paper]
7
Tempo-RelationalModel
• Uniformlyrepresentsofflineandonlinedatasetsasstreamdata
Logicaltime
e1e2
e3
e4
e5
Tempo-RelationalModelRelationalModelt1t2t3t4
snapshotsINPUT
Q=COUNT(*)
4
Logicaltime
1 1 1 1212
OUTPUT
Q Q
8
TrillExample(Simplified)• Defineeventdata-typeinC#
struct SensorReading { long SensorId; long Time; double Value; }
• Defineingressvar str = Network.ToStream(e => e.Time);
• Writequery(inC#app)var query = str.Where(e => e.Value < 100)
.Select(e => e.Value)
• Subscribetoresultquery.Subscribe(e => Console.Write(e)); // write results to console
9
Signal=streamw/ooverlappingevents
Time
Inputevents e1
e2
e3
e4
e5 Time
Aggregatedevents
1 1 1 1212
STREAMABLE SIGNALSTREAMABLE
var signal = stream.Where(e => e.Value < 100).Count()STREAMS
SIGNALS
• Transitiontosignaldomain• E.g.,resultofanaggregatequery
• Usingstreamoperatorstobuildsignaloperators• E.g.,addingtwosignalsasatemporaljoinoftwostreamsleft.Join(right, (l, r) => l + r)
Type-safeoperations
10
Uniformly-sampledsignals
• Samplingwithinterpolation
TimeInputevents
misalignedmissing
30 60 90 120 150 180 210
TimeOutputevents
30 60 90 120 150 180 210
interpolated
var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60));
Interpolationwindow
STREAMS
SIGNALS
UNIFORM
11
BringingArrayAbstractionstoDSPUsers
• Initialidea:Window&Unwindow sampleoperators• Window() createsastreamofarrays
• Unwindow() projectsarraysbackintime
• Performanceproblems• Createsdependenciesbetweenwindowsemanticsandsystemperformance• Nodatasharingacrossoverlappingarrays
• Unclearlanguagesemantics• e.g.,streamofarrays:isitasignalornot?
Time
var s = uniformSignal.Window(5,3).FFT()…
Window=5samplesHop=3samples
12
• Exposearraysonly inside thewindowingoperator
WindowingOperatorforDSPUsers
var query = uniformSignal.Window(512, 256,
w => w.FFT().Select(a => f(a)).IFFT(),a => a.Sum())
)
Uniformsignal UniformsignalUNWINAGGFFT f IFFTWIN
• DSPpipeline&arraysinstantiatedonlyonce➞ betterdatamanagement13
User-DefinedOperatorFramework
• DSPexpertswritearray-arrayoperators• Matchestheirexpectations• Allowsoptimizedarray-basedlogic(e.g.,SIMD)
• IncrementalDSPoperators• Frameworkusescirculararraystoavoiddatacopyingwithhoppingwindows
• New&olddataavailableforincrementalcomputation
OLD NEW
WindowHop
FFT f IFFT
14
GroupedComputation
• Group-awareoperators• Onlineprocessingofintertwinedsignals• Onestatepereachgroup
• E.g.,interpolatorkeepsahistoryofsamplesforeachgroup
• StreamingMapReduceinTrill• Parallelexecutiononeachsub-streamcorrespondingtoadistinctgroupingkey
var q = signal.Map(s => s.Select(e => e.Value), e => e.SensorId) .Reduce(s => s.Window(512, 256,
w => w.FFT().Select(a => f(a)).IFFT(), a => a.Sum()))
15
Performance:FFTwithtumblingwindow
0
2
4
6
8
10
12
128 256 512 1024 2048WINDOWSIZE
TrillDSP WaveScope MATLAB R
Window➞ FFT➞ Unwindow
RUNNINGTIME(secs)
Pre-loadeddatasetsinmemory
PureDSPtask• TrillDSP usesFFTWlibrary
ComparabletobestDSPtools
16
4
8
16
32
64
128
256 230 179 128 76 25
HOPSIZE
TrillDSP(1core) MATLABSparkR(16cores) SciDB-R(16cores)
Performance:Grouping+DSPPersensor:WindowedFFT➞ Function➞ InverseFFT➞ Unwindow
NORMALIZEDTIMETOTRILLDSPON16CORESPre-loadeddatasetsinmemory• 100groupsinstream
Upto2OOMfasterthanothersPerformancebenefitsfrom:• Efficientgroupprocessing,group-awareDSPwindowing
• Usingcirculararraystomanageoverlappingwindows
• TrillDSP usesFFTWlibrary
17
Conclusion
• Appsmixrelational&signallogic• Perdevice:findperiodicityinsignals,interpolatemissingdata,recovernoisydata
• Differentdatamodels:relationalvs.array
• ExistingqueryprocessorsintegratedwithR• Impedancemismatch➞ highperformanceoverhead➞ notsuitableforreal-time
• TrillDSP =Relationalprocessing+Signalprocessing• Unifiedquerymodelforrelationalandsignaldata,forbothreal-timeandoffline
• Givesuserstheviewtheyarecomfortablewith• Avoidsimpedancemismatchbetweencomponents
18
Upto2OOM fasterthansystemsintegratedw/R
top related