E.G.M. Petrakis Searching Signals and Patterns
Searching Signals and Patterns
Given a query Q and a collection of N objects O1, O2, …, ON, search exactly or approximately
The ideal method should be:
  Fast: faster than sequential scanning
  Correct: returns all qualifying objects
  Dynamic: allows for insertions, deletions, and updates
Similarity Queries
Range queries: find all objects within distance e from the query
  D(Q,O) < e, where D and e are user defined
Nearest Neighbor (NN): find the k most similar objects
All-pairs (“spatial join”) queries: find all pairs of objects Oi, Oj within distance e of each other
  D(Oi,Oj) < e
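The three query types can be sketched with plain sequential scanning, which is the baseline an index has to beat. This is only an illustration: the Euclidean distance and the function names `range_query`, `knn_query` and `all_pairs` are assumptions, since the slides leave D user defined.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed choice of D)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(query, objects, eps, dist=euclidean):
    """Indices of all objects with dist(query, object) < eps."""
    return [i for i, o in enumerate(objects) if dist(query, o) < eps]

def knn_query(query, objects, k, dist=euclidean):
    """Indices of the k objects closest to the query."""
    order = sorted(range(len(objects)), key=lambda i: dist(query, objects[i]))
    return order[:k]

def all_pairs(objects, eps, dist=euclidean):
    """All index pairs (i, j), i < j, whose objects lie within eps of each other."""
    return [(i, j) for i, j in combinations(range(len(objects)), 2)
            if dist(objects[i], objects[j]) < eps]

objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (1.0, 1.0)]
query = (0.0, 0.0)
print(range_query(query, objects, eps=1.5))  # [0, 1, 3]
print(knn_query(query, objects, k=2))        # [0, 1]
print(all_pairs(objects, eps=1.5))           # [(0, 1), (0, 3), (1, 3)]
```

Every call touches all N objects (or all pairs), which is the cost the indexing methods in the following slides try to avoid.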
Similarity Queries (cont'd)
Whole matching: the whole query Q matches an object Oi
  e.g., the image is 512x512, the query is 512x512
Partial matching: the query specifies only a part of an object
  find parts of objects that match the query
  e.g., the images are 512x512, the query is 32x32
Object Types
1D signals:
  time sequences
  scientific data
  digitized voice or music
2D signals:
  digitized images (gray scale, color)
  video clips
General objects: text, multimedia documents
Applications
In many applications, searching for similar patterns helps in prediction, decision making, data mining, etc.
  Financial
  Marketing & production of 1D signals
  Scientific databases
  DNA/genome databases
  Audio databases
  Image and video databases
Queries
Find companies whose stock prices move similarly or with similar pattern of growth
Find products with similar selling patterns
Find if a musical score is similar to one of the copyrighted scores
Find images that look like a sunset
Find X-rays showing a lung tumor
Indexing [Agrawal et al. 93]
To search faster than sequential scanning, the objects are indexed:
  Extract f features from each object and index them with a spatial access method (SAM)
  Search the SAM to retrieve promising objects
  Clean up the response (discard false alarms)
The indexing method must be correct (i.e., have no “misses”), have small space overhead, and be dynamic
Mapping Objects to Space
Objects are mapped to points
A query Q becomes a sphere with radius e
Mapping Objects to Points
F( ): mapping function
Df: object distance in feature space
D: object distance in actual space
How should F( ) and Df be selected?
Ideally, Df(Qi,Oj) = D(Qi,Oj): the mapping preserves the distances
The mapping should guarantee no misses
GEMINI [Faloutsos 96]
GEMINI: Generic Multimedia Indexing
1. Define F( ): mapping of objects to f features (objects become vectors)
2. Determine the distance function Df in the f-dimensional space
3. Guarantee correctness: prove that Df <= D
4. Apply a SAM (e.g., R-tree) to index the f-dimensional vectors
5. Apply the Search Algorithm to eliminate false drops
Search Algorithm
Problem: retrieve all objects satisfying D(Q,O) < e
1. Retrieve the points satisfying Df(Q,O) < e
2. Retrieve the actual objects S
3. Keep only those satisfying D(Q,S) < e (discard false alarms)
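The filter-and-refine search can be sketched as follows. This is a minimal illustration under stated assumptions: the feature mapping `features` (keep the first f coordinates) is a hypothetical choice that happens to satisfy Df <= D for Euclidean distance, and a linear scan over the feature vectors stands in for the SAM search.

```python
import math

def dist(a, b):
    """Euclidean distance, used both as D (full objects) and Df (feature vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def features(obj, f=2):
    """Hypothetical F(): keep the first f coordinates, so that Df <= D holds."""
    return obj[:f]

def gemini_range_search(query, objects, eps, f=2):
    fq = features(query, f)
    # Filter step: a linear scan over feature vectors stands in for the SAM (assumption).
    candidates = [i for i, o in enumerate(objects)
                  if dist(fq, features(o, f)) < eps]
    # Refine step: compute the actual distance and discard false alarms.
    answers = [i for i in candidates if dist(query, objects[i]) < eps]
    return candidates, answers

objects = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 3, 0), (4, 4, 0, 0)]
query = (0, 0, 0, 0)
candidates, answers = gemini_range_search(query, objects, eps=2.0)
print(candidates)  # [0, 1, 2]: object 2 looks close in feature space only
print(answers)     # [0, 1]: object 2 was a false alarm
```

Object 2 passes the filter because its first two coordinates are close to the query, and the refine step removes it once the full distance is computed.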
Lower Bounding
Lemma: to guarantee no false dismissals, F( ) should satisfy
  Df(Q,Oi) <= D(Q,Oi) for all Q, Oi
Proof: show that if an object qualifies for the query, it will be retrieved in the feature space
  An object Oi qualifies if D(Q,Oi) <= e; since Df(Q,Oi) <= D(Q,Oi), we have Df(Q,Oi) <= e, so Oi is retrieved
Indexing 1D Signals
Find all signals S = (s1, s2, …, sn) within distance e from Q = (q1, q2, …, qn)
  D(Q,S) < e
  si, qi: amplitudes at time i
D is defined as the Euclidean distance
  $D(Q,S) = \left( \sum_{i=1}^{n} |s_i - q_i|^2 \right)^{1/2}$
Apply GEMINI
But how are F( ) and Df( ) defined?
Definition of F, Df
DFT maps signals s = (s1, s2, …, sn) to the frequency spectrum S = (S1, S2, …, Sn)
F( ) takes the first fc Fourier coefficients
  fc: “cut-off” frequency (e.g., fc = 5)
Signals become points in an f = 2fc dimensional space (because the coefficients Si are complex numbers)
Df is defined as
  $D_f(Q,S) = \left( \sum_{i=1}^{f_c} |S_i - Q_i|^2 \right)^{1/2}$
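A small sketch of this feature extraction, assuming a DFT scaled by 1/sqrt(n) and a direct O(n^2) implementation for readability. The names `dft`, `extract_features` and `feature_distance` are illustrative, not taken from the cited papers.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier transform scaled by 1/sqrt(n) (assumed normalization)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)) / math.sqrt(n)
            for k in range(n)]

def extract_features(signal, fc):
    """F(): real and imaginary parts of the first fc coefficients (2*fc numbers)."""
    feats = []
    for c in dft(signal)[:fc]:
        feats.extend((c.real, c.imag))
    return feats

def feature_distance(fa, fb):
    """Df: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

s = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
q = [1.0, 2.0, 3.0, 5.0, 3.0, 2.0, 1.0, 0.0]
fs, fq = extract_features(s, fc=2), extract_features(q, fc=2)
print(len(fs))                   # 4, i.e. f = 2 * fc
print(feature_distance(fs, fq))  # about 0.5; the time-domain distance is 1.0
```

The two signals differ in one sample by 1, so their time-domain distance is 1, while the distance over two coefficients is about 0.5, consistent with the feature distance underestimating the actual distance.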
Df Lower Bounds D
Let S, Q be the DFTs of s, q
Parseval's theorem: the energy in the time and frequency domains is the same (for the DFT normalized by $1/\sqrt{n}$)
  $\sum_i |s_i|^2 = \sum_i |S_i|^2$
This implies that
  $\|S - Q\|^2 = \|s - q\|^2$
and Df(Q,S) <= D(q,s), because Df is computed using only fc <= n of the terms
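The lower-bounding argument can be checked numerically. The sketch below assumes a DFT scaled by 1/sqrt(n), for which Parseval's theorem holds without extra constants; the signals and the helper names `dft` and `energy` are made up for illustration.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier transform scaled by 1/sqrt(n) (assumed normalization)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)) / math.sqrt(n)
            for k in range(n)]

def energy(values):
    """Sum of squared magnitudes; works for real and complex values."""
    return sum(abs(v) ** 2 for v in values)

s = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
q = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
S, Q = dft(s), dft(q)

# Distance in the time domain and over all n frequency coefficients.
time_dist = math.sqrt(energy([a - b for a, b in zip(s, q)]))
full_freq_dist = math.sqrt(energy([a - b for a, b in zip(S, Q)]))

# Df over the first fc coefficients, for every possible cut-off fc.
df_by_cutoff = [math.sqrt(energy([a - b for a, b in zip(S[:fc], Q[:fc])]))
                for fc in range(1, len(s) + 1)]

print(abs(energy(s) - energy(S)) < 1e-9)               # True: same energy
print(abs(time_dist - full_freq_dist) < 1e-9)          # True: same distance
print(all(df <= time_dist + 1e-9 for df in df_by_cutoff))  # True: Df <= D
```

Each additional coefficient adds a non-negative term, so Df grows with fc and reaches D when all n coefficients are used.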
Experiments
Faster than sequential scanning for all set sizes
More coefficients: more accurate but slower
The trade-off reaches an equilibrium for f = 3 or 4
Intuition
For the majority of 1D signals there will be a few frequencies with high amplitudes
If we index only the first few coefficients (fc < 5 or 10), we shall have only a few false drops
R-trees can handle up to 20 dimensions for point data
NN Queries [Korn et al. 98]
Find the k-NNs of query Q:
1. Search the SAM to find the k-NNs [e.g., Rous95] using Df
2. Compute D for all these k objects
3. Let E = max{D(q,si)}, 1 <= i <= k
4. Issue a range query Df(q,s) <= E on the SAM and retrieve a new set of objects
5. Compute their actual distances D(q,s)
6. Output the nearest k objects
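The six steps can be sketched as below. This is an illustration under assumptions: sorting and a linear scan over feature vectors stand in for the SAM's k-NN and range searches, and `features` (keep the first coordinate) is a hypothetical lower-bounding mapping.

```python
import math

def dist(a, b):
    """Euclidean distance, used as both D and Df."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def features(obj, f=1):
    """Hypothetical F(): keep the first f coordinates, so that Df <= D."""
    return obj[:f]

def knn_search(query, objects, k, f=1):
    fq = features(query, f)
    feats = [features(o, f) for o in objects]
    # Step 1: k-NN in feature space (a sort stands in for the SAM search).
    by_df = sorted(range(len(objects)), key=lambda i: dist(fq, feats[i]))[:k]
    # Steps 2-3: actual distances of these k objects and their maximum E.
    big_e = max(dist(query, objects[i]) for i in by_df)
    # Step 4: range query Df <= E in feature space.
    candidates = [i for i in range(len(objects)) if dist(fq, feats[i]) <= big_e]
    # Steps 5-6: actual distances of the candidates, keep the nearest k.
    return sorted(candidates, key=lambda i: dist(query, objects[i]))[:k]

objects = [(0.0, 9.0), (1.0, 0.0), (2.0, 0.5), (0.1, 8.0), (3.0, 0.0)]
query = (0.0, 0.0)
print(knn_search(query, objects, k=2))  # [1, 2]
```

In this example step 1 alone would return objects 0 and 3, which are nearest in feature space but far in actual distance; the range query with radius E brings in the true neighbors 1 and 2.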
Correctness of NN Algorithm
Lemma: the algorithm has no misses
Proof: let sk be the k-th NN of q and sl the l-th NN (l <= k), so that D(q,sl) <= D(q,sk); show that sl is retrieved
  The k objects of steps 1-3 all have actual distance at most E, so the k-th NN satisfies D(q,sk) <= E
  If the algorithm did not retrieve sl, then the range query (step 4) has missed it: Df(q,sl) > E
  From lower bounding: D(q,sl) >= Df(q,sl) > E (*)
  However, D(q,sl) <= D(q,sk) <= E, which contradicts (*)
Partial Matching [Faloutsos 94]
Problem: given N data sequences S1, S2, …, SN and a query Q, locate data subsequences that match a query subsequence
  e.g., locate stock prices with similar monthly patterns of growth
  extract f features, apply a SAM, etc.
Methodology
Locate a matching window of length w on the signal (length(S) - w + 1 positions)
Assume a minimum query length w
  the method handles any query of length w or more
  shorter queries are of no interest
Longer queries are split into w-queries (pieces of length w)
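The window bookkeeping can be sketched as follows. The function names are illustrative, and dropping a trailing remainder shorter than w when splitting a query is an assumption made here for simplicity.

```python
def sliding_windows(signal, w):
    """All length-w windows of the signal: len(signal) - w + 1 positions."""
    return [signal[i:i + w] for i in range(len(signal) - w + 1)]

def split_query(query, w):
    """Split a query of length >= w into consecutive, non-overlapping pieces
    of length w. A trailing remainder shorter than w is dropped (assumption)."""
    if len(query) < w:
        raise ValueError("queries shorter than w are not supported")
    return [query[i:i + w] for i in range(0, len(query) - w + 1, w)]

signal = [1, 2, 3, 4, 5, 6]
print(sliding_windows(signal, w=3))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]  (6 - 3 + 1 = 4 positions)
print(split_query([0, 1, 2, 3, 4, 5, 6], w=3))
# [[0, 1, 2], [3, 4, 5]]
```

Each window would then be mapped to a point in feature space, so consecutive windows of one signal trace a trail of points.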
Splitting a Query
[Figure: sequences S = (s1, s2, s3) and S' = (s'1, s'2) and query Q = (q1, q2) mapped to the feature space; labels: q1, q2, s1, s2, s3, s'1, s'2, distance e (marked twice), axes F1 and F2]
Indexing Subsequences
I-naive method: index all w-trails
  Inefficient in terms of space and speed
  1:f increase in storage; tall, slow R-tree
ST-index: index the w-trails in groups
  Subsequent trails are similar
  Grouping in the f-dimensional feature space
  Index rectangles containing similar trails
Grouping of Subsequences
Organize w-trails in the f space in rectangles so that disk accesses are minimized
  Fixed number of points per rectangle, but which is the optimal number?
  Smaller rectangles, fewer disk accesses
A rectangle L = (l1, l2, …, ln) causes Π(li + 0.5) accesses
A rectangle holding m points causes Π(li + 0.5)/m accesses per point
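The two cost formulas can be written out directly. This is a sketch with illustrative names; it assumes the side lengths li are measured in a feature space scaled to the unit cube, which the slide does not state.

```python
from math import prod

def disk_accesses(sides):
    """Estimated accesses for a rectangle with side lengths l_i: product of (l_i + 0.5)."""
    return prod(l + 0.5 for l in sides)

def marginal_cost(sides, m):
    """Estimated accesses per point for a rectangle that holds m points."""
    return disk_accesses(sides) / m

def bounding_sides(points):
    """Side lengths of the smallest axis-aligned rectangle around the points."""
    return [max(c) - min(c) for c in zip(*points)]

points = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.25)]
sides = bounding_sides(points)
print(sides)                              # [0.5, 0.25]
print(disk_accesses(sides))               # 0.75
print(marginal_cost(sides, len(points)))  # 0.25
```

A rectangle around a single point has all sides equal to 0 and therefore costs 0.5 raised to the number of dimensions, so grouping nearby points lowers the cost per point.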
I-Adaptive Algorithm
Map the points of w-trails into rectangles in the f space
Assign the first point of a w-trail to a rectangle
For each successive point: if including it increases the per-point cost of the rectangle, start a new rectangle; otherwise include it in the same rectangle
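A greedy sketch of this grouping rule, assuming the cost of a rectangle is the per-point estimate Π(li + 0.5)/m from the previous slide. The function names and the sample trail are made up for illustration.

```python
from math import prod

def marginal_cost(points):
    """Per-point cost of the bounding rectangle: product of (l_i + 0.5), divided by m."""
    sides = [max(c) - min(c) for c in zip(*points)]
    return prod(l + 0.5 for l in sides) / len(points)

def adaptive_grouping(trail):
    """Greedily split a trail of feature points into groups (one rectangle each)."""
    groups = [[trail[0]]]  # the first point opens the first rectangle
    for p in trail[1:]:
        current = groups[-1]
        if marginal_cost(current + [p]) > marginal_cost(current):
            groups.append([p])  # including p would raise the per-point cost
        else:
            current.append(p)   # p fits: the per-point cost does not increase
    return groups

trail = [(0.0, 0.0), (0.125, 0.0), (0.25, 0.0), (4.0, 4.0), (4.125, 4.0)]
groups = adaptive_grouping(trail)
print([len(g) for g in groups])  # [3, 2]
```

The jump from (0.25, 0.0) to (4.0, 4.0) would inflate the first rectangle, so a second rectangle is started there, giving a variable number of points per rectangle.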
Naïve Method
Fixed number of points per rectangle
I-Adaptive Method
Variable number of points per rectangle
Smaller rectangles, fewer disk accesses
Range Queries [Petrakis 02]
Input: query Q, distances D, Df, tolerance e
Output: signals S satisfying D(Q,S) <= e
1. Decompose Q = (q1, q2, …, qn)
2. Apply Df(qi,sj) <= e, store results in Ai
3. Compute $A = \bigcup_{i=1}^{n} A_i$
4. For each S in A compute D(Q,S)
5. Output sequences satisfying D(Q,S) <= e
NN Queries [Petrakis 02]
Input: query Q, distances D, Df, number k
Output: the k sequences most similar to Q
1. Decompose Q = (q1, q2, …, qn)
2. Apply a k-NN query for each qi
   Retrieve k distinct w-trails (incremental k-NN search) [Hjaltason 99]
   Compute ei, their max distance from Q
3. Compute e = min{ei}
4. Apply a range query D(Q,S) <= e
5. Output the k sequences closest to Q
References
R. Agrawal, C. Faloutsos, A. Swami: “Efficient Similarity Search in Sequence Databases”, Proc. of FODO Conf., Oct. 1993
C. Faloutsos, M. Ranganathan, Y. Manolopoulos: “Fast Subsequence Matching in Time-Series Databases”, Proc. of ACM SIGMOD, May 1994
F. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas: “Fast and Effective Retrieval of Medical Tumor Shapes”, IEEE TKDE, Vol. 10, 1998
E.G.M. Petrakis: “Fast Retrieval by Spatial Structure in Image DataBases”, Journal of Visual Languages and Computing, 2002 (to appear)
N. Roussopoulos, S. Kelley, F. Vincent: “Nearest Neighbor Queries”, Proc. of ACM SIGMOD, May 1995
G. R. Hjaltason, H. Samet: “Distance Browsing in Spatial Databases”, ACM Trans. on Database Systems, 24(2):265–318, 1999