Aggregate features for relational data
Claudia Perlich, Foster Provost
Pat Tressel
16-May-2005

Overview
• Perlich and Provost provide...
  – Hierarchy of aggregation methods
  – Survey of existing aggregation methods
  – New aggregation methods
• Concerned w/ supervised learning only
  – But much seems applicable to clustering

The issues…
• Most classifiers use feature vectors
  – Individual features have fixed arity
  – No links to other objects
• How do we get feature vectors from relational data?
  – Flatten it:
    • Joins
    • Aggregation
• (Are feature vectors all there are?)

Joins
• Why consider them?
  – Yield flat feature vectors
  – Preserve all the data
• Why not use them?
  – They emphasize data with many references
    • Ok if that’s what we want
    • Not ok if sampling was skewed
    • Cascaded or transitive joins blow up

Joins
  – They emphasize data with many references:
    • Lots more Joes than there were before...
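
The "lots more Joes" effect can be sketched with two toy tables. The table contents, the field names (`pid`, `name`, `item`) and the helper `inner_join` are all hypothetical, chosen only to show how a 1:n join repeats the target row once per related row.

```python
from collections import Counter

# Hypothetical toy tables: one target row per person, many purchase rows.
people = [{"pid": 1, "name": "Joe"}, {"pid": 2, "name": "Ann"}]
purchases = [
    {"pid": 1, "item": "book"},
    {"pid": 1, "item": "pen"},
    {"pid": 1, "item": "lamp"},
    {"pid": 2, "item": "book"},
]

def inner_join(left, right, key):
    """Naive nested-loop inner join on a shared key field."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

joined = inner_join(people, purchases, "pid")

# After the join, each person appears once per purchase.
name_counts = Counter(row["name"] for row in joined)
print(name_counts)
```

Joe had one row in the target table but three in the joined result, so any learner fed the flattened rows sees him three times as often as Ann.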

Joins
• Why not use them?
  – What if we don’t know the references?
    • Try out everything with everything else
    • Cross product yields all combinations
    • Adds fictitious relationships
    • Combinatorial blowup

Aggregates
• Why use them?
  – Yield flat feature vectors
  – No blowup in number of tuples
    • Can group tuples in all related tables
  – Can keep as detailed stats as desired
    • Not just max, mean, etc.
    • Parametric dists from sufficient stats
    • Can apply tests for grouping
  – Choice of aggregates can be model-based
    • Better generalization
    • Include domain knowledge in model choice

Aggregates
• Anything wrong with them?
  – Data is lost
  – Relational structure is lost
  – Influential individuals are lumped in
    • Doesn’t discover critical individuals
    • They can dominate other data
  – Any choice of aggregates assumes a model
    • What if it’s wrong?
  – Adding new data can require recalculation
    • But can avoid issue by keeping sufficient statistics

Taxonomy of aggregates
• Why is this useful?
  – Promote deliberate use of aggregates
  – Point out gaps in current use of aggregates
  – Find appropriate techniques for each class
• Based on “complexity” due to:
  – Relational structure
    • Cardinality of the relations (1:1, 1:n, m:n)
  – Feature extraction
    • Computing the aggregates
  – Class prediction

Taxonomy of aggregates
• Formal statement of the task:
  $y = \Phi(t, \Psi(\sigma(t, \Omega)))$
• Notation (here and on following slides):
  – Caution! Simplified from what’s in the paper!
  – t, tuple (from “target” table T, with main features)
  – y, class (known per t if training)
  – Ψ, aggregation function
  – Φ, classification function
  – σ, select operation (where joins preserve t)
  – Ω, all tables; B, any other table, b a tuple in B
  – u, fields to be added to t from joined tables
  – f, a field in u
  – More, that doesn’t fit on this slide

Aggregation complexity
• Simple
  – One field from one object type
• Denoted by: $\Psi_s$

Aggregation complexity
• Multi-dimensional
  – Multiple fields, one object type
• Denoted by: $\Psi_m$

Aggregation complexity
• Multi-type
  – Multiple object types
• Denoted by: $\Psi_t$

Relational “concept” complexity
• Propositional
  – No aggregation
  – Single tuple, 1-1 or n-1 joins
    • n-1 is just a shared object
  – Not relational per se – already flat
  $y = \Phi(t)$
  $y = \Phi(t, \sigma_{B.*,\ t.key = b.key}(T \bowtie_{1:1} B))$
  $y = \Phi(t, \sigma_{B.*,\ t.key = b.key}(T \bowtie_{n:1} B))$

Relational “concept” complexity
• Independent fields
  – Separate aggregation per field
  – Separate 1-n joins with T
  $y = \Phi(t, \Psi_s(\sigma_{B.f_i,\ t.key = b.key}(T \bowtie_{1:n} B)))$

Relational “concept” complexity
• Dependent fields in same table
  – Multi-dimensional aggregation
  – Separate 1-n joins with T
  $y = \Phi(t, \Psi_m(\sigma_{B.*,\ t.key = b.key}(T \bowtie_{1:n} B)))$

Relational “concept” complexity
• Dependent fields over multiple tables
  – Multi-type aggregation
  – Separate 1-n joins, still only with T
  $y = \Phi(t, \Psi_t(\sigma_{A.*,\ t.key = a.key}(T \bowtie_{1:n} A),\ \sigma_{B.*,\ t.key = b.key}(T \bowtie_{1:n} B)))$

Relational “concept” complexity
• Global
  – Any joins or combinations of fields
    • Multi-type aggregation
    • Multi-way joins
    • Joins among tables other than T
  $y = \Phi(t, \Psi_t(\sigma_{*}(\Omega)))$

Current relational aggregation
• First-order logic
  – Find clauses that directly predict the class
    • Φ is OR
  – Form binary features from tests
    • Logical and arithmetic tests
    • These go in the feature vector
    • Φ is any ordinary classifier

Current relational aggregation
• The usual database aggregates
  – For numerical values:
    • mean, min, max, count, sum, etc.
  – For categorical values:
    • Most common value
    • Count per value
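
A minimal sketch of these usual aggregates over one bag of related values. The function names and the sample bags are hypothetical, not taken from the paper.

```python
from collections import Counter
from statistics import mean

def numeric_aggregates(values):
    """Standard SQL-style aggregates for a bag of numbers."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "sum": sum(values),
    }

def categorical_aggregates(values):
    """Most common value and count per value for a bag of categories."""
    counts = Counter(values)
    return {
        "most_common": counts.most_common(1)[0][0],
        "count_per_value": dict(counts),
    }

# Hypothetical bags of values related to a single target tuple t.
amounts = [10.0, 30.0, 20.0]
colors = ["red", "blue", "red"]

num = numeric_aggregates(amounts)
cat = categorical_aggregates(colors)
print(num)
print(cat)
```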

Current relational aggregation
• Set distance
  – Two tuples, each with a set of related tuples
  – Distance metric between related fields
    • Euclidean for numerical data
    • Edit distance for categorical
  – Distance between sets is distance of closest pair
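
The closest-pair rule can be sketched as follows. This assumes purely numerical related tuples and plugs in Euclidean distance; an edit distance for categorical fields would slot in through the `pair_distance` parameter, which is a hypothetical name.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def set_distance(set_a, set_b, pair_distance=euclidean):
    """Distance between two sets = distance of their closest pair."""
    return min(pair_distance(a, b) for a in set_a for b in set_b)

# Hypothetical bags of related tuples for two target tuples.
bag_a = [(0.0, 0.0), (5.0, 5.0)]
bag_b = [(3.0, 4.0), (9.0, 9.0)]

d = set_distance(bag_a, bag_b)
print(d)
```

Here the closest pair is (5, 5) and (3, 4), so the set distance is the square root of 5.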

Proposed relational aggregation
• Recall the point of this work:
  – Tuple t from table T is part of a feature vector
  – Want to augment w/ info from other tables
  – Info added to t must be consistent w/ values in t
  – Need to flatten the added info to yield one vector per tuple t
  – Use that to:
    • Train classifier given class y for t
    • Predict class y for t

Proposed relational aggregation
• Outline of steps:
  – Do query to get more info u from other tables
  – Partition the results based on:
    • Main features t
    • Class y
    • Predicates on t
  – Extract distributions over results for fields in u
    • Get distribution for each partition
    • For now, limit to categorical fields
    • Suggest extension to numerical fields
  – Derive features from distributions

Do query to get info from other tables
• Select
  – Based on the target table T
  – If training, known class y is included in T
  – Joins must preserve distinct values from T
    • Join on as much of T’s key as is present in other table
    • Maybe need to constrain other fields?
    • Not a problem for correctly normalized tables
• Project
  – Include all of t
  – Append additional fields u from joined tables
    • Anything up to all fields from joins

Extract distributions
• Partition query results various ways, e.g.:
  – Into cases per each t
    • For training, include the (known) class y in t
  – Also (if training) split per each class
    • Want this for class priors
  – Split per some (unspecified) predicate c(t)
• For each partition:
  – There is a bag of associated u tuples
    • Ignore the t part – already a flat vector
  – Split vertically to get bags of individual values per each field f in u
    • Note this breaks association between fields!

Distributions for categorical fields
• Let categorical field be f with values $f_i$
• Form histogram for each partition
  – Count instances of each value $f_i$ of f in a bag
  – These are sufficient statistics for:
    • Distribution over $f_i$ values
    • Probability of each bag in the partition
• Start with one per each tuple t and field f
  – $C^f_t$, (per-) case vector
  – Component $C^f_t[i]$, count for $f_i$
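
A minimal sketch of building case vectors from joined rows. The row layout, the key name `tid`, the field `color` and the function `case_vectors` are hypothetical. The fixed `values` list plays the role of the enumeration of the $f_i$.

```python
from collections import Counter, defaultdict

def case_vectors(rows, key, field, values):
    """One histogram (case vector) per target tuple, over a fixed value order."""
    bags = defaultdict(Counter)
    for row in rows:
        bags[row[key]][row[field]] += 1
    # Component i of each vector is the count of values[i] in that tuple's bag.
    return {t: [bag[v] for v in values] for t, bag in bags.items()}

# Hypothetical joined rows: target key plus one categorical field from u.
joined = [
    {"tid": 1, "color": "red"},
    {"tid": 1, "color": "red"},
    {"tid": 1, "color": "blue"},
    {"tid": 2, "color": "green"},
]
values = ["red", "green", "blue"]

C = case_vectors(joined, "tid", "color", values)
print(C)
```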

Distributions for categorical fields
• Distribution of histograms per predicate c(t) and field f
  – Treat histogram counts as random variables
    • Regard c(t) true partition as a collection of histogram “samples”
    • Regard histograms as vectors of random variables, one per field value $f_i$
  – Extract moments of these histogram count distributions
    • mean (sort of) – reference vector
    • variance (sort of) – variance vector

Distributions for categorical fields
• Net histogram per predicate c(t), field f
  – c(t) partitions tuples t into two groups
    • Only histogram the c(t) true group
    • Could include ~c as a predicate if we want
  – Don’t re-count!
    • Already have histograms for each t and f – the case vectors
    • Sum the case vectors columnwise
  – Call this a “reference vector”, $R^f_c$
    • Proportional to average histogram over t for c(t) true (weighted by # samples per t)

Distributions for categorical fields
• Variance of case histograms per predicate c(t) and field f
  – Define “variance vector”, $V^f_c$
    • Columnwise sum of squares of case vectors / number of samples with c(t) true
    • Not an actual variance
      – Squared means not subtracted
    • Don’t care:
      – It’s indicative of the variance...
      – Throw in means-based features as well to give classifier full variance info
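
A sketch of the reference and variance vectors for one predicate, here "class is positive". It assumes "number of samples" means the number of case vectors with c(t) true, which the slides leave ambiguous. The data and function names are hypothetical. The last step shows how the true per-column variance is recovered by subtracting the squared mean.

```python
def reference_vector(case_vecs):
    """Columnwise sum of the case vectors in a partition."""
    return [sum(col) for col in zip(*case_vecs)]

def variance_vector(case_vecs):
    """Columnwise mean of squares (a raw second moment, not a true variance)."""
    n = len(case_vecs)  # assumption: n counts case vectors, not u tuples
    return [sum(x * x for x in col) / n for col in zip(*case_vecs)]

# Hypothetical case vectors keyed by target tuple, with known classes.
cases = {1: [2, 0, 1], 2: [0, 1, 0], 3: [1, 1, 0]}
labels = {1: 1, 2: 0, 3: 1}

positive = [cases[t] for t in cases if labels[t] == 1]
R_pos = reference_vector(positive)
V_pos = variance_vector(positive)

# True variance = second moment minus squared mean.
n = len(positive)
true_var = [v - (r / n) ** 2 for v, r in zip(V_pos, R_pos)]
print(R_pos, V_pos, true_var)
```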

Distributions for categorical fields
• What predicates might we use?
  – Unconditionally true, c(t) = true
    • Result is net distribution independent of t
    • Unconditional reference vector, R
  – Per class k, $c_k(t)$ = (t.y == k)
    • Class priors
    • Recall for training data, y is a field in t
    • Per class reference vector, $R^f_{t.y=k}$

Distributions for categorical fields
• Summary of notation
  – c(t), a predicate based on values in a tuple t
  – f, a categorical field from a join with T
  – $f_i$, values of f
  – $R^f_c$, reference vector
    • histogram over $f_i$ values in bag for c(t) true
  – $C^f_t$, case vector
    • histogram over $f_i$ values for t’s bag
  – R, unconditional reference vector
  – $V^f_c$, variance vector
    • Columnwise average of squared case vectors
  – X[i], i th value in some ref. vector X

Distributions for numerical data
• Same general idea – representative distributions per various partitions
• Can use categorical techniques if we:
  – Bin the numerical values
  – Treat each bin as a categorical value
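
Binning can be sketched as below. The bin edges are hypothetical and fixed by hand; how to choose them is not specified in the slides.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the bin containing value; bin k covers [edges[k-1], edges[k])."""
    return bisect_right(edges, value)

# Hypothetical bin edges giving four bins: <10, [10,20), [20,30), >=30.
edges = [10.0, 20.0, 30.0]
values = [5.0, 10.0, 19.9, 42.0]

bins = [bin_index(v, edges) for v in values]

# Histogram over bins, usable exactly like a categorical case vector.
hist = [bins.count(b) for b in range(len(edges) + 1)]
print(bins, hist)
```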

Feature extraction
• Base features on ref. and variance vectors
• Two kinds:
  – “Interesting” values
    • one value from the case vector per t
    • same column in vector for all t
    • assorted options for choosing column
    • choices depend on predicate ref. vectors
  – Vector distances
    • distance between case vector and predicate ref. vector
    • various distance metrics
• More notation: acronym for each feature type

Feature extraction: “interesting” values
• For a given c, f, select that $f_i$ which is...
  – MOC: Most common overall
    • $\arg\max_i R[i]$
  – Most common in each class
    • For binary class y
      – Positive is y = 1, Negative is y = 0
    • MOP: $\arg\max_i R^f_{t.y=1}[i]$
    • MON: $\arg\max_i R^f_{t.y=0}[i]$
  – Most distinctive per class
    • Common in one class but not in other(s)
    • MOD: $\arg\max_i |R^f_{t.y=1}[i] - R^f_{t.y=0}[i]|$
    • MOM: $\arg\max_i$ MOD / $V^f_{t.y=1}[i] - V^f_{t.y=0}[i]$
      – Normalizes for variance (sort of)
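
A sketch of picking the "interesting" column for MOC, MOP, MON and MOD and reading it out of a case vector. The reference vectors and value names are hypothetical. MOM is left out because its exact normalization is not pinned down on the slide.

```python
def argmax(xs):
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])

values = ["red", "green", "blue"]

# Hypothetical reference vectors: positive class, negative class, overall.
R_pos = [4, 1, 1]
R_neg = [1, 3, 2]
R_all = [p + n for p, n in zip(R_pos, R_neg)]

moc = argmax(R_all)                                        # most common overall
mop = argmax(R_pos)                                        # most common among positives
mon = argmax(R_neg)                                        # most common among negatives
mod = argmax([abs(p - n) for p, n in zip(R_pos, R_neg)])   # most distinctive

# The feature for a case t is its count in the chosen column.
case = [2, 0, 1]
features = {"MOC": case[moc], "MOP": case[mop], "MON": case[mon], "MOD": case[mod]}
print(values[moc], values[mop], values[mon], values[mod], features)
```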

Feature extraction: vector distance
• Distance btw given ref. vector & each case vector
• Distance metrics
  – ED: Edit – not defined
    • Sum of abs. diffs, a.k.a. Manhattan dist?
    • $\sum_i |C[i] - R[i]|$
  – EU: Euclidean
    • $\sqrt{(C - R)^T (C - R)}$, omit √ for speed
  – MA: Mahalanobis
    • $\sqrt{(C - R)^T \Sigma^{-1} (C - R)}$, omit √ for speed
    • Σ should be covariance...of what?
  – CO: Cosine, 1 - cos(angle btw vectors)
    • $1 - C^T R / (\|C\| \, \|R\|)$
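
The four metrics can be sketched with their standard textbook definitions. Two assumptions are made here: "edit" distance is read as Manhattan distance, as the slide guesses, and Mahalanobis uses a diagonal covariance given as a per-column variance list, since the slides do not say which covariance is meant. The vectors are hypothetical.

```python
import math

def manhattan(c, r):
    return sum(abs(a - b) for a, b in zip(c, r))

def euclidean(c, r):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, r)))

def mahalanobis_diag(c, r, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(c, r, variances)))

def cosine_distance(c, r):
    dot = sum(a * b for a, b in zip(c, r))
    norms = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in r))
    return 1.0 - dot / norms

# Hypothetical case vector, reference vector, and per-column variances.
case = [2, 0, 1]
ref = [4, 1, 1]
variances = [4.0, 1.0, 1.0]

ed = manhattan(case, ref)
eu = euclidean(case, ref)
ma = mahalanobis_diag(case, ref, variances)
co = cosine_distance(case, ref)
print(ed, eu, ma, co)
```

In practice the case and reference vectors would likely be normalized to comparable scale before comparison, since the reference vector is a sum over many cases.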

Feature extraction: vector distance
• Apply each metric w/ various ref. vectors
  – Acronym is metric w/ suffix for ref. vector
  – (No suffix): Unconditional ref. vector
  – P: per-class positive ref. vector, $R^f_{t.y=1}$
  – N: per-class negative ref. vector, $R^f_{t.y=0}$
  – D: difference between P and N distances
• Alphabet soup, e.g. EUP, MAD,...

Feature extraction
• Other features added for tests
  – Not part of their aggregation proposal
  – AH: “abstraction hierarchy” (?)
    • Pull into T all fields that are just “shared records” via n:1 references
  – AC: “autocorrelation” aggregation
    • For joins back into T, get other cases “linked to” each t
    • Fraction of positive cases among others

Learning
• Find linked tables
  – Starting from T, do breadth-first walk of schema graph
    • Up to some max depth
    • Cap number of paths followed
  – For each path, know T is linked to last table in path
• Extract aggregate fields
  – Pull in all fields of last table in path
  – Aggregate them (using new aggregates) per t
  – Append aggregates to t
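
The breadth-first walk of the schema graph can be sketched as below. The schema, the table names and the function `schema_paths` are hypothetical. The rule that a path never revisits a table is an assumption made to keep the example small; the autocorrelation feature mentioned earlier would need paths that lead back into T.

```python
from collections import deque

def schema_paths(schema, target, max_depth, max_paths):
    """Breadth-first enumeration of join paths starting at the target table."""
    paths = []
    queue = deque([[target]])
    while queue and len(paths) < max_paths:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)  # T is linked to the last table on this path
        if len(path) - 1 < max_depth:
            for neighbor in schema.get(path[-1], []):
                if neighbor not in path:  # assumption: no revisiting tables
                    queue.append(path + [neighbor])
    return paths

# Hypothetical schema graph: table -> tables it shares a key with.
schema = {"T": ["A", "B"], "A": ["T", "C"], "B": ["T"], "C": ["A"]}

paths = schema_paths(schema, "T", max_depth=2, max_paths=10)
print(paths)
```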

Learning
• Classifier
  – Pick 10 subsets each w/ 10 features
    • Random choice, weighted by “performance”
    • But there’s no classifier yet...so how do features predict class?
  – Build a decision tree for each feature set
    • Have class frequencies at leaves
      – Features might not completely distinguish classes
    • Class prediction:
      – Select class with higher frequency
    • Class probability estimation:
      – Average frequencies over trees
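
The final combination step can be sketched as follows, assuming each tree has already routed a case to a leaf and reported the positive-class frequency there. The frequencies and function names are hypothetical. Thresholding the averaged estimate at 0.5 is an assumption about how the ensemble prediction is made.

```python
def average_class_probability(leaf_frequencies):
    """Average the positive-class leaf frequencies reported by each tree."""
    return sum(leaf_frequencies) / len(leaf_frequencies)

def predict_class(prob_positive, threshold=0.5):
    """Pick the class with the higher estimated frequency (binary case)."""
    return 1 if prob_positive > threshold else 0

# Hypothetical positive-class frequencies at the leaf each tree assigns to one case.
leaf_freqs = [0.9, 0.6, 0.3, 0.8]

p = average_class_probability(leaf_freqs)
y_hat = predict_class(p)
print(p, y_hat)
```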

Tests
• IPO data
  – 5 tables w/ small, simple schema
    • Majority of fields were in the “main” table, i.e. T
      – The only numeric fields were in main table, so no aggregation of numeric features needed
    • Other tables had key & one data field
    • Max path length 2 to reach all tables, no recursion
    • Predicate on one field in T used as the class
• Tested against:
  – First-order logic aggregation
    • Extract clauses using an ILP system
    • Append evaluated clauses to each t
  – Various ILP systems
    • Using just data in T (or T and AH features?)

Test results
• See paper for numbers
• Accuracy with aggregate features:
  – Up to 10% increase over only features from T
  – Depends on which and how many extra features used
  – Most predictive feature was in a separate table
  – Expect accuracy increase as more info available
  – Shows info was not destroyed by aggregation
  – Vector distance features better
• Generalization

Interesting ideas (“I”) & benefits (“B”)
• Taxonomy
  – I: Division into stages of aggregation
    • Slot in any procedure per stage
    • Estimate complexity per stage
  – B: Might get the discussion going
• Aggregate features
  – I: Identifying a “main” table
    • Others get aggregated
  – I: Forming partitions to aggregate over
    • Using queries with joins to pull in other tables
    • Abstract partitioning based on predicate
  – I: Comparing case against reference histograms
  – I: Separate comparison method and reference

Interesting ideas (“I”) & benefits (“B”)
• Learning
  – I: Decision tree tricks
    • Cut DT induction off short to get class freqs
    • Starve DT of features to improve generalization

Issues
• Some worrying lapses...
  – Lacked standard terms for common concepts
    • “position i [of vector has] the number of instances of [ith value]”... -> histogram
    • “abstraction hierarchy” -> schema
    • “value order” -> enumeration
    • Defined (and emphasized) terms for trivial and commonly used things
  – Imprecise use of terms
    • “variance” for (something like) second moment
    • I’m not confident they know what Mahalanobis distance is
    • They say “left outer join” and show inner join symbol

Issues
• Some worrying lapses...
  – Did not connect “reference vector” and “variance vector” to underlying statistics
    • Should relate to bag prior and field value conditional probability, not just “weighted”
  – Did not acknowledge loss of correlation info from splitting up joined u tuples in their features
    • Assumes fields are independent
    • Dependency was mentioned in the taxonomy
  – Fig 1 schema cannot support § 2 example query
    • Missing a necessary foreign key reference

Issues
• Some worrying lapses...
  – Their formal statement of the task did not show aggregation as dependent on t
    • Needed for c(t) partitioning
  – Did not clearly distinguish when t did or did not contain class
    • No need to put it in there at all
  – No, the higher Gaussian moments are not all zero!
    • Only the odd central ones are. Yeesh.
    • Correct reason we don’t need them is: all can be computed from mean and variance
  – Uuugly notation

Issues
• Some worrying lapses...
  – Did not cite other uses of histograms or distributions extracted as features
    • “Spike-triggered average” / covariance / etc.
      – Used by: all neurobiology, neurocomputation
      – E.g.: de Ruyter van Steveninck & Bialek
    • “Response-conditional ensemble”
      – Used by: Our own Adrienne Fairhall & colleagues
      – E.g.: Agüera y Arcas, Fairhall & Bialek
    • “Event-triggered distribution”
      – Used by: me ☺
      – E.g.: CSE528 project

Issues
• Some worrying lapses...
  – Did not cite other uses of histograms or distributions extracted as features...
  – So, did not use “standard” tricks
    • Dimension reduction:
      – Treat histogram as a vector
      – Do PCA, keep top few eigenmodes, new features are projections
  – Nor “special” tricks:
    • Subtract prior covariance before PCA
  – Likewise competing the classes is not new

Issues
• Non-goof issues
  – Would need bookkeeping to maintain variance vector for online learning
    • Don’t have sufficient statistics
    • Histograms are actual “samples”
    • Adding new data doesn’t add new “samples”: changes existing ones
    • Could subtract old contribution, add new one
    • Use a triggered query
  – Don’t bin those nice numerical variables!
    • Binning makes vectors out of scalars
    • Scalar fields can be ganged into a vector across fields!
    • Do (e.g.) clustering on the bag of vectors
• That’s enough of that
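
The "subtract old contribution, add new one" bookkeeping for online learning can be sketched as below. The class `OnlineVectors` and its fields are hypothetical, and the variance vector is again assumed to be normalized by the number of cases.

```python
class OnlineVectors:
    """Keeps case vectors plus running sums so new rows update in O(1)."""

    def __init__(self, num_values):
        self.cases = {}                    # t -> case vector (histogram)
        self.ref = [0] * num_values        # columnwise sum of case vectors
        self.sum_sq = [0] * num_values     # columnwise sum of squared counts

    def add(self, t, i):
        """Record one new related tuple for case t with value index i."""
        case = self.cases.setdefault(t, [0] * len(self.ref))
        self.sum_sq[i] -= case[i] ** 2     # subtract old contribution
        case[i] += 1
        self.sum_sq[i] += case[i] ** 2     # add new contribution
        self.ref[i] += 1

    def variance_vector(self):
        n = len(self.cases)                # assumption: normalize by case count
        return [s / n for s in self.sum_sq]

# Hypothetical stream of (case, value index) arrivals.
ov = OnlineVectors(3)
for t, i in [(1, 0), (1, 0), (1, 2), (2, 1)]:
    ov.add(t, i)
print(ov.cases, ov.ref, ov.variance_vector())
```

The price of this approach is that the per-case histograms have to be stored, which is exactly the bookkeeping the slide warns about.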