Knowledge Extraction from Scientific Data
Roy Williams, California Institute of Technology
SDMIV, 24 October 2002
Edinburgh
KE Tools S Data
Scientific Data
Datacubes
• N-dimensional array: spectrum, time-series, image, voxels, hyperspectral image
• Concentration
• Pattern matching
• Integration
Event Sets
• Often derived from pattern matching
• A set of events is a table
• Integrating event sets
• Clustering
Knowledge Extraction
• Concentration: principal components, cluster/outlier finding
• Pattern matching (datacube to event set): from theory or from a training set
• Integration: registration of datacubes, join / crossmatch of event sets
Datacube
Some stars from the DPOSS survey
Datacube
An AVIRIS image of San Francisco Bay
400-2500 nm in 224 bands (R. Green, JPL)
atmospheric absorption
Concentrating Information
e.g. Principal Component Analysis
• Given a set of vectors
• Compute dot products (same as correlations)
• Diagonalize
• Throw out weaker (noise) components
Information concentration: Principal Component Analysis
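The four steps on the slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the talk: the function name `principal_components`, the synthetic data, and the choice to centre the vectors before taking dot products are all assumptions made for the example.

```python
import numpy as np

def principal_components(vectors, keep):
    """Project vectors onto their `keep` strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                   # centre each coordinate (assumed step)
    cov = X.T @ X / (len(X) - 1)             # dot products between coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)   # diagonalize; eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:keep] # keep the strongest, drop the weak (noise)
    return X @ eigvecs[:, order], eigvals[order]

# Synthetic example: 3-D vectors that really vary along one direction, plus noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

scores, variances = principal_components(data, keep=1)
total_variance = np.var(data, axis=0, ddof=1).sum()
fraction_kept = variances[0] / total_variance
print(f"fraction of variance in first component: {fraction_kept:.4f}")
```

Here one component carries nearly all of the variance, so the other two can be discarded as noise.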
Event Sets
Created by pattern matching
• from a known rule
• from a training set
• by finding clusters
Event Set = Table
Column: name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2
Values: 43.4, 87.2, 83.2
Column: name=ID, content=key, units=none, datatype=char
Values: E3948547, E3948545, E3943766
108?
103?
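One way to picture "event set = table" is as a list of columns, each carrying its own metadata. The sketch below is illustrative only: the class names `Column` and `EventSet` are hypothetical, the sample values are read from the slide as 43.4, 87.2, 83.2, pairing them row by row with the three IDs is an assumption, and `display=f6.2` is interpreted as a Fortran-style width/precision format.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One table column plus the metadata fields shown on the slide."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""                      # e.g. "f6.2" (assumed Fortran-style)
    values: list = field(default_factory=list)

    def formatted(self):
        """Render values using the display hint, if one is given."""
        if not self.display:
            return [str(v) for v in self.values]
        spec = self.display[1:] + self.display[0]   # "f6.2" -> "6.2f"
        return [format(v, spec) for v in self.values]

@dataclass
class EventSet:
    """An event set is a table: a list of equally long columns."""
    columns: list

    def rows(self):
        return list(zip(*(c.values for c in self.columns)))

event_id = Column("ID", "key", "none", "char",
                  values=["E3948547", "E3948545", "E3943766"])
longitude = Column("longitude", "Earth coordinate", "degrees", "double",
                   display="f6.2", values=[43.4, 87.2, 83.2])
events = EventSet([event_id, longitude])

print(events.rows())
print(longitude.formatted())
```

Keeping units and content type next to the values lets a downstream service check that two event sets are comparable before joining them.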
Gravitational Lenses
A. Szalay, Johns Hopkins
Pattern matching finds events in datacubes
Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory
Creating Event Sets
Given a set of volcanoes, find a lot more volcanoes
Here we use Singular Value Decomposition
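The slide does not say how the SVD is applied, so the following is only one plausible reading. It learns a low-rank subspace from training patches, then scores a candidate patch by how much of it lies outside that subspace. The synthetic Gaussian "volcano" bumps, the function names, and the rank of 3 are all assumptions for illustration.

```python
import numpy as np

def fit_subspace(training_patches, rank):
    """Learn a mean and a rank-`rank` basis from flattened training patches."""
    X = np.asarray(training_patches, dtype=float).reshape(len(training_patches), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank]                 # rows of vt are orthonormal directions

def residual(patch, mean, basis):
    """Norm of the part of the patch that the subspace cannot explain."""
    x = np.asarray(patch, dtype=float).ravel() - mean
    projection = basis.T @ (basis @ x)
    return float(np.linalg.norm(x - projection))

def bump(size, width, amplitude):
    """Synthetic stand-in for a volcano image: a centred Gaussian bump."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2
    return amplitude * np.exp(-(x**2 + y**2) / (2 * width**2))

rng = np.random.default_rng(0)
training = [bump(9, rng.uniform(1.5, 2.5), rng.uniform(0.8, 1.2)) for _ in range(40)]
mean, basis = fit_subspace(training, rank=3)

r_volcano = residual(bump(9, 2.0, 1.0), mean, basis)                    # looks like training
r_background = residual(rng.normal(scale=0.5, size=(9, 9)), mean, basis)  # pure noise
print(r_volcano, r_background)
```

Sliding this score over a large image and keeping the low-residual locations would produce an event set of candidate detections.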
Supervised Classification
[Figure labels: all sources; stellar, galaxy, compact galaxy; high, medium, low fX/fopt; active dM stars; BLAGN; NELGs; possible hi-z quasar; F/G stars?; normal galaxies?]
[Figure key: symbols = X-ray source counterparts; contours = all optical objects]
Multiparameter data: colour-colour-fX/fopt
Mike Watson, Leicester University
Integrating Datacubes
Find a mapping from one domain to the other
Registration of DPOSS and Hubble Deep Field
Datacube Registration
Movement of ice inferred from registration
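The simplest mapping between two datacubes is a pure translation. Phase correlation is one standard way to recover it, and it is used here as an illustrative stand-in since the slides do not name a registration method. The function name `estimate_shift` and the random test image are assumptions. Real registration would usually also handle rotation, scale, and sub-pixel offsets.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross = np.conj(f_ref) * f_mov
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep phase only
    corr = np.fft.ifft2(cross).real             # peak sits at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts beyond half the image size wrap around to negative offsets.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(0)
image = rng.random((64, 64))
moved = np.roll(image, shift=(5, -3), axis=(0, 1))   # known circular shift

shift = estimate_shift(image, moved)
print(shift)
```

Applying the same estimate to successive images of one scene gives a displacement over time, which is the idea behind inferring ice movement from registration.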
Integrating Event Sets
• Database join
• Fuzzy join, e.g. astronomical crossmatch
• Distributed join: does the Grid do databases?
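A fuzzy join replaces the equality test of a database join with a tolerance. For an astronomical crossmatch the tolerance is an angular radius on the sky. The sketch below is a brute-force, one-directional nearest-neighbour match. The function names, the coordinates, and the 2-arcsecond radius are made up for illustration, and a production crossmatch over large catalogs would need a spatial index rather than an all-pairs distance matrix.

```python
import numpy as np

def unit_vectors(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to 3-D unit vectors."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec):
    """For each source in catalog 1, return (i, j) if its nearest source j
    in catalog 2 lies within the matching radius."""
    v1, v2 = unit_vectors(ra1, dec1), unit_vectors(ra2, dec2)
    chord = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    sep_arcsec = np.degrees(2 * np.arcsin(np.clip(chord / 2, 0, 1))) * 3600
    nearest = sep_arcsec.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest)
            if sep_arcsec[i, j] <= radius_arcsec]

# Two tiny hypothetical catalogs (degrees).
ra_a, dec_a = np.array([10.0, 10.5, 11.0]), np.array([20.0, 20.5, 21.0])
ra_b, dec_b = np.array([10.0001, 11.0, 12.0]), np.array([20.0, 21.0003, 22.0])

matches = crossmatch(ra_a, dec_a, ra_b, dec_b, radius_arcsec=2.0)
unmatched_a = sorted(set(range(len(ra_a))) - {i for i, _ in matches})
unmatched_b = sorted(set(range(len(ra_b))) - {j for _, j in matches})
print(matches, unmatched_a, unmatched_b)
```

The matched and unmatched lists correspond to the four groups a two-catalog crossmatch produces: matched and unmatched sources on each side.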
Integration of Star Catalogs
Roy Williams
2MASS versus DPOSS cross-identification, with
• j_m as 2MASS magnitude and
• I_mtotn as DPOSS magnitude
2MASS: j_m <= 15
DPOSS: I_mtotn <= 18
[Figure panels: DPOSS unmatched; 2MASS matched; DPOSS matched; 2MASS unmatched]
Cross Matching
Visualizing Event Sets
Unsupervised clustering
50000 stars in color-color space
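The slide does not name a clustering algorithm. As an illustration only, here is a small k-means in NumPy applied to synthetic two-colour data. The function name, the farthest-first initialisation, and the two made-up star populations are assumptions, not the method used for the 50000-star figure.

```python
import numpy as np

def kmeans(points, k, iterations=50):
    """Plain k-means with a deterministic farthest-first initialisation."""
    X = np.asarray(points, dtype=float)
    centers = [X[0]]
    for _ in range(k - 1):
        # Next centre: the point farthest from all centres chosen so far.
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(X[d.min(axis=1).argmax()])
    centers = np.array(centers)
    for _ in range(iterations):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)          # assign each point to nearest centre
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers): # stop once centres no longer move
            break
        centers = new_centers
    return labels, centers

# Two hypothetical populations in a colour-colour plane.
rng = np.random.default_rng(0)
blue = rng.normal(loc=(0.2, 0.3), scale=0.05, size=(50, 2))
red = rng.normal(loc=(1.2, 1.0), scale=0.05, size=(50, 2))
colours = np.vstack([blue, red])

labels, centers = kmeans(colours, k=2)
print(centers)
```

With no class labels supplied, the two populations separate purely from their positions in colour space.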
A Grid of Services
Human gets Data
Network of Services
Understood by human
Further processing after format change
Grid of pipes and engines
Switches and actuators
data flow
Example Grid of Services
[Diagram labels: Storage Service; DPOSS Service; Catalog Service; User's code; Crossmatch Service; 2MASS Service; Query Check Service; Query Estimator]
flexible complex metadata AND broadband binary
Computing Challenges
• High-dimensional: Clustering & Classification, Visualization, Outlier Detection
• Visualization of 10^10 points
• Database access to 10^10 points
• Large Distributed Join
Standards needed
• Bundling diverse objects together with code and references
• Referencing data resources on the Grid: local, remote, replicated, ....
Problem Solving Environment
[Diagram labels: Storage Service; DPOSS Service; Catalog Service; User's code; Crossmatch Service; 2MASS Service; Query Check Service; Query Estimator]
• Plumbing (big data) and electrical (control, metadata)
• Web service and workflow
• Finding service classes/implementations by semantics
• GUI / Executive / IO adapters / Algorithms