![Page 1: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/1.jpg)
Having Your Cake and Eating It Too
With Apache OODT and Apache Solr
Andrew F. Hart
Paul M. Ramirez
![Page 2: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/2.jpg)
About Myself…
• Software Engineer
  – NASA Jet Propulsion Laboratory
  – “Data Management”
• Committer:
  – OODT, SIS, Gora, Streams (Incubating)
• Mentor: Streams (Incubating)
![Page 3: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/3.jpg)
What We’ll Cover
• Overview of OODT & Solr Projects
• Strategies for Combining OODT and Solr
• Detailed Deployment/Config. Example
• Where to Learn More & Participate
![Page 4: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/4.jpg)
Apache OODT
• Object Oriented Data Technology
• Origin in NASA mission data systems
• Components for
  – Information integration
  – Data cataloging and archiving
  – Configurable workflow processing
![Page 5: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/5.jpg)
Apache OODT
• OODT @ Apache
  – Incubation: 2010, Graduation: 2011
  – 29 Committers
  – Latest Release: 0.5 (Dec. 26, 2012)
![Page 6: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/6.jpg)
Apache OODT
• Karoo Array Telescope (KAT-7)
![Page 7: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/7.jpg)
Apache OODT
• Virtual Pediatric Intensive Care Unit
![Page 8: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/8.jpg)
Apache OODT
• Regional Climate Model Evaluation System
![Page 9: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/9.jpg)
Apache OODT
• Commonalities between systems
  – Lots of data
  – Defined processing steps / algorithms
• Archives important (… search important)
![Page 10: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/10.jpg)
Apache OODT
• Strengths of OODT for the above use cases
  – Loosely coupled components
  – Standard protocols, well-defined interfaces
  – Highly configurable
  – Vetted, reliable code
![Page 11: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/11.jpg)
Apache Solr
• Search + Web Services
  – Powerful features
  – Flexible formats
  – Highly configurable
![Page 12: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/12.jpg)
Apache Solr
• The White House
![Page 13: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/13.jpg)
Apache Solr
• Netflix
![Page 14: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/14.jpg)
Apache Solr
• NASA Planetary Data System
![Page 15: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/15.jpg)
OODT & Solr
• Why use these projects together?
• Archives often need search capability
• Similarities / Compatibilities
  – XML-based configuration
  – Environment (Java, Tomcat)
![Page 16: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/16.jpg)
Example Integration
“Standard” Data Archive Pipeline
![Page 17: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/17.jpg)
Example Integration
“Standard” Data Archive Pipeline + Search
![Page 18: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/18.jpg)
OODT Products
• Typically 1-1 with Files
• Each uniquely identifiable (GUID)
• Support for higher-level “ProductType”
  – A way to define collections
![Page 19: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/19.jpg)
OODT Metadata
• Annotations for products
• Key:{Val|Multival}
• Common across all OODT components
• Two general classes:
  – System
  – User
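
The Key:{Val|Multival} shape above can be pictured as a map from a key to one or more string values. The following is a minimal Python sketch of that idea only; the `Metadata` class here and the key names in the usage note are illustrative assumptions, not the OODT API.

```python
from collections import defaultdict


class Metadata:
    """Toy stand-in (not the OODT class): each key holds one or more values."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Appending keeps earlier values, so a key can become multi-valued.
        self._values[key].append(str(value))

    def get(self, key):
        # First value for the key, or None when the key is absent.
        values = self._values.get(key)
        return values[0] if values else None

    def get_all(self, key):
        # Copy of every value for the key (empty list when absent).
        return list(self._values.get(key, []))

    def keys(self):
        return sorted(self._values)
```

For example, adding "Keyword" twice (a hypothetical key) yields a multi-valued entry, while a key added once behaves like a plain key/value pair.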
![Page 20: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/20.jpg)
OODT Metadata
• System Metadata
  – Added automatically by OODT Components
  – Used to track state
  – Used to encode relationships between data
![Page 21: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/21.jpg)
OODT Metadata
• User Metadata
  – Specified as “policy”
  – Can be product-level, or productType-level
  – Used to extract & persist information from files as they are ingested (become products)
![Page 22: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/22.jpg)
OODT Metadata
• Metadata (Policy) Example (external)
![Page 23: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/23.jpg)
Solr Schema
• XML document
• Define what will be indexed (“Fields”)
• Provide high-level context hints
  – Data type, behavior, pre-processing
• Extremely flexible, extensible
![Page 24: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/24.jpg)
Solr Schema
• Solr Schema Example (external)
![Page 25: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/25.jpg)
Making the Connection
• SolrIndexer Tool
  – Part of the File Manager component tools
  – Map OODT Metadata to Solr Fields
  – Create Solr documents from OODT products
  – Note: only talking about metadata
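
As a rough illustration of the mapping step described above, the sketch below turns a product's metadata into a Solr-style document. It is not the SolrIndexer implementation; the function name, the `field_map` argument, and the key and field names in the usage note are hypothetical.

```python
def to_solr_document(product_id, metadata, field_map):
    """Build a Solr-style document (a plain dict) from product metadata.

    metadata:  dict of metadata key -> list of string values
    field_map: dict of metadata key -> Solr field name (hypothetical mapping)
    """
    doc = {"id": product_id}
    for key, solr_field in field_map.items():
        values = metadata.get(key, [])
        if not values:
            continue  # the product does not carry this key
        # Single values stay scalar; several values become a multi-valued field.
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc
```

Only metadata keys listed in `field_map` reach the document, which mirrors the point on the slide: the index holds metadata, not the product files themselves.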
![Page 26: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/26.jpg)
SolrIndexer Tool
• org.apache.oodt.cas.filemgr.tools
• Available since the 0.4 release
• Recommended: use 0.5+, which added some stability improvements
• Several modes of operation
![Page 27: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/27.jpg)
SolrIndexer Tool
![Page 28: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/28.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified File Manager instance

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --all \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 29: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/29.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified ProductType(s)

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --types urn:some:ProductType \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 30: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/30.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest a single product by its unique product id

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --product 19bcb4b8-7999-11e1-b581-8b771498975d \
      [--delete] \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 31: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/31.jpg)
SolrIndexer Tool
• Invocation Examples: Force optimization of the Solr index

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --optimize \
      --solrUrl http://localhost:8080/solr
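
When the tool is driven from a script, the four invocation modes shown on the preceding slides can be assembled programmatically. The helper below is a sketch: it only builds the argument list, using the flags that appear on the slides. The function and parameter names are assumptions, and since the slides do not show how several product types are passed, it accepts a single type.

```python
def solr_indexer_command(config, lib_dir, solr_url, fm_url=None,
                         all_products=False, product_type=None,
                         product=None, delete=False, optimize=False):
    """Return the argv list for one SolrIndexer run (exactly one mode)."""
    modes = [bool(all_products), product_type is not None,
             product is not None, bool(optimize)]
    if sum(modes) != 1:
        raise ValueError(
            "choose exactly one of: all_products, product_type, product, optimize")

    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={config}",
        f"-Djava.ext.dirs={lib_dir}",
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
    ]
    if all_products:
        cmd.append("--all")
    elif product_type is not None:
        cmd += ["--types", product_type]
    elif product is not None:
        cmd += ["--product", product]
        if delete:
            cmd.append("--delete")  # optional flag from the single-product slide
    else:
        cmd.append("--optimize")

    if fm_url is not None:
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the helper easy to check without a running File Manager or Solr instance.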
![Page 32: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/32.jpg)
indexer.properties
• Configuration file for the SolrIndexer
• Specifies the mapping between OODT product metadata and Solr fields
• Additional “pre-processing” features
![Page 33: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/33.jpg)
indexer.properties
• Example indexer.properties file (external)
![Page 34: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/34.jpg)
Use Case I
• Building a searchable data archive
• “Long-term” / “Lights-out” archive
• Products & metadata immutable
• Many NASA mission data systems use this model
• Want to make it easily searchable
![Page 35: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/35.jpg)
Use Case I
“Standard” Data Archive Pipeline + Search
![Page 36: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/36.jpg)
Use Case II
• Building an interactively editable, searchable data archive
• Data and metadata mutable
• Want to dynamically select product(s) to edit based on metadata
![Page 37: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/37.jpg)
Use Case II
Interactively Editable Data Archive Pipeline + Search
![Page 38: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/38.jpg)
Use Case II
Interactively Editable Data Archive Pipeline + Search
Solr catalog out of sync!
![Page 39: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/39.jpg)
Synchronization
• Two ways (at least) to solve this:
  A. Modify the OODT Curator Services
  B. Treat OODT Curator Services as a “black box” and write a “wrapper” service to invoke Curator Services AND update Solr (via scripted call to SolrIndexer, for example)
![Page 40: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/40.jpg)
Modify Curator Services
• Services implemented in JAX-RS
• /curator/src/main/java/org/apache/oodt/cas/curation/service
• [curator_url]/services/metadata/update
• Options:
  – Utilize Solr Java API
  – Wrap call to OODT SolrIndexer tool
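
Whichever option is chosen, the end result is an update sent to Solr for the edited product. The slide names the Solr Java API; the sketch below swaps in Solr's HTTP update handler with a JSON body instead, purely to show the shape of such an update. It assumes a Solr version whose `/update` handler accepts JSON (older releases exposed this at a separate path, and multi-core setups include the core name in the URL). The function name and the field names in the usage note are hypothetical.

```python
import json
from urllib.request import Request


def build_solr_update_request(solr_url, docs, commit=True):
    """Build (but do not send) a POST that adds or replaces documents in Solr."""
    url = solr_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the change visible to searches right away
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. A document such as `{"id": "<product id>", "event_tag": "storm"}` (with `event_tag` as a made-up field) replaces any existing Solr document with the same id, which is how an edit made through the Curator could be reflected in the index.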
![Page 41: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/41.jpg)
Use Case II-A
Modified Curator Services to Simultaneously Update Solr
![Page 42: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/42.jpg)
Example
• Interactive event tagging
![Page 43: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/43.jpg)
Wrap Curator Services
• Curator Service/API is “black box”
• Develop custom service that:
  – Issues POST request to Curator service
  – Updates Solr index via, e.g.:
    • Utilize Solr Java API
    • Wrap call to OODT SolrIndexer tool
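
The wrapper approach boils down to two calls made in order, plus a decision about what to do when the second one fails. The sketch below shows one possible control flow under stated assumptions: `update_curator` and `reindex_product` are hypothetical callables supplied by the caller (for instance, a POST to `[curator_url]/services/metadata/update`, whose payload format the slides do not show, and a SolrIndexer run with `--product`), and the `stale` set is an invented bookkeeping device for later retries.

```python
def update_and_reindex(product_id, new_metadata,
                       update_curator, reindex_product, stale):
    """Apply a metadata edit via the Curator, then refresh the Solr index.

    Returns True when both steps succeed. If re-indexing fails after the
    Curator update went through, the product id is recorded in `stale` so a
    later job can bring Solr back in sync, and False is returned.
    """
    # Step 1: the Curator service stays a black box; errors propagate, so
    # Solr is never touched when the edit itself was rejected.
    update_curator(product_id, new_metadata)

    # Step 2: refresh the search index for just this product.
    try:
        reindex_product(product_id)
    except Exception:
        stale.add(product_id)
        return False
    return True
```

Updating the Curator first means a failure can only leave Solr behind the catalog, never ahead of it, and the `stale` set records exactly which products need a retry.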
![Page 44: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/44.jpg)
Use Case II-B
Wrapping OODT Curation Services with Custom UI & Services
![Page 45: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/45.jpg)
Example
![Page 46: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/46.jpg)
Lessons
• Solr complements OODT File Manager
• RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI
• “Best” approach depends on situation
![Page 47: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/47.jpg)
Next Steps
• Develop “SolrCatalog” for OODT File Manager?
  – Pros: Reduction in “moving parts”
  – Cons: Restrictive?
• Implement Use Case II-A as optional mode for Curator web service layer
![Page 48: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/48.jpg)
Learning More
• Solr
  – http://lucene.apache.org/solr
  – [email protected]
• OODT
  – http://oodt.apache.org
  – https://cwiki.apache.org/confluence/display/OODT/Home
  – [email protected]
![Page 49: Having Your Cake and Eating It Too](https://reader035.vdocuments.site/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/49.jpg)
Thanks!
• Questions?