![Page 1: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/1.jpg)
Introduction to Apache OODT
Yang Li
Mar 9, 2012
![Page 2: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/2.jpg)
What is OODT
• Object Oriented Data Technology
• Science data management
• Archiving Systems that span scientific disciplines
• Enable interoperability among data agnostic systems (astrophysics, planetary, space science data systems, open source web analytics)
![Page 3: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/3.jpg)
History
• 2001
– Deployed to build a virtual specimen bank for the Early Detection Research Network (oncology)
• 2004
– Core architectural software of the Planetary Data System Data Distribution, deployed by NASA (planetary science)
• 2007
– Deployed for the Orbiting Carbon Observatory and SeaWinds missions (earth science)
• 2008
– Deployed for the National Polar-Orbiting Environmental Satellite System (atmospheric science)
![Page 4: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/4.jpg)
Framework
• Catalog & Archive
• Utilities
• Grid
• Agility
![Page 5: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/5.jpg)
Catalog & Archive
• Deals with large-scale data ingestion, metadata extraction, post-processing of data into derived and higher-order products, cataloging, catalog search, versioning, and retrieval
• Components:
– Catalog, Crawling framework, Curation, File manager, Metadata, PCS, Push/Pull framework, Resource management, Workflow, CAS install, Web apps
![Page 6: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/6.jpg)
Catalog
• Virtualize underlying catalogs for use in the CAS system
• Heterogeneous catalog models are mapped to a common dictionary and then integrated locally, so that they can be queried and ingested into as a single catalog
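A minimal sketch of the mapping idea, assuming two hypothetical catalogs that use different field names for the same concepts. `COMMON_DICTIONARY`, `to_common`, `query_all`, and every field name are illustrative and are not part of the OODT API.

```python
# Hypothetical per-catalog mappings from local field names to common terms.
COMMON_DICTIONARY = {
    "catalog_a": {"fname": "filename", "obs_date": "date"},
    "catalog_b": {"file": "filename", "acquired": "date"},
}

def to_common(catalog_name, record):
    """Rename a record's fields to the common dictionary's terms."""
    mapping = COMMON_DICTIONARY[catalog_name]
    # Fields without a mapping are dropped rather than guessed at.
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def query_all(catalogs, key, value):
    """Query every catalog through the common vocabulary."""
    hits = []
    for name, records in catalogs.items():
        for record in records:
            common = to_common(name, record)
            if common.get(key) == value:
                hits.append(common)
    return hits

catalogs = {
    "catalog_a": [{"fname": "a.dat", "obs_date": "2012-03-09"}],
    "catalog_b": [{"file": "b.dat", "acquired": "2012-03-09"},
                  {"file": "c.dat", "acquired": "2011-01-01"}],
}
results = query_all(catalogs, "date", "2012-03-09")
```

One query phrased in the common terms (`date`) reaches both catalogs, even though neither stores a field with that name.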
![Page 7: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/7.jpg)
CAS Crawler
• Standardize the common ingestion activities
– identification of files and directories to crawl
– satisfaction of ingestion pre-conditions
– metadata extraction
• Ingestion
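The activities listed on this slide can be sketched as a small pipeline. This is an illustration only: the function names, the `.dat` precondition, and the metadata keys are assumptions made for the example, not the CAS Crawler's actual classes or configuration.

```python
import os
import tempfile

def identify(root):
    """Yield every file path under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            yield os.path.join(dirpath, name)

def preconditions_met(path):
    """Hypothetical precondition: only non-empty .dat files are ingested."""
    return path.endswith(".dat") and os.path.getsize(path) > 0

def extract_metadata(path):
    """Hypothetical extractor: record the file name and size."""
    return {"Filename": [os.path.basename(path)],
            "FileSize": [str(os.path.getsize(path))]}

def crawl(root, ingest):
    """Run identification, precondition check, extraction, then ingestion."""
    count = 0
    for path in identify(root):
        if not preconditions_met(path):
            continue
        ingest(path, extract_metadata(path))
        count += 1
    return count

# Usage: crawl a temporary directory and "ingest" into an in-memory list.
ingested = []
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "obs.dat"), "w") as f:
        f.write("12345")
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("skip me")
    n = crawl(root, lambda path, meta: ingested.append(meta))
```

Passing `ingest` as a callable keeps the crawl loop independent of where products end up, which is the point of standardizing the steps.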
![Page 8: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/8.jpg)
CAS Crawler
![Page 9: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/9.jpg)
Curation
• A web application for managing policy for products, files, and metadata that have been ingested via the CAS component
– Use a servlet container to deploy the web app
– Staging area
  • Directories on local machine holding data products
– Metadata generation area
  • Create metadata files to associate with data products
![Page 10: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/10.jpg)
File Manager
• Provides everything needed to catalog, archive, and manage files and directories, along with their associated metadata
• Separates data stores and metadata stores behind standard interfaces
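A minimal sketch of what keeping data stores and metadata stores behind separate interfaces can look like. The class and method names below (`DataStore`, `MetadataStore`, `FileManager`, `ingest`, `retrieve`) are hypothetical stand-ins chosen for the example and do not reproduce the File Manager's real interfaces.

```python
from abc import ABC, abstractmethod

class DataStore(ABC):
    """Holds the product bytes."""
    @abstractmethod
    def put(self, product_id, data): ...
    @abstractmethod
    def get(self, product_id): ...

class MetadataStore(ABC):
    """Holds the searchable metadata."""
    @abstractmethod
    def put(self, product_id, metadata): ...
    @abstractmethod
    def find(self, key, value): ...

class InMemoryDataStore(DataStore):
    def __init__(self):
        self._blobs = {}
    def put(self, product_id, data):
        self._blobs[product_id] = data
    def get(self, product_id):
        return self._blobs[product_id]

class InMemoryMetadataStore(MetadataStore):
    def __init__(self):
        self._records = {}
    def put(self, product_id, metadata):
        self._records[product_id] = metadata
    def find(self, key, value):
        # Return ids of products whose metadata lists the value under key.
        return [pid for pid, md in self._records.items()
                if value in md.get(key, [])]

class FileManager:
    """Coordinates the two stores without knowing how either is implemented."""
    def __init__(self, data_store, metadata_store):
        self.data_store = data_store
        self.metadata_store = metadata_store
    def ingest(self, product_id, data, metadata):
        self.data_store.put(product_id, data)
        self.metadata_store.put(product_id, metadata)
    def retrieve(self, key, value):
        ids = self.metadata_store.find(key, value)
        return [self.data_store.get(pid) for pid in ids]

fm = FileManager(InMemoryDataStore(), InMemoryMetadataStore())
fm.ingest("p1", b"raw bytes", {"Mission": ["OCO"]})
fm.ingest("p2", b"other", {"Mission": ["SeaWinds"]})
```

Because `FileManager` depends only on the two abstract classes, either store could be swapped (for example, a disk-backed data store with a database-backed metadata store) without touching the other.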
![Page 11: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/11.jpg)
Workflow
• Provides everything needed to execute workflows and science processing pipelines
• Separates workflow repositories and workflow engines behind standard interfaces
![Page 12: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/12.jpg)
Resource Management
• Job management
– Execution, monitoring, tracking
• Underlying software system and hardware resources
– e.g. disk space, computational resources, and shared identity
![Page 13: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/13.jpg)
Resource Management (Cont)
• Critical objects
– Job, Job Input, Job Spec, Job Instance, Resource Node
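The slide names the objects but not how they relate, so the following is only one plausible reading, sketched under assumptions: a Job Spec pairs a Job with its Job Input, a Job Instance is the code that runs, and a Resource Node has a capacity that jobs consume. All fields, the `load` attribute, and the `assign` function are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: str
    name: str
    load: int  # hypothetical: units of node capacity the job needs

@dataclass
class JobInput:
    params: dict = field(default_factory=dict)

@dataclass
class JobSpec:
    job: Job
    job_input: JobInput

class JobInstance:
    """The code that actually runs; here it just echoes its input."""
    def execute(self, job_input):
        return dict(job_input.params)

@dataclass
class ResourceNode:
    node_id: str
    capacity: int
    used: int = 0

def assign(spec, nodes):
    """Place the job on the first node with enough free capacity."""
    for node in nodes:
        if node.used + spec.job.load <= node.capacity:
            node.used += spec.job.load
            return node
    return None  # no node can take the job right now

nodes = [ResourceNode("node-1", capacity=2), ResourceNode("node-2", capacity=8)]
spec = JobSpec(Job("j1", "reprocess", load=4), JobInput({"granule": "g42"}))
chosen = assign(spec, nodes)
result = JobInstance().execute(spec.job_input)
```

The first-fit rule in `assign` is a placeholder; a real scheduler would also handle queuing, monitoring, and releasing capacity when a job finishes.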
![Page 14: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/14.jpg)
Metadata
• A multi-valued, generic Metadata container class
• Internal map of string keys pointing to vectors of strings
– [std:string key] ⇒ std:vector of std:strings
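The data structure described here, string keys that each point to a list of strings, can be sketched in a few lines. `MultiValuedMetadata` and its method names are hypothetical and only mirror the shape of the container, not the actual OODT class.

```python
from collections import defaultdict

class MultiValuedMetadata:
    """String keys, each mapped to an ordered list of string values."""

    def __init__(self):
        self._map = defaultdict(list)

    def add(self, key, value):
        # Append rather than overwrite, so a key can hold many values.
        self._map[key].append(str(value))

    def get_all(self, key):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._map.get(key, []))

    def get_first(self, key):
        values = self._map.get(key)
        return values[0] if values else None

m = MultiValuedMetadata()
m.add("Instrument", "A")
m.add("Instrument", "B")
m.add("Year", 2012)
```

Storing every value as a string keeps the container generic; interpreting a value as a number or date is left to whoever reads it.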
![Page 15: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/15.jpg)
Framework
• Catalog & Archive
• Common Utilities
• Grid
• Agility
![Page 16: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/16.jpg)
Common Utilities
• Provide needed support for catalogs, archives, and grids
• Query Expression
– Platform-neutral and extensible way of posing questions
• Single Sign On
• Commons
– Lots of miscellaneous utilities, including I/O streams, logging, XML, and more
![Page 17: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/17.jpg)
Query Expression
• Provide a way to express queries in a generic manner
• Use boolean postfix expressions to capture the domain, range, and constraint of a query, regardless of the source of the query
• Encapsulate the results of a query
– Standard way to pass a query and its results between servers, clients, nodes, and other components
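To illustrate how a boolean postfix expression is evaluated, here is a small stack-based sketch. The token format, the operator names (`EQ`, `GT`, `AND`, ...), and the sample records are assumptions made for the example; they do not reproduce OODT's own query syntax.

```python
import operator

# Hypothetical comparison operators for the example.
COMPARATORS = {"EQ": operator.eq, "NE": operator.ne,
               "LT": operator.lt, "GT": operator.gt}

def evaluate(postfix, record):
    """Evaluate a postfix boolean query against one record (a dict)."""
    stack = []
    for token in postfix:
        if token == "AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
        elif token == "OR":
            right, left = stack.pop(), stack.pop()
            stack.append(left or right)
        elif token == "NOT":
            stack.append(not stack.pop())
        else:
            # A constraint is a (key, comparator, value) tuple.
            key, op, value = token
            stack.append(key in record and COMPARATORS[op](record[key], value))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

# Infix: (target == "Mars" AND year > 2000) OR instrument == "HiRISE"
query = [("target", "EQ", "Mars"), ("year", "GT", 2000), "AND",
         ("instrument", "EQ", "HiRISE"), "OR"]

records = [
    {"id": 1, "target": "Mars", "year": 2005, "instrument": "CTX"},
    {"id": 2, "target": "Venus", "year": 2010, "instrument": "HiRISE"},
    {"id": 3, "target": "Mars", "year": 1997, "instrument": "MOC"},
]
matches = [r["id"] for r in records if evaluate(query, r)]
```

Postfix form needs no parentheses or precedence rules, which makes the same query easy to serialize and hand from one component to another.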
![Page 18: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/18.jpg)
Framework
• Catalog & Archive
• Utilities
• Grid
• Agility
![Page 19: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/19.jpg)
Grid
• Profile (metadata) and Product (data) services
• Product
– Retrieves resources (products) in platform-neutral formats
• Profile
– Describes and discovers resources using extensible metadata called "profiles"
• Web Grid
– Provides profile and product services over a RESTful interface
• XML Product/Profile handlers
– Provide XML-configurable, database-backed profile and product handlers
![Page 20: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/20.jpg)
Product
• Provide access to data products
– datasets, images, documents, or anything with an electronic representation
• Accept standard query expressions and return zero or more matching products
• Transform products from proprietary formats into Internet-standard formats without impacting local stores or operations
![Page 21: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/21.jpg)
Profile
• Describes and locates resources using metadata descriptions
– resource's inception, composition, and location
• Catalogs metadata descriptions and supports creating, updating, and querying them
![Page 22: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/22.jpg)
Framework
• Catalog & Archive
• Utilities
• Grid
• Agility
![Page 23: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/23.jpg)
Agility
• Re-implementation of Grid in Python with a focus on high performance in the face of gargantuan data sets as well as accelerated development and integration into existing systems.
![Page 24: Introduction to Apache OODT](https://reader034.vdocuments.site/reader034/viewer/2022051115/568147ad550346895db4ea7e/html5/thumbnails/24.jpg)
Questions