Data Foundation IG DF Organizing Chairs: Gary Berg-Cross & Peter Wittenburg

Upload: jeremy-morgan

Post on 19-Jan-2018



Page 1:

Data Foundation IG DF

Organizing Chairs: Gary Berg-Cross & Peter Wittenburg

Page 2:

Introduction to DF Group

Achieving reproducible data science is a high priority. Only highly automated, self-documenting procedures that respect proper data organization principles will overcome current barriers. The Data Fabric group needs to work out directions, in the belief that integrated data fabrics are a critical component of the infrastructures paving the way to reproducible science. The idea for this IG emerged from discussions among the chairs of various RDA WGs.

Characteristics of a Data Fabric: We are just beginning to scout out the landscape of data fabrics. In one view, a data fabric is a minimal set of infrastructure and service requirements that services must meet to plug into (belong to) the defined fabric. In a data fabric we ask how components that were developed separately can be made to work together; this means that the data fabric will differ for different sets of components. We note, strongly, that it is meant as a descriptive, conceptual way to deal with the interrelation between many components, rather than a prescriptive one (as an architecture would be).

Page 3:

Goals

The goal is to develop alternative views, components and aspects of the DF concept and related infrastructure.

Conceptualizations need to be discussed to come to an agreed RDA view on how the evolving DF landscape can be productively described.

As part of this, essential DF components and their interrelations need to be identified and defined.

Some of the existing RDA groups, including the metadata WGs and IGs, are working on DF components and need to be positioned in such a landscape.

New working groups need to be defined to work on identified components and interfaces.

Page 4:

High Level View

This diagram provides a high-level view of possible actions within a Data Fabric, running from raw data to increasingly documented data that has been enriched and analyzed, creating referable and citable data. As shown, publications are part of this Data Fabric, since they are often used for data mining and other analysis.

Page 5:

Infrastructure Component View (after Reagan Moore)

A data fabric is the set of software and hardware infrastructure components that are used to manage data, information, and knowledge.

When an enterprise implements a data management solution, one of multiple types of DF infrastructure is typically chosen to enable:

• Data management: the enterprise builds a data repository, manages an information catalog, and enforces management policy.

• Data analysis: the enterprise processes a data collection, applies analysis tools, and automates a processing pipeline.

• Data preservation: the enterprise builds reference collections and knowledge bases that comprise its intellectual capital, while managing technology evolution.

• Data publication: discovery of and access to data collections.

• Data sharing: controlled sharing of a data collection, shared analysis workflows, and information catalogs.

Page 6:

Data Fabric Service View (after Beth Plale)

A service in a DF should:

• Be self-documenting: a service contributes to the lifecycle of the data objects it handles and must keep track of the scientifically relevant actions it performs on those data objects. The resulting log files are periodically sent to a provenance consolidator.

• Track data objects through its service processing using one of the well-known object identifier schemes.

• Identify itself as one type of service, drawn from an RDA-agreed list of service types.

• Implement an interface to a publish-subscribe system that serves as the Data Fabric control mechanism.
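
The four service requirements above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not part of the slides: the `ControlBus` class is an in-memory stand-in for a real publish-subscribe system, `SERVICE_TYPES` is a hypothetical placeholder for an RDA-agreed list, the `"provenance"` topic name is invented, and UUID URNs stand in for whichever well-known identifier scheme a fabric would actually adopt.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical placeholder for an RDA-agreed list of service types.
SERVICE_TYPES = {"repository", "analysis", "transformation"}


def new_object_id() -> str:
    """Mint an identifier; UUID URNs are an assumed stand-in for a PID scheme."""
    return f"urn:uuid:{uuid.uuid4()}"


class ControlBus:
    """In-memory publish-subscribe bus standing in for the DF control mechanism."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, handler):
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers.get(topic, []):
            handler(message)


@dataclass
class FabricService:
    name: str
    service_type: str
    bus: ControlBus
    log: list = field(default_factory=list)

    def __post_init__(self):
        # The service identifies itself as one of the agreed types.
        if self.service_type not in SERVICE_TYPES:
            raise ValueError(f"unknown service type: {self.service_type}")

    def process(self, object_id, action):
        """Record a scientifically relevant action on an identified object."""
        self.log.append({
            "service": self.name,
            "service_type": self.service_type,
            "object_id": object_id,
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def flush_log(self):
        """Send accumulated records to the consolidator topic; return the count."""
        pending, self.log = self.log, []
        for record in pending:
            self.bus.publish("provenance", record)
        return len(pending)


if __name__ == "__main__":
    consolidated = []  # plays the role of the provenance consolidator
    bus = ControlBus()
    bus.subscribe("provenance", consolidated.append)
    service = FabricService("normalizer", "transformation", bus)
    service.process(new_object_id(), "unit conversion")
    service.flush_log()
    print(consolidated)
```

In this sketch the log is flushed on demand; a real service would more likely flush on a timer, matching the "periodically" in the first requirement.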

Page 7:

Data Object View (after Peter Wittenburg)

The data fabric covers a domain of registered digital objects (DOs) that are stored in well-managed repositories.

DOs are associated with metadata describing their creation context and history (provenance).

The Data Fabric covers a domain of registered software components (workflows, services) that are in fact a special class of DOs.

Actions on DOs may be guided by abstract policies that are explicit and thus auditable.

There can be multiple data fabric implementations that should be highly interoperable.
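
One way to picture the data object view is a small registry in which data and software are both registered DOs, and every action is checked against explicit policies and recorded so that it can be audited. The sketch below is only an illustration under stated assumptions: the class names (`DigitalObject`, `Policy`, `Registry`), the `kind` field, and the shape of the audit record are hypothetical and do not come from the slides or from any RDA specification.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DigitalObject:
    pid: str                 # persistent identifier (the scheme is left open)
    kind: str                # "data" or "software"; software is a special class of DO
    metadata: dict = field(default_factory=dict)    # creation context
    provenance: list = field(default_factory=list)  # history of applied actions


@dataclass(frozen=True)
class Policy:
    """An explicit rule: a name plus a predicate over (object, action)."""
    name: str
    allows: Callable[[DigitalObject, str], bool]


class Registry:
    def __init__(self, policies):
        self.objects = {}
        self.policies = list(policies)
        self.audit_trail = []  # every decision is recorded, allowed or not

    def register(self, obj):
        if obj.pid in self.objects:
            raise ValueError(f"already registered: {obj.pid}")
        self.objects[obj.pid] = obj

    def act(self, pid, action):
        """Apply an action if all policies allow it; log the decision either way."""
        obj = self.objects[pid]
        denied_by = [p.name for p in self.policies if not p.allows(obj, action)]
        allowed = not denied_by
        self.audit_trail.append(
            {"pid": pid, "action": action, "allowed": allowed, "denied_by": denied_by}
        )
        if allowed:
            obj.provenance.append(action)
        return allowed
```

Because policies are plain named objects rather than logic buried in service code, the audit trail can state which policy blocked an action, which is the sense in which explicit policies are auditable.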

Page 8:

Discussion

The suggested Data Fabric IG is planned as a forum to discuss these alternative views, components and aspects of the DF concept.

To be discussed:

• What the agreed RDA view on a Data Fabric is.

• How the outputs from the RDA working groups fit into the DF concept, and how they relate to each other and to the various related WGs and IGs within the RDA.

• Which further activities are required to push the data fabric concept ahead.

• Continuation and initiation of working group activities related to the DF.

• Improving the uptake of the WG outputs by communicating them as a coherent whole within the DF concept.

Page 9:

Thanks for your attention.