The Big Metadata
Stories from the dark underbelly of data operations
By Daniela Tomova
Origin Story
ID=112056 Name=Julia’s file
What was Julia’s file?
Who was Julia?
What is metadata?
Data: qualifies or quantifies a concept or a real-world occurrence, often in the form of a variable across time. Used to measure and understand.
Metadata: classifies and describes data. Used to understand, structure, track and manipulate data.
What is metadata?
Metadata and “dumb” data
This commentator is basically your average data.
What is metadata?
ID      Time Dimension 1     Time Dimension 2     Value
112056  27-11-2006 23:00:00  28-11-2006 01:00:00  830
112056  27-11-2006 23:00:00  28-11-2006 02:00:00  12.7
Descriptive or Semantic Metadata:
Commodity, Variable, Contract type, Facility type, Technology, Geography, Sector, etc.
Structural or Technical Metadata:
Creation date, Origin system, Set ID, Publication freq., Value freq., Variable type, Change date, Source, Source file, etc.
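As a rough illustration of the split between data and metadata, the sketch below keeps the two sample rows for curve 112056 apart from a record that describes them. The class name `Curve`, the attribute names and the `<placeholder>` metadata values are assumptions made for the example; only the ID, timestamps and values come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Timestamp layout used in the sample rows (day-month-year).
FMT = "%d-%m-%Y %H:%M:%S"


@dataclass
class Curve:
    """Hypothetical container: one curve ID, its metadata, and its data rows."""
    curve_id: int
    descriptive: dict = field(default_factory=dict)  # semantic metadata
    structural: dict = field(default_factory=dict)   # technical metadata
    rows: list = field(default_factory=list)         # the "dumb" data

    def add_row(self, t1: str, t2: str, value: float) -> None:
        # Store both time dimensions as datetime objects next to the value.
        self.rows.append(
            (datetime.strptime(t1, FMT), datetime.strptime(t2, FMT), value)
        )


curve = Curve(
    curve_id=112056,
    # Placeholder values: the slide lists the kinds of metadata, not their contents.
    descriptive={"variable": "<placeholder>", "geography": "<placeholder>"},
    structural={"value_freq": "<placeholder>", "source": "<placeholder>"},
)
curve.add_row("27-11-2006 23:00:00", "28-11-2006 01:00:00", 830)
curve.add_row("27-11-2006 23:00:00", "28-11-2006 02:00:00", 12.7)
```

The rows alone say nothing about what 830 or 12.7 measure; that context would have to live in the two metadata dictionaries.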
Precisely!
We cannot afford not to use metadata:
- Structure, traceability and common standards save time and resources. The more data – the greater the savings.
- Metadata removes the human bottleneck. Enables data use and reuse by both people and processes.
But that’s even more data! Don’t we have enough/too much already?
No.
- Aggregation. Easier to process than the underlying data even across sets and dimensions.
- Abstraction. Easier for people with different levels of experience to understand.
- Tool. It has a bi-directional relationship with its subject and can be used to manipulate it.
Just data about data?
- Julia’s File or WeaCity.ECeENS_Europe.Precip;;WeaCity;PC;EC.Ens;F;H.12;UTC;SVK.SK01.BRATIS;Wea.Precip;mm;H.6;;03;
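A name like the one above can be taken apart mechanically, which is part of what makes it usable by processes as well as people. The sketch below only splits the string on its semicolons and lists the positions. The meaning of each position is not given here, so the code deliberately does not name the fields; `split_fields` is a hypothetical helper.

```python
# The curve name exactly as it appears on the slide.
RAW = ("WeaCity.ECeENS_Europe.Precip;;WeaCity;PC;EC.Ens;F;H.12;UTC;"
       "SVK.SK01.BRATIS;Wea.Precip;mm;H.6;;03;")


def split_fields(raw):
    """Split a semicolon-delimited name into positional fields.

    Empty strings are kept, because an empty position may itself be
    meaningful. Only the empty item produced by the trailing ';' is dropped.
    """
    parts = raw.split(";")
    if parts and parts[-1] == "":
        parts = parts[:-1]
    return parts


fields = split_fields(RAW)
for position, value in enumerate(fields):
    print(position, repr(value))
```

Mapping positions to names such as unit or time zone would need the naming standard itself, which this sketch does not assume.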
How do we use it?
Common standard
Result
Multiple tuples linked to a curve ID:
Application dictionary
Easy, powerful, and robust Matlab queries.
Easy groupings of data in containers: charts, files, tables.
Reusable and pivotable code.
Efficient manipulation of groups of curves.
Powerful and scalable monitoring and debugging of large amounts of heterogeneous data.
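The idea of tuples linked to a curve ID can be sketched in a few lines. The example below is a Python stand-in, not the Matlab queries mentioned above. The catalog contents, the keys, the IDs other than 112056, and the helper names `find_curves` and `group_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog: curve ID -> set of (key, value) metadata tuples.
# All keys and values here are illustrative, not real catalog entries.
CATALOG = {
    112056: {("variable", "precipitation"), ("unit", "mm"), ("geography", "SVK")},
    200001: {("variable", "precipitation"), ("unit", "mm"), ("geography", "AUT")},
    200002: {("variable", "temperature"), ("unit", "degC"), ("geography", "SVK")},
}


def find_curves(catalog, **criteria):
    """Return the sorted IDs of curves whose tuples include every criterion."""
    wanted = set(criteria.items())
    return sorted(cid for cid, tags in catalog.items() if wanted <= tags)


def group_by(catalog, key):
    """Group curve IDs by the value they carry for one metadata key."""
    groups = defaultdict(list)
    for cid, tags in sorted(catalog.items()):
        for k, v in tags:
            if k == key:
                groups[v].append(cid)
    return dict(groups)


precip = find_curves(CATALOG, variable="precipitation")
by_geo = group_by(CATALOG, "geography")
```

Because the query works on metadata rather than on the curves themselves, the same two functions could feed a chart, a file export, or a monitoring job without change.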
Human dictionary
A store of analyst knowledge about the data in a common vocabulary.
Searchable
Some cool stuff which would be impossible without meta
Smart homes and IoT
Machine learning
Natural language processing
Bitcoin operations and new uses for the blockchain meta
Targeted online content
Smart grids
Big data analysis
Modern video and audio libraries
iTunes
Future uses
Emergent algorithms – like those underpinning swarm intelligence behavior and artificial neural networks
Emergent technology – technology whose effects are greater than the sum of its building blocks
Singularity?
Summary
Humans are not optimised for raw data processing.
We think in abstractions, relationships and tool manipulation. If we want to keep up with data, we need to shape it to the way our brains work.
That’s what metadata does.
“I've seen things you people wouldn't believe…” – Roy in Blade Runner
Questions?