i've always wanted to data model - data week 2013

I’ve Always Wanted To Data Model. Ian Varley, Salesforce.com. Data Week, 2013-10-02. Lightning Talk (10 minutes).

Upload: ian-varley

Post on 19-Jun-2015

DESCRIPTION

One of the tenets of Big Data is that it allows developers to work with "unstructured" data. But unless you're piping /dev/random, there's no such thing as *truly* unstructured data; only data whose structure you don't understand yet. In this lightning talk, we'll take a tour of the core fundamentals of deep data structure modeling, and see how the rigid tools and techniques of the past have failed us in the modern world of agile software and big data. We'll delve into what hope there is for understanding the semantics and structure of data that doesn't play by the rules of an RDBMS.

TRANSCRIPT

Page 1: I've Always Wanted To Data Model - Data Week 2013

I’ve Always Wanted To Data Model

Ian Varley, Salesforce.com

Data Week, 2013-10-02

Lightning Talk (10 minutes)

Page 2: I've Always Wanted To Data Model - Data Week 2013

Who am I?

Ian Varley

Austin, TX

Salesforce.com Big Data Team

@thefutureian

Page 3: I've Always Wanted To Data Model - Data Week 2013

What’s Data Modeling?

Page 4: I've Always Wanted To Data Model - Data Week 2013

The act of taking the intelligible structure of the world around us and making it concrete enough for computers to act on.

(More specifically, data modeling usually has to do with storing it in a database.)

Page 5: I've Always Wanted To Data Model - Data Week 2013

Traditionally, data modeling has meant Entity-Attribute-Relationship modeling techniques.

There are variants that are more “OO” (like UML) but they share most of the same core assumptions.

Page 6: I've Always Wanted To Data Model - Data Week 2013

Many a project was sunk due to shitty data modeling.

Page 7: I've Always Wanted To Data Model - Data Week 2013

It’s a difficult occupation.

You have to be part engineer, part psychologist, and part philosopher.

Page 9: I've Always Wanted To Data Model - Data Week 2013

But.

Page 10: I've Always Wanted To Data Model - Data Week 2013

The expressive power of our conceptual modeling techniques hasn’t improved much since the 1970s.

We mostly look at the world in the same static way we did 40 years ago.

Page 11: I've Always Wanted To Data Model - Data Week 2013

Partly, this is because our discipline is wedded to relational (SQL) DBs.

When the only tool you have is a hammer ...

Page 12: I've Always Wanted To Data Model - Data Week 2013

A book that opened my eyes ...

(He said a lot of the stuff I’m about to say back in 1978!)

Page 13: I've Always Wanted To Data Model - Data Week 2013

I don’t have a lot of answers.

But I want to raise some questions.

And hopefully, start a conversation.

Page 14: I've Always Wanted To Data Model - Data Week 2013

Here are 5 observations about the tools of traditional data modeling.

Page 15: I've Always Wanted To Data Model - Data Week 2013

#1: nobody actually knows what an “entity” really is.

Page 16: I've Always Wanted To Data Model - Data Week 2013

“Entity” is another word for Category, in linguistics terms.

And an important property of linguistic categories is that they are slippery.

See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things

Page 17: I've Always Wanted To Data Model - Data Week 2013

part: an abstract definition of a connected set of physical materials that serve some purpose, and that people are willing to buy

part: one instance of a part type, which arrives on the QA line at a specific time and either does or doesn't meet quality standards
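The two senses of "part" above can be kept apart by giving each its own type. This is only a minimal sketch: the names `PartType`, `PartUnit`, and `describe_part`, along with the sample SKU and serial number, are hypothetical and not from the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PartType:
    """Sense 1: the abstract catalog definition that people are willing to buy."""
    sku: str
    description: str


@dataclass(frozen=True)
class PartUnit:
    """Sense 2: one physical instance that arrives on the QA line."""
    part_type: PartType
    serial: str
    arrived_at: datetime
    passed_qa: bool


def describe_part(part) -> str:
    """Show that 'part' means something different depending on the sense."""
    if isinstance(part, PartType):
        return f"catalog part {part.sku}"
    return f"unit {part.serial} of {part.part_type.sku}"


# Hypothetical sample data.
widget = PartType(sku="W-100", description="Hypothetical widget")
unit = PartUnit(widget, "SN-0001", datetime(2013, 10, 2, 9, 30), passed_qa=False)
```

Both objects are legitimately called a "part" by someone in the business, which is the slipperiness the slide points at.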

Page 18: I've Always Wanted To Data Model - Data Week 2013

And if you think you can “solve” the problem, I’ve got some World Trade Center insurance policies to sell you.

Page 19: I've Always Wanted To Data Model - Data Week 2013

That said, there are a couple tools we could adopt that would help:

● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing

(Not that there aren’t ways to do this in ERD models, but they’re unobvious and not widely used.)

Page 20: I've Always Wanted To Data Model - Data Week 2013

#2: entities, attributes, and relationships are really the same thing, maaaan ...

http://the-hippie-portfolio.tumblr.com/

Page 21: I've Always Wanted To Data Model - Data Week 2013

Say I’ve got a “parent” in my model.

Is it:
● A “parent” entity?
● A “person” entity with an “isParent” attribute?
● Two “person” entities in a “parent” relationship?

It’s all of them; the distinction is arbitrary.
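A rough sketch of the three options, showing that each can answer the same question ("who is a parent?"). All class and function names here are hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass


# Option 1: a dedicated "parent" entity.
@dataclass(frozen=True)
class Parent:
    name: str


# Option 2: a "person" entity with an "isParent" attribute.
@dataclass(frozen=True)
class PersonWithFlag:
    name: str
    is_parent: bool = False


# Option 3: two "person" entities joined by a "parent" relationship.
@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class ParentOf:
    parent: Person
    child: Person


# The same hypothetical fact, "Ann is Bob's parent", stored three ways.
as_entity = [Parent("Ann")]
as_attribute = [PersonWithFlag("Ann", is_parent=True), PersonWithFlag("Bob")]
as_relationship = [ParentOf(parent=Person("Ann"), child=Person("Bob"))]


def parents_from_entity(rows):
    return {row.name for row in rows}


def parents_from_attribute(rows):
    return {row.name for row in rows if row.is_parent}


def parents_from_relationship(rows):
    return {row.parent.name for row in rows}
```

All three functions return the same set of names. Only the relationship form also records whose parent Ann is, which is one practical reason a modeler might prefer it.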

Page 22: I've Always Wanted To Data Model - Data Week 2013

The real structure is just a graph … but none of our modeling tools are that flexible, nor is it helpful to think that abstractly about most software.

Page 23: I've Always Wanted To Data Model - Data Week 2013

Normally, we make the choice based on our experience and gut feeling, and pretend there’s a science to it.

Page 24: I've Always Wanted To Data Model - Data Week 2013

But the whole way of thinking is a convenience based on “records”.

Page 25: I've Always Wanted To Data Model - Data Week 2013

I have no idea what to do about this.

Tools that allow you to view any part of your model in any of those ways?

Page 28: I've Always Wanted To Data Model - Data Week 2013

This isn’t realistic with today’s tools, so this is just idle speculation.

Page 29: I've Always Wanted To Data Model - Data Week 2013

#3: prescriptive models encourage black & white thinking in a gray world

Page 30: I've Always Wanted To Data Model - Data Week 2013

You have to make decisions (about entities, attributes, relationships, types) up front. But sometimes that’s not right.

Page 31: I've Always Wanted To Data Model - Data Week 2013

This is a strength of (some) NoSQL databases: you can do data first, and surface structure later.
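One way to picture "data first, structure later": store records with no declared schema, then report which fields and value types actually showed up. This is a minimal sketch in plain Python, not tied to any particular database; `records` and `surface_structure` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical schemaless records, stored as they arrived.
records = [
    {"name": "Ann", "age": 41},
    {"name": "Bob", "email": "bob@example.com"},
    {"name": "Cy", "age": "unknown"},
]


def surface_structure(rows):
    """Return {field: set of type names observed} across all rows."""
    observed = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            observed[field].add(type(value).__name__)
    return dict(observed)


structure = surface_structure(records)
```

Here `age` turns out to hold both integers and strings, which is exactly the kind of fact an up-front model would have forced a decision about before any data existed.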

Page 32: I've Always Wanted To Data Model - Data Week 2013

Sometimes the deep structure is actually ambiguous.

Page 33: I've Always Wanted To Data Model - Data Week 2013
Page 34: I've Always Wanted To Data Model - Data Week 2013

This can apply broadly.

(What if an employee isn’t really “in” a department, but has flexible membership based on where she spends her time?)
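A sketch of what "flexible membership" could look like if it is derived from a time log rather than stored as a single department field. The log rows and the `membership` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical (employee, department, hours) rows.
time_log = [
    ("dana", "engineering", 30),
    ("dana", "support", 10),
    ("eli", "support", 40),
]


def membership(log):
    """Return {employee: {department: share of that employee's logged hours}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for employee, dept, hours in log:
        totals[employee][dept] += hours
    return {
        employee: {dept: h / sum(depts.values()) for dept, h in depts.items()}
        for employee, depts in totals.items()
    }


shares = membership(time_log)
```

Under this view, "which department is Dana in?" has no single answer; she is 75% engineering and 25% support for the logged period.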

Page 35: I've Always Wanted To Data Model - Data Week 2013

You can represent that in a traditional data model, sure.

But you’re not encouraged to.

Page 36: I've Always Wanted To Data Model - Data Week 2013

#4: static models make the time dimension unwieldy

Page 37: I've Always Wanted To Data Model - Data Week 2013

Entity models are generally silent on the ways data changes.

Page 38: I've Always Wanted To Data Model - Data Week 2013

Many modern databases can keep older versions of objects.

But should they? For which entities? How many versions? etc.
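Those questions amount to a retention policy per entity type. A minimal in-memory sketch, assuming a hypothetical `VersionedStore` class (not any real database's API) that keeps only the latest version unless told otherwise:

```python
from collections import deque


class VersionedStore:
    """Toy store that keeps a bounded history per entity type (hypothetical)."""

    def __init__(self, max_versions):
        # max_versions: {entity_type: versions to keep}; unlisted types keep 1.
        self._max = max_versions
        self._history = {}

    def put(self, entity_type, key, value):
        limit = self._max.get(entity_type, 1)
        versions = self._history.setdefault(
            (entity_type, key), deque(maxlen=limit)
        )
        versions.append(value)  # the oldest version drops off once full

    def current(self, entity_type, key):
        return self._history[(entity_type, key)][-1]

    def history(self, entity_type, key):
        return list(self._history[(entity_type, key)])


# Hypothetical policy: contracts keep three versions, everything else one.
store = VersionedStore({"contract": 3})
for text in ["v1", "v2", "v3", "v4"]:
    store.put("contract", "c1", text)
for state in ["a", "b"]:
    store.put("session", "s1", state)
```

The point is that the policy has to live somewhere, and a static entity model gives no obvious place to write it down.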

Page 39: I've Always Wanted To Data Model - Data Week 2013

Worse, what about when the model changes at runtime, and you need to also retain knowledge of what the old model was?
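One possible approach, sketched under assumptions: keep every version of the model in a registry and tag each record with the version it was written under, so old records can still be read on their own terms. `SchemaRegistry`, `read`, and the field names are all hypothetical.

```python
class SchemaRegistry:
    """Append-only list of model versions; the index is the version number."""

    def __init__(self):
        self._versions = []

    def register(self, fields):
        self._versions.append(tuple(fields))
        return len(self._versions) - 1

    def fields(self, version):
        return self._versions[version]


registry = SchemaRegistry()
v0 = registry.register(["name", "department"])
# The model changes at runtime: one department becomes several.
v1 = registry.register(["name", "departments", "start_date"])

# Each record remembers which model it was written under.
records = [
    {"schema": v0, "data": {"name": "Dana", "department": "engineering"}},
    {"schema": v1, "data": {"name": "Eli", "departments": ["support"],
                            "start_date": "2013-10-02"}},
]


def read(record):
    """Interpret a record using the model version it was written under."""
    expected = registry.fields(record["schema"])
    return {field: record["data"].get(field) for field in expected}
```

This keeps the old model recoverable, but it says nothing about how to migrate or compare records across versions, which is the harder part.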

Page 40: I've Always Wanted To Data Model - Data Week 2013

As in #3, there are ways to model this in entity models, but it’s not easy, so most people just don’t think about it.

Page 41: I've Always Wanted To Data Model - Data Week 2013

#5: boxes & lines aren’t how we actually think

Page 42: I've Always Wanted To Data Model - Data Week 2013

Our spatial processing of diagrams doesn’t map well to our temporal, spatial, and causal comprehension of data structure.

Page 43: I've Always Wanted To Data Model - Data Week 2013

What do people really do?

Skip making models when their models look too complicated.

Page 44: I've Always Wanted To Data Model - Data Week 2013
Page 45: I've Always Wanted To Data Model - Data Week 2013

F*** THAT NOISE.

Page 46: I've Always Wanted To Data Model - Data Week 2013

Is there an alternative? Not yet.

Page 47: I've Always Wanted To Data Model - Data Week 2013

What could move the needle?
● Prototype-based modeling
● Proper scoping
● Semantic zooming

Page 48: I've Always Wanted To Data Model - Data Week 2013

The map is not the territory.

Page 49: I've Always Wanted To Data Model - Data Week 2013

In conclusion … if you dig this stuff, let’s talk!

@thefutureian