

Page 1

Engineering Empathy Origins of data architecture conflicts.

Tips and strategies for resolution.

Neil Hepburn

This presentation by IRMAC is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada License.

Based on a work at wikipedia.org.

Page 2

Neil Hepburn (bio)

Neil Hepburn is a Certified Data Management Professional (mastery level), holds an honours B. Math in Computer Science, and has over 20 years of IT and data management experience. Neil works within PricewaterhouseCoopers’ Information Management practice and is a recognized thought leader in the area of Agile Analytics. Neil has spoken on the topic of information management in numerous public forums, including Enterprise Data World, FSOSS (Free and Open Source Software Symposium), and the CMA IT Symposium, and at numerous universities, including University of Toronto, Waterloo, Wilfrid Laurier, Ryerson, and McMaster.

Page 3

Preamble

From 2010 through 2013, Neil Hepburn has:

• Developed an Open Source DW for Twitter

• Developed two talks on the history of analytics and databases, respectively

• Visited 10 Ontario universities’ computer science classes

• Met with students, profs, and chairs

This talk is about why he did this and what he learned

Page 4

The Macro Data Problem

• We struggle to manage the data we have against straightforward requirements

– $600 billion/year wasted on Data Quality

– Struggle to connect the dots

– Cannot answer straightforward questions, quickly and flexibly

Common denominator? Macro Data Problem is only clearly understood at macro level: goes across systems, across people, and across perspectives. Solutions are counter-intuitive at micro level…

Page 5

Who is The Master of Data?

Peggy Dodd (Super-ego)

Lancaster Dodd (Ego)

Freddie Quell (Id)

Page 6

What we know guides how we think

Sociologist: Where there is always snow, bears are white. At the North Pole there is always snow; what colour are the bears there? … What do my words convey?

Tribal Headman: I've only seen brown bears … Such a thing is not to be settled by words but by testimony.

Page 7

How Do People Think about Data?

Application architects and most developers think of data as graphs (typically hierarchies). Detailed and efficient for Apps, but hard to see the big picture, and can lead to query bias.

Analysts prefer to see data from a bird’s-eye view, flattening out data structures into a report that conforms to a map, grid, or cube. Easy to see the big picture (from a given perspective), but details can go missing.

Page 8

Data is like Light

Astronomers think of light as waves.

Particle physicists think of light as particles.

Quantum physicists embrace the wave-particle duality.

Page 9

How Data Architects think about Data

• Data architects think of data in terms of normalized sets (Relations), which are neither graphs nor cubes, but can be easily transformed into either.

• Forces data architects to think more deeply about semantics

• Codd set the ground rules of the relational model (and this mode of thinking) as follows:

– Based on a foundation of mathematics and formal logic

– Physical and logical independence

– Guarantees of integrity
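
To illustrate the first bullet, here is a minimal Python sketch, assuming two small made-up relations (`customers` and `orders`, neither of which appears in the talk). It shows the same normalized sets being turned into a hierarchy, as an application developer might want, and into a cube-style summary, as an analyst might want. The function names `as_hierarchy` and `as_cube` are illustrative only.

```python
from collections import defaultdict

# Two normalized relations modelled as sets of tuples (hypothetical sample data).
customers = {(1, "Acme"), (2, "Globex")}   # (customer_id, name)
orders = {                                  # (order_id, customer_id, quarter, amount)
    (10, 1, "Q1", 250),
    (11, 1, "Q2", 100),
    (12, 2, "Q1", 75),
}


def as_hierarchy(customers, orders):
    """Graph/hierarchy view: each customer owns a list of its orders."""
    names = dict(customers)                 # customer_id -> name
    tree = {name: [] for name in names.values()}
    for order_id, customer_id, quarter, amount in sorted(orders):
        tree[names[customer_id]].append(
            {"order_id": order_id, "quarter": quarter, "amount": amount}
        )
    return tree


def as_cube(customers, orders):
    """Cube view: total amount per (customer name, quarter) cell."""
    names = dict(customers)
    cube = defaultdict(int)
    for _, customer_id, quarter, amount in orders:
        cube[(names[customer_id], quarter)] += amount
    return dict(cube)


hierarchy = as_hierarchy(customers, orders)
cube = as_cube(customers, orders)
```

Neither view is stored; both are derived on demand from the same relations, which is one way to read the claim that normalized sets sit between graphs and cubes.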

Page 10

Micro (intuitive) vs Macro (counter-intuitive)

Adam Smith: Founder of Micro-Economics

John Maynard Keynes: Founder of Macro-Economics

Claude Shannon: Founder of [Micro]-Information Theory

Ted Codd: Founder of [Macro]-Relational Model

Page 11

Only 56% (10 of 18) of Ontario Universities make learning the Relational Model mandatory

Worse for thought leaders

University                          Relational Model course
Brock                               Mandatory
Carleton                            Mandatory
McMaster                            Mandatory
Ryerson                             Mandatory
Ontario Institute of Technology     Mandatory
Ottawa                              Mandatory
RMC                                 Mandatory
Wilfrid Laurier                     Mandatory
Windsor                             Mandatory
York                                Mandatory
Algoma                              Optional
Guelph                              Optional
Lakehead                            Optional
Queen's                             Optional
Toronto                             Optional
Trent                               Optional
Waterloo                            Optional
Western                             Optional

Page 12

There is a problem. What did I do?

• Prepared learning materials and tools for computer science and business students

• Contacted university computer science professors in Ontario

• Organized outreach to university computer science heads/chairs in Ontario

• Travelled to 10 universities to give talks to students

Page 13

Universities Visited

Contacted University/College        Visited
Brock                               Yes
McMaster                            Yes
Queen's                             Yes
Ryerson*                            Yes
Toronto*                            Yes
Trent                               Yes
Waterloo                            Yes
Western                             Yes
Wilfrid Laurier                     Yes
Windsor                             Yes
Algoma                              No
Carleton                            No
Guelph                              No
Lakehead                            No
Ontario Institute of Technology     No
Ottawa                              No
RMC                                 No
York                                No

Page 14

Approach to Engaging Students to think differently

• Need to make data relevant in itself

– A Brief History of Analytics talk was developed to get students thinking about data as having intrinsic value

  • Brought together history that is not widely known (pivoting on the Business Analytics Enlightenment), so as to pique natural curiosity

– A Brief History of Databases talk is designed to show that the Relational Model arose as a Macro solution to all problems that prior Logical Data Models created

– Developed TOPGUN in MySQL + Pentaho DI, an open source data warehouse + ETL tools (later abandoned)

Page 15

A Brief History of Analytics and Databases

[Diagram: entities and their keys]

ProductKeywords: PK,FK1 Product ID; PK,FK2 Keyword ID

Products: PK Product ID; Title; Author; Year; Pages

ProductRatings: PK,FK2 Rating ID; PK,FK1 Product ID

Ratings: PK Rating ID; Rating

Keywords: PK Keyword ID; Keyword

Tweets: PK Tweet ID; Tweet

KeywordTwitterSearches: PK,FK1 Keyword ID; PK,FK2 Tweet ID
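
The entities above could be expressed as tables roughly as follows. This is a sketch only: it uses Python's built-in `sqlite3` module rather than the MySQL setup mentioned in the talk, and the column types, snake_case names, and sample rows are assumptions, since the slide lists only entity and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked

# Column types are assumed; the slide gives names and key roles only.
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       author TEXT, year INTEGER, pages INTEGER);
CREATE TABLE Keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE Ratings  (rating_id INTEGER PRIMARY KEY, rating TEXT NOT NULL);
CREATE TABLE Tweets   (tweet_id INTEGER PRIMARY KEY, tweet TEXT NOT NULL);
CREATE TABLE ProductKeywords (
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    keyword_id INTEGER NOT NULL REFERENCES Keywords(keyword_id),
    PRIMARY KEY (product_id, keyword_id));
CREATE TABLE ProductRatings (
    rating_id  INTEGER NOT NULL REFERENCES Ratings(rating_id),
    product_id INTEGER NOT NULL REFERENCES Products(product_id),
    PRIMARY KEY (rating_id, product_id));
CREATE TABLE KeywordTwitterSearches (
    keyword_id INTEGER NOT NULL REFERENCES Keywords(keyword_id),
    tweet_id   INTEGER NOT NULL REFERENCES Tweets(tweet_id),
    PRIMARY KEY (keyword_id, tweet_id));
""")

# Made-up sample rows.
conn.execute("INSERT INTO Products VALUES (1, 'Sample Title', 'Sample Author', 2012, 300)")
conn.execute("INSERT INTO Keywords VALUES (1, 'sample')")
conn.execute("INSERT INTO ProductKeywords VALUES (1, 1)")
conn.execute("INSERT INTO Tweets VALUES (100, 'a tweet mentioning sample')")
conn.execute("INSERT INTO KeywordTwitterSearches VALUES (1, 100)")

# Analyst-style question: how many tweets matched each product's keywords?
tweets_per_product = conn.execute("""
    SELECT p.title, COUNT(*)
    FROM Products p
    JOIN ProductKeywords pk ON pk.product_id = p.product_id
    JOIN KeywordTwitterSearches ks ON ks.keyword_id = pk.keyword_id
    GROUP BY p.title
""").fetchall()

# Integrity guarantee: a link to a keyword that does not exist is refused.
try:
    conn.execute("INSERT INTO ProductKeywords VALUES (1, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The junction tables (ProductKeywords, ProductRatings, KeywordTwitterSearches) carry composite primary keys made of two foreign keys, matching the PK,FK markings on the slide.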

Page 16

A Brief History of Analytics and Databases

In the History of Analytics presentation you will learn:

• The role statistics and analytics played in business prior to the establishment of Evidence-Based Management

• Who the original “Whiz Kids” were and how they reoriented business management to be based on quantifiable facts, paving the way for modern database management systems

• How business analytics have changed over the past 50 years

• The current political realities of Evidence-Based Management

In the History of Databases presentation, you will learn:

• How data has been physically and logically managed over the past 125 years

• The difference and trade-offs between bottom-up (Network/NoSQL) and top-down (Relational) approaches to data architecture

• Comparison of modern cloud database management systems including Google Spanner and Microsoft SQL Azure

• Why the Relational Model is the basis for Enterprise Information Architecture

Page 17

Approach to explaining data architecture

How do we, in simple and accurate terms, communicate the benefits of sound normalized data architecture, and the drawbacks of “laissez-faire” data architecture?

Page 18

Bottom Up [Topography] vs. Top Down [Architecture]

Manhattan Commissioners' Plan, 1811

Map of London, 1300

Page 19

What did I find from talking to students?

• Students had little real-world experience, so they took everything at face value

• Most students appeared engaged and interested (especially in the Analytics talk)

• Many students spoke with me afterwards, realizing there were big problems out there but unsure what specifically to do about them

• Some students reacted strongly and said they would never dream of not using an RDBMS, equating it with the relational model – not necessarily the desired outcome

Page 20

Findings from Profs and Chairs

• Most Computer Science Professors recognize the importance of listening to industry professionals and welcome them

• Many profs were unaware of The Macro Data Problem

• Heads/chairs of departments where learning the Relational Model is mandatory were surprised that other universities do not make it mandatory

• Universities generally promote micro-information thinking over macro-information thinking

Page 21

Problem Re-Cap

• The Macro Data Problem is huge and festers

• Control of data at a macro level is largely in the hands of technologists, wittingly or unwittingly

– Technologists are intuitively focused on micro data management and do not have a consistent theory for macro data management grounded in mathematics

– Technologists do not think deeply about data semantics

• Universities are currently not doing enough to ensure graduates are aware of and understand the trade-offs between micro and macro data management

Page 22

General Prescription

• How do we address the problem just described?

• Data management community should change its posture and approach to conflict

• Academia should change its curriculum and even its teaching methods so that Computer Science (and related) graduates are able to think in data-centric modes

Page 23

Advice for Data Management community

1. Acknowledge that the RDBMS and SQL are not the same as (and have failed to meet) the goals of the Relational Model

2. Instead of debating developers and application architects, ask questions

– How might this data be used by other applications?

– What types of questions might be asked of the data?

– How will you control data integrity?

– How will you make the data available?

– How will you make sure the data is secure?
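
As one possible illustration of point 1, the sketch below uses Python's built-in `sqlite3` module and a made-up `readings` table (not from the talk). A relation is by definition a set of tuples, so it cannot hold duplicates, whereas an SQL table with no declared key behaves as a bag. The example also shows SQL's three-valued logic around NULL, which is often criticized as a departure from ordinary two-valued logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no key declared, which SQL permits.
conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", None)],   # an exact duplicate and a NULL
)

# The SQL table keeps both copies of ("a", 1): it is a bag, not a set.
bag = conn.execute("SELECT sensor, value FROM readings").fetchall()

# Treating the same rows as a relation (a set of tuples) removes the duplicate.
relation = set(bag)

# Under three-valued logic "value = NULL" is never true, so no row matches,
# even though one row visibly has no value.
null_matches = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE value = NULL"
).fetchone()[0]
```

Here `bag` holds three rows while `relation` holds two, and the NULL comparison matches nothing. Declaring a primary key would prevent the duplicate, but SQL leaves that choice to the designer.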

Page 24

Prescription for Universities

• At least:

– Require all students to learn about The Relational Model (in context) in order to graduate with a Computer Science Major degree

• Ideally:

– Adopt Inquiry-based learning when teaching the relational model.

  • Create lessons and exercises that force students to deal with a data model from multiple perspectives, and consider semantic implications

Page 25

What would happen if technologists internalized the Relational Model?

Enterprise Data Normalization

• Wide adoption of a Data Services model; OR

– Each entity is a service

– Could have NoSQL back-end

– Tunable foreign key constraint management

• Consolidation of RDBMSs into single Hadoop or Spark cluster, normalized with integrity constraints; OR

• Adoption of a scalable open ERP
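
The Data Services option might look something like the following Python sketch. Everything here is hypothetical: `EntityService`, its `add_foreign_key` method, and the three modes (`strict`, `warn`, `off`) are invented for illustration and are not an API from the talk or from any library. A plain dict stands in for a NoSQL back end.

```python
class EntityService:
    """Hypothetical service owning one entity; a dict stands in for a NoSQL store."""

    MODES = ("strict", "warn", "off")

    def __init__(self, name):
        self.name = name
        self.store = {}          # primary key -> record
        self.foreign_keys = []   # (field, parent service, mode)
        self.warnings = []       # violations recorded in "warn" mode

    def add_foreign_key(self, field, parent, mode="strict"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.foreign_keys.append((field, parent, mode))

    def put(self, key, record):
        # Check every declared foreign key before the record is stored.
        for field, parent, mode in self.foreign_keys:
            if mode == "off" or record.get(field) in parent.store:
                continue
            message = (
                f"{self.name}.{field}={record.get(field)!r} "
                f"not found in {parent.name}"
            )
            if mode == "strict":
                raise LookupError(message)
            self.warnings.append(message)
        self.store[key] = record


products = EntityService("Products")
ratings = EntityService("ProductRatings")
ratings.add_foreign_key("product_id", products, mode="strict")

products.put(1, {"title": "Sample Title"})
ratings.put(1, {"product_id": 1, "rating": 5})        # accepted: product 1 exists

try:
    ratings.put(2, {"product_id": 99, "rating": 3})   # refused: no product 99
    rejected = False
except LookupError:
    rejected = True

# A bulk-load service might relax the same constraint to "warn".
staging = EntityService("StagingRatings")
staging.add_foreign_key("product_id", products, mode="warn")
staging.put(2, {"product_id": 99, "rating": 3})       # stored, warning recorded
```

The point of the per-constraint `mode` is that integrity stays declared in one place even when a particular workload chooses to loosen enforcement.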

Page 26

Conclusion

• The Macro Data Problem is counter-intuitive and will continue to be challenged by micro thinking

• Certain vendors appear to recognize The Macro Data Problem and have scale-out relational technologies (Spanner, Redshift, SQL Azure)

• If we can change the way we talk and think about the problem such that a general realization sets in, we can trust the technologists and academics to do the right thing

Page 27

Questions and Discussion