Engineering Empathy Origins of data architecture conflicts.
Tips and strategies for resolution.
Neil Hepburn
This presentation by IRMAC is licensed under a
Creative Commons Attribution-NonCommercial-
ShareAlike 2.5 Canada License.
Based on a work at wikipedia.org.
Neil Hepburn (bio)
Neil Hepburn is a Certified Data Management Professional (mastery level), holds an honours B. Math in Computer Science, and has over 20 years of IT and data management experience. Neil works within PricewaterhouseCoopers' Information Management practice and is a recognized thought leader in the area of Agile Analytics. Neil has spoken on the topic of information management in numerous public forums including: Enterprise Data World; FSOSS (Free and Open Source Software Symposium); CMA IT Symposium; and numerous universities including University of Toronto, Waterloo, Wilfrid Laurier, Ryerson, and McMaster.
Preamble
From 2010 through 2013, Neil Hepburn has:
• Developed an Open Source DW for Twitter
• Developed two talks on the history of analytics and databases, respectively
• Visited 10 Ontario universities’ computer science classes
• Met with students, profs, and chairs
This talk is about why he did this and what he learned
The Macro Data Problem
• We struggle to manage the data we have against straightforward requirements
– $600 billion/year wasted on Data Quality
– Struggle to connect the dots
– Cannot answer straightforward questions, quickly and flexibly
Common denominator? Macro Data Problem is only clearly understood at macro level: goes across systems, across people, and across perspectives. Solutions are counter-intuitive at micro level…
Who is The Master of Data?
Peggy Dodd (Super-ego)
Lancaster Dodd (Ego)
Freddie Quell (Id)
What we know guides how we think
Sociologist: Where there is always snow, bears are white. At the North Pole there is always snow; what colour are the bears there? …What do my words convey?
Tribal Headman: I've only seen brown bears …Such a thing is not to be settled by words but by testimony.
How Do People Think about Data?
Application architects and most developers think of data as graphs (typically hierarchies). Detailed and efficient for Apps, but hard to see the big picture, and can lead to query bias.
Analysts prefer to see data from bird’s eye view, flattening out data structures into a report that conforms to a map, grid, or cube. Easy to see the big picture (from a given perspective), but details can go missing
Data is like Light
Astronomers think of light as waves.
Particle physicists think of light as particles.
Quantum physicists embrace the wave-particle duality.
How Data Architects think about Data
• Data architects think of data in terms of normalized sets (Relations), which are neither graphs nor cubes, but can be easily transformed into either.
• Forces data architects to think more deeply about semantics
• Codd set the ground rules of the relational model (and this mode of thinking) as follows:
– Based on a foundation of mathematics and formal logic
– Physical and logical independence
– Guarantees of integrity
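The claim that normalized relations can be transformed into either a graph or a cube can be illustrated with a minimal Python sketch. The sample rows, the function names `as_hierarchy` and `as_cube`, and the choice of century as a cube dimension are all assumptions made for illustration; they are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical sample data. Each relation is a set of tuples:
# no duplicates and no inherent ordering.
products = {(1, "Dune", 1965), (2, "Emma", 1815)}   # (product_id, title, year)
keywords = {(10, "classic"), (11, "sci-fi")}         # (keyword_id, keyword)
product_keywords = {(1, 10), (1, 11), (2, 10)}       # (product_id, keyword_id)


def as_hierarchy(products, keywords, product_keywords):
    """Graph/hierarchy view: each product nests its keywords (application view)."""
    kw = dict(keywords)
    tree = {}
    for pid, title, year in sorted(products):
        tree[title] = {
            "year": year,
            "keywords": sorted(kw[k] for p, k in product_keywords if p == pid),
        }
    return tree


def as_cube(products, keywords, product_keywords):
    """Cube view: number of products per (keyword, century) cell (analyst view)."""
    kw = dict(keywords)
    year_of = {pid: year for pid, _title, year in products}
    cube = defaultdict(int)
    for pid, kid in product_keywords:
        century = year_of[pid] // 100 + 1
        cube[(kw[kid], century)] += 1
    return dict(cube)


if __name__ == "__main__":
    print(as_hierarchy(products, keywords, product_keywords))
    print(as_cube(products, keywords, product_keywords))
```

Both views are derived from the same three relations, so neither the hierarchy nor the cube has to be stored as the system of record.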
Micro (intuitive) vs Macro (counter-intuitive)
Adam Smith Founder of Micro-Economics
Maynard Keynes Founder of Macro-Economics
Claude Shannon Founder of [Micro]-Information Theory
Ted Codd Founder of [Macro]-Relational Model
Only 56% of Ontario Universities make learning Relational Model Mandatory
Worse for thought leaders
University                        Relational Model course
Brock                             Mandatory
Carleton                          Mandatory
McMaster                          Mandatory
Ryerson                           Mandatory
Ontario Institute of Technology   Mandatory
Ottawa                            Mandatory
RMC                               Mandatory
Wilfrid Laurier                   Mandatory
Windsor                           Mandatory
York                              Mandatory
Algoma                            Optional
Guelph                            Optional
Lakehead                          Optional
Queen's                           Optional
Toronto                           Optional
Trent                             Optional
Waterloo                          Optional
Western                           Optional
There is a problem. What did I do?
• Prepared learning materials and tools for computer science and business students
• Contacted university computer science professors in Ontario
• Organized targeting of university computer science heads/chairs in Ontario
• Travelled to 10 universities to give talks to students
Universities Visited
Contacted University/College      Visited
Brock                             Yes
McMaster                          Yes
Queen's                           Yes
Ryerson*                          Yes
Toronto*                          Yes
Trent                             Yes
Waterloo                          Yes
Western                           Yes
Wilfrid Laurier                   Yes
Windsor                           Yes
Algoma                            No
Carleton                          No
Guelph                            No
Lakehead                          No
Ontario Institute of Technology   No
Ottawa                            No
RMC                               No
York                              No
Approach to Engaging Students to think differently
• Need to make data relevant in itself
– A Brief History of Analytics talk was developed to get students thinking about data as having intrinsic value
  • Brought together history that is not widely known (pivoting on the Business Analytics Enlightenment), so as to pique natural curiosity
– A Brief History of Databases talk is designed to show that the Relational Model arose as a Macro solution to all problems that prior Logical Data Models created
– Developed TOPGUN in MySQL + Pentaho DI, an open source data warehouse + ETL tools (later abandoned)
A Brief History of Analytics and Databases
[Entity-relationship diagram]
Products: Product ID (PK), Title, Author, Year, Pages
Keywords: Keyword ID (PK), Keyword
Ratings: Rating ID (PK), Rating
Tweets: Tweet ID (PK), Tweet
ProductKeywords: Product ID (PK, FK1), Keyword ID (PK, FK2)
ProductRatings: Product ID (PK, FK1), Rating ID (PK, FK2)
KeywordTwitterSearches: Keyword ID (PK, FK1), Tweet ID (PK, FK2)
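The entities and keys in the diagram above could be declared roughly as follows. This is a sketch only: the slides name MySQL, but the example uses Python's built-in `sqlite3` so that it runs anywhere, and the column types, the spelling of column names without spaces, and the `build_schema` helper are assumptions rather than anything shown in the original diagram.

```python
import sqlite3

# Column types are assumed; the diagram only gives names and key roles.
DDL = """
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    Title TEXT, Author TEXT, Year INTEGER, Pages INTEGER);
CREATE TABLE Keywords (KeywordID INTEGER PRIMARY KEY, Keyword TEXT);
CREATE TABLE Ratings  (RatingID  INTEGER PRIMARY KEY, Rating  TEXT);
CREATE TABLE Tweets   (TweetID   INTEGER PRIMARY KEY, Tweet   TEXT);

-- Associative tables: composite primary key, each part is also a foreign key.
CREATE TABLE ProductKeywords (
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
    KeywordID INTEGER NOT NULL REFERENCES Keywords(KeywordID),
    PRIMARY KEY (ProductID, KeywordID));
CREATE TABLE ProductRatings (
    ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
    RatingID  INTEGER NOT NULL REFERENCES Ratings(RatingID),
    PRIMARY KEY (ProductID, RatingID));
CREATE TABLE KeywordTwitterSearches (
    KeywordID INTEGER NOT NULL REFERENCES Keywords(KeywordID),
    TweetID   INTEGER NOT NULL REFERENCES Tweets(TweetID),
    PRIMARY KEY (KeywordID, TweetID));
"""


def build_schema():
    """Create the schema in an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = build_schema()
    conn.execute("INSERT INTO Products (ProductID, Title) VALUES (1, 'Example')")
    try:
        # Keyword 999 does not exist, so the foreign key check rejects the row.
        conn.execute("INSERT INTO ProductKeywords VALUES (1, 999)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The composite keys on the three associative tables are what keep the many-to-many relationships free of duplicate pairings.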
A Brief History of Analytics and Databases
In the History of Analytics presentation you will learn:
• The role statistics and analytics played in business prior to the establishment of Evidence-Based Management
• Who the original “Whiz Kids” were and how they reoriented business management to be based on quantifiable facts, paving the way for modern database management systems
• How business analytics have changed over the past 50 years
• The current political realities of Evidence-Based Management
In the History of Databases presentation, you will learn:
• How data has been physically and logically managed over the past 125 years
• The difference and trade-offs between bottom-up (Network/NoSQL) and top-down (Relational) approaches to data architecture
• Comparison of modern cloud database management systems including Google Spanner and Microsoft SQL Azure
• Why the Relational Model is the basis for Enterprise Information Architecture
Approach to explaining data architecture
How do we, in simple and accurate terms, communicate the benefits of sound normalized data architecture, and the drawbacks of “laissez-faire” data architecture?
Bottom Up [Topography] vs. Top Down [Architecture]
Top down: Manhattan Commissioners' Plan, 1811
Bottom up: Map of London, 1300
What did I find from talking to students?
• Little real world experience, so took everything at face value
• Most students appeared engaged and interested (especially in the Analytics talk)
• Many students spoke with me afterwards, realizing there were big problems out there, but were not sure what to do specifically
• Some students reacted strongly and said they would never dream of not using an RDBMS, equating it with the relational model – not necessarily the desired outcome
Findings from Profs and Chairs
• Most Computer Science Professors recognize the importance of listening to industry professionals and welcome them
• Many profs were unaware of The Macro Data Problem
• Heads/chairs of departments where learning the Relational Model is mandatory were surprised that other universities do not make it mandatory
• Universities generally promote micro-information thinking over macro-information thinking
Problem Re-Cap
• The Macro Data Problem is huge and festers
• Control of data at a macro level is largely in the hands of technologists, wittingly or unwittingly
– Technologists are intuitively focused on micro data management and do not have a consistent theory for macro data management grounded in mathematics
– Do not think deeply about data semantics
• Universities are currently not doing enough to ensure graduates are aware of and understand the trade-offs between micro and macro data management
General Prescription
• How do we address the problem just described?
• Data management community should change its posture and approach to conflict
• Academia should change its curriculum and even its teaching methods so that Computer Science (and related) graduates are able to think in data-centric modes
Advice for Data Management community
1. Acknowledge that the RDBMS and SQL are not the same as (and have failed to meet) the goals of the Relational Model
2. Instead of debating developers and application architects, ask questions
– How might this data be used by other applications?
– What types of questions might be asked of the data?
– How will you control data integrity?
– How will you make the data available?
– How will you make sure the data is secure?
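One commonly cited gap behind point 1 is that a relation is a set of tuples, while a SQL table without a key is a bag that accepts duplicate rows. The following sketch shows the difference using Python's built-in `sqlite3`; the `Readings` table and its rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table declared without any key, which SQL permits.
conn.execute("CREATE TABLE Readings (Sensor TEXT, Value INTEGER)")

rows = [("a", 1), ("a", 1), ("b", 2)]
conn.executemany("INSERT INTO Readings VALUES (?, ?)", rows)

# SQL keeps both copies of ("a", 1): the table behaves as a bag.
sql_row_count = conn.execute("SELECT COUNT(*) FROM Readings").fetchone()[0]

# A relation in the mathematical sense is a set, so the duplicate collapses.
relation = set(rows)

# SELECT DISTINCT recovers set semantics, but only when the query asks for it.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT Sensor, Value FROM Readings)"
).fetchone()[0]

print(sql_row_count, len(relation), distinct_count)
```

Declaring a primary key closes this particular gap, which is one reason a question such as "How will you control data integrity?" tends to be more productive than a debate about products.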
Prescription for Universities
• At least:
– Require all students to learn about The Relational Model (in context) in order to graduate with a Computer Science Major degree
• Ideally:
– Adopt Inquiry-based learning when teaching the relational model.
  • Create lessons and exercises that force students to deal with a data model from multiple perspectives, and consider semantic implications
What would happen if technologists internalized the Relational Model?
Enterprise Data Normalization
• Wide adoption of a Data Services model; OR
– Each entity is a service
– Could have NoSQL back-end
– Tunable foreign key constraint management
• Consolidation of RDBMSs into single Hadoop or Spark cluster, normalized with integrity constraints; OR
• Adoption of a scalable open ERP
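The Data Services option, with each entity as a service and tunable foreign key constraint management, might look something like the sketch below. This is one possible reading, not a design from the talk: the class names `EntityService` and `LinkService`, the in-memory dictionaries standing in for a NoSQL back-end, and the three `fk_mode` settings are all assumptions.

```python
class EntityService:
    """Hypothetical service that owns one entity type (in-memory store for illustration)."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def put(self, key, row):
        self._rows[key] = row

    def exists(self, key):
        return key in self._rows


class LinkService:
    """Hypothetical service for a relationship between two entity services.

    fk_mode tunes referential integrity:
      "strict" rejects dangling references,
      "warn"   accepts them but records a violation,
      "off"    skips the check entirely.
    """

    MODES = ("strict", "warn", "off")

    def __init__(self, left, right, fk_mode="strict"):
        if fk_mode not in self.MODES:
            raise ValueError(f"unknown fk_mode: {fk_mode}")
        self.left, self.right, self.fk_mode = left, right, fk_mode
        self.links = set()      # set of (left_key, right_key) pairs
        self.violations = []    # recorded only in "warn" mode

    def link(self, left_key, right_key):
        if self.fk_mode != "off":
            missing = [
                svc.name
                for svc, key in ((self.left, left_key), (self.right, right_key))
                if not svc.exists(key)
            ]
            if missing:
                if self.fk_mode == "strict":
                    raise LookupError(f"missing reference in: {missing}")
                self.violations.append((left_key, right_key, missing))
        self.links.add((left_key, right_key))


if __name__ == "__main__":
    products = EntityService("Products")
    keywords = EntityService("Keywords")
    products.put(1, {"title": "Example"})

    lenient = LinkService(products, keywords, fk_mode="warn")
    lenient.link(1, 10)          # keyword 10 does not exist yet
    print(lenient.violations)
```

The trade-off being tuned is the usual one: strict checking protects integrity at write time, while the looser modes favour availability and leave reconciliation for later.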
Conclusion
• The Macro Data Problem is counter-intuitive and will continue to be challenged by micro thinking
• Certain vendors appear to recognize The Macro Data Problem and have scale-out relational technologies (Spanner, Redshift, SQL Azure)
• If we can change the way we talk and think about the problem such that a general realization sets in, trust the technologists and academics to do the right thing
Questions and Discussion