building guerrilla analytics teams

Upload: enda-ridge

Post on 14-Jul-2015

Category:

Data & Analytics


TRANSCRIPT

Building Guerrilla Analytics Teams

Presented by:

Enda Ridge, PhD

People, Process and Technology for Doing Data Science

Copyright Enda Ridge 2014

What this talk is about

• Data Science: expectations and reality

• 3 Drivers for doing Data Science

• Why Data Science projects are so challenging

• Introduction to Guerrilla Analytics

• Building Guerrilla Analytics Capability

Guerrilla Analytics

People

Process

Tech

What we hear about Data Science

“Data is the new science. Big data holds the answers.”

“the sexy job in the next 10 years will be statisticians”

“Data Scientist: The Sexiest Job of the 21st Century”

“Information is the oil of the 21st century, and analytics is the combustion engine.”

http://www.gapminder.org/

http://www.statistics.com/data-science-quotes/

https://github.com/mbostock/d3/wiki/Gallery

What we really want from Data Science

Leverage

• “I have made data available, now how do I use it?”

Justify

• “I want to make data available or buy a data product. How do I know it will be worth it?”

Ad-hoc

• “I think I have a fraud problem / security breach / etc”

• “Help me better understand my customers”

My background

PhD Computer Science

• “Design of Experiments for Tuning Algorithms”

Boutique Consultancy

• Social Network Analysis for Fraud

Forensic Data Analytics

• Professional Services

Senior Manager

• Data Science Consulting & Data Product Development

Misconception about how we do Data Science

Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22

Reality – Guerrilla Analytics

• Disruptions

• Data

• Requirements

• Resources

• Business Rules

• Constraints

• Time

• Toolsets

• People

• Repeatable

• Explainable

• Tested

Guerrilla Analytics Workflow

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Disruptions

Some Guerrilla Analytics Principles

• Prefer simple project structures over heavily documented and complex ones.

• Prefer automation with program code over manual graphical approaches.

• Link data on the file system to data in the analytics environment and to data in work products.

• Version control changes to program code AND data.
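
The last two principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `D001` folder, the `d001_v01` table naming convention and the helper names are hypothetical and do not come from the talk. The idea is that a data UID appears in the file path and in the table name, and that a content hash shows when the data itself has changed.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def table_name_for(data_uid: str, version: int) -> str:
    """Build a table name carrying the data UID and version (hypothetical convention)."""
    return f"d{data_uid}_v{version:02d}"


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "D001" / "customers.csv"
    raw.parent.mkdir()
    raw.write_text("id,name\n1,Ada\n")
    fingerprint_v1 = file_fingerprint(raw)
    # A refreshed delivery of the same file produces a different fingerprint.
    raw.write_text("id,name\n1,Ada\n2,Grace\n")
    fingerprint_v2 = file_fingerprint(raw)

print(table_name_for("001", 1), fingerprint_v1[:12])
print(table_name_for("001", 2), fingerprint_v2[:12])
```

Because the same UID appears in the folder, the table name and (by the same convention) the work product, anyone on the team can trace a number in a report back to the file it came from.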

Building Guerrilla Analytics Capability

Leverage

Justify

Ad-hoc

Guerrilla Analytics

People

Process

Tech

People Capability

People

Hard Skills

Programming

Software Engineering

Visualization

Maths / Stats

Soft Skills

Communication

Domain Knowledge

Mindset

Capability: Data Programming

“Using a programming language to describe and execute data manipulations, data analyses, data visualizations”

Guerrilla Environment

• Wide variety of data

• Poor quality data

• Evolving understanding

• Reproduce and repeat

Benefit

• Flexibility

• Consolidation

• Knowledge transfer

• Self describing
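
As a small illustration of data programming on poor quality data, the sketch below parses a made-up extract, strips stray whitespace and sets aside rows that cannot be parsed. The sample data, column names and the `split_rows` helper are assumptions for this example, not material from the talk. The point is that the cleaning rules are written down in code, so they are repeatable and self describing.

```python
import csv
import io

# Hypothetical raw extract with stray whitespace and a non-numeric amount.
RAW = "customer_id,amount\n 001 ,10.50\n002,n/a\n003, 7 \n"


def split_rows(text):
    """Return (clean, rejected): parsed rows and ids of rows whose amount is not numeric."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        customer_id = row["customer_id"].strip()
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            # Keep track of what was dropped instead of losing it silently.
            rejected.append(customer_id)
            continue
        clean.append((customer_id, amount))
    return clean, rejected


clean, rejected = split_rows(RAW)
total = sum(amount for _, amount in clean)
print(clean, rejected, total)
```

When the understanding of the data evolves, the rule changes in one place and the whole manipulation can be rerun.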

Capability: Software Engineering

“the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software”

Guerrilla Environment

• Changing data

• Iterations of work products

• Reproduce despite pace

• Correctness despite complexity

Benefit

• Version control

• Testing

• Automation

• Issue/bug tracking
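
Testing in this setting often means checking the data as well as the code. The following is a minimal sketch, assuming records are plain dictionaries with an `id` key; the function names are hypothetical. It pairs a transformation with checks that can be rerun automatically each time the data changes.

```python
def deduplicate(records):
    """Keep the first record seen for each id, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def check_deduplicate(source, result):
    """Return descriptions of failed checks; an empty list means all passed."""
    failures = []
    ids = [record["id"] for record in result]
    if len(ids) != len(set(ids)):
        failures.append("ids are not unique")
    if set(ids) != {record["id"] for record in source}:
        failures.append("ids lost or invented")
    return failures


source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
result = deduplicate(source)
print(check_deduplicate(source, result))
```

Returning a list of failures, instead of stopping at the first one, gives a fuller picture when a new data delivery breaks several assumptions at once.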

Capability: Domain Knowledge & Communication

Prefer analytics skills with great communication

Analytics

Forensic Accounting

Forensic Accountant

Data Scientist

Capability: Mind-set

Guerrilla Environment

• Changing requirements

• Poorly understood data

• Constraints

• Time pressure

• Iterations

• Dead Ends

Required Capability

• Tenacity

• Curiosity

• Problem solving

• Communication

The attitude and approach to work that best matches Guerrilla Analytics

TECHNOLOGY

Guerrilla Analytics

People

Process

Tech

Common Misconceptions about Technology

“If we use this tech, my team don’t need to code”

“We can productionise all possible data science scenarios”

“We need to invest in a platform to get value from our data”

“We need Big Data technology X”

Technology Capability

People

Agility

Data Manipulation Environment

Scripting & Command Line

Shared Space

Visualization

Consolidate

Code Libraries

Machine Images

Project Wiki

Process Support

Source Code Control

Issue Tracking

Security

PROCESS

Guerrilla Analytics

People

Process

Tech

Guerrilla Analytics Workflow

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Disruptions

Common Misconceptions about Process

“We must document everything”

“We can completely plan a data science job”

“We should track everything in a traditional top-down way”

“Work products must be right first time”

Process Capability

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Log Data Receipt

Track Work Product Versions

Track Work Product Release
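
The three tracking activities above could be supported by something as light as the following sketch. It is an assumption-laden illustration, not the author's tooling: the `ProjectLog` class, its field names, the `D001` UID format and the sample values are all hypothetical, and a real team might keep the same information in a spreadsheet or issue tracker.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProjectLog:
    """In-memory log of data receipts and work product versions (illustrative only)."""
    data_receipts: list = field(default_factory=list)
    work_products: list = field(default_factory=list)

    def log_data_receipt(self, source: str, description: str, received_on: date) -> str:
        """Record a data delivery and return its sequential UID, e.g. 'D001'."""
        uid = f"D{len(self.data_receipts) + 1:03d}"
        self.data_receipts.append(
            {"uid": uid, "source": source, "description": description,
             "received_on": received_on.isoformat()}
        )
        return uid

    def log_work_product(self, name: str, data_uids: list) -> str:
        """Record a new version of a named work product and the data it used."""
        version = 1 + sum(1 for wp in self.work_products if wp["name"] == name)
        self.work_products.append(
            {"name": name, "version": version, "data_uids": list(data_uids),
             "released": False}
        )
        return f"{name}_v{version:02d}"

    def release(self, name: str, version: int) -> None:
        """Mark one specific version as released."""
        for wp in self.work_products:
            if wp["name"] == name and wp["version"] == version:
                wp["released"] = True
                return
        raise KeyError(f"{name} version {version} is not in the log")


log = ProjectLog()
uid = log.log_data_receipt("client SFTP", "customer master extract", date(2014, 3, 1))
first = log.log_work_product("customer_summary", [uid])
second = log.log_work_product("customer_summary", [uid])
log.release("customer_summary", 2)
print(uid, first, second)
```

Each work product version records the data UIDs it was built from, so a released output can be traced back to a logged data receipt.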

Summary

Data Science Aims

• Leverage

• Justify

• Ad-hoc

Guerrilla Analytics

• Disruptions

• Constraints

• Reproducible, Testable, Explainable

People Capability

• Hard Skills

• Soft Skills

Technology Capability

• Analytics Agility

• Consolidation

• Process Support

Process Capability

• Tracking Data (Inputs)

• Tracking Work Products Creation

• Tracking Outputs

Keep in Touch!

@Enda_Ridge

[email protected]

www.guerrilla-analytics.net