kognitio spark modern data platform print
@Kognitio #SparkEvent
Hadoop meets Mature BI: Where the rubber meets the road for
the Modern Data Platform
Michael Hiskey
Futurist, Product Evangelist
(and VP, Marketing & Business Development)
Today, and the Future
Big Data
Advanced Analytics
In-memory
Modern Data Platform
Hybrid Data Ecosystem ‘Logical Data Warehouse’
Predictive Analytics
Data Scientists
Data
The Data Scientist: Sexiest job of the 21st Century?
Data Scientist
The Analytical Enterprise
Business Analyst
Systems Admin
Remember: Decision Support Systems?
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Advanced analytics & data science
More math…a lot more math
-- Outer query: per year and transaction count, how many accounts and how much spend
select Trans_Year,
       Num_Trans,
       count(distinct Account_ID) Num_Accts,
       sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
       cast(sum(total_spend)/1000 as int) Total_Spend,
       cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
       rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
       rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
from (
    -- Inner query: one summary row per account per year
    select Account_ID,
           Extract(Year from Effective_Date) Trans_Year,
           count(Transaction_ID) Num_Trans,
           sum(Transaction_Amount) Total_Spend,
           avg(Transaction_Amount) Avg_Spend
    from Transaction_fact
    where extract(year from Effective_Date) < 2009
      and Trans_Type = 'D'
      and Account_ID <> 9025011
      and actionid in (select actionid from DEMO_FS.V_FIN_actions
                       where actionoriginid = 1)
    group by Account_ID, Extract(Year from Effective_Date)
) Acc_Summary
group by Trans_Year, Num_Trans
order by Trans_Year desc, Num_Trans;
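The two-level shape of the query above (summarize per account and year, then group those summaries by year and transaction count) can be tried on a small scale. The following is only a sketch, assuming Python's built-in sqlite3 module in place of the platform shown in the talk. The rows are made up, the filters and window functions are omitted, and `strftime` stands in for `Extract(Year from ...)`.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only:
# (Account_ID, Transaction_ID, Effective_Date, Transaction_Amount)
ROWS = [
    (1, 101, "2007-03-01", 50.0),
    (1, 102, "2007-06-15", 70.0),
    (2, 103, "2007-02-10", 20.0),
    (2, 104, "2008-01-05", 30.0),
    (3, 105, "2008-04-20", 10.0),
    (3, 106, "2008-09-09", 40.0),
    (4, 107, "2008-05-05", 5.0),
]


def yearly_summary(rows):
    """Summarize per account and year, then group by year and transaction count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Transaction_fact ("
        "Account_ID integer, Transaction_ID integer, "
        "Effective_Date text, Transaction_Amount real)"
    )
    conn.executemany("insert into Transaction_fact values (?, ?, ?, ?)", rows)
    query = """
        select Trans_Year,
               Num_Trans,
               count(distinct Account_ID) as Num_Accts,
               sum(Total_Spend) as Total_Spend
        from (
            -- one summary row per account per year
            select Account_ID,
                   cast(strftime('%Y', Effective_Date) as integer) as Trans_Year,
                   count(Transaction_ID) as Num_Trans,
                   sum(Transaction_Amount) as Total_Spend
            from Transaction_fact
            group by Account_ID, strftime('%Y', Effective_Date)
        ) Acc_Summary
        group by Trans_Year, Num_Trans
        order by Trans_Year desc, Num_Trans
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in yearly_summary(ROWS):
        print(row)
```

Each output row reads as (year, transactions per account, number of accounts, total spend), which is the core of what the slide's query reports before the ranking columns are added.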
Behind the numbers
What has changed?
More connected-users?
More-connected users?
Don’t be a Railroad Stoker!
Highly skilled engineering required … but the world innovated around them.
The drive for deeper understanding
[Chart. Axes: Analytical Complexity, Technology/Automation]
Reporting & BPM
Campaign Management
Fraud detection
Dynamic Interaction
Behavior modelling
Clustering
Statistical Analysis
Dynamic Simulation
Machine learning algorithms
Key: “Graduation”
Projects will need to graduate from the Data Science Lab and become part of Business as Usual
Your goal:
PRESS HERE…and really cool Big Data stuff happens!
Data flow
© 20th Century Fox
No need to pre‐process
No need to align to schema
No need to triage
Null storage concerns
Hadoop just too slow for interactive BI!
…loss of train‐of‐thought
“while Hadoop shines as a processing platform, it is painfully slow as a query tool”
Hadoop is…
inherently disk oriented
typically low ratio of CPU to Disk
Lots of these
Not so many of these
Analytics needs low latency, no I/O wait
High speed in‐memory processing
A* Modern Data Platform Reference Architecture
*(not THE)
Analytical Platform
Near‐line Storage (optional)
Access
Application & Client Layer
All BI Tools, All OLAP Clients, Excel
Persistence Layer
Hadoop Clusters
Enterprise Data Warehouses
Legacy Systems
…
Reporting
Cloud Storage
© Hortonworks Inc. 2013
(another) Next-Generation Data Architecture
APPLICATIONS: Microsoft Applications; BI Tools & OLAP Clients
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); HORTONWORKS DATA PLATFORM; In‐memory MPP Accelerator; Analytical Platform
DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
It’s all about getting work done
Tasks evolving:
Used to be simple fetch of value
Then was compute dynamic aggregate
Now complex algorithms!
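To make the three stages concrete, here is a minimal sketch in Python. The `SALES` data and all three function names are hypothetical, invented for illustration. A least-squares trend line stands in for "complex algorithms" only because it is short; real workloads would be far heavier.

```python
from statistics import mean

# Hypothetical data: three periods of sales per region
SALES = {
    "north": [120.0, 80.0, 100.0],
    "south": [60.0, 90.0, 30.0],
}


def fetch_value(data, region, period):
    """Stage 1: simple fetch of a stored value."""
    return data[region][period]


def dynamic_aggregate(data):
    """Stage 2: compute an aggregate on demand instead of reading a stored total."""
    return {region: sum(values) for region, values in data.items()}


def linear_trend(values):
    """Stage 3: a small algorithm, the least-squares slope across periods."""
    xs = range(len(values))
    x_bar = mean(xs)
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    print(fetch_value(SALES, "north", 1))
    print(dynamic_aggregate(SALES))
    print({region: linear_trend(values) for region, values in SALES.items()})
```

Each stage touches more data and does more arithmetic per answer than the one before it, which is why the work per query grows as tasks evolve.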