Big Data and MicroStrategy: Building a Bridge for the Elephant Jan 2013 Paul Groom, Chief Innovation Officer

Upload: kognitio

Post on 22-Nov-2014


DESCRIPTION

Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”. Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive yet nimble business analytics with MicroStrategy. The Google generation, armed with iPads and Droid phones, brings big, bold ideas on how “Big Data” will solve the new wave of business problems; traditional users know that addressing them requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop”. Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis in combination with scalable self-service dashboards, reporting and analytics is no longer an idea but a must-do. This presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.

TRANSCRIPT

Page 1: Big data and mstr   bridge the elephant

Big Data and MicroStrategy: Building a Bridge for the Elephant

Jan 2013

Paul Groom, Chief Innovation Officer

Page 2: Big data and mstr   bridge the elephant

Let’s start at…

The End.

Page 3: Big data and mstr   bridge the elephant

Panacea

Page 4: Big data and mstr   bridge the elephant

You…built the DWE

Page 5: Big data and mstr   bridge the elephant

You…built the BICC

Page 6: Big data and mstr   bridge the elephant

and yes you built… lots of cool reports and dashboards

Page 7: Big data and mstr   bridge the elephant

Epilogue

A comfortable status quo

Page 8: Big data and mstr   bridge the elephant

How are you really judged?

• Fast?
• Consistent?
• All users?

Page 10: Big data and mstr   bridge the elephant

Rrrrrriiiiiiinnnnnngggggg!

Back to the real world

Page 11: Big data and mstr   bridge the elephant

Disruption

Page 12: Big data and mstr   bridge the elephant

Disruptor: New Data

Page 13: Big data and mstr   bridge the elephant

Disruptor: Social Media & Sentiment

Page 14: Big data and mstr   bridge the elephant

Disruptor: Data?

Page 15: Big data and mstr   bridge the elephant

Disruptor: More Connected Users

Page 16: Big data and mstr   bridge the elephant

Disruptor: Data Discovery Tools

Choices for engaging quickly with data

Business users’ heads distracted from core BI!

Page 17: Big data and mstr   bridge the elephant

BI Wild West

Page 18: Big data and mstr   bridge the elephant

Where it matters

Page 20: Big data and mstr   bridge the elephant

Lots of variety of DW and EDW

Page 21: Big data and mstr   bridge the elephant

The Reality of the DW

Analytical workload

Page 22: Big data and mstr   bridge the elephant

EDW says no, or not now!

…and CFO says no big upgrades

Page 23: Big data and mstr   bridge the elephant

Pragmatism

…OK, so you enable plenty of caching, limit drill anywhere and add Intelligent Cubes

Page 25: Big data and mstr   bridge the elephant

And then came…

Page 26: Big data and mstr   bridge the elephant

http://oris-rake.deviantart.com/

Boon or Distraction

Page 27: Big data and mstr   bridge the elephant

Scalable, resilient, bit bucket

Page 28: Big data and mstr   bridge the elephant

Experimenting

© 20th Century Fox

Page 29: Big data and mstr   bridge the elephant

The Hadoop stack

[Diagram: the Hadoop stack, showing HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig and Hive]

Page 30: Big data and mstr   bridge the elephant

Hadoop Performance Reality

• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
  – ~30 second base response time
  – Too much latency in stack and processing model
  – Trade-off in optimization and latency
• MapReduce is complex
  – Typically multiple Java routines

https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
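As a rough illustration of what a ~30 second base response time means for interactive use, the sketch below computes an upper bound on how many queries a single analyst could complete in an hour when issuing them one after another. The function name and the 2 second comparison figure are assumptions made for illustration; only the ~30 second figure comes from the slide.

```python
def max_serial_queries_per_hour(base_latency_s: float) -> float:
    """Upper bound on queries one serial session can finish in an hour.

    Ignores think time and actual processing; only the fixed base
    latency of each query is counted.
    """
    if base_latency_s <= 0:
        raise ValueError("base latency must be positive")
    return 3600.0 / base_latency_s


# ~30 s base response time quoted on the slide for MapReduce jobs.
batch_ceiling = max_serial_queries_per_hour(30.0)

# Hypothetical 2 s response, chosen only as an interactive comparison point.
interactive_ceiling = max_serial_queries_per_hour(2.0)

print(f"batch: {batch_ceiling:.0f}/hour, interactive: {interactive_ceiling:.0f}/hour")
```

Even before any data is processed, the fixed overhead alone caps a drill-down session at a couple of queries per minute.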

Page 31: Big data and mstr   bridge the elephant

SQL to the Rescue

[Diagram: the Hadoop stack, showing HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig and Hive]

• So MapReduce is complicated
  – use Hive (SQL) as the easy way out
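To give a feel for why SQL is the easier route, here is a minimal single-process Python sketch of the map, shuffle and reduce phases for a simple count-per-key aggregation. It is not Hadoop code, and the function names, the `region` field and the sample records are all hypothetical; it only illustrates how much plumbing sits behind what one `GROUP BY` expresses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit a (key, 1) pair for every input record."""
    for record in records:
        yield record["region"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sample data standing in for rows stored in HDFS.
records = [{"region": "EMEA"}, {"region": "NA"}, {"region": "EMEA"}]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)

# The same intent as one SQL statement (table name is illustrative):
#   SELECT region, COUNT(*) FROM sales GROUP BY region;
```

In a real cluster each phase is distributed across nodes and typically written as separate Java classes, which is the complexity the slide refers to.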

Page 32: Big data and mstr   bridge the elephant

Hive

• Simplifies access
• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage

“Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”

Page 33: Big data and mstr   bridge the elephant

Conclusion

Hadoop just too slow for interactive BI!

…loss of train-of-thought

“while Hadoop shines as a processing platform, it is painfully slow as a query tool”

Page 34: Big data and mstr   bridge the elephant

Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.

I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.

Page 36: Big data and mstr   bridge the elephant

Why can’t Hadoop be in-memory?

Why can’t I have giant icubes?

Page 37: Big data and mstr   bridge the elephant

Lots of these

Not so many of these

Remember…

Hadoop inherently disk oriented

Typically low ratio of CPU to Disk

Page 38: Big data and mstr   bridge the elephant

Larger cubes

Issues: Time to Populate, Proliferation

Page 39: Big data and mstr   bridge the elephant

Alternative - In-memory Processing

Analytics requires CPU, RAM keeps the data close

Cores do the work!

Scale with the data

Page 40: Big data and mstr   bridge the elephant

Goals: Minimise Disruption, Cut Latency

• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
  – Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes, caches
• Improve sharing of data
• Increase concurrency and throughput
  – It’s all about queries per hour!
• Minimal DBA requirement

Page 42: Big data and mstr   bridge the elephant

Kognitio Hadoop Connectors

HDFS Connector

• Connector defines access to HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access or “pin” data into memory
• Selected HDFS file(s) loaded into memory

Filter Agent Connector

• Connector uploads agent to Hadoop nodes
• Query passes selections and relevant predicates to agent
• Data filtering and projection takes place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
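The filter agent idea described above can be sketched conceptually in Python. This is not the Kognitio connector or its API: the function names, the row layout and the in-process "nodes" are all assumptions, used only to show filtering and projection happening next to the data so that only rows and columns of interest travel onward.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def node_filter_agent(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
) -> List[Row]:
    """Stand-in for an agent running on one Hadoop node.

    Applies the query's predicate locally, then projects each surviving
    row down to the requested columns.
    """
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


def load_into_memory(
    nodes: Sequence[Sequence[Row]],
    predicate: Callable[[Row], bool],
    columns: Sequence[str],
) -> List[Row]:
    """Collect the filtered output of every node.

    Each loop iteration stands in for one load stream; a real system
    would run these in parallel rather than sequentially.
    """
    loaded: List[Row] = []
    for node_rows in nodes:
        loaded.extend(node_filter_agent(node_rows, predicate, columns))
    return loaded


# Hypothetical data held on two nodes.
nodes = [
    [
        {"id": 1, "country": "UK", "amount": 10.0, "note": "a"},
        {"id": 2, "country": "US", "amount": 5.0, "note": "b"},
    ],
    [{"id": 3, "country": "UK", "amount": 7.5, "note": "c"}],
]

in_memory = load_into_memory(nodes, lambda r: r["country"] == "UK", ["id", "amount"])
print(in_memory)
```

The point of the design is that the unwanted row and the unused `country` and `note` columns never leave the node that stores them.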

Page 43: Big data and mstr   bridge the elephant

Centrally defined data models

Persist data in natural store

Fetch when needed, agile

Available to all tools

Analytical power

BI – Central Governance

Page 44: Big data and mstr   bridge the elephant

Engineering for Success

Thomas Herbrich

Page 45: Big data and mstr   bridge the elephant

connect

www.kognitio.com

twitter.com/kognitio

linkedin.com/companies/kognitio

tinyurl.com/kognitio

youtube.com/kognitio

NA: +1 855 KOGNITIO

EMEA: +44 1344 300 770