how to quickly and easily draw value from big data sources_q3 symposia(moa)
TRANSCRIPT
How to Quickly and Easily Draw Value from Big Data Sources
Moacyr Passador, Senior Sales Engineer
Developers, Analysts, Consumers, Data Scientists
Business Users, IT Users
Types of BI Users
The Old BI World
Today’s BI World
Business Users are getting more involved in producing analytical content
The Role Of Business Users In BI Today Has Greatly Evolved
Relational Databases
MapReduce & NoSQL
Multi-Dimensional & Other BI Tools
Cloud Applications
Departmental Data
Social Media
Business Users Today Want Direct Access To More Data
To make insightful decisions on their own, business users demand instant access to data from multiple enterprise sources
>100x More content creation and consumption
5-10x More Content
5-10x More Content Creators
5-10x More Sharing
More productive: More content per creator
More producers: More users can create content
More collaborative: Peer-to-peer sharing
Adoption Of Self-service Analytics By Business Users Increases Productivity
Introduction to Big Data
What is Big Data, Really?
The Three Vs of Big Data According to Gartner
Volume
• Orders of magnitude bigger than conventional data (Terabytes, Petabytes, Exabytes)
• Cost-prohibitive or practically impossible to store, manage or analyze in typical database software
Variety
• Structured, semi-structured, unstructured formats
• Diverse sources: complex event processing, application logs, machine sensors, social media data
Velocity
• Speed of ingesting incoming data streams
• Processing and real-time analysis of streaming and complex event data
Four broad categories of Big Data sources and their value
Traditional sources becoming bigger
SOURCE: Company, Government, Financial sector, Business and consumer studies, Surveys, Polls
VALUE: All business performance drivers: Operational efficiency, Revenue management, Strategic planning
Digital exhaust from interactions
SOURCE: Online click-stream, Application logs, Call/service records, ID scans, Security cameras
VALUE: New revenue sources, Consumer promotions, Risk management, Fraud detection
Web 2.0 phenomenon
SOURCE: Content generated from social media posts, tweets, blogs, pictures, videos, ratings
VALUE: Customer engagement, Customer service, Brand management, Viral marketing
Internet of things
SOURCE: Machine-generated sensor data and “connected device” communication
VALUE: Operational efficiency, Cost control, Risk avoidance
Business Oriented Use Cases for Big Data
Data Lakes and MicroStrategy
• Manicured, Often Relational
• Known Data Volumes
• Expected Formats
• Little to No Change
DATA SOURCES ETL DATA WAREHOUSE BI & ANALYTICS
• Complex Structures
• Rigid Transformations
• Extensive Monitoring Required
• Transformed Historical to Read Structures
• Flat, Canned Access to Data
• Report Chaos
• Extensive Data Load Delays
• Inflexible with new sources
Traditional Approaches And Current State Of A DWH
• Increase in Data types
• Rising Data Volumes
• Pressure on the DWH
• Constant change
DATA SOURCES ETL DATA WAREHOUSE BI & ANALYTICS
• Delay in reacting to new sources
• Monitoring is abandoned
• Transformations can't keep pace
• Repair, Adjust and Redesign ETL
• Reports become invalid
• Delay in updates
• Users seek silos
• Business is disconnected w/ IT
Challenges With Traditional DWHs With Growing Data Demands
A storage repository, usually in Hadoop, that holds a vast amount of raw data in its native format until it is needed
• Low cost
• Flexible and loosely governed
• No need to decide up front what data to store or for how long
• Contains a mix of structured, semi-structured and unstructured data
• Allows for freeform data exploration without having to wait for ETL
“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” -- James Dixon
What is a Data Lake?
Data is an asset
• Today many organizations discard data due to cost of storage even though business value may be mined from the data in the future
• A Data Lake allows you to store and process data essentially indefinitely
Unification
• Data Discovery, Data Science and Enterprise BI are treated as silos in many organizations today
• A Data Lake can help unify these concepts and allow cross-team collaboration
Utility
• A Data Lake creates the possibility of answering future questions of your data without knowing the question in advance
Why Have A Data Lake?
Building the lake is easy / using it is hard
Governance is still key
• Without a meaningful strategy for maintaining data quality this strategy can quickly fail
• You can quickly create a Data Swamp (dirty) or a Data Graveyard (useless)
Sandbox strategy
• Segment portions of your lake for experiments, testing and data that may fall outside of your standard governance process
Beware Of The Swamp
Analytical Datamarts
Relational Databases
MDM
ETL and Governance
Data Warehouse
Structured Enterprise Data
Operational Data
Enterprise Metadata
MicroStrategy Analytics Platform
Enterprise Applications
Traditional Data Architecture
Data Lake
Enterprise Metadata
Prime
ELT with Governance
ETL
Relational Data
Enterprise Applications
Cloud-based data
Web Logs
Flat Files
Analytical Datamarts
MicroStrategy Analytics Platform
In Comes The Data Lake
⎸07062016
Build Analytics and Mobility Applications Using Data Stored with Big Data
Hadoop Vendors SQL on Hadoop NoSQL Databases
Elastic Map Reduce
The MicroStrategy platform empowers organizations to build applications that leverage big data and Hadoop distributions. All of the major Hadoop distributions are certified to work with MicroStrategy, and once connected, data stored in Hadoop becomes just like any other data source.
Users can connect using Hive, Pig, or proprietary SQL-on-Hadoop connectors like Cloudera Impala or IBM BigInsights.
The MicroStrategy big data engine can natively tap into HDFS, generating the schema on read and making Hadoop suitable for ad-hoc querying. It also enables parallel loading of data from HDFS, resulting in high-performance data loading.
MicroStrategy’s native connectivity saves users from the tedious process of ETL from HDFS to Hive and helps to overcome the ODBC overhead associated with Hive.
MicroStrategy’s data wrangling capability lets users cleanse and refine their big data directly in MicroStrategy’s data discovery tool.
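The schema-on-read idea mentioned above can be illustrated with a small sketch. This is not MicroStrategy's engine or API; it is a minimal Python illustration under stated assumptions: the sample data, the `SCHEMA` list, and the function names are all hypothetical. The raw file is stored untyped, and the reader applies column names and types only at query time.

```python
import csv
import io

# Raw file content as it might sit in a data lake (hypothetical sample data).
RAW_LOG = """2016-07-06,store_12,149.90
2016-07-06,store_07,20.00
2016-07-07,store_12,75.10
"""

# The schema is declared by the reader at query time, not enforced at load time.
SCHEMA = [("day", str), ("store", str), ("revenue", float)]


def read_with_schema(raw_text, schema):
    """Yield one typed dict per raw line, applying the schema while reading."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


def revenue_by_store(raw_text, schema):
    """Ad-hoc aggregation computed directly over the raw file."""
    totals = {}
    for record in read_with_schema(raw_text, schema):
        store = record["store"]
        totals[store] = totals.get(store, 0.0) + record["revenue"]
    return totals


totals = revenue_by_store(RAW_LOG, SCHEMA)
```

A different question can reuse the same raw file with a different schema, which is what removes the need for an up-front ETL step in this sketch.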
Big Data
Enterprise assets
Mobility apps that source information from multiple locations, and submit transactions to your ERP systems
Analytics applications that blend data from databases and big data and deliver insights to users via reports, dashboards, and apps
MicroStrategy Platform
MicroStrategy allows organizations to easily access and analyze data in all shapes and sizes, from a single place. Business and IT users can easily blend multiple data sources, including big data sources, in seconds. From personal spreadsheets to cloud sources and even HDFS, big data access is made quick and easy with native HDFS connectors or via any flavor of Hive product, including Cloudera, Hortonworks, MapR, Spark and more.
Batch SQL: Fulfill your batch processing needs with certified Hive/ODBC drivers from different Hadoop distributors: Cloudera, Hortonworks, MapR, and Amazon EMR
Interactive SQL: Leverage advanced SQL on Hadoop technologies for interactive queries such as Cloudera Impala, MapR Drill, Apache Spark, IBM BigInsights, Pivotal HAWQ, and Facebook Presto
NoSQL: Connect, query, and analyze data from NoSQL sources such as HBase and Cassandra
Search: Dynamically search on semi-structured and unstructured data with MicroStrategy’s integration to Apache Solr and Splunk
MicroStrategy Analytics Platform
Distributed File Systems (HDFS, Amazon S3, GFS…)
NoSQL, Batch SQL, Interactive SQL, Search
Big Data Analytics for Most Common Scenarios
Big data is just another data source
Product Demonstration
5 Differentiators for Big Data Analytics
“Leverage MicroStrategy's improved data discovery capabilities, as an alternative to augmenting your BI portfolio with products such as Tableau and Qlik, to lower cost of ownership and improve enterprise self-service via a single, broader solution.”
Gartner Research Note – December 11, 2015
1. Enterprise data access with complete data governance
2. Self-service data exploration and production dashboards
3. User-accessible advanced and predictive analytics
4. Analysis of semi-structured and unstructured data
5. Real-time analysis from live updating data
Five Differentiators for MicroStrategy in Big Data Analytics
The MicroStrategy Analytics Platform enables every business user to get these capabilities
Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single Database
Data Warehouse Appliances
MapReduce & NOSQL Databases
Relational Databases
Multidimensional Databases
Columnar Databases
SaaS-Based App Data
HANA
BigInsights
Parallel Data Warehouse
Elastic Map Reduce
Analysis Services
Redshift
Bring All Relevant Data to Decision Makers
Distribution
Zendesk
HDFS
Generic Web Services SOAP REST
User / Departmental Data
1. Enterprise data access with complete data governance
Stunning Visualizations Instant Query Results Effortless Dashboards No IT Needed
Lightning fast insights, easy for everyone
2. Self-service data exploration and production dashboards
User / Departmental Data
Data Warehouse Appliances
MapReduce Databases
Relational Databases
Multidimensional Databases
Columnar Databases
SaaS-Based App Data
MicroStrategy Multisource Engine
2 & 3
Join data on-the-fly.
No need to move it to a staging database first.
Access your entire Big Data ecosystem as if it were a single database
Combine Data from Multiple Sources
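As a rough illustration of joining data on the fly without a staging database, the sketch below blends a relational source with a flat departmental file in memory. It is an assumption-laden example, not the MicroStrategy Multisource Engine: the in-memory SQLite table stands in for a relational database, and the `sales` table, `REGIONS_CSV` contents, and `blend` function are hypothetical.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (an in-memory SQLite stand-in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region_id INTEGER, revenue REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100.0), (2, 250.0), (1, 50.0)],
)

# Source 2: departmental data kept in a flat file (hypothetical contents).
REGIONS_CSV = "region_id,region_name\n1,North\n2,South\n"


def blend(connection, regions_csv):
    """Join both sources in memory on region_id, with no staging table."""
    names = {
        int(row["region_id"]): row["region_name"]
        for row in csv.DictReader(io.StringIO(regions_csv))
    }
    query = "SELECT region_id, SUM(revenue) FROM sales GROUP BY region_id"
    return {names[region_id]: total for region_id, total in connection.execute(query)}


blended = blend(db, REGIONS_CSV)
```

The aggregation is pushed down to the database, and only the small result set is joined with the file, which keeps the amount of data moved between sources low.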
MicroStrategy Supports Data Discovery at Scale
Start with departmental teams and grow exponentially to publish to 1000s of users without fear
PUBLISH
Team
Department
Enterprise
Value Chain
10s 100s 1,000s 10,000+
MicroStrategy Desktop
Proven Scalability
Built-in clustering, failover, and comprehensive administrative tools for performance optimization
In-Memory Performance
Tested sub-second response times on web and mobile, even at highest user volume
Advanced Monitoring
Admin tools to automate, report and alert on system utilization
Content Personalization
Users only see relevant data, and only access functionality they are authorized to use
Extend MicroStrategy’s Sophisticated Analytical Capabilities with 3rd party statistics toolkits
Industry’s most powerful SQL Engine and 300+ native analytical functions
Analytical Maturity
Projections: What is likely to happen based on past history?
Relationship Analysis: What factors influence activity or behavior?
Benchmarking: How are we doing versus comparables?
Trend Analysis: What direction are we headed in?
Data Summarization: What is happening in the aggregate?
Optimization: What do we want to happen?
World’s most popular advanced analytics tool.
Free, open source.
More
Specialty Tools
3. Business User Friendly Predictive and Advanced Analytics
Streaming Analytics, Interactive Search, Text Analytics
Quickly investigate:
• Website logs
• Application usage
• Surveys and free-form text fields
• Event and error monitoring logs
80% of data in most businesses is unstructured and this proportion will keep on rising
4. Analysis of semi-structured and unstructured data
Find keyword and event occurrences in any data
Apply semantic and syntactic models to text data
Assess rapidly changing data streams
Extract relevant information to:
• Optimize search engine marketing
• Understand sentiment on topics
• Get a 360-degree view of customers
• Detect fraud
Analyze an array of data from:
• Sensors and devices
• Images, audio, and video
• Email and document management systems
• Other operational and transactional data
New live data update technology
5. Real-time analysis from live updating data
Native HDFS Connector
Native Big Data Wrangling
Support for search sources
In-memory parallel architecture
MicroStrategy
Tableau
Qlik
Power BI
IBM Cognos
SAP BOBJ
Oracle OBIEE
MicroStrategy enables organizations to quickly harness the value of big data by deploying analytics at scale
Big Data Analytics: Product Differentiators
Product Demonstration
Powerful data preparation for more accurate analysis: Empowering data analysts to deliver deeper insights with intuitive and integrated data wrangling capabilities
Data integration in the hands of every user: Access and combine data from multiple sources “on-the-fly” to drive more productivity
MicroStrategy Prime – Analyze more data in memory: In-memory engine tightly coupled to the underlying DB
MicroStrategy Multi-Source: Effectively navigate data across multiple data sources
MicroStrategy Enabling Technologies for Big Data
Empowering data analysts to deliver deeper insights with intuitive and integrated data wrangling capabilities
Streamlined workflows to parse and prepare data
Hundreds of inbuilt functions to profile and clean data
Multi-table in-memory support from different sources
Automatically parse and prepare data with every refresh
Create custom groups on the fly and without coding
Local / URL Files
Hadoop
Data Preparation
New Integrated Data Preparation Capability
Salesforce
Twitter/Facebook
Powerful Data Preparation For More Accurate Analysis
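One way to picture "automatically parse and prepare data with every refresh" is a recorded list of cleaning steps that is replayed on each new load. The sketch below is a hypothetical Python illustration of that idea, not the product's wrangling feature; the step functions, the `PIPELINE` list, and the sample rows are all assumptions.

```python
def strip_whitespace(row):
    """Trim leading and trailing whitespace from every field."""
    return {key: value.strip() for key, value in row.items()}


def normalize_country(row):
    """Map common spellings of a country to one code (illustrative aliases)."""
    aliases = {"usa": "US", "u.s.": "US", "us": "US"}
    country = row["country"]
    return {**row, "country": aliases.get(country.lower(), country)}


def parse_amount(row):
    """Convert an amount such as '1,200.50' into a float."""
    return {**row, "amount": float(row["amount"].replace(",", ""))}


# The recorded preparation steps, applied in order on every refresh.
PIPELINE = [strip_whitespace, normalize_country, parse_amount]


def refresh(raw_rows, steps=PIPELINE):
    """Replay the recorded steps over a fresh batch of raw rows."""
    prepared = []
    for row in raw_rows:
        for step in steps:
            row = step(row)
        prepared.append(row)
    return prepared


first_load = refresh([{"country": " usa ", "amount": "1,200.50"}])
second_load = refresh([{"country": "U.S.", "amount": "80"}])
```

Because the steps are stored as data rather than applied by hand, a later refresh with new rows goes through exactly the same preparation.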
Source 1
Source 2
Source 3
Source 4
Public Data
Data Blending
Data Access
Live Connection
In-memory Data
OR
Other BI Tools
SaaS Data
Native HDFS
Access and combine data from multiple sources “on-the-fly” to drive more productivity
Data Integration In The Hands Of Every User
Data Upload 4x Faster
Data Volumes 80x Larger
Data Interactions 50% Faster
With MicroStrategy 9.x: Serial access to in-memory data
[Diagram: Database, OLAP Cube, Server; bottleneck]
With MicroStrategy 10: Multi-threaded access to in-memory data
[Diagram: Database, PRIME, Server; up to 8 parallel threads]
[Diagram labels: 2B; 2B | 2B . . . 2B; CPU cores 1 to 16]
In-memory engine tightly coupled to the underlying DB
MicroStrategy Prime – Analyze More Data In Memory
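The contrast between serial and multi-threaded access to in-memory data can be sketched as a partitioned aggregation. This is only a structural illustration in Python, assuming a hypothetical round-robin partitioning scheme and function names; it does not describe how PRIME is implemented, and in CPython the global interpreter lock means these threads show the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, parts):
    """Split rows into `parts` round-robin partitions."""
    return [rows[i::parts] for i in range(parts)]


def partial_sum(rows):
    """Aggregate a single partition independently of the others."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, threads=8):
    """Aggregate each partition on its own thread, then combine the results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(partial_sum, partition(rows, threads)))


# Hypothetical in-memory data: (row_id, amount) pairs.
rows = [(i, float(i)) for i in range(1000)]
total = parallel_total(rows)
```

The default of eight workers mirrors the "up to 8 parallel threads" figure on the slide; a serial engine corresponds to `threads=1`.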
Seamlessly traverse multiple DBs: End user agnostic
Effectively use aggregates: Automatic navigation
Works with different source types: Move from Hadoop to Relational
Metadata driven: No need to write SQL
Effectively Navigate Data Across Multiple Data Sources
MicroStrategy Multi-Source
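Metadata-driven navigation across aggregates and sources can be pictured as choosing, for each request, the smallest table that still contains every requested column. The following Python sketch assumes a hypothetical `CATALOG` of table metadata and a hypothetical `choose_table` function; it illustrates the general aggregate-navigation idea rather than MicroStrategy's actual algorithm.

```python
# Hypothetical metadata catalog: table names, sources, and row counts are illustrative.
CATALOG = [
    {"table": "sales_by_year", "source": "relational",
     "columns": {"year", "revenue"}, "rows": 10},
    {"table": "sales_by_day", "source": "relational",
     "columns": {"year", "day", "store", "revenue"}, "rows": 50_000},
    {"table": "raw_transactions", "source": "hadoop",
     "columns": {"year", "day", "store", "customer", "revenue"},
     "rows": 2_000_000_000},
]


def choose_table(requested, catalog=CATALOG):
    """Return the smallest table whose columns cover the requested set."""
    candidates = [entry for entry in catalog if requested <= entry["columns"]]
    if not candidates:
        raise LookupError(f"no table covers columns {sorted(requested)}")
    return min(candidates, key=lambda entry: entry["rows"])


summary = choose_table({"year", "revenue"})
detail = choose_table({"customer", "revenue"})
```

In this sketch a yearly summary is answered from the small relational aggregate, while a customer-level question falls through to the raw data in Hadoop, without the user writing SQL or naming a source.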
Data lakes are a powerful tool: MicroStrategy is ready to support your hybrid architecture
Meet the needs of multiple user personas: Enterprise BI capabilities combined with ad-hoc data discovery
A scalable solution for all workloads: Highly scalable in-memory engine to combine data from multiple sources
MicroStrategy Multi-Source: Effectively navigate data across multiple data sources
Summary
Questions
Thank you