Big Data for Small Businesses & Startups
DESCRIPTION
Big Data is not just for big businesses. This SlideShare covers how small businesses and startups can leverage Big Data to increase revenue. HPCC Systems lets you get started with only one machine and grow to exabytes.
1. Mining and understanding customer behavior from data outside the firewall, and joining it with internal data to turn it into actionable marketing strategies.
2. Understanding your whole business with BI tools. Learn how Big Data helps join data from different parts of your business to see the big picture.
TRANSCRIPT
HPCC Systems: Big Data for Small Business
By Fujio Turner
@FujioTurner
LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting and academic markets.
LexisNexis has been in business since 1977 with over 30,000 employees worldwide.
What is HPCC Systems? Who is LexisNexis?
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise, and supports HPCC Systems as an open source project under the Apache 2.0 License.
3 Parts to Big Data
Complex Query
Speed
Amount
+ Low Cost
Let's focus on
Comparison
Java: Petabytes, 1-80,000 Jobs/day, Since 2005, ? ? ? ? ? ?
C++: Exabytes, Non-Indexed 4X-13X, Indexed: 2K-3K Jobs/sec, Since 2000, Thor and Roxie
Speed
Block Based vs. File Based
Business Development Customers 1 20
Non-Indexed Full Data Set
http://hpccsystems.com/why-hpcc/benchmarks
Low Cost
Map/Reduce
SQL w/ JOINS
GraphDB
Machine Learning
Simple to Complex Queries
ECL (Enterprise Control Language) C++ based query language
Complex
Your Business
Analytics
You know what's going on inside your business
What do customers think before interacting with you?
What do customers think after interacting with you?
Before / After
Your Competition?
Power of Relationships
Friends
Other
Your Customers
Not Your Customers
Customers
Basic Social Media
Who are they following?
Customers Following
Followers (50)
Followers (13M)
Followers (5.8M)
Basic Social Media
Customers Friend
John Legend
News Source
Let's focus on one person
@waterboy11
Customers Following
Followers (5.8M)
Basic Social Media
John Legend
15-20% of all your customers follow John Legend
What % of John Legend's followers ...
... are friends of my customers?
... have heard of my company?
... behave like my customers?
... have the same purchasing triggers as my customers?
How do I Query HPCC Systems?
ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn.
Note: ECL is very similar to Hadoop's Pig, but more expressive and feature rich.
{'waterboy11',[{'nytimes'},{'jsonlover'},{'johnlegend'}]}
Parent Data Set: “Customer”
Child Data Set: “List of who my customer is following”
HPCC Systems handles de-normalized data
//Schema
childrec := {string15 users};
parentrec := {String15 customers, dataset(childrec) following};

//Data
cAll := dataset([
  {'waterboy11',[{'nytimes'},{'jsonlover'},{'johnlegend'}]},
  {'rocky-o',[{'paulie'},{'mickey'}]},
  {'coolguy60',[{'walter78'},{'johnlegend'}]}
  ], parentrec);

//Output
output(cAll);
“USE DATABASE;” (inline data or file)
Schema of Customers
Schema of who my customer is following
Use above schema on this inline data
Results
//Schema
childrec := {string15 users};
parentrec := {String15 customers, dataset(childrec) following};

//Data
cAll := dataset([
  {'waterboy11',[{'nytimes'},{'jsonlover'},{'johnlegend'}]},
  {'rocky-o',[{'paulie'},{'mickey'}]},
  {'coolguy60',[{'walter78'},{'johnlegend'}]}
  ], parentrec);

//Filter
legend := cAll(EXISTS(following(users = 'johnlegend')));

//Output
output(legend);
Which customers are following “John Legend”?
Results
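An earlier slide states that 15-20% of all your customers follow John Legend. A minimal sketch of how such a share could be computed from the same filter, assuming the inline sample data from the slides. The names pctLegend and PercentFollowingJohnLegend are illustrative and do not come from the deck.

```ecl
// Schema (same layout as the slides)
childrec  := {STRING15 users};
parentrec := {STRING15 customers, DATASET(childrec) following};

// Inline sample data from the slides
cAll := DATASET([
  {'waterboy11', [{'nytimes'}, {'jsonlover'}, {'johnlegend'}]},
  {'rocky-o',    [{'paulie'}, {'mickey'}]},
  {'coolguy60',  [{'walter78'}, {'johnlegend'}]}
  ], parentrec);

// Customers whose child dataset contains 'johnlegend'
legend := cAll(EXISTS(following(users = 'johnlegend')));

// Share of customers following John Legend, as a percentage.
// In ECL, / is real division (DIV is the integer form).
pctLegend := COUNT(legend) / COUNT(cAll) * 100;  // illustrative name

OUTPUT(pctLegend, NAMED('PercentFollowingJohnLegend'));
```

With this tiny sample, two of the three customers match the filter, so the share is far above the 15-20% quoted for real data.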
//Schema
childrec := {string15 users};
parentrec := {String15 customers, dataset(childrec) following};

//Data
cAll := dataset([
  {'waterboy11',[{'nytimes'},{'jsonlover'},{'johnlegend'}]},
  {'rocky-o',[{'paulie'},{'mickey'}]},
  {'coolguy60',[{'walter78'},{'johnlegend'}]}
  ], parentrec);

follower1 := GROUP(SORT(cAll.following,users));
follower2 := TABLE(follower1, {users, uC := COUNT(GROUP)}, users);

//Output
output(follower2);
Who are my customers following the most?
Results
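An alternative sketch of the same question, assuming the same sample data: it flattens the child dataset with an explicit NORMALIZE and then sorts the counts so the most-followed account comes first. The names flat, followCounts and mostFollowed are illustrative, not from the slides.

```ecl
// Schema (same layout as the slides)
childrec  := {STRING15 users};
parentrec := {STRING15 customers, DATASET(childrec) following};

// Inline sample data from the slides
cAll := DATASET([
  {'waterboy11', [{'nytimes'}, {'jsonlover'}, {'johnlegend'}]},
  {'rocky-o',    [{'paulie'}, {'mickey'}]},
  {'coolguy60',  [{'walter78'}, {'johnlegend'}]}
  ], parentrec);

// One output row per child record: every account followed by any customer
flat := NORMALIZE(cAll, LEFT.following,
                  TRANSFORM(childrec, SELF.users := RIGHT.users));

// Cross-tab: how many customers follow each account
followCounts := TABLE(flat, {users, UNSIGNED4 uC := COUNT(GROUP)}, users);

// Highest count first
mostFollowed := SORT(followCounts, -uC);

OUTPUT(mostFollowed);
```

In the sample data johnlegend is followed by two customers and every other account by one, so johnlegend sorts to the top.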
Your Customers
Advanced Social Media
How many different types of customers do you have? How did red customers become my customers?
X = Price Y = Discount %
Machine Learning
Grouping customers by attributes
http://cdn.hpccsystems.com/pdf/machinelearning.pdf
Grouping customers by multiple attributes
X = Price Y = Discount % Z = Zip Code
IMPORT * FROM ML;
IMPORT * FROM ML.Cluster;
IMPORT * FROM ML.Types;

//{gender,age,price}
// gender: 1 = male, 2 = female; age: 1-99; price: e.g. $7
x2 := DATASET([
  {1, 18, 11}, {1, 35, 30}, {2, 50, 12}, {2, 36, 7},
  {1, 23, 67}, {2, 34, 44}, {1, 29, 70}, {1, 18, 20},
  {1, 45, 90}, {2, 21, 95}, {2, 34, 44}, {1, 51, 34},
  {2, 26, 9}, {2, 32, 76}
  ], NumericField);

c := DATASET([
  {1, 20, 20}, {1, 30, 60}, {1, 50, 50},
  {2, 20, 25}, {2, 35, 37}, {2, 50, 60}
  ], NumericField);

x3 := Kmeans(x2,c);
OUTPUT(x3);
http://hpccsystems.com/ml/ml-getting-started
Machine Learning: Built-in
Your Customers, Not Your Customers
Advanced Social Media
How many non-customers are red?
X = Price Y = Discount %
X = Price Y = Discount % Z = Zip Code
Combine +
//Schema
childrec := {string15 users};
parentrec := {String15 customers, dataset(childrec) following};
parentrec2 := {String15 noncustomers, dataset(childrec) following};

//Data
cAll := dataset([
  {'waterboy11',[{'nytimes'},{'jsonlover'},{'johnlegend'}]},
  {'rocky-o',[{'paulie'},{'mickey'}]},
  {'coolguy60',[{'walter78'},{'johnlegend'}]}
  ], parentrec);
ncAll := dataset([
  {'fish99',[{'cake_bake'}]},
  {'johnlegend',[{'ralphy1'},{'fixxyman'},…….]},
  {'nytimes',[{'sub_lime9'},{'johnlegend'},…..]}
  ], parentrec2);

//Join
// The JoinThem TRANSFORM is not shown on the slide.
// Note: users is a field of the child dataset "following", not of parentrec,
// so cAll would need to be flattened before LEFT.users can be used here.
j1 := JOIN(cAll, ncAll, LEFT.users = RIGHT.noncustomers, JoinThem(LEFT,RIGHT));

//Output
Output(j1);
Combine Customers and Non-Customers
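A minimal sketch of how the customer and non-customer data sets might be joined. It is not the author's JoinThem, which the slide does not show. The sketch assumes that customers are first flattened into one row per followed account, because users lives in the child dataset. The layouts pairrec and outrec, the field names followed and reach, and the body of JoinThem are all hypothetical, and the non-customer data is cut off where the slide elides it.

```ecl
// Schema (same layouts as the slide)
childrec   := {STRING15 users};
parentrec  := {STRING15 customers, DATASET(childrec) following};
parentrec2 := {STRING15 noncustomers, DATASET(childrec) following};

// Customer data from the slide
cAll := DATASET([
  {'waterboy11', [{'nytimes'}, {'jsonlover'}, {'johnlegend'}]},
  {'rocky-o',    [{'paulie'}, {'mickey'}]},
  {'coolguy60',  [{'walter78'}, {'johnlegend'}]}
  ], parentrec);

// Non-customer data, truncated where the slide elides further entries
ncAll := DATASET([
  {'fish99',     [{'cake_bake'}]},
  {'johnlegend', [{'ralphy1'}, {'fixxyman'}]},
  {'nytimes',    [{'sub_lime9'}, {'johnlegend'}]}
  ], parentrec2);

// Hypothetical flat layout: one row per (customer, followed account)
pairrec := {STRING15 customers, STRING15 users};
custPairs := NORMALIZE(cAll, LEFT.following,
                       TRANSFORM(pairrec,
                                 SELF.customers := LEFT.customers,
                                 SELF.users     := RIGHT.users));

// Hypothetical result layout and transform (the slide's JoinThem is not shown)
outrec := {STRING15 customers, STRING15 followed, DATASET(childrec) reach};
outrec JoinThem(pairrec L, parentrec2 R) := TRANSFORM
  SELF.customers := L.customers;     // the customer
  SELF.followed  := R.noncustomers;  // the non-customer account they follow
  SELF.reach     := R.following;     // accounts that non-customer follows
END;

// Inner join: keep only followed accounts that appear as non-customers
j1 := JOIN(custPairs, ncAll,
           LEFT.users = RIGHT.noncustomers,
           JoinThem(LEFT, RIGHT));

OUTPUT(j1);
```

With this sample, the matches are waterboy11 with nytimes, waterboy11 with johnlegend, and coolguy60 with johnlegend.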
Correct Sales Pipeline
Better Predict Conversion
Plugin for HPCC Systems
http://hpccsystems.com/products-and-services/products/plugins/r-integration
Plug n Play BI Tools
Social Media, CRM, Analytics, 3rd Party API
http://hpccsystems.com/products-and-services/products/plugins/ecl-pentaho-data-integration
http://www.slideshare.net/Engauge/pinterest-is-social-medias-newest-sweetheart-the-real-deal-11978662
+ HPCC Systems
Users
Use Case - Pinterest
http://www.slideshare.net/FujioTurner/
For More HPCC “How To’s”
Go to SlideShare
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install HPCC Systems in 5 Minutes
Download HPCC Systems Open Source Community Edition
http://hpccsystems.com/download/
or
Source Code: https://github.com/hpcc-systems