big data - load csv file & query the ez way - hpcc systems
DESCRIPTION
A "How To" for loading CSV files into HPCC Systems and querying them. You can use this method to migrate your RDBMS data (MySQL / Oracle / SQL) into HPCC Systems.
TRANSCRIPT
HPCC Systems: Loading CSV Data & Querying
By Fujio Turner
@FujioTurner
Business Development Customers 1 20
Non-Indexed Full Data Set
http://hpccsystems.com/why-hpcc/benchmarks
Map/Reduce
SQL w/ JOINS
GraphDB
Machine Learning
Simple to Complex Queries
ECL (Enterprise Control Language) C++ based query language
“I’m sub-second fast.”
“I can query all or part of your data.”
Architecture
[Diagram: Thor and Roxie. Storage labels: Hard Disk, Index (optional), In-memory Index, SSD, Either/Both]
Load File into HPCC
Query File
Example
CSV data sample source: http://catalog.data.gov/dataset/consumer-complaint-database
Administrator Web GUI on port 8010
IP / URL of HPCC install
1. Upload file*
2. Distribute to cluster
3. Name of file in cluster
4. Most CSV have \t (add ,\t to the separators)
5. Push to cluster
*2GB file size limit through the web GUI. No limit if uploaded via SOAP.
Load Data in Thor Cluster
Loaded
*optional file rename
How do I query HPCC Systems? What is ECL?
ECL (Enterprise Control Language) is a C++ based query language for use with the HPCC Systems Big Data platform. ECL's syntax and format are simple and easy to learn.
Note: ECL is very similar to Hadoop's Pig, but more expressive and feature rich.
Query w/ ECL
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23 Product;
  STRING38 State;
  ………………………….
  ………………………….
  STRING31 Consumer_disputed;
END;
Com := DATASET('~test::complaints', ComS, CSV(HEADING(1), SEPARATOR([',','\t'])));
Ma := Com(State = 'MA');
Ma; //output
WHERE `State` = 'MA'
File Type
File Location, "FROM Table", "USE DATABASE;"
"SELECT * …."
Schema
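The query above can be tried without a sprayed file. What follows is a minimal sketch, not the author's code: it assumes a cut-down schema with only the three columns the slides name, and it swaps the '~test::complaints' file for a few made-up inline rows so it can run in the ECL playground as is. The result names MA_Complaints and MA_Count are hypothetical.

```ecl
// Cut-down schema: only the three columns the slides spell out.
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23  Product;
  STRING38  State;
END;

// Made-up sample rows standing in for DATASET('~test::complaints', ...).
Com := DATASET([{1, 'Mortgage',    'MA'},
                {2, 'Credit card', 'CA'},
                {3, 'Mortgage',    'MA'}], ComS);

// Record filter: the ECL counterpart of WHERE `State` = 'MA'.
Ma := Com(State = 'MA');

// Two named results: the matching rows and how many there are.
OUTPUT(Ma, NAMED('MA_Complaints'));
OUTPUT(COUNT(Ma), NAMED('MA_Count'));
```

To run the same filter against the real file, replace the inline DATASET with the CSV definition from the slide and use the full record layout.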
1. Go to playground
2. Edit ECL
3. Pick "thor" cluster
4. Submit
http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE__-_CSV_LOAD_and_QUERY
Practice
Schema Made EZ
Storing a new file and want to make a quick schema?
Take a small part of your CSV data and go to the link below to make an ECL schema.
http://hpccsystems.com/demos/data-profiling-demo
CSV IN, click, Schema OUT
ECL Guide
http://hpccsystems.com/download/docs/ecl-language-reference
JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ….
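A few of the listed built-ins can be combined into a small aggregate query. This is a hedged sketch rather than anything from the slides: the rows are made up, the schema is cut down to three columns, and it relies on TABLE (an ECL built-in that is not in the list above) to do the cross-tab. The names StateRec, ByState, and Products are hypothetical.

```ecl
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23  Product;
  STRING38  State;
END;

// Made-up sample rows; swap in the CSV DATASET definition for real data.
Com := DATASET([{1, 'Mortgage',    'MA'},
                {2, 'Credit card', 'CA'},
                {3, 'Mortgage',    'MA'}], ComS);

// Cross-tab layout: one row per State with a complaint count.
StateRec := RECORD
  Com.State;
  UNSIGNED4 Complaints := COUNT(GROUP);
END;

// Roughly SELECT State, COUNT(*) ... GROUP BY State.
ByState := TABLE(Com, StateRec, State);

// Roughly ORDER BY Complaints DESC.
OUTPUT(SORT(ByState, -Complaints), NAMED('Complaints_By_State'));

// DEDUP drops adjacent duplicates, so sort on Product first;
// one full row is kept per distinct Product.
Products := DEDUP(SORT(Com, Product), Product);
OUTPUT(Products, NAMED('One_Row_Per_Product'));
```

The SQL comparisons in the comments are approximate and are only meant to help readers migrating from an RDBMS.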
http://www.slideshare.net/FujioTurner/
For more HPCC "How To's", go to SlideShare
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install HPCC Systems
in 5 Minutes
Download HPCC Systems Open Source Community Edition
http://hpccsystems.com/download/
or
Source Code
https://github.com/hpcc-systems