Big Data - Load CSV File & Query the EZ Way - HPCC Systems

HPCC Systems: Loading CSV Data & Querying, by Fujio Turner (@FujioTurner)

Upload: fujio-turner

Post on 15-Jan-2015


Category: Technology

DESCRIPTION

A "how to" for loading CSV files into HPCC Systems and querying them. You can use this method to migrate your RDBMS data (MySQL / Oracle / SQL) into HPCC Systems.

TRANSCRIPT

Page 1: Big Data - Load CSV File & Query the EZ way - HPCC Systems

HPCC Systems: Loading CSV Data & Querying

By Fujio Turner

@FujioTurner

Page 2: Big Data - Load CSV File & Query the EZ way - HPCC Systems

[Figure: labels Business, Development, Customers; scale 1 to 20; non-indexed full data set]

http://hpccsystems.com/why-hpcc/benchmarks

Page 3: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Map/Reduce

SQL w/ JOINS

GraphDB

Machine Learning

Simple to Complex Queries

ECL (Enterprise Control Language): a C++ based query language

Page 4: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Architecture

Thor: “I can query all or part of your data.”

Roxie: “I’m sub-second fast.”

Storage labels on the slide: Hard Disk with Index (optional), listed twice; In-memory Index; SSD

Either/Both

Page 5: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Example

1. Load file into HPCC
2. Query file

CSV data sample source: http://catalog.data.gov/dataset/consumer-complaint-database

Page 6: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Administrator Web GUI on port 8010 at the IP / URL of the HPCC install

Page 7: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Load Data

1. Upload file*
2. Distribute to cluster
3. Name of file in cluster
4. Add ,\t as separators (most CSV have \t)
5. Push to cluster

*2GB file size limit through web. No limit if uploaded via SOAP.

Page 8: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Loaded in Thor Cluster

*optional file rename

Page 9: Big Data - Load CSV File & Query the EZ way - HPCC Systems

How do I query HPCC Systems? What is ECL?

ECL (Enterprise Control Language) is a C++ based query language (ECL code is compiled to C++) for use with the HPCC Systems Big Data platform. ECL's syntax and format are very simple and easy to learn.

Note: ECL is very similar to Hadoop’s Pig, but more expressive and feature-rich.
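To give a feel for the syntax, here is a minimal sketch (not from the original slides). The name Greeting is made up for illustration. A definition uses := and does nothing on its own; an action such as OUTPUT produces a result.

```ecl
// Definition: binds a name to a value; nothing runs yet.
Greeting := 'Hello, HPCC';

// Action: returns the value as a workunit result.
OUTPUT(Greeting);
```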

Page 10: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Query w/ ECL

// Schema
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23 Product;
  STRING38 State;
  // ... remaining fields elided in the original slide ...
  STRING31 Consumer_disputed;
END;

// File location (like "USE DATABASE;" and "FROM Table") and file type (CSV)
Com := DATASET('~test::complaints', ComS, CSV(HEADING(1), SEPARATOR([',','\t'])));

// Filter, like WHERE `State` = 'MA'
Ma := Com(State = 'MA');

// Output, like "SELECT * ...."
Ma;
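If the complaints file is not loaded yet, the same filter can be tried on an inline dataset. This is only a sketch: the three rows, the shortened record, and the result names MaComplaints and MaCount are made up for illustration and are not part of the original example.

```ecl
// Shortened, hypothetical version of the schema (STRING2 State for brevity).
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23  Product;
  STRING2   State;
END;

// Inline sample rows instead of the sprayed CSV file.
Com := DATASET([{1, 'Mortgage',    'MA'},
                {2, 'Credit card', 'CA'},
                {3, 'Mortgage',    'MA'}], ComS);

// Same filter as on the slide: WHERE State = 'MA'
Ma := Com(State = 'MA');

OUTPUT(Ma, NAMED('MaComplaints'));    // rows 1 and 3
OUTPUT(COUNT(Ma), NAMED('MaCount'));  // 2
```

Swapping the inline DATASET for the CSV DATASET definition should leave the filter and outputs unchanged.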

Page 11: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Practice

1. Go to playground
2. Edit ECL
3. Pick “thor” cluster
4. Submit

http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE__-_CSV_LOAD_and_QUERY

Page 12: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Schema Made EZ

Storing a new file and want to make a quick schema?

Take a small part of your CSV data and go to the link below to make an ECL schema.

CSV in, click, schema out.

http://hpccsystems.com/demos/data-profiling-demo

Page 13: Big Data - Load CSV File & Query the EZ way - HPCC Systems

ECL Guide

http://hpccsystems.com/download/docs/ecl-language-reference

JOIN, MERGE, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, ABS, CASE, DEDUP, NORMALIZE, DENORMALIZE, IF, SORT, GROUP, more ....
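As a sketch of how a few of these fit together, the snippet below counts complaints per state (like SQL's GROUP BY) and sorts by the count. The inline rows and the names PerState and Cnt are hypothetical; TABLE is a standard ECL function not shown on the slide list.

```ecl
// Hypothetical shortened schema and sample rows.
ComS := RECORD
  UNSIGNED3 ComplaintID;
  STRING23  Product;
  STRING2   State;
END;

Com := DATASET([{1, 'Mortgage',    'MA'},
                {2, 'Credit card', 'CA'},
                {3, 'Mortgage',    'MA'}], ComS);

// Cross-tab: one row per State with the number of complaints in it.
PerState := TABLE(Com, {Com.State, UNSIGNED4 Cnt := COUNT(GROUP)}, State);

// Highest count first: MA (2), then CA (1).
OUTPUT(SORT(PerState, -Cnt));
```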

Page 14: Big Data - Load CSV File & Query the EZ way - HPCC Systems

For more HPCC “How To’s”, go to SlideShare:

http://www.slideshare.net/FujioTurner/

Page 15: Big Data - Load CSV File & Query the EZ way - HPCC Systems

Watch how to install HPCC Systems in 5 minutes:

http://www.youtube.com/watch?v=8SV43DCUqJg

Download HPCC Systems Open Source Community Edition:

http://hpccsystems.com/download/

or get the source code:

https://github.com/hpcc-systems