IBM Research®
© 2007 IBM Corporation
A Brief Overview of Hadoop Eco-System
IBM Research | India Research Lab
Hive
SQL-like language to query data stored on HDFS
Example – “SELECT c.ID, c.Name, c.AGE, o.Amount FROM Customers c JOIN Orders o ON (c.ID = o.CUSTOMER)”
Data Model
Tables – Column types (int, float, string, date, boolean)
Supports array / map / struct for JSON-like data
Metastore – Namespace containing a set of tables, the list of columns and their types, and SerDe info
CLI
Other languages – Jaql, Pig
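The join in the example query can be tried without a Hive installation. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Hive. That substitution is an assumption made purely for illustration: SQLite is not Hive, the sample rows and the `OID` column are invented, and an `ORDER BY` is added only to make the result order deterministic.

```python
import sqlite3


def run_example_join():
    # In-memory SQLite database, used only as a stand-in for Hive tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customers (ID INTEGER, Name TEXT, AGE INTEGER);
        CREATE TABLE Orders (OID INTEGER, CUSTOMER INTEGER, Amount REAL);
        -- Invented sample data: order 102 refers to a customer that does not exist.
        INSERT INTO Customers VALUES (1, 'Asha', 31), (2, 'Ravi', 45);
        INSERT INTO Orders VALUES (100, 1, 250.0), (101, 1, 75.5), (102, 3, 10.0);
        """
    )
    # Same join as on the slide, plus ORDER BY for a stable row order.
    rows = conn.execute(
        "SELECT c.ID, c.Name, c.AGE, o.Amount "
        "FROM Customers c JOIN Orders o ON (c.ID = o.CUSTOMER) "
        "ORDER BY o.Amount"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_example_join():
        print(row)
```

Only customers with at least one matching order appear in the result, because the query is an inner join: customer 2 has no orders and order 102 has no customer, so both are dropped.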
HBase
Hadoop performs only batch processing; data is accessed only sequentially.
One has to scan the entire dataset even for the simplest of jobs.
HBase provides random read/write access to data in HDFS.
Data Model –
A table is a collection of rows
A row is a collection of column families
A column family is a collection of columns
A column is a collection of key-value pairs
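The nesting described above can be pictured as dictionaries inside dictionaries. The sketch below is a toy model in plain Python, not the HBase client API. It assumes that the "key-value pairs" inside a column are timestamped versions of a value, and the helper names `put` and `get_latest` are hypothetical.

```python
def put(table, row, family, column, value, ts):
    # table -> row key -> column family -> column -> {timestamp: value}
    versions = (
        table.setdefault(row, {})
        .setdefault(family, {})
        .setdefault(column, {})
    )
    versions[ts] = value


def get_latest(table, row, family, column):
    # Look up without creating empty entries for rows or columns that are missing.
    versions = table.get(row, {}).get(family, {}).get(column, {})
    if not versions:
        return None
    # The newest timestamp wins.
    return versions[max(versions)]


if __name__ == "__main__":
    table = {}
    put(table, "row1", "info", "name", "Asha", ts=1)
    put(table, "row1", "info", "name", "Asha K", ts=2)
    print(get_latest(table, "row1", "info", "name"))
```

Rows in this model do not need to share the same columns, which matches the idea that a column family is just a collection of whatever columns a row happens to have.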
HBase
Reading – Get and Scan. A reader always sees the last written value.
Rows are ordered.
HBase is not a SQL database: it is not relational and has no joins or secondary indices.
Horizontally Scalable
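Why ordered rows matter for Get and Scan can be shown with a small in-memory store. This is a minimal sketch under stated assumptions: `SortedRowStore` is a hypothetical class, not part of HBase, and the scan range is taken to be start-inclusive and stop-exclusive.

```python
import bisect


class SortedRowStore:
    """Toy key-value store that keeps row keys in sorted order."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> most recently written value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys ordered on insert
        self._rows[key] = value  # overwrite, so readers see the last write

    def get(self, key):
        # Random read of a single row by key.
        return self._rows.get(key)

    def scan(self, start, stop):
        # Ordered read of all rows with start <= key < stop.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedRowStore()
    store.put("user3", "c")
    store.put("user1", "a")
    store.put("user2", "b")
    store.put("user1", "a2")
    print(store.get("user1"))
    print(store.scan("user1", "user3"))
```

Because the keys are sorted, a scan only has to locate the start of the range and read forward, instead of searching the whole dataset.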
Oozie
Workflow management, and coordination of those workflows
A workflow consists of action nodes (MR, Pig, Hive) and control nodes. It is specified through an XML file.
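To make the action-node and control-node split concrete, the sketch below embeds a simplified workflow definition and walks it with Python's standard XML parser. The element names follow the Oozie workflow schema, but the node names (`clean-data`, `aggregate`, `fail`) are hypothetical and the action bodies are left empty, so this is an illustration rather than a deployable workflow.

```python
import xml.etree.ElementTree as ET

NS = "{uri:oozie:workflow:0.5}"

# Simplified workflow: two action nodes plus start, kill and end control nodes.
WORKFLOW_XML = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="clean-data"/>
    <action name="clean-data">
        <pig/>
        <ok to="aggregate"/>
        <error to="fail"/>
    </action>
    <action name="aggregate">
        <hive/>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
"""


def action_types(xml_text):
    # Map each action node's name to the kind of job it runs.
    root = ET.fromstring(xml_text.strip())
    result = {}
    for action in root.findall(NS + "action"):
        # In this sketch the first child of an action names its type.
        result[action.get("name")] = action[0].tag.replace(NS, "")
    return result


def happy_path(xml_text):
    # Follow the <ok> transitions from <start> until a control node is reached.
    root = ET.fromstring(xml_text.strip())
    nodes = {n.get("name"): n for n in root if n.get("name")}
    path = []
    current = root.find(NS + "start").get("to")
    while True:
        node = nodes[current]
        path.append(current)
        if node.tag != NS + "action":
            break  # reached a control node such as <end> or <kill>
        current = node.find(NS + "ok").get("to")
    return path


if __name__ == "__main__":
    print(action_types(WORKFLOW_XML))
    print(happy_path(WORKFLOW_XML))
```

Each action declares where to go on success (`ok`) and on failure (`error`), which is how the XML file encodes the flow between nodes.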
Cascading and Scalding
Word-Count in Java
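The Java listing from this slide is not available here and is not reconstructed. As a reminder of what a word-count job computes, the following sketch simulates the map, shuffle and reduce phases in plain Python. It does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all emitted values by key, as happens between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    # Sum the ones emitted for a single word.
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog the end"]))
```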
Apache Mahout
Cascading
A simple, high-level Java API for MR that is easy to understand and work with
Scalding
The power of Scala on top of Cascading
No boilerplate code
Sqoop
Apache Sqoop is designed for efficiently transferring bulk data between Apache Hadoop and relational databases (RDBMS)
Imports data from external structured datastores into HDFS or related systems like HBase
Mahout