A New Generation of Data Transfer Tools for Hadoop:
Sqoop 2
Arvind Prabhakar | Kathleen Ting

Who Are We?
- Arvind Prabhakar
  - Apache Sqoop Committer, PMC Chair, ASF Member
  - Engineering Manager, Cloudera
  - [email protected], @aprabhakar
- Kathleen Ting
  - Apache Sqoop Committer, PMC Member
  - Customer Operations Engineering Manager, Cloudera
  - [email protected], @kate_ting

What is Sqoop?
- Apache Top-Level Project
- SQl to hadOOP
- Tool to transfer data from relational databases
  - Teradata, MySQL, PostgreSQL, Oracle, Netezza
- To Hadoop ecosystem
  - HDFS (text, sequence file), Hive, HBase, Avro
- And vice versa

Why Sqoop?
- Efficient/Controlled resource utilization
  - Concurrent connections, Time of operation
- Datatype mapping and conversion
  - Automatic, and User override
- Metadata propagation
  - Sqoop Record
  - Hive Metastore
  - Avro
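
The "automatic, and user override" idea for datatype mapping can be illustrated with a small sketch. The mapping table and the `resolve_types` helper below are hypothetical and are not Sqoop's actual implementation; Sqoop 1 exposes an override of this kind through its `--map-column-java` option.

```python
# Hypothetical default mapping from SQL column types to Java-style field types.
# The entries are illustrative assumptions, not Sqoop's actual mapping table.
DEFAULT_TYPE_MAP = {
    "INTEGER": "Integer",
    "BIGINT": "Long",
    "VARCHAR": "String",
    "DECIMAL": "java.math.BigDecimal",
    "TIMESTAMP": "java.sql.Timestamp",
}


def resolve_types(columns, overrides=None):
    """Return {column_name: target_type}, preferring user overrides.

    columns   -- dict of column name -> SQL type name
    overrides -- optional dict of column name -> target type chosen by the user
    """
    overrides = overrides or {}
    resolved = {}
    for name, sql_type in columns.items():
        if name in overrides:
            # A user override always wins over the automatic mapping.
            resolved[name] = overrides[name]
        elif sql_type.upper() in DEFAULT_TYPE_MAP:
            resolved[name] = DEFAULT_TYPE_MAP[sql_type.upper()]
        else:
            raise ValueError(f"no default mapping for {name} ({sql_type})")
    return resolved


columns = {"id": "BIGINT", "price": "DECIMAL", "name": "VARCHAR"}
print(resolve_types(columns))
print(resolve_types(columns, overrides={"price": "String"}))
```

In the second call only `price` changes; every other column still receives its automatic mapping.
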
Sqoop 1

Sqoop 1
- Based on Connectors
  - Responsible for Metadata lookups, and Data Transfer
  - Majority of connectors are JDBC based
  - Non-JDBC (direct) connectors for optimized data transfer
- Connectors responsible for all supported functionality
  - HBase Import, Avro Support, ...

Sqoop 1 Challenges
- Cryptic, contextual command line arguments
- Security concerns
- Type mapping is not clearly defined
- Client needs access to Hadoop binaries/configuration and database
- JDBC model is enforced

Sqoop 1 Challenges
- Non-uniform functionality
  - Different connectors support different capabilities
- Overlap/Duplicated functionality
  - Different connectors may implement the same capabilities differently
- High Coupling with Hadoop
  - Database vendors are required to understand Hadoop idiosyncrasies in order to build connectors.
Sqoop 2

Sqoop 2 – Design Goals
- Ease of Use
  - Uniform functionality
  - Domain Specific Interactions
- Ease of Extension
  - No low-level Hadoop Knowledge Needed
  - No functional overlap between Connectors
- Security and Separation of Concerns
  - Role based access and use

Sqoop 2: Connection vs Job metadata
There are two distinct sets of options to pass in to Sqoop:
- Connection (distinct per database)
- Job (distinct per table)
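
One way to picture the split between the two option sets is as two separate record types, where several jobs refer to one connection. The class and field names below are hypothetical, chosen only for illustration, and do not reflect Sqoop 2's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Options that are distinct per database (hypothetical fields)."""
    name: str
    jdbc_url: str
    username: str


@dataclass(frozen=True)
class Job:
    """Options that are distinct per table; refers to a connection by name."""
    name: str
    connection: str
    table: str
    direction: str  # "import" or "export"


# One connection describes the database once (example values are made up).
warehouse = Connection("warehouse", "jdbc:mysql://db.example.com/sales", "etl")

# Several jobs reuse that connection, each naming its own table.
jobs = [
    Job("import-orders", warehouse.name, "orders", "import"),
    Job("import-customers", warehouse.name, "customers", "import"),
]

for job in jobs:
    print(f"{job.name}: {job.direction} {job.table} via {job.connection}")
```

Keeping the database details in one place means a change to the URL or credentials touches a single connection rather than every job.
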

Sqoop 2: Workings
- Connectors Register Metadata
- Metadata enables creation of Connections and Jobs
- Connections and Jobs stored in Metadata Repository
- Operator runs Jobs that use appropriate connections
- Admins set policy for connection use
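
The flow on this slide can be sketched as a toy in-memory repository: a connector registers which inputs it needs, connections and jobs are checked against that metadata before being stored, and running a job looks up the connection it refers to. The `Repository` class, its method names, and the `generic-jdbc` connector name are all assumptions made for this sketch, not the Sqoop 2 API.

```python
class Repository:
    """In-memory stand-in for the metadata repository (hypothetical API)."""

    def __init__(self):
        self.connectors = {}   # connector name -> required input names
        self.connections = {}  # connection name -> (connector name, inputs)
        self.jobs = {}         # job name -> (connection name, inputs)

    def register_connector(self, name, connection_inputs, job_inputs):
        # The connector declares what a connection and a job must supply.
        self.connectors[name] = {
            "connection": set(connection_inputs),
            "job": set(job_inputs),
        }

    def _check(self, connector, kind, inputs):
        missing = self.connectors[connector][kind] - set(inputs)
        if missing:
            raise ValueError(f"missing {kind} inputs: {sorted(missing)}")

    def create_connection(self, name, connector, inputs):
        self._check(connector, "connection", inputs)
        self.connections[name] = (connector, dict(inputs))

    def create_job(self, name, connection, inputs):
        connector, _ = self.connections[connection]
        self._check(connector, "job", inputs)
        self.jobs[name] = (connection, dict(inputs))

    def run_job(self, name):
        connection, job_inputs = self.jobs[name]
        connector, conn_inputs = self.connections[connection]
        # A real server would submit a transfer here; this only returns the plan.
        return {"connector": connector, "connection": conn_inputs, "job": job_inputs}


repo = Repository()
repo.register_connector("generic-jdbc", ["url", "username"], ["table"])
repo.create_connection(
    "warehouse",
    "generic-jdbc",
    {"url": "jdbc:mysql://db.example.com/sales", "username": "etl"},
)
repo.create_job("import-orders", "warehouse", {"table": "orders"})
print(repo.run_job("import-orders"))
```

Because validation is driven by the registered metadata, the repository itself needs no connector-specific code.
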

Sqoop 2: Security
- Support for secure access to external systems via role-based access to connection objects
  - Administrators create/edit/delete connections
  - Operators use connections
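
The role split described above amounts to a permission check in front of every operation on a connection object. The following is a minimal sketch under that reading; the `PERMISSIONS` table and `ConnectionStore` class are hypothetical, and returning only a handle from `use` is an assumption of this sketch rather than documented Sqoop 2 behavior.

```python
# Hypothetical permission table derived from the two roles on the slide.
PERMISSIONS = {
    "administrator": {"create", "edit", "delete"},
    "operator": {"use"},
}


class ConnectionStore:
    """Toy store that checks the caller's role before each action."""

    def __init__(self):
        self._connections = {}

    def _require(self, role, action):
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action} connections")

    def create(self, role, name, settings):
        self._require(role, "create")
        self._connections[name] = dict(settings)

    def edit(self, role, name, settings):
        self._require(role, "edit")
        self._connections[name].update(settings)

    def delete(self, role, name):
        self._require(role, "delete")
        del self._connections[name]

    def use(self, role, name):
        self._require(role, "use")
        if name not in self._connections:
            raise KeyError(name)
        # Assumption: the caller gets a handle, never the stored settings.
        return f"using connection {name}"


store = ConnectionStore()
store.create("administrator", "warehouse", {"url": "jdbc:mysql://db.example.com/sales"})
print(store.use("operator", "warehouse"))
```

An operator calling `store.create(...)` would raise `PermissionError`, which is the separation of concerns the slide describes.
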

Sqoop 2: Usability & Extensibility
- Connections and Jobs use domain specific inputs (Tables, Operations, etc.)
  - Domain Isolation and thus easy to understand and use
- Connectors work with Intermediate Data Format
  - Any downstream functionality needed is provided by Sqoop Framework
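
The benefit of an intermediate data format is that a connector only has to produce records in one shape, while the framework decides how to write them out. The sketch below assumes, purely for illustration, that the intermediate format is a tuple of values per row; the `extract`, `to_text`, and `to_json_lines` functions are hypothetical and do not mirror Sqoop 2's real format or classes.

```python
import csv
import io
import json


def extract(table):
    """Hypothetical connector: yields rows in the assumed intermediate format.

    Each record is a plain tuple of values, in column order.
    """
    for row in table["rows"]:
        yield tuple(row)


def to_text(records):
    """Framework-side writer: comma-separated text, one record per line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        writer.writerow(record)
    return out.getvalue()


def to_json_lines(columns, records):
    """Framework-side writer: one JSON object per record."""
    return "\n".join(json.dumps(dict(zip(columns, r))) for r in records)


# Made-up sample data standing in for a database table.
table = {"columns": ["id", "name"], "rows": [[1, "ann"], [2, "bob"]]}

print(to_text(extract(table)))
print(to_json_lines(table["columns"], extract(table)))
```

Adding a new output format here means writing one more framework-side function; the connector's `extract` stays unchanged.
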
Demo

Current Status: Sqoop 2
- Primary focus of the Sqoop Community
- First cut: 1.99.1
  - bits and docs: http://sqoop.apache.org/