Spark + HBase

Bringing HBase Data Efficiently into Spark with DataFrame Support
Zhan Zhang, Software Engineer
04/08/2016

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017

Page 2 © Hortonworks Inc. 2014

About Zhan Zhang

Zhan Zhang (Software Engineer at Hortonworks)

Currently Focused on Apache Spark, Hadoop, etc.

Contributes to Apache Spark, YARN, HBase, Ambari, etc.

Experience in Computer Networks, Distributed Systems, and Machine Learning Platforms

Why Revamp the Existing HBase Connector?

Limited Spark Support in HBase Upstream
– Scalability
– RDD level, but Spark is moving to DataFrame/Dataset
– Data Loss and Data Duplication

Stability
– Correctness
– Stability Impact with Co-processor
– Serialized RDD Lineage to HBase
– Maintenance Overhead: Internal Hacks

What Improvements Have We Made?

Combine Spark and HBase
– Spark Catalyst Engine for Query Plan and Optimization
– HBase for Fast Access KV Store
– Implement Standard External Data Source with Built-in Filter

High Performance
– Data Locality: Move Computation to Data
– Partition Pruning: Task only Performed in RS Holding Requested Data
– Column Pruning / Predicate Pushdown: Reduce Network Overhead

Full-Fledged DataFrame Support
– Spark-SQL
– Language-Integrated Query

Run on Top of Existing HBase Table
– Native Support for Java Primitive Types

More …

Composite Key

Avro Format

Customized Serdes

Usage - Define the Catalog
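
The slide body did not survive in this transcript. As a rough illustration only, a catalog in this style of connector is typically a JSON document that maps DataFrame columns to the HBase row key and to column family/qualifier pairs. The layout below is an assumption based on the open-source Hortonworks Spark-HBase connector, and the table, column family, and column names are hypothetical.

```python
import json

# Hypothetical catalog: every table, column family, and column name here is
# made up for illustration; the key layout is an assumption, not taken from
# the slides.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "table1"},
    "rowkey": "key",
    "columns": {
        # The row key is exposed as an ordinary DataFrame column.
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        # Regular cells: column family + qualifier + primitive type.
        "col1": {"cf": "cf1", "col": "col1", "type": "int"},
        "col2": {"cf": "cf2", "col": "col2", "type": "double"},
    },
})

# Round-trip through the JSON parser to confirm the catalog is well formed.
parsed = json.loads(catalog)
print(sorted(parsed["columns"]))
```

The resulting string would then be handed to the data source as an option when reading or writing; the exact option name depends on the connector version.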

Usage - Write to HBase

Usage - Construct DataFrame

Usage - Language-Integrated Query

Usage - Spark SQL

Usage - With Other Data Sources

Spark HBase Connector Architecture

Byte Array Order: SHORT/INT/LONG

0 1 2 … MAX MIN … -2 -1

WHERE X <= 2

WHERE X >= -2
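
A minimal sketch of why these predicates need care, assuming integers are stored as big-endian two's-complement bytes (the usual Java layout) and compared as unsigned byte strings. The helper names are hypothetical and this is not the connector's code; it only shows that negative values sort after positive ones, so a single numeric range can become two byte ranges.

```python
import struct

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def encode_int(x: int) -> bytes:
    """Big-endian two's-complement bytes of a 32-bit signed integer."""
    return struct.pack(">i", x)

# Sorting by encoded bytes (unsigned, lexicographic) is not numeric order:
# the sign bit makes every negative value compare greater than INT_MAX.
values = [-2, -1, 0, 1, 2, INT_MIN, INT_MAX]
byte_sorted = sorted(values, key=encode_int)

def ranges_le(bound: int):
    """Inclusive byte ranges that together cover WHERE X <= bound."""
    if bound < 0:
        return [(encode_int(INT_MIN), encode_int(bound))]
    # Non-negative part [0, bound] plus the whole negative block [MIN, -1].
    return [(encode_int(0), encode_int(bound)),
            (encode_int(INT_MIN), encode_int(-1))]

def ranges_ge(bound: int):
    """Inclusive byte ranges that together cover WHERE X >= bound."""
    if bound >= 0:
        return [(encode_int(bound), encode_int(INT_MAX))]
    # Whole non-negative block [0, MAX] plus the negative tail [bound, -1].
    return [(encode_int(0), encode_int(INT_MAX)),
            (encode_int(bound), encode_int(-1))]

def matches(x: int, ranges) -> bool:
    """True if the encoded value falls inside any of the byte ranges."""
    key = encode_int(x)
    return any(lo <= key <= hi for lo, hi in ranges)

print(byte_sorted)
print(matches(-100, ranges_le(2)), matches(3, ranges_le(2)))
```

Both `WHERE X <= 2` and `WHERE X >= -2` straddle the sign boundary, which is why each one turns into two scans in this sketch.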

Implementation

Partition Pruning: Split into Multiple Ranges, e.g., WHERE X < 2

Data Locality: Each RDD Partition Has a Preferred Location

Column Pruning: Only Required Columns in Scan/BulkGet

Predicate Pushdown: HBase Built-in Filters

Scan/BulkGets: Grouped by Region Server
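
The partition pruning and grouping steps above can be sketched as follows. This is an illustration under stated assumptions, not the connector's implementation: the region layout, server names, and helper functions are all hypothetical, keys are 32-bit big-endian integers, scan ranges are half-open, and an empty byte string means "unbounded".

```python
import struct
from collections import defaultdict

def encode_int(x: int) -> bytes:
    """Big-endian two's-complement bytes of a 32-bit signed integer."""
    return struct.pack(">i", x)

# Hypothetical layout: (start_key, end_key, region_server); b"" = unbounded.
REGIONS = [
    (b"", b"\x40", "rs1"),
    (b"\x40", b"\x80", "rs2"),
    (b"\x80", b"\xc0", "rs1"),
    (b"\xc0", b"", "rs3"),
]

def overlaps(region, scan) -> bool:
    """True if the half-open scan range intersects the region's key range."""
    r_start, r_end, _ = region
    s_start, s_end = scan
    return (r_end == b"" or s_start < r_end) and (s_end == b"" or s_end > r_start)

def clip(region, scan):
    """Restrict a scan range to the part that lies inside the region."""
    r_start, r_end, _ = region
    s_start, s_end = scan
    ends = [e for e in (r_end, s_end) if e != b""]
    return (max(r_start, s_start), min(ends) if ends else b"")

def plan(scans):
    """Group clipped scans by region server; untouched regions get no task."""
    tasks = defaultdict(list)
    for region in REGIONS:
        for scan in scans:
            if overlaps(region, scan):
                tasks[region[2]].append(clip(region, scan))
    return dict(tasks)

# WHERE X < 2 as two byte ranges: [0, 2) and [MIN, end of key space).
scans = [(encode_int(0), encode_int(2)), (encode_int(-2**31), b"")]
tasks = plan(scans)
for server, ranges in tasks.items():
    print(server, ranges)
```

In this sketch `rs2` receives no work because none of its keys can satisfy the predicate, and each remaining group of ranges would become one partition whose preferred location is the server that holds the data.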

BACK UP

Kerberos Cluster

Kerberos Ticket

Token Retrieval and Renewal

Long Running Service

FLOAT/DOUBLE: IEEE-754

0.0 … 2.0 … MAX -0.0 … -2.0 … MIN

WHERE X <= 2.0D

WHERE X >= -2.0D
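
A minimal sketch of the same issue for floating-point values, assuming doubles are stored as big-endian IEEE-754 bytes and compared as unsigned byte strings. Unlike integers, negative doubles sort in reverse numeric order, starting at -0.0 and growing toward the most negative value. The helper names are hypothetical, and NaN handling is deliberately left out.

```python
import math
import struct

def encode_double(x: float) -> bytes:
    """Big-endian IEEE-754 bytes of a 64-bit double."""
    return struct.pack(">d", x)

values = [0.0, 0.2, 2.0, -0.0, -0.2, -2.0]
byte_sorted = sorted(values, key=encode_double)

def ranges_le(bound: float):
    """Inclusive byte ranges that together cover WHERE X <= bound."""
    if bound >= 0:
        # abs() maps a -0.0 bound onto 0.0 so the positive range stays valid.
        return [(encode_double(0.0), encode_double(abs(bound))),
                (encode_double(-0.0), encode_double(-math.inf))]
    # More negative values have larger bytes, so the range runs upward.
    return [(encode_double(bound), encode_double(-math.inf))]

def ranges_ge(bound: float):
    """Inclusive byte ranges that together cover WHERE X >= bound."""
    if bound > 0:
        return [(encode_double(bound), encode_double(math.inf))]
    return [(encode_double(0.0), encode_double(math.inf)),
            (encode_double(-0.0), encode_double(-abs(bound)))]

def matches(x: float, ranges) -> bool:
    """True if the encoded value falls inside any of the byte ranges."""
    key = encode_double(x)
    return any(lo <= key <= hi for lo, hi in ranges)

print(byte_sorted)
print(matches(-5.0, ranges_le(2.0)), matches(3.0, ranges_le(2.0)))
print(matches(-1.0, ranges_ge(-2.0)), matches(-3.0, ranges_ge(-2.0)))
```

Note that for `WHERE X >= -2.0D` the negative byte range runs from -0.0 up to -2.0, the opposite direction from the integer case.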

HBase Meta Table
