Solving Low Latency Query over Big Data with Spark SQL (Julien Pierre, Microsoft)

Client Data Fluency

Office

Modern Data Capability

Instrumentation & Ingestion

Processing & Storage

Reporting & Analytics

Information Management

Mobile-First Analytics Experience

Experimentation

Data Size

Query Latency

Get results inline in Zeppelin

Need to open the results in Excel

[Chart: latency for Cosmos, SparkSQL, and SparkSQL with Cache on a 0 to 200 axis, broken down into Write and Compile Query, Submit and Wait in Job Queue, and Job Run Time]

Mesos Cluster/HDFS

Job Manager Zookeeper

Job Frontend Web API

Spark Driver Host Pool

Spark Hive Thrift Server Zeppelin Server

Avocado (Hive Query + Schedule Task)

Rover (Drag & Drop BI tool with Hive Code)

Zeppelin Web UI

MetastoreDB Hive Loader

Cosmos Storage

Partition 1

Partition 2

Partition n

Export Cosmos Partition

Partition 1

Partition 2

Partition n

Task 2

HDFS.copyFromLocalFile

Task n

Partition 1

Partition 2

Partition n

saveAsParquetFile

Task 2...

Task n

MetastoreDB

Hive Thrift Server

Hive Loader

Zeppelin Server

User Query

Data Ingest

Services

Clients

Transform Compute

Data Streams

Data Sets

Event Processing

HDFS Data Transportation

Spark Streaming Receiver

Analyst

Zeppelin Notebooks

Avocado

Simple query

Query language

“Analyze”

“Debug”

“Mine”

“Glance”

Unified platform

Intelligence

Interactive analytics

Data Products

Better Digital Experiences

Dual users

“Bing”

“Office”
