kinetica master chug_9.12

Kinetica – Industry’s Fastest Analytics Database

Upload: chicago-hadoop-users-group

Post on 21-Jan-2018


About Me

• Engineering Background - App Dev
• Open Source Contributor
• Hadoop – 10 years
• HWX Principal Solution Engineer
• Director, Solutions Engineering @ Kinetica

Kinetica Local Contact Information
• Sunile Manjee, Director Solutions Engineering, [email protected]
• Phil Zacharia, Director Central Region, [email protected]

What is Kinetica?

Patented In-Memory

Columnar Distributed

GPU Accelerated Database

Developed to Identify Terroristic Threats in Real-Time

Kinetica incubated as a massively parallel computational engine for US Army INSCOM

Ingests 200+ sources of streaming data – mobile devices, drones, social media, cyber data

200B new records per hour

Incorporates geospatial and temporal data

Real-time, actionable threat intelligence

First high-performance database leveraging GPUs

Who is Kinetica?

2009

‘HPC Research Project’ incubated by US military

2010

2011

Patent # US8373710 B1 issued to GPUdb

2012

US Army deploys GPUdb

2013

GPUdb commercially available

2014

IDC HPC innovation excellence award (Army)

GPUdb goes into production at USPS

2015

IronNet selects GPUdb for Cyber Defense

2015

PG&E selects GPUdb for electric grid analysis

IDC HPC innovation excellence award (USPS)

2016

Rebrand to Kinetica

Confidential Information


Current Data Architectures Can’t Keep Up | Complex, Rigid, Agility

Challenges
• Infrastructure complexity, costs – stitch together multiple tools – separate tools for BI, ML, OLAP cubes, databases
• High Latency – can’t handle big data’s volume, variety, velocity
• Data needs to be pre-aggregated and transformed to cubes
• Processing is batch and not real-time
• Rigid – can’t handle changing requirements, changing data
• Dashboard slowness pains
• Datamarts in Tableau, caching, very complex query
• Difficult to simultaneously ingest and analyze at scale
• Limited Agility – admin overhead, resources, skills

Diagram components:
• BI and analytics tools: Tableau, MSTR, SAS, others
• Data marts, OLAP cubes, indices, summary tables
• EDW (Teradata, Oracle): star schema – facts & dimensions
• Hadoop (Hortonworks, Cloudera)
• Data Integration (INFA, Talend); NiFi, Kafka
• Data: 3rd party, ERP, CRM, SFA, databases, flat files

Kinetica Database | Real-Time, Flexible, Simple Data and Analytics

Diagram components:
• BI and analytics tools: Tableau, MSTR, SAS, others
• Kinetica
• EDW (Teradata, Oracle)
• Hadoop (HDP, CDH, MapR)
• Data Integration (INFA, Talend); NiFi, Kafka
• Data: 3rd party, ERP, CRM, SFA, databases, flat files

Solution
• Low Latency – millisecond response time
• Real-time at scale – simultaneously ingest and analyze
• Full data provisioning – ingest, manage, analyze, visualize
• Flexible – handle changing requirements, changing data, minimize aggregates, indexes, cubes
• Simplicity – minimize admin overhead, resources, skills

Plus
• Converge AI and BI
• Location-based Analytics
• Deploy on commodity hardware on-prem, cloud

Kinetica: Unique Strengths & Capabilities

Fast, Distributed, In-Memory Analytics Engine for Fast Moving, Large Scale Data

Kinetica is designed to take advantage of the parallel processing nature of the GPU. It delivers low-latency, high-performance analytics on large datasets, and makes streaming data available for query in real time.

OLAP Performance, Scalability, Stability

Geospatial Processing & Visualization

API for GPU Powered Data & Compute Orchestration

Converged AI and BI

User Defined Functions (UDFs) and orchestration of data in a distributed manner enable Kinetica to offer low-level customizations for machine learning and AI workloads.

Native Geospatial & Visualization Pipeline

Native visualization pipeline makes it easier to work with large geospatial datasets. Ideal for IoT use cases and for powering geospatial applications.

Sonic Layer (Fast/True Real-time Analytics)

Historic and Predictive Insights

Interactive Location-Based Analytics

CUDA Abstraction, SAXPY Example

CUDA

Make/Build

https://devblogs.nvidia.com/parallelforall/easy-introduction-cuda-c-and-c/

SQL

SELECT a*x+y FROM TABLE

Python

import gpudb

h_db = gpudb.GPUdb(encoding='BINARY', host='127.0.0.1', port='9191')

response = h_db.get_records_by_column('TABLE', ["(a*x+y)"], 0, 10, 'json', {})
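For readers unfamiliar with the term, SAXPY is the operation a*x + y applied element by element, which is what the expression in the SQL and Python snippets asks the database to evaluate per row. The following is a minimal plain-Python sketch of that arithmetic only. It does not use the Kinetica API, and the function name `saxpy` and the sample values are assumptions made for illustration.

```python
def saxpy(a, x, y):
    """Return a*x + y computed element by element (illustrative helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


# Hypothetical column values standing in for columns x and y of a table.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

result = saxpy(2.0, x, y)
print(result)
```

On a GPU each element of the result can be computed by a separate thread, which is why SAXPY is a common first example for parallel hardware.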

Kinetica | Reference Architecture

Kinetica Architecture

VISUALIZATION via ODBC/JDBC

APIs
• Java API
• JavaScript API
• REST API
• C++ API
• Node.js API
• Python API

OPEN SOURCE INTEGRATION
• Apache NiFi
• Apache Kafka
• Apache Spark
• Apache Storm

GEOSPATIAL CAPABILITIES
• Geometric Objects
• Tracks
• Geospatial Endpoints
• WMS
• WKT

KINETICA CLUSTER
On Demand Scale

Each of the four nodes shown in the diagram:
• Commodity Hardware w/ GPUs
• Disk
• Columnar In-memory (A1 B1 C1, A2 B2 C2, A3 B3 C3, A4 B4 C4)
• HTTP Head Node

OTHER INTEGRATION
• Message Queues
• ETL Tools
• Streaming Tools

• Reliable, Available and Scalable
  • Disk based persistence
  • Add nodes on demand
  • Data replication for high availability
  • Scale up and/or out

• Performance
  • GPU Accelerated (1000’s Cores per GPU)
  • Ingest Billions of records in minutes
  • Ultra low latency query performance

• Massive Data Sizes
  • 100’s of Terabytes Scale
  • Billions of entries

• Connectors
  • ODBC/JDBC
  • Restful Endpoints
  • Rich API’s
  • Standard Geospatial Capabilities

• Run Anywhere
  • On premise, Amazon, Azure, Google Cloud, Nimbix, SoftLayer

• Hardware Partners
  • IBM, Dell, Cisco, HP
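The "Columnar In-memory" label in the architecture diagram refers to storing each column's values together rather than storing whole rows together. The sketch below illustrates that general idea in plain Python. The column names A, B, C echo the diagram labels, while the values and the helper `to_columns` are assumptions for illustration and say nothing about Kinetica's internal format.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical rows; A, B, C mirror the column labels in the diagram.
rows = [
    {"A": 1, "B": 10.0, "C": "w"},
    {"A": 2, "B": 20.0, "C": "x"},
    {"A": 3, "B": 30.0, "C": "y"},
    {"A": 4, "B": 40.0, "C": "z"},
]

columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_b = sum(columns["B"])
print(columns["A"], total_b)
```

Because an analytic query usually touches a few columns of many rows, this layout lets a scan read one contiguous list instead of every field of every row.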

Core Design & Architecture

Diagram: two CPU sockets, each with two GPUs and two logical nodes. The logical nodes hold shards, and each shard is made up of chunks. System Memory (RAM) holds the chunks as Table:Column:Data entries, with a "Map to Persist" link to storage.
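The shard and chunk hierarchy in the diagram can be pictured with a small sketch: records are assigned to shards, and within a shard the column values are grouped into fixed-size chunks. Everything specific here is an assumption for illustration, including the modulo assignment on an integer `id` column, the chunk size, and the function names. The slides do not state how Kinetica assigns records or sizes chunks.

```python
CHUNK_SIZE = 2  # hypothetical; chosen small so the example is easy to follow


def shard_for(key, num_shards):
    """Pick a shard for an integer key (assumed scheme: simple modulo)."""
    return key % num_shards


def build_shards(records, num_shards, chunk_size=CHUNK_SIZE):
    """Group records into shards; each shard is a list of columnar chunks."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        chunks = shards[shard_for(record["id"], num_shards)]
        # Start a new chunk when the shard is empty or the last chunk is full.
        if not chunks or len(chunks[-1]["id"]) >= chunk_size:
            chunks.append({name: [] for name in record})
        for name, value in record.items():
            chunks[-1][name].append(value)
    return shards


# Ten hypothetical records spread over four shards.
records = [{"id": i, "value": i * 1.5} for i in range(10)]
shards = build_shards(records, num_shards=4)

for index, chunks in enumerate(shards):
    print(index, [chunk["id"] for chunk in chunks])
```

Each chunk keeps its columns as separate lists, matching the Table:Column:Data labels in the diagram.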

Kinetica UDF

Diagram: CPU sockets, logical nodes, GPUs, and shards of chunks held in System Memory (RAM).
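An earlier slide describes UDFs as running against data "in a distributed manner". One way to picture that is a function applied to each shard independently, with the partial results combined afterwards. The sketch below shows this pattern in plain Python with a thread pool. It is not the Kinetica UDF API, and `run_udf`, `sum_of_squares`, and the shard contents are hypothetical names and data.

```python
from concurrent.futures import ThreadPoolExecutor


def run_udf(udf, shards, combine):
    """Apply udf to every shard in parallel, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(udf, shards))
    return combine(partials)


def sum_of_squares(shard):
    """Example user function: works only on the data of one shard."""
    return sum(v * v for v in shard["value"])


# Hypothetical shards, each holding part of a single "value" column.
shards = [
    {"value": [1.0, 2.0]},
    {"value": [3.0]},
    {"value": [4.0, 5.0]},
]

total = run_udf(sum_of_squares, shards, combine=sum)
print(total)
```

The user function never sees the whole table, so the same code works whether the shards sit on one machine or many.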

CPU Bound "Real Time" Architectures

Data Stream

Buy/Add More Nodes

Concurrent Ingest & Analytics

Kinetica Real Time Analytics Architecture

Data Stream

Concurrent Ingest & Analytics

GPU

Demo