ignite data science toolkit - gridgain systems · 2018. 9. 30. · distributed sql jdbc odbc sql...

15
© 2017 GridGain Systems, Inc. In-Memory Hammer for Your Data Science Toolkit Apache Ignite Akmal Chaudhri Technology Evangelist GridGain

Upload: others

Post on 29-Sep-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

In-Memory Hammer for Your Data Science Toolkit Apache Ignite

Akmal Chaudhri Technology Evangelist GridGain

Page 2: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

• Apache Ignite Overview • Use Cases

• Data Science Toolkit Box • Data Grid • Durable Memory • Distributed SQL • Compute Grid • Machine Learning Grid (Beta)

• Q&A

Agenda

Page 3: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Apache Ignite In-Memory Computing Platform

Memory-Centric Storage

Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)

Third-Party Persistence (RDBMS, HDFS, NoSQL)

SQL Transactions Compute Services MLStreaming

Applications

Key/Value

IoTFinancial Services

Pharma & Healthcare

E-CommerceTravel & Logistics

Telco

Page 4: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Apache Ignite Use Cases

FinTech

Financial Services Software Logistics & Travel

E-commerce

Telco

IoT

Pharma & HealthcareAdtech

Page 5: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

e-Therapeutics provides a computer-based drug discovery platform and a specialized approach to network biology.

Problem • Analysis of a network of proteins influencing a disease and

drugs discovery could be measured in weeks • Could not parallelize existing algorithms

Apache Ignite Solution • 80x speed increase over the non-parallelized environment • Analysis projects completion in hours and minutes • Computational resources for abandoned research projects

- Drug Discovery and Network Biology

Cache & Compute

API

e-Therapeutics Platform

100x Cluster Nodes 5x Physical Nodes

Server Nodes

Clients Nodes

Page 6: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Data Grid

JCache Transactions Compute SQL

Server Node

Distributed Key-Value Store

Dynamic Scaling

Distributed partitioned hash map

ACID TransactionJCache & SQL

Server Node Server Node

DURABLE MEMORY DURABLE MEMORY DURABLE MEMORYRDBMS

NoSQLHDFS

3rd party storage caching

Page 7: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Durable Memory

Off-heap Removes noticeable GC pauses

Automatic Defragmentation

Stores Superset of Data

Predictable memory consumption

Fully Transactional (Write-Ahead Log)

DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY

Server Node Server Node Server Node

Ignite Cluster

Page 8: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Ignite Native Persistence

1. Update

RAM

2. Persist

Write-Ahead Log

Partition File 13. Ack

4. Checkpointing

Partition File N

Server Node

Page 9: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Distributed SQL

JDBC ODBC SQL API

Java .NET C++ BI

SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER

DDL, DML Support

Cross-platform Compatibility

Indexes in RAM or Disk

Dynamic Scaling

Server Node Server NodeServer Node

Apache Ignite Cluster

DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY

Tools

Page 10: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

1. Initial Query 2. Query execution over local data 3. Reduce multiple results in one

1. Initial Query 2. Query execution (local + remote data) 3. Potential data movement 4. Reduce multiple results in one

2

2

1

Collocated Joins Non-Collocated Joins

Server Node

ON-DISK

Server Node

ON-DISK

Client Node

3

2

2

1

Server Node

ON-DISK

Server Node

ON-DISK

Client Node

4

3

Page 11: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Compute Grid

DURABLE MEMORY

DURABLE MEMORY

Ignite Cluster

C1

R1

C2

R2

C = C1 + C2

R = R1 + R2

C = Compute

R = Resultin T/2 time

Automatic Failover

Load Balancing

Zero Deployment

Page 12: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

1. Initial Request 2. Fetch data from remote nodes 3. Process entire data-set

1. Initial Request 2. Co-located processing with data 3. Reduce multiple results in one

3

1Data 12

2 Data 2

2

2

1Client Node

Client-Server Processing Co-located Processing

Server Node

ON-DISK

Server Node

ON-DISK

Server Node

ON-DISK

Server Node

ON-DISK

Client Node

3

Page 13: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Genetic Algorithms Grid

DURABLE MEMORY

DURABLE MEMORY

Ignite Cluster

F2, C2, M2

F = F1 + F2

C = C1 + C2

Collocated Computation

Biological Evolution Simulation

Chromosome and Genes Cluster

M = M1 + M2

F1, C1, M1

F = Fitness Calculation C = Crossover M = Mutation

Page 14: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Machine Learning Grid

K-Means Regressions Decision Trees

R C++ Python Java

Server Node Server NodeServer Node

Distributed Core Algebra

DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY

Scala REST

Random ForestDistributed Algorithms

Dense and Sparse Algebra

Large scale parallelization

Multi-Language Support

No ETL

Page 15: ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL API Java .NET C++ BI SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER DDL,

© 2017 GridGain Systems, Inc.

Thank you for joining us. Follow the conversation.http://ignite.apache.org

Any Questions?

#apacheignite #denismagda