getting started with apache ignite sql · •ignite sql basics: dml, ddl, connectivity,...

Getting Started With Apache Ignite SQL

Denis Magda, GridGain Developer RelationsIgor Seliverstov, GridGain Architecture Group

Topics

• Ignite SQL Basics: DML, DDL, connectivity, configuration• Affinity Co-Location and Distributed JOINs• Beyond Memory Capacity: Disk Tier Usage and Memory Quotas• Ignite SQL Evolution With Apache Calcite

Ignite SQL Basics

Ignite SQL = ANSI SQL at Scale

• ANSI-99 DML and DDL syntax– SELECT, UPDATE, CREATE…

• Distributed joins, grouping, sorting

• Schema changes in runtime– ALTER TABLE, CREATE/DROP INDEX

• Works with in-memory and disk-only records– If Ignite Persistence is used as a disk tier

Connectivity Options

• Thick Client APIs– Java, C#/.NET, C++

• JDBC and ODBC drivers

• Thin Client APIs– Multi-language support

Configuration Option #1: Programmatically With Annotations

Usage Scenario:• Spring-style development by annotating POJOs• DDL can be used to apply changes in runtime.

Configuration Option #2: Spring XML With Query Entities

Usage Scenario:• Ignite as a cache that writes-through

changes to an external database.

• DDL can be used to apply changes in runtime.

Configuration Option #3: In Pure SQL With DDL

Usage Scenario:• SQL-driven applications• Green-field applications using Ignite as a

database with its native persistence

Demo Time

Cluster Startup and Database Creation

Affinity Co-Location andDistributed JOINs

Ignite SQL Engine Internals

Data & Indexes

Ignite SQL

H2 Engine

Data & Indexes

Ignite SQL

H2 Engine

Data & Indexes

Ignite SQL

H2 Engine

Query Execution Phases

City

City

Thick Client

Map

Map

Map

Reduce

Reduce

Default Data Distribution

Canada

Toronto

Calgary

Paris

France

MarseilleMontreal

Ottawa

Country Table

City Table

SQL JOIN With Data Shuffling

Thick Client

Canada

Toronto

CalgaryParis

France

MarseilleOttawa

Montreal

ParisOttawaMontreal

1 & 4

2

2

3

1. Initiating Execution2. Execution on Servers (map phase)3. Data Shuffling4. Reduce Phase

Co-Located Distribution (aka. Affinity Co-Location)

Canada

Toronto

Calgary

France

Marseille

Country Table

City Table

MontrealOttawa Paris

All You Need is to Configure Affinity Key

Affinity Key to Node Mapping Process

Affinity Key Partition

Application Process Network Call

Node

CityRecord

High-Performance SQL JOIN

Thick Client

Canada

Toronto

Calgary

France

Marseille

1 & 3

2

2

1. Initiating Execution2. Execution on Servers (map phase)3. Reduce Phase

Ottawa

Paris

Demo Time

Queries With JOINs

Beyond Memory Capacity:Disk-Tier and Memory Quotas

Multi-Tier Storage architecture

1. In-Memory - General in-memory caching, high-performance computing

2. In-Memory + Native Persistence - Ignite as an in-memory database

3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability

Multi-Tier Storage Architecture

Index Page(root)

Index page(inner)

Inner page 2

Leaf page 2Index Page(leaf)

Leaf page 3

Data page Index Page(root) Leaf page Metadata page Leaf page Data page Metadata page

Memory segment

Key-Value

Key-Value

Key-Value

Key-Value

Data page

DataIndex

Multi-Tier Storage Architecture

Data page #0 Data page #1

Partition file with Data

Data page #2 Data page #3 Data page #4 Data page #5 Data page #6

Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page

Memory segment

PageIdPages map Pointer in a memory segment

(read/write ops)

Position in a file (load page/checkpoint)

Java off-heap vs Java heap

SQL query processing

ParsingPlanning

Computing(filters, joins, expressions)

Scanning(index or table scan)

Heap

Heap

More Heap

Off-heap

Java off-heap vs Java heap

Sorting

Renaming

Aggregation

Projection

Join

Filtering

ScanningCOUNTRYCITY

σcode in (‘CAN’, ‘FRA’)

⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop

πcountry.name, city.name, city.population

ρname, name0, max_pop

Here we need full set in heap

Here we need full set in heap too

Query memory quotas

How to configure:

Interim results offloading

Sorting

Renaming

Aggregation

Projection

Join

Filtering

ScanningCOUNTRYCITY

σcode in (‘CAN’, ‘FRA’)

⋈country.code = city.countrycode

country.name, city.nameℱMAX(city.population)

τmax_pop

πcountry.name, city.name, city.population

ρname, name0, max_pop

Why don’t you flush result sets to disk?

And it

Intermediate results offloading

How to configure:

When you need quotas/offloading enabled

● Sorting (ORDER BY)

● Grouping (DISTINCT, GROUP BY)

● Complex subqueries

Demo Time

Running SQL Over Disk-Only Records

Apache Ignite SQL EvolutionWith Apache Calcite

Why do we need it?

Here we need Map-Reduce phase

Here we need Map-Reduce phase too

Typical execution flow

User

Parser

Optimizer modeRule-based optimizer

Cost-based optimizer

Dictionary

Row source generator Execution

SQL Query

RBO CBOStatistics

Result

Query plan

ValidatorValidation

Schema

Query AST

Apache Calcite

JDBC Client

JDBC Server

SQL Parser/Validator

Query optimizer

3rd party ops 3rd party ops

Metadata SPI

Plugable rules

3rd party data

3rd party data

Optional

Core

Pluggable

Need to implement:

● Splitter

● Runtime

● Indexes support

● DML support

● DDL support

Query Parser and Transformer

Select

expr=p.id, d.name

from=person, dep

cond=p.depId = d.id AND (p.id > 10 OR p.id < 10000)

order=p.name DESC

offset=10

dep(d)person(p)

σp.id > 10 OR p.id < 10000

⋈p.depId = d.id

τp.name DESC

πp.id, p.name

Query AST

Relational operators tree (query plan)

Cost-Based Optimizer

Estimator Dictionary

Plan generator

Query transformer

Query AST (from parser)

Relational tree

Relational tree + costs

Statistics

Equivalent relational

tree

Query plan (to Row source generator)

Rules

Cost-Based Splitter

dep(d)person(p)

σp.id > 10 OR p.id < 10000

⋈p.depId = d.id

τp.name DESC

πp.id, p.name

Root

Reactive Execution Flow

Scan Filter Sender Receiver Client cursor

Push Push Send PushData flow

Request Request Acknowledge Request

Node buffer Node buffer Node buffer Node buffer

Backpressure

Network communication

Demo Time

Calcite Prototype Demo With Sub-Queries

Learn More

• Apache Ignite SQL– https://apacheignite-sql.readme.io/docs

• Memory Quotas (available in GridGain Community Edition):

– https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas

• Demos shown in this webinar– https://github.com/GridGain-Demos/ignite-sql-intro-samples

• New Apache Calcite-based engine– https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+

query+execution+engine

https://apacheignite-sql.readme.io/docs

https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas

https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas

https://github.com/GridGain-Demos/ignite-sql-intro-samples

https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine

https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine

getting started with apache ignite sql · •ignite sql basics: dml, ddl, connectivity,...

Documents