getting started with apache ignite sql · •ignite sql basics: dml, ddl, connectivity,...
TRANSCRIPT
Getting Started With Apache Ignite SQL
Denis Magda, GridGain Developer RelationsIgor Seliverstov, GridGain Architecture Group
Topics
• Ignite SQL Basics: DML, DDL, connectivity, configuration• Affinity Co-Location and Distributed JOINs• Beyond Memory Capacity: Disk Tier Usage and Memory Quotas• Ignite SQL Evolution With Apache Calcite
Ignite SQL Basics
Ignite SQL = ANSI SQL at Scale
• ANSI-99 DML and DDL syntax– SELECT, UPDATE, CREATE…
• Distributed joins, grouping, sorting
• Schema changes in runtime– ALTER TABLE, CREATE/DROP INDEX
• Works with in-memory and disk-only records– If Ignite Persistence is used as a disk tier
Connectivity Options
• Thick Client APIs– Java, C#/.NET, C++
• JDBC and ODBC drivers
• Thin Client APIs– Multi-language support
Configuration Option #1: Programmatically With Annotations
Usage Scenario:• Spring-style development by annotating POJOs• DDL can be used to apply changes in runtime.
Configuration Option #2: Spring XML With Query Entities
Usage Scenario:• Ignite as a cache that writes-through
changes to an external database.
• DDL can be used to apply changes in runtime.
Configuration Option #3: In Pure SQL With DDL
Usage Scenario:• SQL-driven applications• Green-field applications using Ignite as a
database with its native persistence
Demo Time
Cluster Startup and Database Creation
Affinity Co-Location andDistributed JOINs
Ignite SQL Engine Internals
Data & Indexes
Ignite SQL
H2 Engine
Data & Indexes
Ignite SQL
H2 Engine
Data & Indexes
Ignite SQL
H2 Engine
Query Execution Phases
City
City
Thick Client
Map
Map
Map
Reduce
Reduce
Default Data Distribution
Canada
Toronto
Calgary
Paris
France
MarseilleMontreal
Ottawa
Country Table
City Table
SQL JOIN With Data Shuffling
Thick Client
Canada
Toronto
CalgaryParis
France
MarseilleOttawa
Montreal
ParisOttawaMontreal
1 & 4
2
2
3
1. Initiating Execution2. Execution on Servers (map phase)3. Data Shuffling4. Reduce Phase
Co-Located Distribution (aka. Affinity Co-Location)
Canada
Toronto
Calgary
France
Marseille
Country Table
City Table
MontrealOttawa Paris
All You Need is to Configure Affinity Key
Affinity Key to Node Mapping Process
Affinity Key Partition
Application Process Network Call
Node
CityRecord
High-Performance SQL JOIN
Thick Client
Canada
Toronto
Calgary
France
Marseille
1 & 3
2
2
1. Initiating Execution2. Execution on Servers (map phase)3. Reduce Phase
Ottawa
Paris
Demo Time
Queries With JOINs
Beyond Memory Capacity:Disk-Tier and Memory Quotas
Multi-Tier Storage architecture
1. In-Memory - General in-memory caching, high-performance computing
2. In-Memory + Native Persistence - Ignite as an in-memory database
3. In-Memory + External Database - Acceleration of services and APIs with write-through and write-behind capability
Multi-Tier Storage Architecture
Index Page(root)
Index page(inner)
Inner page 2
Leaf page 2Index Page(leaf)
Leaf page 3
Data page Index Page(root) Leaf page Metadata page Leaf page Data page Metadata page
Memory segment
Key-Value
Key-Value
Key-Value
Key-Value
Data page
DataIndex
Multi-Tier Storage Architecture
Data page #0 Data page #1
Partition file with Data
Data page #2 Data page #3 Data page #4 Data page #5 Data page #6
Data page #5 Inner page Leaf page Metadata page Leaf page Data page #2 Metadata page
Memory segment
PageIdPages map Pointer in a memory segment
(read/write ops)
Position in a file (load page/checkpoint)
Java off-heap vs Java heap
SQL query processing
ParsingPlanning
Computing(filters, joins, expressions)
Scanning(index or table scan)
Heap
Heap
More Heap
Off-heap
Java off-heap vs Java heap
Sorting
Renaming
Aggregation
Projection
Join
Filtering
ScanningCOUNTRYCITY
σcode in (‘CAN’, ‘FRA’)
⋈country.code = city.countrycode
country.name, city.nameℱMAX(city.population)
τmax_pop
πcountry.name, city.name, city.population
ρname, name0, max_pop
Here we need full set in heap
Here we need full set in heap too
Query memory quotas
How to configure:
Interim results offloading
Sorting
Renaming
Aggregation
Projection
Join
Filtering
ScanningCOUNTRYCITY
σcode in (‘CAN’, ‘FRA’)
⋈country.code = city.countrycode
country.name, city.nameℱMAX(city.population)
τmax_pop
πcountry.name, city.name, city.population
ρname, name0, max_pop
Why don’t you flush result sets to disk?
And it
Intermediate results offloading
How to configure:
When you need quotas/offloading enabled
● Sorting (ORDER BY)
● Grouping (DISTINCT, GROUP BY)
● Complex subqueries
Demo Time
Running SQL Over Disk-Only Records
Apache Ignite SQL EvolutionWith Apache Calcite
Why do we need it?
Here we need Map-Reduce phase
Here we need Map-Reduce phase too
Typical execution flow
User
Parser
Optimizer modeRule-based optimizer
Cost-based optimizer
Dictionary
Row source generator Execution
SQL Query
RBO CBOStatistics
Result
Query plan
ValidatorValidation
Schema
Query AST
Apache Calcite
JDBC Client
JDBC Server
SQL Parser/Validator
Query optimizer
3rd party ops 3rd party ops
Metadata SPI
Plugable rules
3rd party data
3rd party data
Optional
Core
Pluggable
Need to implement:
● Splitter
● Runtime
● Indexes support
● DML support
● DDL support
Query Parser and Transformer
Select
expr=p.id, d.name
from=person, dep
cond=p.depId = d.id AND (p.id > 10 OR p.id < 10000)
order=p.name DESC
offset=10
dep(d)person(p)
σp.id > 10 OR p.id < 10000
⋈p.depId = d.id
τp.name DESC
πp.id, p.name
Query AST
Relational operators tree (query plan)
Cost-Based Optimizer
Estimator Dictionary
Plan generator
Query transformer
Query AST (from parser)
Relational tree
Relational tree + costs
Statistics
Equivalent relational
tree
Query plan (to Row source generator)
Rules
Cost-Based Splitter
dep(d)person(p)
σp.id > 10 OR p.id < 10000
⋈p.depId = d.id
τp.name DESC
πp.id, p.name
Root
Reactive Execution Flow
Scan Filter Sender Receiver Client cursor
Push Push Send PushData flow
Request Request Acknowledge Request
Node buffer Node buffer Node buffer Node buffer
Backpressure
Network communication
Demo Time
Calcite Prototype Demo With Sub-Queries
Learn More
• Apache Ignite SQL– https://apacheignite-sql.readme.io/docs
• Memory Quotas (available in GridGain Community Edition):
– https://www.gridgain.com/docs/latest/developers-guide/memory-configuration/memory-quotas
• Demos shown in this webinar– https://github.com/GridGain-Demos/ignite-sql-intro-samples
• New Apache Calcite-based engine– https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+
query+execution+engine