the glass half fullweb.fdi.ucm.es/posgrado/conferencias/zsoltistvan-slides.pdf · x100 [cidr05]...

32
The Glass Half Full Using Programmable Hardware Accelerators in Analytical Databases Zsolt István IMDEA Software Institute 1

Upload: others

Post on 21-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

  • The Glass Half Full Using Programmable Hardware Accelerators in

    Analytical Databases

    Zsolt IstvánIMDEA Software Institute

    1

  • IMDEA Software Institute

    • 16 Faculty in the areas of:• Program Analysis and Verification• Languages and Compilers• Security and Privacy• Theoretical Computer Science• Distributed Systems and Databases

    • ~10 Post-docs, ~25 PhD Students, ~10 Interns

    • Located in UPM Montegancedo Campus, Madrid

    • We are hiring! https://software.imdea.org/

  • ▪ OLAP – Online Analytical Processing▪ Large datasets – up to TBs

    ▪ Ad-hoc querying to extract insight, recurring reporting – Possibly complex operations

    ▪ Read-mostly workloads, updates in batches

    ▪ OLTP – Online Transaction Processing▪ Smaller datasets

    ▪ Queries known, relate to business actions

    ▪ Makes heavy use of indexes

    ▪ Reads and updates intermixed

    3

    Context: Analytical Databases

  • 4

    Databases were a 25 Billion $ market in 2018…

    Could we specialize machines to them?

    https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/

  • ▪ Fully custom machine for databases▪ Processors – special ISA microprocessors

    ▪ Memory – magnetic bubbles and CCDs

    ▪ Semiconductor technology and general purpose CPUs took over

    5

    Database Computer – ’70s

    “The first goal is to design it with the

    capability of handling a very large on-line

    database of 10^10 bytes or beyond since

    special-purpose machines are not likely to

    be cost-effective for small databases.”

    Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases.

    IEEE Trans. Computers 28(6): 414-429 (1979)

  • ▪ Based on VAX multi-processor system

    ▪ By the time the software and hardware were developed, CPUs have become much faster ▪ Couldn’t keep up with

    Moore’s law

    6

    Gamma Machine – ’80s

    David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High

    Performance Dataflow Database Machine. VLDB 1986: 228-237

  • 7

    Data/Compute Gap

    CPU Scaling Commodity in CloudSpecialized

    Hardware

    Revival

  • 8

    Renewed interest in Specialized Hardware

    ASICsFPGAsCPUs

  • Field Programmable Gate Array (FPGA)

    ▪ Free choice of architecture

    ▪ Fine-grained pipelining, communication, distributed memory

    ▪ Tradeoff: all “code” occupies chip space

    ▪ Evolving platform: larger chips, more heterogeneity

    9

    Re-programmable Specialized Hardware

    Op 1

    Op 2

    Op 3

  • 10

    Integration Options

    Accel.

    1) On the side 2) In data-path 3) Co-processor

    Data

    Data Data

    Accel.

    Accel.

  • ▪ Accelerator▪ Amazon F1

    ▪ In data path▪ Microsoft Catapult

    ▪ Co-processor▪ Intel Xeon+FPGA

    11

    In the Cloud Today

    Socket1 Socket2

    CPU FPGA

    Socket1

    CPU FPGACPU

    FPGA

    Intel Xeon+FPGA Gen.1 Intel Xeon+FPGA Gen.2

  • 12

    The Glass Half Empty…

  • ▪ 1) On the side acceleration introduces overhead

    ▪ Many related work offers no real speedup if we factor in data movement, transformation, software overhead… 13

    The Glass Half Empty…

    0

    20

    40

    60

    80

    100

    120

    Software With Acceleration

    Query execution time

    Compute Data Movement

    2xAccel.Data

  • ▪ 2) “All or nothing” behavior makes query planning difficult▪ Example: fixed capacity hash table on FPGA

    ▪ Constant time access for reads and writes

    ▪ What happens if data doesn’t fit?

    ▪ Can’t always know the number of keys aprioi

    14

    The Glass Half Empty…

    #

  • ▪ 3) Analytical databases becoming more optimized / not much compute in core SQL

    ▪ X100 [CIDR05] showed that

  • ▪ On the side acceleration introduces overhead

    ▪ “All or nothing” behavior makes query planning difficult

    ▪ Analytical databases becoming more optimized / not much compute in core SQL

    16

    The Glass Half Empty…

  • ▪ On the side acceleration introduces overhead

    ✓ Reduce data movement bottlenecks

    17

    The Glass Half Full…

  • ▪ IBEX: Database storage engine with processing offload▪ Filter and pre-aggregate for analytic workloads

    18

    Processing in data path: Smart Flash

    Database Server

    IBEX

    SSD

    IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. Istvan

    and G. Alonso, VLDB’14

    → Larger bandwidth, more IOPS (Samsung YourSQL, MIT BlueDBM)

    ▪ Opportunity to extend SSDs/Flash with complex offload

    Samsung “smart” SSD

  • 19

    Processing in data path: Distributed Processing

    Wo

    rke

    rs (

    Com

    pu

    te)

    Sto

    rag

    e

    + Provisioning

    + Scalability

    Caribou: Distributed

    storage with processing

    • Specialized HW nodes

    • 10Gbps access

    • 25W power cons.

    Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.

  • 20

    Smart Storage in Databases: Filter push-down

    Intel Hyperscan library (Xeon E5-2680 v2)

    2.8x

    SELECT … FROM customer

    WHERE age2

    AND address LIKE “%PO. Box 123%”

    ▪ Challenge: guarantee that filtering never slows down retrieval

    ▪ Algorithms can be re-imagined to become bandwidth-bound instead of compute-bound▪ Extend the state of the art: parameterization without re-programming [FCCM16]

    ▪ Many options: Regular expressions, comparisons, decompression, …

    [FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Zs. Istvan, D. Sidler, G. Alonso. FCCM’16

  • ✓ Reduce data movement bottlenecks

    ▪ “All or nothing” behavior makes query planning difficult

    ✓ Hybrid processing

    21

    The Glass Half Full…

  • ▪ Group-by: Compute aggregate function over categories▪ select avg(salary) from employees group by department

    22

    IBEX’s Hybrid Group-by

    CPUIbex with SW-only Group-By

    Projection Selection Group-byFinal

    Groups

    Input tableFiltered

    data

  • ▪ Group-by: Compute aggregate function over categories▪ select avg(salary) from employees group by department

    23

    IBEX’s Hybrid Group-by

    CPUIbex with HW-only Group-By

    Projection Selection Group-byFinal

    Groups

    Input tableFiltered

    data

  • CPUIbex with HW-only Group-By

    Projection Selection Group-byFinal

    Groups

    Input tableFiltered

    data

    ▪ Group-by: Compute aggregate function over categories▪ select avg(salary) from employees group by department

    ▪ If number of groups does not fit on FPGA? ▪ Send partial aggregates – finalize in SW

    ▪ Worst case: same as no acceleration

    ▪ Best- case: All in HW!

    24

    IBEX’s Hybrid Group-by

    CPUIbex with Hybrid Group-by

    Input table Projection Selection Group-by Group-byFinal

    Groups

    Filtered data

    Partial Group

    s

    Challenge: How to split across accelerator and software?

  • ✓ Reduce data movement bottlenecks

    ✓ Hybrid Processing

    ▪ Analytical databases becoming more optimized / not much compute in core SQL

    ✓ Emerging compute-intensive workloads

    25

    The Glass Half Full

  • ▪ Databases adopting new ways of analyzing the data▪ SAP Hana, Oracle, SQL Server, etc.

    ▪ Specialized hardware can help both with model building [Kara18], inference [Owaida18]

    ▪ Benefits for “classical” algorithms as well

    [Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)

    [Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-30026

    The Rise of Machine Learning

  • CPUFPGA Co-processor

    27

    doppioDB: a hybrid database engine

    Database

    Engine

    (MonetDB)

    Hardware

    OperatorSoftware

    operator

    Software

    operator

    Hardware

    Operator

    Hardware

    Operator

    ▪ Goal: extend the capabilities of analytical databases▪ FPGA works on the same data as software (cache-coherent access)

    ▪ Can combine SW and HW operators inside the same query

    ▪ Challenge: ensure high utilization of FPGA, use in many queries

    DRAM (DB Tables)

    No data copy,

    transformation,

    partitioning, etc.

    Hardware

    Operator

  • K-means – Algorithm

    ◼ Goal: partition unlabeled data into several

    clusters, where the number of clusters is

    the “k” in the k-means.

    ◼ Two steps in each iteration:

    ◼ Assignment: assign data points to

    closet centroid according to distance

    metric

    ◼ Centroid update: the centroids are re-

    calculated by averaging all the data

    points within each cluster

    ◼ Long process if the data set and number of

    iterations are large

    28

  • DRAM

    (DB Tables)

    Design – Execution Walk-Through

    Receives K-Means parameters1

    Fetch the initial centroids and

    the data

    2

    3 Calculates the distance between a data point and all the centroids

    and assign it to closest centroid

    4 Accumulates data points per cluster and counts how many data points are assigned to

    each cluster

    Collect partial results from each pipeline5

    Division for updating new centroid6

    Writes back the final results7

    1

    2

    3 4

    56

    7

    Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 201829

  • 30

    Uses of Parallelism

    K is known /

    Centroids

    known

    Need to determine K

    (Elbow method)

    ▪ K-Means algorithm▪ FPGA outperforms several cores of the CPU

    ▪ Can use parallelism in two ways – cover more queries

  • ▪ Text: Regular expression matching, Edit distance, …

    ▪ Database ops.: Skyline queries, Group-by aggregations, …

    ▪ Statistics: Histograms, Count-min sketch, Bloom filters, …

    ▪ Machine learning: Clustering (K-means), Stochastic Gradient Descent, Decision Trees, …

    ▪ Data management: Hash tables, hash functions, …

    ▪ [Your algorithm here]

    31

    Wide range of algorithms can benefit from hardware

  • ✓ Reduce data movement bottlenecks

    ✓ Hybrid Processing

    ✓ Emerging compute-intensive workloads

    32

    The Glass Half Full…

    Future Challenges…

    ▪ Managing Programmable Hardware accelerators▪ Is this the job of the OS or does the DB has to take control?

    ▪ How to share programmable hardware across tenants

    ▪ Compilation/synthesis of hardware accelerators▪ Can we derive accelerators from user queries?

    ▪ Intermediary DSL or building blocks we could use?

    For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering

    Bulletin, March 2019.