Presto Summit SF 2019 - Starburst Data
Presto Summit SF 2019
Martin Traverso, Dain Sundstrom, David Phillips
Presto Software Foundation
“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”
“It is dedicated to preserving the vision of high quality, performant, and dependable software.”
“Ensuring the project remains open, collaborative and independent for decades to come”
Presto Community
• Github: https://github.com/prestosql
• Website: https://prestosql.io
• Blog: https://prestosql.io/blog
• Twitter: @prestosql
• Slack: prestosql.slack.com
Since the Launch…
• Launched on January 31, 2019
• 16 releases (~1 per week)
• 1300+ commits
• 200k lines changed
• 650+ pull requests closed
• 50+ contributors
• 170 weekly active members on Slack
Contributors
kokosing
raunaqmorarka
pgagnon
MiguelWeezardo
MarvinCai
Praveen2112
chancez
hustnn
kasiafi
sopel39
stagraqubole
yui-knk
Yaliang
dain
11xor6
Lewuathe
garvit-gupta
VicoWu
qqibrow
findepi
pettyjamesm
martint
electrum
vincentpoon
wyukawa
guyco33
bill-warshaw
vkorukanti
anusudarsan
dilipkasana
sshardool
linxingyuan1102
luohao
zhenxiao
rzeyde-varada
takezoe
kabunchi
ryanrupp
ilfrin
ChethanUK
ebyhr
xumingming
Recent Improvements (since the launch)
ORC Performance
Semijoin
Performance
• S3 network bandwidth/latency for Parquet and ORC
• ZSTD and LZ4 for ORC/Parquet
• Skip redundant ORDER BY
• ORDER BY + LIMIT with OUTER JOIN
• IN (SELECT DISTINCT …)
• JOIN involving coercions and inline tables
• Spilling
• Coming soon: UNNEST improvements
• … and more
ROW subscript operator
WITH t(r) AS (
  VALUES ROW(ROW(1, 'a')), ROW(ROW(2, 'b'))
)
SELECT r[1], r[2] FROM t
r :: row(? smallint, ? varchar(1))
Access field by ordinal
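The subscripts `r[1]` and `r[2]` above address the fields of an anonymous ROW by 1-based position rather than by name. A minimal Python sketch of that addressing rule, modelling a ROW value as a tuple; the helper `row_subscript` is hypothetical and not part of Presto, and raising `IndexError` for a bad ordinal is a choice of this sketch only:

```python
from typing import Any, Tuple


def row_subscript(row: Tuple[Any, ...], ordinal: int) -> Any:
    """Return the field at a 1-based ordinal, mirroring SQL r[1], r[2]."""
    # SQL ordinals start at 1; Python tuple indexes start at 0.
    if not 1 <= ordinal <= len(row):
        raise IndexError(f"ordinal {ordinal} out of range for a row of {len(row)} fields")
    return row[ordinal - 1]


# The two ROW values produced by the VALUES clause in the slide example.
t = [(1, "a"), (2, "b")]
result = [(row_subscript(r, 1), row_subscript(r, 2)) for r in t]
print(result)  # [(1, 'a'), (2, 'b')]
```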
Visualize plan structure
Clearer subplan schema
SELECT max(totalprice)
FROM (
  SELECT totalprice FROM orders ORDER BY orderkey
)
Warn on redundant ORDER BY
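In the query above the subquery's ORDER BY cannot affect the result: a subquery yields an unordered relation, and `max` ignores input order anyway. Only a sort that defines the final output, or one that decides which rows a LIMIT keeps, is observable. A minimal Python sketch of such a rule on a toy plan tree, assuming exactly that criterion; the node classes, `skip_redundant_sorts`, and the warning text are hypothetical and do not mirror Presto's optimizer (projections are omitted for brevity):

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TableScan:
    table: str


@dataclass
class Sort:
    keys: List[str]
    source: "PlanNode"


@dataclass
class Limit:
    count: int
    source: "PlanNode"


@dataclass
class Aggregate:
    function: str  # e.g. "max(totalprice)"
    source: "PlanNode"


PlanNode = Union[TableScan, Sort, Limit, Aggregate]


def skip_redundant_sorts(node: PlanNode, notes: List[str], order_matters: bool = True) -> PlanNode:
    """Drop Sort nodes whose ordering cannot be observed.

    order_matters is True for the final query output and for the input of a
    Limit (the sort decides which rows survive); elsewhere a relation is
    unordered, so a Sort there is redundant and is replaced by its source.
    """
    if isinstance(node, Sort):
        source = skip_redundant_sorts(node.source, notes, order_matters=False)
        if order_matters:
            return Sort(node.keys, source)
        notes.append(f"Redundant ORDER BY {', '.join(node.keys)} ignored")
        return source
    if isinstance(node, Limit):
        return Limit(node.count, skip_redundant_sorts(node.source, notes, order_matters=True))
    if isinstance(node, Aggregate):
        return Aggregate(node.function, skip_redundant_sorts(node.source, notes, order_matters=False))
    return node  # TableScan is a leaf


# Toy plan for: SELECT max(totalprice) FROM (SELECT totalprice FROM orders ORDER BY orderkey)
plan = Aggregate("max(totalprice)", Sort(["orderkey"], TableScan("orders")))
notes: List[str] = []
print(skip_redundant_sorts(plan, notes))
# Aggregate(function='max(totalprice)', source=TableScan(table='orders'))
print(notes)  # ['Redundant ORDER BY orderkey ignored']
```

A top-N such as `Limit(10, Sort(...))` keeps its sort in this sketch, since the sort there changes which rows are returned.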
Pushdown
• Limit
• TableSample
• Filter (simple range predicates)
• Projection (column and ROW field dereference)
• Coming soon
  • Generalized projections and filters
  • Aggregation
  • Join
https://github.com/prestosql/presto/issues/18
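Pushdown hands part of a query to the connector so that less data is read and transferred to the engine. A minimal Python sketch of filter (simple range predicate) and limit pushdown against a toy in-memory source; `ToyConnector`, `ScanState`, `apply_range_filter`, `apply_limit`, and `run_query` are hypothetical names, not Presto's connector SPI. In this sketch the engine re-applies any step the connector declines, and it only pushes the limit once the filter has been pushed, because limiting before filtering would drop matching rows:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, int]


@dataclass
class ScanState:
    """Work the connector has agreed to do during the scan (hypothetical)."""
    range_filter: Optional[Tuple[str, int, int]] = None  # (column, low, high), inclusive
    limit: Optional[int] = None


class ToyConnector:
    """In-memory stand-in for a data source; not Presto's SPI."""

    def __init__(self, rows: List[Row], filterable: Set[str]):
        self.rows = rows
        self.filterable = filterable  # columns the source can filter on
        self.rows_read = 0

    def apply_range_filter(self, state: ScanState, column: str, low: int, high: int) -> bool:
        # Accept a simple range predicate only on supported columns.
        if column not in self.filterable:
            return False
        state.range_filter = (column, low, high)
        return True

    def apply_limit(self, state: ScanState, limit: int) -> bool:
        # True means "guaranteed": this toy scan stops after exactly `limit` rows.
        state.limit = limit
        return True

    def scan(self, state: ScanState) -> List[Row]:
        out: List[Row] = []
        for row in self.rows:
            if state.limit is not None and len(out) >= state.limit:
                break  # limit pushdown: stop reading early
            self.rows_read += 1
            if state.range_filter is not None:
                column, low, high = state.range_filter
                if not low <= row[column] <= high:
                    continue  # filter pushdown: row never leaves the source
            out.append(row)
        return out


def run_query(connector: ToyConnector, column: str, low: int, high: int, limit: int) -> List[Row]:
    """Roughly: SELECT * FROM t WHERE column BETWEEN low AND high LIMIT limit."""
    state = ScanState()
    filter_pushed = connector.apply_range_filter(state, column, low, high)
    # The limit may only move into the scan if the filter went there too.
    limit_guaranteed = filter_pushed and connector.apply_limit(state, limit)
    rows = connector.scan(state)
    if not filter_pushed:
        rows = [r for r in rows if low <= r[column] <= high]  # engine-side filter
    if not limit_guaranteed:
        rows = rows[:limit]  # engine-side limit
    return rows


table = [{"x": i, "y": i % 3} for i in range(1000)]
pushed = ToyConnector(table, filterable={"x"})
print(len(run_query(pushed, "x", 100, 199, 5)), pushed.rows_read)  # 5 105
not_pushed = ToyConnector(table, filterable=set())
print(len(run_query(not_pushed, "x", 100, 199, 5)), not_pushed.rows_read)  # 5 1000
```

Both runs return the same rows; only the amount of data the source had to read differs.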
New Plugins
• Elasticsearch connector
• Apache Phoenix connector
• Apache Ranger
• https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin
Other Improvements
• Docker image
• Spill-to-disk improvements
• CLI output formats
• UUID type and functions
• format(), combinations() functions
• ORC bloom filters (non-legacy)
• Connector-provided view definitions
• More type mappings for various connectors
• … and more!
• FETCH FIRST … WITH TIES syntax
• OFFSET syntax
• COMMENT ON <table> IS …
• [LEFT/RIGHT/FULL] JOIN LATERAL (…) ON …
• Pass-through security (client provided credentials)
• Kerberos security improvements
• Role-based security
• Secure query results in client API
• Current user security mode for views
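`OFFSET` skips a number of rows of the ordered result, and `FETCH FIRST n ROWS WITH TIES` returns the first n remaining rows plus any further rows whose ORDER BY key equals that of the last row fetched. A minimal Python sketch of these semantics over an in-memory list, assuming OFFSET is applied before FETCH as in the SQL standard; `offset_fetch` is a hypothetical helper, not a Presto function. Python's stable sort keeps tied rows in input order, whereas an engine may return tied rows in any order:

```python
from typing import Any, Callable, List, Optional, Sequence


def offset_fetch(
    rows: Sequence[Any],
    key: Callable[[Any], Any],
    offset: int = 0,
    fetch: Optional[int] = None,
    with_ties: bool = False,
) -> List[Any]:
    """ORDER BY key OFFSET offset ROWS FETCH FIRST fetch ROWS {ONLY | WITH TIES}."""
    ordered = sorted(rows, key=key)[offset:]  # ORDER BY, then OFFSET
    if fetch is None:
        return ordered
    result = ordered[:fetch]  # FETCH FIRST n ROWS ONLY
    if with_ties and result:
        last_key = key(result[-1])
        # WITH TIES: keep taking rows that sort equal to the last fetched row.
        for row in ordered[fetch:]:
            if key(row) != last_key:
                break
            result.append(row)
    return result


def by_score_desc(row):
    return -row[1]  # ORDER BY score DESC


scores = [("ann", 90), ("bob", 85), ("cat", 90), ("dan", 70), ("eve", 85)]
print(offset_fetch(scores, by_score_desc, fetch=3))
# [('ann', 90), ('cat', 90), ('bob', 85)]
print(offset_fetch(scores, by_score_desc, fetch=3, with_ties=True))
# [('ann', 90), ('cat', 90), ('bob', 85), ('eve', 85)]
# Roughly: ... ORDER BY score DESC OFFSET 2 ROWS FETCH FIRST 1 ROW WITH TIES
print(offset_fetch(scores, by_score_desc, offset=2, fetch=1, with_ties=True))
# [('bob', 85), ('eve', 85)]
```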
Roadmap
• Dynamic
• Real world priorities and requirements
• What volunteers work on
• Not a wish list
• https://github.com/prestosql/presto/labels/roadmap
Core Engine
• Case-sensitive identifiers
• Timestamp semantics
• Dynamic filtering
• Dynamically-resolved functions
• SQL-defined functions (CREATE FUNCTION)
• Operator fusion and late materialization
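Dynamic filtering, listed above as roadmap work, collects the join keys seen on the build side of a join at runtime and uses them to discard probe-side rows before they reach the join. A minimal Python sketch of the idea for an inner hash join, assuming the runtime filter is an exact key set plus a min/max range; `collect_dynamic_filter`, `scan_probe`, and `hash_join` are hypothetical names, and this is not a description of how Presto implements the feature:

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[int, str]  # (join key, payload)


def collect_dynamic_filter(build_rows: List[Row]) -> Tuple[Set[int], Optional[Tuple[int, int]]]:
    """Summarize build-side join keys as an exact set and a min/max range."""
    keys = {key for key, _ in build_rows}
    key_range = (min(keys), max(keys)) if keys else None
    return keys, key_range


def scan_probe(probe_rows: List[Row], keys: Set[int], key_range: Optional[Tuple[int, int]]) -> List[Row]:
    """Probe-side scan that drops rows which cannot match any build row."""
    if key_range is None:
        return []  # empty build side: an inner join produces nothing
    low, high = key_range
    # The range check is cheap enough to push into a data source; the set
    # check then removes the remaining non-matching rows.
    return [row for row in probe_rows if low <= row[0] <= high and row[0] in keys]


def hash_join(build_rows: List[Row], probe_rows: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner join on the key; output rows are (key, probe payload, build payload)."""
    keys, key_range = collect_dynamic_filter(build_rows)
    table: Dict[int, List[str]] = {}
    for key, payload in build_rows:
        table.setdefault(key, []).append(payload)
    filtered = scan_probe(probe_rows, keys, key_range)
    # Every surviving probe key is in the hash table, so the lookup is safe.
    return [(key, probe, build) for key, probe in filtered for build in table[key]]


dim = [(3, "red"), (7, "blue")]  # small build side
fact = [(k, f"sale{k}") for k in range(10)]  # larger probe side
print(hash_join(dim, fact))  # [(3, 'sale3', 'red'), (7, 'sale7', 'blue')]
print(len(scan_probe(fact, *collect_dynamic_filter(dim))))  # 2
```

The benefit in a real engine comes from applying the filter as early as possible, ideally inside the probe-side scan, so that rows which can never match are not read or shipped at all.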
Connectors
• Iceberg (in progress)
• Kinesis (in progress)
• Druid
• Pinot
• ClickHouse
Infrastructure
• Coordinator High Availability
• Spot instances
• Kubernetes
Getting Involved
• Join Slack
• https://prestosql.io/community.html
• #troubleshooting channel
• File issues/bugs:
• https://github.com/prestosql/presto
• Write blog posts
• https://prestosql.io/blog