intro cassandra - meetupfiles.meetup.com/16806932/bda_meetup5-introduction... · cassandra was...
TRANSCRIPT
![Page 2: Intro Cassandra - Meetupfiles.meetup.com/16806932/BDA_Meetup5-Introduction... · Cassandra was designed as a fast, reliable and scalable operational data store. Hadoop was designed](https://reader034.vdocuments.site/reader034/viewer/2022042220/5ec5e6068570db798767196e/html5/thumbnails/2.jpg)
Who am I and what do I do?
• Alex Lourie
• Worked at Red Hat, Datastax and now Instaclustr
• We currently manage tens of nodes for various customers, who do various things with them.
Objectives
• A quick history of Databases
• Introducing Cassandra
• Why use Cassandra?
• Introducing Instaclustr
• Demo?
Start Demo
• It takes a while, so let’s do it.
1980 - Stand Alone and Mainframes
1990 - 2005 Networked Computing
2005+ Real Time Web and Big Data
Late 1990s - early 2000s
• What happens when you have more data than can fit on a single server?
Throw money at the problem
Let’s try a little computer science instead
• BigTable (2006) - 1 Key: Lots of values, Fast sequential access
• Dynamo (2007) - Reliable, Performant, Always On
• Cassandra (2008) - Dynamo Architecture, BigTable data model and storage (NoSQL)
What is NoSQL?
• Key/Value (data) store
• Non-relational, distributed and horizontally scalable
• Schema-free, easy replication support, simple API, eventually consistent
• Supports huge amounts of data
Cassandra story
• Started at Facebook
• Released as an Open Source project
• Datastax formed
Cassandra
• Massively scalable, partitioned row store
• Masterless architecture
• Linear scale performance
• No single points of failure
• Read/Write support across multiple data centers & cloud availability zones.
Why use C*?
• You need to support tens of thousands to tens of millions of operations per second;
• You need to store and access terabytes to petabytes of data;
• You need fast (less than 5-10 millisecond) response time to database operations; and
• You need a service with no downtime.
What are the benefits to this approach
• Linear scalability
• High Availability
• Use commodity hardware
Linear scalability
High Availability?
“During Hurricane Sandy, we lost an entire data center. Completely. Lost. It. Our application fail-over resulted in us losing just a few moments of serving requests for a particular region of the country, but our data in Cassandra never went offline.”
Nathan Milford, Outbrain’s head of U.S. IT operations management
What are the benefits to this approach
How does it work?
[Diagram: cluster of numbered nodes]
One database, many servers
• All servers (nodes) participate in the cluster
• Need more capacity? Add more servers
• Multiple servers == built-in redundancy
[Diagram: cluster of numbered nodes]
Partitioning

| Name | Age | Postcode | Gender |
|---|---|---|---|
| Alice | 34 | 2000 | F |
| Bob | 26 | 2000 | M |
| Eve | 25 | 2004 | F |
| Frank | 41 | 2902 | M |
How does it work?
[Diagram: client, consistentHash(“Alice”), and numbered cluster nodes]
Replication Factor = 3
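The lookup on this slide can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: it assumes one token per node, uses MD5 as a stand-in hash, and the `Ring` class, `token` helper and node names are all hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative stand-in)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns one token (an assumption)."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked in order.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes that would store the row for partition_key."""
        # First node whose token is greater than the key's token (wrapping around).
        start = bisect.bisect_right(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.entries))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(count)
        ]


# Hypothetical node names; Replication Factor = 3 as on the slide.
ring = Ring(["node0", "node2", "node4", "node6", "node8"], replication_factor=3)
for name in ["Alice", "Bob", "Eve", "Frank"]:
    print(name, "->", ring.replicas(name))
```

Each name maps to three distinct nodes, and the same name always maps to the same three, so any client can find a row without asking a master.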
How do we keep data consistent?
[Diagram: the client sends a Write for consistentHash(“Alice”) at CL.ONE; one node returns an Ack]
How do we keep data consistent?
[Diagram: the client sends a Write for consistentHash(“Alice”) at CL.ALL; three nodes return an Ack]
How do we keep data consistent?
[Diagram: the client sends a Write for consistentHash(“Alice”) at CL.QUORUM; two nodes return an Ack and one is marked X]
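The three consistency levels on these slides differ only in how many replica acknowledgements the write waits for. A small sketch, with hypothetical helper names `required_acks` and `write_succeeds`; quorum is taken as a majority of the replicas:

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replica acks a write must collect at the given level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_succeeds(consistency_level: str, replication_factor: int,
                   acks_received: int) -> bool:
    """True if enough replicas acknowledged the write."""
    return acks_received >= required_acks(consistency_level, replication_factor)


# Replication Factor = 3 with one replica down, so only two acks arrive.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, write_succeeds(level, 3, acks_received=2))
# ONE True
# QUORUM True
# ALL False
```

With one replica unavailable, writes at CL.ONE and CL.QUORUM still succeed while CL.ALL fails, which is the trade-off between availability and consistency the slides illustrate.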
Also supports multi-DC replication
[Diagram: client and two groups of numbered nodes]
Add capacity
[Diagram: client, consistentHash(“Alice”), and the existing numbered nodes with additional numbered nodes added]
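Why adding nodes is cheap can be sketched with the same toy ring idea. The assumptions here are one token per node, MD5 as a stand-in hash, and hypothetical node and key names; the sketch adds a single node where the slide appears to add several. Only the keys that fall into the new node's range change owner, and nothing is shuffled between the existing nodes.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative stand-in)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner(nodes, key: str) -> str:
    """Return the first node whose token follows the key's token on the ring."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    index = bisect.bisect_right(tokens, token(key)) % len(entries)
    return entries[index][1]


keys = [f"user{i}" for i in range(10_000)]   # hypothetical partition keys
before = ["node0", "node2", "node4", "node6"]
after = before + ["node1"]                    # one new node joins the cluster

moved = [k for k in keys if owner(before, k) != owner(after, k)]
print(len(moved), "of", len(keys), "keys changed owner")
# Every key that moved now belongs to the new node.
print(all(owner(after, k) == "node1" for k in moved))
```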
C* internals
• Logging data in the commit log
• Writing data to the memtable
• Flushing data from the memtable
• Storing data on disk in SSTables
• Compaction + Repairs
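The steps above can be sketched as a toy single-node model. `ToyNode` and its methods are hypothetical names, the "disk" is just a Python list, and repairs, deletes and real file formats are not modelled.

```python
class ToyNode:
    """Toy write path: commit log -> memtable -> flush to SSTables -> compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # immutable sorted snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is full

    def flush(self):
        if self.memtable:
            # 4. persist the memtable as a sorted, immutable table
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            if key in table:
                return table[key]
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


node = ToyNode(memtable_limit=2)
node.write("Alice", 34)
node.write("Bob", 26)      # memtable is full: first flush
node.write("Alice", 35)
node.write("Eve", 25)      # second flush
node.compact()             # merge both tables, keeping the newest Alice
print(node.read("Alice"))  # 35
print(len(node.sstables))  # 1
```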
C* vs RDBMS
Cassandra scales well beyond relational databases and is easier to manage for high availability at scale. It is highly cost-effective compared to commercial relational databases.
But… Cassandra does not have the same analytical query capabilities (e.g. aggregations, joins) as a relational database.
C* vs Hadoop
Cassandra was designed as a fast, reliable and scalable operational data store. Hadoop was designed as a data store for vast amounts of data for batch analytic processing. Cassandra can provide faster operations and higher reliability than Hadoop.
C* vs MongoDB
MongoDB is not masterless, making it harder to manage and imposing hard limits on its scalability. On the other hand, MongoDB does provide some additional flexibility in querying data.
Pitfalls
• CQL is not SQL!
• NoSQL is not RDBMS!
• Design your schema to match your data and usage patterns.
Instaclustr
Instaclustr is a company with extensive experience in designing, deploying and managing critical infrastructure for solutions that require immense scale.
Instaclustr
• Hosted and Managed Apache Cassandra, DSE, Apache Spark and other complementary technologies
• We also deliver a wide range of related consulting and support services for these technologies.
Our tech
• AWS, Azure, IBM SoftLayer, Heroku and others in development
• CoreOS as a common OS on all platforms.
• Docker containers for running applications.
Customer feedback
“Instaclustr provided us with a method for getting underway quickly with Cassandra and also delivered the support and expertise necessary to help us make our database as efficient as possible.”
Andre Barbosa, Head of Platform, Fling
Customer feedback
“Instaclustr has enabled us to get underway quickly, the support team have been there from the beginning helping us to get it right the first time with our schema and architecture.”
Richard Wilson, Co-Founder, Maths Pathway
Customer feedback
“Instaclustr is awesome”
Instaclustr
Check demo?
Questions?