2013-10-03

Data, Data, Data... A Developer’s Introduction To Open Source Data Stores

Thursday, October 3, 13


Presentation Goals

✤ Questions, not answers

✤ Programmers, not data architects

✤ Proofs of concept, not production deployments


As this is an introductory presentation, I hope you come out of it with more questions than answers. This is only the tip of the iceberg: enough information to point you in a direction to start your research.

The scope is to show programmers what kinds of data stores are available so they can write code and develop proofs of concept quickly. This presentation isn’t about how to design the next Facebook. We’re going to skip a lot of the production side of things. We’re not going to talk about hardware, but about how you as a programmer interface with a data store.


Overview

✤ Understanding your data

✤ Database Choices


Understanding Your Data

✤ What is your data?

✤ How does your data relate to each other?

✤ How do you plan on querying your data?

✤ Understanding at a business application level


What is your data? Is it an address book? Video game scores? Are you tracking what people have bought online vs. in retail stores?

How does the data relate to each other? An address book is pretty self-contained. But a Facebook profile has links to other profiles.

How do you plan on querying the data? Are you just going to tell the database a key to retrieve? Are you going to aggregate the data all together?

It’s important to understand how data is used from a non-technical perspective. Can you explain it to your parents? Can you draw it on a whiteboard? Thinking from a technical perspective will bias you towards existing solutions that may not be the right tool for the job.


Common Data Models

✤ Relational

✤ Document

✤ Key Value


These are the three common data models that we will cover in this presentation. There are a couple of others that we won’t be going over, such as graph databases and BigTable-style stores such as HBase (which runs on Hadoop) and Cassandra, because they are hybrids of the above with more specific use cases.


Relational

✤ Relational Algebra and set theory, not data that’s related

✤ Think Venn Diagrams, not spreadsheets


The first data model we will cover is relational. It’s generally the most popular one because it is the most mature of all the models.

It’s important to remember that relational databases are based on relational algebra; “relational” does not mean data that is related to each other. Relational algebra is built on set theory, and can be best visualized using Venn diagrams.

The relational model forces you to first think about your data in order to set it up. After it has been set up, it is generally very easy to execute arbitrary queries.
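To make the Venn diagram idea concrete, here is a minimal sketch using Python sets. The customer IDs and names are made-up sample data, not anything from the talk; the point is only that the core relational operations are set operations.

```python
# Relations modelled as Python sets (hypothetical sample data).
customers = {("c1", "Alice"), ("c2", "Bob"), ("c3", "Carol")}
online_buyers = {"c1", "c2"}
retail_buyers = {"c2", "c3"}

# Intersection: the overlap of the two circles in a Venn diagram.
bought_both = online_buyers & retail_buyers

# Union: everyone inside either circle.
bought_anywhere = online_buyers | retail_buyers

# Difference: inside one circle but not the other.
online_only = online_buyers - retail_buyers

# Selection plus projection: keep the matching rows, then keep one column.
names_bought_both = {name for (cid, name) in customers if cid in bought_both}

print(sorted(bought_both))
print(sorted(names_bought_both))
```

A SQL join does roughly the same thing as the last step: it matches rows from two sets on a shared value and keeps only the columns you ask for.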


Relational Example

✤ Enterprise Resource Planning (ERP)

[Diagram: Inventory, Manufacturing, Sales Orders, Customers]


Additional Topics: Accounting, reporting, shipping, archives

The focus is on linking many different data sources in a common way. You have to structure everything beforehand, so when the data comes in, you can be sure that any future query will work.

This is where the common “database” started out. Big businesses were using computers for this in the 1990s, so there were a lot of people with experience, and the technology matured.
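As a rough sketch of "structure first, query later", the following uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The table names, columns, and rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for a full relational database server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Step 1: define the structure before any data arrives.
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE inventory (
        sku   TEXT PRIMARY KEY,
        stock INTEGER NOT NULL CHECK (stock >= 0)
    );
    CREATE TABLE sales_orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        sku         TEXT NOT NULL REFERENCES inventory(sku),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
""")

# Step 2: load hypothetical sample rows.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("widget", 10), ("gadget", 5)])
conn.executemany(
    "INSERT INTO sales_orders VALUES (?, ?, ?, ?)",
    [(1, 1, "widget", 3), (2, 1, "gadget", 1), (3, 2, "widget", 2)],
)

# Step 3: an arbitrary query nobody planned for when the tables were designed.
rows = conn.execute("""
    SELECT c.name, SUM(o.quantity)
    FROM sales_orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# The structure also rejects bad data: a zero quantity violates the CHECK.
try:
    conn.execute("INSERT INTO sales_orders VALUES (4, 1, 'widget', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The upfront schema is the cost; the payoff is that joins and aggregates across customers, inventory, and orders work without reshaping the data.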


Document

✤ Best example: File System

✤ No predefined structure


The second data model is the document store. The best example of this model type is a file system. The file name is your key, the actual data on disk is the document. What’s inside the file can be anything. There’s no predefined structure. Word files contain text, Excel files contain spreadsheets.


Document Example

✤ Address Book

{
  "nationalID": "123-456-7890",
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumbers": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}


Most document stores use JSON to make it easy to interface with many different languages. An address book entry is a good example of something that fits a document store.

While the relational model can do this, the document model allows us to add any number of attributes. Let’s say we want to list who this person is married to. And for another person, we want to list their pets. If we were to do this with the relational model, we would need to set up different tables and set up some complex joins. But this is an address book. The entries are generally not related to each other (“related” is a bad term here, but it makes the idea easy to understand). When you pull out John Smith, you only care about his details.

This provides a lot of flexibility if you don’t know ahead of time what you’re going to store for each person. But flexibility comes at a price. Since there’s no structure, there’s nothing to enforce. If there are no laws, no one is guilty or innocent. So if we accidentally typed in “brown” for the age, the database doesn’t care, because there’s no structure to say you can’t do that.
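The trade-off can be sketched with a toy document store: a plain Python dict mapping a key to JSON text. This is an illustration only, not how MongoDB or CouchDB work internally, and the helper names (`put`, `get`, `invalid_ages`) and sample people are hypothetical.

```python
import json

# Toy document store: key -> JSON text (an assumption for illustration).
address_book = {}

def put(key, document):
    """Store any JSON-serializable document under a key. Nothing is validated."""
    address_book[key] = json.dumps(document)

def get(key):
    """Load a document back as a Python dict."""
    return json.loads(address_book[key])

# Each entry carries whatever attributes it needs: a spouse here, pets there.
put("john-smith", {"firstName": "John", "lastName": "Smith", "age": 25,
                   "spouse": "Jane Smith"})
# The store happily accepts "brown" as an age, because there is no schema.
put("mary-jones", {"firstName": "Mary", "lastName": "Jones", "age": "brown",
                   "pets": ["Rex", "Tom"]})

def invalid_ages(store):
    """Return keys whose 'age' is present but not an integer.

    With no schema, checks like this have to live in application code.
    """
    bad = []
    for key, raw in store.items():
        age = json.loads(raw).get("age")
        if age is not None and not isinstance(age, int):
            bad.append(key)
    return bad

print(invalid_ages(address_book))
```

The flexibility is real, but so is the price: the mistake is only caught if the application goes looking for it.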


Key Value

✤ Python’s dictionary

✤ Simple and Fast

✤ Used with other data stores


The Key Value model is fairly simple. It’s just like Python’s dictionary structure, so we won’t get into too much detail on this one.

Because it’s simple and fast, it complements other data stores very nicely.


Key Value Example

✤ Persistent session data

[Diagram: User → Load Balancer → Node 1, Node 2]


Two use cases for key value stores are temporary data and aggregated data. For example, when you host your website on a cloud platform such as Heroku, a user hits the load balancer and may end up on a different web server with each request. With a key value database shared by all the servers, you can store variables that persist for the life of the user’s session.
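A minimal sketch of the session idea follows. The `SessionStore` class is a hypothetical in-memory stand-in for a shared key value database such as Redis, and `WebNode` is a made-up name for a web server behind the load balancer.

```python
class SessionStore:
    """Stand-in for a shared key value database (hypothetical, in-memory)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class WebNode:
    """One web server behind the load balancer."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # every node talks to the same store

    def handle(self, session_id):
        # Session state lives in the shared store, not in this node's memory.
        key = f"session:{session_id}:visits"
        visits = self.store.get(key, 0) + 1
        self.store.set(key, visits)
        return f"{self.name} served visit {visits}"


store = SessionStore()
node1 = WebNode("node1", store)
node2 = WebNode("node2", store)

# The load balancer may send the same user to different nodes.
first = node1.handle("abc123")
second = node2.handle("abc123")
print(first)
print(second)
```

Because neither node keeps the counter locally, it does not matter which one the load balancer picks.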


Key Value Example

✤ Top Scores

User  Score
X     100
Y     86
Z     42


Another use case is pre-aggregated data. By caching top scores, you don’t need to query the database every time someone wants to see what the top scores are.
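Here is one way the pre-aggregation could look, as a sketch. The `all_scores` dict stands in for the main database and `cache` for the key value store; both names, the extra user W, and the refresh-on-write strategy are assumptions for illustration.

```python
import heapq

# Stands in for the main database holding every score (hypothetical data).
all_scores = {"X": 100, "Y": 86, "Z": 42, "W": 17}

# Stands in for the key value store holding the pre-aggregated list.
cache = {}

def refresh_top_scores(limit=3):
    """Run the expensive aggregate once and store the result under one key."""
    top = heapq.nlargest(limit, all_scores.items(), key=lambda item: item[1])
    cache["top_scores"] = top
    return top

def get_top_scores():
    """Readers only touch the cache; the full data set is not scanned."""
    if "top_scores" not in cache:
        refresh_top_scores()
    return cache["top_scores"]

def record_score(user, score):
    """Writers update the source of truth, then refresh the cached list."""
    all_scores[user] = score
    refresh_top_scores()

top = get_top_scores()
record_score("W", 90)
top_after = get_top_scores()
print(top)
print(top_after)
```

Refreshing on every write is the simplest choice; refreshing on a timer would also work if slightly stale lists are acceptable.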


Choices

✤ MySQL

✤ PostgreSQL

✤ MongoDB

✤ CouchDB

✤ Redis

✤ Riak

✤ ...


So now that we understand our data, which database should we use? There are MySQL, PostgreSQL, MongoDB, CouchDB, Redis, Riak, and so on. Which technology is the best?


Choices

✤ All of the above


All of them.

Data stores are the core of your business, so it’s very important to use the right tool for the job. The whole reason for the NoSQL movement was that everyone was using a relational database to store and query all kinds of data. This is what we call “the wrong tool for the job”.


Proper Tools


For example, we can use a screwdriver or an iPhone as a hammer to drive in nails. It could work well enough, but there are clearly better things we can use. Just like using a scooter instead of a car.


Proper Tools


Use the scooter to go to work, and use the car to take the family shopping.

So we want to use the right tool for the job, and in many cases, that means having more than one tool.


Combination

System of Record (SOR): RDBMS

Product Information: Document Store

Most Popular Items: Key Value Store


We can use a relational database such as MySQL or PostgreSQL for the back end processing of sales orders, invoices, product inventory. Since each product item has different attributes, we can store product information in a document database such as MongoDB or CouchDB. People love to see lists, so we can use a Key Value database like Redis or Riak as a way to keep simple tables that are easy to generate.
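The division of labor might be sketched like this, with in-process stand-ins for all three stores: sqlite3 for the relational system of record, a dict of JSON text for the document store, and a `Counter` for the key value store. The product attributes, table layout, and the `place_order` helper are hypothetical.

```python
import json
import sqlite3
from collections import Counter

# System of record: relational, with enforced structure.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("""
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL,
        sku      TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
""")

# Product information: one free-form document per product (made-up attributes).
product_docs = {
    "reader": json.dumps({"name": "E-Reader", "sizes": ["6 inch", "7 inch"]}),
    "case": json.dumps({"name": "Leather Case", "colors": ["black", "brown"]}),
}

# Most popular items: a simple counter keyed by SKU.
popular = Counter()

def place_order(order_id, sku, quantity):
    """Record the sale in the system of record, then bump the popularity count."""
    if sku not in product_docs:
        raise KeyError(f"unknown product: {sku}")
    with orders_db:  # commits on success, rolls back on error
        orders_db.execute(
            "INSERT INTO order_lines VALUES (?, ?, ?)", (order_id, sku, quantity)
        )
    popular[sku] += quantity

place_order(1, "reader", 1)
place_order(1, "case", 2)
place_order(2, "case", 1)

most_popular = popular.most_common(1)
product_name = json.loads(product_docs["case"])["name"]
units_sold = orders_db.execute("SELECT SUM(quantity) FROM order_lines").fetchone()[0]
```

Note that the order table only stores the SKU and quantity. The descriptive attributes stay in the product document, and the popularity list never needs to query the order table.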


Shopping Page


So we see a product page from Amazon for the Kindle. Each product Amazon sells is vastly different. They all have different attributes. Some have sizes, some have colors. A document database handles this very well.

Then we have the “Add to cart”. Once we get it in the shopping system, all those attributes are generally irrelevant. The shopping cart doesn’t care what size the Kindle is, it just cares that you wanted one.

Then over in the Accessories section, we have a dynamically generated list of the most popular accessories people buy when they buy a Kindle.

This is collaboration!


Technical Considerations

✤ Data stores are relatively similar within each data class

✤ Switching between data stores in the same class is generally easy

✤ Switching between classes generally keeps you busy


Technically, most of the data stores within each data model are very similar. MySQL and PostgreSQL have very similar feature sets, so moving between them is like switching between Python and Ruby. The key concepts are the same, but the syntax varies slightly.

But switching between the classes is much more difficult. The concepts of how things are stored are very different. It is similar to comparing a scripting language to machine language: scripting languages focus on business logic, while machine language focuses on CPU registers and memory addresses.

If you’re able to start off using the right type of data store from the beginning, you can more easily switch when requirements change.
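One way to picture why switching within a class is cheap: if application code only relies on the operations every store in that class offers, the backend can be swapped without touching it. The sketch below uses two hypothetical key value backends, a dict and a SQLite table, behind the same `put`/`get` methods; none of these names come from a real client library.

```python
import sqlite3

class DictStore:
    """Key value backend number one: a plain in-memory dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SqliteStore:
    """Key value backend number two: the same operations on a SQLite table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

    def put(self, key, value):
        with self._db:
            self._db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self._db.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None


def remember_theme(store, user, theme):
    """Application code: it only needs put and get, so either backend works."""
    store.put(f"theme:{user}", theme)
    return store.get(f"theme:{user}")


results = [remember_theme(s, "alice", "dark") for s in (DictStore(), SqliteStore())]
print(results)
```

Moving the same application to a relational or document model would mean rewriting `remember_theme` itself, which is the harder kind of switch described above.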


Non-Technical Considerations

✤ Licensing

✤ Expertise

✤ Support


And then there are the non-technical considerations that have a lot more influence on which technology a project will use.

For example, licensing differs. MySQL is released under the GPL, so if you distribute it as part of your product, you may need to release your source code as well (a commercial license is the usual way around that). PostgreSQL uses a permissive license similar to the BSD license, which lets you do almost anything with it as long as you keep the copyright notice.

Another consideration is expertise and experience. Even if a technology is the best fit for your needs, you won’t gain anything from it if you don’t know how to use it. PostgreSQL has a lot of features that would be really great to use. But it’s a lot harder to find PostgreSQL developers than MySQL developers. In a startup environment, you may not have the ability to hand-pick your dream team. Someone who knows MySQL can probably figure out a viable workaround in MySQL for the PostgreSQL features you need.

Another thing you want to consider is what kind of support options are available. Are there commercial providers you can pay to fix the problem? Are they local, and do they speak your language? What about community support?


Conclusions

✤ Understand your data

✤ Right tool for the job


Thank you for listening!

✤ Will Fong

✤ http://digitaldev.com/will

✤ Freenode IRC: seekwill
