Discussion MySQL&Cassandra ZhangGang 2012/11/22

Upload: bryan-hampton

Post on 02-Jan-2016


Page 1: Discussion MySQL&Cassandra ZhangGang 2012/11/22. Optimize MySQL

Discussion MySQL&Cassandra

ZhangGang

2012/11/22

Optimize MySQL

Optimize MySQL

• Indexes in MySQL: the dump includes the indexes.
– type_*_job has no index.
– in_*_job has two indexes.
– key_*_* has three indexes.

Optimize MySQL

• innodb_file_per_table (can be set in my.cnf)
– Default: innodb_file_per_table = OFF (it is ON by default since MySQL 5.6.6). All tables and indexes are stored in one big file named ibdata1.
– With innodb_file_per_table = ON, each InnoDB table and its indexes are stored in their own file.
– Effect:

Query                                before   now
Diskspace group by ProcessingType    94s      80s
CPUTime group by Site (08-10)        86s      73s
CPUTime group by Site (10-12)        97s      81s
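To put the before/now timings in perspective, here is a small sketch that computes the relative reduction for each query. The numbers are copied from the table above; the function name `reduction_percent` is made up for this illustration.

```python
# Timings in seconds, copied from the slide: (before, now).
timings = {
    "Diskspace group by ProcessingType": (94, 80),
    "CPUTime group by Site (08-10)": (86, 73),
    "CPUTime group by Site (10-12)": (97, 81),
}


def reduction_percent(before: float, now: float) -> float:
    """Relative reduction in run time, as a percentage of the old time."""
    return (before - now) / before * 100


for query, (before, now) in timings.items():
    print(f"{query}: {reduction_percent(before, now):.1f}% less time")
```

All three queries improve by roughly 15 to 16.5 percent, so the gain looks fairly uniform across these queries.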

Optimize MySQL

• innodb_buffer_pool_size:
– From what I have read on the web, 70-80% of memory is a safe bet (a rule of thumb usually given for dedicated database servers).
– My computer has 2GB of memory and I set innodb_buffer_pool_size=1GB. But when I run the script that communicates with MySQL, the computer becomes very slow and has to be restarted.
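As a rough sketch of how the two settings discussed so far could be written into a my.cnf fragment, the snippet below renders a `[mysqld]` section with Python's standard `configparser`. The helper name `render_mycnf` and the 25% fraction are assumptions chosen only for illustration (a value small enough to leave room for other programs on a 2GB desktop); they are not a recommendation from the slides.

```python
import configparser
import io


def render_mycnf(total_ram_mb: int, pool_fraction: float) -> str:
    """Render a [mysqld] section; the buffer pool is a fraction of total RAM."""
    pool_mb = int(total_ram_mb * pool_fraction)
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        "innodb_file_per_table": "ON",
        "innodb_buffer_pool_size": f"{pool_mb}M",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Hypothetical choice: 25% of a 2GB machine that also runs other programs.
print(render_mycnf(total_ram_mb=2048, pool_fraction=0.25))
```

On a machine that is not dedicated to MySQL, the fraction has to account for everything else that needs memory, which may explain why 1GB out of 2GB was already too much here.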

Learning Cassandra

Learning Cassandra

• RDBMS: uses the ‘join’ operation, which allows more normalization and less redundancy.

• NoSQL: in contrast with the RDBMS, to get better performance and high scalability it gets rid of the ‘join’ operation, which means denormalizing the data and maintaining multiple copies of it (more redundancy).

And this is what Cassandra does.
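The trade-off can be sketched with plain Python data structures. The site names, job ids and CPU values below are invented for illustration; the first version stores each fact once and needs a lookup (the "join"), the second copies the site name into every record.

```python
# Normalized (RDBMS style): each fact is stored once, the query needs a join.
sites = {1: "SiteA", 2: "SiteB"}              # site_id -> site name
jobs = [(101, 1, 30), (102, 2, 45), (103, 1, 25)]  # (job_id, site_id, cpu)


def cpu_by_site_with_join(jobs, sites):
    totals = {}
    for _job_id, site_id, cpu in jobs:
        name = sites[site_id]                 # the "join": look up the other table
        totals[name] = totals.get(name, 0) + cpu
    return totals


# Denormalized (Cassandra style): the site name is duplicated in every row.
jobs_denormalized = [
    {"job_id": 101, "site": "SiteA", "cpu": 30},
    {"job_id": 102, "site": "SiteB", "cpu": 45},
    {"job_id": 103, "site": "SiteA", "cpu": 25},
]


def cpu_by_site_denormalized(rows):
    totals = {}
    for row in rows:                          # no lookup in a second structure
        totals[row["site"]] = totals.get(row["site"], 0) + row["cpu"]
    return totals


print(cpu_by_site_with_join(jobs, sites))
print(cpu_by_site_denormalized(jobs_denormalized))
```

Both functions return the same totals; the denormalized form pays for its simpler read path with repeated site names that must be kept consistent on write.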

Learning Cassandra

• Column-oriented or row-oriented?
– Cassandra is based on Dynamo and BigTable, so it is not incorrect to say it is column-oriented. But each row has a unique key, which makes its data accessible, so it may be more helpful to think of it as an indexed, row-oriented store.
– Cassandra stores data in a multidimensional hash table. That means you don’t have to decide ahead of time precisely what your data structure must look like, or what fields your records will need.
– In Cassandra, we should think of our queries first, and then provide the data that answers them.
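A minimal sketch of the "indexed, row-oriented, no fixed fields" idea, using a Python dict as a stand-in for a column family. The `insert` helper and the user data are hypothetical; this models the concept only, not Cassandra's storage engine.

```python
# A column family modelled as: row key -> {column name -> value}.
users = {}


def insert(column_family, row_key, columns):
    """Add or overwrite columns in a row; rows need not share the same columns."""
    column_family.setdefault(row_key, {}).update(columns)


insert(users, "alice", {"email": "alice@example.org"})
insert(users, "bob", {"email": "bob@example.org", "phone": "555-0100"})
# A new column appears later for one row only; nothing had to be declared first.
insert(users, "alice", {"city": "Beijing"})

print(users["alice"])   # access is always by the unique row key
print(users["bob"])
```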

Learning Cassandra

• Installing Cassandra
– Compared with Hadoop HBase, installing Cassandra is simple: download the source code, set JAVA_HOME correctly, and run the command “ant”; Cassandra is then built and ready to use.
– Start the Cassandra server: >>bin/cassandra -f
– Use the command-line interface: >>bin/cassandra-cli

Learning Cassandra

• The Cassandra Data Model
– Cassandra also has concepts such as row, column family, and column, but their meanings are different.
– A column is a name/value pair -----> cell
– A column family is a container for rows that have similar, but not identical, column sets -----> table
– A keyspace is the outermost container for data in Cassandra -----> database

Learning Cassandra

– If we want to create a group of related columns, Cassandra allows us to do this with something called a super column family. A super column family can be thought of as a map of maps.

Learning Cassandra

• Keyspace: has a name and a set of attributes that define keyspace-wide behavior. Some basic attributes can be set per keyspace:
– Replication factor: the number of nodes that will hold a copy of each row of data.
– Replica placement strategy: how the replicas are placed in the cluster (SimpleStrategy, OldNetworkTopologyStrategy, NetworkTopologyStrategy).
– Column families: a keyspace is a container for a list of one or more column families. Column families represent the structure of our data.
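To make replication factor and placement strategy concrete, here is a simplified sketch in the spirit of SimpleStrategy: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise on the ring. The four-node ring, the node names and the MD5-based `token_for` function are assumptions for illustration and do not reproduce Cassandra's exact token arithmetic.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127

# Hypothetical ring: four nodes with evenly spaced tokens, sorted by token.
ring = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]


def token_for(key: str) -> int:
    """Map a row key onto the ring with an MD5 hash (illustrative only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def replicas_for(key: str, ring, replication_factor: int):
    """Return the nodes holding a copy of `key`, walking clockwise on the ring."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around at the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


print(replicas_for("job-42", ring, replication_factor=3))
```

With a replication factor of 3 on four nodes, every row lives on three consecutive nodes of the ring, so one node can fail without losing the row.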

Learning Cassandra

• Column Families: a column family is a container for an ordered collection of rows. It resembles a table in an RDBMS, but it is not one.
– It is schema-free: although the column families are defined, the columns are not.
– A column family has two attributes: a name and a comparator. The comparator indicates how columns will be sorted when they are returned to us in a query.
– A Cassandra column family is similar to a four-dimensional hash: [Keyspace][ColumnFamily][Key][Column]
– If the column family is defined as super, it becomes a five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
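The four- and five-dimensional hashes can be written down literally as nested Python dicts. The keyspace, column family, key and column names below are invented for this sketch.

```python
data = {
    "Accounting": {                      # [Keyspace]
        "Jobs": {                        # [ColumnFamily] (standard)
            "job-101": {                 # [Key]
                "site": "SiteA",         # [Column] -> value
                "cpu": "30",
            },
        },
        "JobsBySite": {                  # [ColumnFamily] (super)
            "SiteA": {                   # [Key]
                "2012-11-22": {          # [SuperColumn]
                    "job-101": "30",     # [SubColumn] -> value
                },
            },
        },
    },
}

# Four lookups reach a value in a standard column family ...
four_dim = data["Accounting"]["Jobs"]["job-101"]["cpu"]
# ... and five lookups reach a value in a super column family.
five_dim = data["Accounting"]["JobsBySite"]["SiteA"]["2012-11-22"]["job-101"]

print(four_dim, five_dim)
```

The super column family is exactly the "map of maps" mentioned earlier: one extra level of nesting under each row key.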

Learning Cassandra

• Column Family Options: there are a few additional parameters that we can define for each column family:
– keys_cached
– rows_cached
– read_repair_chance
– preload_row_cache
– …

Learning Cassandra

• Column Sorting: in Cassandra, we specify how column names will be compared for sort order when results are returned to the client. Some of the choices:
– AsciiType
– BytesType
– LongType
– UTF8Type
– …

• Sorting is a design decision
– In an RDBMS we can use ORDER BY to change the order. In Cassandra, we cannot change the order once we have specified it when creating the column family.
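Why the comparator is a design decision can be seen with a simplified model in which each comparator is a Python sort key. This is an assumption for illustration only: real Cassandra compares the serialized column names, and a LongType name is an 8-byte integer rather than a digit string.

```python
# Simplified stand-ins for comparators: how a column name is turned into a sort key.
COMPARATORS = {
    "UTF8Type": lambda name: name,                   # compare as text
    "BytesType": lambda name: name.encode("utf-8"),  # compare as raw bytes
    "LongType": lambda name: int(name),              # compare as integers
}


def sorted_columns(column_names, comparator):
    """Return column names in the order a column family with this comparator would use."""
    return sorted(column_names, key=COMPARATORS[comparator])


names = ["9", "10", "100"]
print(sorted_columns(names, "UTF8Type"))   # text order
print(sorted_columns(names, "LongType"))   # numeric order
```

The same three names come back as 10, 100, 9 under text comparison and as 9, 10, 100 under numeric comparison, and since the order is fixed at creation time the choice has to match how the columns will be read.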

Learning Cassandra

• Secondary Indexes:
– Secondary indexes are supported since Cassandra 0.7. This means we can create indexes on column values.

• Denormalization:
– Normalization is not an advantage when working with Cassandra, because it performs best when the data model is denormalized.
– Instead of modeling the data first and then writing queries, with Cassandra we model the queries and let the data be organized around them. Think of the most common query paths the application will use, and then create the column families that we need to support them.
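The secondary-index idea from this slide (an index on column values rather than on row keys) can be pictured as a reverse map from a column's value to the row keys that hold it. The class below is a conceptual sketch with invented names, not Cassandra's implementation.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Rows keyed by row key, plus a value -> row keys index on one column."""

    def __init__(self, indexed_column):
        self.rows = {}
        self.indexed_column = indexed_column
        self.index = defaultdict(set)

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        row = self.rows.setdefault(row_key, {})
        row.update(columns)
        new = row.get(self.indexed_column)
        if old != new:                       # keep the index in step with the data
            if old is not None:
                self.index[old].discard(row_key)
            if new is not None:
                self.index[new].add(row_key)

    def get_by_value(self, value):
        """Find row keys by column value instead of by row key."""
        return sorted(self.index.get(value, ()))


jobs_cf = IndexedColumnFamily(indexed_column="site")
jobs_cf.insert("job-101", {"site": "SiteA", "cpu": "30"})
jobs_cf.insert("job-102", {"site": "SiteB", "cpu": "45"})
jobs_cf.insert("job-103", {"site": "SiteA", "cpu": "25"})
print(jobs_cf.get_by_value("SiteA"))
```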

Learning Cassandra

• Design patterns:
– Materialized View: write our data to a second column family that is created specifically to answer a particular query.
– Valueless Column: the column name itself can carry the useful information, so no value is needed; often used in materialized views.
– Aggregate Key: when using the Valueless Column pattern, we may also need the Aggregate Key pattern. Such a key looks like xxx:xxx (with a colon as the separator).
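The three patterns fit together, as the following sketch with plain dicts tries to show. The column family names, the `site:date` key layout and the sample jobs are assumptions made for this example.

```python
jobs = {}          # primary column family: job id -> columns
jobs_by_site = {}  # materialized view: "site:date" -> {job id: ""}


def insert_job(job_id, site, date, cpu):
    """Write the job once to the primary store and once to the view."""
    jobs[job_id] = {"site": site, "date": date, "cpu": cpu}
    view_key = f"{site}:{date}"                          # Aggregate Key
    jobs_by_site.setdefault(view_key, {})[job_id] = ""   # Valueless Column


def jobs_for(site, date):
    """Answer 'which jobs ran at this site on this date?' with one key lookup."""
    return sorted(jobs_by_site.get(f"{site}:{date}", {}))


insert_job("job-101", "SiteA", "2012-11-22", 30)
insert_job("job-102", "SiteB", "2012-11-22", 45)
insert_job("job-103", "SiteA", "2012-11-22", 25)
print(jobs_for("SiteA", "2012-11-22"))
```

The view stores only job ids as column names with empty values; the full job data stays in the primary column family and is fetched by id when needed.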

Learning Cassandra

• API & Python library:
– There is a client generation layer, provided by the Thrift API and the Avro project.
– There are also high-level Cassandra clients for different languages. For Python there is a library named pycassa, which makes it easy to communicate with Cassandra from Python. I am now getting familiar with it.
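A minimal sketch of what talking to Cassandra through pycassa might look like. It assumes pycassa is installed, that a Cassandra node exposes its Thrift interface on localhost:9160, and that the keyspace and column family already exist; the names `Accounting` and `Jobs` and both helper functions are hypothetical. The import is done lazily so that the helpers can be defined without a running server.

```python
def open_column_family(keyspace, name, servers=("localhost:9160",)):
    """Connect through pycassa and return a ColumnFamily handle."""
    # Imported here so that merely loading this sketch does not require pycassa.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool(keyspace, list(servers))
    return ColumnFamily(pool, name)


def save_and_read(column_family, row_key, columns):
    """Insert columns under a row key, then read the row back."""
    column_family.insert(row_key, columns)
    return column_family.get(row_key)


# Usage against a live server (not executed here):
#   cf = open_column_family("Accounting", "Jobs")
#   print(save_and_read(cf, "job-101", {"site": "SiteA", "cpu": "30"}))
```

`insert` takes a row key and a dict of column names to values, and `get` returns the row's columns, which matches the name/value model described in the data model slides.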

Thanks

Now discussing…