hbase an introduction
Introduction to HBase
Dr. Fabio Fumarola
Contents
• BigTable
• HBase
– Shell
– Admin
– Put
– Get
– Scan
• Coding Session
BigTable
Bigtable at Google
• "Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable including web indexing, Google Earth, and Google Finance."
Features
• Distributed
• Sparse
• Column-Oriented
• Versioned
1. The map is indexed by a row key, a column key, and a timestamp.
2. Each value in the map is an uninterpreted array of bytes.
(row key, column key, timestamp) => value
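The mapping above can be pictured as nested sorted maps. The following is a minimal in-memory sketch of that idea, not HBase's or Bigtable's storage code; the class name `SparseVersionedMap` and its methods are hypothetical, and row and column keys are plain strings for readability.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of (row key, column key, timestamp) => value.
final class SparseVersionedMap {
    // row key -> column key -> timestamp (newest first) -> uninterpreted bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest value of a cell, or null if the cell was never written.
    // Absent cells take no space, which is what "sparse" means here.
    byte[] getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SparseVersionedMap map = new SparseVersionedMap();
        map.put("20120407152657", "personal:givenName", 1239124584398L,
                "mario".getBytes(StandardCharsets.UTF_8));
        byte[] name = map.getLatest("20120407152657", "personal:givenName");
        System.out.println(new String(name, StandardCharsets.UTF_8));
        System.out.println(map.getLatest("20120407152657", "personal:surname") == null);
    }
}
```

Keeping timestamps in descending order makes "latest version" the first entry of the innermost map.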
Key Concepts
• row key => 20120407152657
• column family => "personal:"
• column key => "personal:givenName", "personal:surname"
• timestamp => 1239124584398
• column value => "mario", "rossi"
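A column key is a family name and a qualifier joined by a colon. As a small illustration, the sketch below splits such a key at the first colon; the `ColumnKey` class is hypothetical and is not part of the HBase client, which takes the family and the qualifier as separate byte arrays.

```java
// Hypothetical value type for illustration only.
final class ColumnKey {
    final String family;
    final String qualifier;

    private ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Splits "family:qualifier" at the first colon.
    // An empty qualifier is allowed, as in "content:".
    static ColumnKey parse(String columnKey) {
        int colon = columnKey.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + columnKey);
        }
        return new ColumnKey(columnKey.substring(0, colon), columnKey.substring(colon + 1));
    }

    public static void main(String[] args) {
        ColumnKey key = ColumnKey.parse("personal:givenName");
        System.out.println(key.family + " / " + key.qualifier);
    }
}
```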
Example 1
Get row 20120407145045
HBase
• Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.
http://hbase.apache.org
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
HBase Shell
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN           CELL
content:         timestamp=1239135042862, value=CouchDB is a doc...
info:author      timestamp=1239135042755, value=Bob Smith
info:category    timestamp=1239135042982, value=Persistence
info:title       timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell
hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3}
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW               COLUMN+CELL
20120320162535    column=content:, timestamp=1239135042862, value=CouchDB is...
20120320162535    column=info:author, timestamp=1239135042755, value=Bob Smith
20120320162535    column=info:category, timestamp=1239135042982, value=Persistence
20120320162535    column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
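The scan returns only row 20120320162535 because rows are kept sorted by key, the start row is inclusive and the stop row is exclusive. The sketch below imitates that range semantics with a `TreeMap`; the `RowRangeScan` class is hypothetical and uses string keys for simplicity, while HBase itself compares raw bytes (equivalent for the ASCII digit keys used here). The row `20120215101500` is invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a table: row key -> author.
final class RowRangeScan {
    // Returns every row key k with startRow <= k < stopRow, in sorted order.
    static List<String> scan(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> blog = new TreeMap<>();
        blog.put("20120215101500", "Alice Example"); // invented row, sorts before the range
        blog.put("20120320162535", "Bob Smith");
        blog.put("20120407145045", "John Doe");      // sorts after the stop row
        System.out.println(scan(blog, "20120300", "20120400"));
    }
}
```

Because the row keys are fixed-width timestamps, a month's worth of posts is a contiguous key range, which is why the shell example can select March 2012 with a start and stop row alone.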
Java API
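Admin API
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
// Create a new table with three column families.
// Targets the pre-1.0 HBase client API (HBaseAdmin, HTableDescriptor).
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);
System.out.printf("%s is available? %b\n", tableName, admin.isTableAvailable(tableName));
admin.close();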
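Client API
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people"); // open the table before writing to it
Put put = new Put(toBytes("connor-john-m-43299")); // row key
put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
table.put(put);
table.flushCommits();
table.close();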
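Everything the client sends, row keys included, is a byte array, and the row key in this example appears to be assembled from the person's surname, given name, middle initial and a numeric id. The sketch below shows one way such a key could be built and encoded; `RowKeys`, `personRowKey` and the meaning of the trailing number are assumptions for illustration, and `toBytes` here is a plain-Java stand-in for `Bytes.toBytes(String)`, which encodes strings as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helpers; not part of the HBase client library.
final class RowKeys {
    // Joins the lower-cased parts with '-', e.g. "connor-john-m-43299".
    static String personRowKey(String surname, String givenName, String middleInitial, int id) {
        return String.join("-",
                surname.toLowerCase(Locale.ROOT),
                givenName.toLowerCase(Locale.ROOT),
                middleInitial.toLowerCase(Locale.ROOT),
                Integer.toString(id));
    }

    // Plain-Java equivalent of encoding a string as UTF-8 bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = personRowKey("Connor", "John", "M", 43299);
        System.out.println(key + " (" + toBytes(key).length + " bytes)");
    }
}
```

Putting the surname first keeps people with the same surname adjacent in the sorted table, so they can later be read with a single range scan.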
Finding Data
• Get (by row key)
• Scan (by row key ranges, filtering)
Get
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);                                   // at most two versions per cell
get.addFamily(toBytes("personal"));                      // every column in 'personal'
get.addColumn(toBytes("contactinfo"), toBytes("email")); // plus a single contactinfo column
Result result = table.get(get);
table.close();
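Asking for a maximum number of versions returns the newest values of a cell first. The sketch below models a single cell that way, reusing the two `info:author` versions from the shell session; the `VersionedCell` class is hypothetical and only illustrates the ordering, not how HBase stores or expires versions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one cell holding several timestamped values.
final class VersionedCell {
    // timestamp -> value, newest timestamp first
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns up to maxVersions values, newest first.
    List<String> get(int maxVersions) {
        List<String> newest = new ArrayList<>();
        for (String value : versions.values()) {
            if (newest.size() == maxVersions) {
                break;
            }
            newest.add(value);
        }
        return newest;
    }

    public static void main(String[] args) {
        VersionedCell author = new VersionedCell();
        author.put(1239135324741L, "John");
        author.put(1239135325074L, "John Doe");
        System.out.println(author.get(3)); // only two versions exist
        System.out.println(author.get(1));
    }
}
```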
Update
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("[email protected]"));
put.add(toBytes("contactinfo"), toBytes("address"), toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
Scans
// Scan rows...
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Scan scan = new Scan(toBytes("john-")); // start row
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage)); // numRowsPerPage: page size supplied by the caller
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
// process result...
}
scanner.close();
table.close();
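Paging through a scan amounts to reading a bounded number of rows from a start key, then resuming just after the last key seen. The sketch below shows that loop over an in-memory sorted map; `PagedScan` and every row key except `connor-john-m-43299` are invented for the example. With a real cluster the page filter is evaluated separately on each region server, so a client may receive more rows than the page size and should enforce the limit itself as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a paged scan over sorted row keys.
final class PagedScan {
    // Returns up to pageSize row keys, starting at startRow (inclusive or exclusive).
    static List<String> page(NavigableMap<String, String> table, String startRow,
                             boolean inclusive, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (String rowKey : table.tailMap(startRow, inclusive).keySet()) {
            if (rows.size() == pageSize) {
                break;
            }
            rows.add(rowKey);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> people = new TreeMap<>();
        people.put("connor-john-m-43299", "John");
        people.put("connor-sarah-j-11111", "Sarah"); // invented row
        people.put("smith-bob-a-22222", "Bob");      // invented row

        int numRowsPerPage = 2;
        List<String> rows = page(people, "", true, numRowsPerPage);
        while (!rows.isEmpty()) {
            System.out.println(rows);
            // Resume strictly after the last row of the previous page.
            String last = rows.get(rows.size() - 1);
            rows = page(people, last, false, numRowsPerPage);
        }
    }
}
```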
Time to Code
This is when things start to get hard
Set Up HBase with Docker
• https://registry.hub.docker.com/u/banno/hbase-standalone/
• https://registry.hub.docker.com/u/oddpoet/hbase-cdh5/
Steps
• Shell
• Java Project
– Maven
– Gradle