Accumulo
Granular Access Control Using Cell Level Security In Accumulo
Table of Contents
1.0 Summary/Abstract
  1.1 Problem Statement
  1.2 Overview of Steps
  1.3 Technology Used
  1.4 Issues
  1.5 Lessons Learned
  1.6 Summary
2.0 Technology Used
3.0 Installation/Configuration
  3.1 High-Level Overview
  3.2 Detailed Steps
    Phase 1: Download
    Phase 2: Installation
      Install Hadoop
      Install Zookeeper
      Install Accumulo
    Phase 3: Running Accumulo
    Phase 4: Run Java Program to Populate Demo Dataset
    Phase 5: Demonstrate Accumulo Capabilities Using Shell
    Phase 6: Stopping Accumulo
4.0 Demo and Working Code
  4.1 Java Code
  4.2 Demo (Cases 1-7)
5.0 Issues Encountered
6.0 Lessons Learned
7.0 Conclusion
8.0 References/Useful Resources
  8.1 References
  8.2 Useful Resources
  8.3 YouTube Links for Presentations
1.0 Summary/Abstract
1.1 Problem Statement
Organizations and governments rely heavily on information provided by big data; however, secrecy and privacy issues become magnified because systems are more exposed to vulnerabilities through the use of large-scale cloud infrastructures, with a diversity of software platforms, spread across large networks of computers. Traditional security mechanisms are no longer adequate given the velocity, volume and variety of big data used today.
In this paper, we look at the security property that matters from the perspective of access control, i.e. how do we prevent access to data by people who should not have access?
1.2 Overview of Steps
As a solution to the problem statement, I will be looking at the concept of granular access control (the ability to allow as much data sharing as possible without compromising secrecy) and show how its theory can be adapted to big data sets. After installing Hadoop, Zookeeper and Accumulo on my machine, I ran the servers and used a Java program to create a large randomized data set simulating a claims processor. The example demonstrates how different levels of access can be administered depending on who you are: an administrator, an insurer or a member of the general public.
1.3 Technology Used
Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Accumulo's key feature is that it is well suited to storing sparse, high-dimensional data and uses ColumnVisibility to filter what users see based on the presentation of the appropriate authorization, i.e. only data that has the correct visibility label is returned to the user. This allows granular access control to be implemented at the cell level, in contrast to more traditional access methods where rows, columns or even whole tables are restricted to users. This form of security maximizes the utility we receive by aggregating various sources of big data without compromising privacy or secrecy, which is particularly useful for Big Data, where concerns around data privacy have been rising over the past few years.
1.4 Issues
Throughout the installation of Zookeeper and Accumulo, some issues were encountered, the biggest being the scarcity of available documentation. A great deal of research was done in user forums in order to resolve some of the installation issues. However, there are good conceptual presentations on SlideShare.
1.5 Lessons Learned
Pros:
- Accumulo does not require a schema
- Accumulo is a wide-column database, similar to HBase or Cassandra
- Accumulo scales horizontally

Cons:
- Accumulo does not perform query optimization
- Accumulo does not have a standard query language like SQL (or SPARQL for RDF)
1.6 Summary
Accumulo proved to be a relatively straightforward technology to use once the installation humps had been overcome. Its cell-based security model is very useful, as sharing data without compromising secrecy is a major security issue we face with big data. The ability to implement granular access control with Accumulo gives data managers more flexibility in sharing data securely.
2.0 Technology Used
According to the book Accumulo by Rinaldi, Wall and Cordova6: "Apache Accumulo is a highly scalable, distributed, open source database modeled after Google's BigTable design. Accumulo is built to store up to trillions of data elements and keeps them organized so that users can perform fast lookups. Accumulo supports flexible data schemas and scales horizontally across thousands of machines. Applications built on Accumulo are capable of serving a large number of users and can process many requests per second, making Accumulo an ideal choice for terabyte to petabyte-scale projects."
Accumulo began its development in 2008, when a group of computer scientists and mathematicians at the National Security Agency were evaluating various big data technologies to help solve the issues involved in storing and processing large amounts of data of different sensitivity levels. In 2011, Accumulo joined the Apache community with Doug Cutting (founder of Hadoop) as its sponsoring champion. In March of the following year, Accumulo graduated to a top-level project1.
Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Accumulo relies on Hadoop HDFS to provide persistent storage, replication, and fault tolerance; Zookeeper for highly reliable distributed coordination of servers; and Thrift to define and create services in languages other than Java, the language Accumulo itself is written in.
At its core, Accumulo stores key-value pairs which allow users to look up the value of a particular key or range of keys very quickly. Values are stored as byte arrays and Accumulo doesn’t restrict the type or size of the values stored. The data model is illustrated below.
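As a simplified, stdlib-only sketch (not the Accumulo API; the class and helper names here are invented for illustration), the sorted key-value model can be pictured as a sorted map over a multi-part key:

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyValueModelSketch {
    // Join the key parts with a separator so that lexicographic order on the
    // joined string approximates Accumulo's component-wise key ordering.
    static String key(String row, String family, String qualifier, String visibility) {
        return row + "\u0000" + family + "\u0000" + qualifier + " [" + visibility + "]";
    }

    public static void main(String[] args) {
        // Accumulo sorts entries by the full multi-part key:
        // rowId, columnFamily, columnQualifier, columnVisibility, timestamp.
        // Values are opaque byte arrays.
        Map<String, byte[]> table = new TreeMap<>();
        table.put(key("id_1", "colfam", "date", "public"), "2001-04-17".getBytes());
        table.put(key("id_1", "colfam", "amount", "GU"), "4200".getBytes());
        table.put(key("id_0", "colfam", "date", "public"), "1983-11-02".getBytes());

        // Iteration returns entries in key-sorted order: id_0 before id_1,
        // and within a row, columns sorted by family then qualifier.
        for (Map.Entry<String, byte[]> e : table.entrySet())
            System.out.println(e.getKey() + " -> " + new String(e.getValue()));
    }
}
```

The sorted order is what makes lookups on a key or a range of keys fast: a scan simply seeks to the first matching key and reads forward.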
The key is multi-dimensional and consists of a row ID, a column family, a column qualifier, a column visibility and a timestamp. In Accumulo, all data that share the same row ID are considered part of the same record, i.e. multiple key-value pairs usually contribute to one record. This is in contrast to more traditional data models where each record is stored in a single row. The column family and the column qualifier are used as attributes to uniquely qualify each entry, such that each entry in Accumulo can be thought of as a cell of a traditional data model. This ability to address data in individual cells makes Accumulo well suited to storing sparse, high-dimensional data. The ColumnVisibility is used to filter what users see based on the presentation of the appropriate authorization, i.e. only data that has the correct visibility label is returned to the user. This allows granular access control to be implemented at the cell level, in contrast to more traditional access methods where rows, columns or even whole tables are restricted to users. This form of security maximizes the utility we receive by aggregating various sources of big data without compromising privacy or secrecy. This is particularly useful for Big Data, where concerns around data privacy have been rising over the past few years.
In the physical data representation below, only data with the ColumnVisibility Public is returned to a user that has the public authorization, while the remaining data with an inappropriate visibility label is not returned to the user.
Accumulo will also not allow users to write data that does not match their visibility label. In our previous example, someone with only the Public authorization cannot write a row where the ColumnVisibility is set to Private.
Accumulo supports access control at the level of individual users. However, it is usually easier to label information visibility based on groups. For example, if John is leaving the Finance department for the Marketing department, it is easier to change the authorization associated with John from Finance to Marketing than to change every cell in the database labeled with the visibility John over to the person replacing him. Accumulo supports logical AND (&) and OR (|) combinations of tokens, as well as nesting groups of tokens together with parentheses (). This allows only users that satisfy the required combination of labels to read those cells.
Label: Description
A & B: Both 'A' and 'B' are required
A | B: Either 'A' or 'B' is required
A & (C | B): Both 'A' and 'C', or both 'A' and 'B', are required
A | (B & C): 'A' alone, or both 'B' and 'C', is required
Using this approach we can further subdivide groups into functions within those groups. For example, two people working for the Finance department could have the labels Finance&Reporting and Finance&Auditing.
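How such label expressions gate access can be sketched with a small stdlib-only evaluator. This is a simplification, not Accumulo's actual parser (Accumulo has additional rules about mixing operators); here '&' simply binds tighter than '|', and all names are invented for illustration:

```java
import java.util.Set;

// Simplified evaluator for Accumulo-style visibility expressions.
// Supports tokens, '&' (AND), '|' (OR) and parentheses.
public class VisibilitySketch {
    private final String expr;
    private int pos;

    VisibilitySketch(String expr) { this.expr = expr.replace(" ", ""); }

    // A cell is visible if the expression evaluates to true against the
    // caller's set of authorization tokens. An empty label means public.
    public static boolean visible(String expression, Set<String> auths) {
        if (expression.isEmpty()) return true;
        return new VisibilitySketch(expression).or(auths);
    }

    private boolean or(Set<String> auths) {
        boolean v = and(auths);
        while (pos < expr.length() && expr.charAt(pos) == '|') {
            pos++;
            v = and(auths) || v;   // parse both sides, then OR the results
        }
        return v;
    }

    private boolean and(Set<String> auths) {
        boolean v = term(auths);
        while (pos < expr.length() && expr.charAt(pos) == '&') {
            pos++;
            v = term(auths) && v;
        }
        return v;
    }

    private boolean term(Set<String> auths) {
        if (expr.charAt(pos) == '(') {
            pos++;                 // consume '('
            boolean v = or(auths);
            pos++;                 // consume ')'
            return v;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        return auths.contains(expr.substring(start, pos));
    }
}
```

With authorizations {Finance, Reporting}, the label Finance&Reporting evaluates to true while Finance&Auditing evaluates to false, which is exactly the filtering behavior the table above describes.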
A typical use of granular access control is shown below.
Like any security measure, the features Accumulo provides must be coordinated with other system security measures in order to achieve maximum protection. Other security considerations when using Accumulo are:
Accumulo will authenticate a user and authorize that user to read data according to the security labels present within that data and the authorizations granted to the user. All other means of accessing Accumulo table data must be restricted. Rinaldi, Wall and Cordova6 propose the following points to help in that respect:
Access to files stored by Accumulo on HDFS must be restricted. This includes access to both the RFiles, which store long term data, and Accumulo’s write-ahead logs, which store recently written data. Accumulo should be the only application allowed to access these files in HDFS.
HDFS stores blocks of files in an underlying Linux file system. Users who have access to blocks of HDFS data stored in the Linux filesystem would also bypass data-level protections. Access to the file directories on which HDFS data is stored should be limited to the HDFS daemon user.
Direct access to Tablet Servers must be limited to trusted applications - this is because the application is trusted to present the proper Authorizations at scan time. A rogue client may be configured to pass in Authorizations the user does not have.
IPTables or other firewall implementations can be used to help restrict access to TCP ports.
Access to ZooKeeper should be restricted, as Accumulo uses it to store configuration information about the cluster.
Communication between nodes and to HDFS and ZooKeeper should be protected against unauthorized access.
The accumulo-site.xml file should be readable only by the accumulo user, as it contains the instance secret and the trace user's password. A separate conf directory with files readable by other users can be created for client use, with an accumulo-site.xml file that does not contain those two properties.
Source: Winick, Jared, Slideshare
3.0 Installation/Configuration
3.1: High-level overview
Phase 1: Download
Phase 2: Installation
Phase 3: Running Accumulo
Phase 4: Run Java program to populate the demo dataset
Phase 5: Demonstrate Accumulo capabilities using shell
Phase 6: Stopping Accumulo
3.2: Detailed steps
Phase 1: Download
1. Download accumulo-1.6.2.tar.gz
2. Download hadoop-2.7.0.tar.gz
3. Download zookeeper-3.4.6.tar.gz
Phase 2: Installation
Prerequisite: You need a Java 7 JRE to run the software and the JDK for the project software. I am using OpenJDK 7. It can be installed with the command
sudo apt-get install openjdk-7-jdk
It is also important that OpenJDK is the default Java. This can be verified with the command
java -version
It should report: java version "1.7.0_79", OpenJDK Runtime Environment
Note: To find the install path for OpenJDK, you can use the command
readlink -f $(which java)
Ensure that the Java bin directory has been added to $PATH by using the command
echo $PATH
Prerequisite: You need an SSH server and an SSH client to perform passwordless access to localhost. Typically, you would install them with the command
sudo apt-get install openssh-client openssh-server
Assuming we are logged in on the machine called "ubuntu" as user "maja", create the accumulo directory:
cd ~
mkdir accumulo
cd accumulo
This is going to be the project directory (/home/maja/accumulo).
Install Hadoop
We will install Hadoop into the user home directory. Unzip and untar Hadoop to /home/maja; this creates the directory /home/maja/hadoop-2.7.0/
We will call this the Hadoop directory
The appropriate documentation for Hadoop can be found at the following websitehttp://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
We are installing a single-node cluster and will run it in pseudo-distributed mode.
Change directory to the Hadoop directory and configure the installation by editing etc/hadoop/hadoop-env.sh (setting JAVA_HOME to the Java install path found earlier).
Modify etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Modify etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Verify the installation by running the following command:
bin/hadoop version
Install Zookeeper
We will install zookeeper into the user home directory. Unzip and untar zookeeper to /home/maja. This creates directory /home/maja/zookeeper-3.4.6/
We will call this the Zookeeper directory
Change directory to the Zookeeper directory and edit conf/zoo.cfg:
tickTime=2000
dataDir=/home/maja/zookeeper-3.4.6/data
clientPort=2181
server.1=localhost:2888:3888
Install Accumulo
We will install Accumulo into the user home directory. Unzip and untar Accumulo to /home/maja. This creates the directory /home/maja/accumulo-1.6.2
We will call this the Accumulo directory
Change directory to the Accumulo directory and copy the example standalone 1 GB configuration files into the conf directory by using the command
cp conf/examples/1GB/standalone/* conf
Edit conf/accumulo-env.sh:
export ACCUMULO_HOME=/home/maja/accumulo/accumulo-1.6.2
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_PREFIX=/home/maja/hadoop/hadoop-2.7.0
export ZOOKEEPER_HOME=/home/maja/accumulo/zookeeper-3.4.6
Edit conf/accumulo-site.xml:
<property>
  <name>instance.zookeeper.host</name>
  <value>localhost:2181</value>
</property>
Edit bin/start-server.sh (there is a bug that prevents starting the monitor). After line 50, add the following:
# ACCUMULO-1985 patch
if [ ${SERVICE} == "monitor" -a ${ACCUMULO_MONITOR_BIND_ALL} == "true" ]; then
    ADDRESS="0.0.0.0"
fi
Phase 3: Running Accumulo
Set the JAVA_HOME and HADOOP_PREFIX environment variables using the commands
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_PREFIX=/home/maja/hadoop-2.7.0
Start the SSH server:
sudo service ssh restart
This should say:
ssh stop/waiting
ssh start/running, process XXXX (some number)
Test passwordless ssh to localhost:
ssh localhost
This should say:
Welcome to Ubuntu ...
Exit the new shell back to the original shell:
exit
----- First, start Hadoop DFS -----
Let's assume we are in directory /home/maja/accumulo
cd hadoop-2.7.0
bin/hadoop version
bin/hdfs namenode -format
sbin/start-dfs.sh
There is a web server to monitor status of the Hadoop DFS: http://ubuntu:50070
Perform the following operations to set up the files on HDFS:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/maja
bin/hdfs dfs -put etc/hadoop input
Use the web server's utilities to browse the directories and verify there are files under /user/maja/input
----- Second, start Zookeeper -----
Let's assume we are in directory /home/maja/accumulo
cd ../zookeeper-3.4.6
sudo bin/zkServer.sh start
bin/zkServer.sh status
----- Third, start Accumulo -----
Let's assume we are in directory /home/maja/accumulo
cd ../accumulo-1.6.2
bin/accumulo init
Call the instance "MyAccumulo", agree to remove the instance from Zookeeper if it exists, select a password for user "root", and retype the password.
Start Accumulo by using the command
bin/start-all.sh
Check the web server http://ubuntu:50095 to verify that the Accumulo server is working correctly.
Check the Accumulo shell by using the command
bin/accumulo shell -u root
The shell should return the following 3 tables:
accumulo.metadata
accumulo.root
trace
Exit the shell:
exit
Double-check the Hadoop filesystem to see that there are files under the accumulo user.
Phase 4: Run project Java program to populate the demo dataset
The demo project includes a Java program that connects to Accumulo and creates a demo dataset.
To compile the program, use the build.sh script to properly set the classpath.
The dataset includes 2 tables: records and insurers
The records table has one column family with the following columns:
- date      // date of a medical procedure
- client    // name of the client
- procedure // type of the procedure
- insurer   // name of the insurer
- provider  // name of the medical provider
- amount    // dollar amount charged
The insurers table has a single column family with the following columns:
- name // name of the insurer
- rank // rank of the insurer
In the records table, the date and procedure cells are visible to the "public" authorization. The other cells are visible only to the particular insurer.
./build.sh InsertWithBatchWriter.java 2>&1 | less
rm InsertWithBatchWriter.jar
jar cvf InsertWithBatchWriter.jar InsertWithBatchWriter.class
cp InsertWithBatchWriter.jar /home/maja/accumulo/accumulo-1.6.2/lib/ext/
First, manually create the table insurers:
bin/accumulo shell -u root
createtable insurers
You can check that the new table has been created by using the command
tables
Exit the Accumulo shell:
exit
Run the InsertWithBatchWriter program:
bin/accumulo InsertWithBatchWriter -i MyAccumulo -z localhost:2181 -u root -t records
The program generates a random set of insurers, a random set of providers and a random set of procedure types. It then generates a demo dataset of 1,000,000 records with random dates in the period 1900-2015 and random patient names. Each sensitive cell has the visibility of the corresponding insurer, so different cells in the same table have different visibilities. The program prints its progress to the screen every 100 records.
Go back to the Accumulo shell to validate that the tables have been created:
bin/accumulo shell -u root
tables
The records and insurers tables will be listed
Phase 5: Demonstrate Accumulo capabilities using shell
This is shown in the next section.
Phase 6: Stop Accumulo
1. Stop Accumulo:
cd accumulo-1.6.2
bin/stop-all.sh
2. Stop Zookeeper:
cd ../zookeeper-3.4.6
sudo bin/zkServer.sh stop
3. Stop Hadoop:
cd ../hadoop-2.7.0
sbin/stop-dfs.sh
4.0 Demo and Working Code
4.1 Java Code:
import org.apache.accumulo.core.cli.BatchWriterOpts;
import org.apache.accumulo.core.cli.ClientOnRequiredTable;
import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MultiTableBatchWriter;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
import org.apache.accumulo.core.security.ColumnVisibility;
import java.util.Random;
import java.util.GregorianCalendar;
/**
 * Inserts 1,000,000 records into Accumulo, labelling each cell with its own visibility.
 */
public class InsertWithBatchWriter {
public static void main(String[] args) throws AccumuloException, AccumuloSecurityException, MutationsRejectedException, TableExistsException, TableNotFoundException {
    ClientOnRequiredTable opts = new ClientOnRequiredTable();
    BatchWriterOpts bwOpts = new BatchWriterOpts();
    opts.parseArgs(InsertWithBatchWriter.class.getName(), args, bwOpts);

    Connector connector = opts.getConnector();
    MultiTableBatchWriter mtbw = connector.createMultiTableBatchWriter(bwOpts.getBatchWriterConfig());

    if (!connector.tableOperations().exists(opts.tableName))
      connector.tableOperations().create(opts.tableName);
    BatchWriter bw = mtbw.getBatchWriter(opts.tableName);
    // Generate random procedure types
    int maxProc = 20;
    String[] proc = new String[maxProc + 1];
    for (int i = 0; i < maxProc; i++) {
      proc[i] = randomString(5);
    }

    BatchWriter ibw = mtbw.getBatchWriter("insurers");
    Text coli = new Text("insurer");

    // Generate random insurers and write them to the insurers table,
    // visible only to the Admin authorization
    int maxIns = 50;
    String[] insurer = new String[maxIns + 1];
    for (int i = 0; i < maxIns; i++) {
      insurer[i] = randomString(5);
      System.out.println("Generating Insurer " + i + insurer[i]);
      Mutation mi = new Mutation(new Text(String.format("ins_%d", i)));
      long ts = System.currentTimeMillis();
      ColumnVisibility colVisAdmin = new ColumnVisibility("Admin");
      mi.put(coli, new Text("name"), colVisAdmin, ts, new Value(insurer[i].getBytes()));
      int rank = rnd.nextInt(10);
      mi.put(coli, new Text("rank"), colVisAdmin, ts, new Value(Integer.toString(rank).getBytes()));
      ibw.addMutation(mi);
    }

    // Generate random providers
    int maxPro = 50;
    String[] provider = new String[maxPro + 1];
    for (int i = 0; i < maxPro; i++) {
      provider[i] = randomString(5);
    }
    Text colf = new Text("colfam");
    System.out.println("writing ...");
    for (int i = 0; i < 1000000; i++) {
      Mutation m = new Mutation(new Text(String.format("id_%d", i)));
      long timestamp = System.currentTimeMillis();

      // Pick a random insurer for this record
      int ppi = rnd.nextInt(maxIns);
      String ins = insurer[ppi];

      // The date cell is visible to the public
      String dd = randomDate();
      ColumnVisibility colVisPublic = new ColumnVisibility("public");
      m.put(colf, new Text("date"), colVisPublic, timestamp, new Value(dd.getBytes()));

      // The sensitive cells are visible only to this record's insurer
      ColumnVisibility colVis = new ColumnVisibility(ins);

      String cl = randomString(8);
      m.put(colf, new Text("client"), colVis, timestamp, new Value(cl.getBytes()));

      // The procedure cell is also public
      ppi = rnd.nextInt(maxProc);
      String pp = proc[ppi];
      m.put(colf, new Text("procedure"), colVisPublic, timestamp, new Value(pp.getBytes()));

      m.put(colf, new Text("insurer"), colVis, timestamp, new Value(ins.getBytes()));

      ppi = rnd.nextInt(maxPro);
      String pro = provider[ppi];
      m.put(colf, new Text("provider"), colVis, timestamp, new Value(pro.getBytes()));

      int amt = rnd.nextInt(10000);
      m.put(colf, new Text("amount"), colVis, timestamp, new Value(Integer.toString(amt).getBytes()));

      bw.addMutation(m);
      if (i % 100 == 0)
        System.out.println(i);
    }
    mtbw.close();
  }
  static final String AB = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  static Random rnd = new Random();

  static String randomString(int len) {
    StringBuilder sb = new StringBuilder(len);
    for (int i = 0; i < len; i++)
      sb.append(AB.charAt(rnd.nextInt(AB.length())));
    return sb.toString();
  }

  static String randomDate() {
    GregorianCalendar gc = new GregorianCalendar();
    int year = randomBetween(1900, 2015);
    gc.set(GregorianCalendar.YEAR, year);
    int dayOfYear = randomBetween(1, gc.getActualMaximum(GregorianCalendar.DAY_OF_YEAR));
    gc.set(GregorianCalendar.DAY_OF_YEAR, dayOfYear);
    // Calendar months are zero-based, so add 1 to get the calendar month
    String yymmdd = gc.get(GregorianCalendar.YEAR) + "-" + (gc.get(GregorianCalendar.MONTH) + 1)
        + "-" + gc.get(GregorianCalendar.DAY_OF_MONTH);
    return yymmdd;
  }

  private static int randomBetween(int start, int end) {
    return start + (int) Math.round(Math.random() * (end - start));
  }
}
Please note that this code generates the randomized claims-processor data used in the demo. Each client record has an identifier, the procedure performed, the date, and the insurer through which the claim was paid. Once Accumulo is up and running, the Java program is run to populate the data used for the demonstration.
4.2 Demo
Now let's demonstrate the different visibility settings available once the code has generated our randomized data. Two tables are produced: records and insurers. Depending on which table we are scanning and under which authorization, we are restricted to the types of information we are allowed to see. Let's start by looking at the records table.
Starting the Accumulo shell and checking for our two tables:
Case 1: Scan records table without authorization: no records are visible.
Case 2: Switch to the insurers table and set the authorization to Admin: this visibility allows the administrator to see each insurer's name, rank and ID code from that table.
Case 3: From the records table, set the authorization to insurer "GU": this visibility allows the insurer to see the claims data related to it only: client, provider, insurer and the amounts paid by it alone.
Case 4: Similarly for insurer ZP:
Case 5: From the records table, set the authorization to public: this visibility only allows a member of the general public to view the procedure performed and its date for all de-identified clients. They cannot see which insurer was involved or how much was paid.
Case 6: We have a new user, Bob; let's create him. When he tries to access the data, it is completely restricted, as he does not have any permissions. The root user must authorize him as one of the three roles (insurer, public or administrator) before he can access any data.
Case 7: Let's give Bob some permissions relating to insurer "GU". As root we grant the permissions; once we are Bob again, notice he can now access the records related to GU. However, he does not have any permission to set different authorization types, so he cannot read other records or write to any.
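The filtering behavior seen in cases 1 through 5 can be sketched with a small stdlib-only simulation (this is not the Accumulo API; the table contents, class names and labels here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simulation of a cell-level-filtered scan over the demo records table.
// Each cell carries a single visibility token; a scan returns only cells
// whose token is among the caller's authorizations.
public class ScanSimulation {
    static class Cell {
        final String rowId, qualifier, value, visibility;
        Cell(String rowId, String qualifier, String value, String visibility) {
            this.rowId = rowId; this.qualifier = qualifier;
            this.value = value; this.visibility = visibility;
        }
    }

    static final List<Cell> RECORDS = List.of(
        new Cell("id_0", "date",      "1972-03-05", "public"),
        new Cell("id_0", "procedure", "XRAY1",      "public"),
        new Cell("id_0", "insurer",   "GU",         "GU"),
        new Cell("id_0", "amount",    "4200",       "GU"),
        new Cell("id_1", "date",      "2003-09-21", "public"),
        new Cell("id_1", "amount",    "150",        "ZP"));

    // Return only the cells this set of authorizations may see.
    static List<Cell> scan(Set<String> auths) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : RECORDS)
            if (auths.contains(c.visibility)) visible.add(c);
        return visible;
    }

    public static void main(String[] args) {
        // Case 1: no authorizations, nothing is visible
        System.out.println("no auths:  " + scan(Set.of()).size() + " cells");
        // Case 5: public sees only date and procedure cells
        System.out.println("public:    " + scan(Set.of("public")).size() + " cells");
        // Case 3: insurer GU additionally sees its own sensitive cells
        System.out.println("GU+public: " + scan(Set.of("GU", "public")).size() + " cells");
    }
}
```

Note that ZP's amount cell stays hidden from GU, just as in case 3 of the demo: an insurer only ever sees the claims it paid.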
5.0 Issues Encountered
There were a couple of issues encountered throughout the installation process. The following are worth noting, as they took quite a bit of time to correct.
Issue: Accumulo's monitor does not work on localhost. Solution: You will need to apply the ACCUMULO-1985 patch to bin/start-server.sh.
Issue: Zookeeper's default way of starting its server does not display error messages, so the server often gives the impression of having started successfully while in fact it failed. Solution: The logs need to be carefully inspected to verify this. In order to see the messages, start the server in the foreground: instead of
bin/zkServer.sh start
use
bin/zkServer.sh start-foreground
Issue: Accumulo's documentation is scarce. Solution: Expect to do a lot of web searching to resolve some of the installation issues; the answers can often be found in user forums. There are good conceptual presentations on SlideShare as well.
6.0 Lessons Learned
In general, a few lessons learned from using Accumulo in the demo were:
Its cell-based security model is very useful. Every key-value pair has its own security label, stored under the column visibility element of the key, which is used to determine whether a given user meets the security requirements to read the value. This enables data of various security levels to be stored within the same row, and users of varying degrees of access to query the same table, while preserving data confidentiality.
Its wide-column model is useful for aggregating information using the same key (one can have multiple column families and column qualifiers)
Based on research done from the Accumulo User Manual and overall findings, some pros and cons to using the technology as well as a high level comparison to other technologies are listed below:
Pros:
- Accumulo does not require a schema
- Accumulo is a wide-column database, similar to HBase or Cassandra
- Accumulo scales horizontally

Cons:
- Accumulo does not have a standard query language like SQL
- Accumulo does not perform query optimization

Accumulo compared to:
SQL:
o Accumulo does not have a schema
o Accumulo scales horizontally
o Accumulo does not have a standard query language (like SQL)
Other wide-column databases:
o Accumulo sorts keys
Other NoSQL databases:
o Accumulo does not have a REST API and does not support JavaScript
Graph databases:
o Accumulo scales horizontally
RDF (Resource Description Framework):
o Accumulo scales horizontally
o Accumulo does not have a standard query language (like SPARQL)
7.0 Conclusion
Security and privacy issues are amplified by the velocity, volume and variety characteristics inherent to big data. As Big Data quickly becomes a critically important driver of business success across sectors, solutions are sought that balance access to large amounts of data against privacy and secrecy. One possible solution, which we have discussed here, is Accumulo: a NoSQL database that extends the basic BigTable data model by adding an element called Column Visibility. This allows Accumulo to enforce granular access control by labelling each key-value pair with its own visibility expression. Data of different sensitivity levels can be stored and indexed in the same physical tables, and users of varying degrees of access can read these tables without seeing any data they are not authorized to see. Granular access control gives data managers the tools to share data as much as possible without compromising secrecy and to satisfy the most stringent data access requirements. Combined with Accumulo's ability to handle sparse and unstructured data, this makes Accumulo an excellent tool for storing Big Data.
8.0 References/Useful Resources
8.1 References
1. Apache Software Foundation: http://www.apache.org/
2. Winick, Jared. Introduction to Apache Accumulo (presentation). http://www.slideshare.net/jaredwinick/introduction-to-apache-accumulo
3. Miner, Donald. An Introduction to Accumulo (presentation). http://www.slideshare.net/DonaldMiner/an-introduction-to-accumulo
4. Cordova, Aaron. Introductory Training (presentation). http://www.slideshare.net/acordova00/introductory-training
5. Rinaldi, Billie; Cordova, Aaron; Wall, Michael. Accumulo (early release). O'Reilly Media, Inc., 2015. Ebook. Available at safaribooksonline.com
8.2 Useful Resources
Download Accumulo: https://accumulo.apache.org/
Download Zookeeper: https://zookeeper.apache.org/
Download Hadoop: https://hadoop.apache.org/
Apache Accumulo 1.6 User Manual: http://accumulo.apache.org/1.6/accumulo_user_manual.html
Accumulo Installation Instructions: http://sqrrl.com/quick-accumulo-install/