What’s New in DataStax Enterprise 3.0?

A Guide for Developers, Architects and IT Managers

White Paper

BY DATASTAX CORPORATION

FEBRUARY 2013

Contents

Introduction
Why DataStax Enterprise?
Use Cases Handled by DataStax Enterprise
What’s New in DataStax Enterprise Edition 3.0?
Security
Internal Authentication
External Authentication
Permission Management
Transparent Data Encryption
Data Auditing
Client to Node Encryption
External Security Firm Validation
Enterprise Manageability
Visual Cluster Provisioning
Visual Restore and Custom Backup Management
Enhanced Visual Object Management
Automated Problem Diagnostic Collection
Off Cluster Metadata Repository
Solr Upgrade
Conclusion
About DataStax

Introduction

DataStax Enterprise Edition is a complete big data platform, built on a production-certified version of Apache Cassandra™, that is architected to manage real-time, batch analytic, and enterprise search data all in the same database cluster. DataStax Enterprise consists of three components:

1. DataStax Enterprise Server, which is a big data management platform that uses Apache Cassandra for real-time data management, Hadoop for batch analytics, and Apache Solr for enterprise search operations.

2. OpsCenter Enterprise, which is a visual, browser-based management tool that allows administrators to manage all database clusters, whether they are on premise, across multiple data centers, or in the cloud, from a single interface.

3. Expert support, which is delivered by the big data professionals at DataStax and includes around-the-clock coverage.

This paper discusses the enhancements included in version 3.0 of DataStax Enterprise Edition, which focus on security and enterprise manageability.

Why DataStax Enterprise?

Modern businesses are using data, and specifically big data, to transform the way they do business. Their line-of-business (LOB) applications are evolving to provide greater capabilities and data insights than ever before, which calls for a new kind of technology that handles big data in a technically efficient and cost-effective way.

Industry experts state with one voice that legacy databases are not designed or equipped to handle the big data use cases for today’s new LOB applications. For example, IDC says: “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” [1]

DataStax Enterprise provides the type of technology and architecture that enables what IDC describes. Moreover, DataStax Enterprise is unique in the big data marketplace in that it smartly handles all key data dimensions – real time, batch analytic, and enterprise search – all in one easily managed database cluster.

The benefits delivered by DataStax Enterprise can be summarized as follows:

Feature: Benefit

Production-Certified Cassandra: Apache Cassandra is a massively scalable NoSQL database that is an acknowledged industry leader at handling real-time big data. DataStax certifies a version of Cassandra for its big data platform via rigorous testing, benchmarking, validation with third-party software, and defect resolution to ensure a chosen version is ready for production environments.

Continuously Available Hadoop for Analytics: The integrated Hadoop distribution is built on top of Cassandra and therefore contains no single points of failure, offers location-independent read/write capabilities, is extremely easy to use and set up, and can easily span multiple data centers and the cloud.

Fault-Tolerant Solr for Search: Solr, the most popular open source search platform, is enhanced by running on top of Cassandra so that it is completely fault-tolerant, scales well, and runs across multiple data centers with ease.

Workload Separation/Isolation: The mixed workload problem is solved in the platform as no workload (real time, analytics, search) competes with any other for compute resources; all are isolated and yet integrated together.

No Need for ETL: The need to extract-transform-load data from real time to analytic to search systems goes away as built-in replication keeps all data domains in sync.

Easy Data Migration: Migrating data from existing RDBMS systems is easy via built-in migration utilities. Third-party migration tools also support the platform.

Simple Log Integration: Application logging data is easily consumed via logging interfaces, and then can be analyzed with Hadoop or searched via Solr.

Enterprise Management Control: Administrators are instantly productive and save time with OpsCenter Enterprise, which allows management of all database clusters within a single interface.

Expert Support: Professional, around-the-clock support ensures questions get quickly answered and help is available so applications stay online.

Cost Efficient: Cost reductions over typical RDBMS vendors routinely run 80%.

Abstract

DataStax Enterprise Edition is a complete big data platform, built on a production-certified version of Apache Cassandra™. It manages real-time, batch analytics, and enterprise search data all in the same database cluster, and offers visual administration capabilities as well as around-the-clock expert support.

The release of DataStax Enterprise Edition 3.0 offers the most comprehensive security of any NoSQL solution provider: internal and external authentication, permissions management, transparent data encryption, contextual data auditing, and confidence that data in flight is secure. It also delivers the enterprise manageability that modern businesses need.

[1] Extracting Value from Chaos, IDC, June 2011: http://idcdocserv.com/1142.

Use Cases Handled by DataStax Enterprise

Because DataStax Enterprise is a comprehensive and integrated big data platform, it can handle use cases that have real-time, batch analytics, and enterprise search requirements. Use cases like these can be managed by DataStax Enterprise:

Real-Time:

Time series data

Device/Sensor/Data “exhaust” systems

Distributed applications

Media streaming

Online Web retail (transactional, shopping carts, etc.)

Real-time data analytics

Social media capture and analysis

Web click-stream analysis

Write-intensive transactional systems

Batch Analytics:

Buyer behavior analytics

Compliance/regulatory analysis

Customer recommendation output

Fraud detection

Risk analysis

Sales program campaign analysis

Supply chain analytics

Batch Web clickstream analysis

Enterprise Search:

General Web search

Web retail faceted (categorization) search

Search/hit prioritization and highlighting

Application log search and analysis

Document (PDF, MS Word, etc.) search and analysis

Geospatial search

Real estate location and property search

Social media match ups

What’s New in DataStax Enterprise Edition 3.0?

The following are new enhancements included with DataStax Enterprise Edition 3.0.

Security

An April 2012 InformationWeek special report entitled “Why NoSQL Equals No Security” [2] began by stating: “If it seems security is an afterthought at best in the big data ecosystem, you’re right.”

DataStax Enterprise 3.0 overcomes this perception: it is the first big data platform in the NoSQL industry to bring to the big data/NoSQL market the kind of enterprise security that traditional RDBMSs use to protect systems and important data. The following sections describe each aspect of DataStax Enterprise 3.0’s security feature set in more detail. Note that all security features are optional; the administrator can decide to use none, some, or all of them depending on the specific application.

Internal Authentication

Version 3.0 of DataStax Enterprise supports internal authentication, which allows administrators to easily create users who can be authenticated to Cassandra database clusters. Those migrating to DataStax Enterprise from RDBMSs will find the authentication framework extremely familiar. Administrators can use the CREATE USER command to create new users with passwords that are then managed internally by Cassandra and used to authenticate into a database cluster. User accounts may also easily be altered and dropped.

A default superuser, ‘cassandra’, is supplied to initially enable the security authentication definition process.

Client drivers and libraries that support the passing of credentials (e.g. Hector) will all work with internal authentication.
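The user-management commands described above can be sketched as follows. This is a minimal illustration, assuming the CQL user syntax of the Cassandra 1.2 line (CREATE USER ... WITH PASSWORD, ALTER USER, DROP USER). The helper names and the user 'laura04' are hypothetical, and the statements are only built as strings here; they would still need to be run through cqlsh or a client driver by an authenticated superuser.

```python
import re

# Assumption: user names are restricted to plain, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def cql_string(value):
    """Quote a value as a CQL string literal (embedded single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def _check_name(name):
    """Reject anything that is not a plain identifier to avoid statement injection."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def create_user(name, password, superuser=False):
    """Build a CREATE USER statement for an internally authenticated account."""
    role = "SUPERUSER" if superuser else "NOSUPERUSER"
    return f"CREATE USER {_check_name(name)} WITH PASSWORD {cql_string(password)} {role};"


def alter_user_password(name, new_password):
    """Build an ALTER USER statement that changes the account's password."""
    return f"ALTER USER {_check_name(name)} WITH PASSWORD {cql_string(new_password)};"


def drop_user(name):
    """Build a DROP USER statement."""
    return f"DROP USER {_check_name(name)};"
```

For example, `create_user("laura04", "s3cret")` yields a statement that creates a non-superuser account. A common first step after enabling authentication is to use the default 'cassandra' superuser to create a new administrative account and then change the default password.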

[2] http://reports.informationweek.com/abstract/2/8758/Business-Continuity/strategy-why-nosql-equals-nosecurity*.html

External Authentication

Corporations and organizations that use external, third-party security packages to manage security in their environment can easily integrate DataStax Enterprise 3.0 into their infrastructure. Kerberos, the most widely used and trusted external security software, is supported in version 3.0. Authentication with Kerberos covers Cassandra, Hadoop, and Solr, and primary DataStax Enterprise utilities and client drivers are also supported.

LDAP may also be used to supply Kerberos with user data. For more information on how to configure Kerberos and LDAP to work with DataStax Enterprise 3.0, please reference the DataStax online documentation.

Permission Management

Once a user is authenticated into a database cluster, using either internal or external authentication, the next security issue to be tackled is permission management; i.e. what can the user do inside the database? DataStax Enterprise 3.0 supplies easy-to-use authorization capabilities for Cassandra that use the very familiar GRANT/REVOKE security paradigm.

Control over DDL, DML, and SELECT operations is handled via the granting and revoking of user privileges. The permissions that each user account possesses, as well as the rights that have been granted on particular objects, can easily be determined with the CQL LIST commands.

For example, if an administrator wants to grant read privileges on a table named ‘EMP’ to a user named ‘LAURA04’, they would execute the following CQL command:

GRANT SELECT ON EMP TO LAURA04

Note that a GRANT may be done with or without the GRANT OPTION, which allows the user receiving the grant to provide the same privileges on that object to other users.
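The GRANT/REVOKE paradigm described above can be illustrated with a small in-memory model. This is a toy sketch, not how DataStax Enterprise stores or checks permissions: the class name, its methods, the `with_grant_option` parameter, and the second user 'BOB' are all hypothetical, and the only names taken from the paper are the 'cassandra' superuser, the 'EMP' table, and the 'LAURA04' user.

```python
from collections import defaultdict


class PermissionCatalog:
    """Toy model of GRANT/REVOKE semantics; assumptions are noted inline."""

    def __init__(self, superuser="cassandra"):
        self.superuser = superuser
        # (user, resource) -> {permission: may_regrant}
        self._grants = defaultdict(dict)

    def grant(self, grantor, permission, resource, grantee, with_grant_option=False):
        """Record a grant if the grantor is allowed to hand out this permission."""
        if not self._may_grant(grantor, permission, resource):
            raise PermissionError(f"{grantor} may not grant {permission} on {resource}")
        self._grants[(grantee, resource)][permission] = with_grant_option

    def revoke(self, permission, resource, user):
        """Remove a permission; revoking something never granted is a no-op."""
        self._grants.get((user, resource), {}).pop(permission, None)

    def is_allowed(self, user, permission, resource):
        """Assumption: the superuser is allowed everything."""
        if user == self.superuser:
            return True
        return permission in self._grants.get((user, resource), {})

    def list_permissions(self, user):
        """Return sorted (resource, permission) pairs, in the spirit of a LIST command."""
        return sorted(
            (resource, permission)
            for (grantee, resource), permissions in self._grants.items()
            if grantee == user
            for permission in permissions
        )

    def _may_grant(self, grantor, permission, resource):
        if grantor == self.superuser:
            return True
        # A non-superuser may re-grant only what it received with the grant option.
        return self._grants.get((grantor, resource), {}).get(permission, False)
```

In this model, `catalog.grant("cassandra", "SELECT", "EMP", "LAURA04")` corresponds to the GRANT statement above: LAURA04 can read EMP but cannot pass that right on unless the grant was made with `with_grant_option=True`.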

Transparent Data Encryption

Transparent Data Encryption (TDE) in version 3.0 of DataStax Enterprise protects data at rest from being stolen and used in an unauthorized manner. TDE may be a good option for objects containing sensitive information such as social security numbers, credit card information, etc.

Administrators can encrypt tables (column families) with a CQL command; 128-bit AES is the default, although other encryption algorithms can be used. Tables may also be decrypted with the same CQL command.

Encryption is transparent to all end user activities; data may be read, inserted, updated, etc., with nothing having to change from the application end.

In addition to data in Cassandra, Hadoop data may also be encrypted. As Hadoop data is stored in the Cassandra File System (CFS), which is made up of several column families, encrypting those tables in effect allows all Hadoop data to be encrypted and protected.

Data Auditing

DataStax Enterprise 3.0 supports the ability to configure data auditing so an administrator can understand what user activities took place on a particular node or entire cluster. Data auditing allows for a “who looked at what/when, who changed what/when” type of documentation that many large-scale enterprises need to have to comply with various internal or external security policies.

The data auditing in version 3.0 of DataStax Enterprise is not an inflexible afterthought, as the auditing subsystems of some databases are. Instead, an administrator has full control over what gets audited and where the audit information is written and stored. The activities that can be audited include:

All activity (DDL, DML, queries, errors)

DML only

DDL only

Security changes (assigning/revoking privileges, dropping users, etc.)

Queries only

Errors only (e.g. login failures)

An administrator can also omit certain keyspaces from being audited if they choose and only focus on keyspaces that are in production or are of particular interest.

Auditing is configured and written using DataStax Enterprise’s log4j interface, which ensures resource efficient auditing operations. Audit data can also be inserted into Cassandra tables and easily queried via CQL.
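The combination of audit categories and omitted keyspaces described above can be sketched with a logging filter. DataStax Enterprise configures this through log4j (Java); the Python `logging` code below only models the idea and is not the product's configuration. The class and function names, the category labels ("DDL", "DML", "ERROR", "ALL"), and the keyspace and user names are all assumptions made for illustration.

```python
import logging


class AuditCategoryFilter(logging.Filter):
    """Pass only records in an active category and outside exempt keyspaces."""

    def __init__(self, active_categories, exempt_keyspaces=()):
        super().__init__()
        self.active = {category.upper() for category in active_categories}
        self.exempt = set(exempt_keyspaces)

    def filter(self, record):
        if getattr(record, "keyspace", None) in self.exempt:
            return False
        return "ALL" in self.active or getattr(record, "category", None) in self.active


class ListHandler(logging.Handler):
    """Collect formatted audit lines in memory (stands in for a file or table)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def build_audit_logger(name, active_categories, exempt_keyspaces=()):
    """Create a logger whose single handler applies the audit filter."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(category)s|%(keyspace)s|%(user)s|%(message)s"))
    handler.addFilter(AuditCategoryFilter(active_categories, exempt_keyspaces))
    logger.addHandler(handler)
    return logger, handler


def audit(logger, category, keyspace, user, statement):
    """Emit one audit event; the extra fields feed the filter and the formatter."""
    logger.info(statement, extra={"category": category, "keyspace": keyspace, "user": user})
```

With `build_audit_logger("audit_demo", ["DDL", "ERROR"], exempt_keyspaces=["dev_ks"])`, a DDL statement against a production keyspace is recorded, while DML statements and anything touching `dev_ks` are dropped before they reach the sink.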

Client to Node Encryption

DataStax Enterprise 3.0 includes an optional secure form of communication from a client machine to a database cluster. Client-to-server SSL ensures that data in flight is not compromised and is transferred securely to and from client machines.
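As a generic illustration of what client-to-server SSL involves on the client side, the sketch below builds a certificate-verifying TLS context and wraps a TCP connection with it, using only Python's standard `ssl` and `socket` modules. It does not show a DataStax driver API; in practice the client driver performs this step from its own configuration. The function names are hypothetical, and any host, port, or CA file passed in is an assumption of the caller.

```python
import socket
import ssl


def build_client_tls_context(ca_file=None):
    """Return a client-side context that verifies the server certificate.

    ca_file is an optional path to the CA certificate that signed the node
    certificates; when omitted, the system trust store is used.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)


def open_encrypted_connection(host, port, context, timeout=5.0):
    """Open a TCP connection and upgrade it to TLS before any data is sent."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

The default context requires a valid certificate and checks the host name, so a client refuses to talk to a node that cannot prove its identity.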

External Security Firm Validation

DataStax contracted with the security industry expert group iSEC Partners to review the security implementation and feature set in DataStax Enterprise and ensure that no key security holes existed in the platform. The review concluded that no major security concerns exist in the DataStax Enterprise 3.0 platform.

Enterprise Manageability

DataStax Enterprise 3.0 makes it easier than ever to create and manage database clusters that manage big data, whether on premise, across multiple data centers, or in the cloud. Enhancements made to OpsCenter Enterprise, the visual, browser-based management solution that’s bundled as part of DataStax Enterprise, increase the enterprise-wide capabilities of the tool and make administrators more productive in carrying out operations such as creating new clusters, handling restore operations, and more.

Visual Cluster Provisioning

OpsCenter Enterprise 3.0 makes it very easy for administrators to create and stand up new clusters, both in their own IT environment and in the cloud. OpsCenter’s new provisioning interface supplies point-and-click management of what used to be a manual, command line driven process:

Figure 1: OpsCenter Provisioning Interface for Creating a Cloud Database Cluster

With OpsCenter 3.0, administrators can simply select the software they want installed on their new cluster, specify the makeup of the cluster (e.g. how many real-time, analytic, and search nodes), adjust any configuration parameters, and then OpsCenter will do the rest. The operations OpsCenter carries out include:

Downloading the specified software onto each target machine in the cluster.

Configuring the nodes as Cassandra, Hadoop, or Solr machines.

Setting the data distribution parameters and other properties for the cluster.

Starting the cluster.

Installing OpsCenter agents on each node.

Setting up the new cluster to be monitored by OpsCenter.

The new provisioning functionality in OpsCenter 3.0 makes it possible to create a new, multi-node cluster that is fully monitored in just minutes.

Further, version 3.0 of OpsCenter also includes new functionality to visually add new nodes to a cluster, modify configuration files on each node, and more. All of the new provisioning and management capabilities make the administrator much more productive and cut down on errors that might be made through manual processes.

Visual Restore and Custom Backup Management

Version 3.0 of OpsCenter Enterprise includes a visual restore feature for restoring database clusters from previously taken backups. Previous versions of OpsCenter contained backup capabilities, but restore functionality was absent until now.

The restore utility allows an administrator to visually click through restoring a cluster from a snapshot that they select. Full snapshot restores bring a cluster back from the full backup that was taken, with all keyspaces being restored.

OpsCenter also offers more granular restore functionality than full snapshot restore. Object-level restore allows an administrator to select only the objects they want to restore from one or more keyspaces. This helps in cases where the data of only one table (column family) has been deleted or changed in error.

Figure 2: OpsCenter running a cluster restore operation

In addition to new restore capabilities, OpsCenter now provides the ability to customize backup operations. Users have the ability to specify scripts in the backup interface that can run before and after a backup job. This allows an administrator to easily specify prerequisite tasks that need to run before a backup as well as perform various operations (e.g. moving snapshot files to other physical locations) after a backup.
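The pre- and post-backup script idea described above can be sketched as follows. This is not OpsCenter's implementation or API: the function name and parameters are hypothetical, the snapshot step is represented by a caller-supplied function, and the choice to abort the backup when the pre-backup script fails is an assumption made for this illustration.

```python
import subprocess
import sys


def run_backup_job(take_snapshot, pre_script=None, post_script=None):
    """Run an optional pre-backup script, the snapshot, then an optional post-backup script.

    pre_script and post_script are argument lists such as ["/path/to/script.sh"].
    A non-zero exit status raises subprocess.CalledProcessError, so a failing
    pre-backup script prevents the snapshot from being taken (assumed behavior).
    """
    if pre_script is not None:
        subprocess.run(pre_script, check=True)
    take_snapshot()
    if post_script is not None:
        # Typical use: move the snapshot files to another physical location.
        subprocess.run(post_script, check=True)


if __name__ == "__main__":
    # Placeholder scripts that do nothing, run with the current interpreter.
    noop = [sys.executable, "-c", "pass"]
    run_backup_job(lambda: print("snapshot taken"), pre_script=noop, post_script=noop)
```

Keeping the hooks as external scripts means prerequisite checks and post-processing can change without touching the backup schedule itself.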

Enhanced Visual Object Management

OpsCenter 3.0 contains enhanced object management capabilities that allow users to visually create, edit, and drop keyspaces, tables (column families), and indexes. This means that developers and administrators don’t have to know the CQL syntax needed to create and alter objects, but rather they can point and click their way through building and maintaining database objects, and do so without worrying about syntax errors.

Automated Problem Diagnostic Collection

To help troubleshoot issues in a database cluster more quickly, OpsCenter 3.0 supplies a single button that collects all relevant system logs and other diagnostic information into a single package, which is then sent to DataStax support engineers for analysis. The automatic collection and presentation of all needed diagnostic data helps DataStax support staff identify and resolve database issues more quickly and return a cluster to the operating state the customer expects.

Off Cluster Metadata Repository

By default, OpsCenter stores the metadata it collects about a cluster on the cluster itself. However, in version 3.0 of OpsCenter, administrators can move OpsCenter’s information off the managed cluster and onto a different cluster. This helps reduce any I/O overhead on busy and resource intensive clusters where every drop of performance is needed.

Solr Upgrade

DataStax Enterprise 3.0 contains a major update for Solr, which is used for enterprise search operations. Included in version 3.0 is an upgrade to Solr 4.0, which contains many new features and maintenance fixes.

Conclusion

DataStax Enterprise Edition 3.0 supplies the most comprehensive security feature set of any NoSQL solution provider and also delivers the type of enterprise manageability that modern businesses need to efficiently manage their big data systems.

For more information on DataStax Enterprise Edition 3.0, visit www.datastax.com for downloads, online documentation, and more. Note that DataStax Enterprise Edition 3.0 may be downloaded and used free of charge in development environments with no restrictions (e.g. data size, RAM, CPU, etc.); however, production deployments require the purchase of a subscription. For information on subscription pricing, please send an email to DataStax at [email protected].

About DataStax

DataStax provides a massively scalable big data platform to run mission-critical business applications for some of the world’s most innovative and data-intensive enterprises. Powered by the open source Apache Cassandra™ database, DataStax delivers a fully distributed, continuously available platform that is faster to deploy and less expensive to maintain than other database platforms.

DataStax has more than 250 customers including leaders such as Netflix, Rackspace, Pearson Education, and Constant Contact, and spans verticals including web, financial services, telecommunications, logistics, and government. Based in San Mateo, Calif., DataStax is backed by industry-leading investors including Lightspeed Venture Partners, Meritech Capital, and Crosslink Capital.

For more information, visit www.datastax.com.

DataStax powers the big data apps that transform business for more than 200 customers, including startups and 20 of the Fortune 100. DataStax delivers a massively scalable, flexible and continuously available big data platform built on Apache Cassandra™. DataStax integrates enterprise-ready Cassandra, Apache Hadoop™ for analytics and Apache Solr™ for search across multi-datacenters and in the cloud.

Companies such as Adobe, Healthcare Anytime, eBay and Netflix rely on DataStax to transform their businesses. Based in San Mateo, Calif., DataStax is backed by industry-leading investors: Lightspeed Venture Partners, Crosslink Capital and Meritech Capital Partners. For more information, visit DataStax.com or follow us on Twitter @DataStax.

777 Mariners Island Blvd #510 San Mateo, CA 94404 650-389-6000