

Cask Solutions: customers.cask.co/rs/882-OYR-915/images/Cask_Solution-Brief_final.…

Cask Data Application Platform (CDAP)

The first unified integration platform for big data, Cask Data Application Platform (CDAP) lets architects, developers and data scientists focus on applications and insights rather than infrastructure and integration. CDAP accelerates time to value from Hadoop and Spark through standardized APIs, configurable templates and visual interfaces. It enables IT organizations to broaden the big data user base within the enterprise with a radically simplified developer experience and a code-free self-service environment. CDAP is 100% open source, Hadoop-native and seamlessly integrates with existing MDM, BI and security solutions.

Cask Solutions

Managed Data Lake

The concept of a data lake is often challenging for data and analytics leaders to understand. For some, the data lake is simply a staging area that is accessible to several categories of business users. To others, the “data lake” is a proxy for “not a data warehouse.” While data warehousing is highly structured and needs schema to be defined before data is stored, data lakes take the opposite approach. They collect information in a near-native form without considering semantics, quality, or consistency. Data collected in data lakes is defined as schema-on-read, where the structure of the data is not known upfront but is evaluated through discovery when it is read. Data lakes unlock value that was previously unattainable or hard to achieve. Even with the multiple benefits that data lakes provide, there are substantial challenges because:
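The schema-on-read idea described above can be sketched in a few lines of Python. This is an illustrative toy, not CDAP code: raw records of differing shapes land in the lake unmodified, and each consumer applies its own schema only at read time (the field names are invented for the example).

```python
import json

# Raw events land in the lake as-is; no schema is enforced at write time.
raw_records = [
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "sensor_temp": 21.5}',  # different shape, still accepted
]

def read_with_schema(record: str, fields: list) -> dict:
    """Apply a schema at read time: keep only the fields this consumer
    needs, filling in None for anything the raw record never had."""
    data = json.loads(record)
    return {f: data.get(f) for f in fields}

# The same raw data can be read under different schemas by different consumers.
clickstream_view = [read_with_schema(r, ["user", "clicks"]) for r in raw_records]
```

Contrast this with a warehouse, where the second record would have been rejected (or forced into a rigid table) at write time.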

• Technology is complex, with many open source alternatives
• Big data talent is expensive
• Point solutions are limited and cannot easily be put into production, often requiring custom integration code
• Petabytes of data distributed across a cluster complicate operations, security, and data governance
• Lack of structure and pre-processing can result in data swamps

A significant part of improving the adoption of data lakes is to provide comprehensive, built-in support to both deploy and maintain a data lake over time. At Cask, we have created a Unified Integration Platform for big data that provides all aspects of data lake management, including data integration, metadata and lineage, security and operations, and app development on Hadoop and Spark. This means that companies can focus on application logic and insights instead of infrastructure and integration.

EDW Offload

The cost of maintaining a traditional Enterprise Data Warehouse (EDW) is skyrocketing as legacy systems buckle under the weight of exponentially growing data and increasingly complex processing needs. Hadoop has massive horizontal scalability along with system-level services that allow developers to free up storage and processing power from premium EDW platforms, increasing the ability for IT organizations to meet SLAs.

But Hadoop is not a complete ETL (Extract-Transform-Load) solution. While the platform offers powerful utilities and virtually unlimited horizontal scalability, it does not provide the complete set of functionality users need for enterprise ETL. In most cases, these gaps must be filled manually with a large number of lines of complex coding, slowing Hadoop adoption and frustrating organizations eager to deliver results.

CDAP fills the gaps between Hadoop and ETL needs. Cask’s EDW Offload solution package comes with pre-built pipelines that can be edited in a studio environment consisting of drag-and-drop sources, transforms, analytics, sinks and actions. The bulk load pipeline reads data from your EDW, transforms fields in the data using the wrangling tool, and writes it to a denormalized table on Hadoop. After you have migrated data and workloads to Hadoop, CDAP ensures extensive security and sophisticated governance across your datasets. This is done by automating the collection of metadata from ingestion to data delivery. Additionally, CDAP provides access controls that meet typical security needs of the enterprise.
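The source-transform-sink flow of the bulk load pipeline can be sketched as plain Python. The stage functions, sample rows and dimension table below are all hypothetical stand-ins, not CDAP plugin APIs; the point is only the shape of the pipeline: read from the warehouse, wrangle fields, denormalize, write to Hadoop.

```python
def edw_source():
    """Stand-in for a JDBC read from the warehouse (rows are invented)."""
    yield {"cust_id": 1, "name": " Alice ", "region_id": 7}
    yield {"cust_id": 2, "name": "Bob", "region_id": 9}

# Hypothetical dimension table joined in during the offload.
REGIONS = {7: "EMEA", 9: "APAC"}

def wrangle(row):
    """Field-level cleanup, as a wrangling step would do."""
    row["name"] = row["name"].strip()
    return row

def denormalize(row):
    """Fold the region name in so the Hadoop table needs no lookup at query time."""
    row["region"] = REGIONS.get(row.pop("region_id"), "UNKNOWN")
    return row

def run_pipeline():
    sink = []  # stand-in for a write to a denormalized Hadoop table
    for row in edw_source():
        sink.append(denormalize(wrangle(row)))
    return sink
```

In the actual product these stages are configured visually rather than coded; the sketch just makes the data flow concrete.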


About Cask

Cask makes building and running big data solutions on-premise or in the cloud easy with Cask Data Application Platform (CDAP), the first unified integration platform for big data. CDAP reduces the time to production for data lakes and data applications by 80%, empowering the business to make better decisions faster. Cask customers and partners include AT&T, Cloudera, Ericsson, Lotame, Salesforce, and Tableau, among others. For more information, visit the Cask website at cask.co and follow @caskdata.

Real-Time Analytics and IoT

The world is projected to have 50 billion connected devices by 2020. Connected devices could become the primary sources of data on the Internet, and that data can in turn be woven into applications to detect events, improve safety and drive business outcomes. In order for organizations to capitalize on this large source of streaming data and the Internet of Things (IoT), enterprise architects need to design an infrastructure that can perform continuous and timely processing of data in motion.

Cask’s Event-Condition-Action (ECA) Framework for IoT solution is offered in the form of a real-time rules engine built to enable enterprises to immediately act on huge volumes of data. Events are handled by a rules engine, which can automate actions in many different business scenarios. The visual code-free interface allows a business user to deploy rules that define both primitive and composite events, while satisfying critical enterprise requirements around security, encryption, and data lineage.
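The Event-Condition-Action pattern itself is simple to illustrate. The sketch below is a generic ECA engine in Python, not Cask's implementation: each rule pairs a condition predicate with an action, and the engine fires every rule whose condition matches an incoming event. Rule names, event fields and thresholds are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    """One ECA rule: if condition(event) holds, perform action(event)."""
    name: str
    condition: Callable[[Dict], bool]
    action: Callable[[Dict], str]

def run_engine(rules: List[Rule], event: Dict) -> List[str]:
    """Evaluate all rules against one event; return the actions taken."""
    return [r.action(event) for r in rules if r.condition(event)]

# Two hypothetical rules a business user might define visually.
rules = [
    Rule("high_temp",
         condition=lambda e: e.get("temp_c", 0) > 80,
         action=lambda e: f"alert:device={e['device']}"),
    Rule("heartbeat",
         condition=lambda e: e.get("type") == "ping",
         action=lambda e: f"log:{e['device']}"),
]

actions = run_engine(rules, {"device": "pump-3", "type": "reading", "temp_c": 91})
```

Composite events (e.g. "three high readings within a minute") extend this by letting conditions look at a window of events rather than a single one.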

Healthcare Reporting

The Healthcare Effectiveness Data and Information Set (HEDIS) is a tool used by more than 90% of America’s health plans to measure performance on important dimensions of care and service. HEDIS consists of 81 measures across 5 domains of care, which are specifically defined so that it is possible to compare the performance of health plans on an “apples to apples” basis. Thus, it is absolutely essential for healthcare providers to have an efficient and accurate way to calculate their HEDIS measures. However, healthcare reporting today is a highly manual process that is run annually and often outsourced to third-party processors. As a result, the reports are not only expensive but also not actionable.

With Cask’s solution for HEDIS Healthcare Reporting, the measures are calculated automatically. Any updates to how HEDIS is measured are applied automatically and can be performed as new data comes in, so providers can take corrective action. Here are some of the key benefits:

• Rapid and code-free ingestion of HEDIS source data (HL7, XML, and others) using CDAP’s visual pipeline editor, in batch or real-time

• High-performance processing of HEDIS measures using Spark while providing a repeatable, standardized, and automatic capture of provenance

• Rich visualization of calculated HEDIS measures in 3rd party tools like Excel, Tableau or other BI tools

• CDAP automatically modifies the HEDIS measures annually as they are updated by NCQA
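Most HEDIS measures reduce to a rate: a numerator (members who received a service) over a denominator (members eligible for it). The sketch below shows that structure with invented member records and an invented eligibility band; it is not an actual NCQA measure specification.

```python
# Hypothetical member records; "screened" stands in for whatever service
# the measure tracks.
members = [
    {"id": "m1", "age": 54, "screened": True},
    {"id": "m2", "age": 61, "screened": False},
    {"id": "m3", "age": 33, "screened": True},  # outside the eligible age band
]

def hedis_rate(members, eligible, in_numerator):
    """Generic HEDIS-style rate: numerator members / eligible members."""
    denom = [m for m in members if eligible(m)]
    numer = [m for m in denom if in_numerator(m)]
    return len(numer) / len(denom) if denom else 0.0

rate = hedis_rate(
    members,
    eligible=lambda m: 50 <= m["age"] <= 75,  # invented eligibility criterion
    in_numerator=lambda m: m["screened"],
)
```

Recomputing such rates continuously as new claims and clinical data arrive is what turns an annual report into an actionable signal.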

Customer 360

As enterprises introduce mobile apps, online communities, chat, social media and more into business, it’s important to analyze how customers use these platforms. Each click provides behavior insight, enabling companies to analyze user buying patterns, enhance offerings, improve marketing, conduct predictive modeling and ultimately sell more. Developing this Customer 360 view relies on tools like social media listening, predictive analytics, CRM suites, marketing automation, etc.

Web analytics takes time to build, requiring multiple steps across numerous sources. For instance, aggregating user data with historical customer demographics and other recorded data is a challenge. In addition, data entry, quality, management and merging are crucial, as is properly dealing with human errors, fuzzy matches and incomplete records.

Cask simplifies building end-to-end data pipelines, including ingesting, blending and aggregating data from varied source feeds. CDAP provides a simple and consistent platform to build, deploy and manage complex data analytics applications in the cloud or on-premise.
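The fuzzy-match problem mentioned above can be made concrete with a small record-linkage sketch. This uses Python's standard-library string similarity rather than any Cask tooling, and the records and 0.7 threshold are purely illustrative; production matching would weigh multiple attributes, not just names.

```python
import difflib

# Two hypothetical sources holding the same customer under different names.
crm = [{"name": "Jonathan Smith", "email": "jsmith@example.com"}]
web = [{"name": "Jon Smith", "last_page": "/pricing"}]

def similar(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, case-insensitive."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def merge_profiles(crm, web, threshold=0.7):
    """Build unified profiles by fuzzy-matching names across sources."""
    unified = []
    for c in crm:
        for w in web:
            if similar(c["name"], w["name"]) >= threshold:
                unified.append({**c, **w})  # web fields win on collisions
    return unified

profiles = merge_profiles(crm, web)
```

An exact join on name would miss this pair entirely; tolerating near-matches (and deciding which source wins on conflicting fields) is the crux of assembling a Customer 360 view.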