Big Data Demands Big Data Quality

Why data quality is critical to deliver actionable big data insights



Big data fuels innovation by revealing transformative business insights from very large and disparate data sets. With big data, you’re able to ask questions that were never economically feasible to ask before.

Improving the quality of this data gives you the confidence to trust these insights. And that makes it far easier to:

– Enhance customer experiences.

– Make better strategic decisions, faster.

– Seize new opportunities and accelerate disruption.

– Streamline business processes to make savings.

That’s why big data quality initiatives are more than IT exercises. They’re a proven way to deliver tangible strategic value, quickly.


Why Big Data Quality Matters



Organizations around the world are taking on big data projects for strategic reasons:

– To enhance the customer experience by delivering better recommendations and offers in real time.

– To get predictive—such as “next best steps” for sales.

– To grow revenue by optimizing prices in real time.

– To reduce costs through AI-powered preventative maintenance.

– To reduce security risk by detecting fraudulent behavior more rapidly.

These are all crucial initiatives. And they demonstrate an important shift: big data projects are no longer just one-off experiments. They’re making an increasingly meaningful contribution to enterprises’ top and bottom lines.

As big data grows in value, big data quality is becoming increasingly important. Executives need to trust the data in reports and, more often than not, they don’t or can’t. A recent report from Forbes Insights found that 84 percent of CEOs are concerned about the quality of data used to make strategic decisions.1

Poor data quality costs the US economy $3.1 trillion a year.2

This is a big problem—and an even bigger opportunity—for every enterprise.

In this eBook, we look at the impact of data quality processes on big data programs and make the case for effective big data quality.


Data Quality Goes Big

“ The data unit is no longer the Gigabyte and Terabyte, but the Petabyte (1PB = 2¹⁰TB), Exabyte (1EB = 2¹⁰PB), and Zettabyte (1ZB = 2¹⁰EB).”

Data Science Journal3

1 Forbes Insights, “The Data Differentiator: How Improving Data Quality Improves Business”, May 2017
2 IBM, “IBM Data Engine for Hadoop and Spark”, Aug 2016
3 Data Science Journal, “The Challenges of Data Quality and Data Quality Assessment in the Big Data Era”, 2015



Why data lakes need intelligent data management for data quality

A lot of enterprises see data lakes as a solution to all their data problems. But just ingesting data into these repositories won’t clean and categorize that data for you.

In fact, data lakes are just a way to gather all of your data before it’s been cleaned and structured. But the cleaning and structuring still needs to happen at some point if you’re going to turn that data into valuable insights. And even small data quality errors can have a dramatic impact on the accuracy and consistency of these insights.

It’s worth considering intelligent data lake management solutions that use machine learning and artificial intelligence to automate the detection and correction of inaccurate or inconsistent data. That way you can populate your data lakes with the trusted data needed for self-service analytics.
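The eBook doesn’t prescribe a particular detection technique, but the kind of automated checking described above can be illustrated with a simple, rule-of-thumb sketch. Here, a median-based (MAD) modified z-score test flags values that deviate sharply from the rest of a field; the record layout and threshold are illustrative assumptions, not a description of any product:

```python
import statistics

def flag_anomalies(records, field, threshold=3.5):
    """Flag suspect values of `field` using a modified z-score test.

    A deliberately simple stand-in for automated anomaly detection:
    values whose median-based (MAD) z-score exceeds the customary 3.5
    cutoff are reported for review, as are missing values.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    suspects = []
    for r in records:
        v = r.get(field)
        if v is None:
            suspects.append((r, "missing value"))
        elif mad > 0 and 0.6745 * abs(v - med) / mad > threshold:
            suspects.append((r, "statistical outlier"))
    return suspects

# Hypothetical feed: one wildly implausible order amount among normal ones.
orders = [{"amount": a} for a in [10, 12, 11, 9, 13, 10, 11, 9_999_999]]
for record, reason in flag_anomalies(orders, "amount"):
    print(record, reason)  # the 9,999,999 record is surfaced as an outlier
```

Real intelligent data management tools go far beyond a single statistical test, learning patterns across many fields and sources, but the principle is the same: suspect data is surfaced for review instead of flowing silently into the lake.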



Three Reasons Big Data Needs Data Quality

Low-quality data can stymie big data programs, or even stop them in their tracks. Here’s why:



1. It destroys trust

The success of a big data program depends entirely on the reliability of its output, and just one error can have a huge impact. For example, if an executive spots a mistake in a report, you’ll have to demonstrate that all of your data and big data processes can be trusted. This can be a long and costly process and, even then, you may struggle to win back the trust of senior management.

2. It wastes time

When data quality is low, analysts end up wasting time wrangling data. They have to clean each data entity manually, which means they have less time to use, analyze, and experiment with the data itself.

This problem is exacerbated when your analytics programs reach big data scales. The volume, variety, and velocity of data means your expensive data scientists can spend up to 80 percent of their time on quality issues.4

3. It leads to bad decisions

Big data is used to make big decisions: about customer experience, new investments, business operations, and growth opportunities; the list is almost endless.

When this data is flawed, the fallout can be serious. It’s not just your program that’s at risk, your whole business may suffer. A recent investigation into a data broker found that 71 percent of its customer data was inaccurate.5 Imagine if your business was using this data. How many critical decisions would depend on it? What would happen if you made the wrong call?


4 CrowdFlower, “Data Science Report”, 2016
5 Deloitte, “Predictably inaccurate: The prevalence and perils of bad big data”, July 2017



There are numerous reasons for data quality issues, but they can usually be traced back to three root causes.

Data is ingested incorrectly

Ingesting data into your big data platform is a critical juncture for data quality. If someone makes a mistake here, it can have serious repercussions as data proliferates across your business.

You need to ensure this data is vetted before it enters your data lake and held to strict data quality standards defined by both business and IT stakeholders. Without this collaboration, it’s likely that your standards will be ineffective and impractical. After all, it’s the business that uses and depends on data, and it’s your team on the frontline who truly understand what high-quality data looks like.

Unreliable sources

Big data comes from many sources: data warehouses, third-party feeds, the IoT, consumer devices, and internal applications, and it’s often streaming into your data lake via automated feeds. You want to capture and categorize as much of this data as possible. The problem is that much of it will be of low or inconsistent quality, especially if it originated outside your business.

The only way to safeguard against quality degradation is to apply your data quality standards to every entity that enters your big data environment. This may consume a significant amount of time and resources, so it’s worth automating manual processes and considering external help.
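As a rough illustration (not from this eBook) of what applying data quality standards to every incoming entity might look like, here is a minimal rule-based validation step at ingestion time. The rules, field names, and records are all hypothetical:

```python
# Hypothetical, minimal data quality rules applied at ingestion time.
RULES = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "country": lambda v: isinstance(v, str) and len(v) == 2,  # ISO alpha-2 code
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
}

def validate(record):
    """Return the fields of `record` that fail their rule (empty = clean)."""
    return [f for f, rule in RULES.items() if f in record and not rule(record[f])]

def ingest(records):
    """Split an incoming feed into clean records and quarantined ones."""
    clean, quarantined = [], []
    for r in records:
        failures = validate(r)
        (quarantined if failures else clean).append((r, failures))
    return clean, quarantined

feed = [
    {"email": "ada@example.com", "country": "GB", "age": 36},
    {"email": "not-an-email", "country": "United Kingdom", "age": 36},
]
clean, quarantined = ingest(feed)
print(len(clean), len(quarantined))  # 1 1
```

In a production pipeline these rules would be shared, versioned artifacts agreed by business and IT stakeholders, so the same checks can be reapplied automatically to every feed.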

Causes of poor-quality data


Poor maintenance

Data is always evolving—and degrading. It may enter your business as a high-quality asset only to deteriorate over time. Often the problem is human error. One of the many people who handle your data can easily mistype a customer name or enter the wrong value in a specific field and—bam!—that data is compromised. Then, as this data proliferates across your business, these mistakes can evolve into a major problem. But data also expires. For example, people and companies change addresses all the time.

The obvious solution is data governance. Ideally, you want to establish a culture of collaborative data governance throughout your enterprise, so that the discipline becomes a part of daily life. The first step to achieving this is to communicate the importance and value of data quality processes to all data owners and users.

Read our eBook, How to Govern Your Data as a Business Asset, for more guidance on creating a data governance program that works for everyone.




Four Dimensions of Big Data Quality

6 First San Francisco Partners, “Is Data Quality Important for Big Data?”, May 2017

With big data, quality issues scale up alarmingly: a database with a billion records and an error rate of only one percent will include ten million inaccuracies. For directional, trend data, that might be acceptable. For penny-perfect financial or compliance use cases, it isn’t.
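The arithmetic behind that claim is easy to verify, and worth running for your own record counts and error rates (the figures below are purely illustrative):

```python
def expected_errors(record_count, error_rate):
    """Expected number of inaccurate records at a given error rate."""
    return round(record_count * error_rate)

# One percent sounds small until it is applied at big data scale:
for n in (1_000_000, 1_000_000_000):
    print(f"{n:>13,} records -> {expected_errors(n, 0.01):>10,} expected errors")
```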

That’s why many large enterprises approach big data quality in tiers:

– One tier may focus on data innovation, where faster data can matter more than perfect data. Enterprises may use “good enough” data for data lake experiments, big-picture insights, and rapid iteration. This is a world of “fail fast and move on” until you find an insight that should be repeatable.

– A higher priority tier may be data operationalization. This demands highly trusted data for ongoing uses such as decision-making, forecasting, compliance, customer interactions, and financial analysis.

In either case, big data needs to be fit-for-purpose, so that developers and analysts can experiment when they need to, and strategists can tap into all the competitive advantages that come with big data-driven insights.

“ Data scientists are becoming aware that data that’s suitable for operations, for example, may not be suitable for deeper analysis.”

First San Francisco Partners6



The first step is deciding which data is worth cleaning and which isn’t. For instance, you may not need to standardize naming conventions from social media data if you’re running sentiment analysis. But you will need a reliable view of names and addresses if you’re investing in fraud detection.

Whichever changes you prioritize, you will need a uniform way to make data trusted, relevant, and governed at the scale of big data. Specifically, you need an approach to data quality that:

Democratizes big data

Data administrators and analysts can’t scale as fast as big data itself. So big data quality needs to empower business users to profile data, spot issues, and even make changes on their own. That calls for automation (including artificial intelligence and prebuilt data quality rules), as business users have neither the time nor the skill to tackle the manual work these tasks involve.

Works universally

If your teams are identifying the same issues in multiple data sets, they shouldn’t have to reinvent the wheel every time. The remediation you deploy in one system should be reusable in another (whether on-premises, in the cloud, or across hybrid environments). That means you’ll need standardized rules for specific domains like customer data so different lines of business are aligned and proven processes can be reapplied.

Embraces new technologies and data sets

Big data technologies, business models, and even data types are constantly changing. For instance, the move from MapReduce to Spark gave enterprises order-of-magnitude improvements in speed and performance. To take advantage of these changes, you need all of your existing data quality rules, logic, and metadata to be easily applied to any new technology or system.

Makes it easy to profile and track big data quality

Developers and analysts need to be able to profile data to understand its state—they may not need to clean the data, but they do need to know what they’re working with. Dashboards that provide quick insights into data quality are useful for both the people working with big data and the IT leaders who monitor and govern big data projects.
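As a rough sketch of the kind of summary such a dashboard might surface, the snippet below computes two basic quality indicators per field: completeness (share of non-empty values) and distinct-value counts. The sample records are invented for illustration:

```python
def profile(records):
    """Compute per-field completeness and distinct-value counts."""
    fields = {f for r in records for f in r}
    total = len(records)
    stats = {}
    for f in sorted(fields):
        present = [r.get(f) for r in records if r.get(f) not in (None, "")]
        stats[f] = {
            "completeness": len(present) / total,  # share of non-empty values
            "distinct": len(set(present)),         # cardinality of the field
        }
    return stats

# Invented sample: one customer record is missing its city.
customers = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": ""},
    {"name": "Ada", "city": "London"},
]
for field, s in profile(customers).items():
    print(field, s)
```

Even a summary this simple answers the question profiling is meant to answer: what state is this data in before I build on it?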

The right approach to big data quality makes it easier to empower data users, enables IT to move faster, and treats rapid innovation projects differently than it treats “enterprise-ready” projects. It also brings potentially disconnected elements of your organization together. For example, everyone—both within IT and the business—can collaborate on rule creation and business glossary definition.



Conclusion

The enterprises that are serious about big data are the ones that understand its potential to accelerate innovation and deliver big wins.

In some industries, like insurance, enterprises are using it to deliver better premiums and more dynamic services that react to individual customers’ behaviors. In other cases, they’re using big data to power transformative initiatives with the Internet of Things.

But in every case, it’s becoming clear that big data is more than a typical IT trend—it’s a powerful leap forward for enterprises trying to disrupt before they get disrupted.

That’s why big data quality has become such a big deal. It allows developers and analysts to innovate and experiment as quickly as they need to, without getting in IT’s way. And it gives executives the confidence they need to trust the data in their reports and make better decisions, faster.

Your big data project may well be one of the most exciting and important projects you (and your enterprise) are working on. So making sure it’s fueled by trusted, timely, and actionable data isn’t just a sensible move, it’s the best way to ensure you succeed.

If Big Data Matters, Big Data Quality Matters



Further Reading

Ten Data Quality Dividends eBook

Data quality impacts every corner of every enterprise. And it’s especially important in enterprises getting serious about innovation and transformation.

Read Ten Data Quality Dividends to learn about some of the big, small, and surprising ways data quality impacts your business.


GET THE EBOOK



IN18-0918-3392

© Copyright Informatica LLC 2018. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the United States and other countries.

About Informatica

Digital transformation changes expectations: better service, faster delivery, with less cost. Businesses must transform to stay relevant and data holds the answers.

As the world’s leader in Enterprise Cloud Data Management, we’re prepared to help you intelligently lead—in any sector, category or niche. Informatica provides you with the foresight to become more agile, realize new growth opportunities or create new inventions. With 100% focus on everything data, we offer the versatility needed to succeed.

We invite you to explore all that Informatica has to offer—and unleash the power of data to drive your next intelligent disruption.

Worldwide Headquarters 2100 Seaport Blvd, Redwood City, CA 94063, USA Phone: 650.385.5000 Fax: 650.385.5500 Toll-free in the US: 1.800.653.3871

informatica.com linkedin.com/company/informatica twitter.com/Informatica

CONTACT US