Data profiling using Informatica
TRANSCRIPT
8/10/2019 Data Profiling Using Informatica
Data profiling using Informatica: Data Explorer
Data profiling is a technique used to examine data for different purposes like determining accuracy
and completeness. This process examines a data source such as a database to uncover the
erroneous areas in data organization. Deployment of this technique improves data quality.
Data profiling is the method of examining the data available in a data source and collecting statistics and information about that data. Such statistics help to identify the use and data quality of metadata.
This method is widely used in enterprise data warehousing.
Data profiling clarifies the structure, relationship, content and derivation rules of data, which aid in
the understanding of anomalies within metadata. Data profiling uses different kinds of descriptive
statistics including mean, minimum, maximum, percentile, frequency and other aggregates such as
count and sum. The additional metadata information obtained during profiling is data type, length,
discrete values, uniqueness and abstract type recognition.
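The descriptive statistics listed above can be sketched for a single column as follows. This is a minimal illustration using only the Python standard library; the function name `profile_column` and the sample `prices` column are assumptions made for the example and do not come from Informatica or any other profiling tool.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return basic profile statistics for one column of numeric values.

    None entries are treated as missing and excluded from the statistics.
    """
    present = [v for v in values if v is not None]
    frequency = Counter(present)
    return {
        "count": len(present),
        "sum": sum(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "distinct": len(frequency),
        # A column is unique when no value occurs more than once.
        "is_unique": len(frequency) == len(present),
        # Most frequent value together with its number of occurrences.
        "most_common": frequency.most_common(1)[0],
    }


# Hypothetical sample column: list prices with one missing entry.
prices = [10, 20, 20, 30, None, 40]
profile = profile_column(prices)
print(profile)
```

A real profiling tool computes these figures for every column of a source at once; the sketch only shows what each statistic means.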
1. Find out whether existing data can easily be used for other purposes
2. Improve the ability to search the data by tagging it with keywords, descriptions, or assigning
it to a category
3. Give metrics on data quality, including whether the data conforms to particular standards or
patterns
4. Assess the risk involved in integrating data for new applications, including the challenges
of joins
5. Assess whether metadata accurately describes the actual values in the source database
6. Understand data challenges early in any data-intensive project, so that late project
surprises are avoided. Finding data problems late in the project can lead to delays and cost
overruns.
7. Have an enterprise view of all data, for uses such as master data management, where key
data is needed, or data governance, for improving data quality.
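One way to picture the join risk mentioned in item 4 is to compare the key values of two tables before integrating them. The sketch below is only an illustration under assumed names: `assess_join`, `customer_ids`, and `order_customer_ids` are hypothetical and not part of any tool discussed here.

```python
from collections import Counter


def assess_join(parent_keys, child_keys):
    """Summarize how safely child rows can be joined to parent rows."""
    parent_counts = Counter(parent_keys)
    parent_set = set(parent_counts)
    # Child keys with no matching parent row would be lost in an inner join.
    orphans = [k for k in child_keys if k not in parent_set]
    # Duplicate parent keys would multiply child rows in a join.
    duplicates = sorted(k for k, n in parent_counts.items() if n > 1)
    matched = len(child_keys) - len(orphans)
    return {
        "duplicate_parent_keys": duplicates,
        "orphan_child_keys": sorted(set(orphans)),
        "match_rate": matched / len(child_keys) if child_keys else 1.0,
    }


# Hypothetical keys: customer ids in a customer table and in an orders table.
customer_ids = [1, 2, 3, 3]
order_customer_ids = [1, 1, 2, 5]
report = assess_join(customer_ids, order_customer_ids)
print(report)
```

A low match rate or a non-empty list of duplicate keys is a sign that the join needs cleanup work before the data can be integrated.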
Data profiling is an analysis of the candidate data sources for a data warehouse to clarify the
structure, content, relationships and derivation rules of the data.
Data profiling utilizes different kinds of descriptive statistics such as minimum, maximum, mean,
mode, percentile, standard deviation, frequency, and variation as well as other aggregates such as
count and sum.
Additional metadata information obtained during data profiling could be data type, length, discrete
values, uniqueness, occurrence of null values, typical string patterns, and abstract type
recognition.[2][4][5] The metadata can then be used to discover problems such as illegal values,
misspelling, missing values, varying value representation, and duplicates.
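As a rough illustration of how null counts, string patterns, and duplicates can expose such problems, the sketch below reduces each value to a shape mask and counts the shapes. The helper names `string_pattern` and `profile_strings`, the mask letters `A` and `9`, and the sample `codes` are all assumptions made for this example.

```python
import re
from collections import Counter


def string_pattern(value):
    """Reduce a string to a shape mask: letters become A, digits become 9."""
    masked = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", masked)


def profile_strings(values):
    """Count missing values, shape patterns and case-insensitive duplicates."""
    # Treat None and blank strings as missing; trim the rest.
    present = [v.strip() for v in values if v is not None and v.strip()]
    patterns = Counter(string_pattern(v) for v in present)
    # Lower-casing groups values that differ only in representation.
    normalized = Counter(v.lower() for v in present)
    return {
        "nulls": len(values) - len(present),
        "patterns": patterns,
        "duplicates": sorted(v for v, n in normalized.items() if n > 1),
    }


# Hypothetical code column with a missing value, an odd shape and
# the same code written in three different ways.
codes = ["AB12", "cd34", "AB12", None, "1234", " ab12 "]
result = profile_strings(codes)
print(result)
```

A pattern that occurs only rarely, such as `9999` in the sample, is a candidate illegal value, while the `duplicates` entry shows values that differ only in case or surrounding whitespace.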
http://www.datamartist.com/ [Data profiling, transformation, visualization, and migration tool, 30 days
free for use.]
Data Profiling using SQL Server 2008
Configuring the profiling tool
Start whichever Visual Studio environment you have, and create a new Integration Services
project. Next, from the SSIS Toolbox, drag a Data Profiling Task onto the design surface and
double-click on it to configure.
Profiling results are stored as an XML file, so specify the name and location of the file.
Click in the blank box next to Destination, and an arrow will appear.
Click on the arrow and then on .
In the resulting box, specify a path and filename (including .xml suffix).
Click OK then click the Quick Profile button.
Click the New button next to ADO.NET Connection.
In the box that appears, specify the SQL Server and database hosting the data to be profiled
(I'm using the AdventureWorksLT2012 database) then click OK.
Use the Table or View drop-down box to choose the data to be profiled (I'm using SalesLT.Product).
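The kind of per-column figures a profile reports, such as the share of null values and the number of distinct values, can be approximated with plain SQL. The sketch below uses Python's built-in sqlite3 module and an in-memory table purely as a stand-in: the `Product` table, its columns, and its rows are hypothetical and are not the actual SalesLT.Product data, and this is not how the Data Profiling Task itself is implemented.

```python
import sqlite3

# In-memory stand-in table; the columns and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER, Color TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "Red", 10.0), (2, None, 20.0), (3, "Red", 30.0), (4, "Blue", 40.0)],
)


def profile_sql_column(connection, table, column):
    """Return row count, null count, null ratio and distinct count for a column.

    Table and column names are interpolated into the SQL text, so pass
    only trusted names.
    """
    query = (
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    )
    rows, nulls, distinct = connection.execute(query).fetchone()
    nulls = nulls or 0  # SUM returns NULL for an empty table
    return {
        "rows": rows,
        "nulls": nulls,
        "null_ratio": nulls / rows if rows else 0.0,
        "distinct": distinct,
    }


color_profile = profile_sql_column(conn, "Product", "Color")
print(color_profile)
```

Note that `COUNT(DISTINCT ...)` ignores nulls, so the distinct count covers only the values that are actually present.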
Data governance is a quality control discipline for assessing, managing, using, improving,
monitoring, maintaining, and protecting organizational information.
Six Steps to Data Governance Success
With more than a billion people connected online today, we are at the dawn of a
data explosion, and it is becoming increasingly difficult to manage and control the
terabytes of data residing within different parts of the organization. Many companies use the fortress method, a big thick perimeter wall to keep out the
bad guys. But this method can be problematic since not all data has the same
value, not all risks are outside the perimeter, and not all controls can effectively
prevent fraud. The fortress model of data security creates a one-size-fits-all
approach, allowing organizations to overprotect low-quality data and
underprotect high-value information like customer account details or employee
Social Security numbers, regardless of business context or use.
Step 1: Get a governor and the right people in place to govern
The first step in any successful data-governance program is identifying an individual within the organization who carries the delegated authority of the CEO and making that person accountable to make things happen. There is no substitute for strong leadership.
Data governance is a political challenge that requires building consensus among many diverse stakeholders. Political leadership within the organization is therefore a priority. Once established, the governor can create a governing council composed of organizational stakeholders to formulate
stewardship policies and report progress to the CEO and board of directors.
Step 2: Survey your situation
Once you have the leadership team in place, it needs to survey the territory and inventory current practices across many diverse domains. The teams need to see across the stovepipes, and an enterprise data-governance assessment methodology is imperative for this task. It helps benchmark where the organization's data-governance program is today and delivers a
road map to determine where it will be tomorrow.
Step 3: Develop a data-governance strategy
After the data-governance assessment, the governance council should look into creating a vision of where it wants the company's data-governance practices to be in the next few years, thereby creating a vision for the future. The council should work backward, and create realistic milestones and project plans to fill relevant gaps by establishing key performance indicators to track progress and deliver annual reports to the CEO and the board to
validate results.
Step 4: Calculate the value of your data
If companies don't know what it's worth, they can't enhance, protect or measure the value of the data to the bottom line. Data isn't a normal commodity. It's like water out of a tap, vital to life yet so often taken for granted. But you can't calculate the value of something if you don't know its price.
If you want to calculate the value of your data, build an internal marketplace for data based on user entitlements and the utility of IT services. When everyone in an organization is paying for IT services and data directly, the value of data is part of the business P&L.
Step 5: Calculate the probability of risk
Knowing how data has been used and abused in the past is an indicator of how it might be compromised and disclosed in the future. Every organization has causes, events and losses that are lost in stovepipes, hierarchies and business reports. This data is already available and unused by most organizations. Collecting it, relating its meaning and studying loss trends over time can help any organization transform risk management into a fact-based, business intelligence method for analyzing past events, forecasting future losses and changing current policy requirements to improve your mitigation strategies.
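A very simple version of studying loss trends and forecasting future losses might look like the sketch below. It is an assumption-laden illustration, not a method prescribed by the text: the function names, the frequency-times-severity estimate, and the sample loss events are all hypothetical.

```python
from collections import defaultdict


def loss_trend(events):
    """Aggregate (year, loss_amount) events into per-year count and total loss."""
    by_year = defaultdict(lambda: {"events": 0, "loss": 0.0})
    for year, amount in events:
        by_year[year]["events"] += 1
        by_year[year]["loss"] += amount
    return dict(sorted(by_year.items()))


def expected_annual_loss(events):
    """Naive forecast: average events per year times average loss per event.

    Only years that contain at least one event are counted, so the
    result overstates the frequency if some years had no incidents.
    """
    trend = loss_trend(events)
    frequency = len(events) / len(trend)
    severity = sum(amount for _, amount in events) / len(events)
    return frequency * severity


# Hypothetical historical loss events as (year, loss amount) pairs.
events = [
    (2016, 1000.0), (2016, 3000.0),
    (2017, 2000.0),
    (2018, 500.0), (2018, 1500.0), (2018, 4000.0),
]
trend = loss_trend(events)
forecast = expected_annual_loss(events)
print(trend)
print(forecast)
```

The per-year totals show the trend over time, and the single forecast figure gives a starting point for discussing which controls are worth their cost.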
Step 6: Monitor the efficacy of your controls
Data governance is largely about organizational behavior. Organizations change every day, and therefore their data, its value and risk also shift rapidly. Unfortunately, most organizations assess themselves only once a year. If a business isn't able to change organizational controls to meet demands on a daily or weekly basis, it isn't governing change.
In business, master data management (MDM) comprises the processes, governance, policies,
standards and tools that consistently define and manage the critical data of an organization to
provide a single point of reference.[1]
The data that is mastered may include:
reference data - the business objects for transactions, and the dimensions for analysis
analytical data - supports decision making[2][3]