Data Enhancement: 18th Meeting. Course Name: Business Intelligence. Year: 2009


Upload: prudence-higgins

Post on 12-Jan-2016



Data Enhancement
18th Meeting

Course Name: Business Intelligence
Year: 2009


Bina Nusantara University


Source of this Material

(2) Loshin, David (2003). Business Intelligence: The Savvy Manager’s Guide. Chapter 13.

The Business Case

There are two aspects to the business value of data enhancement.

The first is that as organizational data environments mature and data managers want to exploit the corporate data asset, there is an increased need to share data among different groups. The second aspect emerges from the actionable knowledge that can be discovered only by analyzing the result of combining multiple data sets. Data enhancement is a critical component of the BI program, especially as a value-adding process for the following:

• Competition in knowledge industries
• Customer relationship management
• Micromarketing and personalization
• Cooperative marketing
• Industry deregulation

Types of Data Enhancement

There are two approaches to data enhancement. One focuses on incrementally improving or adding information as data is viewed or processed. Incremental enhancements are useful as a component of a later analysis stage, such as sequence pattern analysis and behavior modeling. The other approach is batch enhancement, where data collections are aggregated and methods are applied to the collection to create value-added information. Here are some examples.

• Auditing Enhancement: In business processes that require some degree of tracing capability, a frequent data enhancement is the addition of auditing data. Creating a tracking system associated with a sequence of related events provides a framework for evaluating efficiency within a business process.

• Temporal Enhancement: Historical data provides critical insight to a BI program. Whereas in some cases the history is embedded in the collected data, other instances require that activity records be enhanced by incrementally adding timestamps noting the time at which some event occurred.

Types of Data Enhancement (cont…)

• Contextual Enhancement: The place, or context, of data manipulation is an enhancement as well. A physical location, a path of access, and the login account through which a series of transactions was performed are examples of context that can augment data. Contextual enhancement also includes tagging data records in a way that allows them to be correlated with other pieces of data.

• Geographic Enhancement: Data enhanced with geographic information allows for analysis based on regional clustering and data inference based on predefined geodemographics. The first kind of geographic enhancement is the process of address standardization, where addresses are cleansed and then modified to fit a predefined postal standard.

• Demographic Enhancement: Demographics describe the similarities that exist within an entity cluster, such as customer age, marital status, gender, income, and ethnic coding. Demographic enhancements can be added through direct information merging, among other means.

Types of Data Enhancement (cont…)

• Psychographic Enhancement: Psychographics describe what distinguishes individual entities within a cluster. Psychographic information is frequently collected via surveys, contest forms, customer service activity, registration cards, and specialized lists. The trick to using psychographic data is being able to make the linkage between the entity within the organization's database and the supplied psychographic data set.

• Inference Enhancement: Information inference is a BI technique that allows the user to draw conclusions about the examined entity based on supporting evidence and business rules. Inferred knowledge can be used to augment data to reflect what we have learned, and this in turn provides greater insight into solving the business problem at hand.

Incremental Enhancement

Incremental enhancements are those that can be attached to data in process.

• Provenance: The provenance of an item is its source. This idea generalizes the temporal and auditing enhancements described earlier. A provenance can be as simple as a single string data field describing the source or as complex as a separate table, related through a foreign key, that contains a timestamp and a location code for each time the record is updated.

• Audit Trails: The combination of location, time, and activity information associated with a series of manipulations of a data record allows us to trace back all occasions on which that information was touched. This audit data shows how activities cause data to flow through a system.

• Context: This kind of enhanced data provides significant marketing benefit, because context information can be fed into a statistical framework for reporting on the behavior of users based on their locations or times of activity.
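The provenance and audit trail ideas can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions, not a prescribed design: the table names (customer, customer_audit), the column names, and the location codes are all hypothetical. The source field plays the role of the simple string provenance, and the audit table is the separate table related through a foreign key.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        source TEXT NOT NULL             -- simple provenance: one string field
    );
    CREATE TABLE customer_audit (
        id            INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(id),
        touched_at    TEXT NOT NULL,     -- when (temporal enhancement)
        location_code TEXT NOT NULL,     -- where (contextual enhancement)
        activity      TEXT NOT NULL      -- what was done (auditing enhancement)
    );
""")

def record_touch(customer_id, location_code, activity):
    """Append one audit row each time a customer record is touched."""
    conn.execute(
        "INSERT INTO customer_audit "
        "(customer_id, touched_at, location_code, activity) VALUES (?, ?, ?, ?)",
        (customer_id, datetime.now(timezone.utc).isoformat(), location_code, activity),
    )

def update_name(customer_id, new_name, location_code):
    """Change the record and enhance it incrementally with audit data."""
    conn.execute("UPDATE customer SET name = ? WHERE id = ?", (new_name, customer_id))
    record_touch(customer_id, location_code, "update_name")

conn.execute(
    "INSERT INTO customer (id, name, source) VALUES (1, 'J. Smith', 'web_signup')"
)
record_touch(1, "WEB", "create")
update_name(1, "John Smith", "CALLCENTER-02")

# Trace back every occasion on which the record was touched, oldest first.
trail = conn.execute(
    "SELECT location_code, activity FROM customer_audit "
    "WHERE customer_id = ? ORDER BY id",
    (1,),
).fetchall()
print(trail)
```

Because every change goes through a function that also writes the audit row, the enhancement happens while the data is in process rather than in a later batch step.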

Batch Enhancements

Batch enhancements are applied to a large set of data instances as an offline process. They typically involve merging data from multiple instances within a single data set or from multiple data instances drawn from multiple data sets.

• Householding: Householding is a process that attempts to reduce a set of individuals to a single grouped housing unit based on database record attribution. A household consists of all people living as an entity within the same residence.

• Organizational Merging: When organizations merge, they will eventually want to merge their vendor, customer, and employee databases as well as their base reference data.

• Other Batch Enhancements: Other batch enhancements include data scrubbing, data cleansing, and health care diagnosis assistance, as well as building affinity programs and constructing relational associations, among others.
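Householding can be illustrated with a small batch grouping step. The sketch below assumes that two individuals belong to the same household when their surname, a crudely normalized street address, and ZIP code all match. That matching rule and the field names are assumptions made for the example, and real householding rules are usually richer.

```python
from collections import defaultdict

def household_key(record):
    """Build a grouping key from the surname, a normalized address, and the ZIP code."""
    address = " ".join(record["address"].upper().replace(".", "").split())
    return (record["last_name"].upper(), address, record["zip"])

def build_households(records):
    """Reduce a batch of individual records to households keyed by household_key."""
    households = defaultdict(list)
    for record in records:
        households[household_key(record)].append(record["first_name"])
    return dict(households)

records = [
    {"first_name": "Ann", "last_name": "Lee", "address": "12 Oak St.", "zip": "10001"},
    {"first_name": "Bob", "last_name": "lee", "address": "12  oak st", "zip": "10001"},
    {"first_name": "Cara", "last_name": "Diaz", "address": "99 Elm Ave", "zip": "10002"},
]

households = build_households(records)
for key, members in households.items():
    print(key, members)
```

Ann and Bob collapse into one household even though their records differ in case, spacing, and punctuation, which is why some normalization normally precedes the grouping.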

Standardization

Standardization refers to ensuring that a data instance conforms to a predefined expected format. A data standard is a format representation for data values that can be described using a series of rules. Because a standard is a distinct model to which all items in a set must conform, we can try to automate two components of any standardization process:

• Determination of conformance to the standard
• Bringing a nonstandard data instance into conformance with the standard

There is usually a well-defined rule set describing both how to determine whether an item conforms to the standard and what actions need to be taken to bring the offending item into conformance.

• Data Standard and Standardization: The value of data standardization lies in the notion that, given the right base of reference information and a well-defined rule set, additional data can be added to a record in a purely automated way. Probably the most important benefit of standardization is that, through the process of defining standards, organizations create a streamlined means for the transfer and sharing of information.
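The two automatable components, checking conformance and bringing a value into conformance, can be sketched as a pair of functions. The standard used here (ten-digit phone numbers written as NNN-NNN-NNNN) is a hypothetical example chosen for illustration and does not come from the source material.

```python
import re

# Hypothetical standard: exactly NNN-NNN-NNNN.
STANDARD = re.compile(r"\d{3}-\d{3}-\d{4}")

def conforms(value):
    """Component 1: determine whether a value conforms to the standard."""
    return STANDARD.fullmatch(value) is not None

def standardize(value):
    """Component 2: bring a value into conformance, or return None if no rule applies."""
    if conforms(value):
        return value
    digits = re.sub(r"\D", "", value)           # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                     # drop a leading country code
    if len(digits) != 10:
        return None                             # cannot be repaired automatically
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

for raw in ["212-555-0147", "(212) 555-0147", "+1 212.555.0147", "555-0147"]:
    print(repr(raw), conforms(raw), standardize(raw))
```

Values that cannot be repaired by the rule set are returned as None rather than guessed at, so they can be routed to manual review.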

Standardization (cont…)

• Kinds of Standards: Most standards either are dictated by some authority (such as the government), are developed through cooperation (such as an industry-defined standard), or are derived from common use (such as geographical biases toward representing dates).

Example: Address Standardization

In this section, we look at the different components of an address.

• The Address Standard

  Recipient line: The recipient line indicates the person or entity to which the mail is to be delivered.

  Delivery address line: The delivery address line contains the specific location associated with the recipient.

  Last line: The last line of the address includes the city name, state, and ZIP code.

• Standard Abbreviations: The postal service provides a set of enumerations of standard abbreviations, including U.S. state and possession abbreviations, street abbreviations, and common business word abbreviations.
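The three address lines and the use of standard abbreviations can be sketched as follows. The lookup tables contain only a handful of entries for illustration, where a real implementation would load the full postal service enumerations. The function names, the choice to uppercase every line, and the rule that only the final word of the delivery line is treated as a street suffix are simplifying assumptions.

```python
# Small illustrative subsets of the standard abbreviation tables.
STREET_SUFFIXES = {
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD", "DRIVE": "DR",
}
STATES = {"NEW YORK": "NY", "CALIFORNIA": "CA", "TEXAS": "TX"}

def standardize_delivery_line(line):
    """Uppercase, strip punctuation, and abbreviate a trailing street suffix."""
    words = line.upper().replace(".", "").replace(",", "").split()
    if words and words[-1] in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[words[-1]]
    return " ".join(words)

def standardize_last_line(city, state, zip_code):
    """Build the last line: city name, state abbreviation, ZIP code."""
    state_key = " ".join(state.upper().split())
    return f"{city.upper()} {STATES.get(state_key, state_key)} {zip_code}"

def standardize_address(recipient, delivery, city, state, zip_code):
    """Return the recipient line, delivery address line, and last line."""
    return [
        recipient.upper(),
        standardize_delivery_line(delivery),
        standardize_last_line(city, state, zip_code),
    ]

print(standardize_address("Jane Doe", "123 Main Street", "Albany", "New York", "12207"))
```

A delivery line such as "123 Main Street Apt 4" is not handled by this sketch, because the suffix is no longer the last word. Handling secondary units would need an additional parsing rule.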

Example: Address Standardization (cont…)

• ZIP + 4: ZIP codes are postal codes assigned to delivery areas to improve the precision of sorting and delivering mail. ZIP + 4 codes are a further refinement, narrowing down a delivery location to a subsection of a building or a street.

• Address Standardization Software: Because the USPS addressing standard is so well documented, it is relatively straightforward to build automated address standardization software, which makes this enhancement easier to perform.
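A format check for ZIP and ZIP + 4 codes is a small instance of determining conformance. The sketch below only validates the shape of the code (five digits, optionally followed by a hyphen and four more digits). It does not verify that the code is actually assigned to a delivery area, which would require postal reference data. The function name parse_zip is an assumption for the example.

```python
import re

# Five digits, optionally followed by "-" and the four-digit add-on.
ZIP_PATTERN = re.compile(r"(\d{5})(?:-(\d{4}))?")

def parse_zip(code):
    """Return (zip5, plus4) for a well-formed code, or None if it is malformed.

    plus4 is None when only the five-digit ZIP code is present.
    """
    match = ZIP_PATTERN.fullmatch(code.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

for code in ["12207", "12207-1234", "1220", "12207-12"]:
    print(repr(code), parse_zip(code))
```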

Enhancement Methodologies

There are many issues involved in data enhancement, but because a large number of them revolve around information record linkage, it is worthwhile to explore this in greater detail.

• Record Linkage: Any two records that can be connected based on a set of chosen attributes are candidates to be linked together. Usually record linkage is performed only when the chosen attributes match exactly, but simple record linkage is limited for the following reasons:

  Information is missing
  Information sources are in different formats
  Record linkage is imprecise
  Information is out of synchronization
  Information is lost

• Semistructured Data: Semistructured data refers to information that is partially formatted, such as data elements on a web page or the comments field in a customer service database.
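The limits of exact-match linkage can be seen by comparing it with an approximate match. The sketch below uses difflib.SequenceMatcher from the Python standard library as the similarity measure. That choice, the averaging of per-attribute scores, and the 0.85 threshold are assumptions made for illustration, not a method prescribed by the source.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(value.lower().replace(".", "").replace(",", "").split())

def exact_link(a, b, attributes):
    """Simple record linkage: every chosen attribute must match exactly."""
    return all(a[attr] == b[attr] for attr in attributes)

def similarity(a, b, attributes):
    """Average string similarity (0.0 to 1.0) over the chosen attributes."""
    scores = [
        SequenceMatcher(None, normalize(a[attr]), normalize(b[attr])).ratio()
        for attr in attributes
    ]
    return sum(scores) / len(scores)

def approximate_link(a, b, attributes, threshold=0.85):
    """Link two records when their average similarity reaches the threshold."""
    return similarity(a, b, attributes) >= threshold

rec_a = {"name": "John A. Smith", "city": "New York"}
rec_b = {"name": "Jon A Smith", "city": "new york"}
rec_c = {"name": "Mary Jones", "city": "Boston"}
attrs = ["name", "city"]

print(exact_link(rec_a, rec_b, attrs), approximate_link(rec_a, rec_b, attrs))
print(exact_link(rec_a, rec_c, attrs), approximate_link(rec_a, rec_c, attrs))
```

The first pair differs only in formatting and one dropped letter, so exact linkage misses it while the approximate match links it. The threshold trades missed links against false links and would need tuning on real data.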

Enhancement Methodologies (cont…)

Semistructured data may be a good source for both association and relation information, but extracting information from the data is particularly difficult.

• Inference: An inference is an application of a heuristic rule that essentially creates a piece of information where it didn’t exist before. Even though inferencing represents the application of intuition, it is done in a way that can be automated. Inference rules usually reflect some understood business analysis that can be boiled down to a set of business rules.

• Types of Inference: Enhancements based on inferencing are usually very focused bits of information relevant within a particular analytical context. Inferences are likely to center on demographic or psychographic details that can be derived as a direct result of data merging and analysis.
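Automated inference can be sketched as a list of business rules, each pairing a condition with the piece of information it creates. The rules, field names, and thresholds below are invented for illustration and carry no empirical weight.

```python
def infer_enhancements(customer, rules):
    """Return a copy of the record augmented with every inference whose rule fires."""
    enhanced = dict(customer)
    for field, value, condition in rules:
        if condition(customer):
            # Prefix inferred fields so they are never mistaken for observed data.
            enhanced[f"inferred_{field}"] = value
    return enhanced

# Hypothetical business rules: (field to add, value to assign, condition on the record).
RULES = [
    ("likely_homeowner", True, lambda c: c.get("lawn_care_purchases", 0) >= 3),
    ("likely_has_children", True, lambda c: "toys" in c.get("categories", ())),
]

customer = {"id": 7, "lawn_care_purchases": 4, "categories": ["garden", "toys"]}
enhanced = infer_enhancements(customer, RULES)
print(enhanced)
```

Keeping inferred values in separately named fields leaves the original data untouched and lets later analysis weigh inferences differently from recorded facts.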

Management Issues

• Buy versus Build: In the software and services market, the term data enhancement is overloaded. It can refer to anything from data cleansing and address standardization to services-based record linkage as a means of adding data fields, such as credit ratings, to submitted data.

• Performance Issues: Some data enhancement applications are likely to be of high computational complexity, so members of the team should be familiar with high-performance computing as well as database manipulation, ETL, and pattern matching.

End of Slide