Organizational intelligence technologies
There are three kinds of intelligence: one kind understands things for itself, the other appreciates what others can understand, the third understands
neither for itself nor through others. This first kind is excellent, the second good, and the third kind useless.
Machiavelli, The Prince, 1513.
Organizational intelligence
Organizational intelligence is the outcome of an organization’s efforts to collect, store, process, and interpret data from internal and external sources
Intelligence in the sense of gathering and distributing information
Types of information systems
Type of information system (abbreviation): System’s purpose
Transaction processing system (TPS): Collects and stores data from routine transactions
Management information system (MIS): Converts data from a TPS into information for planning, controlling, and managing an organization
Decision support system (DSS): Supports managerial decision making by providing models for processing and analyzing data
Executive information system (EIS): Provides senior management with information necessary to monitor organizational performance, and develop and implement strategies
On-line analytical processing (OLAP): Presents a multidimensional, logical view of data to the analyst with no requirements as to how the data are stored
Data mining: Uses statistical analysis and artificial intelligence techniques to identify hidden relationships in data
The information systems cycle
Transaction processing systems
Can generate huge volumes of data
• A telephone company may generate 200 million records per day
Raw material for organizational intelligence
The problem
Organizational memory is fragmented
• Different systems
• Different database technologies
• Different locations
An underused intelligence system containing undetected key facts about customers
The data warehouse
A repository of organizational data
Can be measured in terabytes
Managing the data warehouse
Extraction
Transformation
Cleaning
Loading
Scheduling
Metadata
Extraction
Pulling data from existing systems
Operational systems were not designed for extraction to load into a data warehouse
Applications are often independent entities
Time consuming and complex
An ongoing process
Transformation
Encoding: m/f, male/female to M/F
Unit of measure: inches to cm
Field: sales-date to salesdate
Date: dd/mm/yy to yyyy/mm/dd
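The four transformations listed above can be sketched in a few lines of Python. This is only an illustration: the record layout, the `gender` and `height_in` field names, and the `transform` function are assumptions made for the example, not part of any particular warehouse tool.

```python
from datetime import datetime

# Hypothetical mapping from source encodings to the warehouse standard (M/F).
GENDER_CODES = {"m": "M", "f": "F", "male": "M", "female": "F"}
CM_PER_INCH = 2.54


def transform(record):
    """Return a warehouse-ready copy of one source record (hypothetical layout)."""
    sale_date = datetime.strptime(record["sales-date"], "%d/%m/%y")
    return {
        # Encoding: m/f, male/female -> M/F
        "gender": GENDER_CODES[record["gender"].strip().lower()],
        # Unit of measure: inches -> centimeters
        "height_cm": round(record["height_in"] * CM_PER_INCH, 2),
        # Field name: sales-date -> salesdate; date: dd/mm/yy -> yyyy/mm/dd
        "salesdate": sale_date.strftime("%Y/%m/%d"),
    }


source_row = {"gender": "female", "height_in": 10, "sales-date": "05/06/98"}
warehouse_row = transform(source_row)
print(warehouse_row)
```

Note that a two-digit year forces a century guess; Python's `%y` maps 69 to 99 onto 1969 to 1999, which a real extract would need to confirm against the source system.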
Cleaning
Same record stored in different departments
Multiple records for a company
Multiple entries for the same organization
Misuse of data entry fields
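One of the cleaning problems above, multiple entries for the same organization, can be illustrated with a simple name-matching sketch. The suffix list, the `normalize` rule, and the sample names are all hypothetical; real cleaning usually needs fuzzier matching and human review.

```python
import re

# Hypothetical legal-form suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def normalize(name):
    """Reduce a company name to a comparison key."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def find_duplicates(names):
    """Group names sharing a key; return only groups with more than one entry."""
    groups = {}
    for name in names:
        groups.setdefault(normalize(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


customers = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Zenith Ltd"]
duplicates = find_duplicates(customers)
print(duplicates)
```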
Loading
Archival: May be too costly
Current: From operational systems
Ongoing: Continual updating of the warehouse
Scheduling
A trade-off
• Too frequent is costly
• Too infrequent means old data
Metadata
A data dictionary containing additional facts about the data in the warehouse
Description of each data type
Format
Coding standards
Meaning
Operational system source
Transformations
Frequency of extracts
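As a rough illustration, one metadata entry could be held in a small record type whose fields follow the list above. The class name, field names, and sample values are assumptions for this sketch rather than a standard metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """One data-dictionary entry for a warehouse column (hypothetical layout)."""
    name: str
    description: str
    data_format: str
    coding_standard: str
    meaning: str
    source_system: str
    transformations: list = field(default_factory=list)
    extract_frequency: str = "daily"


entry = MetadataEntry(
    name="salesdate",
    description="Date of sale",
    data_format="yyyy/mm/dd",
    coding_standard="Calendar date, no time component",
    meaning="Day on which the transaction was recorded",
    source_system="Order entry TPS (hypothetical)",
    transformations=["sales-date renamed to salesdate", "dd/mm/yy to yyyy/mm/dd"],
)
print(entry.name, entry.extract_frequency)
```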
Warehouse architectures
Centralized
Federated
Tiered
Centralized data warehouse
Federated data warehouse
Tiered data warehouse
Server options
Single processor
Symmetric multiprocessor
Massively parallel processor
Nonuniform memory access
Single processor
Symmetric multiprocessor
Massively parallel processor
Nonuniform memory access
DBMS choices
DBMS types compared
• Relational
• Super-relational
• Multidimensional (logical)
• Multidimensional (physical)
• Object-relational
Features/functions compared
• Normalized data structures
• Abstract data types
• Parallelism
• Multidimensional structures
• Drill-down
• Rotation
• Data-dependent operations
Decision matrix
For these environments (business requirements, client population, systems support), choose the architecture, server, and DBMS shown

Scope: departmental; Uses: data analysis
• Client population: Small; single location
• Systems support: Minimal local; average central
• Architecture: Consolidated; turnkey package
• Server: Single-processor or SMP
• DBMS: MDDB

Scope: departmental; Uses: analysis plus informational
• Client population: Large; analysis at single location; informational users dispersed
• Systems support: Minimal local; average central
• Architecture: Tiered; detail at central; summary at local
• Server: Clustered SMP for central; SP or SMP for local
• DBMS: RDBMS for central; MDDB for local

Scope: enterprise; Uses: analysis plus informational
• Client population: Large; geographically dispersed
• Systems support: Strong central
• Architecture: Centralized
• Server: Clustered SMP
• DBMS: Object-relational; Web support

Scope: departmental; Uses: exploratory
• Client population: Small; few sites
• Systems support: Strong central
• Architecture: Centralized
• Server: MPP
• DBMS: RDBMS with parallel support
The decision
Selection of a server architecture and DBMS are not independent decisions
Parallelism may be an option only for some RDBMSs
Need to find the fit that meets organizational goals
Exploiting data stores
Verification and discovery
Data mining
OLAP
Verification and discovery
Verification: What is the average sale for in-store and catalog customers?
Discovery: What is the best predictor of sales?

Verification: What is the average high school GPA of students who graduate from college compared to those who do not?
Discovery: What are the best predictors of college graduation?
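The contrast between the two kinds of question can be sketched in code. Verification computes a figure the analyst already knows to ask for; discovery searches candidate variables for whichever relates most strongly to the outcome. The sample records and the use of Pearson correlation as the ranking measure are assumptions made for this illustration.

```python
from statistics import mean

# Hypothetical sales records.
SALES = [
    {"channel": "store", "amount": 40, "visits": 1, "age": 50},
    {"channel": "store", "amount": 60, "visits": 2, "age": 20},
    {"channel": "catalog", "amount": 100, "visits": 4, "age": 40},
    {"channel": "catalog", "amount": 120, "visits": 5, "age": 30},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Verification: what is the average sale for in-store and catalog customers?
average_by_channel = {
    channel: mean(r["amount"] for r in SALES if r["channel"] == channel)
    for channel in ("store", "catalog")
}

# Discovery: which candidate variable is the best predictor of sales?
amounts = [r["amount"] for r in SALES]
strength = {
    var: abs(pearson([r[var] for r in SALES], amounts)) for var in ("visits", "age")
}
best_predictor = max(strength, key=strength.get)
print(average_by_channel, best_predictor)
```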
OLAP
Relational model was not designed for data synthesis, analysis, and consolidation
This is the role of spreadsheets and other special purpose software
Need to complement RDBMS technology with a multidimensional view of data
TPS versus OLAP
TPS                                      OLAP
Optimize for transaction volume          Optimize for data analysis
Process a few records at a time          Process summarized data
Real time update as transactions occur   Batch update (e.g., daily)
Based on tables                          Based on hypercubes
Raw data                                 Aggregated data
SQL is widely used                       MDX becoming a standard
ROLAP
A relational OLAP
A multidimensional model is imposed on a relational structure
Relational is a mature technology with extensive data management features
Not as efficient as OLAP
The star structure
The snowflake structure
Rotation
Drill down
Region          Sales variance
Africa          105%
Asia            57%
Europe          122%
North America   97%
Pacific         85%
South America   163%

Nation          Sales variance
China           123%
Japan           52%
India           87%
Singapore       95%
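Drill-down moves from a summary level (region) to the detail beneath one of its members (the nations of a region). The sketch below assumes that sales variance means actual sales as a percentage of budgeted sales, and it uses made-up figures, so its output is not meant to reproduce the tables above.

```python
from collections import defaultdict

# Hypothetical (region, nation, actual sales, budgeted sales) rows.
SALES = [
    ("Asia", "China", 123, 100),
    ("Asia", "Japan", 104, 200),
    ("Europe", "France", 110, 100),
    ("Europe", "Italy", 134, 100),
]


def sales_variance(rows, level, within=None):
    """Actual as a percentage of budget, grouped by 'region' or 'nation'.

    `within` restricts the rows to one region, which is the drill-down step.
    """
    totals = defaultdict(lambda: [0, 0])
    for region, nation, actual, budget in rows:
        if within is not None and region != within:
            continue
        key = region if level == "region" else nation
        totals[key][0] += actual
        totals[key][1] += budget
    return {key: round(100 * a / b) for key, (a, b) in totals.items()}


by_region = sales_variance(SALES, "region")
asia_by_nation = sales_variance(SALES, "nation", within="Asia")
print(by_region)
print(asia_by_nation)
```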
A hypercube
A three-dimensional hypercube display
Page: Region: North
Columns: Sales (Red blob, Blue blob, Total)
Rows: Year (1996, 1997, Total)
A six-dimensional hypercube
Dimension          Example
Brand              Mt. Airy
Store              Atlanta
Customer segment   Business
Product group      Desks
Period             January
Variable           Units sold
A six-dimensional hypercube display
Page: Month (March); Segment (Business)
Columns: Product group (Desks, Chairs); Variable (Units, Revenue) under each product group
Rows: Brand (Carolina, Mt. Airy); Store (Atlanta, Boston) under each brand; Totals
The link between RDBMS and MDDB
MDDB design
Key concepts
Variable dimensions
• What is tracked
• Sales
Identifier dimensions
• Tagging what is tracked
• Time, product, and store of sale
Prompts for identifying dimensions
Prompt     Example
When?      June 5, 1998
Where?     Paris
What?      Tent
How?       Catalog
Who?       Young adult woman
Outcome?   Revenue of 6,000 FF
Variables and identifiers
Identifier: time (hour)   Variable: sales (dollars)
10:00                     523
11:00                     789
12:00                     1,256
13:00                     4,128
14:00                     2,634

Identifier: hit   Variable: time (hh:mm:ss)
1                 9:34:45
2                 9:34:57
3                 9:36:12
4                 9:41:56
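The two tables above show that time can play either role: it is the identifier when sales are tracked by hour, and the variable when each hit is tagged with its time. A minimal sketch, using the figures from those tables and plain dictionaries keyed by the identifier (the variable names are chosen for this example):

```python
from datetime import datetime

# Identifier: hour of day; variable: sales in dollars.
sales_by_hour = {"10:00": 523, "11:00": 789, "12:00": 1256, "13:00": 4128, "14:00": 2634}
total_sales = sum(sales_by_hour.values())
peak_hour = max(sales_by_hour, key=sales_by_hour.get)

# Identifier: hit number; variable: time of the hit.
hit_times = {1: "9:34:45", 2: "9:34:57", 3: "9:36:12", 4: "9:41:56"}
parsed = [datetime.strptime(t, "%H:%M:%S") for _, t in sorted(hit_times.items())]
# Because time is the variable here, it can be analyzed, e.g. seconds between hits.
gaps_seconds = [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

print(total_sales, peak_hour, gaps_seconds)
```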
Analysis and variable type
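Variable dimension continuous, identifier dimension continuous
• Regression and curve fitting
• Sales by quarter
Variable dimension continuous, identifier dimension nominal or ordinal
• Analysis of variance
• Sales by store
Variable dimension nominal or ordinal, identifier dimension continuous
• Logistic regression
• Customer response (yes or no) to the level of advertising
Variable dimension nominal or ordinal, identifier dimension nominal or ordinal
• Contingency table analysis
• Number of sales by region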
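To make one cell of this matrix concrete, the sketch below fits a least-squares line to sales by quarter, the case where both the identifier and the variable are continuous. The quarterly figures are invented for the example and the `fit_line` helper is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


quarters = [1, 2, 3, 4]          # identifier dimension (continuous)
sales = [100, 120, 140, 160]     # variable dimension (continuous), hypothetical
intercept, slope = fit_line(quarters, sales)
forecast_q5 = intercept + slope * 5
print(intercept, slope, forecast_q5)
```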
Data mining
The search for relationships and patterns
Applications
• Database marketing
• Predicting bad loans
• Detecting flaws in VLSI chips
• Identifying quasars
Data mining functions
Associations
• 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta
Sequential patterns
• 32 percent of female customers who order a red jacket buy a gray skirt within six months
Classifying
• Frequent customers as those with incomes about $50,000 and having two or more children
Clustering
• Market segmentation
Predicting
• Predict the revenue value of a new customer based on that person’s demographic variables
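The association function can be made concrete with the two usual measures of a rule such as "wine implies pasta": support (the share of all baskets containing both items) and confidence (the share of wine baskets that also contain pasta). The baskets below are invented, and `rule_stats` is a hypothetical helper, not a call from any data mining package.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent)
    return support, confidence


# Hypothetical market baskets.
baskets = [
    {"wine", "pasta"},
    {"wine", "pasta", "bread"},
    {"wine", "cheese"},
    {"pasta"},
    {"bread"},
]
support, confidence = rule_stats(baskets, "wine", "pasta")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A statement like "85 percent of customers who buy a certain brand of wine also buy a certain type of pasta" is a confidence figure in these terms.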
Data mining technologies
Decision trees
Genetic algorithms
K-nearest neighbor method
Neural networks
Data visualization
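Of the technologies listed, the k-nearest neighbor method is the simplest to show in a few lines: a new case takes the majority label of the k most similar known cases. The training data (income in thousands of dollars, number of children) and the labels are invented for this sketch, and the features are left unscaled for brevity, which a real application would need to address.

```python
from collections import Counter
from math import dist


def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest training cases."""
    nearest = sorted(training, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: ((income in $000, children), label).
TRAINING = [
    ((30, 0), "infrequent"),
    ((35, 1), "infrequent"),
    ((40, 0), "infrequent"),
    ((55, 2), "frequent"),
    ((60, 3), "frequent"),
    ((65, 2), "frequent"),
]

print(knn_classify(TRAINING, (58, 2)))
print(knn_classify(TRAINING, (33, 1)))
```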
SQL-99 and OLAP
SQL can be tedious and inefficient
The following questions require four queries
• Find the total revenue
• Report revenue by location
• Report revenue by channel
• Report revenue by location and channel
SQL-99 extensions
GROUP BY extended with
• GROUPING SETS
• ROLLUP
• CUBE
GROUPING SETS
SELECT location, channel, DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY GROUPING SETS (location, channel);

GROUPING SETS
Location   Channel   Revenue
null       Catalog   108762
null       Store     347537
null       Web       27166
London     null      214334
New York   null      39123
Paris      null      143303
Sydney     null      29989
Tokyo      null      56716
ROLLUP
SELECT location, channel, DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY ROLLUP (location, channel);

ROLLUP
Location   Channel   Revenue
null       null      483465
London     null      214334
New York   null      39123
Paris      null      143303
Sydney     null      29989
Tokyo      null      56716
London     Catalog   50310
London     Store     151015
London     Web       13009
New York   Catalog   8712
New York   Store     28060
New York   Web       2351
Paris      Catalog   32166
Paris      Store     104083
Paris      Web       7054
Sydney     Catalog   5471
Sydney     Store     21769
Sydney     Web       2749
Tokyo      Catalog   12103
Tokyo      Store     42610
Tokyo      Web       2003
CUBE
SELECT location, channel, DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY CUBE (location, channel);

CUBE
Location   Channel   Revenue
null       Catalog   108762
null       Store     347537
null       Web       27166
null       null      483465
London     null      214334
New York   null      39123
Paris      null      143303
Sydney     null      29989
Tokyo      null      56716
London     Catalog   50310
London     Store     151015
London     Web       13009
New York   Catalog   8712
New York   Store     28060
New York   Web       2351
Paris      Catalog   32166
Paris      Store     104083
Paris      Web       7054
Sydney     Catalog   5471
Sydney     Store     21769
Sydney     Web       2749
Tokyo      Catalog   12103
Tokyo      Store     42610
Tokyo      Web       2003
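How the three extensions relate can be seen by emulating them outside the database. The Python sketch below uses the location-by-channel revenue figures from the result tables above as detail rows and treats each extension as a choice of which column subsets to group by. The function names are made up for the example, and `None` stands in for SQL's null.

```python
from collections import defaultdict
from itertools import combinations

# Detail rows (location, channel, revenue) from the result tables above.
EXPED = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]
COLUMNS = ("location", "channel")


def group_by(rows, keep):
    """Sum revenue over the columns named in `keep`; other columns become None."""
    totals = defaultdict(int)
    for location, channel, revenue in rows:
        key = (
            location if "location" in keep else None,
            channel if "channel" in keep else None,
        )
        totals[key] += revenue
    return dict(totals)


def grouping_sets(rows, sets):
    """Union of one GROUP BY result per requested column subset."""
    result = {}
    for keep in sets:
        result.update(group_by(rows, keep))
    return result


def rollup(rows):
    # ROLLUP (a, b) is the grouping sets (a, b), (a), ()
    return grouping_sets(rows, [COLUMNS[:n] for n in range(len(COLUMNS), -1, -1)])


def cube(rows):
    # CUBE (a, b) is every subset of the columns: (), (a), (b), (a, b)
    sets = [c for n in range(len(COLUMNS) + 1) for c in combinations(COLUMNS, n)]
    return grouping_sets(rows, sets)


by_each = grouping_sets(EXPED, [("location",), ("channel",)])
print(len(by_each), len(rollup(EXPED)), len(cube(EXPED)))
```

Seen this way, CUBE answers all four of the earlier questions (total, by location, by channel, by location and channel) in a single statement.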
SQL OLAP extensions
Useful
Not as powerful as MDDB tools
Use CUBE as the default
Conclusion
Data management is an evolving discipline
Data managers have a dual responsibility
• Manage data to be in business today
• Manage data to be in business tomorrow
Data managers now need to support organizational intelligence technologies