05-08-17 SIMCA-P Getting started.ppt 1 (29) www.umetrics.com
What is Multivariate Analysis
• Multivariate analysis summarizes a data table with many variables by creating a few new variables that contain most of the information. These new variables are then used for problem solving and display, i.e., classification, relationships, control charts, and more.
• The new variables, the scores, denoted by t, are created as weighted linear combinations of the original variables. Each observation has its own t-values, one per component.
• PCA, the basic MV method, summarizes one data table.
• Plotting the scores (t’s) gives an overview of the observations (objects)
• PLS simultaneously summarizes two data tables, X (the predictor variables) and Y (the response variables), in order to develop a relationship between them
• PCA and PLS are called Projection methods
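To make the idea of scores as weighted linear combinations concrete, here is a minimal sketch of one way PCA can extract its first component: the NIPALS iteration, a standard algorithm for PCA and PLS. It assumes the data table has already been mean-centered; the function name `nipals_pc1`, the starting column and the tolerance are illustrative choices, not SIMCA internals.

```python
import math


def nipals_pc1(X, tol=1e-12, max_iter=500):
    """First principal component of a mean-centered data table X (list of rows).

    Returns (t, p): scores t (one per observation) and unit-length
    loadings p (one per variable), so that t[i] = sum_k X[i][k] * p[k].
    """
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # start from the first column (assumed not all zero)
    p = [0.0] * k
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project each variable on the current scores, then normalize
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: weighted linear combination of the original variables
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
        t = t_new
        if converged:
            break
    return t, p


X = [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]]  # toy centered data lying on a line
t, p = nipals_pc1(X)
print(p)  # about [0.447, 0.894], i.e. the direction (1, 2) / sqrt(5)
print(t)  # about [-2.236, 0.0, 2.236]
```

Each score t[i] is the weighted sum of row i with the loadings p as weights; further components would be extracted the same way from the residuals X - t p'.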
What is a Projection?
Reduction of dimensionality, model in latent variables
• Algebraically
  – Summarizes the information in the observations as a few new (latent) variables
• Geometrically
  – The swarm of points in a K-dimensional space (K = number of variables) is approximated by a (hyper)plane, and the points are projected onto that plane.
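Geometrically, projecting an observation means computing its coordinates (scores) in the model plane, the point on the plane closest to it, and the leftover residual. A minimal sketch, assuming mean-centered data and orthonormal loading vectors; `project` is a hypothetical helper name.

```python
def project(x, loadings):
    """Project point x onto the (hyper)plane spanned by orthonormal loadings.

    Returns (scores, projected point, residual vector).
    """
    # Coordinates of x in the plane: one score per loading vector
    scores = [sum(xk * pk for xk, pk in zip(x, p)) for p in loadings]
    # The projected point: scores times loadings, summed over components
    x_hat = [sum(t * p[k] for t, p in zip(scores, loadings)) for k in range(len(x))]
    # What the plane does not describe
    residual = [xk - hk for xk, hk in zip(x, x_hat)]
    return scores, x_hat, residual


# A point in K = 3 dimensions projected onto the plane spanned by two
# orthonormal directions (here simply the first two axes).
P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scores, x_hat, e = project([3.0, 4.0, 5.0], P)
print(scores)  # [3.0, 4.0]
print(x_hat)   # [3.0, 4.0, 0.0]
print(e)       # [0.0, 0.0, 5.0]
```

The length of the residual vector is what the distance to the model (DModX) is built from, while the scores locate the point within the plane.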
Notation
Each observation has values of t (and u); each variable has values of p (and w and c)
• t: the X scores; the new summarizing variables (coordinates in the hyperplane of X-space)
• u: the Y scores in PLS; the new summarizing variables (coordinates in the hyperplane of Y-space, when Y is multidimensional)
• p: the PC loadings. These are the weights that in PCA combine the original variables in X to form the new variables, scores t.
• w*: the PLS weights. These are the weights that in PLS combine the original variables in X to form the new variables, scores t.
• c: the weights used to combine the Y's to form the scores u.
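How the PLS quantities above fit together can be seen in a single-component sketch for one response y (PLS1). Assumptions: X and y are already centered, y is a single column (so the NIPALS loop needs only one pass), and c is non-zero; `pls1_component` is a hypothetical name. For the first component the weights w* equal w.

```python
import math


def pls1_component(X, y):
    """One PLS component for centered X (list of rows) and one centered y.

    Returns weights w, X scores t, X loadings p, Y weight c, Y scores u.
    """
    n, k = len(X), len(X[0])
    # w: unit-length direction in X-space with maximal covariance with y
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(X[i][j] * w[j] for j in range(k)) for i in range(n)]      # X scores
    tt = sum(v * v for v in t)
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X loadings
    c = sum(ti * yi for ti, yi in zip(t, y)) / tt                       # Y weight
    u = [yi / c for yi in y]  # Y scores: u = y c / (c c), which is y / c for one y
    return w, t, p, c, u


X = [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]]
y = [-1.0, 0.0, 1.0]
w, t, p, c, u = pls1_component(X, y)
y_hat = [ti * c for ti in t]  # fitted y from one component
print(w, c)
print(y_hat)
```

With several Y columns, u and c would instead be found iteratively together with w and t.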
• One component consists of one t and one p (PCA) or of t, p, w, u, c (PLS). The total number of components is A.
• Model: the data are approximated by a plane or hyperplane (the model) with as many dimensions as there are components extracted.
• DModX: also called distance to the model, is the distance of a given observation to the model plane, i.e., the size of the part of the observation that the model does not describe (its residuals).
• T2: Hotelling’s T2 combines the scores (t) of all A components. T2 measures how far an observation lies from the center of a PC or PLS model, within the model plane.
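Both diagnostics can be written in a few lines. This sketch assumes mean-zero scores and a residual matrix E = X - T P' from a model with A components; it computes T2 with sample score variances and an absolute, uncorrected DModX (the residual standard deviation of each observation). SIMCA's reported DModX may additionally be normalized by the model's pooled residual standard deviation and include a degrees-of-freedom correction, so these numbers are not meant to match the software exactly.

```python
import math


def hotelling_t2(T):
    """Hotelling's T2 for each row of an n x A score matrix T (list of rows).

    T2_i = sum_a t_ia**2 / s_a**2, where s_a**2 is the sample variance of
    score column a (scores are assumed mean-zero, so no centering is done).
    """
    n, A = len(T), len(T[0])
    var = [sum(row[a] ** 2 for row in T) / (n - 1) for a in range(A)]
    return [sum(row[a] ** 2 / var[a] for a in range(A)) for row in T]


def dmodx_abs(E, A):
    """Absolute distance to the model for each row of an n x K residual matrix E.

    Residual standard deviation of the observation, using K - A degrees of freedom.
    """
    K = len(E[0])
    return [math.sqrt(sum(e * e for e in row) / (K - A)) for row in E]


T = [[-2.0], [0.0], [2.0]]                                  # one-component scores
E = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 3.0]]     # residuals, K = 3
print(hotelling_t2(T))  # [1.0, 0.0, 1.0]
print(dmodx_abs(E, A=1))  # third value is sqrt(9 / 2), about 2.12
```

An observation can have a small T2 (close to the center within the plane) and still a large DModX (far from the plane), which is why both are inspected.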
Notation
• R2X: The fraction of the variation of the X variables explained by the model.
• R2Y: The fraction of the variation of the Y variables explained by the model.
• Q2X: The fraction of the variation of the X variables predicted by the model, as estimated by cross-validation.
• Q2Y: The fraction of the variation of the Y variables predicted by the model, as estimated by cross-validation.
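R2 and Q2 differ only in which predictions are compared with the data: R2 uses the fitted values, Q2 uses cross-validated predictions (each observation predicted by a model fitted without it). A minimal sketch for one variable with made-up toy numbers; `r2_q2` is a hypothetical helper, and the component-by-component bookkeeping SIMCA reports is not attempted here.

```python
def sum_sq(values):
    return sum(v * v for v in values)


def r2_q2(observed, fitted, cv_predicted):
    """R2 and Q2 for one variable.

    R2 = 1 - SS(residuals of the fit) / SS(total)
    Q2 = 1 - PRESS / SS(total), where PRESS uses cross-validated predictions,
         i.e. predictions for observations left out when the model was fitted.
    """
    mean = sum(observed) / len(observed)
    ss_tot = sum_sq([o - mean for o in observed])
    ss_res = sum_sq([o - f for o, f in zip(observed, fitted)])
    press = sum_sq([o - p for o, p in zip(observed, cv_predicted)])
    return 1 - ss_res / ss_tot, 1 - press / ss_tot


r2, q2 = r2_q2([1.0, 2.0, 3.0], [1.1, 1.9, 3.0], [1.5, 2.0, 2.5])
print(round(r2, 3), round(q2, 3))  # 0.99 0.75
```

Because PRESS is never smaller than the fit residuals for a sensible model, Q2 is typically lower than R2; a large gap suggests overfitting.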
MVA – SIMCA Road Map
Methods available
• Preprocessing: trimming and Winsorizing (handling extreme values)
• Principal Components Analysis (PCA; overview of data)
• Projection to Latent Structures (PLS; relationships X↔Y)
• SIMCA classification
• PLS-discriminant analysis (classification)
• Hierarchical PCA and PLS
• Predictions and classification of new data using any model
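The preprocessing entry above mentions trimming and Winsorizing. A sketch of the common textbook definitions, assuming the number of extreme values to handle at each end (`k`) is chosen by the user; how SIMCA sets its limits is not described here, so `k`, `winsorize` and `trim` are hypothetical.

```python
def winsorize(values, k):
    """Replace the k smallest values by the (k+1)-th smallest and the k largest
    by the (k+1)-th largest; the original order of the values is kept."""
    if not 0 <= 2 * k < len(values):
        raise ValueError("k must satisfy 0 <= 2k < number of values")
    s = sorted(values)
    low, high = s[k], s[-k - 1]
    return [min(max(v, low), high) for v in values]


def trim(values, k):
    """Drop the k smallest and the k largest values (result is sorted)."""
    s = sorted(values)
    return s[k:len(s) - k]


print(winsorize([1, 2, 3, 4, 100], 1))  # [2, 2, 3, 4, 4]
print(trim([1, 2, 3, 4, 100], 1))       # [2, 3, 4]
```

Winsorizing keeps the number of observations and only pulls in the extremes, whereas trimming removes them.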
MVA – SIMCA Road Map
Data set = all data; Workset = working copy of data
1. Start a project
   – File New, Read Data File, Specify Label Cols & Rows
2. Look at the data
   – Data set, Quick Info (Variables or Obs.), Preprocessing, Trim, etc.
3. Prepare a work copy
   – Workset: variables, observations, Preprocessing, Class spec.
4. Fit the model
   – Analysis: Autofit, or fast button
5. Plot results
   – Analysis: Scores, Loadings, Distance to Model
6a. Outliers in scores
   – Polish data: prepare a new workset, graphically or via Workset
6b. No outliers in scores
   – Continue: interpret the model (plots) and relate it to the objective
7. New data
   – Predictions: select the prediction set (observations); T_pred, Y_pred, DModX, etc.
• Work the main menus from left to right, and the pop-up menus from top to bottom.
• Plot / List allows you to plot or list anything non-standard that is not found under Analysis.
Steps in using SIMCA-P with the wizard
• Start a new project and import the data set
• Use the workset wizard to guide you through building the workset and fitting the model
• Use the report writer (File | Generate Report) to walk through the model results and their interpretation
• When displaying SIMCA-P plots, always use the Analysis Advisor to guide you.
Autotransform variables
To transform all variables that need it, mark the check box
Automatic creation of classes for classification or discrimination
Report writer
Walks you through the model results with interpretation: File | Generate Report
Steps in Using SIMCA-P, Advanced Mode
• Start a new project and import the data set
• Explore and preprocess the data
• Make a working copy of selected data (workset) for model building
• Specify model type and fit it to the workset
• Review fit (plots, diagnostics, coefficients, etc.)
• Predictions
• Generate Report
1a. File New
Starting a new project
• Select the data file containing the raw data of the project
  – directory, file type (XLS, DIF, TXT, …), file name
• A Wizard opens (see next page) allowing you to specify (optionally) the row containing the variable names, and (optionally) the columns with the observation numbers and names
• Here (Commands) you can also do additional things, such as
  – transposing the input data matrix
• Use simple mode with the workset wizard
• At the last Wizard page, you can (optionally) specify another name and directory for the project.
• A map of the missing data is shown
• The Wizard finishes and puts you in the SIMCA window
• A starting workset (M1: all data, all variables as X, UV-scaled, i.e., every variable centered and scaled to unit variance) is ready
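UV (unit variance) scaling, the default for the starting workset, can be sketched as follows. This assumes sample standard deviations (n - 1) and that no variable is constant (a zero standard deviation would cause a division by zero); `uv_scale` is a hypothetical helper, and the returned means and standard deviations are what new data must later be scaled with.

```python
import statistics


def uv_scale(X):
    """Unit-variance (UV) scale each column of X (list of rows).

    Each variable is mean-centered and divided by its standard deviation,
    so every variable gets variance 1. Returns the scaled rows plus the
    means and standard deviations needed to scale new data the same way.
    """
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample SD, n - 1
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    return scaled, means, sds


scaled, means, sds = uv_scale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(scaled)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After scaling, the two variables contribute equally even though the second was measured on a ten times larger scale.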
2. Looking at the data
• With the data set table open (Data set edit):
• Quick Info (both variable and observation windows can be open)
  – variables
  – observations
• Moving the cursor in the data set table up and down, or sideways, changes the displayed variable and observation
• In the Quick Info options you can specify what you want to look at (histograms, auto-correlations, …), as well as which items should be the basis for the plots
View Variables or Observations, Trim, etc.
Quick Info
3. Prepare a work copy: The Workset
Simple Mode with guidance, or Advanced Mode
• In Workset, you prepare a working copy of the part of the data you will analyze, i.e., use as the basis of your model.
• Here you specify transformation, scaling, and roles of variables (X or Y or excluded).
• Also, you select the observations (your “training set”).
• You can start with the previous workset (Workset / New as model xx) andthen modify it, e.g., excluding observations.
• Whatever you do in Workset does NOT touch the raw data
• Note that outliers are just specified as “not included” in the next workset (the “polished” data). Outliers are NEVER removed from the raw data set.
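The idea that a workset is only a selection applied to the raw data can be shown with a small sketch. The dictionary-based rows, the helper `make_workset` and the toy values are all hypothetical; the point is that excluding an observation from the workset leaves the raw data set unchanged.

```python
def make_workset(raw, include_obs, x_vars, y_vars=()):
    """Build a working copy from the raw data without modifying it.

    raw: list of dict rows (variable name -> value). include_obs, x_vars and
    y_vars are hypothetical selection parameters: the observation indices to
    include and the variable names assigned the X and Y roles.
    """
    rows = [raw[i] for i in include_obs]
    X = [[row[v] for v in x_vars] for row in rows]
    Y = [[row[v] for v in y_vars] for row in rows]
    return X, Y


raw = [  # made-up toy values
    {"temp": 20.0, "press": 1.0, "yield": 0.50},
    {"temp": 25.0, "press": 1.2, "yield": 0.60},
    {"temp": 90.0, "press": 9.9, "yield": 0.10},  # suspected outlier
    {"temp": 30.0, "press": 1.4, "yield": 0.70},
]
# New workset: the suspected outlier (index 2) is simply not included.
X, Y = make_workset(raw, include_obs=[0, 1, 3],
                    x_vars=["temp", "press"], y_vars=["yield"])
print(len(raw), len(X))  # 4 3
```

Because the raw rows are only read, the excluded observation can be brought back into a later workset at any time.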
Workset: two Modes, Simple and Advanced
4. Analysis
Fit the Model to the Workset Data
• Either menu “Analysis / Autofit” or Fast Button
• A model with an appropriate number of components is found, using cross-validation
  – If nothing happens (no significant component is found), compute the first two components (also via menu or fast button)
• A table appears showing the model, component by component.
• More components can be added (menu or fast button)
• Double-click on a model to specify a title
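The component-by-component table can be sketched as follows, assuming the commonly used definitions Q2 for component a = 1 - PRESS_a / SS_(a-1) and cumulative Q2 = 1 - product of PRESS_a / SS_(a-1) over the components fitted so far. The fixed `threshold` stopping rule is a hypothetical simplification, not SIMCA's actual significance rules for Autofit, and the numbers are made up.

```python
def q2_table(ss_before, press):
    """Per-component and cumulative Q2.

    ss_before[a]: residual sum of squares before component a was fitted
    press[a]:     cross-validated PRESS after fitting component a
    """
    table, product = [], 1.0
    for ss, pr in zip(ss_before, press):
        ratio = pr / ss
        product *= ratio
        table.append((1 - ratio, 1 - product))  # (Q2 of this component, Q2 cumulative)
    return table


def n_significant(table, threshold):
    """Count leading components whose own Q2 reaches a (hypothetical) threshold."""
    count = 0
    for q2, _ in table:
        if q2 < threshold:
            break
        count += 1
    return count


table = q2_table(ss_before=[10.0, 4.0], press=[5.0, 3.8])
print(table)                                # about [(0.5, 0.5), (0.05, 0.525)]
print(n_significant(table, threshold=0.1))  # 1
```

The second component still raises cumulative Q2 slightly, but under this toy rule it predicts too little on its own to be kept.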
5. Plot results
Analysis menu (or fast buttons)
• Summary / X/Y-Overview shows R2 and Q2 for all variables
• Scores – scatter plot, t1-t2 and t1-u1 & t2-u2 (PLS)
• Loadings – scatter plot (p1-p2 for PCA, wc1-wc2 for PLS)
• Distance to Model – line plot
• Contribution plots to interpret interesting observations, e.g., outliers, jumps, …
• For all plots, right-click and choose Properties to select plot markers, and more
• The graphical tool box allows further modifications
6a. Outliers were seen in the score plot
(well outside the Hotelling T2 ellipse)
• Start another workset
  (either from Workset / New as model xx, or by using the graphical tool box to remove the outliers from the score plot)
• Note that outliers should NOT be deleted from the data by Edit / Data set
• When the new workset is all right, return to “4. Analysis” to fit a new model to the new workset
  (fast button or Analysis / Autofit)
6b. No outliers were seen in the score plots
(or they have been excluded, and the score plots now look all right)
• Now, interpret the model
• Look at “patterns”, trends, etc., in the score plots
• Inspect the loading plots to interpret the above patterns
• Look at DModX to find observations that are poorly described by the model
• What do these patterns say about the objective of the investigation?
Analysis Advisor to understand and interpret model results
7. Predictions
New Data, Prediction Set
• Under Predictions, specify the set of observations for which predictions will be made, the prediction set
• New data can be read in as a secondary data set (File / Import) and predictions can be made for these
• Prediction set / Complement WS gives a prediction set with those observations that were not in the training set
• Predictions / Y-predicted, T-predicted, etc., calculates and displays the predicted values accordingly
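Predictions for new observations reuse what was learned from the workset: the new rows are scaled with the training means and standard deviations and then projected onto the model. A minimal sketch for T-predicted with a PCA model, assuming UV scaling (for a PLS model the weights w* would take the place of the loadings p); `complement_ws` and `predict_scores` are hypothetical helpers.

```python
def complement_ws(n_obs, workset):
    """Indices of observations that are NOT in the workset (training set)."""
    used = set(workset)
    return [i for i in range(n_obs) if i not in used]


def predict_scores(x_new, means, sds, loadings):
    """T-predicted for one new observation.

    The new row is scaled with the training means and standard deviations
    (never re-estimated from the new data), then projected on each loading vector.
    """
    z = [(v - m) / s for v, m, s in zip(x_new, means, sds)]
    return [sum(zk * pk for zk, pk in zip(z, p)) for p in loadings]


print(complement_ws(5, [0, 1, 3]))  # [2, 4]
t_pred = predict_scores([3.0, 8.0], means=[2.0, 4.0], sds=[1.0, 2.0],
                        loadings=[[0.6, 0.8]])
print(t_pred)
```

The residual of the scaled new row after projection would likewise give its DModX, showing whether the new observation resembles the training data.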
8. Generate the report, with customizable templates
Use of these slides
• You may use any or all of these slides in your own presentations, provided that you keep (and do not modify) the Umetrics logo and web reference
• If you have any problems with the software, or with understanding of the material, please e-mail us at