package ‘pcamixdata’ - u-bordeaux.frmchave100p/wordpress/wp-content/... · 2013. 12. 23. ·...
TRANSCRIPT
Package ‘PCAmixdata’December 23, 2013
Type Package
Title Multivariate analysis for a mixture of quantitative and qualitative data
Version 2.0
Date 2013-04-10
AuthorMarie Chavent and Vanessa Kuentz and Amaury Labenne and Benoit Liquet and Jerome Saracco
Maintainer <[email protected]>
Description Principal Component Analysis and orthogonal rotation for a mixtureof quantitative and qualitative variables.
License GPL (>=2.0)
Collate 'PCAmix.R' 'PCArot.R' 'plot.PCAmix.R' 'predict.PCAmix.R' 'print.PCAmix.R' 'recod.R''recodqual.R' 'recodquant.R' 'sol.2dim.R' 'summary.PCAmix.R' 'tab.disjonctif.NA.R''svd.triplet.R' 'Cut.Group.R' 'nb.level.R' 'Tri.Data.R' 'Lg.pond.R' 'Lg.R' 'RV.R' 'RV.pond.R''MFAmix.R' 'plot.MFAmix.R' 'print.MFAmix.R' 'summary.MFAmix.R'
R topics documented:Cut.Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2decathlon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3flower . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3INSEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Lg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Lg.pond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5MFAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6nb.level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8PCAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8PCArot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11plot.MFAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13plot.PCAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15predict.PCAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17print.MFAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18print.PCAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18recod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19recodqual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1
2 Cut.Group
recodquant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20RV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21RV.pond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22sol.2dim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22summary.MFAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23summary.PCAmix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24svd.triplet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25tab.disjonctif.NA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25Tri.Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26vnf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26wine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Index 28
Cut.Group Cut.Group
Description
Cut a data frame into differents G data frames
Usage
Cut.Group(base, group, name.group)
Arguments
base the data.frame to be split with n rows and p columns
group a vector of size p whose values indicate at which group belongs each variable
name.group a vector of size G which contains names for each group we want to create
Value
list.group a list with each group created
Examples
data(decathlon)Cut.Group(decathlon,group=c(rep(1,10),2,2,3),name.group=c("Epreuve","Classement","Competition"))
flower 3
decathlon Performance in decathlon (data)
Description
The data used here refer to athletes’ performance during two sporting events.
Usage
data(decathlon)
Format
A data frame with 41 rows and 13 columns: the first ten columns corresponds to the performanceof the athletes for the 10 events of the decathlon. The columns 11 and 12 correspond respectivelyto the rank and the points obtained. The last column is a categorical variable corresponding to thesporting event (2004 Olympic Game or 2004 Decastar)
Source
Departement of Applied Mathematics, Agrocampus Rennes
flower Flower Characteristics
Description
8 characteristics for 18 popular flowers.
Usage
data(flower)
Format
A data frame with 18 observations on 8 variables:
[ , "V1"] factor winters[ , "V2"] factor shadow[ , "V3"] factor tubers[ , "V4"] factor color[ , "V5"] ordered soil[ , "V6"] ordered preference[ , "V7"] numeric height[ , "V8"] numeric distance
V1 winters, is binary and indicates whether the plant may be left in the garden when it freezes.
V2 shadow, is binary and shows whether the plant needs to stand in the shadow.
V3 tubers, is asymmetric binary and distinguishes between plants with tubers and plants that grow
4 Lg
in any other way.
V4 color, is nominal and specifies the flower’s color (1 = white, 2 = yellow, 3 = pink, 4 = red, 5 =blue).
V5 soil, is ordinal and indicates whether the plant grows in dry (1), normal (2), or wet (3) soil.
V6 preference, is ordinal and gives someone’s preference ranking going from 1 to 18.
V7 height, is interval scaled, the plant’s height in centimeters.
V8 distance, is interval scaled, the distance in centimeters that should be left between the plants.
Source
The reference below.
References
Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996): Clustering in an Object-Oriented Environ-ment. Journal of Statistical Software, 1. http://www.stat.ucla.edu/journals/jss/
INSEE INSEE
Description
Description of 302 cities by 56 variables structured in 8 groups
Usage
data(INSEE)
Format
A data frame with 302 observations on 56 variables
Source
www.insee.fr
Lg Coefficient Lg
Description
Compute a data fame with all the Lg coefficients between each matrix of a list
Usage
Lg(liste.mat)
Arguments
liste.mat a list of G matrix
Lg.pond 5
Value
Lg a data frame with G rows and G colums with all the Lg coefficients
References
Escofier B et Pages J (1998), Analyses factorielles simples et multiples, Dunod, 3e ed.
J.Pages,Analyse factorielle multiple appliquee aux variables qualitatives et aux donnees mixtes.Revue de statistique appliquee, tome 50, num 4 (2002), p. 5-37
See Also
RV,RV.pond,Lg.pond,
Examples
V0<-c("a","b","a","a","b")V01<-c("c","d","e","c","e")V1<-c(5,4,2,3,6)V2<-c(8,15,4,6,5)V3<-c(4,12,5,8,7)V4<-c("vert","vert","jaune","rouge","jaune")V5<-c("grand","moyen","moyen","petit","grand")G1<-data.frame(V0,V01,V1)G2<-data.frame(V2,V3)G3<-data.frame(V4,V5)liste.mat<-list(G1,G2,G3)Lg(liste.mat)
Lg.pond Coefficient Lg with ponderation
Description
Compute a data fame with all the Lg coefficients between each matrix of a list ponderated
Usage
Lg.pond(liste.mat, ponde)
Arguments
liste.mat a list of G matrix
ponde a vector of size G with the ponderation associated to each matrix
Value
Lg.pond a data frame with G rows and G colums with all the Lg coefficients ponderated
6 MFAmix
Examples
V0<-c("a","b","a","a","b")V01<-c("c","d","e","c","e")V1<-c(5,4,2,3,6)V2<-c(8,15,4,6,5)V3<-c(4,12,5,8,7)V4<-c("vert","vert","jaune","rouge","jaune")V5<-c("grand","moyen","moyen","petit","grand")G1<-data.frame(V0,V01,V1)G2<-data.frame(V2,V3)G3<-data.frame(V4,V5)ponderation<-c(1,2,1)liste.mat<-list(G1,G2,G3)Lg.pond(liste.mat,ponderation)
MFAmix Multiple Factor Analysis for a mixture of qualitative and quantitativevariables inside the groups
Description
Performs Multiple Factor Analysis in the sense of Escofier-Pages. Groups of variables can bequantitative, categorical or contain both quantitatives and categoricals variables.
Usage
MFAmix(data, group, name.group, ndim, graph = TRUE,axes = c(1, 2))
Arguments
data a data frame with n rows and p colums containing all the variables. This dataframe will be split into G groups according to the vector group
group a vector of size p whose values indicate at which group each variable belongs
name.group a vector of size G which contains names for each group we want to create. Eachname must be written as a charactor chain without any spaces and any specialcharacters. Only letters, numbers and "_" allowed.
ndim number of dimensions kept in the results
graph boolean, if TRUE (default) a graph is displayed
axes a length 2 vector specifying the axes to plot
Value
eig a matrix containing all the eigenvalues, the percentage of variance and the cu-mulative percentage of variance
separate.analyses
the results for the separate analyses for each group
group a list of matrices containing all the results for the groups (Lg and RV coefficients,coordinates, square cosine, contributions, distance to the origin)
MFAmix 7
partial.axes a list of matrices containing all the results for the partial axes (coordinates, cor-relation between variables and axes)
ind a list of matrices containing all the results for the individuals (coordinates, squarecosine, contributions)
ind.partiel a matrice containing the coordinates of the partial individuals
quanti.var a list of matrices containing all the results for the quantitative variables (coordi-nates, contribution, cos2)
quali.var a list of matrices containing all the results for the categorical variables (coordi-nates, contribution, cos2)
global.pca The results of the MFAmix considered as a unique weighted PCAmix
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Amaury Labenne, Benoit Li-quet, Jerome Saracco
References
Escofier B et Pages J (1983), Methode pour l’analyse de plusieurs groupes de variables. Applicationa la caracterisation des vins rouges du Val de Loire, Revue de statistique appliquee, 31(2) : 43-59.
Escofier B et Pages J (1998), Analyses factorielles simples et multiples, Dunod, 3e ed.
Examples
#MFAmix:data(INSEE)
class.var<-c(1,1,1,1,1,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,5,5,1,1,1,1,6,7,7,7,7,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,8,8,1)
nom.groupes<-c("Log_Services","Population","Emplois","Condit_Famil","Educ","Revenu","Environnement","Secu_et_Soins")
res<-MFAmix(data=INSEE,group=class.var,name.group=nom.groupes,ndim=5,graph=FALSE)summary(res)res$eigres$group$Lg
#Eigenvalues of the separate analysesres$recap.eig.separate
#Individuals map on dim 1-2 colored by MedGenplot(res,choice="ind",invisible="quali",habillage="MedGen_R",
cex=0.6,title="Individuals colored by the Qualitative variable MedGen_R")#Categories maps on dim 2-3plot(res,choice="ind",invisible="ind",cex=0.6,axes=c(2,3))#Partial axes on dim 1-2plot(res,choice="axes",invisible="ind",cex=0.6,habillage="group",
col.hab=c("green","blue","red","skyblue1","orange","grey","purple","brown"))#Groups representation on dim 1-2plot(res,choice="group",cex=0.6,habillage="group")#Correlation circle on dim 1-2plot(res,choice="var",cex=0.6,habillage="group")#Squared loadings on dim 1-2plot(res,choice="loadings",cex=0.6,habillage="group")#Some partial individuals
8 PCAmix
plot(res,choice="ind",invisible="quali",habillage="group",partial=c("33119","17410"),cex=0.6)#All partial individualsplot(res,choice="ind",invisible="quali",habillage="group",partial="all",cex=0.6)
nb.level nb.level
Description
Return the number of levels of a factor
Usage
nb.level(fact)
Arguments
fact a factor
Value
The number of levels of the factor
PCAmix Principal Component Analysis for a mixture of qualitative and quan-titative variables
Description
PCAmix is a principal component method for a mixture of qualitative and quantitative variables.PCAmix includes the ordinary principal component analysis (PCA) and multiple correspondenceanalysis (MCA) as special cases. Squared loadings are correlation ratios for qualitative variablesand squared correlation for quantitative variables. Missing values are replaced by means for quan-titative variables and by zeros in the indicator matrix for qualitative variables. Note that when allthe p variables are qualitative, the scores of the n observations are equal to the usual scores of MCAtimes square root of p and the eigenvalues are then equal to the usual eigenvalues of MCA times p.
Usage
PCAmix(X.quanti = NULL, X.quali = NULL, ndim = 5,weight.col = NULL, weight.row = NULL, graph = TRUE)
PCAmix 9
Arguments
X.quanti a numeric matrix of data, or an object that can be coerced to such a matrix (suchas a numeric vector or a data frame with all numeric columns).
X.quali a categorical matrix of data, or an object that can be coerced to such a matrix(such as a character vector, a factor or a data frame with all factor columns).
ndim number of dimensions kept in the results (by default 5).
graph boolean, if TRUE the following graphs are displayed for the first two dimensionsof PCAmix: plot of the observations (the scores), plot of the variables (squaredloadings) plot of the correlation circle (if quantitative variables are available),plot of the categories (if qualitative variables are available).
weight.col a vector of weights for the quantitatives variables and for the indicator of quali-tatives variables
weight.row a vector of weights for the individuals
Value
eig eigenvalues (i.e. variances) of the Principal Components (PC).
scores scores a n by ndim numerical matrix which contains the scores of the n obser-vations on the ndim first Principal Components (PC).
scores.stand a n by ndim numerical matrix which contains the standardized scores of the nobservations on the ndim first Principal Components (PC).
sload a p by ndim numerical matrix which contains the squared loadings of the pvariables on the ndim first PC. For quantitative variables (resp. qualitative),squared loadings are the squared correlations (resp. the correlation ratios) withthe PC scores.
categ.coord ’NULL’ if X.quali is ’NULL’ . Otherwise a m by ndim numerical matrix whichcontains the coordinates of the m categories of the qualitative variables on thendim first PC. The coordinates of the categories are the averages of the standard-ized PC scores of the observations in those categories.
quanti.cor ’NULL’ if X.quanti is ’NULL’. Otherwise a p1 by ndim numerical matrix whichcontains the coordinates (the loadings) of the p1 quantitative variables on thendim first PC. The coordinates of the quantitative variables are correlations withthe PC scores.
quali.eta2 ’NULL’ if X.quali is ’NULL’ . Otherwise a p2 by ndim numerical matrix whichcontains the squared loadings of the p2 qualitative variables on the ndim firstPC. The squared loadings of the qualitative variables are correlation ratios withthe PC scores.
res.ind Results for the individuals (coord,contrib in percentage,cos2)
res.quanti Results for the quantitatives variables (coord,contrib in percentage,cos2)
res.categ Results for the categories of the categorials variables (coord,contrib in percent-age,cos2)
coef Coefficients of the linear combinations of the quantitative variables and the cat-egories for constructing the principal components of PCAmix.
V The standardized loadings.
rec Results of the fonction recod(X.quanti,X.quali).
M Metric used in the svd for the weights of the variables.
10 PCAmix
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, JeromeSaracco
References
Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX. Advances in Clas-sification and Data Analysis, Vol. 6, pp. 131-146.
Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of quali-tative and quantitative variables, Psychometrika, 56, 197-212.
Examples
#PCAMIX:data(wine)X.quanti <- wine[,c(3:29)]X.quali <- wine[,c(1,2)]pca<-PCAmix(X.quanti,X.quali,ndim=4)pca<-PCAmix(X.quanti,X.quali,ndim=4,graph=FALSE)pca$eig
#Scores on dim 1-2plot(pca,choice="ind",quali=wine[,1],
posleg="bottomleft",main="Scores")#Scores on dim 2-3plot(pca,choice="ind",axes=c(2,3),quali=wine[,1],
posleg="bottomleft",main="Scores")#Other graphicsplot(pca,choice="var",main="Squared loadings")plot(pca,choice="categ",main="Categories")plot(pca,choice="cor",xlim=c(-1.5,2.5),
main="Correlation circle")#plot with standardized scores:plot(pca,choice="ind",quali=wine[,1],stand=TRUE,
posleg="bottomleft",main="Standardized Scores")plot(pca,choice="var",stand=TRUE,main="Squared loadings")plot(pca,choice="categ",stand=TRUE,main="Categories")plot(pca,choice="cor",stand=TRUE,main="Correlation circle")
#PCA:data(decathlon)quali<-decathlon[,13]pca<-PCAmix(decathlon[,1:10])pca<-PCAmix(decathlon[,1:10], graph=FALSE)plot(pca,choice="ind",quali=quali,cex=0.8,
posleg="topright",main="Scores")plot(pca, choice="var",main="Squared correlations")plot(pca, choice="cor",main="Correlation circle")
#MCAdata(flower)mca <- PCAmix(X.quali=flower[,1:4])mca <- PCAmix(X.quali=flower[,1:4],graph=FALSE)plot(mca,choice="ind",main="Scores")
PCArot 11
plot(mca,choice="var",main="Correlation ratios")plot(mca,choice="categ",main="Categories")
#Missing valuesdata(vnf)PCAmix(X.quali=vnf)vnf2<-na.omit(vnf)PCAmix(X.quali=vnf2)
PCArot Varimax rotation in PCAmix
Description
Orthogonal rotation in PCAmix by maximization of the varimax function expressed in terms ofPCAmix squared loadings (correlation ratios for qualitative variables and squared correlations forquantitative variables). PCArot includes the ordinary varimax rotation in Principal ComponentAnalysis (PCA) and rotation in Multiple Correspondence Analysis (MCA) as special cases.
Usage
PCArot(obj, dim, itermax = 100, graph = TRUE)
Arguments
obj an object of class PCAmix.
dim number of rotated Principal Components.
itermax maximum number of iterations in the Kaiser’s practical optimization algorithmbased on successive pairwise planar rotations.
graph boolean, if TRUE the following graphs are displayed for the first two dimensionsafter rotation: plot of the observations (the scores), plot of the variables (squaredloadings) plot of the correlation circle (if quantitative variables are available),plot of the categories (if qualitative variables are available).
Value
theta angle of rotation if dim is equal to 2.
T matrix of rotation.
eig variances of the ndim dimensions after rotation.
scores a n by ndim numerical matrix which contains the rotated scores of the n obser-vations.
scores.stand a n by dim numerical matrix which contains the rotated standardized scores ofthe n observations.
sload a p by ndim numerical matrix which contains the rotated squared loadings ofthe p variables. For quantitative variables (resp. qualitative), the rotated squaredloadings are the squared correlations (resp. the correlation ratios) with the ro-tated standardized scores.
12 PCArot
categ.coord ’NULL’ if X.quali is ’NULL’. Otherwise a m by dim numerical matrix whichcontains the coordinates of the m categories after rotation. The coordinates ofthe categories after rotation are the averages of the rotated standardized scoresof the observations in those categories.
quanti.cor ’NULL’ if X.quanti is ’NULL’. Otherwise a p1 by dim numerical matrix whichcontains the coordinates (the loadings) of the p1 quantitative variables after rota-tion. The coordinates of the quantitative variables after rotation are correlationswith the rotated standardized scores.
quali.eta2 ’NULL’ if X.quali is ’NULL’. Otherwise a p2 by dim numerical matrix whichcontains the squared loadings of the p2 quantitative variables after rotation. Thesquared loadings of the qualitative variables after rotation are correlation ratiowith the rotated standardized scores.
coef Coefficients of the linear combinations of the quantitative variables and the cat-egories for constructing the rotated principal components of PCAmix.
V The rotated standardized loadings.
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
References
Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX. Advances in Clas-sification and Data Analysis, Vol. 6, pp. 131-146.
Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of quali-tative and quantitative variables, Psychometrika, 56, 197-212.
See Also
plot.PCAmix, summary.PCAmix,PCAmix,predict.PCAmix
Examples
#PCAMIX:data(wine)X.quanti <- wine[,c(3:29)]X.quali <- wine[,c(1,2)]pca<-PCAmix(X.quanti,X.quali,ndim=4, graph=FALSE)pca
rot<-PCArot(pca,3)rotrot$eig #percentages of variances after rotation
plot(rot,choice="ind",quali=wine[,1],posleg="bottomleft", main="Rotated scores")plot(rot,choice="var",main="Squared loadings after rotation")plot(rot,choice="categ",main="Categories after rotation")plot(rot,choice="cor",main="Correlation circle after rotation")
#PCA:
plot.MFAmix 13
data(decathlon)quali<-decathlon[,13]pca<-PCAmix(decathlon[,1:10], graph=FALSE)
rot<-PCArot(pca,3)plot(rot,choice="ind",quali=quali,cex=0.8,posleg="topright",main="Scores after rotation")plot(rot, choice="var", main="Squared correlations after rotation")plot(rot, choice="cor", main="Correlation circle after rotation")
#MCAdata(flower)mca <- PCAmix(X.quali=flower[,1:4],graph=FALSE)
rot<-PCArot(mca,2)plot(rot,choice="ind",main="Scores after rotation")plot(rot, choice="var", main="Correlation ratios after rotation")plot(rot, choice="categ", main="Categories after rotation")
plot.MFAmix Graphs of a MFAmix analysis
Description
Graphs of MFAmix analysis : plot of the individuals, the partial individuals, the groups, the numericvariables on the correlation circle, the categories of the categorials variables, the partial axes of eachseparates analyses
Usage
## S3 method for class MFAmixplot(x, axes = c(1, 2), choice = "axes",
lab.grpe = TRUE, lab.var = TRUE, lab.ind = TRUE,lab.par = FALSE, habillage = "ind", col.hab = NULL,invisible = NULL, partial = NULL, lim.cos2.var = 0,chrono = FALSE, xlim = NULL, ylim = NULL, cex = 1,title = NULL, palette = NULL, new.plot = FALSE,leg=TRUE, ...)
Arguments
x an object of class MFAmix
axes alength 2 vector specifying the components to plot
choice a string corresponding to the graph that you want to do ("ind" for the individualor categorical variables graph, "var" for the correlation circle of the quantitativevariables, "axes" for the graph of the partial axes, "group" for the groups rep-resentation,"loadings" for the squared loadings : squared correlation coefficientbetween numeric variables and axis and correlation ration between catogoricalvariables and axis)
lab.grpe boolean, if TRUE, the labels of the groups are drawn
lab.var boolean, if TRUE, the labels of the variables are drawn
14 plot.MFAmix
lab.ind boolean, if TRUE, the labels of the individuals and of the categories are drawn
lab.par boolean, if TRUE, the labels of the partial points are drawn
habillage string corresponding to the color which are used. If "ind", one color is used foreach individual; if "group" the individuals are colored according to the group;else if it is the name of a categorical variable, it colors according to the differentcategories of this variable
col.hab the colors to use. By default, colors are chosen
invisible a string; for choix ="ind", the individuals can be omit (invisible = "ind"), or thecategories of the the categorical variables (invisible= "quali")
partial list of the individuals or of the categories for which the partial points should bedrawn (by default, partial = NULL and no partial points are drawn)
lim.cos2.var value of the square cosinus below whixh the points are not drawn
chrono boolean, if TRUE, the partial points of a same point are linked (useful whengroups correspond to different dates)
xlim numeric vectors of length 2, giving the x coordinates range
ylim numeric vectors of length 2, giving the y coordinates range
cex cf. function par in the graphics package
title string corresponding to the title of the graph you want to draw (by default NULLand a title is chosen)
new.plot boolean, if TRUE, a new graphical device is created
palette the color palette used to draw the points. By default colors are chosen. If youwant to define the colors: palette=palette(c("black","red","blue")); or you canuse: palette=palette(rainbow(30)), or in black and white for example: palette=palette(gray(seq(0,.9,len=25)))
leg boolean, if TRUE, a legend is displayed
... further arguments passed to or from othe methods
Value
Returns the graph you want to plot
Examples
#MFAmix:data(INSEE)
class.var<-c(1,1,1,1,1,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,5,5,1,1,1,1,6,7,7,7,7,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,8,8,1)
nom.groupes<-c("Log_Services","Population","Emplois","Condit_Famil","Educ","Revenu","Environnement","Secu_et_Soins")
res<-MFAmix(data=INSEE,group=class.var,name.group=nom.groupes,ndim=5,graph=FALSE)summary(res)res$eigres$group$Lg
#Eigenvalues of the separate analysesres$recap.eig.separate
#Individuals map on dim 1-2 colored by MedGenplot(res,choice="ind",invisible="quali",habillage="MedGen_R",
cex=0.6,title="Individuals colored by the Qualitative variable MedGen_R")
plot.PCAmix 15
#Categories maps on dim 2-3plot(res,choice="ind",invisible="ind",cex=0.6,axes=c(2,3))#Partial axes on dim 1-2plot(res,choice="axes",invisible="ind",cex=0.6,habillage="group",
col.hab=c("green","blue","red","skyblue1","orange","grey","purple","brown"))#Groups representation on dim 1-2plot(res,choice="group",cex=0.6,habillage="group")#Correlation circle on dim 1-2plot(res,choice="var",cex=0.6,habillage="group")#Squared loadings on dim 1-2plot(res,choice="loadings",cex=0.6,habillage="group")#Some partial individualsplot(res,choice="ind",invisible="quali",habillage="group",partial=c("33119","17410"),cex=0.6)#All partial individualsplot(res,choice="ind",invisible="quali",habillage="group",partial="all",cex=0.6)
plot.PCAmix Graphs of a PCAmix analysis before or after rotation
Description
Graphs of PCAmix analysis before or after rotation: plot of the observations (the scores or the stan-dardized scores), plot of the variables (squared loadings) correlation circle of the available quanti-tative variables, plot of the categories of the available qualitative variables.
Usage
## S3 method for class PCAmixplot(x,axes = c(1, 2), choice = "ind", stand=FALSE,label=TRUE, quali=NULL,posleg="topleft",xlim=NULL,ylim=NULL,cex=1,col.var=NULL,...)
Arguments
x an object of class PCAmix obtained with the function PCAmix or PCArot.
axes a length 2 vector specifying the components to plot.
choice the graph to plot ("ind" for the observations, "var" for the variables (plot of thesquared loadings), "cor" for the correlation circle if quantitative variables areavailable, "categ" for the categories if qualitative variables are available.
stand if ’TRUE’ the standardized scores are used in the plot of the observations.
label if ’FALSE’ the labels of the points are not plotted
quali a qualitative variable such as a character vector or a factor of size n (the numberof observation). The observations are colored according to the categories of thisvariable.
posleg position of the legend if quali is not ’NULL’.
xlim numeric vectors of length 2, giving the x coordinates range.
ylim numeric vectors of length 2, giving the y coordinates range.
cex cf. function par in the graphics package
col.var a vector of size p (the total n umber of variables) which allows to colorate eachvariable with a different color
... further arguments passed to or from other methods.
16 plot.PCAmix
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
References
Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX
Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of quali-tative and quantitative variables, Psychometrika, 56, 197-212.
See Also
summary.PCAmix,PCAmix,PCArot,
Examples
#PCAMIX:data(wine)X.quanti <- wine[,c(3:29)]X.quali <- wine[,c(1,2)]pca<-PCAmix(X.quanti,X.quali,ndim=4, graph=FALSE)
#Scores on dim 1-2plot(pca,choice="ind",quali=wine[,1],posleg="bottomleft",main="Scores")#Scores on dim 2-3plot(pca,choice="ind",axes=c(2,3),quali=wine[,1],posleg="bottomleft",main="Scores")#Other graphicsplot(pca,choice="var",main="Squared loadings")plot(pca,choice="categ",main="Categories")plot(pca,choice="cor",xlim=c(-1.5,2.5),main="Correlation circle")
rot<-PCArot(pca,3,graph=FALSE)plot(rot,choice="ind", main="Rotated scores",label=FALSE)plot(rot,choice="var",main="Squared loadings after rotation")plot(rot,choice="categ",main="Categories after rotation")plot(rot,choice="cor",main="Correlation circle after rotation")
#PCA:data(decathlon)quali<-decathlon[,13]pca<-PCAmix(decathlon[,1:10], graph=FALSE)plot(pca,choice="ind",quali=quali,stand=TRUE,cex=0.8,posleg="topright",main="Standardized scores")plot(pca, choice="cor", stand=TRUE,main="Correlation circle")
predict.PCAmix 17
predict.PCAmix Scores of new objects on the principal components of PCAmix orPCArot
Description
This function calculates the scores of a new set of data on the principal components of PCA. If thecomponents have been rotated, the function gives the scores of the new objects on the rotated PC.The new objects must have the same variables than the learning set.
Usage
## S3 method for class PCAmixpredict(object, X.quanti = NULL, X.quali = NULL,...)
Arguments
object an object of class PCAmix (output of the function PCAmix or PCArot)
X.quanti numeric matrix of data for the new objects
X.quali a categorical matrix of data for the new objects
... futher arguments pased to or from other methods
Value
Returns the matrix of the scores of the new objects of the ndim principal PC or rotated PC.
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, JeromeSaracco
Examples
data(decathlon)n <- nrow(decathlon)sub <- sample(1:n,20)pca<-PCAmix(decathlon[sub,1:10], graph=FALSE)predict(pca,decathlon[-sub,1:10])rot <- PCArot(pca,dim=4)predict(rot,decathlon[-sub,1:10])
18 print.PCAmix
print.MFAmix Print a ’MFAmix’ object
Description
This is a method for the function print for objects of the class MFAmix.
Usage
## S3 method for class MFAmixprint(x, ...)
Arguments
x An object of class MFAmix generated by the function PCAmix.... Further arguments to be passed to or from other methods. They are ignored in
this function.
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
See Also
MFAmix
print.PCAmix Print a ’PCAmix’ object
Description
This is a method for the function print for objects of the class PCAmix.
Usage
## S3 method for class PCAmixprint(x, ...)
Arguments
x An object of class PCAmix generated by the functions PCAmix and PCArot.... Further arguments to be passed to or from other methods. They are ignored in
this function.
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
See Also
PCAmix , PCArot
recod 19
recod Recoding of the quantitative and qualitative data matrix
Description
Recoding of the quantitative and qualitative data matrix
Usage
recod(X.quanti, X.quali)
Arguments
X.quanti numeric matrix of data
X.quali a categorical matrix of data
Value
X X.quanti and X.quali concatenated in a single matrix
Y X.quanti with missing values replaced with mean values and X.quali with miss-ing values replaced by zeros, concatenated in a single matrix
Z X.quanti standardized (centered and reduced by standard deviations) concate-nated with the indicator matrix of X.quali centered and reduced with the squareroots of the relative frequencies of the categories
W X.quanti standardized (centered and reduced by standard deviations) concate-nated with the indicator matrix of X.quali centered
n the number of objects
p the total number of variables
p1 the number of quantitative variables
p2 the number of qualitative variables
g the means of the columns of X.quanti
s the standard deviations of the columns of X.quanti (population version with 1/n)
G The indicator matix of X.quali with missing values replaced by 0
Gcod The indicator matix G reduced with the square roots of the relative frequenciesof the categories
20 recodquant
recodqual Recoding of the qualitative data matrix
Description
Recoding of the qualitative data matrix
Usage
recodqual(X)
Arguments
X the qualitative data matrix
Value
G The indicator matix of X with missing values replaced by 0
Examples
data(vnf)X <- vnf[1:10,9:12]tab.disjonctif.NA(X)recodqual(X)
recodquant Recoding of the quantitative data matrix
Description
Recoding of the quantitative data matrix
Usage
recodquant(X)
Arguments
X the quantitative data matrix
Value
Z the standardized quantitative data matrix (centered and reduced with the stan-dard deviations)
g the means of the columns of X
s the standard deviations of the columns of X (population version with 1/n)
Xcod The quantitative matrix X with missing values replaced with the column meanvalues
RV 21
Examples
data(decathlon)X <- decathlon[1:5,1:5]X[1,2] <- NAX[2,3] <-NArec <- recodquant(X)
RV Coefficient RV
Description
Compute a data fame with all the RV coefficients between each matrix of a list
Usage
RV(liste.mat)
Arguments
liste.mat a list of G matrices
Value
RV a data frame with G rows and G colums with all the RV coefficients
References
Escofier B et Pages J (1998), Analyses factorielles simples et multiples, Dunod, 3e ed.
J.Pages,Analyse factorielle multiple appliquee aux variables qualitatives et aux donnees mixtes.Revue de statistique appliquee, tome 50, num 4 (2002), p. 5-37
Examples
V0<-c("a","b","a","a","b")V01<-c("c","d","e","c","e")V1<-c(5,4,2,3,6)V2<-c(8,15,4,6,5)V3<-c(4,12,5,8,7)V4<-c("vert","vert","jaune","rouge","jaune")V5<-c("grand","moyen","moyen","petit","grand")G1<-data.frame(V0,V01,V1)G2<-data.frame(V2,V3)G3<-data.frame(V4,V5)liste.mat<-list(G1,G2,G3)RV(liste.mat)
22 sol.2dim
RV.pond Coefficient RV with ponderation
Description
Compute a data fame with all the RV coefficients between each matrix of a list ponderated
Usage
RV.pond(liste.mat, ponde)
Arguments
liste.mat a list of G matrix
ponde a vector of size G with the ponderation associated to each matrix
Value
RV.pond a data frame with G rows and G colums with all the RV coefficients ponderated
Examples
V0<-c("a","b","a","a","b")V01<-c("c","d","e","c","e")V1<-c(5,4,2,3,6)V2<-c(8,15,4,6,5)V3<-c(4,12,5,8,7)V4<-c("vert","vert","jaune","rouge","jaune")V5<-c("grand","moyen","moyen","petit","grand")G1<-data.frame(V0,V01,V1)G2<-data.frame(V2,V3)G3<-data.frame(V4,V5)liste.mat<-list(G1,G2,G3)ponderation<-c(1,2,1)RV.pond(liste.mat,ponderation)
sol.2dim Varimax rotation for PCAmix in two dimensions
Description
Varimax rotation for PCAmix in two dimensions
Usage
sol.2dim(A, indexj, p, p1)
summary.MFAmix 23
Arguments
A matrix of loadings
indexj a vector with the number of the variable associated with each row of A
p the total number of variables
p1 the number of quantitative variables
Value
theta the angle of rotation
T the matrix of rotation
summary.MFAmix Summary of a ’MFAmix’ object
Description
This is a method for the function summary for objects of the class MFAmix.
Usage
## S3 method for class MFAmixsummary(object, ...)
Arguments
object an object of class MFAmix obtained with the function MFAmix.
... further arguments passed to or from other methods.
Value
Returns the total number of observations, the number of numerical variables, the number of cate-gorical variables with the total number of categories. And all those values are declinated by groups.
See Also
plot.MFAmix,MFAmix
24 summary.PCAmix
summary.PCAmix Summary of a ’PCAmix’ object
Description
This is a method for the function summary for objects of the class PCAmix.
Usage
## S3 method for class PCAmixsummary(object, ...)
Arguments
object an object of class PCAmix obtained with the function PCAmix or PCArot.
... further arguments passed to or from other methods.
Value
Returns the matrix of squared loadings. For quantitative variables (resp. qualitative), squared load-ings are the squared correlations (resp. the correlation ratios) with the scores or with the rotated(standardized) scores.
Author(s)
Marie Chavent <[email protected]>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
References
Chavent, M., Kuentz, V., Saracco, J. (2011), Orthogonal Rotation in PCAMIX
Kiers, H.A.L., (1991), Simple structure in Component Analysis Techniques for mixtures of quali-tative and quantitative variables, Psychometrika, 56, 197-212.
See Also
plot.PCAmix,PCAmix,PCArot,
Examples
data(wine)X.quanti <- wine[,c(3:29)]X.quali <- wine[,c(1,2)]pca<-PCAmix(X.quanti,X.quali,ndim=4, graph=FALSE)summary(pca)
rot<-PCArot(pca,3,graph=FALSE)summary(rot)
svd.triplet 25
svd.triplet Singular Value Decomposition of a Matrix
Description
Compute the singular-value decomposition of a rectangular matrix with weights for rows andcolumns. Borrowed from the ’FactoMineR’ package and, used as internal function
Usage
svd.triplet(X, row.w=NULL, col.w=NULL, ncp=Inf)
Arguments
X a data matrix
row.w vector with the weights of each row (NULL by default and the weights are uni-form)
col.w vector with the weights of each column (NULL by default and the weights areuniform)
ncp the number of components kept for the outputs
Value
vs a vector containing the singular values of ’x’;
u a matrix whose columns contain the left singular vectors of ’x’;
v a matrix whose columns contain the right singular vectors of ’x’.
See Also
svd
tab.disjonctif.NA Construction of the indicator matrix of a qualitative data matrix
Description
Construction of the indicator matrix of a qualitative data matrix
Usage
tab.disjonctif.NA(tab)
Arguments
tab a qualitative data matrix
Value
res the indicator matrix with NA for the categories of the variable which is missing in tab
26 vnf
Examples
data(vnf)X <- vnf[1:10,9:12]tab.disjonctif.NA(X)recodqual(X)
Tri.Data Tri.Data
Description
Cut a data frame with numeric variables and factors into two data frames, one with only numericvariables and the other one with only factors variables.
Usage
Tri.Data(base)
Arguments
base the data.frame with numeric variables and factors
Value
X.quanti The sub data.frame with only numeric variablesX.quali The sub data.frame with only categorial variables
Examples
data(decathlon)Tri.Data(decathlon)
vnf Questionnaire done by 1232 individuals who answered 14 questions
Description
A user satisfaction survey of pleasure craft operators on the “Canal des Deux Mers”, located inSouth of France, was carried out by the public corporation “Voies Navigables de France” responsi-ble for managing and developing the largest network of navigable waterways in Europe
Usage
data(vnf)
Format
A data frame with 1232 observations on the following 14 categorical variables.
Source
Josse, J., Chavent, M., Liquet, B. and Husson, F. (2012). Handling missing values with RegularizedIterative Multiple Correspondence Analysis. Journal of classification, Vol. 29, pp. 91-116.
wine 27
wine Wine
Description
The data used here refer to 21 wines of Val de Loire.
Usage
data(wine)
Format
A data frame with 21 rows (the number of wines) and 31 columns: the first column corresponds tothe label of origin, the second column corresponds to the soil, and the others correspond to sensorydescriptors.
Source
Centre de recherche INRA d’Angers
Index
∗Topic algebrasvd.triplet, 25
∗Topic datasetsdecathlon, 3flower, 3INSEE, 4vnf, 26wine, 27
∗Topic multivariateMFAmix, 6PCAmix, 8PCArot, 11
∗Topic printprint.MFAmix, 18print.PCAmix, 18
Cut.Group, 2
decathlon, 3
flower, 3
INSEE, 4
Lg, 4Lg.pond, 5, 5
MFAmix, 6, 18, 23
nb.level, 8
PCAmix, 8, 12, 16, 18, 24PCArot, 11, 16, 18, 24plot.MFAmix, 13, 23plot.PCAmix, 12, 15, 24predict.PCAmix, 12, 17print.MFAmix, 18print.PCAmix, 18
recod, 19recodqual, 20recodquant, 20RV, 5, 21RV.pond, 5, 22
sol.2dim, 22
summary.MFAmix, 23summary.PCAmix, 12, 16, 24svd, 25svd.triplet, 25
tab.disjonctif.NA, 25Tri.Data, 26
vnf, 26
wine, 27
28