TRANSCRIPT
1 de xxx
Univariate Description
2 de xxx
Car mileage data (gasoline consumption)
MPG
30,8 31,7 30,1 31,6 32,1 33,3 31,3 31,0 32,0 32,4
30,9 30,4 32,5 30,3 31,3 32,1 32,5 31,8 30,4 30,5
32,0 31,4 30,8 32,8 32,0 31,5 32,4 31,0 29,8 31,1
32,3 32,7 31,2 30,6 31,7 31,4 32,2 31,5 31,7 30,6
32,6 31,4 31,8 31,9 32,8 31,5 31,6 30,6 32,2
3 de xxx
Univariate Description
> data=read.table("D:/Albert/COURSES/CursLlibreBowerman/Datasets - Text/GasMiles.txt", header=TRUE)
> names(data)
[1] "MPG"
> dim(data)
[1] 49  1
> attach(data)
> stem(MPG)

  The decimal point is at the |

  29 | 8
  30 | 1344
  30 | 5666889
  31 | 001233444
  31 | 55566777889
  32 | 0001122344
  32 | 556788
  33 | 3

> data
    MPG
1  30.8
2  31.7
3  30.1
4  31.6
5  32.1
6  33.3
....
47 31.6
48 30.6
49 32.2

> summary(data)
      MPG
 Min.   :29.80
 1st Qu.:31.00
 Median :31.60
 Mean   :31.55
 3rd Qu.:32.10
 Max.   :33.30
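The session above reads the data from a local file. As a minimal sketch that runs without that file, the 49 MPG values listed on the earlier slide can be typed in directly (decimal commas converted to points; the vector name `mpg` is an assumption made here, not a name from the slides):

```r
# MPG values transcribed from the "Car mileage data" slide
# (decimal commas replaced by decimal points).
mpg <- c(30.8, 31.7, 30.1, 31.6, 32.1, 33.3, 31.3, 31.0, 32.0, 32.4,
         30.9, 30.4, 32.5, 30.3, 31.3, 32.1, 32.5, 31.8, 30.4, 30.5,
         32.0, 31.4, 30.8, 32.8, 32.0, 31.5, 32.4, 31.0, 29.8, 31.1,
         32.3, 32.7, 31.2, 30.6, 31.7, 31.4, 32.2, 31.5, 31.7, 30.6,
         32.6, 31.4, 31.8, 31.9, 32.8, 31.5, 31.6, 30.6, 32.2)

length(mpg)    # number of observations
summary(mpg)   # min, quartiles, median, mean, max
sd(mpg)        # sample standard deviation (n - 1 in the denominator)
stem(mpg)      # stem-and-leaf display
```

The results should agree with the `summary(data)` and `stem(MPG)` output shown above.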
4 de xxx
With SPSS
Descriptive statistics

                        N   Minimum   Maximum      Mean   Std. dev.
GAS                    49     29,80     33,30   31,5531      ,79924
Valid N (listwise)     49

Copy and paste the data into the SPSS data sheet (with decimal commas instead of points if you are using the Spanish version of SPSS).
DESCRIPTIVES VARIABLES=gas /STATISTICS=MEAN STDDEV MIN MAX .
5 de xxx
Histogram
GRAPH /HISTOGRAM(NORMAL)=gas .
[Figure: histogram of GAS with normal curve, frequency axis from 0 to 10. Std. dev. = ,80; Mean = 31,55; N = 49,00]
6 de xxx
Histogram
GRAPH /HISTOGRAM(NORMAL)=gas .
[Figure: histogram of GAS with normal curve, frequency axis from 0 to 12. Std. dev. = ,80; Mean = 31,55; N = 49,00]
7 de xxx
Box Plot
[Figure: box plot annotated with min, Q1, median, Q3 and max]
Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR
Outer fences: Q1 - 3*IQR and Q3 + 3*IQR
Points beyond the inner fences but inside the outer fences are mild outliers.
Points beyond the outer fences are extreme outliers.
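The fence rules can be sketched in R as follows. The helper names `box_fences` and `classify_outliers` and the toy vector `x` are hypothetical, introduced only for illustration. The quartiles come from `quantile()` with its default method, which may differ slightly from the hinges that a particular package uses for its box plots.

```r
# Inner and outer fences computed from the quartiles and the IQR.
box_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  list(q1 = q[1], q3 = q[2], iqr = iqr,
       inner = c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr),
       outer = c(q[1] - 3 * iqr, q[2] + 3 * iqr))
}

# Mild outliers: beyond the inner fences but not beyond the outer fences.
# Extreme outliers: beyond the outer fences.
classify_outliers <- function(x) {
  f <- box_fences(x)
  extreme <- x < f$outer[1] | x > f$outer[2]
  mild <- (x < f$inner[1] | x > f$inner[2]) & !extreme
  list(mild = x[mild], extreme = x[extreme])
}

x <- c(2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 12, 25)  # hypothetical toy data
box_fences(x)
res <- classify_outliers(x)
res
```

For this toy vector the inner fences are 0 and 10 and the outer fences are -3.75 and 13.75, so 12 is flagged as a mild outlier and 25 as an extreme one.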
8 de xxx
Box Plot
[Figure: box plot annotated with min, max, the IQR (length of the box), the distance 1.5*IQR and the two inner fences]
Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR
The outer fences lie 3*IQR beyond the quartiles; points beyond them are extreme outliers.
Points beyond the inner fences but inside the outer fences are mild outliers.
9 de xxx
Box-Plot
[Figure: box plot of GAS (N = 49), vertical axis from 29 to 34]
EXAMINE VARIABLES=gas
  /COMPARE VARIABLE
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL
  /MISSING=LISTWISE .
10 de xxx
Cross-section data: bank data
> data=read.table("D:/Albert/COURSES/cursDAS/AS2003/DATA/BANK.TXT", header=TRUE)
> dim(data)
[1] 100   9
> names(data)
[1] "LSALNOW" "LSALBEG" "SEX"     "JOBCAT"  "RACE"    "EDLEVEL" "TIME"
[8] "AGE"     "WORK"
> data[sample(1:dim(data)[1],10),]
    LSALNOW LSALBEG SEX JOBCAT RACE EDLEVEL TIME   AGE  WORK
25   9.4125  8.7483   0      3    0      12   80 61.67 38.33
47   8.9227  8.3428   1      1    0      15   90 58.00  4.50
8   10.0078  9.5104   0      4    0      19   81 30.75  5.17
33   9.5324  8.4888   1      2    0      12   77 24.33  0.33
97   8.8217  8.3138   1      1    1      12   72 51.50 22.58
100  8.9065  8.3138   1      1    1      12   85 51.00 19.00
32   9.5104  8.6995   0      3    0      12   83 50.25 23.67
94   8.8479  8.3138   1      1    1      12   72 46.50  9.67
39   9.0711  8.5132   1      1    0       8   74 59.83 26.50
36   9.1695  8.5942   1      1    0      12   98 47.33 20.33
> data[runif(dim(data)[1])<.1,]
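The two row-selection idioms in this session behave differently. The sketch below shows the difference on a small stand-in data frame, since BANK.TXT is not available here; the object `toy`, its columns and the seed are assumptions made for illustration.

```r
set.seed(1)                                    # hypothetical seed, for reproducibility
toy <- data.frame(id = 1:100, x = rnorm(100))  # stand-in for the bank data

# sample(): exactly 10 distinct rows, drawn without replacement.
fixed <- toy[sample(nrow(toy), 10), ]

# runif() < 0.1: each row is kept independently with probability 0.1,
# so the number of selected rows is random (about 10 on average).
bernoulli <- toy[runif(nrow(toy)) < 0.1, ]

nrow(fixed)
nrow(bernoulli)
```

The first form suits a quick look at a fixed number of cases; the second returns a subsample of variable size with the rows in their original order.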
11 de xxx
Salnow by sex (boxplot)
boxplot(SALNOW ~SEX, col=c("blue", "green"))
12 de xxx
Red is the kernel density; green is the normal distribution.
> summary(INCOME)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   2.00   14.00   20.00   22.44   30.00  100.00
13 de xxx
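Log of income
linc=log(INCOME)
hist(linc,12, prob= TRUE, col='blue')      # histogram on the density scale, about 12 bins
lines(density(linc,bw=0.4), col='red')     # kernel density estimate, bandwidth 0.4
mu=mean(linc)
sd=sqrt(var(linc))
lines(sort(linc),dnorm(sort(linc),mu,sd), col='green')   # normal curve with the same mean and sd
Red is the kernel density; green is the normal distribution.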
14 de xxx
Shape of the distribution and Mean, Median and Mode
15 de xxx
. summarize hsnotpau
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    hsnotpau |      3609    5.616312    1.123641       1.44        9.6
16 de xxx
Proportion of students between grades
The distribution of grades on a multiple-choice exam is approximately normal with mean 6 and standard deviation 1.7 (or, as in the hsnotpau data, mean 5.6 and standard deviation 1.1). Find:
a) The quartiles of the distribution.
b) The approximate percentage of students with a grade between 5 and 7.
c) The percentage of students who fail (a grade below 5 is a fail).
d) The percentage with a grade above 7.
e) The probability that, when 5 individuals are chosen from the population of students who took the test, at least 2 of them have a grade above 7.
f) What is the distribution of the mean grade of 10 students chosen at random from the population that took the test?
17 de xxx
Solution:
Quartiles:
> qnorm(.25, 6, 1.7)
[1] 4.853367
> qnorm(.5, 6, 1.7)
[1] 6
> qnorm(.75, 6, 1.7)
[1] 7.146633
Percentages and probabilities:
> pnorm(5,6,1.7)
[1] 0.2781872
> pnorm(7,6,1.7)
[1] 0.7218128
> pnorm(7,6,1.7) - pnorm(5,6,1.7)
[1] 0.4436256
> 1- pnorm(7,6,1.7)
[1] 0.2781872
> pbinom(1, 5, 1- pnorm(7,6,1.7))
[1] 0.5735169
The last value is the probability that at most 1 of the 5 students has a grade above 7, so the probability asked for in e), at least 2, is 1 - 0.5735169 = 0.4264831.
Sampling distribution:
The sample mean of a sample of 10 students is normally distributed, with mean 6 and standard deviation
> 1.7/sqrt(10)
[1] 0.5375872
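Parts e) and f) can also be checked by simulation. The sketch below is a rough Monte Carlo check, not part of the original solution; the seed, the number of replications and all variable names are assumptions made here.

```r
set.seed(123)      # hypothetical seed
mu <- 6
sigma <- 1.7
reps <- 100000     # number of simulated groups (assumption)

# Part e): groups of 5 students, count how many have a grade above 7.
grades5 <- matrix(rnorm(5 * reps, mu, sigma), ncol = 5)
n_above7 <- rowSums(grades5 > 7)
p_e_sim <- mean(n_above7 >= 2)
p_e_exact <- 1 - pbinom(1, 5, 1 - pnorm(7, mu, sigma))

# Part f): means of 10 students; their spread should be close to sigma/sqrt(10).
grades10 <- matrix(rnorm(10 * reps, mu, sigma), ncol = 10)
means10 <- rowMeans(grades10)

c(simulated = p_e_sim, exact = p_e_exact)
c(sd_simulated = sd(means10), sd_theory = sigma / sqrt(10))
```

The simulated values should land close to the exact ones, with small differences due to sampling noise.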
18 de xxx
. summarize hsnotpau, detail

                          hsnotpau
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3           1.44
 5%         3.81           2.11
10%         4.21           2.27       Obs                3609
25%         4.85            2.3       Sum of Wgt.        3609

50%         5.58                      Mean           5.616312
                        Largest       Std. Dev.      1.123641
75%         6.38           8.94
90%         7.09           9.07       Variance       1.262569
95%         7.61           9.37       Skewness       .0930459
99%         8.25            9.6       Kurtosis       2.983357
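One way to judge how close these grades are to a normal distribution is to set the reported percentiles next to the quantiles of a normal distribution with the same mean and standard deviation. The sketch below uses only the numbers printed in the Stata output; the variable names are assumptions made here.

```r
# Percentiles reported by "summarize hsnotpau, detail".
probs    <- c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)
observed <- c(3, 3.81, 4.21, 4.85, 5.58, 6.38, 7.09, 7.61, 8.25)

# Quantiles of a normal distribution with the reported mean and sd.
normal_q <- qnorm(probs, mean = 5.616312, sd = 1.123641)

comparison <- data.frame(prob = probs,
                         observed = observed,
                         normal = round(normal_q, 2),
                         diff = round(observed - normal_q, 2))
print(comparison)
```

Small differences across the whole range are in line with the reported skewness near 0 and kurtosis near 3.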
19 de xxx
Density function of the normal distribution
20 de xxx
Density function of the normal distribution
Normal distribution applet at:
Statistical Applets: http://bcs.whfreeman.com/ips4e/pages/bcs-main.asp?v=category&s=00010&n=99000&i=99010.01&o
Tables of the normal distribution:
Statistical Tables: http://bcs.whfreeman.com/ips4e/pages/bcs-main.asp?v=category&s=00100&n=99000&i=99100.01&o
Tables of the normal distribution in R: pnorm() qnorm()
For example:
> pnorm(1.87)
[1] 0.969258
> pnorm(-1.2)
[1] 0.1150697
> qnorm(.975)
[1] 1.959964
> qnorm(.25)
[1] -0.6744898
Percentage of students with a grade between 5 and 7? (mean = 5.616312, standard deviation = 1.123641)
Z2=(7- 5.616312)/1.123641
Z1=(5- 5.616312)/1.123641
> pnorm(Z2) - pnorm(Z1)
[1] 0.5992435
That is approximately 60%. More directly:
> pnorm(7, 5.616312,1.123641)- pnorm(5, 5.616312,1.123641)
[1] 0.5992435
21 de xxx
The normal and t distributions (10%, 5%, 1% tails)
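The comparison on this slide can be reproduced numerically with `qnorm()` and `qt()`. The sketch below treats the 10%, 5% and 1% figures as upper-tail probabilities and uses 5 and 30 degrees of freedom; both choices are assumptions, since the slide does not state them.

```r
tail_prob <- c(0.10, 0.05, 0.01)   # upper-tail areas (assumed one-sided)
df_values <- c(5, 30)              # hypothetical degrees of freedom

# Critical values: the point that leaves tail_prob to its right.
crit <- cbind(qnorm(1 - tail_prob),
              sapply(df_values, function(df) qt(1 - tail_prob, df)))
colnames(crit) <- c("normal", paste0("t_df", df_values))
rownames(crit) <- paste0(100 * tail_prob, "%")

round(crit, 3)
```

The t critical values are larger than the normal ones because the t distribution has heavier tails, and they move towards the normal values as the degrees of freedom grow.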
22 de xxx
Family consumption data (family.dta): summary statistics
. summarize exp1_1, detail

-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1520551       7.18e-06
 5%     .3881256       7.65e-06
10%     .5420735       .0000112       Obs                2640
25%     .8613541       .0000267       Sum of Wgt.        2640

50%     1.294648                      Mean           1.473449
                        Largest       Std. Dev.      .9169822
75%     1.901873       8.024636
90%     2.559126       8.826962       Variance       .8408563
95%      3.10731       9.368608       Skewness       2.150655
99%     4.331305       10.20112       Kurtosis       13.92168
23 de xxx
Quantiles
24 de xxx
Dot-plot
. dotplot food
[Figure: dot plot of food; horizontal axis Frequency (0 to 400), vertical axis food (0 to 10.2011)]
25 de xxx
Histogram
graph exp1_1, bin(20) normal
[Figure: histogram of food with normal curve; vertical axis Fraction (0 to .27197), horizontal axis food (7.2e-06 to 10.2011)]
26 de xxx
Boxplot of expenditure on food
[Figure: box plot of food, axis from 7.2e-06 to 10.2011]
27 de xxx
Comparison of Distributions
graph exp1_1, box by(group)
[Figure: box plots of food for groups 1 to 5, axis from 7.2e-06 to 10.2011]
28 de xxx
Distribution of the transformed variable
[Figure: histogram of BC(exp1_1,.367); vertical axis Fraction (0 to .194318), horizontal axis from -2.68933 to 3.66543]
29 de xxx
. summarize newfood, detail
                       BC(exp1_1,.367)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -1.35978      -2.689332
 5%    -.7995382        -2.6885
10%    -.5484188       -2.68311       Obs                2640
25%    -.1452355      -2.667499       Sum of Wgt.        2640

50%     .2708729                      Mean           .2863131
                        Largest       Std. Dev.      .6866023
75%     .7250079       3.126675
90%     1.122054       3.334947       Variance       .4714228
95%     1.406071       3.468853       Skewness       .0912667
99%     1.941548        3.66543       Kurtosis       4.251757
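The slides do not define BC. Assuming it denotes the usual Box-Cox power transformation, (x^lambda - 1)/lambda with lambda = .367, the reported quartiles of the transformed variable can be recovered from the quartiles of exp1_1, because an increasing transformation carries quantiles over. The function name `box_cox` and the vector names below are assumptions made here.

```r
# Box-Cox transformation for positive x; the limit for lambda = 0 is log(x).
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Quartiles of exp1_1 and of BC(exp1_1,.367) as printed in the Stata output.
raw_q    <- c(p25 = 0.8613541, p50 = 1.294648, p75 = 1.901873)
reported <- c(p25 = -0.1452355, p50 = 0.2708729, p75 = 0.7250079)

transformed <- box_cox(raw_q, 0.367)
round(cbind(transformed, reported), 4)
```

The close agreement supports reading BC as the Box-Cox transformation. The transformation brings the skewness from 2.15 down to about 0.09, as the two summaries show.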
30 de xxx
PISA 2003
> Mathematics performance
> Number of books at home
[Figure: histogram of REGR factor score 1 for analysis 1. Std. dev. = 1,00; Mean = 0,00; N = 10791,00]
[Figure: histogram of How many books at home Q19, categories 1,0 to 6,0. Std. dev. = 1,30; Mean = 3,8; N = 10670,00]
31 de xxx
Pisa 2003
[Figure: box plots of mathematics performance (axis from -6 to 4) by How many books at home Q19, with the group size N printed under each box]
32 de xxx
Pisa 2003
Report
REGR factor score 1 for analysis 1

How many books at home Q19        Mean        N     Std. dev.
0-10 books                    -1,05056      375     ,99645420
11-25 books                  -,6346741     1155     ,93700815
26-100 books                 -,1713318     3575     ,91639840
101-200 books                 ,1350648     2372     ,86959580
201-500 books                 ,4411666     1927     ,86447655
More than 500 books           ,5051664     1266     ,96605461
Total                         ,0066095    10670     ,99317157
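The Total row is linked to the group rows: the overall mean is the average of the group means weighted by group size. A minimal check using the numbers in the table (decimal commas converted to points; the variable names are assumptions):

```r
# Group sizes and group means from the report, ordered from
# "0-10 books" to "More than 500 books".
n <- c(375, 1155, 3575, 2372, 1927, 1266)
group_mean <- c(-1.05056, -0.6346741, -0.1713318,
                 0.1350648, 0.4411666, 0.5051664)

sum(n)                                  # total number of students with an answer
overall <- sum(n * group_mean) / sum(n) # same as weighted.mean(group_mean, n)
overall
```

The result should match the mean in the Total row up to rounding of the printed group means.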
33 de xxx
Review of some concepts
Some exercises for the practice on the Normal Distribution
Exercises
1. The heights of adult men are normally distributed with a mean of 69.5 inches and a variance of 7.025 square inches. Find the probabilities that a man chosen at random will be (a) at least 72 inches tall, (b) at most 72 inches tall.
2. Scores on standard IQ tests are usually designed to be normally distributed with a mean of 100 and a standard deviation of 15. On such a test, find the probability that a person chosen at random will score (a) below 90, (b) above 90.
3. On American roulette wheels, the probability of the ball landing on red is 18/38. Suppose 200 bets are placed on red. Use the Normal Approximation of the Binomial to approximate the probability of there being from 100 to 120 winners.
4. It is estimated that Americans average 200 deaths yearly (per 100,000 people) from heart attacks. Use the Normal Approximation of the Poisson to approximate the probability that 180 to 210 such deaths will occur in a random group of 100,000 Americans during a given year.
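A possible way to set up these four exercises in R is sketched below. It reads 7.025 as the variance, so the standard deviation is its square root, and it applies a continuity correction of 0.5 in exercises 3 and 4. The exercises do not mention the correction, so that choice is an assumption, as are all variable names. The exact binomial and Poisson probabilities are computed alongside for comparison.

```r
# Exercise 1: heights, mean 69.5, variance 7.025.
sd_height <- sqrt(7.025)
p1a <- pnorm(72, 69.5, sd_height, lower.tail = FALSE)  # at least 72
p1b <- pnorm(72, 69.5, sd_height)                      # at most 72

# Exercise 2: IQ scores, mean 100, sd 15.
p2a <- pnorm(90, 100, 15)   # below 90
p2b <- 1 - p2a              # above 90

# Exercise 3: Binomial(200, 18/38), from 100 to 120 winners.
n <- 200
p <- 18 / 38
m3 <- n * p
s3 <- sqrt(n * p * (1 - p))
p3_approx <- pnorm(120.5, m3, s3) - pnorm(99.5, m3, s3)  # with continuity correction
p3_exact  <- pbinom(120, n, p) - pbinom(99, n, p)

# Exercise 4: Poisson(200), from 180 to 210 deaths.
lambda <- 200
p4_approx <- pnorm(210.5, lambda, sqrt(lambda)) - pnorm(179.5, lambda, sqrt(lambda))
p4_exact  <- ppois(210, lambda) - ppois(179, lambda)

round(c(p1a = p1a, p1b = p1b, p2a = p2a, p2b = p2b), 4)
round(c(p3_approx = p3_approx, p3_exact = p3_exact,
        p4_approx = p4_approx, p4_exact = p4_exact), 4)
```

For a continuous variable "at least 72" and "at most 72" are complementary, so the two answers to exercise 1 add up to 1.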
Mean Value (1) (Mean of a random variable)
When a random phenomenon is repeated many times, the proportion of trials on which an outcome occurs eventually approaches the probability of the outcome. If the outcomes are numerical, the average of the observed outcomes eventually approaches the expected value. Sometimes we express the random outcome as X, a random variable; then the expected value is also called the mean of X.
http://www.whfreeman.com/scc/con_index.htm?99spt
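This long-run behaviour can be illustrated with a small simulation. The sketch below rolls a fair die 10,000 times and tracks the running proportion of sixes and the running average of the outcomes; the die example, the seed and the variable names are assumptions chosen for illustration.

```r
set.seed(2003)                        # hypothetical seed
rolls <- sample(1:6, 10000, replace = TRUE)
trial <- seq_along(rolls)

running_prop_six <- cumsum(rolls == 6) / trial  # approaches P(six) = 1/6
running_mean     <- cumsum(rolls) / trial       # approaches E(X) = 3.5

checkpoints <- c(10, 100, 1000, 10000)
data.frame(trials = checkpoints,
           prop_six = running_prop_six[checkpoints],
           mean = running_mean[checkpoints])
```

Early values can be far from 1/6 and 3.5; the later checkpoints settle near them, which is the sense in which 3.5 is the mean of the random variable X.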