2014
Maria Davila
Salt Lake Community College
4/30/2014
Final project Math 1040
[FINAL PROJECT MATH 1040] April 30, 2014
2 | Salt Lake Community College
Term Project
Math 1040
As a Criminal Justice student, Math 1040 is one of my required courses and an important
part of my career preparation. For this term project, I chose the Body Measurement data
set from the four options given by my instructor. This data set contains measurements of
different parts of the human body, such as chest, elbow, wrist and ankle diameter, among
several others. It also includes categorical data, such as gender, that will help me
interpret the data accurately.
The objective of this assignment is to pull together many of the concepts learned this
semester, such as collecting samples, organizing data, interpreting data with graphs,
and drawing conclusions from calculated results. Using the information in the "Body
Measurements" data set, I will select one categorical variable to build a pie chart and
a Pareto chart, and I will apply two different sampling methods learned in this course.
For a quantitative variable, I will compute the mean, the standard deviation and the
five-number summary, and build a frequency histogram and a box plot. At the end of the
project, I will construct confidence intervals, test a hypothesis about a population
parameter at a chosen level of significance, and use the correlation between two
variables to show whether they are associated.
PART I. ANALYSIS OF CATEGORICAL VARIABLE -GENDER-
STEP I. GRAPHING THE DATA.
PIE CHART
Using the categorical variable "Gender" from the data set, I built a pie chart that
shows the percentage of males and females participating in this study. I used the
entire population for this variable. As a result, I found that 48.72% of the
population are males and 51.28% are females. Blue represents females and red
represents males.
BODY MEASUREMENT.
Gender (Categorical Variable for entire population)
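The percentages in the pie chart can be reproduced from the category counts. The following is a minimal sketch in Python, assuming the counts of 260 females and 247 males reported in the comparison analysis of this project; the function name is only an illustration.

```python
def category_percentages(counts):
    """Return each category's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}

# Counts assumed from the Pareto chart summary in this project (507 people in total).
gender_counts = {"female": 260, "male": 247}
percentages = category_percentages(gender_counts)
print(percentages)
```

The same counts drive both the pie chart (shares of the whole) and the Pareto chart (bars sorted from the largest category to the smallest).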
PARETO CHART
This graph shows the same categorical variable as the pie chart above. A Pareto
chart is a good option for comparing categories because it displays the frequency of
each value of the categorical variable, in this case Gender (females are coded as 0
and males as 1). We can also see that there is not much difference between the
genders in their participation in the study.
BODY MEASUREMENT.
Gender (Categorical Variable for entire population)
STEP II. USING SAMPLING METHODS
SIMPLE RANDOM SAMPLE.
Pie Chart.
This is a simple random sample of 35 values taken from the entire population of the
Gender variable. I generated it with my TI-83 Plus scientific calculator: press MATH,
select PRB, choose 5:randInt(, enter the values needed for the calculation, and
press ENTER.
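The calculator steps above can be sketched in Python as well. This is only an illustration under stated assumptions: the population has 507 rows, the seed value is hypothetical, and `random.sample` draws without replacement, whereas randInt( on the calculator can repeat a row number.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct 1-based row numbers uniformly at random (no repeats)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# 507 rows and a sample of 35, as in this project; the seed is a hypothetical choice.
rows = simple_random_sample(507, 35, seed=1040)
print(rows)
```

Each selected row number is then looked up in the Gender column to obtain the 35 sample values.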
Pareto Chart.
This graph was built from the same simple random sample as the pie chart.
Systematic Sampling Method.
Pie Chart.
In this case, I selected a starting point in the categorical data set (Gender) and
then selected one value for every fourteen elements. In total, I selected 35 values
manually.
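The manual selection can be expressed as a short Python sketch. The starting position of 14 is an assumption for illustration, since the starting point for the Gender sample is not stated here; the step of 14 and the sample size of 35 come from the text.

```python
def systematic_sample(values, start, step, size):
    """Take `size` values from 1-based position `start`, then every `step`-th value."""
    picked = values[start - 1::step][:size]
    if len(picked) < size:
        raise ValueError("population too small for the requested sample")
    return picked

# Row positions 1..507 stand in for the Gender column (assumed population size).
positions = list(range(1, 508))
sample_positions = systematic_sample(positions, start=14, step=14, size=35)
print(sample_positions)
```

With these inputs the selected positions run from 14 to 490 in steps of 14.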
Systematic Sampling Method
Pareto Chart
This is the same information shown in the pie chart, represented as a Pareto chart.
ANALYSIS OF QUANTITATIVE VARIABLE -HEIGHT-
The summary statistics table gives the mean and standard deviation for the entire
population of the quantitative variable Height. It also includes the five-number
summary: the minimum, quartile 1, the median, quartile 3 and the maximum. The
quartiles are measures of location that divide the data into four groups with about
25% of the values in each group. The graph used to illustrate the five-number summary
is the box plot shown in the next picture:
Summary statistics:
Column   n     Mean        Std. dev.   Median   Min     Max     Q1      Q3
Height   507   171.14379   9.4072052   170.3    147.2   198.1   163.8   177.8
The box plot represents the five-number summary from the summary statistics table
as a graph.
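The quantities in the summary statistics table can be computed with Python's standard `statistics` module. This is a sketch on a small hypothetical list of heights, not the project data. The quartile method used here ("inclusive") is an assumption and may differ slightly from the method StatCrunch applies, and `stdev` is the sample standard deviation (divisor n - 1).

```python
import statistics

def summary(values):
    """Return n, mean, sample std. dev. and the five-number summary of `values`."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical heights in centimeters, for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
result = summary(heights)
print(result)
```

The box plot is drawn from the five values min, q1, median, q3 and max.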
Frequency Histogram
The vertical scale of this histogram shows relative frequencies (percentages)
instead of actual frequencies.
The graph is approximately bell shaped, which suggests that the data are
approximately normally distributed: the frequencies increase to a maximum and then
decrease, and the graph is roughly symmetric, as a normal distribution requires.
Sampling methods
Having calculated the measurements for the entire population of the variable Height,
I applied the two sampling methods described in Step II of Part I. I then calculated
the mean, the standard deviation and the five-number summary for each sample, and
built the corresponding graphs, as I did for the entire population.
Simple Random Sample.
I used StatCrunch to obtain the simple random sample from the quantitative variable
Height. It was very easy to use: I clicked on the Data tab, selected Sample, and
entered the values needed to obtain the sample. The next table includes the mean,
the standard deviation and the five-number summary for this simple random sample.
Summary statistics:
Column           n    Mean        Std. dev.   Median   Min     Max     Q1      Q3
Sample(Height)   35   171.62571   9.4314662   170.2    151.1   187.2   166.4   179.8
Graphs for simple random sample of quantitative variable (Height)
This is the box plot corresponding to the summary statistics for the simple random
sample of the variable Height.
This is the frequency histogram for the same simple random sample of the
quantitative variable Height. The shape of this histogram is slightly skewed to
the left.
Systematic Sample.
I selected a starting point in the quantitative variable Height, in this case element
number 14, and then chose every 14th element. I then calculated the summary
statistics and built the box plot and the frequency histogram to graph the results.
Summary statistics:
Column               n    Mean        Std. dev.   Median   Min     Max   Q1      Q3
syst.height sample   35   172.33143   9.4456268   173.2    155.8   188   164.4   180.3
This graph shows the results for the systematic sample taken from the Height
variable. It is slightly skewed to the left, like the frequency histogram of the
simple random sample.
Comparison analysis of the Histograms.
Gender Variable (categorical data)
This variable contains only two values: female and male. After completing the
calculations and building the graphs, I concluded that:
1. The Pareto chart showed that there are 260 females and 247 males.
2. The simple random sample was generated with technology (TI-83 Plus calculator),
and its values were displayed in Pareto and pie charts, which show results very
similar to those from the entire population of the variable Gender.
Height Variable (quantitative data)
A histogram is a bar graph that represents quantitative data. Its vertical scale shows
frequencies and its horizontal scale shows classes of quantitative data values. After
taking samples from the entire population of the Height variable in two different
ways, I found that:
1. The simple random sample histogram is slightly skewed to the left: its
frequencies do not rise and fall symmetrically around the center as they do in a
normal distribution. With only 35 values, some asymmetry of this kind can be
expected.
2. The systematic sample histogram is also very slightly skewed to the left. Since
its values were taken at every 14th element of the data set, a shape close to
that of the simple random sample was an expected result.
In summary, the quantitative variable (Height) gives us statistical information such
as the mean and the standard deviation, because its values are numbers that represent
measurements, in this case the heights of the participants. The categorical data, in
contrast, use numbers only to identify two categories (Gender: female and male).
Those values do not represent any measurement; they are used as labels. Because
numbers play a different role in each variable, I obtained different results and
graphs, which had to be interpreted in different ways.
PART II. CONFIDENCE LEVEL.
Data Analysis Project Worksheet
Section 1
Population
Categorical Variable : GENDER
All Values of the Categorical Variable: 1: MALE AND 0: FEMALE
Choose one of the above values to use in Part 4 and Part 5 of the project.
FEMALE
p = 247
Sample 1      Sample 2
n = 35        n = 35
x = 7         x = 14
p̂ = 0.20      p̂ = 0.40
Section 2
Population
Quantitative Variable HEIGHT
μ = 171.14379
σ = 9.40720
Sample 1          Sample 2
n = 35            n = 35
x̄ = 171.62571     x̄ = 172.33143
s = 9.43147       s = 9.44562
Using data contained in the worksheet, I will create confidence intervals for the
population proportion from each of my samples taken from Categorical variable Gender.
Confidence interval for Sample One
p̂ − E < p < p̂ + E
0.20 − 0.132 < p < 0.20 + 0.132
0.068 < p < 0.332
Margin of error calculation
E = z(α/2) · √(p̂(1 − p̂)/n)
E = 0.132
Confidence interval for Sample Two
p̂ − E < p < p̂ + E
0.40 − 0.162 < p < 0.40 + 0.162
0.238 < p < 0.562
Margin of error calculation
E = z(α/2) · √(p̂(1 − p̂)/n)
E = 0.162
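The margin of error for a proportion can be checked with a few lines of Python. This sketch assumes a 95% confidence level (z = 1.96), which is not stated in the worksheet but gives margins of about 0.13 and 0.16 for the two samples, in line with the values of E above. The function name is an illustration only.

```python
import math

def proportion_interval(x, n, z=1.96):
    """Return (lower, upper, margin) for a proportion, using the normal approximation."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

# x and n come from the worksheet; z = 1.96 is an assumed 95% critical value.
low1, high1, e1 = proportion_interval(7, 35)
low2, high2, e2 = proportion_interval(14, 35)
print(round(low1, 3), round(high1, 3), round(e1, 3))
print(round(low2, 3), round(high2, 3), round(e2, 3))
```

Each interval is centered on its sample proportion, so the lower and upper limits are always p̂ − E and p̂ + E.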
Step II. Confidence interval for population mean of quantitative data.
Sample One
x̄ ± E
171.626 ± 3.2389
168.3871 < μ < 174.8649
Margin of error calculation
E = t(α/2) · s/√n
E = 3.2389
Sample Two
x̄ ± E
172.33143 ± 3.2458
169.08563 < μ < 175.57723
Margin of error calculation
E = t(α/2) · s/√n
E = 3.2458
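The margin of error for the mean can be reproduced with the t distribution. This sketch assumes the critical value t = 2.032 (the same cv used in Step IV of this project for 34 degrees of freedom). The results agree with the margins above to about two decimal places, and small differences come from rounding.

```python
import math

def mean_interval(x_bar, s, n, t_crit):
    """Return (lower, upper, margin) for a population mean when sigma is unknown."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

T_CRIT = 2.032  # assumed critical value, taken from the cv reported in Step IV

low1, high1, e1 = mean_interval(171.62571, 9.43147, 35, T_CRIT)
low2, high2, e2 = mean_interval(172.33143, 9.44562, 35, T_CRIT)
print(round(low1, 2), round(high1, 2), round(e1, 2))
print(round(low2, 2), round(high2, 2), round(e2, 2))
```

Both intervals contain the population mean μ = 171.14379 from the worksheet.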
Step III. Confidence interval for population standard deviation of quantitative data.
Sample One
√((n − 1)s²/χ²R) < σ < √((n − 1)s²/χ²L)
8.0232 < σ < 13.4202
Sample Two
√((n − 1)s²/χ²R) < σ < √((n − 1)s²/χ²L)
8.0368 < σ < 13.4757
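The interval for the standard deviation uses two chi-square critical values. The project does not state which values were used, so the numbers below are an assumption: they are the standard 95% table values for 30 degrees of freedom (46.979 and 16.791), which reproduce the Sample One limits to about two decimal places. A sketch in Python:

```python
import math

def sd_interval(s, n, chi2_right, chi2_left):
    """Return (lower, upper) limits for sigma from a sample standard deviation s."""
    sum_of_squares = (n - 1) * s ** 2
    return math.sqrt(sum_of_squares / chi2_right), math.sqrt(sum_of_squares / chi2_left)

# Assumed table values: 95% confidence, 30 degrees of freedom.
CHI2_RIGHT = 46.979
CHI2_LEFT = 16.791

low1, high1 = sd_interval(9.43147, 35, CHI2_RIGHT, CHI2_LEFT)
print(round(low1, 2), round(high1, 2))
```

The larger critical value gives the lower limit and the smaller one gives the upper limit, because each divides the same quantity (n − 1)s².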
Reflection about confidence levels.
A confidence interval can be used to estimate a population parameter, such as the mean
or the standard deviation, and to test a claim about it. The calculations above show
confidence intervals for two variables: one categorical and one quantitative.
For the categorical data, the numbers 0 and 1 are labels, not measurements, so a
calculation such as the mean or the standard deviation of those codes is meaningless.
The only quantity I can estimate for this variable is the proportion in each category.
For the quantitative variable, the numbers represent measurements, and any calculation
based on them has a meaning that I can use to draw a conclusion about the population.
In my results, the intervals built from the quantitative samples contain the
population parameters μ and σ.
Step IV. Level of significance.
Since the categorical variable does not represent measurements, I will carry out the
hypothesis test only for the quantitative variable.
Sample One
Test statistic:
t = (x̄ − μ) / (s/√n)
t = 0.3023
cv = 2.032
Fail to reject H₀.
Conclusion. The test statistic does not fall in either of the two tails
(|t| < 2.032), so we fail to reject the claim that the population mean is 171.14379.
Sample Two
Test statistic:
t = (x̄ − μ) / (s/√n)
t = 0.7439
cv = 2.032
Fail to reject H₀.
Conclusion. The test statistic does not fall in either of the two tails
(|t| < 2.032), so we fail to reject the claim that the population mean is 171.14379.
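Both test statistics can be verified from the worksheet values. In this Python sketch, the hypothesized mean is the population mean μ = 171.14379 from Section 2 of the worksheet, and the two-tailed critical value 2.032 is the cv reported above. The function name is an illustration only.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic for the claim that the population mean equals mu0."""
    return (x_bar - mu0) / (s / math.sqrt(n))

MU0 = 171.14379   # population mean from the worksheet
T_CRIT = 2.032    # two-tailed critical value reported in this step

t1 = t_statistic(171.62571, MU0, 9.43147, 35)
t2 = t_statistic(172.33143, MU0, 9.44562, 35)
reject1 = abs(t1) > T_CRIT
reject2 = abs(t2) > T_CRIT
print(round(t1, 4), reject1)
print(round(t2, 4), reject2)
```

The null hypothesis is rejected only when the absolute value of t exceeds the critical value, which does not happen for either sample.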
Correlation and regression calculation.
Simple linear regression results:
Dependent Variable: Gender (1 - M, 0 - F)
Independent Variable: Height
Gender (1 - M, 0 - F) = -5.7448963 + 0.036414268 Height
Sample size: 507
R (correlation coefficient) = 0.68466211
cv = 0.196
Since the absolute value of R is greater than the critical value (cv), I conclude that there is a significant linear correlation between the gender and the height of a person.
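The decision rule for correlation compares the absolute value of R with the critical value: a value of |R| above the critical value indicates a significant linear correlation. A short Python sketch using the numbers reported above follows. The function names are illustrations, and treating Height as centimeters is an assumption.

```python
def is_significant(r, critical_value):
    """True when |r| exceeds the critical value (significant linear correlation)."""
    return abs(r) > critical_value

def predicted_gender_code(height, intercept=-5.7448963, slope=0.036414268):
    """Evaluate the fitted line; the result lies between the codes 0 (F) and 1 (M)."""
    return intercept + slope * height

significant = is_significant(0.68466211, 0.196)
# The mean height from the worksheet, assumed to be in centimeters.
at_mean_height = predicted_gender_code(171.14379)
print(significant, round(at_mean_height, 4))
```

At the mean height, the fitted line returns about 0.487, which matches the share of males in the data, as expected for a least-squares line.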
Final conclusion.
Numbers and calculations are an essential part of most of today's careers.
Statistics is a very important tool, described as a mathematical body of
science that pertains to the collection, analysis, interpretation or explanation, and
presentation of data. It is used by many agencies, companies, manufacturers and
sciences. Studies in areas such as health, consumer behavior and even politics use
statistics to understand and interpret data, and based on those results it is possible
to predict future values or outcomes.
Through this project I learned methods that improved my math skills. I understood
that math is part of every aspect of real life. We need calculations for everything,
such as the cost of living and crime, political and student rates, among thousands
more. Numbers are a wonderful tool for developing many other skills, such as
statistical and analytical skills.