meljun cortes data types rm104tr-13
TRANSCRIPT
-
7/29/2019 MELJUN CORTES DATA TYPES Rm104tr-13
1/39
Lesson 13 - 1
Year 1
CS113/0401/v1
LESSON 13
TYPES OF DATA
Qualitative
Not usually numeric
No particular order
Examples: Colour, Types of Materials
Quantitative
Numeric
Ordered
Measurable
Continuous
E.g. Length, Age, Weight
Discrete
E.g. Shoe size, Number of people
-
DATA TABULATION (1)
First stage in making raw data understandable
RAW DATA
Number of sheets of listing paper used by each of 120 jobs
17 24 11 14 18 17  7  5 21  6 11 18 22 14  6 17 14  8 12 13 27 12 18  9
14 18 14 13 21  8 27  9 11 16 27 21 14 11 19  7 10 29 17 12 14 19 12  9
23 17 24  7 13 14 17 21  8 17 19 24 26  2  5 18 14 16  7 16 28 13 14  8
19 27  9 18  8 24 19  7 13 14 16 19 11 17 23 12 25 16 15 10 21 18 14 11
 9 14 28 20 12 16 10  8  9 11 22 10 17  9 18 12 24  8  7 16  5 20  7 10
Not easily digested!
-
DATA TABULATION (2)
Tabulate in (discrete) categories
Frequency distribution table
Category (No of sheets used)    Tally                                           Frequency
0 -                             |                                               1
5 -                             ||||| ||||| ||||| ||||| ||||| |                 26
10 -                            ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||    37
15 -                            ||||| ||||| ||||| ||||| ||||| ||||| |           31
20 -                            ||||| ||||| ||||| |                             16
25 -                            ||||| ||||                                      9
Total                                                                           120
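The tabulation step can be sketched in Python. This is a minimal illustration rather than part of the original lesson: the function name `tabulate` and the constant `CLASS_WIDTH` are hypothetical, and the class width of 5 is assumed from the categories 0 -, 5 -, 10 - and so on. The upper limits printed (4, 9, ...) assume whole-number sheet counts.

```python
from collections import Counter

# The 120 raw values from the RAW DATA slide (sheets of listing paper per job).
raw = [
    17, 24, 11, 14, 18, 17, 7, 5, 21, 6, 11, 18, 22, 14, 6, 17, 14, 8, 12, 13, 27, 12, 18, 9,
    14, 18, 14, 13, 21, 8, 27, 9, 11, 16, 27, 21, 14, 11, 19, 7, 10, 29, 17, 12, 14, 19, 12, 9,
    23, 17, 24, 7, 13, 14, 17, 21, 8, 17, 19, 24, 26, 2, 5, 18, 14, 16, 7, 16, 28, 13, 14, 8,
    19, 27, 9, 18, 8, 24, 19, 7, 13, 14, 16, 19, 11, 17, 23, 12, 25, 16, 15, 10, 21, 18, 14, 11,
    9, 14, 28, 20, 12, 16, 10, 8, 9, 11, 22, 10, 17, 9, 18, 12, 24, 8, 7, 16, 5, 20, 7, 10,
]

CLASS_WIDTH = 5  # assumption: categories start at 0, 5, 10, ...


def tabulate(values, width):
    """Count how many values fall in each class starting at a multiple of width."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


table = tabulate(raw, CLASS_WIDTH)
for lower, freq in table.items():
    print(f"{lower:>2} - {lower + CLASS_WIDTH - 1:<2}  {freq}")
print("Total", sum(table.values()))
```

Each value is mapped to the lower limit of its class by integer division, so the counting needs only one pass over the data.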
-
FREQUENCY DISTRIBUTION (1)
Raw data
Raw data are collected data
which have not been organized
numerically
Array
An array is an arrangement of
raw numerical data in ascending
or descending order of
magnitude. The difference
between the largest and smallest
number is called the range of the
data
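A short Python sketch of an array and its range, using the first eight raw values of the listing-paper data as an illustrative sample (the variable names are hypothetical):

```python
# First eight raw values from the listing-paper data, in collected order.
raw = [17, 24, 11, 14, 18, 17, 7, 5]

# An array: the same values arranged in ascending order of magnitude.
array = sorted(raw)

# The range: difference between the largest and smallest number.
data_range = array[-1] - array[0]

print(array)       # [5, 7, 11, 14, 17, 17, 18, 24]
print(data_range)  # 19
```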
-
FREQUENCY DISTRIBUTION (2)
Frequency distribution
When summarizing large amounts
of raw data it is often
useful to distribute the data into
classes or categories and to
determine the number of
individuals belonging to each
class, called the class frequency
-
EXAMPLE
A set of 100 students is obtained from an alphabetical listing of a
university record.
Their weights, ranging from 60 kg to 74 kg, are tabulated.
-
EXAMPLE
Mass (kilograms)    Number of Students
60 - 62             5
63 - 65             18
66 - 68             42
69 - 71             27
72 - 74             8
Total               100
The first class or category, for
example, consists of masses from 60
to 62 kg and is indicated by the
symbol 60 - 62. Since 5 students
have masses belonging to this class,
the corresponding class frequency is
5.
Data organized and summarized in
the above frequency distribution are
often called grouped data
-
CLASS INTERVAL
A symbol defining a class, such as
60 - 62, is called a class interval.
The end numbers, 60 and 62, are
called the class limits.
The smaller number 60 is the
lower class limit and the larger
number 62 is the upper class
limit.
-
CLASS MARK
A class mark is the midpoint of
the class interval and is obtained
by adding the lower and upper
class limits and dividing by two.
In the previous example, the
class mark of the interval 60 - 62
is (60 + 62) / 2 = 61
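As a small illustration, the class marks of all five intervals in the mass example can be computed the same way. The list name `classes` is hypothetical; the limits are those of the table of 100 students.

```python
# (lower class limit, upper class limit) for each interval in the mass example.
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

# Class mark = (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

print(class_marks)  # [61.0, 64.0, 67.0, 70.0, 73.0]
```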
-
MEDIAN (1)
The median of a set of numbers
arranged in order of magnitude is
the middle value or the arithmetic
mean of the two middle values.
Example 1
The set of numbers
3, 4, 4, 5, 6, 8, 8, 8, 10
For an odd number of data the
median occurs at position
(N + 1) / 2
= 10 / 2
= 5th position
Therefore the median = 6
-
MEDIAN (2)
Example 2
The set of numbers
5, 5, 7, 9, 11, 12, 15, 18
For an even number of data the
median is the average of the two
middle values
The median = (value at position 4 + value at position 5) / 2
= (9 + 11) / 2
= 10
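Both cases (odd and even number of data) can be sketched in one Python function. The function name `median` is chosen for illustration; the two data sets are those of Example 1 and Example 2.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: position (N + 1) / 2, which is index mid when counting from 0.
        return ordered[mid]
    # Even N: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 4, 4, 5, 6, 8, 8, 8, 10]))     # 6
print(median([5, 5, 7, 9, 11, 12, 15, 18]))     # 10.0
```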
-
MEDIAN (1)
For grouped data the median, obtained by interpolation, is given
by
\[ \text{Median} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c \]
Where
$L_1$ = lower class boundary of the
median class (i.e. the class containing the median).
$N$ = number of items in the data
(i.e. total frequency)
-
MEDIAN (2)
$(\sum f)_1$ = sum of frequencies of all classes lower
than the median class
$f_{\text{median}}$ = frequency of the median class
$c$ = size of the median class interval
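A worked sketch of the interpolation formula in Python, applied to the mass table of 100 students. The function name `grouped_median` is hypothetical. The class boundaries are an assumption not stated on the slides: they are taken as half a unit below the lower limit and half a unit above the upper limit (59.5 to 62.5 and so on), as is usual for masses recorded to the nearest kilogram.

```python
def grouped_median(limits, freqs):
    """Median = L1 + ((N/2 - sum of lower frequencies) / f_median) * c."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0  # sum of frequencies of all classes below the current one
    for (lower, upper), f in zip(limits, freqs):
        if cumulative + f >= half:
            # Assumed class boundaries: half a unit outside the class limits.
            lower_boundary = lower - 0.5
            width = (upper + 0.5) - lower_boundary
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")


limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
freqs = [5, 18, 42, 27, 8]
result = grouped_median(limits, freqs)
print(round(result, 2))  # 67.43
```

Here N/2 = 50, the median class is 66 - 68 (cumulative frequency reaches 65 there), and the estimate is 65.5 + (50 - 23) / 42 x 3.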
-
MEDIAN OF A GROUPED FREQUENCY DISTRIBUTION
Draw a Cumulative Frequency Diagram
Search for the middle value on
the cumulative frequency (cf) axis and read off the
corresponding value on the x axis
This is the median
-
MEDIAN FROM A CUMULATIVE FREQUENCY DIAGRAM
-
MODE (1)
The mode of a set of numbers is
that value which occurs with the
greatest frequency, i.e. it is the
most common value. The mode
may not exist, and even if it does
exist it may not be unique
Example
The set
2, 2, 5, 7, 9, 9, 9, 10, 11, 12, 18
has mode 9
Example
The set
3, 5, 8, 10, 12, 15, 16
has no mode
-
MODE (2)
Example
The set
2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9
has two modes, 4 and 7, and is
called bimodal
A distribution having only one
mode is called unimodal
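The three mode examples can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and the rule that a set has no mode when every value occurs exactly once is an assumption drawn from the "has no mode" example.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no value stands out
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 2, 5, 7, 9, 9, 9, 10, 11, 12, 18]))  # [9]
print(modes([3, 5, 8, 10, 12, 15, 16]))              # []
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))      # [4, 7]
```

A result with one entry corresponds to a unimodal set, two entries to a bimodal set.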
-
MODE OF A FREQUENCY DISTRIBUTION
Ungrouped data
Mode is the x value which has
the highest value of f (frequency)
Grouped data
Can't find the mode, only the modal
class
-
MODAL CLASS
x          f
51 - 55    12
55 - 60    16
61 - 65    10
55 - 60 is the modal class
We don't know the x values before
grouping, so we can't find the
mode exactly
N.B.
Actual mode might not even be in
this class
-
MODE (1)
Where a frequency curve has been
constructed to fit grouped data, the
mode will be the value (or values)
of x corresponding to the
maximum point (or points) on the
curve. From a frequency
distribution or histogram the
mode can be obtained from the
following formula:
\[ \text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c \]
-
MODE (2)
Where
$L_1$ = lower class boundary of the modal class
(i.e. the class containing the
mode).
$\Delta_1$ = excess of modal frequency
over frequency of the next lower
class
$\Delta_2$ = excess of modal frequency
over frequency of the next
higher class
$c$ = size of the modal class interval
-
GROUPED MODE FROM HISTOGRAM (1)
Can only ESTIMATE
Assume mode is in Modal Class
-
GROUPED MODE FROM HISTOGRAM (2)
Calculation
\[ \text{Mode estimate} = 25 + 5 \times \frac{40}{40 + 64} = 25 + 5 \times \frac{40}{104} = 25 + 1.9 = 26.9 \]
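The grouped mode formula, Mode = L1 + (excess over lower class / (excess over lower class + excess over higher class)) x c, can be checked in Python against the numbers in this calculation (L1 = 25, c = 5, excesses 40 and 64). The function name `mode_estimate` is hypothetical. The second call applies the same formula to the mass table of 100 students, with the lower class boundary 65.5 taken as an assumption (half a unit below the class limit 66).

```python
def mode_estimate(lower_boundary, excess_lower, excess_higher, width):
    """Mode = L1 + (d1 / (d1 + d2)) * c for the modal class."""
    return lower_boundary + excess_lower / (excess_lower + excess_higher) * width


# Numbers from the histogram calculation: 25 + 5 x 40 / (40 + 64).
histogram_mode = mode_estimate(25, 40, 64, 5)
print(round(histogram_mode, 1))  # 26.9

# Mass table: modal class 66 - 68 (f = 42), neighbours f = 18 and f = 27.
mass_mode = mode_estimate(65.5, 42 - 18, 42 - 27, 3)
print(round(mass_mode, 2))  # 67.35
```

The result is only an estimate: it assumes the mode lies inside the modal class, as the slides point out.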
-
ARITHMETIC MEAN (1)
The arithmetic mean, or the mean,
of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$
is denoted by $\bar{X}$ and is defined
as
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \]
-
ARITHMETIC MEAN (2)
Eight numbers:
7, 21, 13, 17, 23, 18, 9, 20
Add them = 128
Divide by 8 = 16
This is the arithmetic mean
It is the most common
definition of average
It only works with quantitative
data
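The same calculation as a minimal Python sketch (variable names are illustrative):

```python
values = [7, 21, 13, 17, 23, 18, 9, 20]

total = sum(values)          # add them
mean = total / len(values)   # divide by the number of values

print(total)  # 128
print(mean)   # 16.0
```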
-
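ARITHMETIC MEAN (3)
If the numbers $X_1, X_2, X_3, \ldots, X_n$
occur $f_1, f_2, f_3, \ldots, f_n$ times
respectively, the arithmetic mean
is
\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_n X_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \]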
-
MEAN OF A FREQUENCY DISTRIBUTION
Age (x)    Frequency (f)    fx
17         3                51
18         8                144
19         14               266
20         21               420
21         24               504
22         13               286
23         7                161
24         6                144
25         3                75
26         1                26
Total      Σf = 100         Σfx = 2077
Mean age = 2077 / 100 = 20.77
(rounded to nearest integer, 21)
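The column totals and the mean age can be reproduced with a few lines of Python. This is a sketch with illustrative variable names; the ages and frequencies are those of the table.

```python
ages = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [3, 8, 14, 21, 24, 13, 7, 6, 3, 1]

sum_f = sum(freqs)                                # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, ages))  # total of the fx column
mean_age = sum_fx / sum_f

print(sum_f, sum_fx, mean_age)  # 100 2077 20.77
```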
-
HISTOGRAMS (1)
Only used for quantitative data
Histogram is like a bar chart, but
with no gaps between bars and
calibrated horizontal axis
Order of bars depends on value
and on horizontal scale
-
HISTOGRAMS (2)
-
HISTOGRAMS (3)
-
AREA IN HISTOGRAMS
-
CUMULATIVE FREQUENCY DIAGRAMS (1)
Table 1:
Lines of Code    No of Programs
100 -            3
125 -            12
150 -            24
175 -            42
200 -            51
225 -            39
250 -            30
275 -            21
300 -            12
325 - 349        6
-
CUMULATIVE FREQUENCY DIAGRAMS (2)
Table 2:
Lines of Code (less than)    Cumulative Frequency
100                          0
125                          3
150                          15
175                          39
200                          81
225                          132
250                          171
275                          201
300                          222
325                          234
350                          240
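Table 2 is the running total of the class frequencies in Table 1. A minimal Python sketch of that step follows; the variable names are illustrative, and the class frequencies are taken as the successive differences of the cumulative values (3, 12, 24, ...).

```python
from itertools import accumulate

# Lower limits of the classes in Table 1 and the number of programs in each.
lower_limits = [100, 125, 150, 175, 200, 225, 250, 275, 300, 325]
freqs = [3, 12, 24, 42, 51, 39, 30, 21, 12, 6]

# "Less than" points: each class's upper end, with 350 closing the last class.
upper_ends = lower_limits[1:] + [350]
running_totals = list(accumulate(freqs))

# Start the table at the first lower limit with a cumulative frequency of 0.
less_than_table = [(lower_limits[0], 0)] + list(zip(upper_ends, running_totals))

for bound, cumulative in less_than_table:
    print(f"{bound:>3}  {cumulative:>3}")
```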
-
CUMULATIVE FREQUENCY DIAGRAMS (3)
[Figure: cumulative frequency curve. Horizontal axis: lines of code (less than), 0 to 350. Vertical axis: cumulative frequency, 0 to 240.]
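Reading the median off a cumulative frequency curve means finding the point where the curve reaches half the total frequency. The Python sketch below imitates this with the "less than" values for the lines-of-code data. It assumes straight-line segments between the plotted points, whereas a hand-drawn curve is smooth, so the result is only an approximation. The function name `read_off` is hypothetical.

```python
from bisect import bisect_left

# "Less than" bounds and cumulative frequencies for the lines-of-code data.
bounds = [100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350]
cumulative = [0, 3, 15, 39, 81, 132, 171, 201, 222, 234, 240]


def read_off(target):
    """Find x where the cumulative frequency reaches target (linear segments)."""
    i = bisect_left(cumulative, target)
    if i == 0:
        return bounds[0]
    x0, x1 = bounds[i - 1], bounds[i]
    y0, y1 = cumulative[i - 1], cumulative[i]
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)


median_estimate = read_off(cumulative[-1] / 2)  # middle value: 240 / 2 = 120
print(round(median_estimate, 1))  # 219.1
```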
-
STANDARD DEVIATION (1)
The Standard Deviation of a set
of N numbers $X_1, X_2, \ldots, X_N$ is
denoted by S.D. and is defined by
\[ \text{S.D.} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} \]
Where
$\bar{X}$ = Arithmetic Mean
$N$ = Total number of elements in
the set
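A minimal Python sketch of this definition, which divides by N (the population form). The function name `standard_deviation` is hypothetical, and the eight numbers are reused from the arithmetic mean example purely for illustration.

```python
from math import sqrt


def standard_deviation(values):
    """Square root of the mean squared deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


values = [7, 21, 13, 17, 23, 18, 9, 20]  # mean is 16
sd = standard_deviation(values)
print(round(sd, 2))  # 5.41
```

The squared deviations here sum to 234, so the result is the square root of 234 / 8 = 29.25.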
-
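STANDARD DEVIATION (2) (GROUPED DATA)
If $X_1, X_2, \ldots, X_n$ occur with
frequencies $f_1, f_2, \ldots, f_n$
respectively, the standard
deviation can be written as
\[ \text{S.D.} = \sqrt{\frac{\sum_{j=1}^{n} f_j (X_j - \bar{X})^2}{\sum_{j=1}^{n} f_j}} \]
or
\[ \text{S.D.} = \sqrt{\frac{\sum f_i X_i^2}{\sum f_i} - \left( \frac{\sum f_i X_i}{\sum f_i} \right)^2} \]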
-
Question 6 c) NCC 1/93
On test the actual access times
for 50 hard disc drives were
distributed as follows:
Time (ms)        22.6    22.7    22.8    22.9    23.0    23.1    23.2    23.3
No. of Drives    1       3       6       10      14      9       5       2
Calculate the mean access time and
the standard deviation.
-
Alternative Question 6c
x        f     fx        fx^2
22.6     1     22.6      510.76
22.7     3     68.1      1545.87
22.8     6     136.8     3119.04
22.9     10    229.0     5244.10
23.0     14    322.0     7406.00
23.1     9     207.9     4802.49
23.2     5     116.0     2691.20
23.3     2     46.6      1085.78
Total          1149.0    26405.24    (1 mark for each total) [2]
Column marks:  [1]       [1]
Mean = 1149 / 50 = 22.98 [1]
\[ \text{S.D.} = \sqrt{\frac{\sum f x^2}{\sum f} - (\bar{X})^2} = \sqrt{\frac{26405.24}{50} - (22.98)^2} = 0.156 \]
[1]
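The totals, mean and standard deviation of this answer can be checked with a short Python sketch (variable names are illustrative; the times and drive counts are those of the table):

```python
from math import sqrt

times = [22.6, 22.7, 22.8, 22.9, 23.0, 23.1, 23.2, 23.3]   # x, in ms
drives = [1, 3, 6, 10, 14, 9, 5, 2]                         # f

sum_f = sum(drives)
sum_fx = sum(f * x for f, x in zip(drives, times))
sum_fx2 = sum(f * x * x for f, x in zip(drives, times))

mean = sum_fx / sum_f
sd = sqrt(sum_fx2 / sum_f - mean ** 2)

print(sum_f, round(sum_fx, 1), round(sum_fx2, 2))  # 50 1149.0 26405.24
print(round(mean, 2), round(sd, 3))                # 22.98 0.156
```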
-
VARIANCE
The variance of a set of data is
defined as the square of the
standard deviation and is thus
given by $(\text{S.D.})^2$, i.e.
\[ \text{Variance} = (\text{S.D.})^2 = \frac{\sum_{j=1}^{n} f_j (X_j - \bar{X})^2}{\sum_{j=1}^{n} f_j} \]
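As an illustration, the variance of the disc drive access times from Question 6 c) can be computed directly from the frequency-weighted squared deviations. This is a sketch with illustrative variable names.

```python
from math import sqrt

times = [22.6, 22.7, 22.8, 22.9, 23.0, 23.1, 23.2, 23.3]   # X_j, in ms
drives = [1, 3, 6, 10, 14, 9, 5, 2]                         # f_j

sum_f = sum(drives)
mean = sum(f * x for f, x in zip(drives, times)) / sum_f

# Variance = sum of f_j * (X_j - mean)^2, divided by the total frequency.
variance = sum(f * (x - mean) ** 2 for f, x in zip(drives, times)) / sum_f
sd = sqrt(variance)

print(round(variance, 4))  # 0.0244
print(round(sd, 3))        # 0.156
```

Squaring the standard deviation of 0.156 ms gives the same variance to within rounding.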