MH 1401 ALGORITHMS & COMPUTING I Asst. Prof. Axel POSCHMANN AY 2012/13 Semester 1 23.10.2012 Lecture 10: Basic Statistics, Sets, Sorting, and Indexing

  • MH 1401 ALGORITHMS & COMPUTING I Asst. Prof. Axel POSCHMANN AY 2012/13 Semester 1

    23.10.2012 Lecture 10: Basic Statistics, Sets, Sorting, and Indexing

  • 23/10/12 Lecture 10: Basic Statistics, Sets, Sorting, and Indexing L10-2

    Remaining Lab Schedule

    This Friday, 26.10.2012, is Hari Raya Haji
    No make-up lab sessions for LA4, LA5, LA6; no lab session for LA3 on Wednesday 24.10.2012
    Next week will be Lab 8 (no marks, but part of GL2)
    The week after is Graded Lab Session 2 (GL2)
    The week after is the presentation of the final project

    Group  Day        Lab 8       GL2         Project
    LA1    Monday     29.10.2012  05.11.2012  12.11.2012
    LA2    Tuesday    30.10.2012  06.11.2012  20.11.2012
    LA3    Wednesday  31.10.2012  07.11.2012  14.11.2012
    LA4    Friday     02.11.2012  09.11.2012  16.11.2012
    LA5    Friday     02.11.2012  09.11.2012  16.11.2012
    LA6    Friday     02.11.2012  09.11.2012  16.11.2012

  • Final Project

    Information is available in edventure
    Groups of 5 have been (randomly) assembled
    Meet your team and split the work, discuss your approach, schedule meetings, etc.
    Remember: everybody is responsible for one part
    More explanations during next week's lecture

    Deadline (sharp): 11.11.2012 23:59h SGT

  • Quiz

    Use a pen
    Closed book
    Move your bags, materials, etc. far away
    5 minutes
    1% possible

  • MH 1401 Algorithms & Computing I: Outline

    Statistical Functions

    Set Operations

    Sorting

    Index Vectors

    Lessons Learned

  • Statistical Functions in MATLAB

    Statistical functions in MATLAB are in the data analysis help topic datafun

    >> help datafun

    In general we will write a data set of n values as

    $X = \{x_1, x_2, x_3, x_4, \ldots, x_n\}$

    In MATLAB this will generally be represented as a row vector called x

  • Motivating Example

    Statistics can be used to characterize properties of a data set
    Consider a set of exam grades

    x = {33, 75, 77, 82, 83, 85, 85, 91, 100}

    What is a normal, expected or average exam grade? There are several ways to interpret this:

    Mean: sum the grades, then divide by n (79)
    Mode: the most frequently occurring grade (85)
    Median: the value in the middle of the sorted list (83)

    Another useful property to know is how spread out the data values are within the data set

  • Example: UK Income Distributions

    [Figure: UK income distribution; mode ~290]

  • Min and Max

    MATLAB has many built-in functions for statistics
    min and max also return the index of the smallest/largest value; if there is more than one occurrence, the index of the first one is returned

    Example

    >> x=[9 10 10 9 8 7 3 10 9 8 5 10];
    >> [maxval, maxind] = max(x)
    maxval =
        10
    maxind =
         2

    Only the first index is returned
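
    When every position of the maximum is needed, a comparison combined with find is one option. This is a minimal sketch using the vector x from the example; the names firstind and allind are illustrative and not part of the lecture.

```matlab
x = [9 10 10 9 8 7 3 10 9 8 5 10];

% max reports only the first position of the largest value
[maxval, firstind] = max(x);

% Comparing x with maxval gives a logical vector; find turns it into indices
allind = find(x == maxval);

fprintf('largest value: %d, first index: %d\n', maxval, firstind);
fprintf('all indices:   %s\n', mat2str(allind));
```

    The same pattern works for min by replacing max with min.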

  • Min and Max (ctd)

    For matrices, min and max operate columnwise

    Example

    >> mat=[9 10 17 5; 19 9 11 14]
    mat =
         9    10    17     5
        19     9    11    14
    >> [minval, minind] = min(mat)
    minval =
         9     9    11     5
    minind =
         1     2     2     1

  • Min and Max (ctd)

    min and max can also compare two vectors/matrices of the same dimensions, element by element

    Example

    >> x=[3 5 8 2 11];
    >> y=[2 6 4 5 10];
    >> min(x,y)
    ans =
         2     5     4     2    10

    The second argument is the second vector/matrix

  • Rowwise Min and Max (ctd)

    To find the minimum/maximum of each row, the dimension 2 can be specified as the third argument

    Example

    >> mat=[9 10 17 5; 19 9 11 14]
    mat =
         9    10    17     5
        19     9    11    14
    >> [minval, minind] = min(mat,[],2)
    minval =
         5
         9
    minind =
         4
         2

    For min and max, the second argument must then be the empty matrix []

  • The Arithmetic Mean

    The arithmetic mean of a data set is usually called the average of the values
    It is the sum of the values divided by the number of values

    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

    >> x=[33, 75, 77, 82, 83, 85, 85, 91, 100];
    >> mean(x)
    ans =
        79

  • The Arithmetic Mean (ctd)

    For a matrix, mean operates columnwise

    >> mat = [8 9 3; 10 2 3; 6 10 9]
    >> mean(mat)
    ans =
         8     7     5

    To find the mean of each row, the second argument is 2

    >> mean(mat,2)
    ans =
        6.6667
        5.0000
        8.3333

    For mean, the second argument does not need to be []

  • Outliers

    Sometimes a value that is much larger or smaller than the rest of the data, called an outlier, can throw off the mean

    Example

    >> ybig=[9 10 10 9 8 100 7 3 10 9 8 5 10];
    >> mean(ybig)
    ans =
       15.2308

    Typically, an outlier represents an error of some kind (data collection etc.)
    In this example, the maximum and minimum could be removed using logical indexing (how?)
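
    One possible answer to the question above is sketched here, under the assumption that every occurrence of the largest and smallest value should be dropped. The names keep, trimmed and trimmedMean are illustrative and not from the lecture.

```matlab
ybig = [9 10 10 9 8 100 7 3 10 9 8 5 10];

% Logical vector: true for every element that is neither the maximum nor the minimum
keep = (ybig ~= max(ybig)) & (ybig ~= min(ybig));

% Logical indexing keeps only the elements where keep is true
trimmed = ybig(keep);
trimmedMean = mean(trimmed);

fprintf('mean with extremes:    %.4f\n', mean(ybig));
fprintf('mean without extremes: %.4f\n', trimmedMean);
```

    Note that this removes all copies of the extreme values; if the maximum occurred several times, every copy would be dropped.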

  • Variance and Standard Deviation

    The standard deviation and variance are ways of determining the spread of the data
    The variance is usually defined in terms of the arithmetic mean as

    $\mathrm{var} = \frac{\sum_{i=1}^{n} (x_i - \mathrm{mean})^2}{n-1}$

    (Sometimes the denominator is defined as n, but MATLAB uses n-1)

  • Variance and Standard Deviation (ctd)

    Example: x = [8 7 5 4 6]. The mean of the n=5 elements is 6, so

    $\mathrm{var} = \frac{(8-6)^2 + (7-6)^2 + (5-6)^2 + (4-6)^2 + (6-6)^2}{4} = 2.5$

    MATLAB has a built-in function var

    >> x = [8 7 5 4 6];
    >> var(x)
    ans =
        2.5000

  • Variance and Standard Deviation (ctd)

    The standard deviation is the square root of the variance

    $\mathrm{sd} = \sqrt{\mathrm{var}}$

    MATLAB has a built-in function std

    >> x = [8 7 5 4 6];
    >> std(x)
    ans =
        1.5811
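
    The two definitions can be checked against the built-in functions by computing them directly. This is a minimal sketch; the names manualVar, manualStd and popVar are illustrative, and the call var(x,1) is included only to show the alternative denominator n mentioned earlier.

```matlab
x = [8 7 5 4 6];
n = length(x);

% Sample variance: sum of squared deviations from the mean, divided by n-1
manualVar = sum((x - mean(x)).^2) / (n - 1);

% Standard deviation: square root of the variance
manualStd = sqrt(manualVar);

% Passing 1 as the second argument makes var divide by n instead of n-1
popVar = var(x, 1);

fprintf('var: %.4f (built-in %.4f)\n', manualVar, var(x));
fprintf('std: %.4f (built-in %.4f)\n', manualStd, std(x));
fprintf('var with denominator n: %.4f\n', popVar);
```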

  • The Mode

    The mode of a data set is the value that appears most frequently
    MATLAB has a built-in function mode

    >> x=[9 10 10 9 8 7 3 10 9 8 5 10];
    >> mode(x)
    ans =
        10

    If there is more than one value with the same (highest) frequency, the smallest of these values is returned as the mode

    >> x=[3 8 5 3 4 1 8];
    >> mode(x)
    ans =
         3
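
    Because mode reports only the smallest of several equally frequent values, it can help to count the occurrences explicitly. The sketch below is one way to list all tied values; the approach with unique and accumarray and the names vals, counts and allModes are assumptions for illustration, not the lecture's method.

```matlab
x = [3 8 5 3 4 1 8];

% vals holds the distinct values in ascending order;
% idx maps every element of x to its position in vals
[vals, ~, idx] = unique(x);

% Count how often each distinct value occurs
counts = accumarray(idx(:), 1)';

% Keep every value whose count equals the highest count
allModes = vals(counts == max(counts));

fprintf('mode(x) returns %d, all most frequent values: %s\n', mode(x), mat2str(allModes));
```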

  • The Median

    The median is defined only for a data set that has been sorted first, meaning that the values are in order
    The median of a sorted set of n data values is defined as

    The value in the middle if n is odd
    The average of the two values in the middle if n is even

    MATLAB has a built-in function median; it also works for unsorted vectors

    Odd case

    >> x=[1 4 5 9 12];
    >> median(x)
    ans =
         5

    Even case

    >> x=[1 4 5 9 12 33];
    >> median(x)
    ans =
         7
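
    The definition can be followed step by step for an unsorted vector. This is a minimal sketch, assuming a shuffled copy of the even-case data; the names s and med are illustrative.

```matlab
x = [12 1 33 5 9 4];     % same values as the even case, in arbitrary order

s = sort(x);             % the definition requires sorted data
n = length(s);

if mod(n, 2) == 1
    % Odd number of values: take the middle one
    med = s((n + 1) / 2);
else
    % Even number of values: average the two middle ones
    med = (s(n / 2) + s(n / 2 + 1)) / 2;
end

fprintf('manual median: %g, built-in median: %g\n', med, median(x));
```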

  • Set Functions in MATLAB

    MATLAB has several built-in functions that perform set operations
    Examples are:

    union
    intersect
    setdiff
    setxor
    unique

    All return vectors that are sorted from lowest to highest (ascending order)
    There are also two "is" functions: issorted and ismember
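
    The two "is" functions return logical values rather than sets. The following is a small sketch using two example vectors; the variable names are illustrative.

```matlab
v1 = [6 5 4 3 2];
v2 = [1 3 5 7];

% ismember tests each element of v1 for membership in v2
inV2 = ismember(v1, v2);

% issorted tests whether a vector is in ascending order
v1Sorted = issorted(v1);
v2Sorted = issorted(v2);

fprintf('elements of v1 found in v2: %s\n', mat2str(v1(inV2)));
fprintf('v1 sorted: %d, v2 sorted: %d\n', v1Sorted, v2Sorted);
```

    The logical result of ismember can be used directly for logical indexing, as in v1(inV2).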

  • Union

    The union function returns a vector that contains all of the values from the two input argument vectors, without repetitions

    Example

    >> v1=[6 5 4 3 2];
    >> v2=[1 3 5 7];
    >> union(v1,v2)
    ans =
         1     2     3     4     5     6     7

  • Intersect

    The intersect function returns a vector that contains all of the values that can be found in both of the two input argument vectors

    Example

    >> v1=[6 5 4 3 2];
    >> v2=[1 3 5 7];
    >> intersect(v1,v2)
    ans =
         3     5

  • Intersect (ctd)

    The intersect function also returns an index vector into v1 and an index vector into v2, such that outvec is the same as v1(index1) and also v2(index2)

    Example

    >> v1=[6 5 4 3 2];
    >> v2=[1 3 5 7];
    >> [outvec,index1,index2] = intersect(v1,v2)
    outvec =
         3     5
    index1 =
         4     2
    index2 =
         2     3

  • Setdiff

    The setdiff function returns a vector consisting of all of the values that are contained in the first input argument vector but not in the second
    The order of the input arguments is important!

    Example

    >> v1=[6 5 4 3 2];
    >> v2=[1 3 5 7];
    >> setdiff(v1,v2)
    ans =
         2     4     6
    >> setdiff(v2,v1)
    ans =
         1     7

  • Setxor

    The setxor function returns a vector consisting of all of the values from the two input vectors that are not in the intersection of these two vectors
    It is the union of the two vectors obtained by applying setdiff in both argument orders

    Example

    >> v1=[6 5 4 3 2];
    >> v2=[1 3 5 7];
    >> setxor(v1,v2)
    ans =
         1     2     4     6     7
    >> union(setdiff(v1,v2),setdiff(v2,v1))
    ans =
         1     2     4     6     7

  • Unique

    The unique function returns all of the unique values from a set argument

    Example

    >> v3=[1 2 3 4 5 3 4 5 6];
    >> unique(v3)
    ans =
         1     2     3     4     5     6

  • Sorting

    Sorting is the process of putting a list in order
    Either descending: highest to lowest
    Or ascending: lowest to highest

    Example

    >> vec=[85 70 100 95 80 91];
    >> vec=sort(vec)
    vec =
        70    80    85    91    95   100
    >> vec=sort(vec,'descend')
    vec =
       100    95    91    85    80    70

    By default, sort sorts in ascending order

  • Sorting (ctd)

    For matrices, the sort function sorts each column
    To sort each row, dimension 2 is specified

    Example

    >> mat = [8 9 3; 10 2 3; 6 10 9];
    >> sort(mat)
    ans =
         6     2     3
         8     9     3
        10    10     9
    >> sort(mat,2)
    ans =
         3     8     9
         2     3    10
         6     9    10

  • Sorting rows

    The sortrows function moves each row as a block, or group, ordering the rows by their first column (ties are broken by the following columns)

    Example

    >> mat = [8 9 3; 10 2 3; 6 10 9];
    >> sortrows(mat)
    ans =
         6    10     9
         8     9     3
        10     2     3

  • Sorting rows (ctd)

    The sortrows function also works on strings

    Example

    >> words=char('Hello','Hi','Goodbye','Ciao')
    words =
    Hello
    Hi
    Goodbye
    Ciao
    >> sortrows(words)
    ans =
    Ciao
    Goodbye
    Hello
    Hi

  • Index Vectors

    Using index vectors is an alternative to sorting a vector
    Indexing leaves the vector in its original order and just points to the elements in the desired order

    Example

    >> grades=[85 70 100 95 80 91];
    >> grade_index=[2 5 1 6 4 3];
    >> grades(grade_index)
    ans =
        70    80    85    91    95   100
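
    An index vector like the one above does not have to be written by hand: the built-in sort function returns one as its second output. This is a minimal sketch using the grades from the example; the name sortedGrades is illustrative.

```matlab
grades = [85 70 100 95 80 91];

% The second output of sort is the index vector that puts grades in ascending order
[sortedGrades, grade_index] = sort(grades);

fprintf('index vector:        %s\n', mat2str(grade_index));
fprintf('grades(grade_index): %s\n', mat2str(grades(grade_index)));
fprintf('grades (unchanged):  %s\n', mat2str(grades));
```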

  • Index Vectors (ctd)

    General algorithm to create an index vector:

    Initialize the values in the index vector to be the indices 1, 2, 3, ... up to the length of the vector
    Use any sort algorithm, but compare the elements in the original vector using the index vector to index into it (e.g. using grades(grade_index(i)), as previously shown)
    When the sort algorithm calls for exchanging values, exchange the elements in the index vector, not in the original vector

  • Index Vectors (ctd)

    createind.m

    function indvec = createind(vec)
    % Returns an index vector that accesses vec in ascending order
    % (selection sort performed on the indices, vec itself is not changed)

        % Initialize the index vector
        len = length(vec);
        indvec = 1:len;

        for i = 1:len-1
            low = i;
            for j = i+1:len
                % Compare values in the original vector
                if vec(indvec(j)) < vec(indvec(low))
                    low = j;
                end
            end
            % Exchange elements in the index vector
            temp = indvec(i);
            indvec(i) = indvec(low);
            indvec(low) = temp;
        end
    end

  • Lessons learned

    Common Pitfalls:

    Forgetting that max and min return the index of only the first occurrence of the maximum or minimum value
    Not realizing that a data set has outliers that can drastically alter the results obtained from the statistical functions

  • Lessons learned

    Programming Style Guidelines:

    Remove the largest and the smallest numbers from a large data set before performing statistical analysis to handle the problem of outliers
    Use sortrows to sort strings stored in a matrix alphabetically

  • The Geometric Mean

    The geometric mean of the n values in a vector x is defined as the nth root of the product of the data set values

    $G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$

    Note that the built-in function mean computes the arithmetic mean, not the geometric mean

    >> x = [33, 75, 77, 82, 83, 85, 91, 100];
    >> mean(x)
    ans =
       78.2500
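
    The definition of G can be evaluated directly with prod. This is a minimal sketch using the vector from the slide; the second variant via logarithms and the names G and Glog are additions for illustration and are not part of the lecture.

```matlab
x = [33, 75, 77, 82, 83, 85, 91, 100];
n = length(x);

% nth root of the product of all values
G = prod(x)^(1/n);

% Equivalent form using logarithms, which avoids forming a very large product
Glog = exp(mean(log(x)));

fprintf('geometric mean:  %.4f\n', G);
fprintf('via logarithms:  %.4f\n', Glog);
fprintf('arithmetic mean: %.4f\n', mean(x));
```

    For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, which the printed values illustrate.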