data mining midterm_928075

Upload: ajay-reddy

Post on 13-Apr-2016

Page 1: Data Mining Midterm_928075

Texas A&M University – Kingsville
Frank H. Dotterweich College of Engineering

Fall 2015 – CSEN4367 Data Mining
Midterm Exam

Name:
K#:

Show your work in detail

1. Given the following data:

Compute the median.
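The transcript does not include the data for this question. The sketch below is minimal and assumes a plain list of numeric values, not data grouped into intervals. It sorts the values and takes the middle one, or the average of the two middle ones when the count is even. Grouped (interval, frequency) data would need an interpolated approximation instead. The age values from question 5 serve only as sample input, and `median` is a hypothetical helper name.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return data[mid]
    # Even count: average of the two middle values
    return (data[mid - 1] + data[mid]) / 2


# Age values from question 5, used here only as sample input
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
print(median(ages))  # 27 values, so the 14th sorted value: 25
```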

2. Given the age and body fat (%fat) data for 18 randomly selected adults:

a. Calculate the mean, median, variance, and standard deviation of age and %fat.
b. Draw the boxplots for age and %fat.
c. Draw a scatter plot and a q-q plot based on these two variables.
d. Draw the quantile plot (q plot) for %fat.
e. Find the outliers.
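The transcript does not reproduce the 18-adult age/%fat table, so this sketch runs on the question 5 age values. For part (a), call it once on the age column and once on the %fat column. It is a minimal sketch built on Python's standard `statistics` module (3.8+) and rests on three assumptions. Variance and standard deviation divide by N (population form). Quartiles use the module's "inclusive" interpolation, and textbook conventions differ slightly. Outliers are flagged with the common 1.5 × IQR rule behind boxplot whiskers. `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(values):
    """Mean, median, population variance/std, five-number summary, 1.5*IQR outliers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Population variance (divide by N); statistics.variance gives the n-1 version
    var = statistics.pvariance(data)
    std = statistics.pstdev(data)
    # Quartiles by linear interpolation; other quartile conventions differ slightly
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low or x > high]
    return {
        "mean": mean,
        "median": median,
        "variance": var,
        "std": std,
        "five_number": (data[0], q1, median, q3, data[-1]),  # min, Q1, median, Q3, max
        "outliers": outliers,
    }


# Sample input: the age values from question 5 (the question 2 table is not in the transcript)
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
s = summarize(ages)
print(round(s["mean"], 2), s["median"], round(s["std"], 2))
print(s["five_number"])  # Q1 = 20.5, median = 25, Q3 = 35
print(s["outliers"])     # [70]
```

A boxplot is drawn directly from the five-number summary, and the values outside the fences are plotted as individual points.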

3. Consider the following data for two objects:

(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using q = 3.
(d) Compute the supremum distance between the two objects.
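The transcript does not reproduce the exam's two objects, so this sketch uses the hypothetical points (0, 0) and (3, 4). All four distances are instances of the Minkowski (L_q) distance. q = 1 gives Manhattan, q = 2 gives Euclidean, and the limit q → ∞ gives the supremum (L_max) distance. The sketch computes that limit directly as the largest attribute difference. The function names are illustrative.

```python
def minkowski(x, y, q):
    """Minkowski (L_q) distance between two equal-length numeric tuples."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def euclidean(x, y):
    return minkowski(x, y, 2)  # L_2


def manhattan(x, y):
    return minkowski(x, y, 1)  # L_1


def supremum(x, y):
    """L_max (Chebyshev) distance: the largest absolute attribute difference."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical objects; substitute the two tuples given in the exam
x = (0, 0)
y = (3, 4)
print(euclidean(x, y))     # 5.0
print(manhattan(x, y))     # 7.0
print(minkowski(x, y, 3))  # cube root of 3^3 + 4^3 = 91
print(supremum(x, y))      # 4
```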

4. Given a similarity measure with values in the interval [0, 1], describe two ways to transform this similarity value into a dissimilarity value in the interval [0, ∞].
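Two common transformations are d = (1 − s) / s and d = −log(s). These may or may not be the answers the instructor expected. Both are decreasing in s, map s = 1 to d = 0, and grow without bound as s approaches 0, so they cover [0, ∞]. The minimal sketch below treats s = 0 as infinite dissimilarity. The function names are hypothetical.

```python
import math


def dissim_ratio(s):
    """d = (1 - s) / s: s = 1 gives 0, and d -> infinity as s -> 0."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return math.inf if s == 0 else (1.0 - s) / s


def dissim_neglog(s):
    """d = -ln(s): s = 1 gives 0, and d -> infinity as s -> 0."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return math.inf if s == 0 else -math.log(s)


for s in (1.0, 0.5, 0.1, 0.0):
    print(s, dissim_ratio(s), dissim_neglog(s))
```

The two mappings stretch the scale differently. At s = 0.5 the ratio form gives 1.0, while the log form gives ln 2 ≈ 0.69.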

5. Given the attribute age: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

(a) Use smoothing by bin means to smooth these data, using a bin depth of 3. Illustrate your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
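Smoothing by bin means with a bin depth of 3 is equal-frequency binning. Sort the values, cut them into consecutive bins of three values each, and replace each value by its bin's mean. A minimal sketch (`smooth_by_bin_means` is a hypothetical name):

```python
def smooth_by_bin_means(values, depth):
    """Sort, split into bins of `depth` values, replace each value by its bin mean."""
    data = sorted(values)
    bins, smoothed = [], []
    for start in range(0, len(data), depth):
        bin_values = data[start:start + depth]  # last bin may be shorter
        mean = sum(bin_values) / len(bin_values)
        bins.append(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return bins, smoothed


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
bins, smoothed = smooth_by_bin_means(ages, depth=3)
for b in bins:
    print(b, "->", round(sum(b) / len(b), 2))
```

The 27 ages form nine bins. The bin [35, 35, 35] keeps the value 35. The last bin [46, 52, 70] becomes 56.0, which shows how a single extreme value (70) pulls its bin mean upward.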

6. Data quality can be assessed in terms of several issues, including accuracy, completeness, and consistency. For each of the above three issues, discuss how data quality assessment can depend on the intended use of the data, giving examples. Propose two other dimensions of data quality.