
Syllabus COSC 40023-65

Spring 2017 Data Mining and Visualization

Instructor: Antonio Sanchez
Office: TUC – 332
Email: [email protected]
URL: http://csfaculty.tcu.edu/asanchez/
Office Hours: M&W 11:00 to 2:00 or M&W 13:30 to 14:00 T&R 11:30 to 2:00 Or by Appointment

Overview

Students will study the principles and practices of data mining. Both descriptive (i.e., profiling, classification, association, and clustering) and prescriptive (i.e., prediction, regression, and estimation) algorithms will be studied to obtain and analyze patterns in large datasets. In addition, students will study the importance of applying data visualization practices to facilitate exploratory data analysis.

"To understand is to perceive patterns." Sir Isaiah Berlin (1909–1997)

"The greatest value of a picture is when it forces us to notice what we never expected to see." John W. Tukey (1915–2000)

Textbooks

Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition). NY: Morgan Kaufmann, 2011

Ben Fry. Visualizing Data: Exploring and Explaining Data with the Processing Environment. CA: O’Reilly, 2008


Objectives and Outcomes

Understand and apply data mining and visualization techniques. Specifically, after completing the course students should be able to:

1. Apply the seven stages of data visualization to facilitate exploratory data analysis, i.e., Acquisition, Parsing, Filtering, Mining, Representation, Refinement, and Interaction.
2. Understand the relations and differences between statistical analysis and data mining.
3. Prepare the data needed for data mining algorithms in terms of attributes and class inputs, and training, validation, and testing files.
4. Select and use different knowledge representation schemes such as trees, rules, regression polynomials, instance-based learning, and clusters.
5. Apply learning methods to classify, associate, perform regression, and create clusters from large data files.
6. Define and apply metrics to measure the performance of the various data mining algorithms used in the course.
7. Discuss the Map/Reduce algorithms used in Hadoop to process very large amounts of data, and use the Mahout program to perform instance-based learning.

Prerequisites

Calculus (MATH 10283 or MATH 10524) and Statistics (MATH 10043 or MATH 30853) and COSC 30603.

Grades

Grades will be determined using the following breakdown:

Assignments (4)   40%
Exams (2)         40%
Final             20%

Final grades in this course will be the traditional letter grades (A, B, C, D, F) with cutoffs every 10 points; i.e., 90, 80, 70, and 60. There will be no +/- distinctions. Exams must be taken at the scheduled times. The final exam will be comprehensive and will count for 20% of the grade. The format of the exams will be closed book, with both a set of multiple-choice questions and a set of problems to be solved. Make-up exams will be given ONLY in the event of a university-approved absence or as a result of MAJOR difficulties, which have been approved by the Dean of Students.
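As a rough illustration of the arithmetic behind the breakdown and cutoffs above, the following Python sketch computes a weighted course score and maps it to a letter grade. The function names `course_score` and `letter_grade` are hypothetical, and the sketch assumes that the four assignments are weighted equally, that the two exams are weighted equally, that all scores are on a 0 to 100 scale, and that no rounding is applied before the cutoffs; the syllabus itself only states the category totals and the cutoffs.

```python
def course_score(assignments, exams, final):
    """Return the weighted course score on a 0-100 scale.

    assignments: scores for the four lab assignments (40% in total)
    exams:       scores for the two exams (40% in total)
    final:       score for the final exam (20%)

    Assumption: items within a category are weighted equally.
    """
    assignment_avg = sum(assignments) / len(assignments)
    exam_avg = sum(exams) / len(exams)
    return 0.40 * assignment_avg + 0.40 * exam_avg + 0.20 * final


def letter_grade(score):
    """Map a 0-100 score to A/B/C/D/F using cutoffs of 90, 80, 70, and 60.

    Assumption: the score is compared to the cutoffs without rounding.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    score = course_score([90, 85, 95, 80], [78, 88], 92)
    print(f"{score:.1f} {letter_grade(score)}")
```

With these made-up scores, the assignment average is 87.5 and the exam average is 83, so the weighted score is about 86.6, which falls in the B range.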


Class Schedule

Lab Assignments

There will be four assigned group laboratory tasks, each worth 10% of the grade. The report for each assignment should include:

1) A description of the problem
2) The solution to the problem
3) An example of the solution
4) An overall discussion of the benefits of the solution chosen
5) A PowerPoint presentation to be given in class

Class Participation


Due to the nature of this course, students will be required to attend every class. In addition to simply being in class, students should review the website material for that day before class and be prepared to ask and answer questions about the material being covered that day.

Academic Dishonesty

The Computer Science Department takes academic dishonesty quite seriously. Academic misconduct will not be tolerated. Such acts are detailed in the current TCU Bulletin and include: copying, using, or in any way misrepresenting another’s work as your own; substituting for another or having someone substitute for you; plagiarism; collusion; abusing resource materials; unauthorized use of computer software or hardware; fabrication and falsification; complicity in misconduct. Such conduct at a minimum results in a zero on the test or assignment, and may result in a failing grade for the course.

Disabilities Statement

Texas Christian University complies with the Americans with Disabilities Act and Section 504 of the Rehabilitation Act of 1973 regarding students with disabilities. Eligible students seeking accommodations should contact the Coordinator of Student Disabilities Services in the Center for Academic Services located in Sadler Hall, 1010. Further information can be obtained from the Center for Academic Services, TCU Box 297710, Fort Worth, TX 76129, or at (817) 257-6567.

Adequate time must be allowed to arrange accommodations, and accommodations are not retroactive; therefore, students should contact the Coordinator as soon as possible in the academic term for which they are seeking accommodations. Each eligible student is responsible for presenting relevant, verifiable, professional documentation and/or assessment reports to the Coordinator. Guidelines for documentation may be found at http://www.acs.tcu.edu/disability_documentation.asp.

Students with emergency medical information or needing special arrangements in case a building must be evacuated should discuss this information with their instructor/professor as soon as possible.