data science at scale back up - thecads.org · in the final capstone project, devel- ... learn the...

Upload: ledung

Post on 05-May-2018

Page 1: Data science at scale Back Up - thecads.org · In the final Capstone Project, devel- ... Learn the basics of statistical inference, comparing classical methods with resampling methods

About This Specialization

Learn scalable data management, evaluate big data technologies, and design effective visualizations.

This Specialization covers intermediate topics in data science. You will gain hands-on experience with scalable SQL and NoSQL data management solutions, data mining algorithms, and practical statistical and machine learning concepts. You will also learn to visualize data and communicate results, and you’ll explore legal and ethical issues that arise in working with big data. In the final Capstone Project, developed in partnership with the digital internship platform Coursolve, you’ll apply your new skills to a real-world data science project.

5 courses: Follow the suggested order or choose your own

Projects: Follow the suggested order or choose your own

Certificates: Follow the suggested order or choose your own

Data Manipulation at Scale: Systems and Algorithms

Commitment: 4 weeks of study, 6-8 hours/week

Subtitles: English, Spanish, Chinese (Simplified)

About the Course

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making --- we are drowning in it. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

In this course, you will learn the landscape of relevant systems, the principles on which they rely, their tradeoffs, and how to evaluate their utility against your requirements. You will learn how practical systems were derived from the frontier of research in computer science and what systems are coming on the horizon. Cloud computing, SQL and NoSQL databases, MapReduce and the ecosystem it spawned, Spark and its contemporaries, and specialized systems for graphs and arrays will be covered.

You will also learn the history and context of data science, the skills, challenges, and methodologies the term implies, and how to structure a data science project. At the end of this course, you will be able to:

Learning Goals:

1. Describe common patterns, challenges, and approaches associated with data science projects, and what makes them different from projects in related fields.
2. Identify and use the programming models associated with scalable data manipulation, including relational algebra, MapReduce, and other data flow models.
3. Use database technology adapted for large-scale analytics, including the concepts driving parallel databases, parallel query processing, and in-database analytics.
4. Evaluate key-value stores and NoSQL systems, and describe their tradeoffs with comparable systems, the details of important examples in the space, and future trends.
5. “Think” in MapReduce to effectively write algorithms for systems including Hadoop and Spark. You will understand their limitations, design details, their relationship to databases, and their associated ecosystem of algorithms, extensions, and languages, and you will write programs in Spark.
6. Describe the landscape of specialized Big Data systems for graphs, arrays, and streams.

Week 1: Data Science Context and Concepts

Understand the terminology and recurring principles associated with data science, and understand the structure of data science projects and emerging methodologies to approach them. Why does this emerging field exist? How does it relate to other fields? How does this course distinguish itself? What do data science projects look like, and how should they be approached? What are some examples of data science projects?

Video · Appetite Whetting: Politics

Video · Appetite Whetting: Extreme Weather

Video · Appetite Whetting: Digital Humanities

Video · Appetite Whetting: Bibliometrics

Video · Appetite Whetting: Food, Music, Public Health

Video · Appetite Whetting: Public Health cont'd, Earthquakes, Legal

Video · Characterizing Data Science

Video · Characterizing Data Science, cont'd

Video · Distinguishing Data Science from Related Topics

Video · Four Dimensions of Data Science

Video · Tools vs. Abstractions

Video · Desktop Scale vs. Cloud Scale

Video · Hackers vs. Analysts

Video · Structs vs. Stats

Video · Structs vs. Stats cont'd

Video · A Fourth Paradigm of Science

Video · Data-Intensive Science Examples

Video · Big Data and the 3 Vs

Video · Big Data Definitions

Video · Big Data Sources

Reading · Supplementary: Three-Course Reading List

Reading · Supplementary: Resources for Learning Python

Video · Course Logistics

Reading · Supplementary: Class Virtual Machine

Reading · Supplementary: Github Instructions

Video · Twitter Assignment: Getting Started

Programming Assignment · Twitter Sentiment Analysis

Week 2: Relational Databases and the Relational Algebra

Relational databases are the workhorse of large-scale data management. Although originally motivated by problems in enterprise operations, they have proven remarkably capable for analytics as well. But most importantly, the principles underlying relational databases are universal in managing, manipulating, and analyzing data at scale. Even as the landscape of large-scale data systems has expanded dramatically in the last decade, relational models and languages have remained a unifying concept. For working with large-scale data, there is no more important programming model to learn.

Video · Data Models, Terminology

Video · From Data Models to Databases

Video · Pre-Relational Databases

Video · Motivating Relational Databases

Video · Relational Databases: Key Ideas

Video · Algebraic Optimization Overview

Video · Relational Algebra Overview

Video · Relational Algebra Operators: Union, Difference, Selection

Video · Relational Algebra Operators: Projection, Cross Product

Video · Relational Algebra Operators: Cross Product cont'd, Join

Video · Relational Algebra Operators: Outer Join

Video · Relational Algebra Operators: Theta-Join

Video · From SQL to RA

Video · Thinking in RA: Logical Query Plans

Video · Practical SQL: Binning Timeseries

Video · Practical SQL: Genomic Intervals

Video · User-Defined Functions

Video · Support for User-Defined Functions

Video · Optimization: Physical Query Plans

Video · Optimization: Choosing Physical Plans

Video · Declarative Languages

Video · Declarative Languages: More Examples

Video · Views: Logical Data Independence

Video · Indexes

Programming Assignment · SQL for Data Science Assignment
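
As a rough illustration of the relational algebra operators listed above, the following Python sketch implements selection, projection, and an equi-join over small in-memory relations. The relations, column names, and helper functions are hypothetical and are not taken from the course materials; a real database would choose among far more efficient physical plans than this nested-loop join.

```python
# Illustrative relations as lists of dicts; the tables and columns are made up.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Grace", "dept_id": 20},
    {"emp_id": 3, "name": "Edgar", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Operations"},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def join(left, right, column):
    """Equi-join on a shared column using a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]


# Roughly: SELECT DISTINCT name FROM employees JOIN departments USING (dept_id)
#          WHERE dept_name = 'Research'
research = select(join(employees, departments, "dept_id"),
                  lambda row: row["dept_name"] == "Research")
research_names = project(research, ["name"])
print(research_names)
```

Composing the three functions mirrors a logical query plan: the join feeds the selection, which feeds the projection.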

Week 3: MapReduce and Parallel Dataflow Programming

The MapReduce programming model (as distinct from its implementations) was proposed as a simplifying abstraction for parallel manipulation of massive datasets, and remains an important concept to know when using and evaluating modern big data platforms.

Video · What Does Scalable Mean?

Video · A Sketch of Algorithmic Complexity

Video · A Sketch of Data-Parallel Algorithms

Video · "Pleasingly Parallel" Algorithms

Video · More General Distributed Algorithms

Video · MapReduce Abstraction

Video · MapReduce Data Model

Video · Map and Reduce Functions

Video · MapReduce Simple Example

Video · MapReduce Simple Example cont'd

Video · MapReduce Example: Word Length Histogram

Video · MapReduce Examples: Inverted Index, Join

Video · Relational Join: Map Phase

Video · Relational Join: Reduce Phase

Video · Simple Social Network Analysis: Counting Friends

Video · Matrix Multiply Overview

Video · Matrix Multiply Illustrated

Video · Shared Nothing Computing

Video · MapReduce Implementation

Video · MapReduce Phases

Video · A Design Space for Large-Scale Data Systems

Video · Parallel and Distributed Query Processing

Video · Teradata Example, MR Extensions

Video · RDBMS vs. MapReduce: Features

Video · RDBMS vs. Hadoop: Grep

Video · RDBMS vs. Hadoop: Select, Aggregate, Join

Programming Assignment · Thinking in MapReduce
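
To make the programming model concrete, here is a minimal single-process sketch of the word length histogram example named in the list above. The `run_mapreduce` helper is a hypothetical stand-in for a real framework such as Hadoop: it only imitates the map, shuffle, and reduce phases in memory and says nothing about how an actual implementation distributes work.

```python
from collections import defaultdict


def map_word_length(document):
    """Map: emit a (word length, 1) pair for every word in one document."""
    for word in document.split():
        yield len(word), 1


def reduce_count(key, values):
    """Reduce: sum the counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(inputs, mapper, reducer):
    """Single-process stand-in for the framework: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)  # the "shuffle" groups values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


documents = ["the quick brown fox", "jumps over the lazy dog"]
histogram = run_mapreduce(documents, map_word_length, reduce_count)
print(histogram)
```

Because the mapper sees one document at a time and the reducer sees one key at a time, both steps could in principle run in parallel across machines, which is the point of the abstraction.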

Week 4: NoSQL: Systems and Concepts

NoSQL systems are purely about scale rather than analytics, and are arguably less relevant for the practicing data scientist. However, they occupy an important place in many practical big data platform architectures, and data scientists need to understand their limitations and strengths to use them effectively.

Week 5: Graph Analytics

Graph-structured data are increasingly common in data science contexts due to their ubiquity in modeling the communication between entities: people (social networks), computers (Internet communication), cities and countries (transportation networks), or corporations (financial transactions). Learn the common algorithms for extracting information from graph data and how to scale them up.

Video · Graph Overview

Video · Structural Analysis

Video · Degree Histograms, Structure of the Web

Video · Connectivity and Centrality

Video · PageRank

Video · PageRank in more Detail

Video · Traversal Tasks: Spanning Trees and Circuits

Video · Traversal Tasks: Maximum Flow

Video · Pattern Matching

Video · Querying Edge Tables

Video · Relational Algebra and Datalog for Graphs

Video · Querying Hybrid Graph/Relational Data

Video · Graph Query Example: NSA

Video · Graph Query Example: Recursion

Video · Evaluation of Recursive Programs

Video · Recursive Queries in MapReduce

Video · The End-Game Problem

Video · Representation: Edge Table, Adjacency List

Video · Representation: Adjacency Matrix

Video · PageRank in MapReduce

Video · PageRank in Pregel
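
As one example of the graph algorithms named above, the sketch below computes PageRank by plain power iteration on a toy adjacency list. The graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions; this is a single-machine sketch, not the MapReduce or Pregel formulation covered in the videos. It assumes every link target also appears as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of out-neighbors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" receives links from every other page.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

Each iteration only needs a node's current rank and its out-links, which is why the computation maps naturally onto edge tables and message-passing systems.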

Practical Predictive Analytics: Models and Methods

Commitment: 4 weeks of study, 6-8 hours/week

Subtitles: English

About the Course

Statistical experiment design and analytics are at the heart of data science. In this course you will design statistical experiments and analyze the results using modern methods. You will also explore the common pitfalls in interpreting statistical arguments, especially those associated with big data. Collectively, this course will help you internalize a core set of practical and effective machine learning methods and concepts, and apply them to solve some real-world problems.

Learning Goals: After completing this course, you will be able to:

1. Design effective experiments and analyze the results.
2. Use resampling methods to make clear and bulletproof statistical arguments without invoking esoteric notation.
3. Explain and apply a core set of classification methods of increasing complexity (rules, trees, random forests), and associated optimization methods (gradient descent and variants).
4. Explain and apply a set of unsupervised learning concepts and methods.
5. Describe the common idioms of large-scale graph analytics, including structural query, traversals and recursive queries, PageRank, and community detection.

Week 1: Practical Statistical Inference

Learn the basics of statistical inference, comparing classical methods with resampling methods that allow you to use a simple program to make a rigorous statistical argument. Motivate your study with current topics at the foundations of science: publication bias and reproducibility.

Video · Appetite Whetting: Bad Science

Video · Hypothesis Testing

Video · Significance Tests and P-Values

Video · Example: Difference of Means

Video · Deriving the Sampling Distribution

Video · Shuffle Test for Significance

Video · Comparing Classical and Resampling Methods

Video · Bootstrap

Video · Resampling Caveats

Video · Outliers and Rank Transformation

Video · Example: Chi-Squared Test

Video · Bad Science Revisited: Publication Bias

Video · Effect Size

Video · Meta-analysis

Video · Fraud and Benford's Law

Video · Intuition for Benford's Law

Video · Benford's Law Explained Visually

Video · Multiple Hypothesis Testing: Bonferroni and Sidak Corrections
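
The "simple program" style of argument can be sketched with a shuffle (permutation) test for a difference of means. The measurements, the trial count, and the function name below are made up for illustration, and the result is a Monte Carlo estimate of the p-value rather than an exact one.

```python
import random


def shuffle_test(group_a, group_b, trials=10_000, seed=0):
    """Estimate a two-sided p-value for the difference of means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    return extreme / trials


# Hypothetical measurements for two groups.
treatment = [12.1, 11.8, 12.6, 12.9, 12.4, 13.0]
control = [10.2, 10.9, 11.1, 10.5, 10.8, 10.4]
p_value = shuffle_test(treatment, control)
print(p_value)
```

The estimated p-value is the fraction of random relabelings that produce a difference at least as large as the one observed, so no sampling distribution has to be derived by hand.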

Week 2: Supervised Learning

Follow a tour through the important methods, algorithms, and techniques in machine learning. You will learn how these methods build upon each other and can be combined into practical algorithms that perform well on a variety of tasks. Learn how to evaluate machine learning methods and the pitfalls to avoid.

Video · Statistics vs. Machine Learning

Video · Simple Examples

Video · Structure of a Machine Learning Problem

Video · Classification with Simple Rules

Video · Learning Rules

Video · Rules: Sequential Covering

Video · Rules Recap

Video · From Rules to Trees

Video · Entropy

Video · Measuring Entropy

Video · Using Information Gain to Build Trees

Video · Building Trees: ID3 Algorithm

Video · Building Trees: C4.5 Algorithm

Video · Rules and Trees Recap

Video · Overfitting

Video · Evaluation: Leave One Out Cross Validation

Video · Evaluation: Accuracy and ROC Curves

Video · Bootstrap Revisited

Video · Ensembles, Bagging, Boosting

Video · Boosting Walkthrough

Video · Random Forests

Video · Random Forests: Variable Importance

Video · Summary: Trees and Forests

Video · Nearest Neighbor

Video · Nearest Neighbor: Similarity Functions

Video · Nearest Neighbor: Curse of Dimensionality

Reading · R Assignment: Classification of Ocean Microbes

Quiz · R Assignment: Classification of Ocean Microbes

Week 3: Optimization

You will learn how to optimize a cost function using gradient descent, including popular variants that use randomization and parallelization to improve performance. You will gain an intuition for popular methods used in practice and see how similar they are fundamentally.

Video · Optimization by Gradient Descent

Video · Gradient Descent Visually

Video · Gradient Descent in Detail

Video · Gradient Descent: Questions to Consider

Video · Intuition for Logistic Regression

Video · Intuition for Support Vector Machines

Video · Support Vector Machine Example

Video · Intuition for Regularization

Video · Intuition for LASSO and Ridge Regression

Video · Stochastic and Batched Gradient Descent

Video · Parallelizing Gradient Descent
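
As a small illustration of optimizing a cost function by gradient descent, the sketch below fits a line to a handful of points by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, and step count are assumptions chosen so that this toy problem converges; they are not recommendations from the course.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

A stochastic variant would compute the gradient from one randomly chosen point (or a small batch) per step instead of the full dataset, trading noisier steps for cheaper ones.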

Week 4: Unsupervised Learning

A brief tour of selected unsupervised learning methods and an opportunity to apply techniques in practice on a real-world problem.

Video · Introduction to Unsupervised Learning

Video · K-means

Video · DBSCAN

Video · DBSCAN Variable Density and Parallel Algorithms

Peer Review · Kaggle Competition Peer Review

Communicating Data Science Results

Subtitles: English

About the Course

Important note: The second assignment in this course covers the topic of Graph Analysis in the Cloud, in which you will use Elastic MapReduce and the Pig language to perform graph analysis over a moderately large dataset, about 600GB. In order to complete this assignment, you will need to make use of Amazon Web Services (AWS). Amazon has generously offered to provide up to $50 in free AWS credit to each learner in this course to allow you to complete the assignment. Further details regarding the process of receiving this credit are available in the welcome message for the course, as well as in the assignment itself. Please note that Amazon, University of Washington, and Coursera cannot reimburse you for any charges if you exhaust your credit.

While we believe that this assignment contributes an excellent learning experience in this course, we understand that some learners may be unable or unwilling to use AWS. We are unable to issue Course Certificates for learners who do not complete the assignment that requires use of AWS. As such, you should not pay for a Course Certificate in Communicating Data Science Results if you are unable or unwilling to use AWS, as you will not be able to successfully complete the course without doing so.

Making predictions is not enough! Effective data scientists know how to explain and interpret their results, and communicate findings accurately to stakeholders to inform business decisions. Visualization is the field of research in computer science that studies effective communication of quantitative results by linking perception, cognition, and algorithms to exploit the enormous bandwidth of the human visual cortex. In this course you will learn to recognize, design, and use effective visualizations.

Just because you can make a prediction and convince others to act on it doesn’t mean you should. In this course you will explore the ethical considerations around big data and how these considerations are beginning to influence policy and practice. You will learn the foundational limitations of using technology to protect privacy and the codes of conduct emerging to guide the behavior of data scientists. You will also learn the importance of reproducibility in data science and how the commercial cloud can help support reproducible research even for experiments involving massive datasets, complex computational infrastructures, or both.

Learning Goals: After completing this course, you will be able to:

1. Design and critique visualizations.
2. Explain the state of the art in privacy, ethics, and governance around big data and data science.
3. Use cloud computing to analyze large datasets in a reproducible way.

Week 1: Visualization

Statistical inferences from large, heterogeneous, and noisy datasets are useless if you can't communicate them to your colleagues, your customers, your management, and other stakeholders. Learn the fundamental concepts behind information visualization, an increasingly critical field of research and increasingly important skillset for data scientists. This module is taught by Cecilia Aragon, faculty in the Human Centered Design and Engineering Department.

Video · 01 Introduction: What and Why

Video · 02 Introduction: Motivating Examples

Video · 03 Data Types: Definitions

Video · 04 Mapping Data Types to Visual Attributes

Video · 05 Data Types Exercise

Video · 06 Data Types and Visual Mappings Exercises

Video · 07 Data Dimensions

Video · 08 Effective Visual Encoding

Video · 09 Effective Visual Encoding Exercise

Video · 10 Design Criteria for Visual Encoding

Video · 11 The Eye is not a Camera

Video · 12 Preattentive Processing

Video · 13 Estimating Magnitude

Video · 14 Evaluating Visualizations

Peer Review · Crime Analytics: Visualization of Incident Reports

Week 2: Privacy and Ethics

Big Data has become closely linked to issues of privacy and ethics: As the limits on what we *can* do with data continue to evaporate, the question of what we *should* do with data becomes paramount. Motivated in the context of case studies, you will learn the core principles of codes of conduct for data science and statistical analysis. You will learn the limits of current theory on protecting privacy while still permitting useful statistical analysis.

Video · Motivation: Barrow Alcohol Study

Video · Barrow Study Problems

Video · Reifying Ethics: Codes of Conduct

Video · ASA Code of Conduct: Responsibilities to Stakeholders

Video · Other Codes of Conduct

Video · Examples of Codified Rules: HIPAA

Video · Privacy Guarantees: First Attempts

Video · Examples of Privacy Leaks

Video · Formalizing the Privacy Problem

Video · Differential Privacy Defined

Video · Global Sensitivity

Video · Laplacian Noise

Video · Adding Laplacian Noise and Proving Differential Privacy

Video · Weaknesses of Differential Privacy
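
The tension between privacy and useful statistics can be illustrated with the Laplace mechanism applied to a counting query. In this sketch the records, the predicate, and the privacy parameter `epsilon` are hypothetical; the noise scale uses the fact that adding or removing one record changes a count by at most 1 (a global sensitivity of 1). It is a teaching sketch under those assumptions, not a vetted privacy implementation.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise of scale sensitivity / epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 52, 67, 38, 44]  # made-up records
rng = random.Random(42)
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

A smaller `epsilon` means more noise and stronger privacy, while a larger one gives more accurate answers; on a dataset this small the noise can easily swamp the true count of 4, which hints at the practical limits of the approach.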

Week 3: Reproducibility and Cloud Computing

Science is facing a credibility crisis due to unreliable reproducibility, and as research becomes increasingly computational, the problem seems to be paradoxically getting worse. But reproducibility is not just for academics: Data scientists who cannot share, explain, and defend their methods for others to build on are dangerous. In this module, you will explore the importance of reproducible research and how cloud computing is offering new mechanisms for sharing code, data, environments, and even costs that are critical for practical reproducibility.

Video · Reproducibility and Data Science

Video · Reproducibility Gold Standard

Video · Anecdote: The Ocean Appliance

Video · Code + Data + Environment

Video · Cloud Computing Introduction

Video · Cloud Computing History

Video · Code + Data + Environment + Platform

Video · Cloud Computing for Reproducible Research

Video · Advantages of Virtualization for Reproducibility

Video · Complex Virtualization Scenarios

Video · Shared Laboratories

Video · Economies of Scale

Video · Provisioning for Peak Load

Video · Elasticity and Price Reductions

Video · Server Costs vs. Power Costs

Video · Reproducibility for Big Data

Video · Counter-Arguments and Summary

Practice Quiz · AWS Credit Opt-in Consent Form

Programming Assignment · Graph Analysis in the Cloud

Data Science at Scale - Capstone Project

Upcoming Session: Jan 15

Commitment: 6 weeks of study, 3-4 hours/week

Subtitles: English

About the Course

In the capstone, students will engage in a real-world project requiring them to apply skills from the entire data science pipeline: preparing, organizing, and transforming data, constructing a model, and evaluating results. Through a collaboration with Coursolve, each Capstone project is associated with partner stakeholders who have a vested interest in your results and are eager to deploy them in practice. These projects will not be straightforward and the outcome is not prescribed -- you will need to tolerate ambiguity and negative results! But we believe the experience will be rewarding and will better prepare you for data science projects in practice.

Week 1: Project A: Blight Fight

In this project, you will build a model to predict when a building is likely to be condemned. The data is real, the problem is real, and the impact is real.

Reading · Get the Data

Reading · Understand the Domain

Other · Milestone: Discuss the Problem and Approaches

Week 2: Derive a list of buildings

You are given sets of incidents with location information; you need to use some assumptions to group these incidents by location to identify specific buildings.

Reading · Milestone: Create a list of "buildings" from a list of geo-located incidents

Practice Peer Review · Reflecting on defining "buildings"
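
One simple assumption for turning incidents into "buildings" is to treat incidents whose coordinates fall in the same small grid cell as belonging to the same building. The sketch below does this by rounding latitude and longitude; the field names, sample coordinates, and the 4-decimal precision (about 11 m in latitude) are illustrative assumptions, not part of the project specification, and other groupings such as address matching or clustering may work better.

```python
from collections import defaultdict


def group_incidents(incidents, precision=4):
    """Group incidents whose coordinates round to the same grid cell.

    The precision is an assumed threshold: 4 decimal places is roughly
    11 m in latitude.
    """
    buildings = defaultdict(list)
    for incident in incidents:
        key = (round(incident["lat"], precision),
               round(incident["lon"], precision))
        buildings[key].append(incident["id"])
    return dict(buildings)


# Hypothetical geo-located incidents from different source files.
incidents = [
    {"id": "call-1", "lat": 42.33141, "lon": -83.04582},
    {"id": "violation-7", "lat": 42.33143, "lon": -83.04579},
    {"id": "call-2", "lat": 42.35010, "lon": -83.06001},
]
buildings = group_incidents(incidents)
for location, incident_ids in buildings.items():
    print(location, incident_ids)
```

A fixed grid can split one building across two cells when it straddles a cell boundary, which is one of the tradeoffs worth reflecting on when defining "buildings".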

Week 3: Construct a training dataset

Construct a training set by associating each of your buildings with a ground truth label derived from the permit data.

Reading · Milestone: Derive labels for each building

Practice Peer Review · Reflecting on the labeling scheme

Week 4: Train and evaluate a simple model

Use a trivial feature set to train and evaluate a simple model.

Reading · Milestone: Train a Simple Model

Practice Peer Review · Reflecting on a trivial initial model

Week 5: Feature Engineering

Derive additional features and retrain to improve the efficacy of your model.

Reading · Milestone: Adding more features

Practice Peer Review · Reflection on your proposed features.

Week 6: Final Report

Enter your final report for grading.

Peer Review · Final Report