Funding Education through Donors Choose
General Assembly 2016
Fernando Hidalgo
Problem Description
Task: Predict whether a Donors Choose project will get funded.
Experience: Donors Choose data from Sept 2002 to the present.
Performance: Classification accuracy, the number of correct predictions out of all predictions made.
The Data
Labels
Completed: 592,757
Expired: 261,536
Class Skewness: Use F1 score to keep recall and precision in check.
Baseline: 0.69
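The baseline of .69 appears to match the share of completed projects among all labeled projects, i.e. the accuracy of always predicting "completed". That reading is an assumption, since the slides do not say how the baseline was derived. A minimal sketch of the arithmetic, which also computes the F1 score of the same always-"completed" predictor so that later F1 results can be compared on a like-for-like basis:

```python
# Label counts taken from the slide above.
completed = 592_757
expired = 261_536
total = completed + expired

# Accuracy of a classifier that always predicts "completed".
majority_accuracy = completed / total

# F1 for that same always-"completed" classifier:
# precision equals the completed share, and recall is 1.0.
precision = completed / total
recall = 1.0
majority_f1 = 2 * precision * recall / (precision + recall)

print(f"majority-class accuracy: {majority_accuracy:.4f}")
print(f"majority-class F1:       {majority_f1:.4f}")
```

The two numbers differ, so a baseline stated as accuracy and a model score stated as F1 are not directly comparable.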
Original Features
Feature Abbreviation                      Description
total_price_excluding_optional_support    Total price of the project in dollars (integer)
students_reached                          Number of students the project reaches (integer)
school_type                               Type of school: Charter, magnet, year_round, nlns, kipp, Charter_ready_promise (categorical)
date_posted                               Day the project was posted (categorical)
resource_type                             Type of resources the project asks for (categorical)
grade_level                               Grade level of the project (categorical)
poverty_level                             Poverty level (categorical)
school_state                              State the project was posted from (categorical)
Eligible_double_your_impact_match         Whether the project was eligible to be matched (categorical)
teacher_prefix                            Prefix of the teacher posting (categorical)
primary_focus_area                        The project's primary area of focus (categorical)
primary_focus_subject                     The project's primary subject of focus (categorical)
Feature Engineering
New Feature          Description
price_per_student    total_price / students_reached
project_length       Date_expiration - date_posted
month_posted         Extracted from date_posted
day_posted           Extracted from date_posted
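The four engineered features could be derived along these lines with pandas. This is a sketch under assumptions, not the author's code: the rows are made up, `date_expiration` is assumed to be the expiration-date column, `total_price` is taken to mean `total_price_excluding_optional_support`, and `day_posted` is read as day of the month (the slides do not say whether it is day of month or day of week).

```python
import pandas as pd

# Two made-up rows; column names follow the slides, and
# "date_expiration" is an assumed name for the expiration date.
projects = pd.DataFrame({
    "total_price_excluding_optional_support": [500.0, 1200.0],
    "students_reached": [25, 0],
    "date_posted": ["2015-09-01", "2015-12-15"],
    "date_expiration": ["2016-01-01", "2016-04-14"],
})


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four engineered columns added."""
    out = df.copy()
    posted = pd.to_datetime(out["date_posted"])
    expires = pd.to_datetime(out["date_expiration"])

    # Treat zero students as missing so the ratio is NaN rather than inf.
    students = out["students_reached"].where(out["students_reached"] > 0)
    out["price_per_student"] = (
        out["total_price_excluding_optional_support"] / students
    )

    # Project length in whole days between posting and expiration.
    out["project_length"] = (expires - posted).dt.days
    out["month_posted"] = posted.dt.month
    out["day_posted"] = posted.dt.day  # assumption: day of the month
    return out


features = add_engineered_features(projects)
print(features[["price_per_student", "project_length",
                "month_posted", "day_posted"]])
```

Guarding against zero students is a design choice of this sketch; the slides do not describe how such rows were handled.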
Visualizations
Rate of Projects Funded to Total Projects per Resource
Rate of Projects Funded to Total Projects per Month
Rate of Projects Funded to Total Projects per Grades
Rate of Projects Funded to Total Projects per Primary Focus Area
Rate of Projects Funded to Total Projects per Teacher Prefix
Rate of Projects Funded to Total Projects per Poverty Level
Relationship Between Project Length and Funding
Relationship Between Project Price and Funding
Relationship Between Price per Student and Funding
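Each "Rate of Projects Funded to Total Projects" chart above reduces to a funded share per category. A minimal sketch of that calculation, assuming a label column named `funding_status` with the values "completed" and "expired" (an assumption based on the labels slide) and using made-up rows:

```python
import pandas as pd

# Made-up rows; "funding_status" is an assumed label column name.
sample = pd.DataFrame({
    "resource_type": ["Books", "Books", "Technology", "Technology", "Supplies"],
    "funding_status": ["completed", "expired", "completed", "completed", "expired"],
})


def funded_rate(df: pd.DataFrame, by: str) -> pd.Series:
    """Share of funded (completed) projects within each category of `by`."""
    funded = df["funding_status"].eq("completed")
    return funded.groupby(df[by]).mean().sort_values(ascending=False)


rates = funded_rate(sample, "resource_type")
print(rates)
```

The same helper could be called with `grade_level`, `poverty_level`, `teacher_prefix`, or `primary_focus_area` to reproduce the other rate charts.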
Predictive Model
The 3 Models:
1. AdaBoost
2. Random Forest
3. Logistic Regression
GridSearch Scores Using F1 Score Metric
Model                  Score     Best Parameter
Random Forest          0.759     Criterion: Entropy
AdaBoost               0.7676    N_estimators: 60
Logistic Regression    0.811     Penalty: L2
Simplest Model with Best Score: Logistic Regression
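A grid search over the three models scored by F1 might look like the following with scikit-learn. This is only a sketch: the data is synthetic, the parameter grids are hypothetical (the slides report only the best value per model), and the fold count is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with roughly the 69/31 class split from the slides.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.31, 0.69], random_state=0)

# Hypothetical grids; only the winning values appear in the slides.
candidates = {
    "Random Forest": (RandomForestClassifier(n_estimators=50, random_state=0),
                      {"criterion": ["gini", "entropy"]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {"n_estimators": [30, 60]}),
    "Logistic Regression": (LogisticRegression(solver="liblinear"),
                            {"penalty": ["l1", "l2"]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # scoring="f1" makes the search rank parameter settings by F1, not accuracy.
    search = GridSearchCV(model, grid, scoring="f1", cv=3)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)
    print(f"{name}: F1={search.best_score_:.3f}, best={search.best_params_}")
```

The scores printed here come from synthetic data and are not expected to match the table above.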
Checking Feature Significance:
Using Random Forest Classifier
The top 5 Features Seem to Have Most of the Predictive Power
Using Only the 5 Most Significant Features
1. Total_price_excluding_optional_support
2. Eligible_double_your_impact_match
3. Resource_Type_Books
4. Resource_Type_Technology
5. price_per_student
New Score with Logistic Regression: 0.8171
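The two steps described here, ranking features with a random forest and then refitting logistic regression on the top five, can be sketched as follows. The data is synthetic and the train/test split, estimator settings, and variable names are assumptions for illustration, so the resulting score will not match the one reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded project features.
X, y = make_classification(n_samples=800, n_features=12, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Step 1: rank features by random forest importance, keep the top 5 indices.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
top5 = np.argsort(forest.feature_importances_)[::-1][:5]

# Step 2: refit logistic regression on those 5 columns only, score with F1.
logit = LogisticRegression(max_iter=1000)
logit.fit(X_train[:, top5], y_train)
score = f1_score(y_test, logit.predict(X_test[:, top5]))

print("top-5 feature indices:", top5.tolist())
print(f"F1 with top-5 features: {score:.3f}")
```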
Overview
● Model Improvement of .1271 over the Baseline using Logistic Regression with F1 Score
● Most of the Predictive Power Lies in 5 Features
● Ethical Implications:
○ The features with the most predictive power are not ones that can be changed without fabrication
Model Improvements
Add Prescriptive Data: Project Essays, Project Materials
Use Data Based on Location: Census
Skewed Data: Find Reasons, Methods