General Assembly - DAT-19
Predicting NaNoWriMo Winners
NICOLE FRONDA
What is NaNoWriMo?
“On November 1, participants begin working towards the goal of writing a 50,000-word novel by 11:59 PM on November 30.
Valuing enthusiasm, determination, and a deadline, NaNoWriMo is for anyone who has ever thought about writing a novel.”
- http://nanowrimo.org/about
Motivation: Tracking my Writing Progress
Goal: Predict if a writer or novel will win the next contest
Writer Data
Past NaNoWriMos
◦ Numerical: Member Length, Lifetime Word Count, Number of Novels, Count Wins, Count Donations, Average Submission, Daily Average, Num Consecutive Years Participated, Num Consecutive Winning Years
◦ Binary: Municipal Liaison, Sponsorship
Ongoing NaNoWriMo
◦ Numerical: First Week Word Count, First Week Num Submissions, Second Week Word Count, Second Week Num Submissions
◦ Binary: Donor
Novel Data
◦ Genre
◦ Num Words in Synopsis/Excerpt
◦ Num Unique Words in Synopsis/Excerpt
◦ Num Sentences in Synopsis/Excerpt
◦ Num Paragraphs in Synopsis/Excerpt
◦ Reading Score of Synopsis/Excerpt
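The synopsis/excerpt counts listed under Novel Data could be derived along these lines. This is a minimal sketch, not the author's code: the function name `synopsis_features`, the regular expressions, and the sample text are assumptions made for illustration, and the reading score is left out because the slides do not say which formula was used.

```python
import re


def synopsis_features(text: str) -> dict:
    """Count simple surface features of a synopsis or excerpt."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: text runs ending in '.', '!' or '?' (a rough heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, lowercased so "The" and "the" count as one word.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "num_words": len(words),
        "num_unique_words": len(set(words)),
        "num_sentences": len(sentences),
        "num_paragraphs": len(paragraphs),
    }


# Made-up sample text, used only to exercise the function.
sample = "The ship sank. Nobody saw the ship again.\n\nWho sent the message?"
features = synopsis_features(sample)
print(features)
```

Splitting on punctuation is crude (abbreviations and ellipses will miscount), so a real pipeline would probably use a proper tokenizer.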
Getting the Data
Stats from Most Recent Contest – Word Count API
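As a rough illustration of how per-day stats could be turned into the first-week and second-week features, the sketch below parses a word count history and buckets the entries by contest week. The XML layout, the element names (`wcentry`, `wc`, `wcdate`), the sample values, and the function name `weekly_features` are all assumptions made for this example; the slides do not show the API response. It also assumes each entry holds the words submitted that day, and that "second week" means days 8 to 14 only.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical payload: element names and values are invented for illustration.
SAMPLE_XML = """
<wchistory>
  <uname>example_writer</uname>
  <wordcounts>
    <wcentry><wc>1800</wc><wcdate>2014-11-01</wcdate></wcentry>
    <wcentry><wc>2500</wc><wcdate>2014-11-03</wcdate></wcentry>
    <wcentry><wc>1200</wc><wcdate>2014-11-09</wcdate></wcentry>
    <wcentry><wc>3000</wc><wcdate>2014-11-14</wcdate></wcentry>
    <wcentry><wc>900</wc><wcdate>2014-11-20</wcdate></wcentry>
  </wordcounts>
</wchistory>
"""


def weekly_features(xml_text: str, contest_start: date) -> dict:
    """Sum words and count submissions for contest weeks one and two."""
    root = ET.fromstring(xml_text.strip())
    feats = {
        "first_week_word_count": 0,
        "first_week_num_submissions": 0,
        "second_week_word_count": 0,
        "second_week_num_submissions": 0,
    }
    for entry in root.iter("wcentry"):
        words = int(entry.findtext("wc"))
        day = date.fromisoformat(entry.findtext("wcdate"))
        offset = (day - contest_start).days  # 0 on the first contest day
        if 0 <= offset < 7:
            prefix = "first_week"
        elif 7 <= offset < 14:
            prefix = "second_week"
        else:
            continue  # later entries are not used as features
        feats[prefix + "_word_count"] += words
        feats[prefix + "_num_submissions"] += 1
    return feats


feats = weekly_features(SAMPLE_XML, contest_start=date(2014, 11, 1))
print(feats)
```

Ignoring everything after day 14 mirrors the idea of predicting the outcome partway through the contest.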
Getting the Data
Web pages – Kimono Labs
Getting the Data
Web pages – Beautiful Soup
Exploring the Data: Writers
Municipal Liaisons: 6x as likely to win
501 Writers: 219 won recent contest, 282 lost recent contest
Writers with Sponsors: 2x as likely to win
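The slides do not say whether "6x" and "2x" are ratios of win rates or ratios of odds, so the sketch below shows both calculations. The overall win rate uses the 219 and 501 counts above; the group counts passed to the two ratio functions are hypothetical, and the function names are illustrative.

```python
def win_rate(wins: int, total: int) -> float:
    """Fraction of writers who won."""
    return wins / total


def rate_ratio(g_wins: int, g_total: int, o_wins: int, o_total: int) -> float:
    """Win rate of a group divided by the win rate of everyone else."""
    return (g_wins * o_total) / (g_total * o_wins)


def odds_ratio(g_wins: int, g_total: int, o_wins: int, o_total: int) -> float:
    """Odds of winning in a group divided by the odds for everyone else."""
    return (g_wins * (o_total - o_wins)) / ((g_total - g_wins) * o_wins)


# Counts taken from the slide: 219 winners among 501 writers.
overall = win_rate(219, 501)
print(f"Overall win rate: {overall:.1%}")  # Overall win rate: 43.7%

# Hypothetical counts: 9 of 10 in the group won, 30 of 100 others won.
print(rate_ratio(9, 10, 30, 100))  # 3.0
print(odds_ratio(9, 10, 30, 100))  # 21.0
```

The two measures can differ widely when win rates are high, so it matters which one a "times as likely" claim refers to.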
Logistic Regression – Predicting Winning Writers

| | Actual 0 | Actual 1 |
|---|---|---|
| Predicted 0 | 46 | 7 |
| Predicted 1 | 22 | 34 |

| | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.69 | 0.87 | 0.77 | 55 |
| 1 | 0.77 | 0.52 | 0.62 | 46 |
| avg/total | 0.73 | 0.71 | 0.70 | 101 |

Other Models:
Naïve Bayes – 67%
Support Vector Machine – 72%
Decision Tree – 65%
Random Forest – 74%
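For readers unfamiliar with the model, here is a from-scratch sketch of logistic regression fitted by gradient descent on a single feature. It is not the author's code (the slides do not show any, and a library implementation was most likely used); the toy data, learning rate, and function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(win) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x: float, w: float, b: float) -> float:
    """Predicted probability of winning for feature value x."""
    return sigmoid(w * x + b)


# Synthetic toy data (not from the slides): lifetime word count in units of
# 100,000 words, and whether the writer won (1) or lost (0).
xs = [0.1, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

w, b = fit_logistic(xs, ys)
print(predict_proba(0.1, w, b), predict_proba(4.0, w, b))
```

With several features the same update applies per weight, and the inputs would normally be standardized first so that one learning rate suits all of them.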
Logistic Regression – Predicting Winning Writers
Include Word Count from current contest

| | Actual 0 | Actual 1 |
|---|---|---|
| Predicted 0 | 46 | 9 |
| Predicted 1 | 8 | 38 |

| | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.85 | 0.84 | 0.84 | 55 |
| 1 | 0.81 | 0.83 | 0.82 | 46 |
| avg/total | 0.83 | 0.83 | 0.83 | 101 |
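The precision, recall, and F1 figures for this model can be recomputed from the four counts 46, 9, 8, 38. The sketch below assumes each row of the matrix is an actual class and each column a predicted class, because that is the reading under which the row sums equal the reported support of 55 and 46; the function name `class_report` is illustrative.

```python
def class_report(matrix):
    """Per-class metrics, where matrix[i][j] counts actual class i predicted as j."""
    report = {}
    for k in range(len(matrix)):
        tp = matrix[k][k]
        actual = sum(matrix[k])                    # row sum = support of class k
        predicted = sum(row[k] for row in matrix)  # column sum = times k was predicted
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        report[k] = {
            "precision": round(precision, 2),
            "recall": round(recall, 2),
            "f1": round(f1, 2),
            "support": actual,
        }
    return report


matrix = [[46, 9], [8, 38]]
report = class_report(matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / sum(map(sum, matrix))

print(report[0])  # {'precision': 0.85, 'recall': 0.84, 'f1': 0.84, 'support': 55}
print(report[1])  # {'precision': 0.81, 'recall': 0.83, 'f1': 0.82, 'support': 46}
print(round(accuracy, 2))  # 0.83
```

For comparison, always predicting "lost" on the full data set of 501 writers would be right 282 times, or about 56% of the time.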
Figure panels (1st & 2nd principal components): Train Data, Test Data Predictions, Test Data Actual Outcomes
Logistic Regression – Predicting Winning Writers

| Feature | Measured Importance |
|---|---|
| Second Week Word Count | 0.712847 |
| Lifetime Word Count | 0.057194 |
| Second Week Num Submissions | 0.045414 |
| Expected Avg Submission | 0.039117 |
| Consecutive Part | 0.036241 |

Progress data from the first weeks of the contest helps improve predictions
Next Steps
Improving the model
◦ Collect more data
◦ Feature Engineering
Predicting Final Word Count
Discovering clusters of writers
◦ (Not just winners and losers)
Win the Next NaNoWriMo :)
Questions?