Doing Your First Kaggle: Python for Big Data Sets
Lee Trawick, ASA
#datapopup
What We’ll Cover
About me.
How is this relevant?
Tell me again how this is relevant?
Pokemon vs Big Data
Name! That! Pokemon!
Hadoop, Cloudera, Bulbasaur, Mongo, Talend, Qubole, Statwing, Magikarp, Flink, Impala
It’s easy to submit to Kaggle.
Download someone’s code and submit.
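This sketch shows the final step before uploading. It assumes the competition expects a CSV with an `id` column and one prediction column. The name `unit_sales` and the values are hypothetical; the real header usually comes from the competition's sample submission file.

```python
import io
import pandas as pd

# Hypothetical predictions from a downloaded kernel; real column names and
# order should follow the competition's sample submission file.
predictions = pd.DataFrame({"id": [1, 2, 3], "unit_sales": [7.0, 1.0, 2.0]})

# index=False keeps pandas' row index out of the file, so the columns
# match the expected header exactly.
buffer = io.StringIO()  # stand-in for a path such as "submission.csv"
predictions.to_csv(buffer, index=False)
submission_text = buffer.getvalue()
print(submission_text)
```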
We’re on the leaderboard!
About the data.
My approach.
First, I load the data.
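This is a minimal sketch of a "just load it" approach. The slides do not show the real file, so the inline CSV and its columns are hypothetical. Because no types are specified, pandas infers them and stores numeric columns as 64-bit.

```python
import io
import pandas as pd

# Hypothetical stand-in for a competition CSV.
CSV_TEXT = (
    "id,store_nbr,item_nbr,unit_sales\n"
    "1,25,103665,7.0\n"
    "2,25,105574,1.0\n"
    "3,44,105575,2.0\n"
)

# Plain read_csv: integers come back as int64 and decimals as float64.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.dtypes)
print(df.memory_usage(deep=True).sum(), "bytes")
```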
How they load the data.
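Many Kaggle kernels pass an explicit type map to `pd.read_csv` so that each column is stored at the smallest type that fits. The file and the `DTYPES` mapping below are hypothetical, and the choice of types assumes the column ranges are known in advance.

```python
import io
import pandas as pd

# Hypothetical stand-in for a competition CSV.
CSV_TEXT = (
    "id,store_nbr,item_nbr,unit_sales\n"
    "1,25,103665,7.0\n"
    "2,25,105574,1.0\n"
    "3,44,105575,2.0\n"
)

# Hypothetical dtype map: int8 holds -128..127, so it only suits columns
# known to stay in that range; float32 keeps roughly 7 significant digits.
DTYPES = {"id": "int32", "store_nbr": "int8", "item_nbr": "int32", "unit_sales": "float32"}

df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=DTYPES)
print(df.dtypes)
```

If a value falls outside the range of the requested type, the load fails. That is usually better than silently keeping a 64-bit column.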
Why be memory-conscious?
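One way to see the payoff is to measure the same data with and without downcasting. This sketch uses synthetic data with hypothetical column names. For a million rows, the downcast frame should take well under half the memory of the 64-bit one.

```python
import numpy as np
import pandas as pd

n = 1_000_000
rng = np.random.default_rng(0)

# Same values, two layouts: 64-bit defaults vs. explicitly downcast columns.
wide = pd.DataFrame({
    "store_nbr": rng.integers(1, 55, size=n),  # int64 by default
    "unit_sales": rng.random(n),               # float64 by default
})
narrow = wide.astype({"store_nbr": "int8", "unit_sales": "float32"})

def megabytes(frame: pd.DataFrame) -> float:
    """Total memory of a DataFrame, including its index, in megabytes."""
    return frame.memory_usage(deep=True).sum() / 1e6

print(f"default: {megabytes(wide):.1f} MB, downcast: {megabytes(narrow):.1f} MB")
```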
Other ways to save memory
Delete objects when you’re done with them
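Here is a minimal sketch of the pattern in a notebook-style script. The table and its column names are hypothetical. Note that `del` only removes a name. The memory is released only when nothing else references the object, and IPython's output history can sometimes hold such a reference.

```python
import gc
import numpy as np
import pandas as pd

n = 100_000
# Hypothetical large intermediate table.
train = pd.DataFrame({"a": np.arange(n, dtype="int32"), "b": np.ones(n, dtype="float32")})

# Keep only the derived column you actually need...
features = pd.DataFrame({"ab": (train["a"] * train["b"]).astype("float32")})

# ...then drop the reference to the big table and run a collection pass.
del train
gc.collect()
```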
Specify data types on every new column, not just when loading files
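New feature columns can quietly reintroduce 64-bit types even after careful loading. In this sketch the column names are hypothetical. Both new features are cast as soon as they are created.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "store_nbr": np.array([25, 25, 44], dtype="int8"),
    "onpromotion": np.array([0, 1, 0], dtype="int8"),
})

# np.where with Python int literals returns NumPy's default integer type
# (usually 64-bit), even though the input columns were int8.
flag = np.where(df["onpromotion"] > 0, 1, 0)
df["promo_flag"] = flag.astype("int8")  # cast at creation time

# Group-level features are another common source of wide columns; cast them too.
df["store_promo_count"] = (
    df.groupby("store_nbr")["onpromotion"].transform("sum").astype("int16")
)
print(df.dtypes)
```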
My approach to the “One Vector Problem”
Kaggle solution to the “One Vector Problem”
The map() trick to joining large tables
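One common reading of the slide's trick is to use `Series.map` in place of `pd.merge` when you only need to pull one column from a small lookup table. Instead of building a second full copy of the big table, it adds a single column. The tables below are hypothetical. The lookup keys must be unique, and keys missing from the lookup become NaN, which matches the behavior of a left join.

```python
import pandas as pd

# Hypothetical large fact table and small lookup table.
sales = pd.DataFrame({
    "item_nbr": [103665, 105574, 105575, 103665],
    "unit_sales": [7.0, 1.0, 2.0, 3.0],
})
items = pd.DataFrame({
    "item_nbr": [103665, 105574, 105575],
    "family": ["BREAD/BAKERY", "GROCERY I", "GROCERY I"],
})

# Turn the lookup table into a Series indexed by the join key...
family_by_item = items.set_index("item_nbr")["family"]

# ...then map the key column through it, adding one column in place
# rather than building a new merged DataFrame.
sales["family"] = sales["item_nbr"].map(family_by_item)
print(sales)
```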
Kaggle Solution - 3 Key Things
Gameshow… Will! That! Work?!
Looping through big tables
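The transcript does not include the challenge code, so this is not the presenter's example. It is a generic sketch of why row loops struggle on big tables, using synthetic data. Both versions compute the same result, but the loop pays Python overhead on every row.

```python
import numpy as np
import pandas as pd

n = 5_000
df = pd.DataFrame({
    "unit_sales": np.arange(n, dtype="float32"),
    "price": np.full(n, 2.5, dtype="float32"),
})

# Row-by-row: iterrows builds a new Series for every row, so runtime
# grows with the row count times that per-row overhead.
revenue_loop = []
for _, row in df.iterrows():
    revenue_loop.append(row["unit_sales"] * row["price"])

# Vectorized: a single column-wise multiply executed in compiled code.
revenue_vec = df["unit_sales"] * df["price"]
```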
Challenge 1
Challenge 2
Challenge 3
It’s the same code
That’s All Folks.