Doing Your First Kaggle (Python for Big Data Sets)

Doing Your First Kaggle - Python for Big Data Sets Lee Trawick, ASA #datapopup

Upload: domino-data-lab

Posted on 29-Jan-2018

Category: Technology


TRANSCRIPT

Page 1: Doing your first Kaggle (Python for Big Data sets)

Doing Your First Kaggle - Python for Big Data Sets

Lee Trawick, ASA

#datapopup

Page 2:

What We’ll Cover

Page 3:

About me.

Page 4:

How is this relevant?

Page 5:

Tell me again how this is relevant?

Page 6:

Pokemon vs Big Data

Name! That! Pokemon!

Hadoop, Cloudera, Bulbasaur, Mongo, Talend, Qubole, Statwing, Magikarp, Flink, Impala

Page 7:

It’s easy to submit to Kaggle.

Page 8:

Download someone’s code and submit.

Page 9:

We’re on the leaderboard!

Page 10:

About the data.

Page 11:

My approach.

Page 12:

First, I load the data.

Page 13:

How they load the data.

Page 14:

Why be memory-conscious?
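
Only the slide titles survive in this transcript, so the following is a minimal sketch of what a memory-conscious load can look like in pandas, not the code shown in the talk. The CSV text, column names, and chosen types are all hypothetical; the idea is to read only the columns you need and give each one the narrowest type that fits.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a large competition CSV file.
csv_text = (
    "customer_id,age,country,spend\n"
    "1,34,ES,10.5\n"
    "2,51,FR,0.0\n"
    "3,29,ES,99.9\n"
)

# Default load: pandas infers wide types (typically 64-bit numbers).
default_df = pd.read_csv(io.StringIO(csv_text))

# Memory-conscious load: name the columns to keep and a narrow type for each.
dtypes = {
    "customer_id": np.int32,
    "age": np.int8,
    "country": "category",
    "spend": np.float32,
}
small_df = pd.read_csv(io.StringIO(csv_text), usecols=list(dtypes), dtype=dtypes)

# Compare the footprint of the numeric columns under each load.
numeric_cols = ["customer_id", "age", "spend"]
default_bytes = default_df[numeric_cols].memory_usage(index=False).sum()
small_bytes = small_df[numeric_cols].memory_usage(index=False).sum()
print(default_bytes, small_bytes)
```

On a three-row sample the saving is trivial, but the same ratio applies when the file has tens of millions of rows.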

Page 15:

Other ways to save memory

Delete objects when you’re done with them

Specify data types on every new column, not just when loading files
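
Both tips above can be sketched in a few lines. This is an illustrative example under assumed names (`df`, `scratch`, `is_bulk`, `total` are hypothetical), not the speaker's code: derived columns are cast explicitly when they are created, and an intermediate object is deleted as soon as its result has been extracted.

```python
import gc

import numpy as np
import pandas as pd

# Hypothetical table that was already loaded with narrow types.
df = pd.DataFrame({
    "price": np.array([9.5, 20.0, 3.25], dtype=np.float32),
    "quantity": np.array([2, 1, 4], dtype=np.int16),
})

# State the type of every new column explicitly instead of relying on defaults.
df["is_bulk"] = (df["quantity"] > 1).astype(np.int8)
df["total"] = (df["price"] * df["quantity"]).astype(np.float32)

# Intermediate result: keep only the number we need, then drop the object.
scratch = df.groupby("is_bulk")["total"].sum()
bulk_total = float(scratch.loc[1])
del scratch   # remove the name so the object can be freed
gc.collect()  # optionally ask the garbage collector to run now

print(df.dtypes)
print(bulk_total)
```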

Page 16:

My approach to the “One Vector Problem”

Page 17:

Kaggle solution to the “One Vector Problem”

Page 18:

The map() trick to joining large tables
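
The slide body is not in the transcript, so this is one plausible reading of the map() trick rather than the talk's exact code: when only one column is needed from a lookup table, `Series.map` can pull it across by key instead of merging the two tables. The table and column names below are hypothetical, and the sketch assumes the key is unique in the lookup table.

```python
import pandas as pd

# Hypothetical large fact table and small lookup table.
transactions = pd.DataFrame({
    "product_id": [10, 11, 10, 12],
    "units": [1, 3, 2, 5],
})
products = pd.DataFrame({
    "product_id": [10, 11],
    "price": [2.5, 4.0],
})

# Build a lookup Series indexed by the join key (the key must be unique).
price_by_product = products.set_index("product_id")["price"]

# map() adds one column in place; keys missing from the lookup become NaN,
# which matches the behaviour of a left join.
transactions["price"] = transactions["product_id"].map(price_by_product)

# The equivalent left merge, shown only for comparison. It builds a whole
# new DataFrame rather than adding a single column to the existing one.
merged = transactions[["product_id", "units"]].merge(
    products, on="product_id", how="left"
)

print(transactions)
```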

Page 19:

Kaggle Solution - 3 Key Things

Page 20:

Gameshow… Will! That! Work?!

Looping through big tables
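
The challenges themselves are not reproduced in this transcript. As a generic illustration of looping through a table, here is a sketch of three common pandas approaches to the same row-wise calculation; the data and variable names are hypothetical. All three give the same answer, but on large tables the vectorised form is usually much faster than either explicit loop.

```python
import numpy as np
import pandas as pd

# Hypothetical table: 1,000 rows of units sold at a fixed price.
df = pd.DataFrame({
    "units": np.arange(1, 1001),
    "price": np.full(1000, 2.0),
})

# 1. iterrows(): builds a Series for every row, usually the slowest option.
total_iterrows = 0.0
for _, row in df.iterrows():
    total_iterrows += row["units"] * row["price"]

# 2. itertuples(): yields lightweight namedtuples, typically faster.
total_itertuples = 0.0
for row in df.itertuples(index=False):
    total_itertuples += row.units * row.price

# 3. Vectorised: no Python-level loop at all.
total_vectorised = float((df["units"] * df["price"]).sum())

print(total_iterrows, total_itertuples, total_vectorised)
```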

Page 21:

Challenge 1

Page 22:

Challenge 2

Page 23:

Challenge 3

Page 24:

It’s the same code

Page 25:

That’s All Folks.