final project! (& wind sprints) - brown...

Post on 21-Jun-2021

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Final Project! (& wind sprints)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 1

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 2

Define Problem Find Data

Write a set of instructions

Solution

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 3

Define Problem Find Data

Write a set of instructions

Solution

ANYTHING Numerical Data

Textual Data Twitter Data

Geographical Data ...

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 4

Define Problem Find Data

Write a set of instructions

Solution

ANYTHING Numerical Data

Textual Data Twitter Data

Geographical Data ...

ANYTHING COMPUTATIONAL

Find Trends/Patterns Predict Outliers/Groups

Describe Anomalies Assess Someone’s Claims

...

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 5

Define Problem Find Data

Write a set of instructions

Solution

ANYTHING Numerical Data

Textual Data Twitter Data

Geographical Data ...

ANYTHING COMPUTATIONAL

Find Trends/Patterns Predict Outliers/Groups

Describe Anomalies Assess Someone’s Claims

...

ANY TOOL Excel or Python

Probably...

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 6

Define Problem Find Data

Write a set of instructions

Solution

ANYTHING Numerical Data

Textual Data Twitter Data

Geographical Data ...

ANYTHING COMPUTATIONAL

Find Trends/Patterns Predict Outliers/Groups

Describe Anomalies Assess Someone’s Claims

...

ANY TOOL Excel or Python

Probably...

ANY ANALYSIS Plots, Tables, KML Files, ...

Final Project

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 7

Define Problem Find Data

Write a set of instructions

Solution

ANYTHING Numerical Data

Textual Data Twitter Data

Geographical Data ...

ANYTHING COMPUTATIONAL

Find Trends/Patterns Predict Outliers/Groups

Describe Anomalies Assess Someone’s Claims

...

ANY TOOL Excel or Python

Probably...

ANY ANALYSIS Plots, Tables, KML Files, ...

Only Constant: you will use computers to process data automatically

Today

• What do we wish we had known before?

• Example Final Projects

• General Project Description – Scope

– Timeline

• K-means in Python

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 8

What do we wish we had known before?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 9

Today

• What do we wish we had known before?

• Example Final Projects

• General Project Description – Scope

– Timeline

• Project 2 Workshop

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 10

Example 1: Where should I Live?

I want to buy a house somewhere near my hometown. Where should I buy?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 11

Kasey Hass’s Project https://sites.google.com/a/brown.edu/house-hunting-mecklenburg-county/home

In Each Zip Code: Rent price: Red dollar sign Vacancies: For sale sign Income: Green dollar sign Marriage rate: Wedding rings

Example 1: Where should I Live?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 12

Kasey Hass’s Project https://sites.google.com/a/brown.edu/house-hunting-mecklenburg-county/home

In Each Zip Code: Rent price: Red dollar sign Vacancies: For sale sign Income: Green dollar sign Marriage rate: Wedding rings

Example 1: Where should I Live?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 13

Kasey Hass’s Project https://sites.google.com/a/brown.edu/house-hunting-mecklenburg-county/home

In Each Zip Code: Rent price: Red dollar sign Vacancies: For sale sign Income: Green dollar sign Marriage rate: Wedding rings

Example 2: Athletes & Geography

Malcolm Gladwell’s Outliers • Professional Canadian hockey players

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 14

Example 2: Athletes & Geography

Malcolm Gladwell’s Outliers • Professional Canadian hockey players

– Mostly born in January. – Few are born in December. – Junior league’s cutoff is Dec. 31st

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 15

Example 2: Athletes & Geography

Malcolm Gladwell’s Outliers • Professional Canadian hockey players

– Mostly born in January. – Few are born in December. – Junior league’s cutoff is Dec. 31st

• Any association between professional players and where they were born?

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 16

Example 2: Athletes & Geography

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 17

Daniel Newmark’s Project https://sites.google.com/a/brown.edu/daniel-s-data/project-4--distribution-of-athletes

Football (NFL) Hockey (NHL)

Example 3: Growth of Top Digital Media Companies

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 18

How does the Growth Rate Compare in the Past 10 Years?

Example 3: Growth of Top Digital Media Companies

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 19

Nicholas Talbott’s Project https://sites.google.com/site/growthofdigitalmediacompanies/home

YouTube Video: http://www.youtube.com/watch?v=QDtDF6_mrkc

One KML File for Each Year Logo Size shows Growth

Today

• What do we wish we had known before?

• Example Final Projects

• General Project Description – Scope

– Timeline

• K-means

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 20

Scope – Think Big! (& Have Backup Plans)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 21

Define Problem Find Data

Write a set of instructions

Solution

Scope – Think Big! (& Have Backup Plans)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 22

Define Problem Find Data

Write a set of instructions

Solution

Utilize MANY data sources to test your claim.

Scope – Think Big! (& Have Backup Plans)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 23

Define Problem Find Data

Write a set of instructions

Solution

Analyze the same dataset in DIFFERENT ways:

Make a number of related claims/hypotheses

Utilize MANY data sources to test your claim.

Scope – Think Big! (& Have Backup Plans)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 24

Define Problem Find Data

Write a set of instructions

Solution

Analyze the same dataset in DIFFERENT ways:

Make a number of related claims/hypotheses

Utilize MANY data sources to test your claim.

GENERALIZE the Program Try to Implement something we

haven’t discussed

Scope – Think Big! (& Have Backup Plans)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 25

Define Problem Find Data

Write a set of instructions

Solution

Analyze the same dataset in DIFFERENT ways:

Make a number of related claims/hypotheses

Utilize MANY data sources to test your claim.

Display the Results in Different Ways

GENERALIZE the Program Try to Implement something we

haven’t discussed

Today

• What do we wish we had known before?

• Example Final Projects

• General Project Description – Scope

– Timeline

• Project 2 Workshop

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 26

Timeline

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 27

Sun Mon Tues Wed Thurs Fri Sat

4/14 4/15 4/16 4/17 4/18

Final Project Out Project 2 Due

Guest and In-Class Work Time

4/21 4/22 4/23 4/24 4/25

Guest and In-Class Work Time

Proposal Due

Guest and In-Class Work Time

4/28 4/29 4/30 5/1 5/2

In-Class Work Time (Reading Period)

In-Class Work Time (Reading Period)

5/5 5/6 5/7 5/8 5/9

Final Project Due and Presentations

Meet With Staff At Least Once

Meet With Staff At Least Once

Before You Begin...

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 28

Define Problem Find Data

Write a set of instructions

Solution

Say you have a vague idea of what you want to do....

Before You Begin...

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 29

Define Problem Find Data

Write a set of instructions

Solution

Where is the data? What does it look like?

What format will be easiest for you to analyze?

Say you have a vague idea of what you want to do....

Before You Begin...

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 30

Define Problem Find Data

Write a set of instructions

Solution

Where is the data? What does it look like?

What format will be easiest for you to analyze?

What is your claim? Are there any sub-claims?

What are your assumptions?

Say you have a vague idea of what you want to do....

Before You Begin...

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 31

Define Problem Find Data

Write a set of instructions

Solution

Where is the data? What does it look like?

What format will be easiest for you to analyze?

What is your claim? Are there any sub-claims?

What are your assumptions?

What do you want the output to be?

What analysis will best test your claim?

Say you have a vague idea of what you want to do....

Before You Begin...

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 32

Define Problem Find Data

Write a set of instructions

Solution

Where is the data? What does it look like?

What format will be easiest for you to analyze?

What is your claim? Are there any sub-claims?

What are your assumptions?

What do you want the output to be?

What analysis will best test your claim?

What are the steps? What tools are best for each step?

How can you GENERALIZE for a broader analysis?

Say you have a vague idea of what you want to do....

Today

• What do we wish we had known before?

• Example Final Projects

• General Project Description – Scope

– Timeline

• K-means

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 33

K-means

• Recall goal: given data that might be clustered...find the clusters

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 34

K-means

• Recall goal: given data that might be clustered...find the clusters

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 35

Idea

• Guess cluster-centers at random • Assign points to nearest “center” to build

clusters • Average points in cluster to update “center”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 36

Idea

• Guess cluster-centers at random

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 37

Idea

• Assign points to nearest “center” to build clusters

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 38

Idea

• Average points in cluster to update “center”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 39

Idea

• Repeat!

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 40

Idea

• Repeat!

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 41

Code • A “point” is a (float, float) tuple. def kmeans(points, k): ''' point list * int -> [point list] take a list of n points in the plane, and a number k > 0 of groups to put them in. Return a list of points containing the k "center” points (the "means" of the clusters) '''

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 42

def kmeans(points, k): distant = findDistantPoint(points) means = initialMeans(points, k) clusters = findClusters(points, means) oldClusters = [] while (oldClusters != clusters) : oldClusters = clusters means = findMeans(clusters, distant) clusters = findClusters(points, means) return means

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 43

def findDistantPoint(points): ''' point list -> point compute a point very far from all the selected points, to use as a dummy "cluster center" when some cluster "dies". '''

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 44

def findDistantPoint(points): x0 = 0 y0 = 0 for p in points : x0 = max(x0, abs(p[0])) y0 = max(y0, abs(p[1])) return (3 * x0, 3 * y0) CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 45

def initialMeans(points, k) : ''' point list * int -> point list Select k points at random from the input point list, and return a list containing those k selected points.'''

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 46

def findClusters(points, means) : ''' point list -> point list list Given n points and k "mean" points, form k lists; the ith list consists of all input points that are closer to the ith mean than to any other. Return a list of these k lists of "closest points".'''

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 47

def findClusters(points, means) : selectionList = [indexOfClosest(point, means) for point in points] k = len(means) return [selectList(points, selectionList, i) for i in range(0,k)]

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 48

def indexOfClosest(point, meanPoints) : ''' point * point list -> int Take a point P and a list of points Q0, Q2, ..., Qk-1 (the "meanPoint" list) and compute the distance to each of the Q points. If the distance to Qi is the smallest of these, return i. Thus if the distances are [2, 2, 1, 6, 3], we'd return 2, because the "1" is at position 2 (counting from zero!]''' CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 49

def distance(p, q) : ''' point * point -> float return the distance between two points in the plane. ''' xdiff = p[0] - q[0] ydiff = p[1] - q[1] return sqrt( xdiff*xdiff + ydiff*ydiff)

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 50

def selectList(points, selectionList, index) : ''' (float, float) list * int list * int -> (float, float) list Given a list of points, and a selection list (a list of indices from 0 to k-1, for some k, such as (for k = 2) [0 1 1 0 0], and an index that's between 0 and k-1 (inclusive), return the points corresponding to the selected index. For instance, if the selection list is [0 1 1 0 0] and index = 0, you'd return a list of three points: the ones at index 0, 3, and 4 in the "points" list''' CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 51

def findMeans(clusters, distant) : ''' (point list) list * point -> (point list) Given k clusters, each represented as a point list, find the mean (average) point for each cluster; if the cluster is empty, use "distant" as its mean''' return [findMean(cluster, distant) for cluster in clusters]

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 52

def findMean(cluster, distant): ''' point list * point-> point Given a nonempty list of n points, return the average of the points gotten by summing up the x-coordinates of all the points and dividing by n, to get a number x0, and doing the same with the y-coordinates to get y0, and returning the point (x0, y0). For an empty list of points, return "distant" instead.'''

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 53

Shared Skeleton Code

• https://drive.google.com/a/brown.edu/file/d/0B4mKyAS1wRwhSnZvcTdFMnlHQXM/view?usp=sharing

• Available from course page, too

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 54

To do

• Person 1: create at least two tests def TestfindMeans(): clusters1 = [(1,2), (1, 4)] cluster2 = [(4, 5)] clusters = [cluster1, cluster2] u =findMeans(clusters, (10,10)) if (u != [(1,3), (4, 5)]) : print “error”

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 55

To do

• Person 2: coder/typist • Person 3: editor

CSCI 0931 - Intro. to Comp. for the Humanities and Social Sciences 56

top related