grounding text

Grounding Text Jason Baldridge @jasonbaldridge Austin Data 2014 Associate Professor Co-founder & Chief Scientist Friday, September 5, 14

Uploaded by: people-pattern

Posted on 29-Nov-2014

DESCRIPTION

Jason Baldridge, Co-Founder and Chief Scientist at People Pattern and Associate Professor of Computational Linguistics at the University of Texas at Austin, shares his recent research from UT Austin on text-based geolocation using Wikipedia, Twitter, and other sources.

TRANSCRIPT

Page 1: Grounding Text

Grounding Text

Jason Baldridge @jasonbaldridge

Austin Data 2014

Associate Professor Co-founder & Chief Scientist

Friday, September 5, 14

Page 2: Grounding Text

© 2013 Jason M Baldridge Text Analytics Summit, June 2013

What does “barbecue” mean? Barbecue’

Page 9: Grounding Text

What I thought semantics was before 2005

From: John Enrico and Jason Baldridge. 2011. Possessor Raising, Demonstrative Raising, Quantifier Float and Number Float in Haida. International Journal of American Linguistics. 77(2):185-218

Page 10: Grounding Text

Updated perspective à la Ray Mooney (UT Austin CS)

http://www.cs.utexas.edu/users/ml/slides/chen-icml08.ppt

Page 11: Grounding Text

Travel at the Turn of the 20th Century: http://www.lib.utexas.edu/books/travel/index.html

Page 12: Grounding Text

Motivation: Google Lit Trips [http://www.googlelittrips.com/]

Grapes of Wrath in Google Earth

http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html

Page 14: Grounding Text

Crisis response: Haiti earthquake

Page 16: Grounding Text

Look, Mom, no hands! (Err, um... no metadata.)

Topics with a clear, circumscribed geographic focus emerge!

Page 18: Grounding Text

But, of course, metadata is now plentiful.

Page 19: Grounding Text

Geotagged Wikipedia

30° 17′ N 97° 44′ W

Page 20: Grounding Text

01:55:55 RT @USER_dc5e5498: Drop and give me 50....

05:09:29 I said u got a swisher from redmond!? He said nah kirkland! Lol..ooooooooOkay!

05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes

06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. Don't start.

06:00:37 On my way to get @USER_60939380 yeee! She want some of this strawberry! Sexy!

...

47°31′41″ N 122°11′52″ W

Geotagged Twitter

Page 22: Grounding Text

Document geolocation: where is this person?

Page 23: Grounding Text

Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ...

Frankfurt, Frechen, Hürth, Brühl, Wesseling, ...

Language modeling approach

Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids.
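A minimal sketch of the language-modeling idea, assuming a uniform latitude/longitude grid (the hypothetical `CELL_DEG` of 1 degree), an add-one smoothed unigram model per cell, and prediction by the cell with the highest log-likelihood, whose center is returned. The smoothing and scoring used by Wing & Baldridge 2011 may differ; the training documents and coordinates are invented for illustration.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 1.0  # hypothetical grid resolution in degrees

def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the (row, col) index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))

def cell_center(cell, size=CELL_DEG):
    """Return the (lat, lon) center of a grid cell."""
    row, col = cell
    return ((row + 0.5) * size, (col + 0.5) * size)

def train(docs):
    """Pool word counts per cell from (tokens, lat, lon) training documents."""
    counts = defaultdict(Counter)
    vocab = set()
    for tokens, lat, lon in docs:
        counts[cell_of(lat, lon)].update(tokens)
        vocab.update(tokens)
    return counts, vocab

def predict(tokens, counts, vocab):
    """Return the center of the cell whose smoothed unigram model gives tokens the highest log-likelihood."""
    v = len(vocab) + 1  # one extra slot of probability mass for unseen words
    best_cell, best_score = None, -math.inf
    for cell, c in counts.items():
        total = sum(c.values())
        # Add-one smoothing: P(w | cell) = (count + 1) / (total + v)
        score = sum(math.log((c[w] + 1) / (total + v)) for w in tokens)
        if score > best_score:
            best_cell, best_score = cell, score
    return cell_center(best_cell)

# Invented training documents near Amsterdam and Austin, for illustration only.
train_docs = [
    ("canal bike tulip amsterdam".split(), 52.37, 4.89),
    ("bike canal herring".split(), 52.35, 4.91),
    ("bbq brisket live music".split(), 30.27, -97.74),
    ("brisket bbq tacos".split(), 30.30, -97.70),
]
counts, vocab = train(train_docs)
print(predict("canal bike".split(), counts, vocab))  # (52.5, 4.5)
```

A fixed grid like this wastes cells on empty ocean and lumps dense cities together, which is the motivation for the adaptive grids discussed next.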

Page 25: Grounding Text

Where’s a word on Earth?

[Figure: where the words “mountain”, “beach”, “wine”, and “barbecue” occur]

Page 27: Grounding Text

Locations of Twitter users are not uniformly distributed!

(Small) GeoUT (Twitter) plotted on Google Earth, one pin per user.

Density of (all) documents in GeoUT over the USA (390 million tweets)

Page 28: Grounding Text

k-d tree for geotagged Wikipedia, looking at N. America

Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: Supervised Text-based Geolocation Using Language Models on an Adaptive Grid.
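An adaptive grid can be sketched as a k-d tree over the training coordinates: split at the median, alternating latitude and longitude, until each leaf holds at most a bucket of points. Dense areas then get many small cells and sparse areas a few large ones. This is a minimal sketch in which the median split rule and the `bucket_size` parameter are assumptions, not necessarily the paper's exact procedure; the points are made up.

```python
def kd_cells(points, bucket_size, depth=0):
    """Partition (lat, lon) points into k-d tree leaves holding at most bucket_size points each.

    Splits at the median, alternating between latitude (even depth) and
    longitude (odd depth).
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(points) <= bucket_size:
        return [points]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # both halves are non-empty because len > bucket_size >= 1
    return (kd_cells(ordered[:mid], bucket_size, depth + 1)
            + kd_cells(ordered[mid:], bucket_size, depth + 1))

def bounding_box(cell):
    """Return (min_lat, min_lon, max_lat, max_lon) for the points in a leaf cell."""
    lats = [p[0] for p in cell]
    lons = [p[1] for p in cell]
    return (min(lats), min(lons), max(lats), max(lons))

# Six made-up points clustered near Austin plus two far-away points.
points = [(30.25, -97.76), (30.26, -97.75), (30.27, -97.74), (30.28, -97.73),
          (30.29, -97.72), (30.30, -97.71), (40.70, -74.00), (47.60, -122.30)]
for cell in kd_cells(points, bucket_size=2):
    print(len(cell), bounding_box(cell))
```

The leaves covering the Austin cluster have tiny bounding boxes, while the leaves that absorb the isolated points span much of the continent.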

Page 30: Grounding Text

Pre-grid clustering [Erik Skiles, MA thesis, UT Austin, Ling]

Page 31: Grounding Text

Four clusters on GeoUT (390 million tweets)

Clusters: West coast; East coast; Midwest & South; Spanish language. Also shown: all tweets.

Page 33: Grounding Text

[Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011]

Automatic document geolocation

Page 35: Grounding Text

Image geo-location: http://graphics.cs.cmu.edu/projects/im2gps/

Page 36: Grounding Text

Performance (kd-tree with clustering)

Wikipedia (entire world): half of documents geotagged within 12 km of truth; percent of documents within 161 km (100 miles): 91%

Twitter (USA): half of users geotagged within 330 km of truth; percent of documents within 161 km (100 miles): 40%

For better or worse, it soon might not matter whether you have location turned on or not: what you say is where you are (or are from). (Other factors matter too, of course, such as who you are linked to.)
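Both summary numbers (median error and the share of predictions within 100 miles) can be computed from great-circle distances between predicted and true coordinates. A minimal sketch, assuming the haversine formula with a mean Earth radius of 6371 km and a 161 km threshold (100 miles is about 160.9 km); the example coordinates are approximate points for Austin and Dallas chosen for illustration.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MILES_100_IN_KM = 161.0   # 100 miles is about 160.9 km

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def evaluate(predicted, gold, threshold_km=MILES_100_IN_KM):
    """Return (median error in km, fraction of predictions within threshold_km of the truth)."""
    errors = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(predicted, gold)]
    within = sum(e <= threshold_km for e in errors) / len(errors)
    return median(errors), within

# Approximate coordinates for Austin (30.27, -97.74) and Dallas (32.78, -96.80).
predicted = [(30.27, -97.74), (30.27, -97.74), (32.78, -96.80)]
gold = [(30.27, -97.74), (32.78, -96.80), (32.78, -96.80)]
med, acc = evaluate(predicted, gold)
print(f"median error: {med:.1f} km, within 161 km: {acc:.0%}")
```

The median is robust to a few wildly wrong predictions, which is why it is reported alongside the thresholded accuracy.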

Page 37: Grounding Text

Hierarchical geo-location with logistic regression

Wing & Baldridge 2014: Hierarchical Discriminative Classification for Text-Based Geolocation.
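One way to use a hierarchy of cells is to descend from the root, letting a local classifier at each node score its children and keeping only the best few partial paths. A minimal sketch of that search, assuming beam search and a toy softmax scorer with hand-set weights standing in for trained logistic regression models; the tree, node names, and weights are all hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    # Hypothetical per-child feature weights for this node's local classifier.
    weights: dict = field(default_factory=dict)

def child_probs(node, tokens):
    """Softmax over children of summed token weights (a toy stand-in for a trained LR model)."""
    scores = [sum(node.weights.get(c.name, {}).get(t, 0.0) for t in tokens) for c in node.children]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def beam_search(root, tokens, beam_width=2):
    """Descend the hierarchy, keeping the beam_width best partial paths by log-probability."""
    beam = [(0.0, root)]
    leaves = []
    while beam:
        expanded = []
        for logp, node in beam:
            if not node.children:
                leaves.append((logp, node))
                continue
            for child, p in zip(node.children, child_probs(node, tokens)):
                expanded.append((logp + math.log(p), child))
        expanded.sort(key=lambda x: x[0], reverse=True)
        beam = expanded[:beam_width]
    return max(leaves, key=lambda x: x[0])[1].name

# Hypothetical two-level hierarchy with hand-set weights.
west = Node("West", [Node("Portland-OR"), Node("Seattle")],
            {"Portland-OR": {"music": 1.0}, "Seattle": {"rain": 1.0}})
east = Node("East", [Node("Portland-ME"), Node("Boston")],
            {"Portland-ME": {"wharf": 1.0}, "Boston": {"harvard": 1.0}})
root = Node("USA", [west, east],
            {"West": {"music": 1.0, "rain": 1.5}, "East": {"wharf": 1.5, "lobster": 2.0}})
print(beam_search(root, ["lobster", "wharf"]))
```

Each local classifier only has to separate a handful of children, which keeps training tractable even when the full grid has thousands of leaf cells.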

Page 38: Grounding Text

Performance (kd-tree with clustering)

Flickr (entire world): half of documents geotagged within 18 km of truth; percent of documents within 161 km (100 miles): 66%

Twitter (World): half of users geotagged within 490 km of truth; percent of documents within 161 km (100 miles): 31%

Twitter (USA): half of users geotagged within 170 km of truth; percent of documents within 161 km (100 miles): 49%

Page 39: Grounding Text

Hierarchical logistic regression beats flat naive Bayes

Accuracy @ 161 km (%), kd-tree grid:

Corpus                  Naive Bayes   Hierarchical LR
Twitter USA             36.2          49.2
Twitter World           28.7          31.3
Flickr                  58.5          66.0
English Wikipedia       84.5          88.9
German Wikipedia        89.3          90.2
Portuguese Wikipedia    77.1          89.5

Page 40: Grounding Text

Logistic regression weights good features heavily

Page 41: Grounding Text

Toponym (place name) resolution

They visit Portland every year.

[Figure: a question mark at each candidate location for “Portland”]

Which Portland? (Also: Canada, Australia, Ireland...)

Page 43: Grounding Text

Toponym resolution in context

Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day.

Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry.

Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort.

Page 44: Grounding Text

Spatial minimality

GeoNames

4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15
4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15
4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14
4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15
4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05
4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15
4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15
4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15
4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15
404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29
4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29
4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14
4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14
4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14
4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14
4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15
5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14
5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14

Toponym         # Locations
Ann Arbor       1
Detroit         >7
Ionia           >4
Lyons           >15
Portland        >17
White Pigeon    1

[Figure: map of the resolved locations for Portland, Lyons, Ionia, and White Pigeon]
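The spatial minimality heuristic picks, for each toponym, the candidate location that keeps the resolved places close together. A brute-force sketch, assuming the objective is the sum of pairwise great-circle distances; it is not the SPIDER algorithm itself, and its cost grows with the product of the candidate counts (over 17 for Portland alone). The Portland candidates are four entries from the GeoNames excerpt; the Ann Arbor coordinates are approximate and illustrative.

```python
import math
from itertools import combinations, product

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    (lat1, lon1), (lat2, lon2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def spatially_minimal(candidates):
    """Choose one location per toponym so that the sum of pairwise distances is smallest.

    candidates maps each toponym to a list of (lat, lon) options. Brute force:
    the number of combinations tried is the product of the list lengths.
    """
    names = list(candidates)
    best, best_cost = None, math.inf
    for choice in product(*(candidates[n] for n in names)):
        cost = sum(haversine_km(a, b) for a, b in combinations(choice, 2))
        if cost < best_cost:
            best, best_cost = dict(zip(names, choice)), cost
    return best

candidates = {
    # Portland entries from the GeoNames excerpt: MI, OR, TX, CT.
    "Portland": [(42.8692, -84.90305), (45.52345, -122.67621),
                 (27.87725, -97.32388), (41.57288, -72.64065)],
    # Approximate coordinates for Ann Arbor, MI (illustrative).
    "Ann Arbor": [(42.28, -83.74)],
}
print(spatially_minimal(candidates))
```

Because Ann Arbor is unambiguous, it anchors the choice and pulls "Portland" to the Michigan entry, which is exactly the behavior that fails on the forum example that follows.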

Page 48: Grounding Text

Spatial minimality often fails

I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. Add a college or university (and all that they bring- and draw), good restaurants, a good music scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But that's my start. Oh, lots of bars!

From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html

City-data.com incorrectly marks “West” and “Portland” as cities in Texas, presumably because of their textual and spatial proximity to “Austin”.

But it is clear from the text that the referents are Portland, Oregon and Austin, Texas, even though their states are never mentioned and both are far from the other locations!

Page 50: Grounding Text

Toponym classifiers

Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia.

P(Portland-OR | music) > P(Portland-ME | music)

P(Portland-OR | wharf) < P(Portland-ME | wharf)
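A per-toponym classifier can be as simple as naive Bayes over the words around a mention. A minimal sketch, assuming add-one smoothing and a handful of invented context snippets; in the approach described here, the training examples would instead come from Wikipedia text indirectly labeled with each candidate referent.

```python
import math
from collections import Counter

def train_nb(examples):
    """examples: list of (context_tokens, label). Returns label counts, per-label word counts, vocab."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in examples for w in tokens}
    return label_counts, word_counts, vocab

def classify(tokens, label_counts, word_counts, vocab):
    """Return the label maximizing log P(label) + sum of log P(word | label), add-one smoothed."""
    n = sum(label_counts.values())
    v = len(vocab)

    def score(label):
        total = sum(word_counts[label].values())
        s = math.log(label_counts[label] / n)
        for w in tokens:
            s += math.log((word_counts[label][w] + 1) / (total + v))
        return s

    return max(label_counts, key=score)

# Invented context snippets, for illustration only.
examples = [
    ("music festival band coffee".split(), "Portland-OR"),
    ("music rain bridge".split(), "Portland-OR"),
    ("wharf lobster harbor".split(), "Portland-ME"),
    ("wharf ferry harbor casco".split(), "Portland-ME"),
]
model = train_nb(examples)
print(classify(["music"], *model), classify(["wharf"], *model))
```

With equal priors, "music" favors Portland-OR and "wharf" favors Portland-ME, mirroring the two inequalities above.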

Page 57: Grounding Text

Results: disambiguating toponyms

Corpora: TR-CoNLL (Reuters news texts, August 1996) and the Perseus Civil War Corpus (books, late 19th century).

                               TR-CoNLL                     Perseus Civil War
Method                         Avg. error dist.  Accuracy   Avg. error dist.  Accuracy
Population                     216               81.0       1749              59.7
SPIDER (spatial minimality)    2180              30.9       266               57.5
WISTR (Wiki supervised)        279               82.3       855               69.1
SPIDER+WISTR                   430               81.8       201               85.9

Take-home message: text classifiers are very effective & can be boosted by spatial minimality algorithms.

Page 58: Grounding Text

Identifying, disambiguating, and displaying toponyms

Page 59: Grounding Text

Back to grounding

Grounding often involves connecting text to knowledge sources and other modalities (image, video) & bootstrapping.

These connections can also help us create models for deeper aspects of language, such as syntactic structure and logical form.

Page 61: Grounding Text

Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)]

Page 63: Grounding Text

He says, she says http://www.tweetolife.com/gender/

Page 64: Grounding Text

Temporality of words, by hour: http://www.tweetolife.com/hour/

36

Page 66: Grounding Text

Temporality of expressions, by day: http://www.google.com/trends

37

Page 68: Grounding Text

Temporality of expressions, by year: http://ngrams.googlelabs.com/

38

(Chart: frequency over time of “slave”, “trenches”, “aircraft” and “war”)

Page 69: Grounding Text

Temporal resolution [Kumar, Lease, and Baldridge 2011]

39

(Figure: timeline axis marked 4000 BC, 2000 BC, 0 AD and 2000 AD)

Page 75: Grounding Text

More modalities: videos [Motwani & Mooney, 2012]

40

Page 76: Grounding Text

Beyond word co-occurrences for vector-space models

41

        bear  boat  car  cow  hadoop  snow  water  wrench
beach   3     234   42   4    1       2     325    0
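The “beach” row is a classic co-occurrence vector: the word is represented by how often it appears near each context word. The sketch below shows one common way to compare such vectors, cosine similarity. The function names are hypothetical. The second vector (`HYPOTHETICAL_OCEAN`) is invented purely for illustration and does not come from the talk.

```python
import math

# Context words (dimensions) and the "beach" counts shown on the slide.
CONTEXTS = ["bear", "boat", "car", "cow", "hadoop", "snow", "water", "wrench"]
BEACH = [3, 234, 42, 4, 1, 2, 325, 0]

# Hypothetical counts for a second word, invented only to illustrate comparison.
HYPOTHETICAL_OCEAN = [1, 180, 5, 0, 0, 1, 400, 0]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_contexts(vector, k=2):
    """Return the k context words with the highest counts in the vector."""
    ranked = sorted(zip(CONTEXTS, vector), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:k]]


print(top_contexts(BEACH))  # ['water', 'boat']
print(cosine(BEACH, HYPOTHETICAL_OCEAN))
```

Cosine similarity ignores overall frequency, which is why raw counts like these are usually compared by angle rather than by Euclidean distance.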

Page 85: Grounding Text

Combining distributional models with logics

42

Erk (2013): “Towards a semantics for distributional representations.”

Garrette et al. (2012): “A formal approach to linking logical form and vector-space lexical semantics”

Beltagy et al. (2013): “Montague Meets Markov: Deep Semantics with Probabilistic Logical Form”

Page 86: Grounding Text

Multi-component structured vector-space models

43

(Diagram: “the children visit the beach”, with “visit” linked to “children” as Agent and to “beach” as Patient)

Page 87: Grounding Text

Language learning in context [Kim & Mooney, 2013]

44

Page 89: Grounding Text

All your meaning are belong to us

Page 92: Grounding Text

http://davidrothman.net/2009/09/02/all-your-healthbase-are-belong-to-us-want-em-back/

Grounding matters

Page 94: Grounding Text

Open Source Software (Scala/Java)

Junto - label propagation: https://github.com/scalanlp/junto
Textgrounder - document geolocation: https://github.com/utcompling/textgrounder
Fieldspring - toponym resolution: https://github.com/utcompling/fieldspring
Low-resource POS tagging: https://github.com/dhgarrette/low-resource-pos-tagging-2013
Updown - polarity classification: https://github.com/scalanlp/junto
OpenNLP - machine learning / NLP: http://opennlp.apache.org/

ScalaNLP:
Nak - machine learning: https://github.com/scalanlp/nak
Chalk - NLP: https://github.com/scalanlp/chalk
Breeze - linear algebra: https://github.com/scalanlp/breeze

Page 96: Grounding Text

This research was sponsored by:

Grant: W911NF-10-1-0533
Grant from the Morris Memorial Trust Fund

- Walt Whitman, A Song of the Rolling Earth (in Leaves of Grass)

Final note: Whitman had it right many years ago!

Page 97: Grounding Text

Document geolocation

Supervision
- documents labeled with latitude & longitude

Methods
- Language Modeling for Information Retrieval

Code
- Textgrounder: https://github.com/utcompling/textgrounder

Publications
- Stephen Roller, Mike Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. 2012. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea.
- Benjamin Wing and Jason Baldridge. 2011. Simple supervised document geolocation with geodesic grids. In Proceedings of ACL HLT 2011.
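The sketch below illustrates grid-based supervised geolocation in the spirit of the approach summarized above. Training documents with latitude and longitude are pooled into grid cells, each cell gets a unigram language model, and a new document is assigned to the best-scoring cell. It is an illustration under assumptions, not Textgrounder's API or the published models. It uses a uniform latitude/longitude grid (rather than geodesic or adaptive grids), add-one smoothing, and Naive Bayes scoring. Every class, function, and example document is hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, size_deg=1.0):
    """Map a point to the (row, col) index of a uniform lat/lon grid cell."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def cell_center(cell, size_deg=1.0):
    """Return the (lat, lon) center of a grid cell."""
    row, col = cell
    return ((row + 0.5) * size_deg, (col + 0.5) * size_deg)


class GridGeolocator:
    """Hypothetical Naive Bayes geolocator: one unigram model per grid cell."""

    def __init__(self, size_deg=1.0):
        self.size_deg = size_deg
        self.word_counts = defaultdict(Counter)  # cell -> word counts
        self.doc_counts = Counter()              # cell -> number of training docs
        self.vocab = set()

    def train(self, docs):
        """docs: iterable of (tokens, lat, lon) for geotagged training documents."""
        for tokens, lat, lon in docs:
            cell = cell_of(lat, lon, self.size_deg)
            self.word_counts[cell].update(tokens)
            self.doc_counts[cell] += 1
            self.vocab.update(tokens)

    def _log_score(self, cell, tokens):
        counts = self.word_counts[cell]
        total = sum(counts.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        # Log prior: share of training documents that fall in this cell.
        score = math.log(self.doc_counts[cell] / sum(self.doc_counts.values()))
        for tok in tokens:
            # Add-one (Laplace) smoothed unigram probability of the token in this cell.
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    def predict(self, tokens):
        """Return the center of the highest-scoring training cell as (lat, lon)."""
        best = max(self.doc_counts, key=lambda c: self._log_score(c, tokens))
        return cell_center(best, self.size_deg)


geo = GridGeolocator(size_deg=1.0)
geo.train([
    (["barbecue", "brisket", "austin"], 30.27, -97.74),
    (["brisket", "longhorns", "austin"], 30.30, -97.70),
    (["snow", "bagel", "subway"], 40.73, -73.99),
    (["bagel", "subway", "broadway"], 40.76, -73.98),
])
print(geo.predict(["brisket", "barbecue"]))  # (30.5, -97.5)
```

Predicting the cell center keeps the output a single point, which is what an error-distance evaluation needs. Finer cells trade location precision for sparser per-cell language models.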

Page 98: Grounding Text

Toponym resolution

Supervision
- indirectly acquired toponym annotations using a gazetteer and geo-annotated Wikipedia

Methods
- logistic regression
- named entity recognition

Code
- Fieldspring: https://github.com/utcompling/fieldspring

Publications
- Mike Speriosu and Jason Baldridge. Text-Driven Toponym Resolution using Indirect Supervision. To appear in Proceedings of ACL 2013.
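The toponym annotations here were acquired indirectly from a gazetteer and geo-annotated Wikipedia. The sketch below shows one way such labels could be produced. It assumes, without quoting the paper, that a toponym mention in a geotagged article is labeled with the gazetteer candidate nearest the article's geotag. It also assumes that the surrounding words become features for a classifier such as logistic regression. The function names, window size, distance threshold, and example coordinates (approximate) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def label_from_geotag(candidates, article_geotag, max_km=None):
    """Hypothetical indirect labeler: pick the gazetteer candidate closest to
    the geotag of the Wikipedia article containing the mention.

    Returns None when there are no candidates, or when max_km is given and
    even the closest candidate lies farther away than that.
    """
    if not candidates:
        return None
    best = min(candidates, key=lambda c: haversine_km(c, article_geotag))
    if max_km is not None and haversine_km(best, article_geotag) > max_km:
        return None
    return best


def context_window(tokens, index, window=2):
    """Words within `window` positions of the mention, excluding the mention itself."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]


def training_instance(tokens, index, candidates, article_geotag, window=2, max_km=None):
    """Return (context_words, label) for training a classifier, or None if unlabeled."""
    label = label_from_geotag(candidates, article_geotag, max_km)
    if label is None:
        return None
    return context_window(tokens, index, window), label


# Usage: a mention of "Paris" in a hypothetical article geotagged near Dallas.
tokens = ["we", "drove", "to", "Paris", "from", "Dallas", "today"]
paris = [(48.8566, 2.3522), (33.6609, -95.5555)]  # Paris, France / Paris, Texas
print(training_instance(tokens, 3, paris, (32.7767, -96.7970)))
# (['drove', 'to', 'from', 'Dallas'], (33.6609, -95.5555))
```

The optional `max_km` threshold discards mentions whose nearest candidate is still far from the article's location, trading coverage for cleaner labels.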
