Spatial Analysis of News Sources
Andrew Mehler, Steven Skiena, Yunfan Bao, Xin Li, Yue Wang
Stony Brook University, www.textmap.com
Computational News Analysis
• Lydia: large-scale newspaper analysis.
• Obtain data on how the volume of news coverage varies by location.
• Our paper describes how we calculate, display, and evaluate spatial bias in news sources.
Who Is Running For President?
Mark Foley Scandal
Who is Looking for a Manager?
Steve Nash’s Teams
Lydia (textmap.com)
Data maps are a component of the Lydia system: the data Lydia generates drives data-map creation.
Lydia monitors ~1000 newspapers every day, plus other sources.
Components of Lydia include:
Named Entity Recognition
Saddam Hussein’s chief lawyer warned Sunday of worsening violence in Iraq and chaos across the Mideast if the ex-president is sentenced to death at his trial for a crackdown on a Shiite Muslim village in the 1980s. Khalil al-Dulaimi also said he would break a month long boycott and attend proceedings Monday when Saddam's second trial resumes on separate charges of genocide against the Kurds.
Segmentation and Classification
Favorite Things
Social Network
Juxtaposition Analysis
Article Categorization
Related Work
• Visualizing Data (Tufte)
• Geographic Visualization (Slocum, McMaster, Kessler, Howard)
• Data Maps / Color Schemes (Brewer)
• Quantitative Geography (Fotheringham, Brunsdon, Charlton)
• Spatial Data-Mining (Miller, Han)
• Spatial Interpolation / Smoothing (Fuentes, Stein)
Outline of this Talk
News/Data Acquisition
Source-Influence Modeling
Spatial Visualization
Identification of Spatially Biased Maps
Conclusions
News Acquisition
Spiders: programs that crawl a web domain and download all of its pages. Our universal spider is built using wget.
Each site still needs customization:
• Cookies / Logins
• Page Structure / formatting / Advertisements
• Each paper: ~40-130 MB, downloaded in 20-80 minutes.
• ~800 U.S. papers and ~300 foreign papers.
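The slides say only that the universal spider is built on wget and that each site needs customization such as cookies and logins. As a minimal sketch, assuming a recursive wget mirror per paper, the command might be assembled like this; the function names, depth, wait time, and output layout are all hypothetical, not the Lydia configuration.

```python
import shlex
import subprocess
from typing import List, Optional


def build_spider_command(domain: str, out_dir: str,
                         cookie_file: Optional[str] = None,
                         depth: int = 5, wait_seconds: int = 1) -> List[str]:
    """Build a wget command that mirrors one newspaper site (illustrative flags)."""
    cmd = [
        "wget",
        "--recursive",                    # follow links from the front page
        f"--level={depth}",               # cap the recursion depth
        "--no-parent",                    # never ascend above the start URL
        f"--wait={wait_seconds}",         # pause between requests
        f"--directory-prefix={out_dir}",  # where downloaded pages are saved
    ]
    if cookie_file is not None:
        # Per-site customization: sites behind a login need saved cookies.
        cmd.append(f"--load-cookies={cookie_file}")
    cmd.append(f"https://{domain}/")
    return cmd


def crawl(domain: str, out_dir: str, cookie_file: Optional[str] = None) -> None:
    """Run the spider; raises CalledProcessError if wget exits with a nonzero status."""
    cmd = build_spider_command(domain, out_dir, cookie_file)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

By default wget's recursion stays on the starting host, which keeps a crawl inside one paper's site. Note that wget also exits nonzero when some pages return server errors, so a production crawler would likely inspect the exit code rather than raise on it.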
Duplicate Articles?
• Syndication, Persistence, Ongoing Stories
Duplicate Detection
Despite playing without three injured defensive starters and losing another early, the Giants held Tampa Bay to 174 total yards and set up a score with a turnover deep in Buccaneers' territory in a 17-3 victory Sunday that gave New York its fourth straight win.
Despite playing without three injured defensive starters and losing another early, the Giants held Tampa Bay to 174 total yards and set up a score with a turnover deep in Buccaneers' territory in a 17-3 victory Sunday.
Character Windows
Most Windows Equal in Duplicates
Document 1: 17, 29, 113, 30, 25, 10, 130, 128, 50, 119, 190, 1979
Document 2: 17, 29, 113, 30, 25, 10, 130, 128, 50
Hash Codes For Windows
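The window-and-hash idea can be sketched as follows: slide a fixed-length character window over each article, hash every window, and measure how many of the shorter article's codes also appear in the other. The window length of 20, the whitespace and case normalization, the MD5-truncated-to-64-bits hash, and the containment score are all assumptions for illustration; the slides do not give these details.

```python
import hashlib
from typing import Set


def window_hashes(text: str, k: int = 20) -> Set[int]:
    """Hash every k-character window of the text into a 64-bit code."""
    norm = " ".join(text.lower().split())  # ignore case and whitespace differences
    codes = set()
    for i in range(len(norm) - k + 1):
        digest = hashlib.md5(norm[i:i + k].encode("utf-8")).digest()
        codes.add(int.from_bytes(digest[:8], "big"))
    return codes


def containment(a: Set[int], b: Set[int]) -> float:
    """Fraction of the smaller document's window codes that also occur in the other."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

On the two Giants sentences above, every window of the shorter version except the one ending in its final period also occurs in the longer one, so containment is close to 1. A cutoff such as 0.9 (again an assumption) would flag the pair as duplicates.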
Size Reduction
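The slides do not say how the set of window codes is reduced. One common choice, assumed here, is to keep only codes divisible by a fixed modulus p. Because the choice depends only on the code value, a window kept in one document is kept in every other document that contains it, so shared windows survive the reduction. The sketch applies this, with the hypothetical `sample_hashes` helper and p = 5, to the hash codes listed on the slide.

```python
def sample_hashes(hashes, p=5):
    """Keep only hash codes divisible by p (about 1/p of them for well-mixed hashes)."""
    return {h for h in hashes if h % p == 0}


# Window hash codes from the slide above.
doc1 = {17, 29, 113, 30, 25, 10, 130, 128, 50, 119, 190, 1979}
doc2 = {17, 29, 113, 30, 25, 10, 130, 128, 50}

s1, s2 = sample_hashes(doc1), sample_hashes(doc2)
overlap = len(s1 & s2) / min(len(s1), len(s2))
print(sorted(s1))  # [10, 25, 30, 50, 130, 190]
print(sorted(s2))  # [10, 25, 30, 50, 130]
print(overlap)     # 1.0
```

The reduced sets are about half the size here, yet the overlap between the two documents is unchanged.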
Outline of this Talk
News/Data Acquisition
Source-Influence Modeling
Spatial Visualization
Identification of Spatially Biased Maps
Conclusions
Combining News Influence
How do we combine all the newspapers that are read in an area?
In Bloomsburg, PA, people might read:
• The New York Times
• The Philadelphia Inquirer
• The Bloomsburg Press Enterprise
What Is Reflective of Bloomsburg’s Interests?
Linear Decay Model
[Figure: map labels Bloomsburg, NY Times, Philadelphia]
Influence Model
To estimate the contributions of different sources, we develop an influence model.
Influence is a function of a source and a city that quantifies how influential the source is in that city.
Influence(New York Times, Baltimore) = ?
The frequency-of-reference estimate for a city is then a weighted average over the sources s:
F(Knicks, NY) = Σ_s F(Knicks, s) · influence(s, NY) / Σ_s influence(s, NY)
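The weighted average translates directly into code. In this minimal sketch the hypothetical `combined_frequency` takes per-source frequencies of one entity and each source's influence in one city; the source keys and all numbers for the Bloomsburg example are made up for illustration.

```python
from typing import Dict


def combined_frequency(freq_by_source: Dict[str, float],
                       influence_in_city: Dict[str, float]) -> float:
    """Influence-weighted average of an entity's frequency over the sources read in a city.

    Implements F(e, city) = sum_s F(e, s) * influence(s, city) / sum_s influence(s, city).
    """
    num = 0.0
    den = 0.0
    for source, weight in influence_in_city.items():
        if weight <= 0:
            continue  # a source with no influence here contributes nothing
        # A source that never mentions the entity counts as frequency 0.
        num += freq_by_source.get(source, 0.0) * weight
        den += weight
    return num / den if den > 0 else 0.0


# Hypothetical numbers for Bloomsburg, PA: per-source frequencies of some entity
# and each paper's influence in Bloomsburg.
freqs = {"nyt": 0.002, "inquirer": 0.001, "press_enterprise": 0.0}
influence = {"nyt": 1.0, "inquirer": 2.0, "press_enterprise": 5.0}
print(combined_frequency(freqs, influence))
```

With these weights the local paper dominates, so an entity it never mentions ends up well below its New York Times frequency (about 0.0005 here).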
Readership Estimate
The readership of a paper is estimated by combining the paper's circulation with its alexa.com RPM (reach per million).
We then estimate the radius of a newspaper's influence by choosing the radius at which 10% of the population covered equals the readership.
The influence function decays linearly with distance from the source and is 0 outside its radius of influence.
• Big papers have a larger influence than small papers.
• Potential readership base is not a factor.
• Is linear decay the right model?
• Some large papers have national distributions.
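The slides fix the shape of the model (a radius at which 10% of the covered population equals the readership, then linear decay to 0 at that radius) but not how circulation and Alexa reach are combined, how distance is measured, or where population figures come from. This sketch fills those gaps with labeled assumptions: `estimate_readership` is a hypothetical combination (circulation plus reach per million times an assumed online population of 300 million), distances are great-circle (haversine) kilometers, and cities are hypothetical `(name, lat, lon, population)` tuples.

```python
import math
from typing import List, Tuple

City = Tuple[str, float, float, float]  # (name, latitude, longitude, population)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def estimate_readership(circulation: float, reach_per_million: float,
                        online_population: float = 300e6) -> float:
    # HYPOTHETICAL: print circulation plus Alexa-style online reach.
    return circulation + reach_per_million * online_population / 1e6


def influence_radius(source: Tuple[float, float], cities: List[City],
                     readership: float, share: float = 0.10) -> float:
    """Smallest radius (km) at which `share` of the covered population reaches the readership."""
    lat, lon = source
    by_distance = sorted((haversine_km(lat, lon, c_lat, c_lon), pop)
                         for _, c_lat, c_lon, pop in cities)
    if not by_distance:
        return 0.0
    covered = 0.0
    for dist, pop in by_distance:
        covered += pop  # grow the disk one city at a time
        if share * covered >= readership:
            return dist
    return by_distance[-1][0]  # readership exceeds 10% of everyone: cover all cities


def linear_influence(distance_km: float, radius_km: float) -> float:
    """1 at the source, falling linearly to 0 at the radius, and 0 beyond it."""
    if distance_km > radius_km:
        return 0.0
    if radius_km == 0:
        return 1.0  # zero radius: only the paper's own city is influenced
    return 1.0 - distance_km / radius_km
```

Because the radius grows with readership, big papers reach more cities, matching the first bullet above; the potential readership base only enters through the population sum.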
Outline of this Talk
News/Data Acquisition
Source-Influence Modeling
Spatial Visualization
Identification of Spatially Biased Maps
Conclusions
Visualization Issues
• Representing the United States surface: Triangle (Shewchuk) is used to create a Delaunay triangulation of the cities.
• Interpolating Surface from Point Data (cities)
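The slides name Triangle for the triangulation but do not say how values are filled in between cities. A natural choice, assumed here, is linear (barycentric) interpolation inside each Delaunay triangle: a point's value is a weighted mix of its three corner cities. The sketch below assumes the containing triangle is already known and skips the point-location step.

```python
from typing import Tuple

Point = Tuple[float, float]


def barycentric_interpolate(p: Point, a: Point, b: Point, c: Point,
                            va: float, vb: float, vc: float) -> float:
    """Linearly interpolate at p inside triangle (a, b, c) whose corners carry va, vb, vc."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    # Barycentric weights: each is 1 at its own corner and 0 on the opposite edge.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * va + w2 * vb + w3 * vc
```

In practice `scipy.interpolate.LinearNDInterpolator` performs the same Delaunay-plus-barycentric interpolation over scattered points; the pure-Python version just makes the arithmetic visible. Treating longitude and latitude as planar coordinates is a further simplification.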
Visualization
Mesa/OpenGL is used to render the maps.
Relative color scale: the maximum heat is drawn as the hottest red.
Absolute Color Scale
The two maps are directly comparable.
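The difference between the two scales is only which maximum a value is divided by. In this sketch the linear mapping, the clamp to [0, 1], and the white-to-red `to_rgb` ramp are assumptions; the slides do not specify the colormap.

```python
from typing import List, Sequence, Tuple


def heat_fraction(value: float, scale_max: float) -> float:
    """Position of a value on the color scale: 0 is coolest, 1 is the hottest red."""
    if scale_max <= 0:
        return 0.0
    return min(max(value / scale_max, 0.0), 1.0)  # clamp to [0, 1]


def relative_scale(values: Sequence[float]) -> List[float]:
    """Relative scale: this map's own maximum becomes the hottest red."""
    top = max(values)
    return [heat_fraction(v, top) for v in values]


def absolute_scale(values: Sequence[float], shared_max: float) -> List[float]:
    """Absolute scale: every map uses the same maximum, so maps are comparable."""
    return [heat_fraction(v, shared_max) for v in values]


def to_rgb(f: float) -> Tuple[int, int, int]:
    """Illustrative white-to-red ramp."""
    fade = round(255 * (1.0 - f))
    return (255, fade, fade)
```

For example, maps with values [1, 2, 4] and [0.5, 1, 2] both come out as [0.25, 0.5, 1.0] on the relative scale, so they look identical; with a shared maximum of 4 the second map tops out at 0.5, which shows it is half as hot.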
Outline of this Talk
News/Data Acquisition
Source-Influence Modeling
Spatial Visualization
Identification of Spatially Biased Maps
Conclusions
Which Maps are Interesting?
How can we identify the terms with a geographic bias? We don’t want to look through all 200,000 entities!
How do we Quantify Geographic Bias?
Variance Analysis
Our analysis gives frequency estimates for 25,374 cities. We defined two measures based on variance:
• Variance: the variance of the 25,374 values.
• Weighted Variance: the variance divided by the mean.
Var: 7.06e-09 W-Var: 7.11e-05
Var: 6.24e-07 W-Var: 3.00e-03
Neither measure can distinguish a bipolar map from a checkerboard map.
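Both measures depend only on the multiset of city values, not on where those values sit, which is why they cannot separate a bipolar map from a checkerboard. This sketch computes them with Python's `statistics` module (population variance is an assumption; the slides do not say which variance) on two hypothetical 4x4 toy maps holding the same values in different arrangements.

```python
from statistics import mean, pvariance


def variance_measures(values):
    """Return (variance, weighted variance = variance / mean) over all city values."""
    var = pvariance(values)
    mu = mean(values)
    return var, (var / mu if mu != 0 else float("inf"))


# Two 4x4 toy "maps" with the same values arranged differently.
bipolar = [1, 1, 0, 0,
           1, 1, 0, 0,
           1, 1, 0, 0,
           1, 1, 0, 0]
checkerboard = [1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1]
print(variance_measures(bipolar) == variance_measures(checkerboard))  # True
```

Any permutation of the city values leaves both numbers unchanged, so a measure of spatial bias has to look at which cities are neighbors.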
Component Analysis
Consider how the number of connected components changes if you keep only the cities above a certain value.
Component Analysis
In a biased map, we expect the largest values to be clustered together.
Component Analysis
In an unbiased map, we expect many random clusters of high heat, not the single cluster we expect in a biased map.
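Counting components at a threshold needs a notion of neighboring cities. This sketch assumes adjacency is given as an edge list (for instance the edges of the Delaunay triangulation, which is an assumption; the slides do not say which graph is used) and counts components with union-find while the threshold sweeps down through the distinct values. The six-city path graph and its values are a hypothetical toy example.

```python
from typing import List, Sequence, Tuple


def count_components(values: Sequence[float], edges: Sequence[Tuple[int, int]],
                     threshold: float) -> int:
    """Number of connected components among cities whose value is >= threshold."""
    keep = {i for i, v in enumerate(values) if v >= threshold}
    parent = {i: i for i in keep}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for u, v in edges:
        if u in keep and v in keep:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # merge the two components
    return sum(1 for i in keep if find(i) == i)


def component_profile(values: Sequence[float],
                      edges: Sequence[Tuple[int, int]]) -> List[Tuple[float, int]]:
    """(threshold, component count) as the threshold sweeps down through each distinct value."""
    return [(t, count_components(values, edges, t))
            for t in sorted(set(values), reverse=True)]


# Toy example: six cities in a row, adjacent cities joined by an edge.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
biased = [9, 8, 7, 1, 1, 1]      # high values clustered at one end
unbiased = [9, 1, 8, 1, 7, 1]    # high values scattered
print(component_profile(biased, path))    # [(9, 1), (8, 1), (7, 1), (1, 1)]
print(component_profile(unbiased, path))  # [(9, 1), (8, 2), (7, 3), (1, 1)]
```

The biased map stays a single component as the threshold drops, while the scattered map splits into several; rebuilding the union-find at every threshold is quadratic, and an incremental sweep that adds cities in decreasing order would scale better to 25,374 cities.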
Component Measures
• Largest Gap: the value of the largest gap. A large gap suggests the entity is drawn from two different distributions, local and national.
• Weighted Gap: the largest gap divided by the max.
• Percentage Gap: the gap expressed as a percentage change.
Evaluating Bias Measures
To evaluate the measures, we made four sets of data maps:
Random Entity: Uniform
Random Entity: Binomial
Unbiased Entity
Biased Entity
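The two random sets have no spatial structure by construction, so a good bias measure should score them low. The slides name only the distributions; drawing each city's value independently, the helper names, and the binomial parameters `n_mentions` and `p` are assumptions in this sketch.

```python
import random
from typing import List


def uniform_map(n_cities: int, rng: random.Random) -> List[float]:
    """Random entity: every city gets an independent Uniform(0, 1) value."""
    return [rng.random() for _ in range(n_cities)]


def binomial_map(n_cities: int, n_mentions: int, p: float,
                 rng: random.Random) -> List[int]:
    """Random entity: each city's count is an independent Binomial(n_mentions, p) draw."""
    # Each draw sums n_mentions Bernoulli(p) trials (True counts as 1).
    return [sum(rng.random() < p for _ in range(n_mentions))
            for _ in range(n_cities)]
```

A seeded `random.Random` keeps the generated sets reproducible; producing 200 maps of each kind over the 25,374 cities is just a loop over these helpers.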
Results
Data set sizes:
• 400 biased
• 128 unbiased
• 200 uniform
• 200 binomial
Discriminating Real Data
Future Work
• Improved map visualization.
• Sentiment data maps.
• Animated maps showing temporal changes in popularity.
• Improved influence models.
• Empirical justifications of models.
• Improved bias estimators.