Next Generation Cybertools: Social Science Research using Web Data
A project of Cornell University and the Internet Archive,
Funded by the National Science Foundation
The NSF Cybertools Grant
Sociology: Michael Macy (Principal Investigator), David Strang
Computing and Information Science: Bill Arms, Dan Huttenlocher, Jon Kleinberg
Very Large Semi-Structured Datasets for Social Science Research
"Computer scientists have learned through experience that it is usually best to build software tools in close collaboration with users. Hence, our proposal is two-fold – to build an intelligent front-end that will make the Internet Archive data broadly accessible to social scientists, and to develop, test, and refine these tools through a specific research application – the diffusion of innovation."
Begins January 2006
The Internet Archive
• Complete crawls of the Web, every two months since 1996
• Range of formats and depth of crawl have increased with time.
• No protected sites.
• Total archive is about 600 TByte
• Rate of increase is about 1 TByte/day
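To give a feel for the scale implied by the last two figures, here is a minimal back-of-envelope sketch. It assumes the growth rate stays constant at the quoted value, which the slide does not claim; the function names are hypothetical and chosen only for illustration.

```python
# Figures quoted on the slide (both approximate).
ARCHIVE_SIZE_TB = 600.0    # total archive size, in TByte
GROWTH_TB_PER_DAY = 1.0    # rate of increase, in TByte/day


def projected_size_tb(days: float) -> float:
    """Archive size after `days`, ASSUMING the growth rate stays constant."""
    return ARCHIVE_SIZE_TB + GROWTH_TB_PER_DAY * days


def days_to_reach(target_tb: float) -> float:
    """Days until the archive reaches `target_tb`, under the same assumption."""
    return (target_tb - ARCHIVE_SIZE_TB) / GROWTH_TB_PER_DAY


if __name__ == "__main__":
    print(projected_size_tb(365))              # size one year on, in TByte
    print(days_to_reach(2 * ARCHIVE_SIZE_TB))  # days for the archive to double
```

Under this linear assumption the archive would take roughly 600 days to double. If the rate of increase itself keeps rising, the real doubling time would be shorter.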
New Opportunities for Social Science Research
The Web as a social phenomenon
Political campaigns
Online retailing
Self-publication (blogs)
The Web as evidence
The spread of urban legends ("Einstein failed mathematics")
Diffusion of innovation (e.g. free hotel wireless internet)
Polarization of opinion
Using the Web Laboratory
Using the laboratory
If you would like to use the Web Laboratory for your research or teaching, please contact me. The order in which we build the services will be decided by the demands of the users.
Thanks
This work would not be possible without the forethought and longstanding commitment of the Internet Archive to capture and preserve the content of the Web for future generations.
The work is funded in part by National Science Foundation grant 0403340, with equipment support from Unisys.
The Cornell Theory Center's support for this project is funded in part by Microsoft, Dell and Intel.