tools for multivariate, evolving scientometric visualizations angela zoss, m.s
DESCRIPTION
Tools for Multivariate, Evolving Scientometric Visualizations Angela Zoss, M.S. Research Assistant, Cyberinfrastructure for Network Science Center Doctoral Student, School of Library and Information Science Indiana University, Bloomington, IN [email protected] - PowerPoint PPT PresentationTRANSCRIPT
Tools for Multivariate, Evolving Scientometric Visualizations
Angela Zoss, M.S.Research Assistant, Cyberinfrastructure for Network Science CenterDoctoral Student, School of Library and Information ScienceIndiana University, Bloomington, [email protected]
March 23, 2011 – Mining the Digital Traces of ScienceMOMA, ISC-PIF and CREA, CNRS/École PolytechniqueParis, France
Analysis and Visualization of Science
Advantages for Funding Agencies Supports monitoring of (long-term) money flow and research
developments, evaluation of funding strategies for different programs, decisions on project durations, funding patterns.
Staff resources can be used for scientific program development, to identify areas for future development, and the stimulation of new research areas.
Advantages for Researchers Easy access to research results, relevant funding programs and their
success rates, potential collaborators, competitors, related projects/publications (research push).
More time for research and teaching.Advantages for Industry Fast and easy access to major results, experts, etc. Can influence the direction of research by entering information on
needed technologies (industry-pull).Advantages for Publishers Unique interface to their data. Publicly funded development of databases and their interlinkage.For Society Dramatically improved access to scientific knowledge and expertise.
Information Needs for Science Map User Groups
Type of Analysis vs. Scale of Level of Analysis
Micro/Individual(1-100 records)
Meso/Local(101–10,000 records)
Macro/Global(10,000 < records)
Statistical Analysis/Profiling
Individual person and their expertise profiles
Larger labs, centers, universities, research domains, or states
All of NSF, all of USA, all of science.
Temporal Analysis (When)
Funding portfolio of one individual
Mapping topic bursts in 20-years of PNAS
113 Years of physics Research
Geospatial Analysis (Where)
Career trajectory of one individual
Mapping a states intellectual landscape
PNAS publications
Topical Analysis (What)
Base knowledge from which one grant draws.
Knowledge flows in Chemistry research
VxOrd/Topic maps of NIH funding
Network Analysis (With Whom?)
NSF Co-PI network of one individual
Co-author network
NSF’s core competency
Micro/Individual(1-100 records)
Meso/Local(101–10,000 records)
Macro/Global(10,000 < records)
Statistical Analysis/Profiling
Individual person and their expertise profiles
Larger labs, centers, universities, research domains, or states
All of NSF, all of USA, all of science.
Temporal Analysis (When)
Funding portfolio of one individual
Mapping topic bursts in 20-years of PNAS
113 Years of physics Research
Geospatial Analysis (Where)
Career trajectory of one individual
Mapping a states intellectual landscape
PNAS publications
Topical Analysis (What)
Base knowledge from which one grant draws.
Knowledge flows in Chemistry research
VxOrd/Topic maps of NIH funding
Network Analysis (With Whom?)
NSF Co-PI network of one individual
Co-author network
NSF’s core competency
Type of Analysis vs. Scale of Level of Analysis
Process of Computational Scientometrics
Börner, Katy, Chen, Chaomei, and Boyack, Kevin. (2003) Visualizing Knowledge Domains. In Blaise Cronin (Ed.), Annual Review of Information Science & Technology, Volume 37, Medford, NJ: Information Today, Inc./American Society for Information Science and Technology, chapter 5, pp. 179-255.
Data Extraction
Unit of Analysis
Measures Layout (often one code does both similarity and ordination steps)
Display
Similarity OrdinationSearches•ISI•INSPEC•Eng Index•Medline•ResearchIndex•Patents•etc.
Broadening•By citation•By terms
Common Choices•Journal•Document•Author•Term
Counts/Frequencies•Atributes (e.g., terms)•Author citations•Co-citations•By year
Thresholds•By counts
Scalar (unit by unit matrix)•Direct citation•Co-citation•Combined linkage•Co-word/co-term•Co-classification
Vector (unit by attribute matrix)•Vector space model (words/terms)•Latent Semantic Analysis (words/terms) incl. Singular Value Decomp (SVD)
Correlation (if desired)•Pearson’s R on any of above
Dimensionality Reduction:•Eigenvector/Eigenvalue solutions•Factor Analysis (FA) and Principal Components Analysis (PCA)•Multi-dimensional scaling (MDS)•LSA•Pathfinder networks (PFNet)•Self-organizing maps (SOM) incl. SOM, ET-maps, etc.
Cluster analysis
Scalar•Triangulation•Force-directed placement (FDP)
Interaction•Browse•Pan•Zoom•Filter•Query•Detail on demand
Analysis
Computational Scientometrics ReferencesBörner, Katy, Chen, Chaomei, and Boyack, Kevin. (2003).
Visualizing Knowledge Domains. In Blaise Cronin (Ed.), ARIST, Medford, NJ: Information Today, Inc./American Society for Information Science and Technology, Volume 37, Chapter 5, pp. 179-255. http://ivl.slis.indiana.edu/km/pub/2003-borner-arist.pdf
Shiffrin, Richard M. and Börner, Katy (Eds.) (2004). Mapping Knowledge Domains. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl_1). http://www.pnas.org/content/vol101/suppl_1/
Börner, Katy, Sanyal, Soma and Vespignani, Alessandro (2007). Network Science. In Blaise Cronin (Ed.), ARIST, Information Today, Inc./American Society for Information Science and Technology, Medford, NJ, Volume 41, Chapter 12, pp. 537-607. http://ivl.slis.indiana.edu/km/pub/2007-borner-arist.pdf
Börner, Katy (2010) Atlas of Science. MIT Press.http://scimaps.org/atlas
Börner, Katy. (March 2011). Plug-and-Play Macroscopes. Communications of the ACM, 54(3), 60-69.http://dx.doi.org/10.1145/1897852.1897871
Science of Science Cyberinfrastructure
Overview
What cyberinfrastructure will be required to measure, model, analyze, and
communicate scholarly data and, ultimately, scientific progress?
Our efforts to create a science of science cyberinfrastructure support:
• Data access and federation via the Scholarly Database, http://sdb.slis.indiana.edu,
• Data preprocessing, modeling, analysis, and visualization using plug-and-play cyberinfrastructures such as the Sci2 Tool, http://sci2.cns.iu.edu,
• Real-time data generation and integration via the VIVO collaboration, http://vivo-netsci.cns.iu.edu, and
• Communication of science to a general audience via the Mapping Science Exhibit at http://scimaps.org.
Google Code and SourceForge.net provide special means for developing and distributing software
• In August 2009, SourceForge.net hosted more than 230,000 software projects by two million registered users (285,957 in January 2011);
• In August 2009 ProgrammableWeb.com hosted 1,366 application programming interfaces (APIs) and 4,092 mashups (2,699 APIs and 5,493 mashups in January 2011)
Cyberinfrastructures serving large biomedical communities• Cancer Biomedical Informatics Grid (caBIG) (http://cabig.nci.nih.gov) • Biomedical Informatics Research Network (BIRN) (http://nbirn.net)• Informatics for Integrating Biology and the Bedside (i2b2)
(https://www.i2b2.org)• HUBzero (http://hubzero.org) platform for scientific collaboration uses • myExperiment (http://myexperiment.org) supports the sharing of scientific
workflows and other research objects.
Missing so far is a common standard for • the design of modular, compatible algorithm and tool
plug-ins (also called “modules” or “components”)
• that can be easily combined into scientific workflows (“pipeline” or “composition”),
• and packaged as custom tools.
Related Work
Scholarly Database (http://sdb.cns.iu.edu)
The Scholarly Database at Indiana University provides free access to 25,000,000 papers, patents, and grants. Since March 2009, users can also download networks, e .g., co-author, co-investigator, co-inventor, patent citation, and tables for burst analysis.
Sci2 Tool for Science of Science (http://sci2.cns.iu.edu)
• Explicitly designed for SoS research and practice, well documented, easy to use.
• Empowers many to run common studies while making it easy for exports to perform novel research.
• Advanced algorithms, effective visualizations, and many (standard) workflows.
• Supports micro-level documentation and replication of studies.
• Is open source—anybody can review and extend the code, or use it for commercial purposes.
http://sci2.cns.iu.edu http://sci2.wiki.cns.iu.edu
Sci2 Tool
Supported Input file formats:• GraphML (*.xml or *.graphml)• XGMML (*.xml)• Pajek .NET (*.net) & Pajek .Matrix (*.mat)• NWB (*.nwb)• TreeML (*.xml)• Edge list (*.edge)• CSV (*.csv)• ISI (*.isi)• Scopus (*.scopus)• NSF (*.nsf)• Bibtex (*.bib)• Endnote (*.enw)
http://sci2.wiki.cns.iu.edu/2.3+Data+Formats
Output file formats:GraphML (*.xml or *.graphml)Pajek .MAT (*.mat)Pajek .NET (*.net)NWB (*.nwb)XGMML (*.xml)CSV (*.csv)
Sci2 Tool
Sample paper network (left) and four different network types derived from it (right).From ISI files, about 30 different networks can be extracted.
Network Extraction
Geo Maps
Circular Hierarchy
Sci2 Tool
Sci2 Tool
Plugins that render into Postscript files:
Börner, Katy, Huang, Weixia (Bonnie), Linnemeier, Micah, Duhon, Russell Jackson, Phillips, Patrick, Ma, Nianli, Zoss, Angela, Guo, Hanning & Price, Mark. (2009). Rete-Netzwerk-Red: Analyzing and Visualizing Scholarly Networks Using the Scholarly Database and the Network Workbench Tool. Proceedings of ISSI 2009: 12th International Conference on Scientometrics and Informetrics, Rio de Janeiro, Brazil, July 14-17 . Vol. 2, pp. 619-630.
Horizontal Time Graphs
Sci MapsGeo Maps
Interactive S&T Maps
http://mapsustain.cns.iu.edu
Google Map JavaScript API was used to implement both maps with two aggregation layers for each. The geographic map aggregates to the state
level and the city level.
http://mapofscience.com
The science map has a high level of aggregation of 13 top-level scientific disciplines
and a low level of 554 sub-disciplines.
The geographic map at state level.
The geographic map at city level.
Search results for “corn”Icons have same size but represent different # records
Click on one icon to display all records of one type.Here are publications in the state of Florida.
Detailed information on demand via original source site for exploration and study.
The science map at 13 top-level scientific disciplines level.
The science map at 554 sub-disciplines level.
Using Semantic Web Data to Improve Data Coverage and
Currency
VIVO: A Semantic Approach to Creating a National Network of Researchers (http://vivoweb.org)• Semantic web application and
ontology editor originally developed at Cornell U.
• Integrates research and scholarship info from systems of record across institution(s).
• Facilitates research discovery and cross-disciplinary collaboration.
• Simplify reporting tasks, e.g., generate biosketch, department report.
Funded by $12 million NIH award. Cornell University: Dean Krafft (Cornell PI), Manolo Bevia, Jim Blake, Nick Cappadona, Brian Caruso, Jon Corson-Rikert, Elly Cramer, Medha Devare, John Fereira, Brian Lowe, Stella Mitchell, Holly Mistlebauer, Anup Sawant, Christopher Westling, Rebecca Younes. University of Florida: Mike Conlon (VIVO and UF PI), Cecilia Botero, Kerry Britt, Erin Brooks, Amy Buhler, Ellie Bushhousen, Chris Case, Valrie Davis, Nita Ferree, Chris Haines, Rae Jesano, Margeaux Johnson, Sara Kreinest, Yang Li, Paula Markes, Sara Russell Gonzalez, Alexander Rockwell, Nancy Schaefer, Michele R. Tennant, George Hack, Chris Barnes, Narayan Raum, Brenda Stevens, Alicia Turner, Stephen Williams. Indiana University: Katy Borner (IU PI), William Barnett, Shanshan Chen, Ying Ding, Russell Duhon, Jon Dunn, Micah Linnemeier, Nianli Ma, Robert McDonald, Barbara Ann O'Leary, Mark Price, Yuyin Sun, Alan Walsh, Brian Wheeler, Angela Zoss. Ponce School of Medicine: Richard Noel (Ponce PI), Ricardo Espada, Damaris Torres. The Scripps Research Institute: Gerald Joyce (Scripps PI), Greg Dunlap, Catherine Dunn, Brant Kelley, Paula King, Angela Murrell, Barbara Noble, Cary Thomas, Michaeleen Trimarchi. Washington University, St. Louis: Rakesh Nagarajan (WUSTL PI), Kristi L. Holmes, Sunita B. Koul, Leslie D. McIntosh. Weill Cornell Medical College: Curtis Cole (Weill PI), Paul Albert, Victor Brodsky, Adam Cheriff, Oscar Cruz, Dan Dickinson, Chris Huang, Itay Klaz, Peter Michelini, Grace Migliorisi, John Ruffing, Jason Specland, Tru Tran, Jesse Turner, Vinay Varughese.
http://vivo-netsci.cns.iu.edu
Data Download Support
General Statistics• 36 publication(s) from 2001 to 2010
(.CSV File) • 80 co-author(s) from 2001 to 2010
(.CSV File)
Co-Author Network(GraphML File)
Save as Image (.PNG file)
Tables• Publications per year (.CSV File)• Co-authors (.CSV File)
VIVO is supported by NIH Award U24 RR029822
Second Annual VIVO ConferenceAugust 24-26, 2011Gaylord National, Washington D.C.http://vivoweb.org/conference
Public Communication of Scientometrics Research:Places and Spaces Exhibit
Exhibit has been shown in 72 venues on four continents. Currently at:• NSF, 10th Floor, 4201 Wilson Boulevard, Arlington, VA• Center of Advanced European Studies and Research, Bonn, Germany• University of Michigan, Ann Arbor, MI
Mapping Science Exhibit – 10 Iterations in 10 Yearshttp://scimaps.org
Science Maps for Economic Decision Makers (2008)
Science Maps for Scholars (2010)
Science Maps for Science Policy Makers (2009)
Science Maps as Visual Interfaces to Digital Libraries (2011)Science Maps for Kids (2012) Science Forecasts (2013)How to Lie with Science Maps (2014)
The Power of Maps (2005)
Power of Reference Systems (2006)
Power of Forecasts (2007)
April 5 – 9, 2005: 101st Annual Meeting of the Association of American Geographers,
Denver, Colorado.
Science Maps in “Expedition Zukunft” science train visiting 62 cities in 7 months, 12 coaches, 300 m long. Opening was on April 23rd, 2009 by German Chancellor Merkel, http://www.expedition-zukunft.de
Upcoming Iterations
• Science Maps as Visual Interfaces to Digital Libraries (2011)
• Science Maps for Kids (2012)
• Science Forecasts (2013)
• How to Tell Lies with Science Maps (2014)
http://tinyurl.com/45j5rj5
Contact the map makers or the exhibit curators:Katy Börner ([email protected]) and
Michael J. Stamper ([email protected])
Acknowledgements
Micah Linnemeier, Russell J. Duhon, Bruce W. Herr II, George Kampis, Gregory J. E. Rawlins, Geoffrey Fox, Shawn Hampton, Carol Goble, Mike Smoot, Yanbo Han for stimulating discussions and comments.
The Cyberinfrastructure for Network Science Center (http://cns.iu.edu), the Network Workbench team (http://nwb.cns.iu.edu), and Science of Science project team (http://sci2.cns.iu.edu) for their contributions toward the work presented here.
Software development benefits greatly from the open-source community. Full software credits are distributed with the source, but I would especially like to acknowledge Jython, JUNG, Prefuse, GUESS, GnuPlot, and OSGi, as well as Apache Derby, used in the Sci2 tool.
This research and development is based on work supported by National Science
Foundation grants SBE-0738111, IIS-0513650, IIS-0534909 and National Institutes
of Health grants R21DA024259 and 5R01MH079068.
All papers, maps, cyberinfrastructures, talks, press
are linked from http://cns.iu.edu