matchmaking in the academy - penn libraries · university of pennsylvania libraries plans are also...

2
University of Pennsylvania Libraries News and Notes from the Penn Libraries | Fall 2012 Published by The Office of the Vice Provost and Director of Libraries Inquiries invited at [email protected] Van Pelt-Dietrich Library Center 3420 Walnut Street Philadelphia, PA 19104-6206 215-898-7091 Fall 2012 VIVO introduces a new level of collaboration for research and discovery Support Innovation at Penn We see an exciting future for the Penn Libraries, a future that will change the conversation about libraries and what they do. If you’re interested in learning more about our service innovations and helping to support the evolution and growth of Franklin’s library, contact the Office of the Vice Provost and Director of Libraries, Van Pelt-Dietrich Library Center, 3420 Walnut Street, Philadelphia, PA, 19104-6206, (215) 573-3610, and follow us at www.library.upenn.edu/portal/dev.html. Board of Overseers of the Penn Libraries Happy 50th Birthday to the Van Pelt Library! October 22, 2012 marked the 50th anniversary of the dedication of the Charles Patterson Van Pelt Library. A photo exhibition of the building’s history is on display in Lee Lounge (next to the Kamin Gallery) from August 27, 2012 - February 24, 2013. Matchmaking in the Academy What is VIVO? VIVO is an open source semantic web application (more on that in a bit) currently being deployed by the Penn Libraries in a pilot with Penn’s Institute for Translational Medicine and Therapeutics (ITMAT) and the Perelman School of Medicine. By providing a space where scholars aggregate and share information, VIVO enables the discovery of research and new knowledge across disciplines, administrative boundaries, and institutions. Using the extraordinary search logic of the semantic web, VIVO allows researchers to create detailed profiles, which can include: Academic appointments Areas of researcher expertise Publications Grants Researcher networks Teaching service awards, and much more. These profiles showcase to the world the extraordinary range and depth of university research and help researchers match their interests with potential colleagues and projects in the field. It’s all the result of assembling research intelligence in a central resource for use and analysis by the academic communities; and it happens without the overlay of administrative procedures, inaccuracy, or data loss that plagued knowl- edge management ventures in the past. Most profile information is collected by software from verified university data sources and harvested from publication databases, such as PubMed. This reduces VIVO’s reli- ance on manual input and updates. University data sources include the following: Bibliographic and grants databases Human resources databases Course listings and university calendars Grants data Citation databases Demographic data CVs and biosketches User input Equipment and facilities databases The Semantic Web Often referred to as “Web 3.0”, the semantic web is proving to be the new frontier in information exchange and search. It adds a new layer to the current web that allows computers to understand data in ways they haven’t been able to previously. The web familiar to most of us is a web of linked documents where content exists in multiple formats and silos — information can be stored and transmitted but in ways that are cumber- some and often inadequate for search and discovery purposes. The semantic web, however, is a network of linked data, a set of recombinant associations that interconnect concepts — like the concepts of person, organization, place, event, publication, image, and nearly any notion you can imagine. Using a formal structure called an “ontology” to define and map interconnected concepts, computers can determine how related entities connect to each other and reveal these linkages to search engines, which in turn facilitate the discovery process. A researcher might use VIVO to look up people who are active in cancer research. VIVO itself has a powerful engine that uses highly struc- tured data, so without all the typical sifting through results an investigator might dredge up on Google, she can quickly find reseachers who have a specific skill set or expertise. But in time, as Google and other services begin crawling VIVO sites, academics and the general public will begin finding VIVO webpages in their search results and enjoy improved access to scholarly networks. And as more linkages are integrated across VIVO- adopting institutions, VIVO networks will expand, creating an ideal gateway to scholarly content and ultimately advancing the public benefits of research. Gradually this technology should transform the means of scholarly communication. Think of it as match.com for scholars. John Hogenesch, Professor of Pharmacology at Penn’s Perleman School of Medicine, explains the sort of problem the semantic web can solve for researchers, whose work, he says, often involves “finding people and things: search for melanoma, alzheimers, sleep, and restrict the results to ‘people’ to find active researchers in the field; [in VIVO] this function is compelling, This fall the Libraries received the Arnold and Deanne Kaplan Collection of Early American Judaica, a gift valued at $8.5 million. With over 11,000 items ranging from the late 16th century through the early 20th century, the Kaplan Collection is a rich and diverse collection of historical docu- ments as well as fine and folk art. For more details, go to http://www.library.upenn.edu/publications/. University of Pennsylvania Libraries Scientific research is seldom a solo affair. No one engineer designed the iPad. No single researcher will make the next great breakthrough in cancer treatments. These days science and research blend disciplines and advance through a network of collaborators and scholarly communities. Increasingly these communities traverse institutional, national, and cultural borders, borders that scholars struggle to navigate in keeping abreast of new discoveries and alert to potential partnerships. So how do scholars discover the important actors in their fields? How do they find and connect with people engaged with similar problems, methods, and goals? How do they identify opportunities for funding and creative cooperation on the cutting edge of their disciplines? This issue of the Ivy Leaves explores the emergence of a new network technology called VIVO that creates cohesion and integration in communities of research by profiling scholars and helping them “match up.” It’s a fascinating challenge in today’s academy, one the Libraries can help the University master and, in the process, advance the conversation about the library of the future. Smilow Center for Translational Research, home of ITMAT (Continued inside) The harvest, integration and dissemination of research data require an underlying IT architecture, which is represented in this illustration. The flow of data must support VIVO’s web presence, those faculty and their proxies who must update their information, and a variety of ancillary services. For more information about VIVO at Penn, contact: Anne Seymour Associate Director, Biomedical Library (215) 898-4115 [email protected] History of the VIVO Project The application that supports VIVO was developed at Cornell University in 2003 and imple- mented locally in 2004. In September 2009, seven institutions received a $12.2m stimulus grant from the National Center for Research Resources of the NIH to enable National Networking with VIVO. University of Florida is leading the grant with six other participating institutions: Cornell University, Weill Cornell Medical College, Indiana University, Washington University School of Medicine in St. Louis, The Scripps Research Institute, and the Ponce School of Medicine in Puerto Rico. In April 2012, vivo.cornell.edu contained profiles for 16,231 people, 19,847 organizations, 15660 courses, 28,839 academic articles and 6,162 grants. VIVO information is public and anyone can access it. Most universities’ sites use a simple naming convention of vivo.name.edu, where name is the name of the university. Judith L. Bollinger Thomas J. Cusack, chair Erik Gershwind, vice chair Joseph B. Glossberg, chair emeritus Bernard Goldstein Mark H. Goldstein Sandra Grymes Christine Hikawa Kim Hirshman, ex officio James Hoesley Alan S. Jacobs Marilyn Weitzman Kahn, ex officio Jeffrey C. Keil Rodger R. Krouse Susanna E. Lachs Charles MacDonald Edward P . Mally Margy Ellin Meyerson Ellen Moelis Joshua A. Polan Joseph F. Rascoff Barbara Brizdle Schoenberg Lawrence J. Schoenberg, chair emeritus Jeffrey L. Seltzer Dhiren H. Shah Andrew M. Snyder Alberto Vitale Ronald F. E. Weissman Jill Siegel Yablon How Data Flow Through VIVO Figure 9. VIVO IT Architecture

Upload: doanlien

Post on 09-May-2018

219 views

Category:

Documents


4 download

TRANSCRIPT

University of Pennsylvania Libraries

News and Notes from the Penn Libraries | Fall 2012Published by The Office of the Vice Provost and Director of LibrariesInquiries invited at [email protected]

Van Pelt-Dietrich Library Center3420 Walnut Street Philadelphia, PA 19104-6206

215-898-7091

Fall 2012

VIVO introduces a new level of collaboration for research and discovery

Support Innovation at Penn

We see an exciting future for the Penn Libraries, a future that will change the conversation about libraries and what they do. If you’re interested in learning more about our service innovations and helping to support the evolution and growth of Franklin’s library, contact the Office of the Vice Provost and Director of Libraries, Van Pelt-Dietrich Library Center, 3420 Walnut Street, Philadelphia, PA, 19104-6206, (215) 573-3610, and follow us at www.library.upenn.edu/portal/dev.html.

Board of Overseers of the Penn Libraries

Happy 50th Birthday to the Van Pelt Library!

October 22, 2012 marked the 50th anniversary of the dedication of the Charles Patterson Van Pelt Library. A photo exhibition of the building’s history is on display in Lee Lounge (next to the Kamin Gallery) from August 27, 2012 - February 24, 2013.

Matchmaking in the Academy

What is VIVO?VIVO is an open source semantic web application (more on that in a

bit) currently being deployed by the Penn Libraries in a pilot with Penn’s Institute for Translational Medicine and Therapeutics (ITMAT) and the Perelman School of Medicine. By providing a space where scholars aggregate and share information, VIVO enables the discovery of research and new knowledge across disciplines, administrative boundaries, and institutions.

Using the extraordinary search logic of the semantic web, VIVO allows researchers to create detailed profiles, which can include:

• Academic appointments • Areas of researcher expertise • Publications• Grants• Researcher networks• Teaching service awards, and much more.

These profiles showcase to the world the extraordinary range and depth of university research and help researchers match their interests with potential colleagues and projects in the field. It’s all the result of assembling research intelligence in a central resource for use and analysis by the academic communities; and it happens without the overlay of administrative procedures, inaccuracy, or data loss that plagued knowl-edge management ventures in the past. Most profile information is collected by software from verified university data sources and harvested from publication databases, such as PubMed. This reduces VIVO’s reli-ance on manual input and updates. University data sources include the following:

• Bibliographic and grants databases• Human resources databases• Course listings and university calendars • Grants data• Citation databases• Demographic data• CVs and biosketches• User input• Equipment and facilities databases

The Semantic Web Often referred to as “Web 3.0”, the semantic web is proving to be the

new frontier in information exchange and search. It adds a new layer to the current web that allows computers to understand data in ways they haven’t been able to previously. The web familiar to most of us is a web of linked documents where content exists in multiple formats and silos — information can be stored and transmitted but in ways that are cumber-some and often inadequate for search and discovery purposes.

The semantic web, however, is a network of linked data, a set of recombinant associations that interconnect concepts — like the concepts of person, organization, place, event, publication, image, and nearly any notion you can imagine. Using a formal structure called an “ontology” to define and map interconnected concepts, computers can determine how related entities connect to each other and reveal these linkages to search engines, which in turn facilitate the discovery process.

A researcher might use VIVO to look up people who are active in cancer research. VIVO itself has a powerful engine that uses highly struc-tured data, so without all the typical sifting through results an investigator might dredge up on Google, she can quickly find reseachers who have a specific skill set or expertise. But in time, as Google and other services begin crawling VIVO sites, academics and the general public will begin finding VIVO webpages in their search results and enjoy improved access to scholarly networks. And as more linkages are integrated across VIVO-adopting institutions, VIVO networks will expand, creating an ideal gateway to scholarly content and ultimately advancing the public benefits of research.

Gradually this technology should transform the means of scholarly communication. Think of it as match.com for scholars. John Hogenesch, Professor of Pharmacology at Penn’s Perleman School of Medicine, explains the sort of problem the semantic web can solve for researchers, whose work, he says, often involves “finding people and things: search for melanoma, alzheimers, sleep, and restrict the results to ‘people’ to find active researchers in the field; [in VIVO] this function is compelling,

This fall the Libraries received the Arnold and Deanne Kaplan Collection of Early American Judaica, a gift valued at $8.5 million. With over 11,000 items ranging from the late 16th century through the early 20th century, the Kaplan Collection is a rich and diverse collection of historical docu-ments as well as fine and folk art. For more details, go to http://www.library.upenn.edu/publications/.

University of Pennsylvania Libraries

Scientific research is seldom a solo affair. No one engineer designed the iPad. No single researcher will make the next great breakthrough in cancer treatments. These days science and research blend disciplines and advance through a network of collaborators and scholarly communities. Increasingly these communities traverse institutional, national, and cultural borders, borders that scholars struggle to navigate in keeping abreast of new discoveries and alert

to potential partnerships.

So how do scholars discover the important actors in their fields? How do they find and connect with people engaged with similar problems, methods,

and goals? How do they identify opportunities for funding and creative cooperation on the cutting edge of their disciplines?

This issue of the Ivy Leaves explores the emergence of a new network technology called VIVO that creates cohesion and integration in communities of research by profiling scholars and helping them “match up.” It’s a fascinating challenge in today’s academy, one the Libraries can help the University master and, in the process, advance the conversation about the library of the future.

Smilow Center for Translational Research, home of ITMAT

(Continued inside)

The harvest, integration and dissemination of research data require an underlying IT architecture, which is represented in this illustration. The flow of data must support VIVO’s web presence, those faculty and their proxies who must update their information, and a variety of ancillary services.

For more information about VIVO at Penn, contact:Anne Seymour

Associate Director, Biomedical Library(215) 898-4115

[email protected]

History of the VIVO Project

• The application that supports VIVO was developed at Cornell University in 2003 and imple-

mented locally in 2004.

• In September 2009, seven institutions received a $12.2m stimulus grant from the National

Center for Research Resources of the NIH to enable National Networking with VIVO.

• University of Florida is leading the grant with six other participating institutions: Cornell

University, Weill Cornell Medical College, Indiana University, Washington University School

of Medicine in St. Louis, The Scripps Research Institute, and the Ponce School of Medicine in

Puerto Rico.

• In April 2012, vivo.cornell.edu contained profiles for 16,231 people, 19,847 organizations,

15660 courses, 28,839 academic articles and 6,162 grants.

• VIVO information is public and anyone can access it. Most universities’ sites use a simple

naming convention of vivo.name.edu, where name is the name of the university.

Judith L. BollingerThomas J. Cusack, chairErik Gershwind, vice chairJoseph B. Glossberg, chair emeritus Bernard Goldstein Mark H. GoldsteinSandra GrymesChristine HikawaKim Hirshman, ex officioJames HoesleyAlan S. JacobsMarilyn Weitzman Kahn, ex officioJeffrey C. Keil Rodger R. Krouse Susanna E. Lachs

Charles MacDonaldEdward P. MallyMargy Ellin MeyersonEllen MoelisJoshua A. PolanJoseph F. RascoffBarbara Brizdle SchoenbergLawrence J. Schoenberg, chair emeritusJeffrey L. SeltzerDhiren H. ShahAndrew M. SnyderAlberto VitaleRonald F. E. WeissmanJill Siegel Yablon

How Data Flow Through VIVO

Figure 9. VIVO IT Architecture

University of Pennsylvania Libraries

Plans are also underway to extend VIVO to all programs and disciplines across campus. Penn Libraries has presented the VIVO project to a number of schools that have expressed interest including Nursing, Veterinary, and Engineering. Showcasing services as well as publications is yet another way to employ VIVO — anything from common equipment and lab facilities to IT-based services can be highlighted.

A VIVO site was released at the Perelman School of Medicine in early October 2012 to a select group of faculty. After initial feedback from the faculty, the site is planned for release to all faculty at the School of Medicine in early 2013 and will be opened later in the year to public search engines with linkages to the VIVO federated search site at vivosearch.org.

As of December 2012, vivo.upenn.edu is populated with 4,000 medical school faculty and over 200,000 publications. Faculty profiles include current and recent positions, publications both peer reviewed and research reviewed, and statements of research. The idea is that VIVO itself is not a system of record but can pull data from a number of more authoritative systems. For more information on the IT architecture that supports VIVO, check out Figure 10 on the overleaf.

VIVO: Part of a Suite of Digital ServicesVIVO service complements the Libraries’ “Institutional Repository” program, ScholarlyCommons@Penn , which is part of a wider effort to develop digital repository capabilities for faculty and students. ScholarlyCommons@Penn promotes dissemination of Penn’s research output by preserving it in a freely accessible, long-term archive available on the open web. This serves our goal of showcasing Penn scholarship and allows the University to plan for scholarly communication models built around Open Access models.

Other repository capabilities support the archiving of the Libraries’ million-plus digital objects and the Work and Family Research Network archive, a collaborative effort of the Libraries and the School of Arts and Sciences in support of open access to social science research.

While VIVO is not itself a content repository, it is able to provision content to the growing array of repository services the Libraries invision. VIVO forms a part of a larger framework designed for the curation, discovery, and delivery of research.

As discussed, VIVO contains a range of data harvested from many University and third-party sources, but researchers still require the ability to edit the contents of their VIVO profiles for timeli-ness and accuracy and to describe facets of their work that might not be adequately represented in other places.

For this reason, VIVO sits atop a data integration framework (see Figure 6) that, along with managing machine-harvested data, also provides a private, internal-to-Penn portal for users to edit and add content directly to their profiles. The Libraries will make this functionality available to faculty who have VIVO accounts as well as to their proxies — individuals who may serve a large-scale project or the members of an academic department.

This user portal is itself rich with information, providing at a glance all the data appearing to the outside world through VIVO,

Library as Facilitator VIVO is a platform for managing knowledge and while its tools are

firmly of the 21st century, it is at its heart a natural extension of what libraries have always done: organize, discover, deliver, and preserve learning.

Although there is continuity in our mission, new tools and changing context are accelerating the evolution of the library and expanding the list of things we’re known for within the academy. By using systems to care-fully and methodically collect complicated metrics, we can analyze and identify trends and patterns that give us a more complete picture of the needs of our diverse clientele. At the same time, information is gleaned that can guide collection development and purchasing decisions, and also help the Library to plan and develop new services.

The Library is not only the gateway to and steward of knowledge and information, it is also the driver of change — steering the University and its constituents toward new resources for research, teaching, and learning — and always responding to patron demand.

The Libraries’ involvement with VIVO began with a partnership with the Perelman School of Medicine. Penn, specifically the Institute for Translation Medicine and Therapeutics in the medical school, is a recipient of a Clinical and Translational Science Award (CTSA) from the same agency at NIH that provided funding for VIVO development. The CTSA consortium recommends the use of research networking tools throughout the consortium and more specifically, tools that are compatible with VIVO. Translational medicine is just one example of where collabora-tion is particularly important: the focus is on translating bench research into clinical applications. What better way to do that than by building improved pathways that lead scholars through the mountains of informa-tion that currently exist on the web.

Fostering connections, collaborations, and discoveries across boundaries

and produces results with much more specificity than Google does.” He continues, “Search ‘sleep upenn’ on Google and you get organizations, not people. In VIVO, if your search is for people, you just get people. You can drill down and get hits in relevant subcategories.”

Just as traditional web documents are published in HTML, the semantic web requires a single standard publication format for data. One format common to VIVO, RDF (short for Resource Description Framework), is a coding structure that represents data in the form of subject-predicate-object, or “triplets,” as illustrated in the diagram below. RDFs provide more struc-tured tags for information within webpages that Google and major search engines are also beginning to pick up, enabling them to do more intel-ligent linking. Developers can use RDF code to design applications to do natural language processing, which allows computers to better understand the meaning behind the data on a webpage so that locating “people” at the nexus of “sleep” and “Penn” is more likely.

RDF, also known as RDF Schema (RDFS), is one of a number of semantic web technologies that provides an environment where applications can query data and “draw inferences.” Inference refers to automatic proce-dures that can generate new relationships when information is provided in a structured form, as in an ontology. An ontology, in this context, provides a framework for defining concepts, such as “Person,” “Faculty Member” or “Student,” and describes the properties of and relationships between these different concepts. Inference comes in, for example, when we already know that something – our subject – is a “Faculty Member.” There is no need for us to assert that this faculty member is also a “Person” because the relation-ship of “Faculty Member” as a type of “Person” has already been defined by the ontology. Other examples of semantic web technologies include OWL (Web Ontology Language, used to build ontologies), SKOS (for designing know ledge o rganization systems), and SPARQL (an SQL-like query lan-guage that extracts RDF data from traditional databases).

Once the data is in RDF form, uniform resource identifiers (URIs) merge and map the data from different resources. The naming scheme offered by URIs, combined with the inferencing capabilities in Web 3.0 applications, makes it possible to do more sophisticated data integration. More tools continue to be developed for manipulating RDF data, which in turn is rapidly advancing the Web 3.0 platform.

John Mark Ockerbloom, Penn Libraries’ Digital Library Planner & Architect, describes how VIVO’s ontologies are more sophisticated than traditional database and metadata schemas, “One of the nice things about VIVO is that it has an extendable ontology model that lets you bring in ontologies that specialize in people, publications, equipment, or in research resources that you would use in experimental investigations.”

“We can even extend the ontology with our own objects and properties,” John explains. “One simple extension is the lab phone number used in the medical school system. Although VIVO has the concept of phone numbers it didn’t specifically have the concept for lab phone numbers, so we were easily able to add that.”

Users can find people by searching or browsing. The Researcher Profile in Figure 3 at left displays the profile of Garret A. FitzGerald, McNeil Professor in Translational Medicine and Therapeutics. Because the data is linked, it’s structured so that there are linkages between different elements of the citation. For example, in the first citation in the Researcher Profile image, clicking on the article title brings up a new page with a list of co-authors, each with a link to his/her own profile page.

We can also build a co-author network so that we can see collaborations between authors, as in Figure 4 at left. What happens in the next screen-shot is like “6 Degrees of Separation” — we can analyze the RDF data to view frequent collaborations and then by hovering with the mouse over an author’s name, we can view the number of joint publications between a co-author and Professor FitzGerald. The co-authors are sorted alphabetically in Figure 4; however, in Figure 5, the option to “Sort into communities” was selected. With this configuration, the co-authors are placed near one another if they frequently collaborate with each other and each other’s co-authors in the graph.

The same information in the co-author network visualiza-tion can also be displayed as a time trend (Figure 7), which shows trends in publishing over time, or in textual form as in the table at right. And since the data is also available as a CSV file — a spreadsheet — you can download it into Excel and manipulate the data. Clearly, there are many different ways of manipulating the data from the underlying RDF, including viewing the RDF data itself.

VIVO at Penn

Six Degrees of Separation

The User Portalbut with a variety of customizations for creating VIVO links to people, organizations, even types of lab equipment used in research that may be of possible interest to a field.

While the data integration framework supports the scholars’ inter-face to VIVO, it serves a larger purpose by providing research intelligence for lots of university concerns (VIVO does this as well, but with public data). For example, Penn’s Communications offi-cials can access the framework to locate an expert for a New York Times interview. Data can also be aggregated and made available to statistical software for creating research analytics, an important concern of the Provost and Deans. And finally, the framework can used to broadcast publication data to Penn’s Institutional Repository, the ScholarlyCommons@Penn. The architecture supporting the data ingestion and broadcast capabilities beneath VIVO is further described in Figure 9 .

Figure 5. Co-author network sorted by communities, depicting joint publications.

Figure 4. Co-author Network

Figure 8. Table of Publications by Year

Figure 2. Communities Served by VIVO

Figure 3. Researcher Profile

Figure 1. RDF Triplet

John Mark Ockerbloom Figure 6. User profile editor (left) and schematic of the data integration framework supporting VIVO

“Although there is continuity in our mission, new tools And chAnging context Are AccelerAting the

evolution of the librAry...”

Figure 7. Time Trend