Research Profiling – Using VantagePoint to characterize a body of research publications:
• A series of short presentations (“podcasts”)
• Mining Web of Science data
• Case example: nano-enhanced, thin-film solar cells
Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
Pod 1: Overview of Research Profiling & Getting data from Web of Science
Research Profiling
1. Overview of the general process & getting data
2. Data into VantagePoint & cleaned
3. Basic descriptors
+ (tentatively):
a) Trends
b) Topical emphases & Changes
c) Influence Measures
d) Research Networking: Maps
e) Locating a body of research: science & geo maps
f) Super Profiling: Breakouts
g) Advanced Analyses
Session Strategy
A. ~10 minutes per session – sequential, but you can skip to topics of interest after the introduction
B. Aim: To stimulate your ideas on how to apply VantagePoint to gain insights from sets of research publications
C. This first set of sessions keys on Web of Science (“WOS”) results with a technology topic search focus – i.e., “what?”
D. A future set will key on WOS search results based on searching on a given organization – i.e., a “who?” focus
E. Case example: Nano-enhanced Solar Cells [with special thanks to Ying Guo]
5 Stages in Mining External R&D Knowledge
1. Literature review (within research community)
2. Research Profiling: Characterizing a body of research publication activity
• Focus on research activities
• Largely descriptive
3. Tech Mining
• Multiple data to mine
• To generate effective technical intelligence
4. Structured Knowledge Discovery
5. Literature-Based Discovery (“LBD”)
Research Profiling 1: Getting Going
A. General overview of the Research Profiling process and its aims [diagram: Questions, Answers, Data]
B. Search; download
How to do Tech Mining (or Research Profiling): 8 steps
1. Spell out the questions and how to answer them
2. Get suitable data
3. Search (iterate)
4. Import into text mining software (e.g., VantagePoint)
5. Clean the data
6. Analyze & interpret
7. Represent the information well – communicate!
8. Standardize and semi-automate where possible
Start with the questions!
Text and data mining techniques are good at addressing:
WHO? WHAT? WHEN? WHERE?
Additional questions usually require more human insight:
HOW? WHY?
Types of Questions
“Answers”: Innovation Indicators
• Technology Life Cycle Indicators
- e.g., growth curve location & projection
• Innovation Context Indicators
- e.g., presence or absence of success factors (funding, standards, infrastructure, etc.)
• Product Value Chain and Market Prospects Indicators
- e.g., applications, sectors engaged
Technical Information
• Science, Technology & Innovation (“ST&I”) Databases (e.g., Web of Science; CSCD, Thomson Innovation)
• Internet Sources (e.g., Googling)
• Technical Expertise
Contextual Information
• Business, competition, customer, policy, popular content Databases (e.g., Thomson One)
• Internet Sources (e.g., blogs, website profiling)
• Business Expertise
Six information types
On-line Data Sources: Cambridge Scientific Abstracts; Factiva; Patbase; Delphion; ISI Web of Knowledge; Questel-Orbit; Dialog; Lexis Nexis; SilverPlatter; EBSCOHost; Micropatent; STN; Ei Engineering Village; Ovid; Thomson Innovation
Custom Data: Comma/tab delimited tables; Microsoft Excel and Access; SmartCharts; XML
Databases: Aerospace; Art Abstracts; Biobase; Biological Abstracts; Biological Sciences; Biosis; Biotechno; Business & Industry; CAPlus (AnaVist export); Cassis; CBNB; Claims; Computer & Info Systems; Corrosion; Current Contents; Derwent Biotech Abstracts; Derwent Innovations Index; Derwent World Patent Index; Ei Compendex; EMBase; EnCompass Literature; EnCompass Patents; Energy; EnergySciTech; Engineering Materials Abstr; Envr Sci & Pollution Mgmt; ERIC; EuroPat; FamPat; Focust; Food Sci & Tech; Foodline Market; Foodline Science; Forege; Frosti; FSTA; Gale PROMT; GeoRef; Global Reporter; IFIPAT; IFIUDB; INPADOC; INSPEC; IPA; ISD; ITRD; JAPIO; JICST; Kosmet; LGST; MATBUS; Medline; METADEX; Mgmt and Org Studies; Micropatent Materials; Mobility; NSF Awards; NTIS; Pascal; Patent Citation Index; PCT; PCTPAT; Phin; Pira; Pluspat; PROMT; PsycINFO; PubMed; Rapra; Recent Refs; Reference Manager; Science Citation Index; SciSearch; Scopus; Tech Research; ToxFile; Transport; USApps; USPat; Waternet; WaterResAbs; Web of Science; WeldaSearch; Wisdomain
Record/Field Tools:
• Combine duplicate records
• Remove duplicate records
• Create “frankenrecords” (merge records from dissimilar sources)
• Classify records
• Merge fields
• Clean up fields
• Apply thesauri
VantagePoint Import Filters and Tools
A wealth of diverse information sources for innovation management
Requires Access to External Information (License)
• Bulk processing is a must
• Download in electronic form
• Requires competence in searching
Management Issues
Case Examples
Getting to the data – usually via the internet
Case Examples
Getting the data – search within databases
Case Examples
Retrieving the data
Resources
• www.theVantagePoint.com – offers multiple papers and some case analyses
• View the VantagePoint Video Tutorial Series by Paul Oldham on the website, especially Sessions 1, 2 & 3
• Tech Mining by Alan Porter and Scott Cunningham, Wiley, 2005.
• Porter, A.L., Kongthon, A., Lu, J-C., Research Profiling: Improving the Literature Review, Scientometrics, Vol. 53, p. 351-370, 2002.
Pod 2: Cleaning the Data in VantagePoint
Getting the data into VantagePoint
1. Open VantagePoint
2. File > Import Raw Data File
3. Import Wizard opened: Select Files
4. Select a suitable import filter > Next
5. Select fields to import
- maybe Secondary Fields too
- you can later “import more fields”
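VantagePoint's import filter does this parsing for you. To show what a raw download contains, here is a minimal Python sketch, assuming the Web of Science field-tagged (“plain text”) export layout: a two-letter tag at the start of a line (e.g., AU, TI, PY), indented continuation lines, and a line reading ER closing each record. The names `parse_wos_tagged` and `field_text` are hypothetical, and the sketch ignores the special cases a production import filter handles.

```python
from typing import Dict, List


def parse_wos_tagged(text: str) -> List[Dict[str, List[str]]]:
    """Parse a Web of Science field-tagged export into a list of records.

    Assumed layout (hypothetical sketch): "XX value" lines where XX is a
    two-letter field tag, indented continuation lines, "ER" ending each
    record, and FN/VR/EF header/footer lines, which are skipped.
    """
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    tag = None
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line[0].isspace():
            # Continuation line: another value (e.g., the next author) of the current tag.
            if tag is not None:
                current[tag].append(line.strip())
            continue
        if line == "ER":
            # End of record: store it and start a fresh one.
            if current:
                records.append(current)
            current, tag = {}, None
        elif line == "EF" or line[:2] in ("FN", "VR"):
            continue  # file header/footer, not record data
        else:
            tag, value = line[:2], line[3:].strip()
            current.setdefault(tag, []).append(value)
    return records


def field_text(record: Dict[str, List[str]], tag: str) -> str:
    """Join a wrapped single-valued field (e.g., TI, the title) into one string."""
    return " ".join(record.get(tag, []))
```

Multi-valued tags such as AU keep one list entry per line; single-valued tags such as TI are rejoined with `field_text`.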
Case Examples: Summary Sheet
VPT file
- Fields available
- Counts
- Coverage of record set
“Right-click” a field to
- set data type
- rename
- view statistics
- etc.
Search Refinement
• Confirm your search boundaries: time, geographical, institutional
• Check your search quality
- Precision – how much noise did you retrieve?
- Recall – what did you miss?
• Check in VantagePoint
- Are you finding the researchers and organizations you expect?
- Topical inclusion – especially check key terms:
– Keywords (author’s)
– Keywords Plus (based on recurring phrases in the titles of papers referenced by the documents you’ve retrieved)
– Title NLP (Natural Language Processing) phrases
– Or a combination of these (use “Merge Fields”)
• You may well identify terms to try out in your WOS search
• Ask knowledgeable technical folks to review and advise
• Redo your search and download
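“Merge Fields” combines several keyword-type fields into one. A minimal Python sketch of the same idea for a single record, assuming a record is a dict of field tag → list of lines, and that author keywords (DE) and Keywords Plus (ID) hold semicolon-separated terms that may wrap across lines. The function name is hypothetical; VantagePoint's own merge may treat duplicates differently.

```python
from typing import Dict, List, Sequence


def merge_keyword_fields(record: Dict[str, List[str]],
                         tags: Sequence[str] = ("DE", "ID")) -> List[str]:
    """Merge several keyword fields of one record into a single term list.

    Wrapped lines are rejoined, terms are split on ';' and stripped, and
    duplicates are dropped case-insensitively (first spelling seen wins).
    """
    merged: List[str] = []
    seen = set()
    for tag in tags:
        joined = " ".join(record.get(tag, []))
        for term in joined.split(";"):
            term = term.strip()
            key = term.lower()
            if term and key not in seen:
                seen.add(key)
                merged.append(term)
    return merged
```

Matching is case-insensitive because the same term can appear with different capitalization in different keyword fields.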
Data Cleaning
• Just pointers here
• Fields > List Cleanup – window opens; select field; select a “.fuz” file to apply, e.g.:
– Organization Names.fuz
– Person Names.fuz
– General.fuz
– BritishAmericanSpelling.fuz
Option: Verify matches with another field [e.g., Person Names with Author Affiliation]
• Fields > Thesaurus – window opens; select field; select a “.the” file to apply, e.g., those provided by Search Technology:
– Country.the
– AcadCorpGov.the
Or select custom thesauri, e.g.:
– Azerbaijan Natl Acad Sci name variations in WOS.the
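The .fuz and .the files encode Search Technology's own matching rules, which are not reproduced here. As a rough illustration of the two-pass idea (explicit thesaurus mapping first, then similarity-based grouping of what remains, followed by human review), here is a minimal Python sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure; the names `apply_thesaurus` and `fuzzy_groups` and the 0.85 threshold are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def apply_thesaurus(values: List[str], thesaurus: Dict[str, str]) -> List[str]:
    """Replace any value whose normalized form is a known alias (thesaurus step)."""
    lookup = {normalize(alias): canon for alias, canon in thesaurus.items()}
    return [lookup.get(normalize(v), v) for v in values]


def fuzzy_groups(values: List[str], threshold: float = 0.85) -> Dict[str, List[str]]:
    """Greedily group similar spellings (fuzzy step; threshold is an assumption).

    The most frequent spelling becomes each group's representative; every
    other spelling joins the first group whose representative is similar enough.
    """
    groups: Dict[str, List[str]] = {}
    for value, _ in Counter(values).most_common():
        for rep in groups:
            if SequenceMatcher(None, normalize(value), normalize(rep)).ratio() >= threshold:
                groups[rep].append(value)
                break
        else:
            groups[value] = [value]
    return groups
```

Similarity matching can merge distinct organizations with similar names, which is why the “verify matches with another field” option above matters.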
Whew!
• Remember to check your search coverage.
• Redo a refined search as needed
• Import and clean your data as warranted
• And the next podcast will get us into Research Profiling!
• Basic Descriptors coming up next
Pod 3: Dealing with single fields: Getting set to work with Lists
Research Profiling Segment 3: “Basic descriptors”
A. Data prep – getting the target fields (variables) all set
B. “Top N” lists and such [single field tallies across the record set]
Nano-enhanced Thin-film Solar Cells
Analysis of Global Research Activities with Future Prospects
Ying Guo
Ph.D. Candidate, Beijing Institute of Technology
Visiting Student, Georgia Institute of Technology
Alan L. Porter, Lu Huang
International Association for Management of Technology, 2009
Data Prep (1)
1. If you have refined your search, re-import
2. Clean – as suitable to meet your objectives; for basic descriptors, especially check:
a. Publication Years [year.the available, but Web of Science data are usually clean]
b. Countries [apply country.the]
c. Affiliations [organization names.fuz]
d. Authors [person names.fuz; potentially “verify matches with another field” – use Affiliations to help disambiguate names]
3. If you are apt to deal with a topic in the future, save List Cleanup results as your own topical thesaurus.
Data Prep (2)
1. Topical fields
a. Make Macro-disciplines from Subject Categories [not a standard VP thesaurus, but we plan to make it available on our new academic website]
b. Keywords: decide if you want to MERGE some combination of: Keywords (author’s) & Keywords Plus & Title (NLP) phrases & Abstract (NLP) phrases
2. Keyword clumping options
a. Human: Scan the combined Keywords field of choice; make groups of interesting terms using FIND
b. Statistical: After a little pre-cleaning, use Factor Mapping to form groups of the top %’s [e.g., 1%, 2%, 5% of records]; examine their performance; pick the best level to get at topical emphases
Top N’s
1. (Document types)
2. (Publication Years)
3. (Times Cited)
4. Countries
5. Affiliations
6. Funding agencies
7. Authors
8. Journals (or Sources)
9. Key terms
10. Subject Categories
11. Macro-Disciplines
12. Organization Types
Top N’s
1. Pick your output venue(s) – e.g., in VP and/or MS Excel, Word, PowerPoint
2. Decide if normalization is in order
a. % of All (or something else)
b. Across databases or datasets
c. Table or Figure
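A “Top N” list with “% of All” normalization is a per-record tally. A minimal Python sketch, assuming cleaned records are dicts mapping a field name (hypothetical, e.g. "Countries") to a list of values; `top_n` is a hypothetical name. Each value counts once per record, so percentages for a multi-valued field can sum to more than 100%.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_n(records: List[Dict[str, List[str]]], field: str,
          n: int = 10) -> List[Tuple[str, int, float]]:
    """Return (value, # records containing it, % of all records) for the top n values."""
    if not records:
        return []
    counts: Counter = Counter()
    for rec in records:
        counts.update(set(rec.get(field, [])))  # count each value once per record
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    total = len(records)
    return [(value, c, round(100.0 * c / total, 1)) for value, c in ranked[:n]]
```

For example, with four records where USA appears in two, the USA row reads 2 records, 50.0%.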
DONE! Research Profiling Segment 3: “Basic descriptors”
A. Data prep – getting the target fields (variables) all set
B. “Top N” lists and such [single field tallies across the record set]
- Fields from the dataset
- Derived fields
Up next in Segment 4:
• 2 Fields together (matrices)
• Trends
• Discerning “Hot and New” topics
Pod 3+: VP Help & Interactions/Exercises
Help!
1. VantagePoint Help
2. Analyst’s Guide
Interacting
1. Discuss uses of VantagePoint to answer your research profiling questions
– If you are together in a real or virtual group, discuss the materials presented
– Here’s a starter question (next slide)
2. Perform hands-on exercises
Interactive Ideas/Exercises
1. What “MOT” (management of technology, or technology policy, or research opportunity) questions might you want to answer from a Web of Science dataset? [the next slides are illustrative]
For S&T Policy Maker and Manager:
• What are national R&D strengths and weaknesses?
• What is the current status of thin-film solar cell research, and what future developments are likely?
• How can we gauge relative opportunities for collaborative development, as well as monitor emerging competitors?
MOT questions: Who? What? When? Where? Why? How?
Our Paper: Global Research Activities with Future Prospects, by Data Mining Technology (IAMOT 2009)
Need more experts’ inputs (we’re working on this)
We look at:
1. What research fields are involved? – map of science
2. Quantity – publication numbers and trends
3. Diversity – national contrasts
4. Quality – citations
5. Patterns of research networking – using VantagePoint
6. “Hot” nano-materials
For data, we:
• downloaded a global dataset of nano publications from the Science Citation Index (SCI)
• defined “thin film and (solar or photovoltaic)” as our search expression
• acquired a dataset of 1659 records covering 2001 to mid-2008
[Diagram: Basic Dataset → Search Expression → Result Dataset]
Interactive Ideas/Exercises
2. Search on a topic with colleagues; consider how to refine your search
• Import preliminary search results into VP [do you have the right import filter?]
• Scan key terms, Subject Categories, etc. to check coverage and identify ways to enhance your search
• Refine and rerun the search if warranted and time permits
Interactive Ideas/Exercises
3. Given your MOT questions, what data cleaning is in order?
• Step through cleaning actions for each key field
• Apply suitable “List Cleanup” (using appropriate “.fuz” files)
• Apply thesauri as suitable (“.the” files)
Interactive Ideas/Exercises
4. A possible exercise: Thesaurus enhancement
• Run the AcadCorpGov.the on your cleaned Affiliations field [get rid of existing groups]
• On that resulting field, “Create Group Using Thesaurus” using this same “.the” file. Select “Group for Each Alias.”
• Research (e.g., Google) & assign some of the multiply-occurring organizations to one of the 4 groups.
• “Create thesaurus using groups”; select all 4 groups; save as AcadCorpGov-new date.the
• Run it as thesaurus; run it to create groups.
Interactive Ideas/Exercises
5. A Web of Science Key Terms exercise
• Merge fields (candidates include Keywords-Author; Keywords-Plus; Title NLP phrases; Abstract NLP phrases)
• Apply general.fuz
• Apply stopwords.the
• Make your own “interesting” key terms set
– Scan for an interesting term; use FIND with “select all” and make a GROUP of variations of that term
– Repeat for several interesting terms, making more groups
– Create a new Field from Group Names
• Use Factor Map to statistically make a key terms set
– Make a group in the Key Terms field – selecting interesting terms appearing in, say, >1% of the records
– Run Factor Map – then check out the resulting term grouping (in a new Key Terms field created)
• Compare the two key term sets – is either useful?
Interacting
1. We’ll insert more candidate exercises as we proceed, without great elaboration – use as you choose
2. Now, back to the show
Pod 4: Matrices
Nano-enhanced Solar Cell Web of Science Subject Category Concentrations of the Leading Countries
Subject Category | USA | India | Germany | Japan | China
Materials Science, Multidisciplinary | 126 | 132 | 83 | 68 | 63
Physics, Applied | 112 | 56 | 92 | 68 | 53
Physics, Condensed Matter | 59 | 72 | 80 | 47 | 46
Chemistry, Physical | 82 | 26 | 28 | 34 | 32
Energy & Fuels | 26 | 49 | 16 | 9 | 10
Materials Science, Coatings & Films | 24 | 21 | 26 | 17 | 21
Acad-Corp-Gov Publishing by Country
Cross-national Collaboration
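% International Cooperation (among top 10)
Country | % Intl. Coop. | USA | India | Germany | Japan | China | France | UK | South Korea | Mexico | Spain
USA | 20.1% | 288 | 5 | 16 | 5 | 6 | 5 | 3 | 9 | 8 | 1
India | 26.4% | 5 | 239 | 4 | 15 | – | 4 | 5 | 20 | 10 | –
Germany | 27.1% | 16 | 4 | 195 | 10 | 2 | 8 | 8 | 1 | – | 4
Japan | 24.2% | 5 | 15 | 10 | 182 | 4 | 2 | 5 | 2 | 1 | –
China | 10.4% | 6 | – | 2 | 4 | 182 | 2 | 2 | 1 | 2 | –
France | 24.8% | 5 | 4 | 8 | 2 | 2 | 113 | 4 | – | – | 3
UK | 34.5% | 3 | 5 | 8 | 5 | 2 | 4 | 84 | 1 | – | 1
South Korea | 52.2% | 9 | 20 | 1 | 2 | 1 | – | 1 | 69 | 2 | –
Mexico | 38.5% | 8 | 10 | – | 1 | 2 | – | – | 2 | 65 | 2
Spain | 17.5% | 1 | – | 4 | – | – | 3 | 1 | – | 2 | 63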
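The percentages in this table are consistent with a simple rule: add up a country's co-authored record counts with the other nine countries and divide by its own record count (USA: 58/288 = 20.1%; India: 63/239 = 26.4%). Recomputing every row this way matches the slide to within 0.1 point. A paper involving three or more of these countries is counted once per partner, so this measures collaboration links per paper rather than a strict share of internationally co-authored papers. A minimal Python sketch, assuming cleaned records are dicts of field name → list of values; `cooccurrence` and `pct_international` are hypothetical names. Calling `cooccurrence` with two different fields (e.g., Subject Categories and Countries) yields a table like the Subject Category Concentrations one.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], int]


def cooccurrence(records: List[Dict[str, List[str]]], field_a: str, field_b: str) -> Matrix:
    """Count the records in which value x of field_a and value y of field_b co-occur.

    With field_a == field_b (e.g., Countries by Countries), cell (x, x) is
    simply the number of records that mention x.
    """
    m: Matrix = defaultdict(int)
    for rec in records:
        for x, y in product(set(rec.get(field_a, [])), set(rec.get(field_b, []))):
            m[(x, y)] += 1
    return dict(m)


def pct_international(m: Matrix, countries: List[str]) -> Dict[str, float]:
    """Sum of a country's co-occurrences with the other listed countries,
    as a % of its own record count (countries without a diagonal cell are skipped)."""
    result = {}
    for c in countries:
        own = m.get((c, c), 0)
        if own:
            links = sum(m.get((c, o), 0) for o in countries if o != c)
            result[c] = round(100.0 * links / own, 1)
    return result
```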
Matrix-related Topics covered in VantagePoint
• Matrix Viewer
- Multiple visualizations available
• Activity-Diversity
- Scattergram for one variable based on 2 others
• Aduna Clustering
- Colorful visualization of intersecting sets (e.g., co-authoring)
- Capability to zoom to records at those intersections (extending to >2-way connections)
Pod 5: Trends
Trends
1. Decide if normalization is in order
a. Over time [rate of change]
b. Most recent year
2. Decide if comparative analyses are in order
a. What/who are the benchmarks?
b. How do you want to present your results?
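Two common normalizations are year-over-year rate of change and the share of records in the most recent years (the same idea as the “% 2006 or later” chart in Pod 5). A minimal Python sketch over yearly record counts; the function names and numbers are hypothetical. If the last year in a download is partial (as with mid-2008 here), drop or annualize it first, or it will look like a decline.

```python
from typing import Dict


def yearly_growth(counts: Dict[int, int]) -> Dict[int, float]:
    """Year-over-year % change in record counts (years with a zero base are skipped)."""
    years = sorted(counts)
    return {
        y: round(100.0 * (counts[y] - counts[p]) / counts[p], 1)
        for p, y in zip(years, years[1:])
        if counts[p]
    }


def share_recent(counts: Dict[int, int], since: int) -> float:
    """Fraction of all records published in `since` or later."""
    total = sum(counts.values())
    recent = sum(c for y, c in counts.items() if y >= since)
    return round(recent / total, 3) if total else 0.0
```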
[Figure: DSSC (dye-sensitized solar cell) research by organization type (from SCI)]
[Figure: # of author affiliations/paper for DSSC publications (SCI), 2001-2007, by country: China, India, Japan, USA, Mexico, Germany, South Korea, Spain, France]
China and India are notable!
[Figure: Nano-Structured ZnO Thin-film Solar Cells Publication by Countries and Years – share of publications (0-100%) per year, 2001-2007, for China, India, Japan, USA, Mexico, Germany, South Korea, Spain, France]
Nano-Structured ZnO Thin-film Solar Cells Publication: Top 10 countries by Years – note the increasing share for India & China
[Figure: DSSC Publications (SCI) with % 2006 or later – USA, India, Germany, Japan, China]
[Figure: Share of Nano-enhanced Thin-film Solar Cells Publications by Countries – Science Citation Index, 2001-08 (part-year)]
Projecting Nano-enhanced Solar Cell Research Activity
[Figure: actual and projected data]
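The slide does not say how the projected values were computed. One simple option, sketched here in Python under that caveat, fits an ordinary least-squares straight line to annual record counts and extrapolates it; S-shaped growth curves (e.g., logistic) are a common alternative for technology life cycle indicators. The function names and data are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(years: List[int], counts: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of counts ≈ a + b * year; returns (a, b)."""
    n = len(years)
    mx, my = sum(years) / n, sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, counts))
    b = sxy / sxx
    return my - b * mx, b


def project(counts: Dict[int, float], until: int) -> Dict[int, float]:
    """Extrapolate the fitted line from the year after the last observed year to `until`."""
    years = sorted(counts)
    a, b = fit_line(years, [counts[y] for y in years])
    return {y: round(a + b * y, 1) for y in range(years[-1] + 1, until + 1)}
```

A straight line keeps growing indefinitely, so it suits short horizons; a saturating curve is the better choice when a field is thought to be maturing.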
[Figure: scatter plot of countries (USA, India, Germany, China, UK, Japan, France, South Korea, Mexico, Spain); x-axis: activity – # of records; y-axis: quality – # of citations]
• Nodes above the diagonal suggest relatively higher quality (US and UK). For nodes below the diagonal, the closer a node lies to the diagonal, the higher the quality of that country’s research.
Research activity and impact characteristics – First Way
[Figure: US, China, Germany, India and Japan, each plotted as a line from its 2001 point to its 2006 point; x-axis: # of Records, 2001 and 2006; y-axis: # of Aged* Citations, 2001 and 2006]
• The steeper the slope of the line connecting these two points, the greater the increase in quality of the country’s research on this topic
• Compared with Japan and Germany, China and India are upgrading!
Research activity and impact characteristics – Second Way
Pod 6: “Hot topics”
Research Profiling – Using VantagePoint to characterize a body of research publications:
• A series of short presentations (“podcasts”)
• Mining Web of Science data
• Case example: nano-enhanced, thin-film solar cells [Ying Guo, Lu Huang & me]
Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
ZnO has attracted increasing attention in recent years and is on track to catch up with TiO2
“Hot” topic as shown by relative trends
Ratio of Occurrences 2007-08 to those in 2001-06
Ratio (recent) | # Records | Top 20 Key Terms
1.14 | 47 | conjugated polymer
0.85 | 74 | fabrication
0.85 | 61 | TiO2
0.74 | 66 | chemical vapor deposition
0.65 | 28 | amorphous silicon
0.53 | 72 | morphology
0.52 | 94 | semiconductor
0.50 | 48 | fullerene
0.48 | 49 | zinc oxide
0.46 | 51 | microstructure
0.41 | 65 | spray pyrolysis
0.36 | 49 | heterojunction
0.32 | 37 | CdTe
0.29 | 102 | electrodeposition
0.28 | 92 | CuInSe2
0.24 | 21 | anatase
0.22 | 39 | chemical bath deposition
0.17 | 21 | Cu(In
0.00 | 37 | sol-gel
0.00 | 22 | photoconductivity
0.44 | | Top 20 Key Terms combined
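The ratio divides a term's occurrences in 2007-08 by its occurrences in 2001-06; for example, 25 recent and 22 earlier occurrences give 25/22 ≈ 1.14. Because the recent window (2007 to mid-2008) is much shorter than the earlier one (2001-06), a term is better judged against the all-terms baseline (0.44 for the top 20 combined) than against 1. A minimal Python sketch, assuming records are dicts with an integer "Year" and list-valued fields; `recent_ratio` is a hypothetical name, and it counts records containing a term rather than raw occurrences.

```python
from collections import Counter
from typing import Dict, List


def recent_ratio(records: List[dict], field: str, recent_from: int) -> Dict[str, float]:
    """Ratio of each term's record count in recent years to its count earlier.

    Terms with no earlier records get float('inf'): they are new, not merely rising.
    """
    recent: Counter = Counter()
    earlier: Counter = Counter()
    for rec in records:
        bucket = recent if rec["Year"] >= recent_from else earlier
        bucket.update(set(rec.get(field, [])))  # once per record
    return {
        t: round(recent[t] / earlier[t], 2) if earlier[t] else float("inf")
        for t in set(recent) | set(earlier)
    }
```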
New Topics via List Comparison
• Create a VP sub-dataset for the recent nano-enhanced solar cells publications (new VP file – I used 2007-08)
• Create a VP sub-dataset for the earlier publications (I used 2001-06)
• Under GROUPS, choose LIST COMPARISON; I did so from the select keywords list (82) for 2007-08 and made a new group of those unique to this dataset in comparison to the earlier one.
• Results: “characterize” and “deposit” are the 2 novel ones [warrants in-depth probing to check if these are meaningful]
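List Comparison is set arithmetic on two term lists. A minimal Python sketch, assuming case-insensitive matching; `compare_lists` and its result keys are hypothetical, and the example terms are illustrative only.

```python
from typing import Dict, Iterable, Set


def compare_lists(list_a: Iterable[str], list_b: Iterable[str]) -> Dict[str, Set[str]]:
    """Case-insensitive comparison of two term lists.

    'unique_a' holds terms found only in list_a (e.g., the recent sub-dataset),
    'unique_b' terms found only in list_b, and 'shared' terms in both
    (reported with list_a's spelling).
    """
    a = {t.lower(): t for t in list_a}
    b = {t.lower(): t for t in list_b}
    return {
        "unique_a": {a[k] for k in a.keys() - b.keys()},
        "unique_b": {b[k] for k in b.keys() - a.keys()},
        "shared": {a[k] for k in a.keys() & b.keys()},
    }
```

Comparing recent against earlier terms gives candidate new topics in `unique_a`; the same call on two organizations' term lists shows their relative emphases.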
Key Terms by First Year
New Key Terms Recently
Year | 2005 | 2006 | 2007 | 2008
Records | 225 | 334 | 372 | 174
New Terms | 3 | 2 | 2 | 0
New terms: device [8 of 54]; nanocrystal [10 of 25]; DEPOSIT [37 of 52]; TiO2 film [8 of 29]; room temperature [4 of 24]; CHARACTERIZE [25 of 25]; cD [5 of 27]
Recent Entrants
• We need not restrict the temporal comparison to key terms or topics
• Same modus operandi can be applied to identify new or recent entrants to the research (e.g., first papers on the topic from a given organization)
• Another variant is the inverse – to look for which participants seem to have abandoned the topic (no publications since Year X)
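Tracking each entity's first and last appearance year covers new terms, new entrants, and apparent drop-outs with the same bookkeeping. A minimal Python sketch, assuming records are dicts with an integer "Year" and list-valued fields; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Dict[str, Tuple[int, int]]


def first_and_last_year(records: List[dict], field: str) -> Span:
    """Map each value of `field` (term, organization, author, ...) to (first year, last year)."""
    span: Span = {}
    for rec in records:
        y = rec["Year"]
        for v in set(rec.get(field, [])):
            lo, hi = span.get(v, (y, y))
            span[v] = (min(lo, y), max(hi, y))
    return span


def new_since(span: Span, year: int) -> Set[str]:
    """Values whose first appearance is in `year` or later (new terms or entrants)."""
    return {v for v, (first, _) in span.items() if first >= year}


def silent_since(span: Span, year: int) -> Set[str]:
    """Values with no records in `year` or later (candidate drop-outs)."""
    return {v for v, (_, last) in span.items() if last < year}
```

Whether a silent organization has really abandoned a topic still needs checking, since it may simply publish on it under different terms.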
Pod 7: Maps
Visualization (Maps)
1. VantagePoint Maps
– Auto-correlation maps
– Cross-correlation maps
– Factor maps
2. Social Network Analysis (SNA)
3. Science Overlay Maps
4. Geo-mapping
Auto-Correlation Maps: NETFSC research networking comparison
[Figure: side-by-side research networks, USA and Germany]
USA (dispersed) vs. Germany (1 central organization)
[Figure: Auto-correlation vs. Cross-correlation – Nano-enhanced Solar Cells Country Research Networks]
Factor Map (Principal Components Analysis) – groups terms based on their tendency to co-occur across records
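Auto- and cross-correlation maps differ in what gets correlated. A minimal Python sketch under our own assumptions, not VantagePoint's documented algorithm: auto-correlation compares two items (e.g., countries or authors) by whether they appear in the same records, while cross-correlation compares them by their profiles over a second field (e.g., key terms). Both use Pearson correlation; function and field names are hypothetical. A Factor Map goes further, running principal components analysis on such correlations, which is not shown here.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def auto_correlation(records: List[dict], field: str, a: str, b: str) -> float:
    """Correlate two values of one field by whether each record contains them."""
    va = [1.0 if a in rec.get(field, []) else 0.0 for rec in records]
    vb = [1.0 if b in rec.get(field, []) else 0.0 for rec in records]
    return pearson(va, vb)


def cross_correlation(records: List[dict], field: str, a: str, b: str,
                      other_field: str) -> float:
    """Correlate two values of `field` by their count profiles over `other_field`."""
    vocab = sorted({v for rec in records for v in rec.get(other_field, [])})

    def profile(item: str) -> List[float]:
        counts = dict.fromkeys(vocab, 0.0)
        for rec in records:
            if item in rec.get(field, []):
                for v in rec.get(other_field, []):
                    counts[v] += 1
        return [counts[v] for v in vocab]

    return pearson(profile(a), profile(b))
```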
Social Network Analysis (SNA)
• VantagePoint offers several application opportunities
– Create a sub-dataset for a given country or organization
– Within that target group, for the given research topic, explore research network connections
• Examples
– Collaborations
– Shared interests
– Discrepancies between interests & collaboration
• Working with Pajek adds options
– Calculation of networking statistical measures (e.g., centrality)
– More mapping nuances
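Pajek reads networks from a plain-text .net file. A minimal Python sketch, assuming records are dicts of field name → list of author names: it builds a weighted co-authorship network (edge weight = number of shared records), computes raw degree (number of distinct collaborators, unnormalized), and writes the `*Vertices`/`*Edges` layout of Pajek's .net format. Function names are hypothetical; single-author records add no edges, so authors who never co-author do not appear in the export.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def coauthor_edges(records: List[dict], field: str = "Authors") -> Counter:
    """Weighted undirected edges: number of records each author pair shares."""
    edges: Counter = Counter()
    for rec in records:
        for a, b in combinations(sorted(set(rec.get(field, []))), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges: Counter) -> Dict[str, int]:
    """Raw degree: number of distinct collaborators per author."""
    deg: Counter = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def to_pajek(edges: Counter) -> str:
    """Serialize the network as Pajek .net text (1-based vertex numbers)."""
    nodes = sorted({n for pair in edges for n in pair})
    index = {n: i for i, n in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[n]} "{n}"' for n in nodes]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(lines)
```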
[Figure: Nano-Thin-Film Publications 2001-08 Distribution – overlay over base 175 Subject Category Science Map (Leydesdorff & Rafols, forthcoming). Base-map regions: Cognitive Sci; Computer Sci; Geosciences; Agri Sci; Ecol Sci; Biomed Sci; Chemistry; Physics; Engr Sci; Mtls Sci; Infectious Diseases; Clinical Med; Health Sci; Env Sci & Tech. Highlighted Subject Categories: Materials Science, Multidisciplinary; Physics, Applied; Physics, Condensed Matter; Chemistry, Physical; Energy & Fuels; Materials Science, Coatings & Films]
Science Overlay Map [see: www.idr.gatech.edu – includes “how to make your own map” and full citations]
Nanotechnology Thin-film Solar Cells Publications by Research Field
Science Overlay Mapping
1. Start with a Web of Science file in VantagePoint
• Map the Subject Categories, or
• Cited Subject Categories (somewhat complicated process)
– Special import filter to extract cited source titles
– Applies a special Find/Replace thesaurus to those to make titles more standardized (e.g., J vs. Jnl vs. Journal)
– We then apply a special macro that uses a Journal-to-Subject Category thesaurus to get Cited Subject Categories (“SCs”)
• Output a vector file of SCs or Cited SCs
2. In Pajek
• Select the SCI (175 SC) or SCI+SSCI (221 SC) base map
• Edit your map (e.g., change node size)
• Output in desired format (e.g., jpeg)
3. In MS PowerPoint
• Overlay on the appropriate base map
4. Or, go to www.idr.gatech.edu/ – select “Upload Map”
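The Cited Subject Category steps in item 1 amount to two lookups followed by a tally. A minimal Python sketch in which everything is hypothetical: the find/replace rules, the journal-to-Subject-Category mapping (real assignments come from the Web of Science journal lists), and the function names. It also assumes the Pajek vector (.vec) layout is a `*Vertices N` header followed by one value per base-map node in the base map's node order; check this against the base-map files before relying on it.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def standardize_source(title: str, replacements: Dict[str, str]) -> str:
    """Upper-case a cited source title, drop periods, and apply whole-word replacements."""
    t = " ".join(title.upper().replace(".", " ").split())
    for pattern, repl in replacements.items():
        t = re.sub(rf"\b{re.escape(pattern)}\b", repl, t)
    return t


def cited_subject_categories(cited_sources: Iterable[str],
                             replacements: Dict[str, str],
                             journal_to_sc: Dict[str, List[str]]) -> Counter:
    """Tally Subject Categories of cited sources; unmapped sources are skipped."""
    counts: Counter = Counter()
    for src in cited_sources:
        for sc in journal_to_sc.get(standardize_source(src, replacements), []):
            counts[sc] += 1
    return counts


def to_pajek_vector(counts: Counter, sc_order: List[str]) -> str:
    """One value per base-map node, in the base map's node order (assumed .vec layout)."""
    return "\n".join([f"*Vertices {len(sc_order)}"] + [str(counts.get(sc, 0)) for sc in sc_order])
```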
Geomapping
Geo-map: Nano-enhanced Solar Cells – European Institutions >=10 papers
Pod 7+: Activities for Matrices, Trends, Hot Topics & Maps + … “SuperProfile”
Research Profiling Interactions/Exercises for Matrices, Trends & Hot Topics
The following exercises may be downloaded at http://www.thevantagepoint.com/webinars.cfm
Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
Interactive Ideas/Exercises
6. Matrix Fun & Games
• In VantagePoint, on your dataset, make a matrix of interest
• Relate analytical possibilities: what MOT questions could these help answer?
– One family of matrices involves Time (e.g., Year) vs. another variable [“When vs. …”]
– Another family involves Topic (e.g., Key terms, Subject Categories) vs. Performer (e.g., Country, Affiliation, Author) [“What vs. Who”]
– An important matrix type entails a variable vs. itself (e.g., Author by Author; Country by Country)
• Try out matrix operations
– Flood the matrix to different degrees [use the Up & Down bars in the upper left corner cell (headings by headings cell)]
– Open detail views to explore a group of cells together; select an entry in a detail view to see the records to which it pertains in the title view
– Paint groups of cells; then re-sort
• Address one or more MOT questions via your matrix content
Interactive Ideas/Exercises
7. Matrix Viz
• In VantagePoint, with your matrix open, run the MatrixViewer script. [If the view is too cluttered or not interesting, make a more suitable matrix, possibly by creating a group on a particular variable to select key entities.]
• Try different “Layouts”; select and move entities in the viewer
• Export the most interesting layout to file.
Interactive Ideas/Exercises
8. Activity-Diversity
• Make a group of Top Affiliations in your dataset [experiment with this – maybe start with an interesting 15-20]; create a field from group items.
• Open the Activity-Diversity Scatter 3D script; select that field to plot; select the field to measure Diversity (e.g., Subject Categories; Affiliations); select your minimum; try a Graphic Size.
• Say “yes” to “make changes to this chart” – and try out various sizes, axis formats, font and label angles – to get a plot you like.[Hint: You can keep redoing – but you can’t edit once you say ‘no.’]
• Interpret – what can you say about differences in research focus?
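The script plots activity against diversity. A minimal Python sketch, assuming activity is an entity's record count and diversity is the number of distinct values of a second field (e.g., Subject Categories) across those records; the script's own diversity measure is not specified here, and normalized or entropy-based measures are common alternatives. Field and function names are hypothetical; `min_records` plays the role of the script's minimum.

```python
from typing import Dict, List, Tuple


def activity_diversity(records: List[dict], entity_field: str, diversity_field: str,
                       entities: List[str], min_records: int = 1) -> Dict[str, Tuple[int, int]]:
    """For each entity: (activity = # records, diversity = # distinct diversity_field values)."""
    out: Dict[str, Tuple[int, int]] = {}
    for e in entities:
        recs = [r for r in records if e in r.get(entity_field, [])]
        if len(recs) < min_records:
            continue  # below the minimum activity threshold
        distinct = {v for r in recs for v in r.get(diversity_field, [])}
        out[e] = (len(recs), len(distinct))
    return out
```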
Interactive Ideas/Exercises
9. Aduna Clustering
• Create a sub-dataset for a country of interest; save the VP file.
• Create a “top n” (e.g., 10-30) affiliations group in that country dataset.
• Run the AdunaClusterMap macro for that group
• Do you spot any interesting inter-institutional collaborations?
– any collaborations involving more than 2 organizations?
• Consider whether such cluster maps could address your MOT issues
– At a higher level (inter-country collaboration investigation)
– At a lower level (co-authoring patterns)
Interactive Ideas/Exercises
10. Plot Matrix (for Trend)
• In your VP Summary sheet, check if you have “Number of Authors” [alternatively, “Number of Affiliation (name only)”]; if not import (they may be secondary fields in the Web of Science import filter)
• Make a matrix of Number of Authors by Publication Year
• Sort; select all values except the last year.
• Run the PlotMatrix script
• Examine the resulting plots in MS Excel; pick one you like, or make another (like the colorful plot of affiliations by year in Pod 5)
• Interpret
Interactive Ideas/Exercises
11. Hot and New
• List Comparison
• Pod 6 illustrated use of “List Comparison” to hunt for new terms in recent years; try your own version.
• Pick a suitable set of key terms. If these are a subset of a large field, it may be handy to make a new field of just those terms (e.g., by using “Group” capabilities)
• Break your data set to give “recent” and “earlier” based on publication years; create new Sub-datasets.
• Under the “Groups” menu, select “List Comparison”; compare the same key terms field in the 2 sub-datasets. Start with “Unique” and explore what may be of interest. [Expect lots of noise, but some interesting “new” to discover.]
• Try out “List Comparison” for other purposes – e.g., compare two organizations for relative emphases.
• Expectancy Values
– Open your Publication Years field. Show your key terms of interest in a Detail Window [see next slide]
– Sort in the Detail Window on the Expectancies (terms with triple or double Up arrows are quick candidate “HOT” topics)
Another Way to get at Hot Topics
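The Up arrows are VantagePoint's own display. This sketch assumes an expectancy compares each observed cell count with the count expected from its row and column totals (the independence baseline of a chi-square test); VantagePoint's exact formula and arrow thresholds may differ. A minimal Python version over a term × period count matrix; `expectancy` is a hypothetical name.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]


def expectancy(matrix: Dict[Cell, int]) -> Dict[Cell, float]:
    """Observed / expected ratio per cell, where expected = row total * column total / grand total.

    Ratios well above 1 flag term-period cells that occur more often than the
    marginals alone predict (candidate hot topics when the column is a recent period).
    """
    rows: Dict[str, int] = {}
    cols: Dict[str, int] = {}
    for (r, c), n in matrix.items():
        rows[r] = rows.get(r, 0) + n
        cols[c] = cols.get(c, 0) + n
    total = sum(rows.values())
    return {
        (r, c): round(n / (rows[r] * cols[c] / total), 2)
        for (r, c), n in matrix.items()
        if rows[r] and cols[c]  # skip cells whose expected count would be zero
    }
```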
Interactive Ideas/Exercises
12. Tracking Term Appearance: Terms by Year
• Pick a terms field (e.g., “Keywords (author’s)”) – but check record coverage
• Open the Terms by Year macro and run for “First Year,” including Summary report in Excel
• Examine the resulting VP list – sort by successive years and see if you can spot a set of potentially interesting “new in Year X” terms for recent years
Mapping
1. Pod 7 introduced 3 types of VantagePoint maps + a couple of maps that begin with VP analyses, extending to use of other software
2. No separate exercise for Factor Maps in VP here – adapt the ideas presented in Pod 7 to large term sets and try them out yourself.
3. No separate exercises for:
• Science overlay maps [Pod 7 points to a helpful website to make your own maps from Web of Science Subject Category lists]
• Geo-mapping – Pod 7 presented this to illustrate possibilities [there are other ways to create geo-maps from Web of Science affiliation information, processed through VP, working with mapping software]
Interactive Ideas/Exercises
13. Correlation Maps in VantagePoint: Collaboration Patterns within an Organization
• Select the target organization; create a sub-dataset for it
• Open the authors LIST; create a group of interesting authors (e.g., top 15)
• Open the Mapping Wizard; create an auto-correlation map
• Then go back to the Wizard and create a cross-correlation map for those same interesting authors; select a topic field (e.g., key terms or Subject Categories)
• Compare the maps – open a couple of Detail Windows to explore what is going on – similarities? Differences?
• Right-click in a map – explore the various options – especially “Edit Preferences”
– Change the threshold for showing links
– Change the canvas size
– Change the font size
Interactive Ideas/Exercises
14. SuperProfile! [really versatile ‘research profiling’ tool – provides “breakouts” for a set of entities to show other field values]
• From the Scripts menu, select SuperProfile
• Pick a field (or group) that you would like to profile (e.g., Country, Subject Category, Publication Year, Highly Cited papers); make selections as the Wizard poses them
• In the “Browser” then – Pick Column Type (e.g., Top Items); Pick Field (e.g., Subject Category); Pick # (e.g., how many Subject Categories to list out); Pick minimum # to include (the “Remove items” option); Pick output type (sheet is in VP; try Excel); Add to Profile.
• Pick another – Column Type (e.g., another “Top Items” type field) – or let’s try “Percent Recent-Database”; Pick field (Publication Year); Pick # of years to use as “recent”; Add to Profile
• Check the MS Excel results; if not quite what you want, redo; if they are what you want, edit for appearance.
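A SuperProfile is essentially a set of per-entity breakouts. A minimal Python sketch of two column types, assuming records are dicts with an integer "Year" and list-valued fields: “Top Items” of another field, and a percent-recent column. We assume “percent recent” means the share of an entity's records published in or after a cutoff year; VantagePoint's “Percent Recent-Database” option may be defined differently. `super_profile` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def super_profile(records: List[dict], entity_field: str, entities: List[str],
                  top_field: str, top_k: int = 3,
                  recent_from: Optional[int] = None) -> List[dict]:
    """One row per entity: record count, top values of `top_field`, optional % recent."""
    rows = []
    for e in entities:
        recs = [r for r in records if e in r.get(entity_field, [])]
        counts = Counter(v for r in recs for v in set(r.get(top_field, [])))
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties alphabetical
        row = {"entity": e, "records": len(recs), "top": [v for v, _ in ranked[:top_k]]}
        if recent_from is not None and recs:
            recent = sum(r["Year"] >= recent_from for r in recs)
            row["pct_recent"] = round(100.0 * recent / len(recs), 1)
        rows.append(row)
    return rows
```

The rows map directly onto spreadsheet columns, which is the form the Excel output above takes.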