Research Profiling – Using VantagePoint to characterize a body of research publications: A series of short presentations (“podcasts”). Mining Web of Science data. Case example: nano-enhanced, thin-film solar cells. Alan Porter, Director of R&D, Search Technology, Inc. [& Georgia Tech] [email protected]

Posted on 01-Jun-2020

Page 1: Research Profiling – Using VantagePoint to characterize a ... · Research Profiling – Using VantagePoint to characterize a body of research publications: • A series of short

Research Profiling – Using VantagePoint to characterize a body of research publications:
• A series of short presentations (“podcasts”)
• Mining Web of Science data
• Case example: nano-enhanced, thin-film solar cells
• Nano-enhanced Thin-film Solar Cells

Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
[email protected]

Page 2

Pod 1: Overview of Research Profiling & Getting data from Web of Science

Page 3

Research Profiling
1. Overview of the general process & getting data
2. Data into VantagePoint & cleaned
3. Basic descriptors
+ (tentatively):
a) Trends
b) Topical emphases & Changes
c) Influence Measures
d) Research Networking: Maps
e) Locating a body of research: science & geo maps
f) Super Profiling: Breakouts
g) Advanced Analyses

Page 4

Session Strategy
A. ~10 minutes per session – sequential, but you can skip to topics of interest after the introduction
B. Aim: To stimulate your ideas on how to apply VantagePoint to gain insights from sets of research publications
C. This first set of sessions keys on Web of Science (“WOS”) results with a technology topic search focus – i.e., “what?”
D. A future set will key on WOS search results based on searching on a given organization – i.e., a “who?” focus
E. Case example: Nano-enhanced Solar Cells [with special thanks to Ying Guo]

Page 5

5 Stages in Mining External R&D Knowledge
1. Literature review (within research community)
2. Research Profiling: Characterizing a body of research publication activity
• Focus on research activities
• Largely descriptive
3. Tech Mining
• Multiple data to mine
• To generate effective technical intelligence
4. Structured Knowledge Discovery
5. Literature-Based Discovery (“LBD”)

Page 6

Research Profiling 1: Getting Going
A. General overview of the Research Profiling process and its aims
- Questions
- Answers
- Data
B. Search; download

Page 7

How to do Tech Mining (or Research Profiling): 8 steps
1. Spell out the questions and how to answer them
2. Get suitable data
3. Search (iterate)
4. Import into text mining software (e.g., VantagePoint)
5. Clean the data
6. Analyze & interpret
7. Represent the information well – communicate!
8. Standardize and semi-automate where possible

Page 8

Start with the questions!

Page 9

Types of Questions

Text and data mining techniques are good at addressing:
WHO? WHAT? WHEN? WHERE?

Additional questions usually require more human insight:
HOW? WHY?

Page 10

“Answers”: Innovation Indicators
• Technology Life Cycle Indicators
- e.g., growth curve location & projection
• Innovation Context Indicators
- e.g., presence or absence of success factors (funding, standards, infrastructure, etc.)
• Product Value Chain and Market Prospects Indicators
- e.g., applications, sectors engaged

Page 11

Six information types

Technical Information
• Science, Technology & Innovation (“ST&I”) Databases (e.g., Web of Science; CSCD, Thomson Innovation)
• Internet Sources (e.g., Googling)
• Technical Expertise

Contextual Information
• Business, competition, customer, policy, popular content Databases (e.g., Thomson One)
• Internet Sources (e.g., blogs, website profiling)
• Business Expertise

Page 12

VantagePoint Import Filters and Tools

A wealth of diverse information sources for innovation management

On-line Data Sources: Cambridge Scientific Abstracts, Delphion, Dialog, EBSCOHost, Ei Engineering Village, Factiva, ISI Web Of Knowledge, Lexis Nexis, Micropatent, Ovid, Patbase, Questel-Orbit, SilverPlatter, STN, Thomson Innovation

Custom Data: Comma/tab delimited tables, Microsoft Excel and Access, SmartCharts, XML

Databases: Aerospace, Art Abstracts, Biobase, Biological Abstracts, Biological Sciences, Biosis, Biotechno, Business & Industry, CAPlus (AnaVist export), Cassis, CBNB, Claims, Computer & Info Systems, Corrosion, Current Contents, Derwent Biotech Abstracts, Derwent Innovations Index, Derwent World Patent Index, Ei Compendex, EMBase, EnCompass Literature, EnCompass Patents, Energy, EnergySciTech, Engineering Materials Abstr, Envr Sci & Pollution Mgmt, ERIC, EuroPat, FamPat, Focust, Food Sci & Tech, Foodline Market, Foodline Science, Forege, Frosti, FSTA, Gale PROMT, GeoRef, Global Reporter, IFIPAT, IFIUDB, INPADOC, INSPEC, IPA, ISD, ITRD, JAPIO, JICST, Kosmet, LGST, MATBUS, Medline, METADEX, Mgmt and Org Studies, Micropatent Materials, Mobility, NSF Awards, NTIS, Pascal, Patent Citation Index, PCT, PCTPAT, Phin, Pira, Pluspat, PROMT, PsycINFO, PubMed, Rapra, Recent Refs, Reference Manager, Science Citation Index, SciSearch, Scopus, Tech Research, ToxFile, Transport, USApps, USPat, Waternet, WaterResAbs, Web of Science, WeldaSearch, Wisdomain

Record/Field Tools:
• Combine duplicate records
• Remove duplicate records
• Create “frankenrecords” (merge records from dissimilar sources)
• Classify records
• Merge fields
• Clean up fields
• Apply thesauri

Page 13

Management Issues
• Requires Access to External Information (License)
• Bulk Processing is a must
• Download in electronic form
• Requires competence in searching

Page 14

Case Examples

Getting to the data: usually via internet

Page 15

Case Examples

Getting the data: search within databases

Page 16

Case Examples

Retrieving the data

Page 17

Resources
• www.theVantagePoint.com – offers multiple papers and some case analyses
• View the VantagePoint Video Tutorial Series by Paul Oldham on the website, especially Sessions 1, 2 & 3
• Tech Mining by Alan Porter and Scott Cunningham, Wiley, 2005.
• Porter, A.L., Kongthon, A., Lu, J-C., Research Profiling: Improving the Literature Review, Scientometrics, Vol. 53, p. 351-370, 2002.

Page 18

Pod 2: Cleaning the Data in VantagePoint


Page 20

Getting the data into VantagePoint
1. Open VantagePoint
2. File > Import Raw Data File
3. Import Wizard opened: Select Files
4. Select a suitable import filter > Next
5. Select fields to import
- maybe Secondary Fields too
- you can later “import more fields”

Page 21

Case Examples
Summary Sheet

VPT file
- Fields available
- Counts
- Coverage of record set

“Right-Click” to
- set data type
- rename
- view statistics
- etc.

Page 22

Search Refinement
• Confirm your search boundaries: time, geographical, institutional
• Check your search quality
- Precision – how much noise did you retrieve?
- Recall – what did you miss?
• Check in VantagePoint
- Are you finding researchers and organizations you expect?
- Topical inclusion – especially check key terms
– Keywords (authors)
– Keywords Plus (based on recurring phrases in the titles of papers referenced by the documents you’ve retrieved)
– Title NLP (Natural Language Processing) phrases
– Or a combination of these (use “Merge Fields”)
- You may well identify terms to try out in your WOS search
• Ask knowledgeable technical folks to review and advise
• Redo your search and download
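
The precision and recall checks can be made concrete with a small calculation. The sketch below is illustrative only: it assumes you have a hand-checked sample of records judged on-topic (here the hypothetical list `relevant_ids`), which is rarely complete in practice, so recall is usually an estimate.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result set.

    retrieved: record IDs the search returned.
    relevant: record IDs a reviewer judged on-topic (a hypothetical
    hand-checked sample; in practice recall can only be estimated).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative IDs only, not taken from the case dataset.
retrieved_ids = ["r1", "r2", "r3", "r4", "r5"]
relevant_ids = ["r2", "r3", "r4", "r6"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision suggests tightening the search terms; low recall suggests adding term variants before redoing the download.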

Page 23

Data Cleaning
• Just pointers here
• Fields > List Cleanup – Window opens
- Select field
- Select “.fuz” to apply: e.g.,
– Organization Names.fuz
– Person Names.fuz
– General.fuz
– BritishAmericanSpelling.fuz
- Option: Verify matches w/another Field [e.g., Person Names with Author Affiliation]
• Fields > Thesaurus – Window opens
- Select field
- Select “.the” to apply: e.g., provided by Search Technology:
– Country.the
– AcadCorpGov.the
- Or select custom thesauri: e.g.,
– Azerbaijan Natl Acad Sci name variations in WOS.the
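
To give a feel for what fuzzy list cleanup does, here is a rough sketch that groups name variants by string similarity. It uses Python's standard `difflib` and is not the matching logic of VantagePoint's “.fuz” files; the `0.85` threshold, the `normalize` helper and the sample affiliations are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and strip punctuation so trivial variants compare equal."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def group_variants(names, threshold=0.85):
    """Greedily group names whose normalized similarity meets the threshold."""
    groups = []  # each group is a list; its first member is the representative
    for name in names:
        for group in groups:
            ratio = SequenceMatcher(None, normalize(name), normalize(group[0])).ratio()
            if ratio >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical affiliation strings for illustration.
affiliations = [
    "Georgia Inst Technol",
    "Georgia Inst. Technol.",
    "Beijing Inst Technol",
    "GEORGIA INST TECHNOL",
]
for group in group_variants(affiliations):
    print(group)
```

As with any fuzzy match, the proposed groups need a human check, which is the role the “verify matches with another field” option plays.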

Page 24

Whew!

• Remember to check your search coverage.

• Redo a refined search as needed

• Import and clean your data as warranted

• And the next podcast will get us into Research Profiling!

• Basic Descriptors coming up next

Page 25

Pod 3: Dealing with single fields: Getting set to work with Lists


Page 27

Research Profiling Segment 3: “Basic descriptors”
A. Data prep – getting the target fields (variables) all set
B. “Top N” lists and such [single field tallies across the record set]

Page 28

Nano-enhanced Thin-film Solar Cells
Analysis of Global Research Activities with Future Prospects

Ying Guo
Ph.D. Candidate, Beijing Institute of Technology
Visiting Student, Georgia Institute of Technology

Alan L. Porter
Lu Huang

International Association for Management of Technology, 2009

Page 29

Data Prep (1)
1. If you have refined your search, re-import
2. Clean, as suitable to meet your objectives; for basic descriptors, especially check:
a. Publication Years [year.the available, but Web of Science data are usually clean]
b. Countries [apply country.the]
c. Affiliations [organization names.fuz]
d. Authors [person names.fuz; potentially “verify matches with another field” – use Affiliations to help disambiguate names]
3. If you are apt to deal with a topic in the future, save List Cleanup results as your own topical thesaurus.

Page 30

Data Prep (2)
1. Topical fields
a. Make Macro-disciplines from Subject Categories [not a standard VP thesaurus, but we plan to make available on our new academic website]
b. Keywords: decide if you want to MERGE some combination of: Keywords (author’s) & Keywords Plus & Title (NLP) phrases & Abstract (NLP) phrases
2. Keyword Clumping options
a. Human: Scan the combo Keywords field of choice; make groups of interesting terms using FIND
b. Statistical: After a little pre-cleaning, use Factor Mapping to form groups of the top %’s [e.g., 1%, 2%, 5% of records]; examine their performance; pick the best level to get at topical emphases
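
The merge step and the “top %” selection can be sketched in a few lines. This is only an approximation of the idea: it merges keyword fields per record and keeps terms found in at least a chosen share of records. It does not reproduce Factor Mapping, and the field names `author_keywords` and `keywords_plus` are placeholders rather than VantagePoint field names.

```python
from collections import Counter


def merge_fields(record, fields):
    """Union of the terms found in the given fields of one record."""
    merged = set()
    for field in fields:
        merged.update(term.strip().lower() for term in record.get(field, []))
    return merged


def terms_above_share(records, fields, min_share):
    """Terms that occur in at least min_share of the records."""
    counts = Counter()
    for record in records:
        counts.update(merge_fields(record, fields))
    cutoff = min_share * len(records)
    return {term: n for term, n in counts.items() if n >= cutoff}


# Hypothetical records; field names are placeholders, not VantagePoint names.
records = [
    {"author_keywords": ["ZnO", "thin film"], "keywords_plus": ["Thin Film"]},
    {"author_keywords": ["TiO2"], "keywords_plus": ["thin film", "ZnO"]},
    {"author_keywords": ["sol-gel"], "keywords_plus": []},
    {"author_keywords": ["thin film"], "keywords_plus": ["anatase"]},
]
selected = terms_above_share(records, ["author_keywords", "keywords_plus"], 0.5)
print(selected)
```

With a real dataset the share would be closer to the 1%, 2% or 5% levels mentioned above; 0.5 is used here only because the sample has four records.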

Page 31

Top N’s
1. (Document types)
2. (Publication Years)
3. (Times Cited)
4. Countries
5. Affiliations
6. Funding agencies
7. Authors
8. Journals (or Sources)
9. Key terms
10. Subject Categories
11. Macro-Disciplines
12. Organization Types

Page 32

Top N’s
1. Pick your output venue(s) – e.g., in VP and/or MS Excel, Word, Powerpoint
2. Decide if normalization is in order
a. % of All (or something else)
b. Across databases or datasets
c. Table or Figure
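
A “Top N” tally with a “% of All” normalization is easy to sketch. The example below assumes each record carries a multi-valued field (here the hypothetical `countries`) and counts each record once per value; the data are invented.

```python
from collections import Counter


def top_n(records, field, n):
    """Rank the values of one multi-valued field by number of records.

    Returns (value, record_count, percent_of_all_records) tuples.
    """
    counts = Counter()
    for record in records:
        counts.update(set(record.get(field, [])))  # count each record once per value
    total = len(records)
    return [(value, count, 100.0 * count / total) for value, count in counts.most_common(n)]


# Hypothetical records for illustration.
records = [
    {"countries": ["USA", "Germany"]},
    {"countries": ["USA"]},
    {"countries": ["India"]},
    {"countries": ["USA", "India"]},
]
for value, count, pct in top_n(records, "countries", 2):
    print(f"{value}: {count} records ({pct:.0f}% of all)")
```

Because a record can list several countries, the percentages can sum to more than 100%.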

Page 33

DONE! Research Profiling Segment 3: “Basic descriptors”
A. Data prep – getting the target fields (variables) all set
B. “Top N” lists and such [single field tallies across the record set]
- Fields from the dataset
- Derived fields

Up next in Segment 4:
• 2 Fields together (matrices)
• Trends
• Discerning “Hot and New” topics

Page 34

Pod 3+: VP Help & Interactions/Exercises


Page 36

Help!
1. VantagePoint Help
2. Analyst’s Guide

Page 37

Interacting
1. Discuss uses of VantagePoint to answer your research profiling questions
- If you are together in a real or virtual group, discuss materials presented
- Here’s a starter question (next slide)
2. Perform hands-on exercises

Page 38

Interactive Ideas/Exercises

1. What “MOT” (management of technology, or technology policy, or research opportunity) questions might you want to answer from a Web of Science dataset? [next slides illustrative]

Page 39

For S&T Policy Maker and Manager:
• What are national R&D strengths and weaknesses?
• What is the current status of thin-film solar cells, and what future developments are likely?
• How can we gauge relative opportunities for collaborative development and monitor emerging competitors?

[Diagram: MOT questions (Who, What, When, Where, Why, How). Our paper, “Global Research Activities with Future Prospects”, addresses some of these by data mining technology; others need more experts’ inputs (we’re working on this).]

IAMOT 2009

Page 40

We look at:
1. What research fields are involved? (map of science)
2. Quantity: publication numbers and trends
3. Diversity: national contrasts
4. Quality: citations
5. Patterns of research networking: using VantagePoint
6. “Hot” nano-materials

For data:
• Basic Dataset: a global dataset of nano publications downloaded from the SCI
• Search Expression: defined “thin film and (solar or photovoltaic)” as our search expression
• Result Dataset: acquired the dataset containing 1659 records for time period from 2001 to mid-2008
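
The Boolean logic of that search expression can be mimicked on local text to see which records it would keep. This is a rough sketch, not how the SCI search engine matches terms: accepting a hyphen and plural forms is an assumption made here, and the titles are invented.

```python
import re


def matches_search(text):
    """True if text satisfies: thin film AND (solar OR photovoltaic)."""
    lowered = text.lower()
    has_thin_film = re.search(r"\bthin[\s-]+films?\b", lowered) is not None
    has_solar_or_pv = re.search(r"\b(solar|photovoltaics?)\b", lowered) is not None
    return has_thin_film and has_solar_or_pv


# Hypothetical titles for illustration.
titles = [
    "ZnO nanorod thin film for solar cells",
    "Thin-film transistors on glass",
    "Photovoltaic response of CdTe thin films",
]
kept = [title for title in titles if matches_search(title)]
print(kept)
```

Running a filter like this over a sample of downloaded titles and abstracts is one quick way to spot noise before refining the search.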

Page 41

Interactive Ideas/Exercises

2. Search on a topic with colleagues; consider how to refine your search
• Import preliminary search results into VP [do you have the right import filter?]
• Scan key terms, Subject Categories, etc. to check coverage and identify ways to enhance your search
• Refine and rerun the search if warranted and time permits

Page 42

Interactive Ideas/Exercises

3. Given your MOT questions, what data cleaning is in order?
• Step through cleaning actions for each key field
• Apply suitable “List Cleanup” (using appropriate “.fuz” files)
• Apply thesauri as suitable (“.the” files)

Page 43

Interactive Ideas/Exercises

4. A possible exercise: Thesaurus enhancement
• Run the AcadCorpGov.the on your cleaned Affiliations field [get rid of existing groups]
• On that resulting field, “Create Group Using Thesaurus” using this same “.the” file. Select “Group for Each Alias.”
• Research (e.g., Google) & assign some of the multiply-occurring organizations to one of the 4 groups.
• “Create thesaurus using groups”; select all 4 groups; save as AcadCorpGov-new date.the
• Run it as thesaurus; run it to create groups.

Page 44

Interactive Ideas/Exercises

5. A Web of Science Key Terms exercise
• Merge fields (candidates include Keywords-Author; Keywords-Plus, Title NLP phrases; Abstract NLP phrases)
• Apply general.fuz
• Apply stopwords.the
• Make your own “interesting” key terms set
- Scan for an interesting term; use FIND with “select all” and make a GROUP of variations of that term
- Repeat for several interesting terms, making more groups
- Create a new Field from Group Names
• Use Factor Map to statistically make a key terms set
- Make a group in the Key Terms field – selecting interesting terms appearing in, say, >1% of the records
- Run Factor Map – then check out the resulting term grouping (in a new Key Terms field created)
• Compare the two key term sets – either useful?

Page 45

Interacting

1. We’ll insert more candidate exercises as we proceed, without great elaboration – use as you choose

2. Now, back to the show

Page 46

Pod 4: Matrices

Page 47

Nano-enhanced Solar Cell Web of Science Subject Category Concentrations of the Leading Countries

Subject Category | USA | India | Germany | Japan | China
Materials Science, Multidisciplinary | 126 | 132 | 83 | 68 | 63
Physics, Applied | 112 | 56 | 92 | 68 | 53
Physics, Condensed Matter | 59 | 72 | 80 | 47 | 46
Chemistry, Physical | 82 | 26 | 28 | 34 | 32
Energy & Fuels | 26 | 49 | 16 | 9 | 10
Materials Science, Coatings & Films | 24 | 21 | 26 | 17 | 21

Page 48

Acad-Corp-Gov Publishing by Country

Page 49

Cross-national Collaboration

Country | % International Cooperation (among top 10) | USA | India | Germany | Japan | China | France | UK | South Korea | Mexico | Spain
USA | 20.1% | 288 | 5 | 16 | 5 | 6 | 5 | 3 | 9 | 8 | 1
India | 26.4% | 5 | 239 | 4 | 15 | | 4 | 5 | 20 | 10 |
Germany | 27.1% | 16 | 4 | 195 | 10 | 2 | 8 | 8 | 1 | | 4
Japan | 24.2% | 5 | 15 | 10 | 182 | 4 | 2 | 5 | 2 | 1 |
China | 10.4% | 6 | | 2 | 4 | 182 | 2 | 2 | 1 | 2 |
France | 24.8% | 5 | 4 | 8 | 2 | 2 | 113 | 4 | | | 3
UK | 34.5% | 3 | 5 | 8 | 5 | 2 | 4 | 84 | 1 | | 1
South Korea | 52.2% | 9 | 20 | 1 | 2 | 1 | | 1 | 69 | 2 |
Mexico | 38.5% | 8 | 10 | | 1 | 2 | | | 2 | 65 | 2
Spain | 17.5% | 1 | | 4 | | | 3 | 1 | | 2 | 63
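
A co-occurrence matrix like this one can be built from the country list of each record. The sketch below is a minimal version with invented data. The cooperation share it computes (off-diagonal row sum divided by the diagonal cell) is an assumption about how the percentage column was derived; it happens to match the USA row (58 / 288 ≈ 20.1%), and it counts a paper once for each partner country.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_matrix(records):
    """Count records per country (diagonal) and per country pair (off-diagonal)."""
    matrix = defaultdict(int)
    for countries in records:
        unique = sorted(set(countries))
        for country in unique:
            matrix[(country, country)] += 1
        for a, b in combinations(unique, 2):
            matrix[(a, b)] += 1
            matrix[(b, a)] += 1  # keep the matrix symmetric
    return matrix


def cooperation_share(matrix, country):
    """Sum of a country's off-diagonal cells divided by its diagonal cell."""
    shared = sum(n for (row, col), n in matrix.items() if row == country and col != country)
    return shared / matrix[(country, country)]


# Hypothetical country lists, one per record.
records = [["USA"], ["USA", "India"], ["USA"], ["India", "Japan"], ["USA", "Japan", "India"]]
matrix = collaboration_matrix(records)
print(matrix[("USA", "India")], f"{cooperation_share(matrix, 'USA'):.1%}")
```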

Page 50

Matrix-related Topics covered in VantagePoint
• Matrix Viewer
- Multiple visualizations available
• Activity-Diversity
- Scattergram for one variable based on 2 others
• Aduna Clustering
- Colorful visualization of intersecting sets (e.g., co-authoring)
- Capability to zoom to records at those intersections (extending to >2-way connections)

Page 51

Pod 5: Trends

Page 52

Trends
1. Decide if normalization is in order
a. Over time [rate of change]
b. Most recent year
2. Decide if comparative analyses are in order
a. What/who are the benchmarks?
b. How do you want to present your results?
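
One common normalization for trends is to convert yearly counts into shares of that year's output, so countries can be compared even as the whole field grows. The sketch below assumes one (year, country) pair per record and uses invented data; it is one option among several, not the specific normalization used in the case study.

```python
from collections import defaultdict


def yearly_shares(records):
    """Map year -> {country: share of that year's records}."""
    per_year = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for year, country in records:
        per_year[year][country] += 1
        totals[year] += 1
    return {
        year: {country: n / totals[year] for country, n in counts.items()}
        for year, counts in per_year.items()
    }


# Hypothetical (year, country) pairs, one per record.
records = [
    (2001, "USA"), (2001, "USA"), (2001, "China"), (2001, "Japan"),
    (2007, "USA"), (2007, "China"), (2007, "China"), (2007, "India"),
]
shares = yearly_shares(records)
growth = shares[2007]["China"] - shares[2001]["China"]
print(f"China share: {shares[2001]['China']:.2f} -> {shares[2007]['China']:.2f} ({growth:+.2f})")
```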

Page 53

DSSC research by organization type (from SCI)

Page 54

# of author affiliations/paper for DSSC publications (SCI)

Page 55

Nano-Structured ZnO Thin-film Solar Cells Publication by Countries and Years

[Figure: publications per year (axis 0 to 14), 2001 to 2007, for China, India, Japan, USA, Mexico, Germany, South Korea, Spain and France. Callout: China and India are notable!]

Page 56

Nano-Structured ZnO Thin-film Solar Cells Publication: Top 10 countries by Years – note the increasing share for India & China

[Figure: share of publications (0% to 100%) by year, 2001 to 2007; legend: France, Spain, South Korea, Germany, Mexico, USA, Japan, India, China.]

Page 57

DSSC Publications (SCI) with % 2006 or later

Page 58

Share of Nano-enhanced Thin-film Solar Cells Publications by Countries [Science Citation Index, 2001-08 (part-year)]

[Figure: publication share (axis 0 to 0.25) for USA, India, Germany, Japan and China; series for 2001, 2003, 2005 and 2007.]

Page 59

Projecting Nano-enhanced Solar Cell Research Activity

[Figure legend: Actual data; Projected data]

Page 60

Research activity and impact characteristics: First Way

[Figure: scatter plot of quality (# of citations, axis 0 to 2000) against activity (# of records, axis 0 to 350) for USA, India, Germany, China, UK, Japan, France, South Korea, Mexico and Spain.]

• Nodes above the diagonal suggest relatively higher quality (US and UK). Below the diagonal, the closer to the diagonal, the higher the quality of that country’s research.

Page 61

Research activity and impact characteristics: Second Way

[Figure: # of Aged* Citations (axis 0 to 200) against # of Records (axis 0 to 60), 2001 and 2006, for US, China, Germany, India and Japan. Year denoted by start and endpoints: 2001, 2006.]

• The steeper the slope of the line connecting these two points, the greater the increase in quality of the country’s research on this topic
• Compared with Japan and Germany, China and India are upgrading!

Page 62

Pod 6: “Hot topics”

Page 63

Research Profiling – Using VantagePoint to characterize a body of research publications:
• A series of short presentations (“podcasts”)
• Mining Web of Science data
• Case example: nano-enhanced, thin-film solar cells [Ying Guo, Lu Huang & me]
• Nano-enhanced Thin-film Solar Cells

Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
[email protected]

Page 64

“Hot” topic as shown by relative trends

ZnO has attracted increasing attention in recent years and is on track to catch up with TiO2

Page 65

Ratio of Occurrences 2007-08 to those in 2001-06

ratio-recent | # Records | Top 20 Key Terms
1.14 | 47 | conjugated polymer
0.85 | 74 | fabrication
0.85 | 61 | TiO2
0.74 | 66 | chemical vapor deposition
0.65 | 28 | amorphous silicon
0.53 | 72 | morphology
0.52 | 94 | semiconductor
0.50 | 48 | fullerene
0.48 | 49 | zinc oxide
0.46 | 51 | microstructure
0.41 | 65 | spray pyrolysis
0.36 | 49 | heterojunction
0.32 | 37 | CdTe
0.29 | 102 | electrodeposition
0.28 | 92 | CuInSe2
0.24 | 21 | anatase
0.22 | 39 | chemical bath deposition
0.17 | 21 | Cu(In
0.00 | 37 | sol-gel
0.00 | 22 | photoconductivity
0.44 | | Top 20 Key Terms combined
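
The ratio in this table is simple to compute per term once you know the publication years of the records containing it. A minimal sketch with invented years follows; the window boundaries mirror the slide, and returning `None` for terms with no earlier records is a choice made here.

```python
def recent_ratio(term_years, recent=(2007, 2008), earlier=(2001, 2006)):
    """Ratio of a term's record count in the recent window to the earlier one.

    term_years: publication years of the records that contain the term.
    Returns None when the term has no records in the earlier window.
    """
    n_recent = sum(recent[0] <= year <= recent[1] for year in term_years)
    n_earlier = sum(earlier[0] <= year <= earlier[1] for year in term_years)
    return n_recent / n_earlier if n_earlier else None


# Hypothetical years for one term: 4 earlier records, 3 recent ones.
years = [2002, 2004, 2005, 2006, 2007, 2007, 2008]
print(recent_ratio(years))
```

Since the recent window is much shorter than the earlier one, the combined ratio (0.44 in the table) serves as the baseline: terms well above it are relatively “hot”.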

Page 66

New Topics via List Comparison
• Create VP sub-dataset for the recent nano-enhanced solar cells publications (new VP file – I used 2007-08)
• Create VP sub-dataset for the earlier publications (I used 2001-06)
• Under GROUPS, choose LIST COMPARISON; I did so from the select keywords list (82) for 2007-08 and made a new group of those unique to this dataset in comparison to the earlier one.
• Results: “characterize” and “deposit” are the 2 novel ones [Warrants in-depth probing to check if these are meaningful]
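
Conceptually, the list comparison is a set difference between the two term lists. The sketch below shows that idea with invented terms; it is not VantagePoint's LIST COMPARISON feature, and lowercasing before comparing is an assumption.

```python
def unique_to_recent(recent_terms, earlier_terms):
    """Terms present in the recent list but absent from the earlier list."""
    earlier = {term.lower() for term in earlier_terms}
    return sorted({term.lower() for term in recent_terms} - earlier)


# Hypothetical term lists for the two sub-datasets.
recent_terms = ["ZnO", "nanocrystal", "sol-gel", "quantum dot"]
earlier_terms = ["zno", "sol-gel", "CdTe"]
print(unique_to_recent(recent_terms, earlier_terms))
```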

Page 67

Key Terms by First Year

Page 68

New Key Terms Recently

Year        2005  2006  2007  2008
Records      225   334   372   174
New Terms      3     2     2     0

device [8 of 54]

nanocrystal [10 of 25]

DEPOSIT [37 of 52]

TiO2 film [8 of 29]

room temperature [4 of 24]

CHARACTERIZE [25 of 25]

cD [5 of 27]

Page 69

Recent Entrants

• We need not restrict the temporal comparison to key terms or topics

• Same modus operandi can be applied to identify new or recent entrants to the research (e.g., first papers on the topic from a given organization)

• Another variant is the inverse – to look for which participants seem to have abandoned the topic (no publications since Year X)
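Both variants reduce to tracking the first and last publication year of each participant. The Python sketch below assumes records are available as (year, organizations) pairs; the organization names and helper names are hypothetical.

```python
def first_and_last_year(records):
    """Map each organization to (first year, last year) of publication."""
    span = {}
    for year, orgs in records:
        for org in orgs:
            first, last = span.get(org, (year, year))
            span[org] = (min(first, year), max(last, year))
    return span

def recent_entrants(span, since):
    """Organizations whose first paper on the topic appeared in or after `since`."""
    return sorted(org for org, (first, _) in span.items() if first >= since)

def no_pubs_since(span, year_x):
    """Organizations with no publications in or after Year X."""
    return sorted(org for org, (_, last) in span.items() if last < year_x)

# Hypothetical records: (publication year, affiliations)
records = [
    (2001, ["Org A", "Org B"]),
    (2003, ["Org A"]),
    (2006, ["Org B", "Org C"]),
    (2007, ["Org D"]),
    (2008, ["Org C", "Org D"]),
]
span = first_and_last_year(records)
print(recent_entrants(span, 2007))
print(no_pubs_since(span, 2005))
```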

Page 70

Pod 7: Maps

Page 71

Visualization (Maps)

1. VantagePoint Maps
    • Auto-correlation maps
    • Cross-correlation maps
    • Factor maps
2. Social Network Analysis (SNA)
3. Science Overlay Maps
4. Geo-mapping

Page 72

Auto-Correlation Maps

NETFSC Research networking comparison: USA (dispersed) vs Germany (1 central organization)

[Map panels: USA; Germany]

Page 73

Auto-correlation vs. Cross-correlation

Nano-enhanced Solar Cells Country Research Networks

Page 74

Factor Map (Principal Components Analysis) – groups terms based on their tendency to co-occur across records
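To give a feel for what "tendency to co-occur" means, here is a small pure-Python sketch that extracts only the first principal component of a records-by-terms matrix using power iteration. It is not VantagePoint's factor-map procedure (which is not detailed in this deck); the data and the function name `first_component` are illustrative assumptions.

```python
def first_component(matrix, iterations=200):
    """Leading principal component of a records x terms 0/1 matrix (power iteration)."""
    n, k = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in matrix]
    # Covariance between every pair of terms
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    vec = [1.0] + [0.0] * (k - 1)            # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical term occurrences: one row per record, one column per term
terms = ["TiO2", "anatase", "CdTe", "electrodeposition"]
rows = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],   # TiO2 and anatase co-occur
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1],   # CdTe and electrodeposition co-occur
]
loadings = first_component(rows)
for term, value in zip(terms, loadings):
    print(f"{term:<20} {value:+.2f}")
```

Terms that load with the same sign on a component tend to appear in the same records, which is the grouping a factor map displays.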

Page 75

Social Network Analysis (SNA)

• VantagePoint offers several application opportunities
    • Create a sub-dataset for a given country or organization
    • Within that target group, for the given research topic, explore research network connections
• Examples
    • Collaborations
    • Shared interests
    • Discrepancies between interests & collaboration
• Working with Pajek adds options
    • Calculation of networking statistical measures (e.g., centrality)
    • More mapping nuances
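As an illustration of a networking measure such as centrality, the sketch below computes degree centrality from co-authorship data in plain Python. It does not call Pajek; the author names are hypothetical, and degree centrality is only one of several centrality measures.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(records):
    """Share of the other authors in the network that each author has co-authored with."""
    neighbors = defaultdict(set)
    for authors in records:
        for author in authors:
            neighbors[author]                # register authors with no co-authors too
        for a, b in combinations(sorted(set(authors)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {author: len(nb) / (n - 1) for author, nb in neighbors.items()}

# Hypothetical author lists, one per paper
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author A", "Author D"],
    ["Author B", "Author C"],
]
centrality = degree_centrality(papers)
for author, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{author}: {score:.2f}")
```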

Page 76

Science Overlay Map [see: www.idr.gatech.edu – includes “how to make your own map” and full citations]

Nanotechnology Thin-film Solar Cells Publications by Research Field

[Figure: Nano-Thin-Film Publications 2001-08 distribution, overlaid on the base 175 Subject Category science map (Leydesdorff & Rafols, forthcoming). Macro-discipline labels: Cognitive Sci; Computer Sci; Geosciences; Agri Sci; Ecol Sci; Biomed Sci; Chemistry; Physics; Engr Sci; Mtls Sci; Infectious Diseases; Clinical Med; Health Sci; Env Sci & Tech. Labeled Subject Categories: Materials Science, Multidisciplinary; Physics, Applied; Physics, Condensed Matter; Chemistry, Physical; Energy & Fuels; Materials Science, Coatings & Films.]

Page 77

Science Overlay Mapping

1. Start with Web of Science file in VantagePoint
    • Map the Subject Categories, or
    • Cited Subject Categories (somewhat complicated process)
        • Special import filter to extract cited source titles
        • Applies a special Find/Replace thesaurus to those to make titles more standardized (e.g., J vs. Jnl vs. Journal)
        • We then apply a special macro that uses a Journal-to-Subject Category thesaurus to get Cited Subject Categories (“SCs”)
    • Output a vector file of SCs or Cited SCs
2. In Pajek
    • Select the SCI (175 SC) or SCI+SSCI (221 SC) base map
    • Edit your map (e.g., change node size)
    • Output in desired format (e.g., jpeg)
3. In MS PowerPoint
    • Overlay on the appropriate base map
4. Or, go to www.idr.gatech.edu/ and select “Upload Map”
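The Cited Subject Category steps in item 1 can be pictured with a short Python sketch. Everything named here is a stand-in: the two thesauri are tiny hypothetical examples, not the actual VantagePoint thesaurus files, and the output is just a list of counts in base-map order rather than a finished Pajek vector file.

```python
import re
from collections import Counter

# Hypothetical Find/Replace thesaurus: regex pattern -> standardized form
TITLE_THESAURUS = [(r"\b(?:Jnl|Journal)\b", "J")]

# Hypothetical Journal-to-Subject-Category thesaurus
JOURNAL_TO_SC = {
    "J APPL PHYS": "Physics, Applied",
    "THIN SOLID FILMS": "Materials Science, Coatings & Films",
}

def standardize(title):
    """Apply the Find/Replace rules, drop punctuation, collapse spaces, upper-case."""
    for pattern, replacement in TITLE_THESAURUS:
        title = re.sub(pattern, replacement, title, flags=re.IGNORECASE)
    title = re.sub(r"[^\w\s]", "", title)
    return " ".join(title.split()).upper()

def sc_vector(cited_titles, base_map_order):
    """Count cited Subject Categories, reported in the order of the base map."""
    counts = Counter()
    for title in cited_titles:
        sc = JOURNAL_TO_SC.get(standardize(title))
        if sc:                               # titles not in the thesaurus are skipped
            counts[sc] += 1
    return [counts[sc] for sc in base_map_order]

cited = ["J. Appl. Phys.", "Jnl Appl Phys", "Journal Appl. Phys",
         "Thin Solid Films", "Unlisted Source"]
base = ["Physics, Applied", "Energy & Fuels", "Materials Science, Coatings & Films"]
vector = sc_vector(cited, base)
print(vector)
```

Keeping the counts in a fixed category order is what lets them be overlaid on a base map whose nodes are the same categories.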

Page 78

Geomapping

Page 79

Geo-map: Nano-enhanced Solar Cells – European Institutions >=10 papers

Page 80

Pod 7+: Activities for Matrices, Trends, Hot Topics & Maps + … “SuperProfile”

Page 81

Research Profiling Interactions/Exercises for Matrices, Trends & Hot Topics

**The following exercises may be downloaded at http://www.thevantagepoint.com/webinars.cfm

Alan Porter
Director of R&D, Search Technology, Inc. [& Georgia Tech]
[email protected]

Page 82

Interactive Ideas/Exercises
6. Matrix Fun & Games

• In VantagePoint, on your dataset, make a matrix of interest
• Relate analytical possibilities to spell out what MOT questions these could help answer
    • One family of matrices involves Time (e.g., Year) vs. another variable [“When vs. …”]
    • Another family involves Topic (e.g., Key terms, Subject Categories) vs. Performer (e.g., Country, Affiliation, Author) [“What vs. Who”]
    • An important matrix type entails a variable vs. itself (e.g., Author by Author; Country by Country)
• Try out matrix operations
    • Flood the matrix to different degrees [use the Up & Down bars in the upper left corner cell (headings by headings cell)]
    • Open detail views to explore a group of cells together; select an entry in a detail view to see the records to which it pertains in the title view
    • Paint groups of cells; then re-sort
• Address one or more MOT questions via your matrix content
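For readers who want to see what such a matrix holds, here is a minimal Python sketch of a co-occurrence matrix: each cell counts the records in which a row value and a column value appear together. The record layout and field names (`terms`, `countries`) are hypothetical.

```python
from collections import Counter

def cooccurrence_matrix(records, row_field, col_field):
    """Count records in which each (row value, column value) pair appears together."""
    cells = Counter()
    for rec in records:
        for r in set(rec[row_field]):
            for c in set(rec[col_field]):
                cells[(r, c)] += 1
    return cells

# Hypothetical records
records = [
    {"terms": ["TiO2", "sol-gel"], "countries": ["USA"]},
    {"terms": ["TiO2"], "countries": ["USA", "Germany"]},
    {"terms": ["CdTe"], "countries": ["Germany"]},
]

what_vs_who = cooccurrence_matrix(records, "terms", "countries")
print(what_vs_who[("TiO2", "USA")], what_vs_who[("CdTe", "USA")])

# A variable vs. itself: the diagonal holds each country's record count,
# off-diagonal cells hold the number of jointly authored records
country_by_country = cooccurrence_matrix(records, "countries", "countries")
print(country_by_country[("USA", "Germany")], country_by_country[("USA", "USA")])
```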

Page 83

Interactive Ideas/Exercises
7. Matrix Viz

• In VantagePoint, with your matrix open, run the MatrixViewer script. [If the view is too cluttered or not interesting, make a more suitable matrix, possibly by creating a group on a particular variable to select key entities.]
• Try different “Layouts”; select and move entities in the viewer
• Export the most interesting layout to file.

Page 84

Interactive Ideas/Exercises
8. Activity-Diversity

• Make a group of Top Affiliations in your dataset [experiment with this – maybe start with an interesting 15-20]; create a field from group items.
• Open the Activity-Diversity Scatter 3D script; select that field to plot; select the field to measure Diversity (e.g., Subject Categories; Affiliations); select your minimum; try a Graphic Size.
• Say “yes” to “make changes to this chart” – and try out various sizes, axis formats, font and label angles – to get a plot you like. [Hint: You can keep redoing – but you can’t edit once you say ‘no.’]
• Interpret – what can you say about differences in research focus?
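The two axes of such a plot can be approximated as follows. This sketch assumes Activity is the number of records per affiliation and Diversity is the number of distinct values in the chosen field; the script's exact diversity measure is not stated in this deck, so treat that definition, the field names, and the data as assumptions.

```python
def activity_diversity(records, entity_field, diversity_field, minimum=1):
    """Return {entity: (record count, number of distinct diversity-field values)}."""
    stats = {}
    for rec in records:
        for entity in set(rec[entity_field]):
            count, values = stats.get(entity, (0, set()))
            stats[entity] = (count + 1, values | set(rec[diversity_field]))
    # Keep only entities that meet the minimum activity threshold
    return {e: (n, len(v)) for e, (n, v) in stats.items() if n >= minimum}

# Hypothetical records
records = [
    {"affiliations": ["Org A"], "subject_categories": ["Physics, Applied", "Energy & Fuels"]},
    {"affiliations": ["Org A", "Org B"], "subject_categories": ["Physics, Applied"]},
    {"affiliations": ["Org A"], "subject_categories": ["Chemistry, Physical"]},
    {"affiliations": ["Org B"], "subject_categories": ["Physics, Applied"]},
    {"affiliations": ["Org C"], "subject_categories": ["Energy & Fuels"]},
]
points = activity_diversity(records, "affiliations", "subject_categories", minimum=2)
for org, (activity, diversity) in sorted(points.items()):
    print(org, activity, diversity)
```

An organization with high activity but low diversity is concentrating on a narrow slice of the topic; the reverse pattern suggests a broader but thinner engagement.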

Page 85

Interactive Ideas/Exercises
9. Aduna Clustering

• Create a sub-dataset for a country of interest; save the VP file.
• Create a “top n” (e.g., 10-30) affiliations group in that country dataset.
• Run the AdunaClusterMap macro for that group
• Do you spot any interesting inter-institutional collaborations?
    • any collaborations involving more than 2 organizations?
• Consider whether such cluster maps could address your MOT issues
    • At a higher level (inter-country collaboration investigation)
    • At a lower level (co-authoring patterns)
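The question about collaborations involving more than 2 organizations can also be checked directly from the records. A small Python sketch, with hypothetical organization names and a hypothetical helper, not part of the AdunaClusterMap macro:

```python
from collections import Counter

def multi_org_collaborations(records, top_orgs, min_orgs=3):
    """Count records co-authored by at least `min_orgs` of the selected organizations."""
    selected = frozenset(top_orgs)
    groups = Counter()
    for orgs in records:
        involved = selected & frozenset(orgs)
        if len(involved) >= min_orgs:
            groups[tuple(sorted(involved))] += 1
    return groups

# Hypothetical affiliation lists, one per paper
papers = [
    ["Org A", "Org B", "Org C"],
    ["Org A", "Org B"],
    ["Org A", "Org B", "Org C", "Org X"],   # Org X is outside the top-n group
    ["Org B", "Org C", "Org D"],
]
groups = multi_org_collaborations(papers, ["Org A", "Org B", "Org C", "Org D"])
for members, n in groups.most_common():
    print(n, " + ".join(members))
```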

Page 86

Interactive Ideas/Exercises
10. Plot Matrix (for Trend)

• In your VP Summary sheet, check if you have “Number of Authors” [alternatively, “Number of Affiliation (name only)”]; if not, import them (they may be secondary fields in the Web of Science import filter)
• Make a matrix of Number of Authors by Publication Year
• Sort; select all values except the last year.
• Run the PlotMatrix script
• Examine the resulting plots in MS Excel; pick one you like, or make another (like the colorful plot of affiliations by year in Pod 5)
• Interpret
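One way to summarize the Number of Authors by Publication Year matrix is the mean number of authors per paper in each year. The sketch below assumes the last year is dropped because it is only partially indexed (the exercise says to exclude it but does not say why); the data are made up.

```python
from collections import defaultdict

def mean_authors_by_year(records, drop_last_year=True):
    """Return {year: mean number of authors per record}, optionally without the final year."""
    totals = defaultdict(lambda: [0, 0])     # year -> [sum of authors, record count]
    for year, n_authors in records:
        totals[year][0] += n_authors
        totals[year][1] += 1
    years = sorted(totals)
    if drop_last_year and years:
        years = years[:-1]                   # assumed to be an incomplete year
    return {y: totals[y][0] / totals[y][1] for y in years}

# Hypothetical records: (publication year, number of authors)
records = [(2005, 2), (2005, 4), (2006, 3), (2006, 5), (2007, 4), (2007, 6), (2008, 7)]
trend = mean_authors_by_year(records)
for year, mean in trend.items():
    print(year, mean)
```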

Page 87

Interactive Ideas/Exercises
11. Hot and New

• List Comparison
    • Pod 6 illustrated use of “List Comparison” to hunt for new terms in recent years; try your own version.
    • Pick a suitable set of key terms. If these are a subset of a large field, it may be handy to make a new field of just those terms (e.g., by using “Group” capabilities)
    • Break your data set to give “recent” and “earlier” based on publication years; create new Sub-datasets.
    • Under the “Groups” menu, select “List Comparison”; compare the same key terms field in the 2 sub-datasets. Start with “Unique” and explore what may be of interest. [Expect lots of noise, but some interesting “new” to discover.]
    • Try out “List Comparison” for other purposes – e.g., compare two organizations for relative emphases.
• Expectancy Values
    • Open your Publication Years field. Show your key terms of interest in a Detail Window [see next slide]
    • Sort in the Detail Window on the Expectancies (terms with triple or double Up arrows are quick candidate “HOT” topics)

Page 88

Another Way to get at Hot Topics

Page 89

Interactive Ideas/Exercises
12. Tracking Term Appearance: Terms by Year

• Pick a terms field (e.g., “Keywords (author’s)”), but check record coverage
• Open the Terms by Year macro and run for “First Year,” including Summary report in Excel
• Examine the resulting VP list – sort by successive years and see if you can spot a set of potentially interesting “new in Year X” terms for recent years
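The "First Year" idea is simple to state in code: record the earliest publication year in which each term appears, then list terms under that year. A Python sketch with made-up records (this is an illustration of the concept, not the Terms by Year macro itself):

```python
from collections import defaultdict

def terms_by_first_year(records):
    """Group terms under the earliest publication year in which each one appears."""
    first = {}
    for year, terms in records:
        for term in terms:
            if term not in first or year < first[term]:
                first[term] = year
    by_year = defaultdict(list)
    for term, year in first.items():
        by_year[year].append(term)
    return {year: sorted(ts) for year, ts in sorted(by_year.items())}

# Hypothetical records: (publication year, key terms); order does not matter
records = [
    (2006, ["TiO2", "device"]),
    (2005, ["TiO2"]),
    (2007, ["deposit", "device"]),
    (2007, ["characterize"]),
]
new_terms = terms_by_first_year(records)
for year, terms in new_terms.items():
    print(year, terms)
```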

Page 90

Mapping

1. Pod 7 introduced 3 types of VantagePoint maps + a couple of maps that begin with VP analyses, extending to use of other software
2. No separate exercise for Factor Maps in VP here – adapt the ideas presented in Pod 7 to large term sets and try out yourself.
3. No separate exercises for:
    • Science overlay maps [Pod 7 points to a helpful website to make your own maps from Web of Science Subject Category lists]
    • Geo-mapping – Pod 7 presented to illustrate possibilities [there are other ways to create geo-maps from Web of Science affiliation information, processed thru VP, working with mapping software]

Page 91

Interactive Ideas/Exercises
13. Correlation Maps in VantagePoint: Collaboration Patterns within an Organization

• Select the target organization; create a sub-dataset for it
• Open the authors LIST; create a group of interesting authors (e.g., top 15)
• Open the Mapping Wizard; Create an auto-correlation map
• Then go back to the Wizard and Create a cross-correlation map for those same interesting authors; select a topic field (e.g., key terms or Subject Categories)
• Compare the maps – open a couple of Detail Windows to explore what is going on – similarities? Differences?
• Right-click in a map – explore the various options – especially “Edit Preferences”
    • Change the threshold for showing links
    • Change the canvas size
    • Change the font size
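The difference between the two map types in this exercise can be sketched in Python. Here auto-correlation links authors who appear in the same records, while cross-correlation links authors whose topic profiles are similar. Cosine similarity is used purely as an assumed stand-in; the similarity measure VantagePoint applies is not specified in this deck, and the authors and terms are invented.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def profiles(records, field=None):
    """Author -> count vector over records (field=None) or over values of `field`."""
    vecs = {}
    for i, rec in enumerate(records):
        keys = [i] if field is None else rec[field]
        for author in rec["authors"]:
            vec = vecs.setdefault(author, {})
            for k in keys:
                vec[k] = vec.get(k, 0) + 1
    return vecs

# Hypothetical records
records = [
    {"authors": ["Ann", "Bob"], "terms": ["TiO2", "sol-gel"]},
    {"authors": ["Ann", "Bob"], "terms": ["TiO2"]},
    {"authors": ["Cy"], "terms": ["TiO2", "sol-gel"]},
]
auto = profiles(records)             # co-authorship profile
cross = profiles(records, "terms")   # topic profile

auto_ann_cy = cosine(auto["Ann"], auto["Cy"])
cross_ann_cy = cosine(cross["Ann"], cross["Cy"])
print(f"Ann-Bob auto:  {cosine(auto['Ann'], auto['Bob']):.2f}")
print(f"Ann-Cy  auto:  {auto_ann_cy:.2f}")
print(f"Ann-Cy  cross: {cross_ann_cy:.2f}")
```

In this toy data Ann and Cy never co-author, yet their topic profiles are close: the kind of discrepancy between interests and collaboration that comparing the two maps is meant to reveal.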

Page 92

Interactive Ideas/Exercises
14. SuperProfile! [really versatile ‘research profiling’ tool – provides “breakouts” for a set of entities to show other field values]

• From the Scripts menu, select SuperProfile
• Pick a field (or group) that you would like to profile (e.g., Country, Subject Category, Publication Year, Highly Cited papers); make selections as the Wizard poses them
• In the “Browser” then – Pick Column Type (e.g., Top Items); Pick Field (e.g., Subject Category); Pick # (e.g., how many Subject Categories to list out); Pick minimum # to include (the “Remove items” option); Pick output type (sheet is in VP; try Excel); Add to Profile.
• Pick another – Column Type (e.g., another “Top Items” type field) – or let’s try “Percent Recent-Database”; Pick field (Publication Year); Pick # of years to use as “recent”; Add to Profile
• Check the MS Excel results; if not quite what you want, redo; if they are what you want, edit for appearance.
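The kind of "breakout" table this exercise builds can be mimicked in a few lines of Python. The sketch below produces two columns per entity: top items from another field and the percentage of that entity's records falling in the "recent" years. The interpretation of “Percent Recent-Database”, the field names, and the data are assumptions; this is not the SuperProfile script.

```python
from collections import Counter, defaultdict

def super_profile(records, entity_field, top_field, top_n, recent_years):
    """For each entity: record count, top items of another field, percent recent."""
    buckets = defaultdict(list)
    for rec in records:
        for entity in set(rec[entity_field]):
            buckets[entity].append(rec)
    profile = {}
    for entity, recs in buckets.items():
        counts = Counter(v for r in recs for v in set(r[top_field]))
        # Sort by descending count, then alphabetically to break ties
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        recent = sum(1 for r in recs if r["year"] in recent_years)
        profile[entity] = {
            "records": len(recs),
            "top_items": top,
            "percent_recent": round(100 * recent / len(recs)),
        }
    return profile

# Hypothetical records
records = [
    {"country": ["USA"], "sc": ["Physics, Applied", "Energy & Fuels"], "year": 2005},
    {"country": ["USA"], "sc": ["Physics, Applied"], "year": 2007},
    {"country": ["USA", "Germany"], "sc": ["Chemistry, Physical"], "year": 2008},
    {"country": ["Germany"], "sc": ["Chemistry, Physical"], "year": 2004},
]
profile = super_profile(records, "country", "sc", top_n=2, recent_years={2007, 2008})
for country, row in sorted(profile.items()):
    print(country, row)
```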