A scientific framework to measure results of research investments
DESCRIPTION
I describe in detail a science-based approach for measuring the results of science investments. The framework is based on a social-scientific, rather than a bibliometric, approach to describing the scientific enterprise. This means studying and explaining the creation, transmission and adoption of scientific ideas, rather than describing and classifying documents. Ideas are generated within social (both scientific and economic) networks; science funding works in part by enabling those networks to exist and expand. As Kahneman has pointed out, “the first big breakthrough in our understanding of the mechanism of association was an improvement in a method of measurement,” and the key to better scientific measurements is better data. I describe the methodical and principled capture of a broad spectrum of data describing the research process and the research networks that drive that process. I discuss the approach to building a powerful new data infrastructure that will enable the integration of these data and thus permit analysis of the role of funding in stimulating the creation, transmission and adoption of ideas through those networks.
TRANSCRIPT
A scientific framework to measure results of research investments
Julia Lane, American Institutes for Research, University of Strasbourg and University of Melbourne
And many colleagues
Key ideas
• Need sensible scientific framework which:
– Is theoretically driven
– Uses appropriate unit of analysis
– Is generalizable and replicable
• Need sensible empirical framework which:
– Uses 21st Century technology to collect data
– Uses 21st Century technology to link activities
• Need framework which can be international
Outline
• Motivation
• Conceptual Framework
• Empirical Frameworks
• Next steps
Motivation
The President recently asked his Cabinet to carry out an aggressive management agenda for his second term that delivers a smarter, more innovative, and more accountable government for citizens. An important component of that effort is strengthening agencies' abilities to continually improve program performance by applying existing evidence about what works, generating new knowledge, and using experimentation and innovation to test new approaches to program delivery.
Motivation
How much should a nation spend on science? What kind of science? How much from private versus public sectors? Does demand for funding by potential science performers imply a shortage of funding or a surfeit of performers? ... A new “science of science policy” is emerging, and it may offer more compelling guidance for policy decisions and for more credible advocacy.
We spend a lot on research: What’s the impact?
Classic Questions for Measuring Impact
• What is the impact or causal effect of a program on outcome of interest?
• Is a given program effective compared to the absence of the program?
• When a program can be implemented in several ways, which one is the most effective?
Classic Example: Measuring Impact
Illustration of swan-necked flask experiment used by Louis Pasteur to test the hypothesis of spontaneous generation
Classic Challenge: Theory of Change
Key ideas
• Need sensible scientific framework which:
– Is theoretically driven (theory of change)
– Uses appropriate unit of analysis (people)
– Is generalizable and replicable (open)
Outline
• Motivation
• Conceptual Framework
• Empirical Frameworks
• Next steps
The Theory of Change
Classic Challenge: Theory of Change
Writing the Framework Down

(1)  Y_it^(1) = Y_it^(2) α + X_it^(1) λ + ε_it
(2)  Y_it^(2) = Z_it β + X_it^(2) μ + η_it

where the subscripts i and t denote project teams and quarters, and ε and η stand for unobserved factors, serendipity, and errors of measurement and specification (and can possibly include individual unobserved project-team characteristics). The output variables are measured by Y^(1) and the research collaboration variables by Y^(2). Both are determined by sets of control variables, X^(1) and X^(2), that can overlap and are truly exogenous or predetermined, and by the variable of key interest, Z (funding).
Source: Jason Owen-Smith
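As a stylized illustration, the two-equation system above can be simulated and fit by least squares. Everything here is invented for illustration (parameter values, sample size, variable names); it is not STAR METRICS data, and the simulated errors are drawn independently so that single-equation least squares is consistent:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical project-team x quarter observations

# Hypothetical regressors: Z = funding of key interest, X1/X2 = controls
Z = rng.normal(size=n)
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)

beta, mu = 0.8, 0.3    # "true" parameters of eq. (2), chosen arbitrarily
alpha, lam = 0.5, 0.4  # "true" parameters of eq. (1), chosen arbitrarily

# Eq. (2): collaboration Y2 driven by funding Z and controls X2
Y2 = beta * Z + mu * X2 + rng.normal(scale=0.5, size=n)
# Eq. (1): output Y1 driven by collaboration Y2 and controls X1
Y1 = alpha * Y2 + lam * X1 + rng.normal(scale=0.5, size=n)

# Estimate each equation by ordinary least squares
beta_hat, mu_hat = np.linalg.lstsq(np.column_stack([Z, X2]), Y2, rcond=None)[0]
alpha_hat, lam_hat = np.linalg.lstsq(np.column_stack([Y2, X1]), Y1, rcond=None)[0]
```

With real data, ε and η may be correlated with the regressors, so a single-equation fit like this would need instruments or panel methods; the sketch only shows the mechanics of the two-equation structure.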
Outline
• Approach: Doing an Evaluation
• Conceptual Framework
• Empirical Framework
• Next steps
STAR METRICS approach
• Level 1: Document the levels and trends in the scientific workforce supported by federal funding.
• Level 2: Develop an open automated data infrastructure and tools that will enable the documentation and analysis of a subset of the inputs, outputs, and outcomes resulting from federal investments in science.
[Diagram: STAR METRICS pilot architecture. Institutional HR, procurement, subcontracting, and financial systems record how agency awards, state funding, and endowment funding are disbursed to hire personnel, buy from vendors, and engage contractors. The STAR pilot acquires and analyzes these project records, supporting detailed characterization and summary; benefit analysis of jobs, purchases, and contracts; direct benefit analysis; intellectual property benefit analysis; and innovation analysis (start-ups, papers, patents), alongside existing institutional reporting to agencies.]
Automated Data Construction
• Most data efforts focus on hand-curated data
• Scalable, low cost / burden: algorithmically link researchers to their support (grants), scientific output (publications and citations), technological products (patents and drug approvals), and impacts (health, economy, productivity)
• Link to linked employee / employer data
• Probabilistic matches
The Theory of Change
Key ideas
• Need sensible empirical framework which:
– Uses 21st Century technology to collect data (cybertools... and SciELO-like activities)
– Uses 21st Century technology to link activities (disambiguation; ORCID)
Example in practice: Caltech Project
• Funded by Sloan Foundation
• Goals
– Use STAR METRICS Level 1 data to examine production of science at project, PI and lab level
– Interview Caltech PIs to get qualitative grounding
– Begin to build STAR METRICS Level 2 data linking PEOPLE to results: publications, patents, altmetrics, dissertations, and Census data on student placements, firm startups, etc.
– Make source code and database infrastructure available to all STAR METRICS institutions
[Chart: award funding for one researcher, 2000–2012, distinguishing ongoing awards from new awards.]
[Chart: lab staffing, 2003–2012, by category: undergraduate, technician / staff scientist, research, research analyst, faculty, post-doc, graduate students.]
Vendor Expenditures on one project

Industry                                    Expenditures   Number of transactions
Other Professional Equipment and Supplie        3,386.36   121
Rail transportation                                36.00     1
Scenic and Sightseeing Transportation, L          896.12     4
Commercial Banking                              4,616.00     2
Testing Laboratories                            8,312.92   100
Pharmaceutical Preparation Manufacturing          629.63    12
Biological Product (except Diagnostic) M        2,480.45    37
Electrometallurgical Ferroalloy Product           189.80     8
Electronic Computer Manufacturing               6,831.41    49
Semiconductor and Related Device Manufac        3,672.51    73
Analytical Laboratory Instrument Manufac       61,464.87    49
Scheduled Passenger Air Transportation          5,892.79    19
Passenger car rental                            1,015.28     8
Research and development in the physical        1,654.88    38
Colleges, Universities, and Professional         -110.88     1
[Chart: publications of the researcher, 2000–2012.]
[Chart: PhD theses supervised (number of theses).]
[Charts: USPTO and EPO patents for the same researcher, 2000–2012.]
New research: Exploratory regressions
Y (outputs) can be expanded
• Currently Y is just publications, patents, PhD students
• Census interest suggests we can develop additional economic outcomes:
– Wages and career trajectories for postdocs / grad students
– Firm startups, growth and productivity
• And... substantial competence in SciSIP community in building out science and social outcomes
Use data to estimate production functions at project level

VARIABLES               Pubs       Patents    PhDs       Pubs       Patents    PhDs
Award expenditures                                       0.057***   0.0018     0.0093**
Labor inputs            0.19***    0.056***   0.10***    0.12***    0.053***   0.089***
Share post-doc          0.43**     -0.071     -0.078     0.23       -0.077     -0.11
Share PhD               0.072      -0.023     0.27***    -0.14      -0.030     0.23***
Equipment               0.010      0.00055    0.0029     -0.015     -0.00024   -0.0011
Share computer          -0.36      -0.042     -0.25      -0.41      -0.044     -0.26
Share optics            -0.21      0.68**     0.22       0.016      0.68**     0.26
Seniority               -0.0098*** -0.00081   0.00014    -0.010***  -0.00083   0.000030
Full Prof.              0.081      0.027      0.072**    0.054      0.026      0.068**
Share ARRA              0.94***    -0.018     -0.10      0.71**     -0.026     -0.14
harvard                 -0.026     -0.041     -0.0024    -0.069     -0.042     -0.0095
mit                     0.065      0.092      -0.00068   0.051      0.091      -0.0030
caltech                 0.23**     0.028      0.046      0.21**     0.027      0.043
physics                 0.26***    -0.047     0.0047     0.22***    -0.048     -0.0017
chemistry               0.40***    0.064      0.17**     0.38***    0.063      0.17**
engineering             0.60***    0.030      0.22***    0.59***    0.030      0.22***
Calendar year dummies   yes        yes        yes        yes        yes        yes
Constant                0.11       -0.021     -0.16***   0.018      -0.024     -0.17***
Observations            2,590      2,590      2,590      2,590      2,590      2,590
R-squared               0.321      0.084      0.205      0.365      0.084      0.210
Robust standard errors in parentheses
Note: Same approach as that used to derive the widely accepted result that R&D generated more than half of US productivity growth in the 1990s; these data are preliminary and not to be cited.
Next example: CIC Activity
Now building out across multiple universities and frames
Bruce Weinberg, OSU
The CIC
• University of Chicago
• University of Illinois
• Indiana University
• University of Iowa
• University of Maryland
• University of Michigan
• Michigan State University
• University of Minnesota
• University of Nebraska-Lincoln
• Northwestern University
• Ohio State University
• Pennsylvania State University
• Purdue University
• Rutgers University
• University of Wisconsin-Madison
STEM Workforce Training: A Quasi-Experimental Approach Using the Effects of Research Funding
Joint with Bruce Weinberg, Vetle Torvik, Lee Giles and Chris Morphew
Overview and Goals
• The impact of research environment and funding structures on the training and outcomes of graduate students and post-docs
• Build automated, extensible data infrastructure
• Pilot for international community
Data Structure
[Diagram: CIC STAR METRICS data (grants / labs / teams; sample) is linked, via web-based and algorithmic disambiguation and Microsoft Academic (publications, patents, citations, grants), to the SED (characteristics, initial outcomes) and the LEHD (employment and wages within the US), feeding econometric models (1), (2), and (3).]
Identification
• Relate outcomes to length of training, team, and funding structure
• ARRA funding as “experiment” to shift length of training
– Lightly Reviewed Grants
– Supplements to Existing Grants
– Payline Extension Grants
• Also, presumably, shift teams toward postdocs
• Get returns to time in training under different team and funding structures
[Figure 2. Research Design for Payline Extension. Probability of funding plotted against proposed project “quality”: projects above the non-ARRA payline were likely funded even without ARRA; projects between the non-ARRA payline and the ARRA-extended payline were likely funded only under ARRA; projects below the extended payline were unlikely to be funded even with ARRA.]
Possible Analyses
• Estimate how training environment affects retention in US, sector of employment, wages
• Estimate how flows of trainees to companies affect productivity
• Measure impact on innovation by linking text of patents to the research done in the labs where people trained
• Open the knowledge transfer black box and estimate returns to training
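Linking patent text to lab research can be done with standard text-similarity machinery. A minimal sketch, using smoothed TF-IDF and cosine similarity over invented example texts (the talk does not specify the actual method, which would use full abstracts and much larger corpora):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Smoothed TF-IDF vectors for a list of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical texts: a patent abstract and two labs' publication abstracts
patent = "optical imaging of protein folding dynamics".split()
lab_a = "measuring protein folding dynamics with optical imaging".split()
lab_b = "labor market outcomes of graduate students".split()

v_pat, v_a, v_b = tfidf_vectors([patent, lab_a, lab_b])
sim_a, sim_b = cosine(v_pat, v_a), cosine(v_pat, v_b)
```

The patent scores far higher against the topically related lab, which is the signal a patent-to-lab linkage would exploit.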
What are the results of research (internationally)?
• ASTRA (Australia)
• HELIOS (France)
• CAELIS (Czech Republic)
• NORDSTJERNEN (Norway)
• STELLAR (Germany)
• TRICS (UK)
• SOLES (Spain)
Building new tools
We spend a lot on research: What’s the impact?
Key ideas
• Need sensible scientific framework which:
– Is theoretically driven (theory of change)
– Uses appropriate unit of analysis (people)
– Is generalizable and replicable (open)
• Need sensible empirical framework which:
– Uses 21st Century technology to collect data (cybertools... and SciELO-like activities)
– Uses 21st Century technology to link activities (disambiguation; ORCID)
• Need framework which can be international (develop community of practice with common interests)
Thank you!
Julia Lane
www.julialane.org
www.cssip.org