CHAPTER 8: Other Evaluation Methods Used by ATP
As reported in Chapters 4–7, the survey method, case study methods, and econometric/statistical methods have been applied extensively to the evaluation of ATP projects. This chapter presents several additional methods that have been used to a lesser extent—two traditional and three newly emerging. The first two, expert judgment and bibliometrics, are well-known evaluation methods. The last three—a unique approach to estimating knowledge spillovers using social network analysis, a cost index model for estimating consumer market spillovers, and a composite performance rating system for project and portfolio analyses—are still emerging.
Their usefulness can be expected to increase as ATP and other programs gain experience with them and refine them further. The three emerging methods underscore a persistent goal of ATP’s evaluation effort: to explicate new and improved analytical techniques for assessing ATP’s program impacts. This effort reflects a realization that measuring the impacts of a program such as ATP challenges existing models and tools. Table 8–1 lists 11 studies illustrating in turn the use of expert judgment, bibliometrics, and the three emerging methods.
Like most public research and development (R&D) programs, ATP has used expert judgment in evaluating its program and general performance. In addition to using expert judgment as a stand-alone method, researchers have used expert judgment in various ways to support other evaluation methods, primarily by providing a basis for certain aspects of quantitative analyses. Examples from ATP evaluation studies show the use of expert judgment for assessment of project and program performance. Use of the method for program relevance review, for benchmarking of organizations, programs, staff, or facilities, for personnel decisions, and for proposal review lies outside the boundary of this treatment. 243
*Note that five of seven studies using expert judgment are included; the others are Branscomb, 2002, and Ruegg, 2003. Three of four studies using bibliometrics are included; the other is a report by Powell similar to that shown. Three studies—one of which (Fogarty, et al. 2000 draft) is included under Emerging Methods—used sociometric methods. The others using sociometric methods were Przybylinski, 2000 draft, and Darby, et al., 2002.
The first study presented illustrates the use of expert judgment to provide broad program assessment. The second study illustrates the use of expert judgment to increase understanding of underlying program theory. Both studies were carried out by highly respected organizations capable of convening groups of experts to deliberate on complex ideas, and to draw conclusions and make recommendations based on that judgment. In these studies, the persuasive ability of the group providing the expert judgment is central to its success, and is based in large part on the reputation of the organizing body. In both cases, the knowledge and judgment of the experts are conditioned and informed by supporting testimony and studies brought to the attention of the experts and reflected in their study reports.
Two additional examples illustrate the use of expert judgment to overcome the absence of data needed for prospective impact estimation. In one of these examples, expert judgment plays a relatively minor role in support of a complex case study. In the other, expert judgment is the primary method used to develop project-level data used in the case study estimates.
The National Research Council’s Assessment of ATP
In 1999, Congress directed ATP to arrange for a well-regarded organization with significant business and economic experience to conduct a comprehensive assessment of the program to determine how well ATP has performed in terms of achieving goals established by its authorizing statute. 244 In response, ATP requested that the National Research Council’s (NRC) Board on Science, Technology, and Economic Policy (STEP) conduct an assessment of ATP as part of its broader review of government-industry partnerships for the development of new technologies. 245 In taking on this responsibility, NRC/STEP convened workshops and symposia, and commissioned a series of papers on ATP—all feeding into its ultimate findings and recommendations about ATP.
The important role of expert judgment is evident throughout the NRC assessment. First, the NRC study of ATP was managed by a distinguished multidisciplinary steering committee assembled to oversee the broader study of government-industry partnerships. 246
The two major symposia and a workshop organized by the steering committee convened additional experts and organized them in panels. The informed panelists were drawn from industry, government, academia, and the investment and financial communities. The symposia and workshops provided forums for experts to present and discuss information about the program, to express their perspectives, and to facilitate extensive audience discussion. Participants included critics, supporters, and those with neutral points of view. A body of independent analyses informed the deliberations.
The steering committee issued its core findings and recommendations in 2000. Among the findings was: 247
This statement suggests a strong connection and interplay between detailed program study and expert judgment in program assessment.
Harvard-MIT Study of Technical Risk Management
Our second example of expert judgment illustrates use of the method to increase understanding of program fundamentals. Performed by Harvard’s John F. Kennedy School of Government in collaboration with MIT’s Sloan School of Management and the Harvard Business School, the study examined R&D barriers that keep private investors from funding early-stage, high-risk technology development projects. 248 Because the study was discussed extensively in Chapter 4, the treatment here will be brief, focusing on its use of expert judgment. Principal investigators included Lewis Branscomb, Harvard University, and Kenneth Morse, Director of the Sloan School Entrepreneurship Center, with participation by Harvard Business School and MIT faculty. The project drew extensively on contributions of scholars and practitioners from business and venture capital organizations, government, and universities.
Like the NRC study, the Harvard-MIT study used the workshop/symposium format to convene experts, present the results of independent research and commissioned papers, stimulate discussion, shape key questions, and provide answers. The project report contains both the collection of contributed papers and the report of the project team.
Unlike the NRC study, however, the Harvard-MIT study does not present recommendations. Rather, it uses expert judgment to add to other methods that inform the program. In the words of the authors:
Using Expert Judgment in Support of a Case Study
Often, expert judgment is used to support other methods. The illustrations provided here are drawn from two sources: a set of prospective economic case studies of medical technologies performed by Research Triangle Institute (RTI) 249 and a prospective economic case study of dimensional control technologies performed by CONSAD Research Corporation. Both of these are discussed in more detail in Chapter 6, where the focus is on the case study method.
Expert Judgment for Prospective Evaluation of Medical Technologies
RTI’s seven case studies all dealt with new tissue engineering technologies, which had not yet been commercialized at the time of the study. To estimate prospective private and social benefits, the analysts needed to know the number of patients who would be treated with each new technology each year if it were successfully implemented. The plan was to project the market penetration of each technology by using the widely used Bass diffusion model, depicted in Figure 8–1 and named after Frank Bass, who described it in 1969. 250 According to the researchers: 251
In the case of RTI’s analysis of Aastrom Biosciences’ human stem cell expansion technology and three other technologies under development, the approach was to interview physicians to obtain data for the Bass diffusion model. After consulting with the innovating companies, relevant medical associations, and other physicians, RTI identified the physicians to be interviewed as experts in the treatment of the diseases in question. For three additional, less in-depth case studies, the researchers instead interviewed company representatives to obtain diffusion estimates based on their expert judgment. Although the study treated both the physicians and the company representatives as experts, it implied that the physicians were higher-level experts in the subject matter than the company representatives. The definition of an expert, therefore, has a subjective element and may vary even within a given study.
The Bass Diffusion Curve reflects the fact that a new technology is typically not adopted by all potential users at one time, but rather diffuses into use over time. Adoption is shown to increase at an increasing rate for a period and then to level off as market saturation occurs, generating an s-shaped curve.
Source: RTI, A Framework for Estimating the National Economic Benefits of ATP Funding of Medical Technologies, 1998, p. 2–27.
Source: RTI, A Framework for Estimating the National Economic Benefits of ATP Funding of Medical Technologies, 1998, pp. A–8 to A–9.
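The Bass model underlying these projections can be sketched in a few lines. The parameter values and market size below are illustrative assumptions, not figures from the RTI study; the cumulative-adoption formula is the standard closed form of the 1969 model.

```python
import math

def bass_cumulative_adoption(t, p, q):
    """Fraction of the potential market that has adopted by time t
    under the Bass (1969) diffusion model.
    p: coefficient of innovation; q: coefficient of imitation."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# Illustrative parameters only (not from the RTI study); values near
# p = 0.03 and q = 0.38 are often cited as typical in the literature.
p, q = 0.03, 0.38
market_size = 100_000  # hypothetical number of eligible patients
adopters = [market_size * bass_cumulative_adoption(t, p, q) for t in range(0, 21)]
# Adoption rises slowly, accelerates, then levels off toward market_size,
# tracing the s-shaped curve of Figure 8-1.
```

In an evaluation setting, the expert-supplied answers (time to peak adoption, eventual market share) would be used to back out p and q for each technology before projecting patient counts.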
For the physician interviews, RTI developed clinical profiles for the technologies and provided them to the physicians prior to the interviews. They did not identify the companies involved to the physicians. They sent a set of questions to the physicians in advance of the interview and prepared an informal interview guide that was used for the subsequent telephone interviews of the physicians. Figure 8–2 replicates the interview guide used to solicit expert judgment from the physicians. Table 8–2 shows the data collected from physician experts for one of the technologies studied, Aastrom Biosciences’ human stem cell expansion system.
Source: RTI, A Framework for Estimating the National Economic Benefits of ATP Funding of Medical Technologies, 1998, pp. A–11 to A–12.
It is apparent in the study report that the researchers took special care in collecting and documenting the expert judgments used in their in-depth case studies. This care contrasts sharply with the approach taken in the next example, where the use of expert judgment is stated, but the approach and its documentation remain a “black box,” unavailable to the reader.
Expert Judgment for Prospective Evaluation of Dimensional Control Technologies
A second example of using expert judgment to fill in missing data in prospective case study analysis is found in CONSAD’s case study of technologies to reduce dimensional variation in automotive bodies, the so-called “2mm project” funded by ATP. 252 The technologies in this case were further along than in the previous medical technologies cases. In fact, according to the researchers:
The researchers noted, however, “Because the technologies...are new, their impacts on industrial production and economic activity are not yet revealed in the extant empirical data on industrial performance.” 254
In the absence of published data at the industrial sector level, CONSAD researchers relied heavily on expert judgment in making their economic estimates of impact. The expert judgment appeared to be informed by empirical data, but the extent to which this was true was not revealed. The researchers cited the proprietary and confidential nature of in-house data from plant implementation of the technology as the reason for concealing exactly how and to what extent such data may have come into play in the estimates. According to the CONSAD report:
CONSAD’s approach constitutes a sweeping use of experts to estimate economic impacts, with only general references in support of the estimates. At best, the study reveals the types of impacts identified, explains the logic behind each impact, and plugs in the expert-supplied dollar amount per unit. Thus, for estimated benefits of automotive production cost savings, the report states “Engineers at GM’s truck assembly plant in Linden, New Jersey, and 2mm Project researchers involved with the technology transfer at Chrysler’s Jefferson North assembly plant estimate that new production costs...at those facilities have been reduced by approximately $75 per vehicle...” 255 Similarly, experts estimated a range of benefits for automotive maintenance cost savings, a range of changes in market share associated with improvements in automobile quality, and the year in which the 2mm technology was expected to be adopted in all GM and Chrysler assembly plants. The estimates of production cost savings and maintenance cost savings drove the study’s microeconomic estimates of impact, and the estimated change in market share due to quality improvements drove the study’s macroeconomic impact estimations using the Regional Economic Modeling, Inc., (REMI) model.
According to the report, “The experts who provided data and judgments regarding estimated changes in market share referred to past instances when a short-term change in the perceived quality of an automobile model resulted in a shift in market share for the particular model relative to competitive models.” 256 But no reference data and no specific references to the literature documenting these past instances are provided in the report. Presumably, they are proprietary or confidential.
Citing experts knowledgeable about the substance of the technologies, and experts knowledgeable about the industries and markets, the study states:
But these checks across the groups of experts, and the comparisons to empirical data and published evidence on similar technologies, are not revealed to the reader beyond the assertion that they were done. In this study, the findings rest to an extraordinary extent on expert judgment whose quality, and the care that went into its collection, the reader has no way to judge.
As defined in Chapter 2, the bibliometrics method encompasses a family of techniques for assessing the quantity, dissemination, and content of publications and patents—both important knowledge outputs of ATP projects. As indicated throughout this report, adding to the nation’s technical knowledge base is one of the central missions of ATP.
Surprisingly, perhaps, ATP’s use of bibliometrics during most of its first decade was limited to counting papers and patents and the organizations producing them, and to an emerging use of patent trees to capture patent-to-patent citation patterns for completed projects. At the time of this report, neither publication-to-publication nor patent-to-publication citation analysis had been done, nor had content analysis of published documents. Econometric studies, however, have made extensive use of patent citation data as a proxy for knowledge spillovers. (See Chapter 7.)
One factor accounting for ATP’s limited use of bibliometrics thus far has been the time it takes for patents to be awarded and citation databases to reflect program activity. Thus, the technique could be more profitably applied now than earlier in the program’s history. Another limiting factor may have been stakeholders’ emphasis on the program’s ability to demonstrate quantitative measures of economic impact and direct, tangible short-term output indicators such as number of patents granted.
This section covers two ways ATP applied the bibliometrics method during its first decade: first, the counting of publications and patents as measures of program outputs; second, patent citation trees showing the dissemination of knowledge from completed projects.
Counting Publications and Patents
ATP began compiling counts of papers published in professional journals and presented at conferences when it launched its Business Reporting System (BRS) in 1993. 257 It used the data to show that the projects were disseminating nonproprietary information. Table 8–3 shows the counts of papers, and of the organizations and projects reporting them, as of December 31, 1996, for 210 ATP projects funded between 1993 and 1995.
Counts of papers, as well as patents, were also a feature of the data compiled in conjunction with the Status Reports described in Chapter 6. 258 A template used to guide uniform data collection in support of the Status Reports includes the number of papers published or presented, as well as data for patents granted and a rough count of patents filed but not yet granted.
Figure 8–3 shows the summary data for papers and presentations from the second volume of Status Reports covering 50 completed projects. These data were one input to scoring project performance.
Figure 8–4 shows data for patent filings. These data were another input to scoring project performance.
Patent Citation Trees
While patent citations have been used extensively as data input to the econometric studies described in Chapter 7 and the social network analysis study presented later in this chapter, bibliometric citation analysis of ATP projects has been limited at the time of this study to the construction of “patent citation trees” for completed projects, available in ATP Status Report 2. 259 Patent citation trees are diagrams that show forward citations of the patents generated by, in this case, ATP-funded completed projects.
Note: Across the 208 projects reporting, an average of 0.6 professional journal articles were published and 1.8 conference papers were presented per project. Thirty-six percent of the projects produced at least one professional journal article; 53% of the projects produced at least one conference paper.
Source: Powell and Lellock, Development, Commercialization, and Diffusion of Enabling Technologies, Progress Report, 1996, p. 41.
The ability to construct a patent tree rests on the fact that each published patent contains a list of previous patents and scholarly papers, establishing the prior art as it relates to the invention in question. Citations can be used to track the dissemination of technical knowledge to subsequent publications and patents. Patent citation trees show visually the pattern of dissemination of patents granted. As described in ATP, Performance of 50 Completed ATP Projects, Status Report 2 :
Source: Advanced Technology Program, Performance of 50 Completed ATP Projects, Status Report 2, 2001, p. 16.
Figure 8–5 shows a patent tree for one of the first 50 completed ATP projects, a project to develop wafer ion-implantation, carried out by Diamond Semiconductor Group (DSG). At the time the patent analysis was performed, DSG had received two patents and filed for two additional patents related to its ATP project. The focus of the illustration here, Patent 5,486,080, “High speed movement of workpieces in vacuum processing,” was granted to DSG in 1996. The Status Report provides the following account of subsequent citing of the patent:
Source: Advanced Technology Program, Performance of 50 Completed ATP Projects, Status Report 2, 2001, p. 10.
Source: Advanced Technology Program, Performance of 50 Completed ATP Projects, Status Report 2, 2001, p. 121.
The following year, 1997, two patents—one granted to VLSI Technology and the other to Hitachi—cited the DSG patent. In 1998, three additional patents—granted to Eaton, Fanuc, and Tokyo Ohka Kogyo—directly cited the DSG patent, and another patent—granted to Jenoptik—cited the Hitachi patent, thus indirectly citing the DSG patent. In 1999, two additional patents—granted to Applied Materials and Dainippon—directly cited the DSG patent, and five new patents indirectly cited the DSG patent. Four of these citations are once removed: a patent granted to Cypress, a second patent granted to Applied Materials, and two patents granted to GaSonics International; and one is twice removed: a patent granted to SEZ. (p. 11)
Additional branches may spring up over time as subsequent patents cite either the DSG patent or one of the subsequent patents that cite the DSG patent. Additional patents granted to DSG, and attributable to the ATP project, also generated subsequent citations.
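The logic of a patent citation tree can be sketched as a simple forward-citation search. The patent names and citation pairs below are a hypothetical fragment patterned on the DSG narrative above; a real analysis would draw its citing/cited pairs from a patent database.

```python
from collections import defaultdict

def build_citation_tree(root, citations):
    """Forward-citation tree. citations is a list of (citing, cited) pairs.
    Returns {patent: depth}, where depth 1 means a direct citation of root,
    depth 2 means once removed, and so on."""
    cited_by = defaultdict(list)
    for citing, cited in citations:
        cited_by[cited].append(citing)
    depths, frontier, depth = {}, [root], 0
    while frontier:
        depth += 1
        next_frontier = []
        for node in frontier:
            for citer in cited_by[node]:
                if citer not in depths and citer != root:
                    depths[citer] = depth  # record shortest citation distance
                    next_frontier.append(citer)
        frontier = next_frontier
    return depths

# Hypothetical fragment of the tree described in the text:
pairs = [("VLSI", "DSG"), ("Hitachi", "DSG"), ("Jenoptik", "Hitachi")]
tree = build_citation_tree("DSG", pairs)
# tree == {"VLSI": 1, "Hitachi": 1, "Jenoptik": 2}
```

The depth values correspond to the report's "once removed" and "twice removed" branches, and counting nodes at each depth summarizes how widely a patent's knowledge has disseminated.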
Some of the patent trees portrayed in the Status Report are for projects that appeared to have reached a standstill in terms of follow-through activity by the project innovators. These patent trees serve as reminders that knowledge spillover may result from projects that have not shown much commercial progress or related market spillovers. “Although representing only one aspect of knowledge dissemination, the patent trees extend our understanding of the influence of the new knowledge created on others.” 260 As was indicated in Chapter 3, the frequency of citations suggests, though imperfectly, the significance of a patent in terms of its relevance, extent of dissemination, and impact.
Emerging Methods: Using Social Network Analysis to Identify Knowledge Spillovers
Spillovers have a central role in justifying public support of R&D, but are difficult to identify and measure. Improving the methods of identifying and quantifying R&D spillovers is an important goal for a public R&D program. This section discusses a promising new method of identifying and measuring knowledge spillovers from R&D.
The earlier discussion of the study by Darby et al. introduced the concept of research networks or systems, that is, patterns of interactions and communications among organizations—firms, universities, other laboratories—that reveal the generation and exchange of scientific and technological knowledge. An implicit hypothesis linking the concepts of spillovers and research networks is that the closer and denser the system of linkages among organizations, the greater the likelihood of knowledge spillovers.
Adam Jaffe, Brandeis University, who prepared a seminal background report on spillovers for ATP, teamed with Michael Fogarty, Portland State University, and Amit Sinha, Case Western Reserve University, to develop a new method of assessing knowledge spillovers using social network analysis. 261 In itself, ATP’s support of this work constitutes a form of knowledge spillover, as the technique has potential applicability for other federal and state agencies in evaluating their R&D and other knowledge-generating programs.
This emerging method uses systems analysis and fuzzy logic to analyze R&D knowledge spillovers within networks of R&D organizations. Though still in its infancy, the method holds promise for retrospective evaluation as well as prospective selection of projects with above-average knowledge spillover potential. Furthermore, the method, which identifies spillover patterns across organizations, technological areas, geographic regions, and industries, permits the separation of knowledge spillovers into those realized by the United States and those realized by other countries. The researchers noted that by funding projects involving particular organizations and technologies ATP would implicitly pick networks with implications for expected social benefits. 262 An important theoretical aspect of their work is that it highlights the fact that a firm’s value as a source of knowledge spillovers “depends on its ability to learn from its external environment....” 263 Figure 8–6 illustrates the basic conceptual framework of the R&D network analysis method.
Depicted are the interactions of an organization (an IBM laboratory), working in a specific technology area (denoted by a patent class), located in a specific geographical region (New York), with other parts of the relevant technology network. For each node in the system, patent citations are used to measure its pairwise interaction, in both directions, with another node. In addition, once the strength of pairwise interactions is measured, it is possible to measure the system influence of each node. “The system influence of each node results from the strength of its interaction with other nodes, compounded by the strength of the interaction of those nodes with other nodes.” (p. 26)
Source: Fogarty et al., ATP and the U.S. Innovation System—A Methodology for Identifying Enabling R&D Spillover Networks with Applications to Micro-electromechanical Systems (MEMS) and Optical Recording, 2000.
Figure 8–7 illustrates a more sophisticated version of the analysis, as it incorporates the strength of the relationship between organization A and other R&D laboratories.
Source: Fogarty et al., ATP and the U.S. Innovation System—A Methodology for Identifying Enabling R&D Spillover Networks with Applications to Micro-electromechanical Systems (MEMS) and Optical Recording, 2000.
Patent Citations as a Proxy for Knowledge Flows
The new method uses patent citations as a proxy for flows of scientific and technical knowledge. The researchers acknowledged that there is “noise” in patent citations data 264 and that “for any given patent citation, there is a non-trivial chance that no spillover occurred.” 265 These shortcomings notwithstanding, they asserted, “The probability of a spillover, conditional on a citation being observed, is significantly greater than the unconditional probability.” 266
Though previous studies used patent citations as a proxy for knowledge spillovers, 267 this study went beyond those studies by analyzing patent citations from a systems perspective; that is, in terms of citing and sourcing entities that constitute “networks of communication and influence within the innovation system.” 268 At the crux of this approach is the idea that “spillovers that one organization gets from another depend not only on the communication between the two organizations, but also on the communication that each engages in with other organizations.” 269 In describing the difference between their approach and previous work, the researchers stated:
The researchers considered an increase in the rate of system citations to the ATP award winner to provide a measure of the direct spillovers from the award winner. They pointed to overall increases in flows of knowledge through the network as signaling a broader influence of ATP. In their words, “[a funded project] establishes and strengthens communication links among the joint venture members and perhaps with other firms.” 270
The methodological advance attempted in this study is the move from pair-wise relationships among sets of variables—researchers, research organizations, regions—to a “systems” perspective. The systems perspective provides for a cascading sequence of interactions among R&D performers, again measured in terms of patent citations. Their study analyzed the impact of patent A not only on patent B, which cites A, but on patents C and D, which cite B and thus A by inference. It also estimated the differential importance of citations based on the importance of the organizations making them. They described their overall approach as follows:
Application of the Systems Method, Network Analysis, and Fuzzy Logic Techniques to Measure Communications Flows
The researchers used fuzzy logic to model varying degrees of connectedness of nodes within the network, calculating a truth value, a value between 0 and 1, to provide a “measure of the magnitude of interaction for every pair of nodes.” 271 System influence is the measure of the overall impact of a node, built iteratively with a fuzzy-logic algorithm. 272 Since each node is described in terms of its organization, technology, and geographical location, it is possible to “construct slices through the multidimensional system along any dimension of interest.” 273 The basic unit of analysis in the study is the R&D laboratory, which is located in a specific region, working on a specific technology, in a particular time period. Patent citations are treated as a proxy or indicator variable for “communications” among R&D laboratories. The researchers assumed “the tightness of the link between citations and communications does not differ systematically across the different dimensions of organization, technology and geography.” 274 They explained in the report how they calculated each of the measures:
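As a rough illustration of how pairwise interaction strengths can be compounded into a system influence measure, the sketch below propagates fuzzy "truth values" through a small hypothetical network. The max-product rule used to combine indirect paths is an assumption of this sketch, not the published Fogarty et al. algorithm, and the three-laboratory network and its weights are invented for illustration.

```python
def system_influence(weights, iterations=10):
    """Illustrative compounding of node influence through a network.
    weights[a][b] in [0, 1] is the 'truth value' of a's direct interaction
    with b. Influence of a on b is taken as the strongest direct or
    indirect path, combining steps by multiplication (a max-product
    fuzzy composition, assumed here for illustration)."""
    nodes = list(weights)
    influence = {a: dict(weights[a]) for a in nodes}  # start from direct links
    for _ in range(iterations):
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                # strongest indirect path a -> k -> b
                via = max((influence[a].get(k, 0.0) * weights[k].get(b, 0.0)
                           for k in nodes if k not in (a, b)), default=0.0)
                influence[a][b] = max(influence[a].get(b, 0.0), via)
    # A node's system influence: its total influence on all other nodes.
    return {a: sum(v for b, v in influence[a].items() if b != a) for a in nodes}

# Hypothetical three-laboratory network: A interacts strongly with B, B with C.
w = {"A": {"B": 0.8, "C": 0.0},
     "B": {"A": 0.2, "C": 0.9},
     "C": {"A": 0.0, "B": 0.1}}
scores = system_influence(w)
# A's score credits both its direct link to B and the indirect A -> B -> C path.
```

The point of the sketch is the cascading step: A receives credit for reaching C even though A and C never cite each other directly, which is exactly the compounding of interactions that distinguishes the systems perspective from simple citation counts.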
Experimental Application of the Method to Investigate Knowledge Spillovers in Two Technology Areas
The researchers’ first application of the new method was to map research networks underlying two areas: micro-electromechanical systems (MEMS) and short-wavelength sources for optical recording (SWAT). The MEMS analysis emphasized the network dimension of organizational space, and the optical recording analysis emphasized technological space. The researchers did not attempt to measure knowledge spillovers, though they pointed to this potential use of the method after further development.
A core database of about 1,200 MEMS patents and citations to the patents underlies the MEMS analysis. A comparison of organizational rankings based on simple counts of citations, versus rankings based on the new network analysis, showed a difference in the expected R&D spillovers. Using their systems approach, the researchers identified the patents cited most frequently, the leading MEMS organizations ranked by systems influence, and the geographic centers of R&D. Their study identified the top five MEMS technologies, the most influential segments of the MEMS network, the key universities, and the key regions.
SWAT technology had been the focus of a joint venture funded in part by ATP in 1991, and led by the National Storage Industry Consortium (NSIC). The project was intended to respond to Japanese domination of the optical recording market at the time, which in turn was seen as a threat to U.S. market share in the data storage device industry. For the SWAT analysis the researchers identified leading researchers, patents, and citations. By applying the new method, they identified and ranked the top 20 optical wavelength technologies, organizations, and regions. Figure 8–8 shows the optical wavelength technologies to be concentrated in a few regions.
Source: Fogarty et al., ATP and the U.S. Innovation System—A Methodology for Identifying Enabling R&D Spillover Networks with Applications to Micro-electromechanical Systems (MEMS) and Short-Wavelength Sources for Optical Recording, 2000.
Fogarty et al. described their work on optical recording as serving to illustrate the potential of their methodology, not as an evaluation of SWAT, for which considerably more information and more current data would have been needed. Their findings indicated that ATP had good reason to expect that its support of SWAT would generate a large volume of R&D spillovers. Supporting this conclusion is their finding that the most influential organizations in the network were U.S. organizations; that two were members of the ATP joint venture; and that several were also members of NSIC, which led the joint venture. 275 The researchers observed that the ATP funding in this case supported research on an enabling technology carried out by well-connected U.S. organizations, leveraging federal support of basic research, and reaching a large market. 276
Significance and Status of the New Method
By placing knowledge spillover analysis within the context of R&D networks, the Fogarty et al. method made a significant advance over earlier approaches to measuring knowledge spillovers. It “provides a framework for developing ATP strategies to maximize spillovers, and suggests an approach to evaluating ATP projects.” 277
According to the authors, the approach can be used to analyze the evolution of networks surrounding particular industry-based technologies, and thus answer ATP program design and project selection questions, such as, “Are university members becoming less important sources while companies become increasingly important network members?” 278 It can also be used to identify which of a set of technology clusters within a larger technology area are more likely to spur advances in other technologies.
The researchers pointed out that the fuzzy logic method would not permit the drawing of statistical inferences or standard statistical testing of hypotheses. 279 They argued that ex post evaluation to measure knowledge spillovers is its most straightforward application. They described potential use of the method for project selection as more uncertain, “but still worth exploring.” 280 They speculated that software and a database might be developed to allow routine assessment of the strength of applicants’ system influence. They also suggested that the method might be used to develop a way to assess the synergistic impacts of projects. 281 Thus, the researchers’ suggestions for further research are aimed at advancing the new method to overcome existing limitations and make it practically applicable. If this could be done, the method would offer an important new way of obtaining more information about a central impact of publicly funded R&D: the generation of knowledge spillovers.
Emerging Method: Using the Cost Index Method to Estimate Social Benefits
Estimation of the social benefits derived from ATP-sponsored technological innovations constitutes an important part of its evaluation program, because such estimates help answer policy questions about the magnitude of the return to the nation on the public investment in ATP-funded projects. David Austin and Molly Macauley, senior economists at Resources for the Future, developed a new method for estimating “social benefits from innovations in inputs in the service sector, where real output is not directly observable.” 282 To develop the model, Austin and Macauley drew on earlier work by Stanford University’s Timothy Bresnahan. 283 They extended Bresnahan’s method, which was aimed at retrospective evaluation, to make it appropriate for prospective analysis.
Their technique provides a more comprehensive, theoretically grounded, and quantitatively flexible means of estimating consumer welfare gains than earlier ATP studies employed. However, it places greater demands on data, involves a larger number of assumptions about the values of unknowns, and, because of its complexity, requires explicit attention to the sensitivity of findings to assumed values. It therefore simultaneously runs the risk of being dependent on the modeler’s art and of being opaque to decision makers.
Austin-Macauley’s Cost-Index Approach
The basic working of the model is illustrated in Figure 8–9, where introduction of the innovation shifts the supply schedule from S0^DT (the pre-innovation supply of the defender technology) to S1^ATP (the after-ATP supply of the new technology). Allowance for continuous improvement in the defender technology, however, shifts the defender supply schedule to S1^DT. For the ATP-sponsored technology to yield economic benefits, S1^ATP must lie below, or to the right of, the improved defender schedule S1^DT.
The new technology is assumed to be an input into the production of goods and services by downstream buyers. Under competitive conditions in these downstream markets, derived demand for the technology reflects consumer demand, and, according to the researchers, “...the cost index will correctly estimate the welfare gain.” 284 If downstream markets are not competitive, then the cost index yields a lower bound estimate of consumer gain. The cost index approach “compares observed price and performance for an innovated product against hypothetical, best available price and performance had the technical advance not occurred.” 285
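The price-and-performance comparison at the heart of the cost index can be illustrated with a minimal sketch. All prices, attributes, and shadow prices below are invented for illustration; the actual Austin-Macauley model is considerably richer.

```python
def quality_adjusted_price(nominal_price, attributes, shadow_prices, baseline):
    """Adjust a nominal price for performance differences.

    Each attribute's deviation from the defender baseline is valued
    at its shadow price and netted out of the nominal price, so the
    two technologies can be compared on a like-for-like basis.
    """
    adjustment = sum(shadow_prices[a] * (attributes[a] - baseline[a])
                     for a in attributes)
    return nominal_price - adjustment

# Hypothetical numbers, for illustration only.
shadow = {"mb_per_sec": 2.0, "gb_capacity": 0.5}   # $ per unit of attribute
defender_attrs = {"mb_per_sec": 10.0, "gb_capacity": 100.0}
innovation_attrs = {"mb_per_sec": 40.0, "gb_capacity": 300.0}

p_defender = quality_adjusted_price(500.0, defender_attrs, shadow, defender_attrs)
p_innov = quality_adjusted_price(600.0, innovation_attrs, shadow, defender_attrs)
cost_index = p_innov / p_defender   # a value below 1 implies a consumer gain
print(cost_index)
```

Here the innovation's higher sticker price is more than offset by the value of its performance improvements, so its quality-adjusted price, and hence the index, falls below the defender's.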
The researchers noted three major sources of randomness in the model’s parameters: variability in manufacturing and market conditions, imperfectly observed data, and “most importantly, uncertainty about future outcomes.” 286 Thus, rather than use single values, they incorporate in the model probability distributions of several parameters, including off-the-shelf nominal prices, quarterly rates of change in these prices, quality differences between performance attributes of the innovation and defender technologies, market size, adoption rates, personal consumption expenditures, and shadow prices.
Source: Austin and Macauley, Estimating Future Consumer Benefits from ATP-Funded Innovation: The Case of Digital Data Storage, 2000, p. 6.
Testing the Austin-Macauley Model
Austin and Macauley tested their model for two digital storage technologies funded in part by ATP. Both technologies were aimed at achieving much faster writing and retrieval of digital data than possible with defender technologies, and one also offered a large increase in storage capacity.
Data elements used in constructing the cost index included “estimated downstream digital data storage (DDS) expenditures as a share of total personal consumption expenditures, off-the-shelf DDS prices; differences in the technical attributes of the defender technologies and the innovations; marginal consumer valuations of these differences; quality-adjusted prices reflecting these valuations; and market rate of adoption of the innovation.” 287
Data for several of these parameters were collected from structured interviews in 1998 with the leaders of the industry teams conducting the ATP-funded research. Table 8–4 reproduces the interview instrument.
Using a hedonic regression model of digital data storage drive attributes, the researchers generated estimates of shadow prices to determine the imputed value of the improved performance attributes associated with the two new technologies. They obtained data for this procedure from manufacturers’ websites and magazine advertisements. 288
Estimates of consumer welfare gains were generated by a series of simulations over 18 parameters. The interaction of different assumptions about parameter values in the simulation model yielded a range of estimates. The median estimate for consumer welfare gains over five years was $2.2B for the linear scanning technology and $1.5B for optical tape, using a 5% discount rate.
Source: Austin and Macauley, Estimating Future Consumer Benefits from ATP-Funded Innovation: The Case of Digital Data Storage, 2000, pp. 22–23.
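The simulation logic can be illustrated in miniature. This sketch draws only three of the many uncertain parameters, from distributions invented for illustration; it is not the authors' 18-parameter model, and it ignores discounting.

```python
import random
from statistics import median

def simulate_welfare_gain(n_draws=10_000, seed=1):
    """Monte Carlo sketch of the simulation approach: draw uncertain
    parameters from assumed distributions, compute the implied
    five-year consumer gain for each draw, and return the full
    distribution. All distributions and figures are hypothetical."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_draws):
        market = rng.triangular(5.0, 20.0, 10.0)   # annual market, $B
        saving = rng.uniform(0.05, 0.25)           # unit-cost saving share
        adoption = rng.uniform(0.2, 0.6)           # share of market adopting
        gains.append(market * saving * adoption * 5)  # five-year gain, $B
    return gains

gains = simulate_welfare_gain()
print(f"median five-year gain: ${median(gains):.1f}B")
```

Reporting the median of the simulated distribution, rather than a single point estimate, mirrors the way the study summarized its range of outcomes.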
A third emerging method sponsored by ATP, and developed by Rosalie Ruegg of TIA Consulting, Inc., is a composite performance rating system (CPRS). The CPRS, constructed from indicator metrics, is designed to provide an evaluation tool for the intermediate period after project completion and before longer-term impacts can be measured. It scores each project from 0 to 4 stars, depending on the strength of its overall progress toward knowledge creation, knowledge dissemination, and commercialization, and it provides a distribution of scores across the portfolio of completed projects. 289 Rooted in the descriptive case study method combined with uniform compilation of indicator metrics, CPRS brings together a variety of mission-relevant information to provide a composite performance measure that is easy to grasp and communicate and that can be used to characterize ATP’s portfolio or projects. It was developed to meet a need not met by existing evaluation methods.
An Additional Step in an Evolving, Multi-Step Framework
The development of CPRS is the most recent evolutionary step in ATP’s use of descriptive case study methodology. As explained in Chapter 3, the descriptive case study method is among the simplest and best known of the evaluation methods. It is an evaluation mainstay for R&D programs because it lets the analyst tell the often-complex story of a scientific research project. A drawback is its focus on qualitative and anecdotal information that limits the method’s usefulness as an evaluation tool.
A need had developed in ATP and its parent organization, the National Institute of Standards and Technology (NIST), for an evaluation product that would provide a progress update for all ATP projects while taking into account the program’s multiple goals. 290 As the first cases in the completed project status reports (described in Chapter 6) were being developed, an opportunity to extend the analysis became apparent. By specifying the data collection to cover a comprehensive set of output and outcome measures of progress toward achieving ATP’s major goals, and collecting the data uniformly, it would be possible to construct aggregate statistics describing a variety of output and outcome data for the portfolio of completed projects. These data were helpful, but too numerous to give management a clear view of project and portfolio performance. The need by management and other stakeholders for a single measure of overall performance provided the impetus behind development of the CPRS method described here. The purpose of CPRS is to consolidate the extensive performance information from individual project case studies into a single symbolic rating that can be quickly grasped and used for comparisons, and whose distribution across projects can be used to depict portfolio performance.
Construction of CPRS
Constructing a composite rating system that reflects the multiple dimensions of ATP’s mission raises challenges about how to combine multiple metrics in a meaningful way. Despite inherent problems in clustering indicators and aggregating diverse data, there are numerous examples of rating systems in use that are based on multiple variables and multiple dimensions of interest. For example, the Quadrix Stock-Rating System uses more than 100 variables to score stocks in seven categories. 291 The approach of combining output/outcome data to indicate performance at different stages of the innovation process has precedent in the work of Eliezer Geisler. Geisler develops clusters of outputs for each stage of the innovation process, assigns normalized weights to each measure of each indicator, and calculates an overall index for each stage of outputs. 292
General Formulation of CPRS
The CPRS is constructed as the sum of the weighted indicator measures for a set of mission-driven goals, adjusted to a 0–4 point scale. In its general form, CPRS is formulated as follows:
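The equation itself did not survive in this copy of the text. Based on the surrounding description of weighted indicator measures, three goal scores, and a scaling adjustment, a plausible reconstruction (not necessarily the author's exact notation) is:

```latex
\mathrm{CPRS}_j \;=\; k \sum_{g=1}^{3} \sum_{i \in I_g} w_{g,i}\, x_{g,i,j}
```

where \(x_{g,i,j}\) is the measure of indicator \(i\) under goal \(g\) for project \(j\), \(w_{g,i}\) is the weight assigned to that indicator, \(I_g\) is the set of indicators for goal \(g\) (knowledge creation, knowledge dissemination, commercialization), and \(k\) is the factor that adjusts the raw score to the 0–4 point scale.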
Selecting Indicator Variables and Assigning Weights
The specification of the indicator variables and weights is ad hoc in nature because there is no general existing theory to guide their selection. The various indicator data that would be used to apply the method must reflect a program’s specific mission statement and also take into account the feasibility of collecting the data. The weighting factors determine how the data are combined and, hence, how they contribute to the composite rating measure. Formulation of the weighting factors for ATP reflected the judgment of the analyst informed by the range of observed values of the selected variables for the first 50 projects, sensitivity testing, and guidance by senior ATP managers about desired characteristics for a rating tool. For application to a different program, the indicator metrics and weighting factors would need to be specific to that program.
The nine types of variables used to construct CPRS for ATP are listed in Chapter 6, Table 6–14. These are assigned weights and combined to calculate scores signaling progress of each project toward accomplishing each of three major goals of ATP: (1) knowledge creation, (2) knowledge dissemination, and (3) commercialization of the newly developed technologies. For example, the knowledge creation score is calculated from weighted values of technical awards, patent filings, publications and presentations, and products and processes on the market or expected soon, while the commercialization progress score is calculated from weighted values of products and processes, capital attraction, change in company size, business awards, and outlook for future commercialization. Of the three goals, the second is assigned more potential weight than the first in computing the total raw score, and the third more than the second. This formulation is consistent with the premise that a project with sustained accomplishments by the innovator and its collaborators will continue to progress, eventually showing evidence of progress toward commercialization.
A factor is applied to the raw scores to facilitate assigning symbolic star ratings to each project. A score equal to or greater than 4 receives a four-star rating; a score less than 4 but greater than or equal to 3, a three-star rating; and so forth.
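The scoring and star-mapping logic described above can be sketched as follows. The indicator names, weights, and scale factor are hypothetical and do not reproduce ATP's actual specification.

```python
def cprs_stars(indicators, weights, scale=1.0):
    """Sketch of a CPRS-style rating: weight the indicator measures,
    sum them into a raw score, scale to a 0-4 range, and map to a
    whole-star rating. All names and weights are hypothetical."""
    raw = sum(weights[name] * value for name, value in indicators.items())
    score = min(raw * scale, 4.0)
    # Flooring implements the thresholds: >= 4 gives four stars,
    # >= 3 but < 4 gives three stars, and so forth.
    return int(score)

# Hypothetical weights and indicator values for one project.
weights = {"patents": 0.5, "publications": 0.25, "products": 1.0,
           "capital_attracted": 0.75}
project = {"patents": 2, "publications": 4, "products": 1,
           "capital_attracted": 1}
print(cprs_stars(project, weights))
```

In an actual implementation the weighted sums would first be grouped into the three goal scores before aggregation; the sketch collapses that step for brevity.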
Limitations of CPRS
This initial approach to constructing CPRS scores for ATP was exploratory. The CPRS as currently formulated represents an initial baseline, or prototype, system for trial use and further examination by ATP. From this baseline subsequent refinements can be made if desired.
Among the limitations of the approach is that the data for commercialization center mainly on the original innovator(s) and their collaborators and are not indicators of commercialization by others outside the project. The seriousness of this limitation is alleviated by two factors. First, efforts by others, if known, are reflected in the outlook data and thus are not totally excluded. Second, the prototype CPRS is specifically designed to be applied within several years of project completion, when early commercialization efforts typically still reside with the original innovators and their collaborators and licensees.
An additional limitation is that each CPRS score, like the case study information that underlies it, is time sensitive, and represents a snapshot, or benchmark, of performance at a particular time. Over time, the performance of individual projects may change, and performance measures may need to be updated. For example, the farther out in time from project completion one moves, the more important it becomes to investigate alternative commercialization paths stemming from knowledge spillovers, including paths revealed by patent citation analysis, and the more available are opportunities to include market measures of commercial impact in a composite score. Of course, as one moves farther out, opportunities to use benefit-cost and other evaluation approaches increase. The CPRS was developed specifically as an indicator-based evaluation tool to serve in the intermediate period, after project completion and before long-term benefits are realized.
More important limitations relate to data availability and methodology. The system is designed to use available data rather than ideal data. There is a lack of empirical verification of the relationships modeled. The construction of CPRS is necessarily ad hoc, reflecting the absence of an underlying theory. There is precedent, however, for developing logic-based composite rating systems and for using expert judgment to assign weights to the selected indicator variables. As is the case with counterpart composite rating systems used by other federal agencies, international bodies, hospitals, and businesses, the selection of indicator variables and the weighting algorithms specified in the CPRS are based on expert judgment constrained by the availability of data. Alternative algorithms may be superior.
In addition, the rating system is not expressed in terms of net benefits, and projects with the same “progress intensity” score may differ in their net benefits. In short, CPRS in its current state represents an initial prototype for trial use and further examination by ATP.
CPRS in Use
Current limitations notwithstanding, the CPRS method has practical utility for ATP’s program managers and administrators. Implementation of CPRS has given ATP a new evaluation product, rooted in case study, that gives stakeholders a quick take on project and portfolio performance. Because the method preserves the details of the case studies underlying the composite ratings, the results have a high level of transparency, and the underlying data can be examined in detail.
Over time, there may be opportunities to improve the formulation of CPRS. For example, patent citation data would likely be a better indicator of knowledge dissemination than patent counts. Regarding the construction of weighting factors, it might be possible to conduct supporting studies to inform the relationships between potential indicators and actual progress toward mission, for example, determining what relative weights publications, patents, and collaborative relationships should receive in estimating their contribution to knowledge dissemination. At a minimum, more extensive sensitivity analysis could be conducted to determine how changing the baseline weights changes the results.
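A minimal version of such a weight-sensitivity check might look like the following. The toy scoring rule, indicators, and weights are hypothetical; the point is only the mechanics of perturbing each baseline weight and observing whether the rating changes.

```python
def star_score(indicators, weights):
    """Toy rating rule: weighted sum capped at 4, floored to whole stars."""
    raw = sum(weights[name] * value for name, value in indicators.items())
    return int(min(raw, 4.0))

def weight_sensitivity(indicators, weights, bump=0.1):
    """Bump each weight down and up by `bump` and report which
    perturbations change the star rating. All values hypothetical."""
    base = star_score(indicators, weights)
    flips = {}
    for name in weights:
        for delta in (-bump, bump):
            perturbed = dict(weights)
            perturbed[name] = max(0.0, perturbed[name] + delta)
            if star_score(indicators, perturbed) != base:
                flips.setdefault(name, []).append(delta)
    return flips

weights = {"patents": 0.5, "publications": 0.25, "products": 1.0}
project = {"patents": 2, "publications": 4, "products": 1}
print(weight_sensitivity(project, weights))
```

Here the project's raw score sits exactly on a star threshold, so every downward bump changes the rating while upward bumps do not, exactly the kind of threshold fragility a sensitivity analysis is meant to expose.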
The general approach is extendable to other programs. But other programs, with their different goals and output/outcome measures, would need a customized implementation of the general CPRS framework.
Summary of Other Evaluation Methods
This chapter has highlighted several additional traditional methods ATP has used for evaluation, together with three emerging methods whose development ATP supported. Evolving shifts in the demand for and supply of evaluation have shaped the choice of techniques. As the program has matured, new questions, typically of an increasingly nuanced or complex nature, have arisen. Concurrently, the maturing of the program has generated increased quantities of data, especially longitudinal data, that make it possible to employ a wider set of standard methodologies and to experiment with newer ones.
In addition to the much used survey, case study, and econometric/statistical analysis methods, evaluators have applied two other traditional evaluation methods—expert judgment and bibliometrics—to the evaluation of ATP.
Expert judgment has been used to examine various facets of the program, ranging from underlying theory, to providing estimated input values for economic case studies, to the overall effectiveness of ATP. In 1999–2000, the NRC conducted an assessment of ATP’s overall effectiveness using expert judgment as the central method, informed by studies carried out with a variety of other methods.
Bibliometrics, likewise a frequently used technique in science and technology program evaluation, has not been extensively used in ATP’s evaluation program. Its use over most of ATP’s first decade was limited to counts of publications and patents, with patent citation analysis being added late in the decade. The limited use early in the program is attributed largely to two factors: bibliometrics did not provide economic estimation at a time when stakeholders were pressing for economic performance metrics, and passage of time was needed to provide the databases required for more extensive citation analysis.
The support ATP provided to develop new evaluation methods reflected shortcomings in existing methods in the face of demanding evaluative requirements. One new method ATP supported is a cost index method for estimating social benefits, specifically market spillovers, from technological innovations. This method has been tested in applications to two digital storage technologies funded by ATP. The method is theoretically grounded and appears suitable for wider use both in ATP prospective case studies and those of other agencies. 293 Factors retarding wider use of the method are its large data requirements, large number of assumptions, complexity, and lack of transparency to managers.
Another of the new methods, CPRS, combines clusters of quantitative and qualitative indicator data gathered through descriptive case studies to rate overall project performance. A prototype CPRS has been implemented for ATP.
A third new method under development used fuzzy logic and social network analysis to improve the assessment of knowledge spillovers. With patent citations serving as a proxy for knowledge flows, the new method has been used to map research networks underlying two technical areas: MEMS and SWAT. With further development, the method offers a potentially important new way of identifying information needed to maximize knowledge spillovers and assess their magnitude.
Taken with the evaluation techniques reviewed in Chapters 4–7, Chapter 8 points to an extensive and increasingly sophisticated toolkit of methodologies available to evaluate ATP. For ATP evaluators, the challenge has been to select the most appropriate mix of tools for each analytical task. And, when a tool has not been invented to do that particular job, the challenge has been to design and create a new tool, adding to the body of knowledge about ATP and to the toolkit of all program evaluators.
244 U.S. Senate Report 105–235.
245 The resulting NRC review is organized into two reports, both edited by Charles W. Wessner and both published in NRC’s Government-Industry Partnerships Series: Wessner, ed., The Advanced Technology Program: Challenges and Opportunities, 1999; and Wessner, ed., The Advanced Technology Program: Assessing Outcomes, 2000.
246 The Steering Committee was chaired by Gordon Moore, the Chairman Emeritus of Intel. Other members included such notables in the field of economics and technology policy as Michael Borrus, Co-Director, Berkeley Roundtable on International Economics; Iain Cockburn, Professor of Commerce and Business Administration, University of British Columbia; Kenneth Flamm, Dean Rush Chair in International Affairs, University of Texas at Austin; James Gibbons, Professor of Engineering, Stanford University; William J. Spencer, Chairman, SEMATECH; Richard Nelson, George Blumenthal Professor of International and Public Affairs, Columbia University; and Patrick Windham, Stanford University. In addition, the project was overseen by members of the STEP Board, which included such notables in economics, business, and management as Dale Jorgenson, Chair, Frederic Eaton Abbe Professor of Economics, Harvard University; M. Kathy Behrens, Managing Partner, Robertson Stephens Investment Management; Vinton G. Cerf, Senior Vice-President, WorldCom; Richard Levin, President, Yale University; and Bronwyn Hall, Professor of Economics, University of California- Berkeley.
247 Wessner, ed., The Advanced Technology Program: Assessing Outcomes, 2000.
248 Branscomb et al., Managing Technical Risk: Understanding Private Sector Decision Making on Early Stage Technology-Based Project, 2000.
249 Martin et al., A Framework for Estimating the National Economic Benefits of ATP Funding of Medical Technologies, 1998.
250 Bass, “A New Product Growth Model for Consumer Durables,” 1969.
251 Martin et al., A Framework for Estimating the National Economic Benefits of ATP Funding of Medical Technologies, 1998, pp. 1–9 to 1–10.
252 CONSAD Research Corporation, Advanced Technology Program Case Study: The Development of Advanced Technologies and Systems for Controlling Dimensional Variation in Automobile Body Manufacturing, 1996.
253 BIW refers to body-in-white.
254 Ibid., p. 17.
255 Ibid., p. 20.
256 Ibid., p. 22.
257 See Chapter 5 for a description of BRS. Prior to the collection of counts of papers, surveys of program participants had queried them about their intentions to publish and to patent.
258 See Long, Performance of Completed Projects, Status Report 1, 1999, p. 12; and Advanced Technology Program, Performance of 50 Completed ATP Projects, Status Report 2, 2001, p. 16.
259 Advanced Technology Program, Performance of 50 Completed ATP Projects, Status Report 2, 2001.
260 Ibid., p. 11.
261 Michael S. Fogarty, Amit K. Sinha, and Adam B. Jaffe, ATP and the U.S. Innovation System—A Methodology for Identifying Enabling R&D Spillover Networks with Applications to Micro-electromechanical Systems (MEMS) and Optical Recording, Draft report, ATP, 2000.
262 Ibid., p. 52.
263 Ibid., p. 35.
264 They point to a less critical view of patent citations that sees them as providing “direct observations of knowledge spillovers...” Ibid., p. 14.
265 Ibid., p. 20. “Noise” in the data refers to the fact that patent citations are imperfect indicators of flows of knowledge. For example, a patent analyst may add a citation as legal protection without an actual occurrence of communication expressing a knowledge flow underlying the citation.
266 Ibid., p. 20.
267 The study provides a review of prior research using patent citations. Prior research, according to Fogarty et al., has used “ex-post citations to infer the ‘quality’ or ‘importance’ of the cited inventions,” and “citation patterns to make inferences about the nature and direction of knowledge spillovers.” Ibid., p. 15.
268 Ibid., p. 5.
269 Ibid., p. 20.
270 Ibid., p. 8.
271 Ibid., pp. 29–30.
272 Ibid., p. 30.
273 Ibid., pp. 30–31.
274 Ibid., p. 28.
275 Even though only a minority of its members participated in the joint venture, NSIC represents a potentially very powerful mechanism for magnifying R&D spillovers from the joint venture.
276 Ibid., p. 80.
277 Ibid., p. 93.
278 Ibid., p. 44.
279 Ibid., p. 93.
280 Ibid., p. 95.
281 Ibid., p. 97.
282 David Austin and Molly Macauley, Estimating Future Consumer Benefits from ATP-Funded Innovation: The Case of Digital Data Storage, NIST GCR 00–790 (Gaithersburg, MD: National Institute of Standards and Technology, 2000).
283 Timothy Bresnahan, “Measuring the Spillovers from Technical Advance: Mainframe Computers in Financial Services,” American Economic Review, 76(4): 742–755, 1986.
284 Austin and Macauley, Estimating Future Consumer Benefits from ATP-Funded Innovation: The Case of Digital Data Storage, 2000, p. 5.
285 Ibid., p. 1.
286 Ibid., p. 13.
287 Ibid., p. 12.
288 Ibid., p. 23.
289 For a detailed description of the CPRS, see Ruegg, A Composite Performance Rating System for ATP-Funded Completed Projects, 2003.
290 Performing in-depth economic assessments for all projects was considered impractical in terms of time and money.
291 The Quadrix Rating System was developed by Richard Moroney, editor of Dow Theory Forecasts.
292 Eliezer Geisler, The Metrics of Science and Technology (Westport, CT: Quorum Books, 2000), pp. 243–266.
293 The cost index method has been applied to other agency programs, including NASA.