for Evaluating Public R&D Investment
CHAPTER 2: Choosing Methods of Evaluation
Evaluators use a variety of methods to address questions of program performance. The methods share common features, but each has its advantages, disadvantages, and specialized purposes. Former methodological wars about the relative merits of different techniques have largely given way to an eclectic approach in which techniques are chosen for their appropriateness to the evaluation question at hand, to cost and administrative feasibility, and to a purposeful mixture of methodological paradigms. 33
The use of some methods of evaluation depends on how a program is positioned relative to the market and how mature it is. Some methods are particularly useful in assessing early-stage research programs, while others are better suited for assessing later-stage, closer-to-market programs. Both the utility and feasibility of methods may change as a program develops. Generally, the more an R&D program’s scope spans from research to commercialization, the more methods evaluators can use to capture the full range of a program’s impacts. Recognizing this, a farsighted strategy would be to design an evaluation program that lets data perform multiple duties. For instance, early evaluations may be designed to generate survey information on participants for immediate use, with the idea that the survey information can later be used as baseline information for subsequent evaluations. A farsighted strategy would also use multiple methods to capture the full range of a program’s impacts, to triangulate findings on salient program impacts, and to identify and validate relationships and impacts not readily apparent in the construction of the initial design. Table 2–1 lists major evaluation methods, defines each, and illustrates how each may be used to gather information about a program’s performance.
Crosscutting the methods are various approaches to collecting data. Among the approaches to collecting data for use in evaluation are systematic and anecdotal observation, review of records and searches of existing databases, testing, experimenting, recording responses of focus groups, interviewing experts, and conducting surveys by structured interviews in person or by telephone, or by mailed or electronically administered questionnaires.
The remainder of this chapter provides overviews of the methods listed in Table 2–1. In-depth coverage is beyond the scope of this toolkit. Volumes could and have been written on the various methods. Readers seeking additional information are directed to references provided.
Analytical/Conceptual Methods for Modeling and Informing Underlying Program Theory
Clarifying and validating a program’s underlying concepts and theories, and investigating analytical linkages among program elements, are frequently important parts of an agency’s overall program evaluation strategy. Modeling and informing the underlying theory of the program is an ongoing rather than a onetime task in a dynamic program. It is an essential prelude to program operations, an early-stage complement to formative management and evaluation efforts, and a building block in the design and revision of a long-term comprehensive evaluation program as experience and evidence accumulate about program impacts.
Why is this form of self-study so effective? Federal government R&D programs are typically based on a combination of hypothesized and documented relationships that link activities to objectives. Some of these relationships are explicit, evident in the language of the legislative, budgetary, and administrative debates that give rise to the program and its operational provisions. Others are implicit, involving widely shared acceptance and assumptions about stylized facts or causal linkages. These explicit and implicit relationships constitute the program’s theory. As Mohr writes about program theory:
In practice, however, programs are often established on partially formed theories, incomplete documentation, or fragile empirical grounds. They often reflect new, untried approaches to problems. Legislation may define program objectives without suggesting the most effective ways to meet those objectives. Furthermore, programs are frequently established on broad associations between program mechanisms and intended outcomes, implying direct, linear relationships. But the actual causal paths may be more complex; they may entail one or more intermediate steps. 34
In addition to ideological or partisan opposition to its ends or means, a program may be challenged on the grounds that its theoretical underpinnings have logical inconsistencies, lack adequate empirical testing, and will be ineffective or inefficient in achieving intended objectives. Moreover, even where such “technical” challenges do not exist, good management practice requires that agency administrators and program officials comprehend nuanced relationships among program design features, program implementation, and outcomes.
ATP’s use of analytical and conceptual methods to model underlying program theory is presented in Chapter 4.
Surveys can be used to describe a program in terms of frequencies, percentages, means, medians, standard deviations, and significance of sample data. Survey results are typically presented in aggregate, without identifying individual results, using tabular and graphical summaries of data. Surveys provide a statistical overview for multiple projects and participants, rather than project details, and are particularly useful in portfolio analysis.
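As a minimal sketch of the aggregate statistics described above, the following Python fragment summarizes a set of close-ended responses. The ratings are hypothetical illustrative data, and the function name `describe` is an assumption made for this example, not part of any survey toolkit.

```python
import statistics
from collections import Counter

# Hypothetical close-ended responses on a 1-to-5 scale (illustrative data only).
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

def describe(values):
    """Return aggregate descriptive statistics for a list of numeric responses."""
    n = len(values)
    counts = Counter(values)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        # Percentage of respondents choosing each scale point.
        "percent": {k: 100 * c / n for k, c in sorted(counts.items())},
    }

summary = describe(ratings)
```

The summary reports only aggregates, in keeping with the practice of presenting survey results without identifying individual responses.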
Conducting the Survey
Survey data can be collected by interviews conducted in person or by phone, or by questionnaires mailed, dropped off, or posted on the Internet. Questions may be either open-ended (Why? What?) or close-ended (Yes/No; or, Which of the following choices best describes...?). Close-ended questions may use ranking systems (Indicate the order of your preference...) and scales (Rate on a scale of 1 to 5).
Computation of survey statistics requires consistency across individuals in a survey group in terms of the questions asked or ranking systems or scales used. It also requires that responses to open-ended questions be coded systematically and consistently. Generally, questionnaires use a series of precisely worded, close-ended questions, and interviews use more open-ended questions and discussion, leading to more varied data that may be more difficult to analyze. But a questionnaire can also include open-ended questions, and an interview may rigidly follow a scripted questionnaire format.
Statistical inference, the process of using sample data to make inferences about the parameters of a population, reduces the time and cost of collecting data by survey from an entire population. 35 Sample design should be sufficiently described to enable calculation of sampling errors. Establishing a sampling frame—the list from which a sample is drawn—is essential. Samples may be randomized or stratified. They may be longitudinal, drawing data from the same panel of individuals at different times with the same survey questions. Or, they may be cross-sectional, drawing new samples for successive data collection. 36
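The sampling ideas above can be illustrated with a short Python sketch. It assumes a hypothetical sampling frame of 100 firms split into two strata and draws the same fraction from each. The standard-error formula shown is the textbook one for a proportion under simple random sampling, with an optional finite population correction; applying it to a stratified sample is only an approximation. All names and figures are assumptions for illustration.

```python
import math
import random

def draw_stratified_sample(frame, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum of a sampling frame."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        k = max(1, round(fraction * len(units)))
        sample.extend(rng.sample(units, k))
    return sample

def proportion_standard_error(p, n, population=None):
    """Standard error of a sample proportion under simple random sampling."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction for sampling without replacement.
        se *= math.sqrt((population - n) / (population - 1))
    return se

# Hypothetical frame: 80 small firms and 20 large firms.
frame = [("small", i) for i in range(80)] + [("large", i) for i in range(20)]
sample = draw_stratified_sample(frame, stratum_of=lambda u: u[0], fraction=0.25)
se = proportion_standard_error(p=0.6, n=len(sample), population=len(frame))
```

Documenting the frame, the strata, and the sampling fraction in this way is what allows a reader to recompute the sampling error.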
Once it has been decided that a survey method is appropriate for the evaluation task at hand, there are a number of steps involved in carrying it out. Table 2–2 lists the steps.
Advantages, Disadvantages, and Special Uses of Surveys
An advantage of using survey-based descriptive statistics in evaluation is that it provides an economical way to gather aggregate level information about a program and its participants, even in its early stages, and it accommodates the use of control and comparison groups or the collection of counterfactual information. Other advantages are that diverse audiences can usually understand the approach and results, and many people find statistical results credible and informative. Furthermore, once collected, survey data can be analyzed and reanalyzed in different ways. Surveys can provide information about participants and users not available through other sources.
There are advantages and disadvantages associated with the various methods of collecting data for descriptive statistical analysis. Phone interviews work best when timeliness is important and the length of the survey is limited. Face-to-face interviews cost more and take more time, but are better for collecting complex information, using open-ended questions, providing needed flexibility, and obtaining higher response rates. Mailed questionnaires have the advantages of usually being cheaper to administer and of allowing more time for respondents to form responses. The disadvantages of mailed questionnaires include relatively low response rates, a lack of flexibility, and complete reliance on the written questionnaire as self-explanatory. Web-based surveys and e-mail interactions for follow-up offer a promising approach. Using a mixed-mode approach may offer advantages.
A disadvantage of survey statistics is that they do not convey the richness of individual project detail that stakeholders tend to find interesting and memorable. A further limitation is that the responses on which descriptive statistics are based are often subjective in nature. Respondents may not be truthful. They may have faulty recall. Or, they may wish to promote a particular point of view. Hence, results may be biased.
See suggested references on the survey method at the end of this chapter and examples of survey studies from ATP’s experience in Chapter 5.
Case Study: Descriptive
Descriptive case studies are in-depth investigations into a program, project, facility, or phenomenon, usually to examine what happened, to describe the context in which it happened, to explore how and why, and to consider what would have happened otherwise. 37 Case studies are particularly helpful in understanding general propositions, 38 and in identifying key relationships and variables. Thus, case study can be particularly useful in the exploratory phases of a program.
Broadly Accessible Results
Most descriptive case studies are written in narrative form and are aimed at a wide audience. For example, such studies can make complex scientific and technology projects accessible to a non-scientist audience. The potential scope of the descriptive case study method is broad, ranging from brief descriptive summaries to long, complex treatments.
Descriptive case studies usually start with qualitative information from direct observation, program/project documents, and interviews with key project managers. Program and project documents are useful for establishing key dates, budgets, initial plans and goals, specific outputs, key staff, and other critical information helpful in framing a study. To extend the available information, the evaluator may bring in results from one or more of the other evaluation methods listed in Table 2–1, such as survey results or bibliometric results, to enhance the story.
Using a “story-telling” approach, the evaluator may present the genesis of ideas, give an account of the human side of the project, explain goals, explore project dynamics, and present outcomes. Case studies can also be used to construct theories about program or project dynamics. 39 Multiple case studies may be conducted with uniform compilation of information across cases to provide aggregate statistics for a portfolio of projects.
Advantages and Disadvantages of Descriptive Case Study
An advantage of the descriptive case study method is that many decision makers read and process anecdotal cases more easily than they do quantitative studies. Another advantage is that by bringing in substantial program/project information on a less restrictive basis than most methods, case studies document events and provide a richness of detail that may prove useful in formulating theories and hypotheses that lay the groundwork for further evaluation. Case studies are also valuable in identifying exemplary or best-practice experiences. They can be used to describe how and why a program is or is not working. Used in formative evaluations, they can guide agency behavior and serve as a benchmark for other program recipients.
A disadvantage of the descriptive case study method is that the anecdotal evidence provided is generally considered less persuasive than quantitative evidence. The results of one or more individual cases may not apply to other cases.
See suggested references on the descriptive case-study method at the end of this chapter and examples of ATP’s use of the method in Chapter 6.
Case Study: Economic Estimation
Economic case studies combine descriptive case histories with quantification of benefits and costs, including treatment of the distribution of benefits and costs. 40 Carrying out the descriptive analysis in advance of quantification is generally an essential step toward economic quantification. Indeed, developing an in-depth understanding of the problem in its “case-specific” context is invaluable to the analyst who must design an appropriate estimation model, track down diverse effects, choose supporting analytical techniques, establish reasonable assumptions, and develop data that will lead to reliable calculations.
Prospective and Retrospective Studies
An economic study may be retrospective, based on empirically estimated past effects, or prospective, based on projected future effects. The longer a project has been in existence and the closer it is to market, the more feasible is an empirically based analysis. Often, by necessity, economic case studies will combine elements of existing data with forecasts in order to take the analysis into the outcomes/impact stage of a project.
Because economic case study requires impacts to be expressed in monetary units, its use is more feasible in evaluating applied research and technology development programs than basic science programs where the ultimate outcomes and impacts may be decades away and difficult or impossible to capture. However, even with applied research and technology development projects, there may be difficulties in estimation related to a project’s distance from the market. Generally, the further upstream of the market a program is positioned, the more complicated becomes the task of apportioning costs and disentangling the contributions of various contributors to the development of the eventual technology, and of estimating downstream economic benefits.
Discounting Benefits and Costs
Economic case studies generally employ the techniques of benefit-cost analysis, including adjusting benefit and cost estimates for differences in their timing. Benefits and costs occurring over time are adjusted both for the real opportunity cost of capital and changes in purchasing power due to inflation or deflation in order to be compared on a consistent basis. One approach is to first eliminate the effects of inflation or deflation from the estimated cash amounts so they are expressed in constant dollars, and then apply a “real discount rate” to adjust for opportunity costs. An alternative approach is to express cash amounts in current dollars, and use a “nominal discount rate” to adjust for the combination of opportunity costs and inflation/deflation. Because an interest rate called a “discount rate” is applied to adjust the cash flows, the procedure is called “discounting cash flows.”
Discounting adjusts all dollar amounts to a common time so that they can be combined and compared with other discounted dollars. Amounts can be expressed either as a present value, a lump sum as of the present; an annual value, a series of annual amounts spread evenly over the study period; or a future value, a lump sum as of a designated future date. Often in public-sector evaluations all amounts are adjusted to present values. Discounting benefits and costs reduces the value of amounts occurring farther in the future relative to amounts occurring closer to the present time. 41
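A brief Python sketch may help show that the two discounting approaches give the same present value when the rates are consistent. The 3% real rate, 2% inflation rate, and $1,000 benefit are hypothetical figures chosen for illustration, and the nominal rate is derived from them with the standard relationship (1 + nominal) = (1 + real)(1 + inflation).

```python
def present_value(amount, rate, years):
    """Discount a single future amount back to the present."""
    return amount / (1 + rate) ** years

# Hypothetical rates, for illustration only.
real_rate = 0.03
inflation = 0.02
nominal_rate = (1 + real_rate) * (1 + inflation) - 1

years = 5
constant_dollar_benefit = 1000.0  # stated in today's purchasing power
current_dollar_benefit = constant_dollar_benefit * (1 + inflation) ** years

# Approach 1: constant dollars discounted at the real rate.
pv_real = present_value(constant_dollar_benefit, real_rate, years)
# Approach 2: current dollars discounted at the nominal rate.
pv_nominal = present_value(current_dollar_benefit, nominal_rate, years)
```

Both calculations yield a present value of roughly $863, which also illustrates how discounting reduces the weight of amounts that occur farther in the future.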
Comparing Benefits and Costs
An evaluator must also decide how to express the measure of project performance that compares benefits against costs. Often used benefit-cost measures of project performance are briefly discussed below. All except the rate-of-return measures directly use discounting to adjust dollar amounts prior to computing the performance measure. The rate-of-return measures are computed by using the appropriate discounting formula to solve for the discount rate that equates benefits and costs. 42
The net benefits measure is computed by subtracting time-adjusted costs from time-adjusted benefits. If the net benefit measure is greater than zero, then the project is considered desirable, since the minimum required rate of return is already accounted for through discounting.
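The net benefits calculation can be sketched in a few lines of Python. The cost and benefit streams and the 7% discount rate below are hypothetical, and the helper name `present_value_of_stream` is an assumption made for this example.

```python
def present_value_of_stream(cash_flows, rate):
    """Sum of discounted amounts; cash_flows[t] falls at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Hypothetical project: costs in years 0-1, benefits in years 1-3.
costs = [1000.0, 200.0, 0.0, 0.0]
benefits = [0.0, 300.0, 600.0, 800.0]
discount_rate = 0.07

net_benefits = (present_value_of_stream(benefits, discount_rate)
                - present_value_of_stream(costs, discount_rate))
project_is_desirable = net_benefits > 0
```

With these figures, time-adjusted benefits exceed time-adjusted costs by about $271, so the project clears the 7% required return that is built into the discounting.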
When a project results primarily in cost reduction, the performance measure may be given in terms of life-cycle costs by combining all relevant costs and comparing them with the life-cycle costs of the best alternative to the project. A comparison of time-adjusted total costs among alternatives indicates which is lowest. If the levels of performance are comparable, the least-cost alternative is considered the cost-effective choice.
Project performance may also be expressed as a benefit-to-cost ratio, a variation of which is a savings-to-investment ratio. The ratio is computed by dividing benefits (or savings) by costs. 43 The ratio indicates how many dollars of benefit per dollar of cost are realized. The ratio must be greater than one to indicate a minimally worthwhile project. Again, the minimal acceptable rate of return is already built into the analysis through discounting, and a ratio greater than one means that the return is greater than the minimal acceptable rate. 44
Project performance may also be expressed as a rate of return. The traditional rate of return measure is an internal rate of return (IRR). This measure solves for the interest rate that will equate the stream of benefits and costs. For example, if we spent $712,990 today to receive $1 million in five years, the investment would yield a 7% internal rate of return. After the solution value of the interest rate is computed, it is compared against a specified minimum acceptable rate of return to determine the desirability of the investment or performance of a project. If we required a return of, say, 10% instead of 7%, the above investment would not be attractive.
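The $712,990 example can be checked with a short Python sketch. The bisection search used here is one simple way to solve for the rate and is an illustrative choice, not a prescribed procedure; it assumes the net present value crosses zero exactly once between the search bounds. The 5% comparison rate is a hypothetical addition.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] falls at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def internal_rate_of_return(cash_flows, low=0.0, high=1.0, iterations=100):
    """Bisection search for the rate at which NPV is zero.

    Assumes NPV is positive at `low`, negative at `high`, and crosses zero once.
    """
    for _ in range(iterations):
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

def benefit_cost_ratio(cost_today, benefit, years, rate):
    """Discounted benefit divided by a cost incurred today."""
    return (benefit / (1 + rate) ** years) / cost_today

# Spend $712,990 today, receive $1 million at the end of year 5.
flows = [-712_990, 0, 0, 0, 0, 1_000_000]
irr = internal_rate_of_return(flows)

bcr_at_5_percent = benefit_cost_ratio(712_990, 1_000_000, 5, 0.05)
bcr_at_10_percent = benefit_cost_ratio(712_990, 1_000_000, 5, 0.10)
```

The computed rate is approximately 7%. The benefit-to-cost ratio is above one at a 5% discount rate and below one at 10%, which agrees with the conclusion that the investment is unattractive when a 10% return is required.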
The economics and financial communities have increasingly come to use an adjusted version of the rate-of-return measure that makes the reinvestment rate explicit. Its use has increased because it avoids some problems associated with the IRR measure, such as the possibility of no unique solution value and the assumption inherent in the technique that the rate of return on the initial investment will also be obtained on reinvested proceeds over the study period. 45
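One common formulation of an adjusted rate of return carries each benefit forward to the end of the study period at a stated reinvestment rate, then solves for the single rate that grows the initial cost to that terminal value. The Python sketch below follows that formulation as an assumption; the source does not specify a formula, and the cash flows and 5% reinvestment rate are hypothetical.

```python
def adjusted_rate_of_return(initial_cost, benefits, reinvestment_rate):
    """Rate that grows initial_cost to the terminal value of reinvested benefits.

    benefits[0] is received at the end of year 1, benefits[1] at the end of year 2, etc.
    """
    years = len(benefits)
    terminal_value = sum(
        b * (1 + reinvestment_rate) ** (years - t)
        for t, b in enumerate(benefits, start=1)
    )
    return (terminal_value / initial_cost) ** (1 / years) - 1

# Hypothetical project: $1,000 cost today, $400 benefit in each of four years.
airr = adjusted_rate_of_return(1000.0, [400.0] * 4, reinvestment_rate=0.05)
```

For this stream the adjusted rate is about 14.6%, lower than the conventional IRR of roughly 22%, because proceeds are assumed to earn only 5% when reinvested rather than the project's own rate.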
Yet another related measure of performance is discounted payback period, that is, the length of time until the accumulation of time-adjusted benefits is sufficient to pay back the cost. A shortcoming of this measure is that it focuses on a breakeven point rather than on net benefits, and, hence, is not recommended as a standalone measure of economic performance. It may be useful, however, as a supplementary measure.
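A minimal Python sketch of the discounted payback calculation follows, assuming a single up-front cost and end-of-year benefits. The figures are hypothetical.

```python
def discounted_payback_year(cost, benefits, rate):
    """First year in which cumulative discounted benefits cover the up-front cost.

    benefits[0] is received at the end of year 1. Returns None if the cost
    is never recovered within the study period.
    """
    cumulative = 0.0
    for year, benefit in enumerate(benefits, start=1):
        cumulative += benefit / (1 + rate) ** year
        if cumulative >= cost:
            return year
    return None

# Hypothetical project: $1,000 cost today, $400 benefit in each of four years.
payback_at_7_percent = discounted_payback_year(1000.0, [400.0] * 4, 0.07)
payback_at_30_percent = discounted_payback_year(1000.0, [400.0] * 4, 0.30)
```

At 7% the cost is recovered in year 3; at 30% it is never recovered within four years. The measure says nothing about benefits that arrive after breakeven, which is why it serves only as a supplement.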
Lead with Net Benefits; Supplement with Other Measures
A frequently used strategy in benefit-cost analysis is to lead with a net benefit calculation and supplement it with one or more of the other measures to help reach audiences familiar with the different measures. Those with primarily private sector experience will usually be most familiar with IRR and business cash flows, and not the broader perspective of public sector analysis.
Challenges in expanding a case study to include benefit-cost measures are identifying the various pathways through which project effects occur, identifying the populations affected, and estimating difficult-to-quantify benefits and costs. Seldom is this information readily available. Furthermore, sufficient time may not have elapsed to allow a project to yield positive outcomes. The project may still be in the stage of net negative returns even though the potential for large positive returns in the long run may be strong. An additional challenge is attributing benefits and costs among joint investors.
Treating Uncertainty and Risk
Given the uncertainties of the technical and economic outcomes associated with research and development programs, evaluations that seek to estimate costs and benefits or other economic impacts must deal with the presence of uncertainty and risk. Quantitative studies that express results deterministically, ignoring uncertainty and risk, tend to be misleading in their implied level of precision. If probabilities can be attached to different values, risk assessment can be added to the economic analysis and the extent to which the actual outcome will likely differ from the “best guess” can be estimated. 46 If probabilities are not available, then a technique for treating uncertainty can be used. Sensitivity analysis, for instance, tests how outcomes change as the values of uncertain input data are changed, shows the estimated outcome of a project for alternative data estimates and assumptions, and allows us to express the results in terms of a range of possible values. Most importantly, it reminds the audience that there is uncertainty, and indicates how the outcome might vary. Scenario analysis allows the analyst to show results based on different scenarios of interest to decision makers.
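The following Python sketch illustrates a simple sensitivity analysis. It recomputes net benefits for every combination of three hypothetical annual-benefit estimates and three hypothetical discount rates, then reports the range. The cost, study period, and case labels are assumptions for illustration.

```python
from itertools import product

def net_benefits(annual_benefit, rate, cost=1000.0, years=5):
    """Present value of a level annual benefit minus an up-front cost."""
    return sum(annual_benefit / (1 + rate) ** t for t in range(1, years + 1)) - cost

# Hypothetical alternative estimates for two uncertain inputs.
benefit_cases = {"low": 200.0, "base": 300.0, "high": 400.0}
rate_cases = [0.03, 0.07, 0.10]

results = {
    (name, rate): net_benefits(benefit, rate)
    for (name, benefit), rate in product(benefit_cases.items(), rate_cases)
}
lowest, highest = min(results.values()), max(results.values())
```

Here the base case at 7% shows net benefits of about $230, but the range runs from about -$242 to about $832. Reporting the range rather than the single base-case figure avoids implying more precision than the data support.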
Advantages and Disadvantages of Economic Case Study
An economic case study is widely considered one of the more highly developed methods of evaluation because of its focus on ultimate outcomes and impacts rather than on outputs. Its advantages include the fact that its scope extends from project start to finish, and it provides quantitative estimates of results that are often considered more convincing evidence of value than qualitative measures. Another advantage is that its measures are stated in the language of finance, which facilitates comparisons. Combined with a descriptive treatment of a project, a well-done economic analysis can shed light on the overall performance of a project and provide valuable insight to program administrators and policy makers.
The method also has disadvantages. For instance, it may be impossible to estimate the value of important benefits in monetary terms. 47 A further problem can arise if there is not a clear understanding of the essential differences between analyses performed for public versus private projects. For instance, spillover effects in case studies of publicly funded projects designed to deliver social benefits may be overlooked in the face of easier-to-capture private returns. A related disadvantage is that stakeholders may expect positive net benefits and large IRR in the short-run when, in fact, a public R&D program often takes substantial time for impacts to be realized, particularly spillover impacts resulting from knowledge dissemination. Another disadvantage may be the risk of raising expectations based on a single project that all or most projects will be like that one, or the risk that policy makers will draw conclusions from an idiosyncratic experience. Some of these potential disadvantages, however, may be avoided through skillful execution, presentation, and interpretation of the studies.
See suggested references on economic case-study method at the end of this chapter, and examples from ATP in Chapter 6.
Econometrics is a branch of economics by which researchers empirically estimate economic relationships by applying mathematical models to structure the relationships, and by applying statistical methods to analyze economic data, estimate model parameters, and interpret the strength of evidence for the hypotheses examined. Thus, econometrics includes model building, estimation, hypothesis formation and testing, and extensive data analysis. The method employs many techniques from mathematics and statistics, and is used in a wide range of applications. The results are highly quantitative, with the specific units of measure dependent on the nature of the individual analysis.
Application of econometric/statistical methods requires considerable care and skill in (1) hypothesizing relationships that derive from, or correspond to, prior theoretical or programmatic concepts; (2) selecting or constructing measures for dependent and independent variables corresponding to the key concepts and relationships posited in theory; and (3) using and interpreting appropriate statistical tests.
Reflecting the complexity of the phenomena examined and the absence of perfect empirical data to use in models, Griliches and Intriligator state in their extensive reference work on econometrics, 48 the following, which captures the flavor of the method:
Hypothesis testing first makes a tentative assumption called the null hypothesis, denoted by H0. Then an alternative hypothesis, denoted Ha, is defined which states the opposite of the null hypothesis. The hypothesis-testing procedure generally uses sample data to determine whether or not H0 can be rejected. If H0 is rejected, then the statistical conclusion is that the alternative hypothesis Ha cannot be rejected. It may be postulated in H0, for example, that the number of collaborative research ventures is unaffected by public-private partnership programs, and in Ha, the opposite.
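As an illustration of the procedure, the Python sketch below applies an exact two-sided permutation test, one of several possible tests and chosen here only because it needs no distributional assumptions or external libraries. The counts of collaborative ventures for program participants and non-participants are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Under H0 the group labels are interchangeable, so every way of splitting
    the pooled data into groups of the original sizes is equally likely.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical counts of collaborative research ventures per firm.
participants = [5, 7, 6, 8, 9]
non_participants = [3, 4, 2, 5, 4]

p_value = permutation_p_value(participants, non_participants)
reject_h0 = p_value < 0.05
```

With these data only 4 of the 252 possible splits produce a difference as large as the one observed, giving a p-value of about 0.016, so H0 would be rejected at the 5% significance level.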
Regression and Correlation Analysis
Another application of statistical methods in evaluation is in regression and correlation analysis to identify the relationship between a dependent variable and one or more independent variables and to measure the degree of association between variables. Regression analysis develops an estimating equation from sample data to make projections about one variable (the dependent variable, y) based on another variable (the independent variable, x). Correlation analysis measures the strength of the relationship between the variables, that is, the variability in y that is explained by x, typically measured by the coefficient of determination or its square root, the coefficient of correlation.
An example of a possible relationship that might be tested is an increase in the numbers of patents by firms in a given industry and the number/amount of federal research grants received. An estimated regression equation could be used to predict the change in the number of patents given an increase in federal grants. It is important to note that neither regression nor correlation analyses alone prove cause-and-effect relationships; rather, the analyses indicate how or to what extent variables are associated with each other.
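The patents-and-grants example can be sketched with ordinary least squares on a single independent variable. The five data points below are hypothetical, and the function name `simple_regression` is an assumption made for this example.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with r-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return slope, intercept, r_squared

# Hypothetical data: federal grants received (x) and patents filed (y) by five firms.
grants = [1, 2, 3, 4, 5]
patents = [2, 4, 5, 4, 5]

slope, intercept, r_squared = simple_regression(grants, patents)
predicted_patents_with_6_grants = intercept + slope * 6
```

The fitted equation is y = 2.2 + 0.6x with a coefficient of determination of 0.6, meaning 60% of the variation in patents is associated with variation in grants. As noted above, this association does not by itself establish that grants cause patents.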
Production Function Analysis to Measure Productivity
Evaluators also use econometric analysis to estimate a production function, the mathematical expression of the technical relationship between inputs and outputs. The production function equation quantifies the output that can be obtained from combinations of inputs, assuming the most efficient available methods of production are used. The production function can be used to estimate the change in output from an additional input or the least-cost combination of productive factors that can be used to produce a given output. It can be used, for example, to examine the impact of federal funding on private-firm R&D productivity.
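For illustration only, one widely used functional form is the Cobb-Douglas production function. The source does not specify a form, so the choice of this form and the symbols below are assumptions: Q is output, K capital, L labor, R the stock of R&D (which could include federally funded R&D), A a scale parameter, and the exponents are the output elasticities to be estimated.

```latex
Q = A \, K^{\alpha} L^{\beta} R^{\gamma}
\qquad\Longrightarrow\qquad
\ln Q = \ln A + \alpha \ln K + \beta \ln L + \gamma \ln R
```

Taking logarithms gives an equation that is linear in the parameters and can be estimated by regression. Under this form, the change in output from an additional unit of R&D is

```latex
\frac{\partial Q}{\partial R} = \gamma \, \frac{Q}{R}
```

so an estimate of the elasticity \gamma translates directly into an estimate of the marginal contribution of R&D to output.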
Macroeconomic models can help in economic forecasting and the analysis and formulation of public policy. For example, a macroeconomic model based on national input-output tables and using a set of structural equations to explain economic relationships might be used to analyze the national effects of a product innovation that decreases its supply cost. An example of a macroeconomic model is the REMI Policy Insight™ model, which is used to forecast national and regional economic effects of a wide range of policy initiatives and technological changes. 49
Advantages and Disadvantages of Econometric/Statistical Methods
One advantage of econometric/statistical methods is that the methods significantly add to the analytical capability of evaluators. Use of these methods can contribute to an understanding of the relationships between inputs and outputs in the face of complex and imperfect data. Econometric/statistical methods can be used to produce quantitative results with detailed parameters, and, importantly, can be used to demonstrate cause-and-effect relationships.
The disadvantage to using these methods is that both the approaches and results may be difficult for the non-specialist to understand, replicate, and communicate. In addition, not all effects can be captured in these highly quantitative methods, which are imperfect and variable in how well they capture relationships between changing technical knowledge and economic and social phenomena.
See suggested references on econometric/statistical methods at the end of this chapter, and examples from ATP in Chapter 7.
Sociometric/Social Network Analysis
According to sociologists, the fact that economic behavior is embedded in networks of social ties has a profound impact on economic outcomes. There is an emerging awareness of the significance of social networks and their dynamics on the economic impacts of research and technology development among economists who are engaged in program evaluation.
There is growing interest in how social networks emerge, how social networks evolve, and how social networks affect economic behavior. Additionally, there is growing interest in applying methods of sociometrics and social network analysis to learn more about the spheres of influence of scientists, technologists, and innovators and the importance of their work, to identify evolving pathways of knowledge spillover, to improve the success of collaborative relationships, and to map the development and diffusion of human capital from projects.
Identify Networks of Information Sharing
Aside from tracking citations of patents and publications, how can evaluators define social networks of information sharing? One approach is to ask project participants to list several others outside their organization with whom they often share information, and also to list several others whose work they consider most important in the field of inquiry. The people they list are queried, and so forth. The multi-level communications network defined from the data can include affiliations and disciplines, can reveal paths of knowledge spillover, can show areas of influence, and can suggest the importance of the work of different people and the influence of one field on another. 50
Another approach, called co-nomination analysis, asks researchers in a given field to nominate others whose work is similar to or most relevant to their own. Evaluators assume links exist between those co-nominated.
Yet another approach to modeling scientific collaboration networks analyzes data from existing databases on co-authorship. According to Newman, scientists are connected if they have authored a paper together. This approach allows analysis of a large network without collecting primary data from the network participants. Network characteristics such as number of collaborators, degrees of separation between scientists, and clustering of the network and of disciplines are described and compared. 51
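A co-authorship network of this kind can be sketched in Python from a list of papers. The author names and papers below are hypothetical, and the breadth-first search is a standard way to count degrees of separation rather than a method attributed to any particular study.

```python
from collections import deque
from itertools import combinations

# Hypothetical author lists, one per paper.
papers = [
    ["Ames", "Baker", "Chen"],
    ["Chen", "Diaz"],
    ["Diaz", "Evans"],
    ["Ford"],
]

def build_network(papers):
    """Link every pair of scientists who have authored a paper together."""
    network = {}
    for authors in papers:
        for author in authors:
            network.setdefault(author, set())
        for a, b in combinations(authors, 2):
            network[a].add(b)
            network[b].add(a)
    return network

def degrees_of_separation(network, start, goal):
    """Length of the shortest co-authorship path, or None if unconnected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == goal:
            return distance
        for neighbor in network[person]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

network = build_network(papers)
collaborators_of_chen = len(network["Chen"])
ames_to_evans = degrees_of_separation(network, "Ames", "Evans")
ames_to_ford = degrees_of_separation(network, "Ames", "Ford")
```

In this small network Chen has three collaborators, Ames and Evans are three steps apart, and Ford, a sole author, is unconnected to the rest.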
Advantages and Disadvantages of Sociometric/Social Network Analysis
The sociometric/social network analysis methods of evaluation have distinct advantages, principal among them being that these methods bring into focus a dimension of the process of innovation/economic impact that tends to be overlooked in traditional economic analysis. If, as there is growing reason to expect, social networks are the most important element for understanding spillover effects, it behooves a public program to understand better how to model, assess, identify, and encourage formation of these networks. The methods also offer a research advantage in that they tend to require relatively modest data that can be obtained through survey, interview, or existing databases. The methods provide insight offered by an alternative perspective focusing specifically on the human and institutional dimensions of analysis.
A possible disadvantage of sociometric/social network analysis is that it remains largely unfamiliar to most economists, agency administrators, and program stakeholders. Furthermore, the resulting qualitative measures may be considered, in and of themselves, not very informative of a program’s performance, particularly if the emphasis is on economic measures of impact. This problem should be lessened as economists and sociologists work together to fuse social network analysis with economic models.
See suggested references on sociometric/social network analysis methods at the end of this chapter, and examples of network analysis based on patent citation analysis in Chapter 8.
Bibliometrics: Counting, Citing, and Analyzing Content of Documents
Publications and patents constitute major outputs of research programs, and the large databases created to capture these outputs support the bibliometrics method of evaluation. As the term is used here, bibliometrics encompasses: tracking the quantity of publications and patents, analyzing citations of publications and patents, and extracting content information from documents. 52 Bibliometrics is used to assess the quantity, quality, significance, dissemination, and intellectual linkages of research, as well as to measure the progress, dynamics, and evolution of scientific disciplines.
Counting Publications and Patents
An easy output measure to track is the quantity of an organization’s or project’s publications and patents. The count may be normalized by research costs or some other measure of input to create an indicator of research productivity. Aggregated across a program, numbers of publications and patents per research dollar may serve as an indicator of program progress, and trends in outputs may be tracked over time. Adjustment can be made to account for quality differences in publication journals. Care should be exercised in making comparisons among organizations on the basis of their counts of publications and patents. Rates of patenting and publishing may vary for reasons other than productivity, and quality differences may not be adequately taken into account.
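The normalization described above can be sketched in a few lines of Python. The project records, dollar figures, and function name below are hypothetical and serve only to show the arithmetic: counts are divided by research cost expressed in millions of dollars, and the program-level indicator is computed from total outputs over total cost rather than by averaging project ratios.

```python
def per_million_dollars(output_count, research_cost):
    """Return the number of outputs per $1 million of research cost."""
    if research_cost <= 0:
        raise ValueError("research cost must be positive")
    return output_count / (research_cost / 1_000_000)

# Hypothetical project records (assumed figures, not drawn from any real program).
projects = [
    {"name": "Project A", "publications": 12, "patents": 3, "cost": 2_000_000},
    {"name": "Project B", "publications": 4, "patents": 6, "cost": 5_000_000},
]

# Project-level productivity indicators.
for p in projects:
    p["pubs_per_million"] = per_million_dollars(p["publications"], p["cost"])
    p["patents_per_million"] = per_million_dollars(p["patents"], p["cost"])

# Program-level indicators: aggregate outputs divided by aggregate cost.
total_cost = sum(p["cost"] for p in projects)
program_pubs_per_million = per_million_dollars(
    sum(p["publications"] for p in projects), total_cost)
program_patents_per_million = per_million_dollars(
    sum(p["patents"] for p in projects), total_cost)
```

The two hypothetical projects differ sharply in publications per dollar but much less in patents per dollar, which illustrates why comparisons based on a single count can mislead.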
Tracking citations of publications and patents is useful for identifying pathways of knowledge spillovers. Citations may include publications citing other publications, patents citing publications, and patents citing other patents.
The frequency with which publications and patents are cited is also used as an indicator of quality and significance. The more other scientists cite a research paper or patent, the greater its assumed relevance, impact, quality, and dissemination, other things being equal. Normalization approaches can be used to help control for quality differences in the citing journals. An example of a simple normalization approach is to hold the journal constant and compare the number of citations a given paper, or group of papers, receives against the average citation rate of all papers in the journal. A value greater than one indicates the paper, or set of papers, is more heavily cited than the average.
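As a minimal sketch of the simple normalization just described, the Python function below divides the average citation count of a paper, or set of papers, by the average citation count of all papers in the same journal. The function name and all citation counts are assumptions made for illustration.

```python
def relative_citation_rate(paper_citations, journal_citations):
    """Average citations of the selected papers divided by the average
    citations of all papers in the same journal. A value above 1 means the
    selected papers are cited more heavily than the journal average."""
    if not paper_citations or not journal_citations:
        raise ValueError("both citation lists must be non-empty")
    journal_average = sum(journal_citations) / len(journal_citations)
    if journal_average == 0:
        raise ValueError("journal average citation rate is zero")
    paper_average = sum(paper_citations) / len(paper_citations)
    return paper_average / journal_average

# Hypothetical citation counts for every paper in one journal over one period.
journal_citations = [2, 4, 6, 8, 10]   # journal average is 6 citations per paper

# Hypothetical citation counts for two program-funded papers in that journal.
program_ratio = relative_citation_rate([9, 12], journal_citations)

# A single paper cited three times in the same journal.
single_ratio = relative_citation_rate([3], journal_citations)
```

Here the two program-funded papers score above one and the single paper scores below one. A fuller treatment would also hold the publication year constant, because older papers have had more time to accumulate citations.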
Who is citing publications or patents may also be of interest. Examining who is citing what can reveal where a field of research or a technology is moving, and show knowledge linkages among subject areas. For example, a public program may wish to know whether U.S.-owned or foreign-owned firms take up a technology it funded. It may wish to know if its research is supporting other fields of knowledge. Citations of research papers in patents may be of special interest to a research organization because the citations show how the program’s research findings are being converted into technology and yielding economic benefits.
Citation analysis is also a useful adjunct to other evaluation methods. For example, it can facilitate historical tracing studies. In addition, citation analysis can be used to support social network analysis by investigating paper-to-paper, patent-to-patent, and patent-to-paper citations to identify potential intellectual linkages and clusters of relationships among researchers and organizations.
Extracting content information is another way to use documents in evaluation. Content analysis can help evaluate the historical evolution of research funded or conducted by a particular organization, or trace the emergence of a field of knowledge from multiple sources. One approach to content analysis is co-word analysis, which uses key words to search text. The frequency of co-occurrence of the key words for a selected database of published articles depicts the evolution of ideas and concepts. A newer approach is database tomography, which avoids the need to pre-specify key words. The texts to be searched are entered into a computer database, and a computer-based algorithm extracts words and phrases that are repeated throughout the database, using the proximity of words and their frequency of co-occurrence to estimate the strength of their relationship. A more recent approach, textual data mining, goes beyond statistical methods and uses such techniques as artificial neural networks and fuzzy logic to extract content information from “data warehouses.”
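Co-word analysis, the first of these approaches, can be sketched briefly in Python. The sketch assumes a pre-specified list of key words and a handful of invented article titles; it counts the number of documents in which each pair of key words appears together. It does not attempt database tomography or textual data mining.

```python
import re
from collections import Counter
from itertools import combinations

def coword_counts(documents, keywords):
    """Count, for each pair of key words, the documents in which both occur."""
    keywords = [k.lower() for k in keywords]
    counts = Counter()
    for text in documents:
        # Reduce each document to its set of lowercase alphabetic words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(k for k in keywords if k in words)
        # Each pair of key words found in the same document co-occurs once.
        counts.update(combinations(present, 2))
    return counts

# Hypothetical article titles and key words (assumed data for illustration only).
documents = [
    "Laser diode arrays for optical storage",
    "Optical storage media and laser recording",
    "Polymer membranes for fuel cells",
]
keywords = ["laser", "optical", "storage", "polymer"]

counts = coword_counts(documents, keywords)
```

Running the same count separately on documents from successive time periods, and comparing the resulting pair frequencies, is one simple way to depict how the association among concepts changes over time.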
Visualization Tools
With both citation analysis and content analysis, the visual display of results aids comprehension of both the analyst and the audience. Special hardware and software programs, including SPIRE™ and Starlight, effectively illustrate the results of content analysis. 53
Advantages and Disadvantages of Bibliometrics
A major advantage of bibliometric methods is that they are widely applicable to evaluation of programs with an emphasis on publishing or patenting. The methods can be used to address a variety of evaluation topics, including productivity trends, collaborative relationships, program innovation, and patterns and intensity of knowledge dissemination. Existing databases support the methods, and the methods scale easily, making it feasible and economical to apply them to large numbers of documents. The approach is relatively straightforward, and diverse audiences can understand the results. Another important advantage is that the methods do not burden those who are the subject of evaluation because data are obtained from existing databases. Some of the bibliometric methods can be applied to a program with a relatively short time lag. Finally, the objectivity associated with the methods lends them a high degree of credibility.
A disadvantage of bibliometric evaluation is that it treats only publications and patents as program outputs and ignores other outputs and long-term outcomes. Another disadvantage is that time must pass before extensive patent citations can be observed. Potential problems abound in the application of the methods. For example, counts indicate quantity of output, not quality; not all publications are of equal importance; and adjustment approaches may not adequately account for differences in quality and importance. The propensities to publish and to patent differ among organizations, technical fields, and disciplines for a variety of reasons, not just productivity differences. For example, mature technology areas can be expected to exhibit more citations than emerging technology areas. Works of poor quality may be heavily cited. Self-citations and friend-citations may artificially inflate citation rates, as may patent citations provided by the patent examiner. A citation does not necessarily indicate a significant intellectual linkage between the citing and the cited organization. Though databases exist, they may be difficult to work with because of inconsistent and incomplete citations.
See suggested references on bibliometrics at the end of this chapter, and examples from ATP in Chapter 8.
The historical tracing method, or historiographic method, resembles the descriptive case study method in terms of providing an in-depth investigation in a storytelling mode. What sets it apart is its emphasis on tracing chronologically a series of interrelated developments leading from research to ultimate outcomes or from outcomes back to the factors that spawn them.
When the objective is to evaluate a given project, forward tracing, where the analyst starts with the research of interest and traces the evolution of related events from that point forward, is generally more manageable and cost-effective than backward tracing, 54 and produces a relatively complete portrayal of a project’s impacts. Forward tracing enables the investigation of all known pathways leading forward from the project and contributes to a better understanding of the evolutionary processes of science and technology.
In contrast, backward tracing, in which the analyst starts with an outcome of interest and traces backward to identify the critical developments that appear instrumental to the outcome, may or may not lead back to the project of interest. And if it does, the study may have a narrow focus that misses other effects associated with the project. For these reasons, the backward tracing approach seems more appropriate: (1) when the outcome is the central focus, (2) when a particular outcome is of known significance and the programmatic linkage is also known to exist, or (3) when the purpose is to show in a general way how significant outcomes are rooted in a certain type of programmatic funding or in work funded or conducted by certain organizations. An appeal of the approach is that the significance of the outcome is already established rather than evolving.
Following the Evolutionary Trail
The historical tracing method usually uses an interview/investigative approach to follow the evolutionary trail from one organization or researcher or development to the next. It identifies the relationships and linkages among key events, people, documents, organizations, and scientific knowledge. To identify linkages among people or organizations, it may use tools of social network analysis and citation analysis. To identify linkages among documents, it may also use the tools of citation analysis. In fact, historical tracing is often used in combination with other methods.
Advantages and Disadvantages of Historical Tracing
The historical tracing method tends to produce interesting and credible studies documenting a chain of interrelated developments, and by providing linkage all the way from inputs to outputs it may shed light on process dynamics.
A disadvantage of the approach is that evolutionary chains of events tend to be highly complex with many organizations and researchers involved, making it sometimes difficult to know the significance of apparent linkages. Evolving pathways and dead ends can stymie the forward tracing approach, while disconnects can frustrate the backward tracing approach.
See suggested references on historical tracing at the end of this chapter.
Experts are often called on to give their opinions about the quality and effectiveness of a research program. The experts generally render their verdict after reviewing written or orally presented evidence or making direct observations of activities and results.
Requirements, Logistics, and Mechanics
To provide a quality evaluation the reviewers must be highly knowledgeable about the subject and able to clearly articulate their opinions. They must be free of conflict of interest, and subject to clear, timely, and consistent process and evaluation criteria. To carry out their assessments the experts may be assembled in conferring panels or they may perform their reviews independently. They may be supported by staff to assist in data collection and report writing. They may express their opinions in terms of descriptive narratives, quality ratings (such as excellent/good/fair, high/medium/low, or satisfactory/unsatisfactory), or as numerical scores (e.g., a number on a scale of 0–5).
Types of Expert Methods
Most federal government agencies use several types of expert review methods, including 55 (1) peer review, which is commonly used to make judgments about the careers of individual staff members, the value of publications, the standing of institutions, and the allocation of funds to individuals, organizations, and fields of inquiry; (2) relevance review, which is used to judge whether an agency’s programs are relevant to its mission; and (3) benchmarking, which is used to evaluate the standing of an organization, program, or facility relative to another.
Advantages and Disadvantages of Expert Judgment
A principal advantage of the method lies in its practicality; that is, it provides a relatively quick, straightforward, feasible, and widely accepted approach to assessment. Another advantage is that it offers the chance for an interchange of ideas, which can lead to new perspectives.
At the same time, not much is known about the quality or accuracy of expert judgment as applied to R&D program impact assessment. It seems advisable to back up expert judgment with results from other evaluation methods and other supporting studies when attempting to assess complex phenomena.
Challenges to successful use of the method are to identify qualified reviewers, to keep reviewers free of bias and conflict of interest, and to calibrate reviewer ratings so as to render consistent judgments according to desired criteria.
See suggested references on expert judgment at the end of this chapter, and illustrations from ATP in Chapter 8.
Suggested Readings on Evaluation Methods
The following references, by no means comprehensive, are provided from the literature for those who wish to read further about the evaluation methods discussed above. Broad coverage of evaluation methods is also provided by a recent European counterpart report sponsored by the European Commission. 56
Case Study: Descriptive
Case Study: Economic Estimation
The following journals regularly publish papers that help convey the breadth of program impacts that are treated by economic case study as well as other evaluation methods:
Also see the following:
[The contents of the above-listed handbooks and ordering information can be found online at http://www.elsevier.nl/locate/hes.]
Sociometric/Social Network Analysis Methods
Bibliometrics: Counting, Citing, and Content Analysis
34 For example, Donaldson’s evaluation of social programs found that a program’s actions may follow multiple and indirect paths in producing intended and unintended outcomes. Stewart Donaldson, “The Theory-Driven View of Program Evaluation.” Paper presented at Evaluating Social Programs and Problems: Visions for the New Millennium, February 24, 2001.
35 “Population” refers to a given, finite collection of units; “sample,” to a subset of the population; and “parameter,” to population characteristics. Statistics from a sample can be used to estimate unknown values of parameters for the population.
36 There is a large literature on sampling design, which should be consulted. For additional information see, for example, Floyd J. Fowler, Jr., Survey Research Methods (Newbury Park, CA: Sage, 1993).
37 R. Yin, Case Study Research, 2nd ed. (Thousand Oaks, CA: Sage, 1994).
38 W. Shadish, T. Cook, and L. Leviton, Foundations of Program Evaluation (Newbury Park, CA: Sage, 1991), pp. 286–301.
39 K.M. Eisenhardt, “Building Theories from Case Study Research,” Academy of Management Review, 14(4): 532–550, 1989.
40 The economic case study method discussed here uses microeconomic estimation techniques. Case studies may also use macroeconomic analysis models to estimate national economic effects. These techniques are discussed under the category econometric/statistical methods in Chapter 7.
41 The basic equation for adjusting benefits occurring in a future year t to an equivalent amount occurring at the present is B_t/(1 + d)^t, where B_t = benefits in future year t, and d = a discount rate. Thus, receiving benefits valued at $1 million in five years is equivalent to receiving about $712,990 today if the discount rate is 7%. And paying any more than about $712,990 today for a return of $1 million in five years would be a losing proposition if a 7% annual rate of return could otherwise be obtained. An expanded set of discounting formulas, as well as multiplicative discount factors based on applying the formulas for $1.00 of value, are readily available in most benefit-cost, engineering economics, or finance textbooks to cover the various discounting operations.
42 Ruegg and Marshall provide a detailed treatment of the strengths and weaknesses of methods beginning with net benefits and ending with payback period, and demonstrate how they are calculated and used; see R. Ruegg and H. Marshall, Building Economics: Theory and Practice (New York: Van Nostrand Reinhold, 1990), pp. 16–104.
43 In formulating the ratio, there are issues about which values go in the numerator and which go in the denominator. Guidance on the ratio formulation is provided by Ruegg and Marshall, Ibid., pp. 48–54.
44 The ratio method can be used for evaluating project or program performance, but tends to be less used than net benefits for this purpose. A reason is that it does not show the dollar magnitude of net benefits, a figure normally of keen interest in evaluation. Since a ratio computed on total benefits and costs begins to fall, as a project is expanded, before the optimal size is reached, it is important to compute ratios on marginal changes when using the method to size or scope a project or to allocate a budget among competing projects.
45 Note that the IRR, like the benefit-to-cost ratio measure, must be applied incrementally when used for sizing projects or budget allocations.
46 Risk assessment techniques include expected value analysis, mean-variance criterion and coefficient of variation, risk-adjusted discount rate technique, certainty equivalent technique, simulation analysis, and decision analysis. See R. Ruegg, “Risk Assessment,” in R. Dorf, ed., Engineering Economics and Management (Boca Raton, FL: CRC Press and IEEE Press, 1996), pp. 1953–1961.
47 Economists have pushed the quantification of effects far into difficult-to-measure areas, including environmental effects, aesthetics, human pain and suffering, and value of life.
48 Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, vol. 1 (New York: Elsevier Science, 1983).
49 See “REMI Policy Insight, User Guide” for background on the economic theory underlying the REMI model and the REMI website (http://www.remi.com) for demonstration software.
50 The discussion of social networks draws on the treatment in K. Branch, M. Peffers, R. Ruegg, and R. Vallario, The Science Manager’s Resource Guide to Case Studies (Washington, DC: U.S. Department of Energy, Office of Science, 2001).
51 M. E. J. Newman, “The Structure of Scientific Collaboration Networks.” Proceedings of the National Academy of Sciences, 98(2): 404–409, 2001.
52 Some treatments present these as three distinct methods: bibliometrics, citation analysis, and content analysis.
53 See a description of these visualization tools at the SPIRE website: http://www.pnl.gov/infoviz/spire/spire.html.
54 If pathways from a project are unpromising, the investigation of them can be truncated with no further effort (i.e., the analysis stops).
55 National Academy of Sciences, Committee on Science, Engineering, and Public Policy, Evaluating Federal Research Programs: Research and the Government Performance and Results Act, 1999.
56 Gustavo Fahrenkrog, Wolfgang Polt, Jaime Rojo, Alexander Tubke, and Klaus Zinöcker, eds., RTD Evaluation Toolbox—Assessing the Socio-Economic Impact of RTD Policies, Strata Project HPV1CT 1999–00005, IPTS Technical Report Series (Seville, Spain: Joint Research Centre, European Commission, 2002). Additional information about the document can be found at http://www.jrc.es by searching on publications.
Date created: July 13,