Metrics are tools for supporting actions that allow programs to evolve toward successful outcomes, promote continuous improvement, and enable strategic decision making. Based on the lessons learned from industry, academia, and federal agencies discussed in the previous chapter, the committee offers a set of general principles to guide the development and use of metrics. Although targeted to the Climate Change Science Program (CCSP), many of these general principles have also been proposed elsewhere.1 The principles are divided into three categories: (1) prerequisites for using metrics to promote successful outcomes, (2) characteristics of useful metrics, and (3) challenges in the application of metrics.
PREREQUISITES FOR USING METRICS TO PROMOTE SUCCESSFUL OUTCOMES
1. Good leadership is required if programs are to evolve toward successful outcomes.
Good leaders have several characteristics. They are committed to progress and are capable of articulating a vision, entraining strong participants, promoting partnerships, recognizing and enabling progress, and creating institutional and programmatic flexibility. Good leaders facilitate and encourage the success of others. They are vested with authority by their peers and institutions, through title, an ability to control resources, or other recognized mechanisms. Without leadership, programmatic resources and research efforts cannot be directed and then redirected to take advantage of new scientific, technological, or political opportunities. Metrics, no matter how good, will have limited use if resources cannot be directed to promote the program vision and objectives established by the leader.
2. A good strategic plan must precede the development of metrics.
Metrics gauge progress toward achieving a stated goal. Therefore, they are meaningless outside the context of a plan of action. The strategic plan must include the intellectual framework of the program, clear and realizable goals, a sense of priorities, and coherent and practical steps for implementation. The best metrics are designed to assess whether the effort and resources match the plan, whether actions are directed toward accomplishing the objectives of the plan, and whether the focus of effort should be altered because of new discoveries or new information. Metrics, no matter how good, will have limited use if the strategic plan is weak.
CHARACTERISTICS OF USEFUL METRICS
3. Good metrics should promote strategic analysis. Demands for higher levels of accuracy and specificity, more frequent reporting, and larger numbers of measures than are needed to improve performance can result in diminishing returns and escalating costs.
Preliminary data or results are often good enough to make strategic decisions; additional effort to make them scientifically rigorous might be wasted. Larger numbers of metrics may also promote inefficiencies. For example, if a substantial amount of signed paperwork is required to demonstrate that the federal Paperwork Reduction Act is working, then the metric clearly fails to meet its primary objectives.
The frequency of assessment should reflect the needs and goals of the program. Very infrequent assessments are not likely to be useful for managing programs, and overly frequent assessments have the potential to promote micromanagement or to become burdensome. For example, the Intergovernmental Panel on Climate Change (IPCC) assessments are nearly continuous and require an enormous, sustained effort by a large segment of the climate science community.2 For short-term programs, such as the Tropical Ocean-Global Atmosphere (TOGA) experiment, frequent scientific assessments would have been nearly useless, because a decade was required to clearly demonstrate some of the most important scientific outcomes.3 On the other hand, process metrics for evaluating progress on the creation and operation of the program would have had value on much shorter time scales.
4. Metrics should serve to advance scientific progress or inquiry, not the reverse.
A good metric will encourage actions that continuously improve the program, such as the introduction of new measurement techniques, cutting-edge research, or new applications or tools. On the other hand, a poor measure could encourage actions to achieve high scores (i.e., “teaching to the test”) and ultimately unbalance the research and development portfolio. The misapplication of metrics could lead to unintended consequences, as illustrated by the following examples:
The author citation index provides a measure of research productivity. If this metric were the only way to measure faculty performance, it could drive researchers to invest more in writing review articles that are cited frequently than in working on new discoveries.
2
The IPCC was established in 1988 under the auspices of the United Nations Environment Programme and the World Meteorological Organization to conduct assessments of climate change and its consequences. Assessments are produced and peer reviewed by more than 1000 scientific researchers, policy experts, and risk analysts from all over the world. Writing and review of the assessment reports, which have been produced about every five years since 1990, take several years.
3
The development of the TOGA program, establishment of the TOGA-TAO (Tropical Atmosphere Ocean) observation array, and demonstration that the improved observations and process studies promoted improved forecasting and understanding of El Niño-Southern Oscillation (ENSO) events required more than a decade of effort. See National Research Council, 1996, Learning to Predict Climate Variations Associated with El Niño and the Southern Oscillation: Accomplishments and Legacies of the TOGA Program, National Academy Press, Washington, D.C.
The U.S. Global Change Research Program (USGCRP) has supported efforts to compare major climate models. Convergence of model results (e.g., similar temperature increases in response to a doubling of carbon dioxide) could be a measure of progress in climate modeling. The metric succeeds if it identifies differences in the way physical processes are incorporated in models, which then leads to research aimed at improving understanding of those processes and, eventually, to model improvements and the reduction of uncertainties in model predictions. The metric fails if it creates an unintended bias in researchers who adjust their models solely to bring them into better agreement with one another.
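To make the idea of a convergence metric concrete, a minimal sketch is given below. The sensitivity values are invented for illustration and are not actual USGCRP or IPCC model results; the computation (the spread of equilibrium warming across an ensemble) is only one of many possible formulations.

    # Hypothetical sketch of a "convergence" metric for climate models.
    # The sensitivity values below are invented for illustration; they are
    # NOT actual model results.
    import statistics

    # Equilibrium warming (deg C) for a doubling of CO2, one value per model.
    model_sensitivity = {
        "model_A": 2.1,
        "model_B": 2.9,
        "model_C": 3.4,
        "model_D": 4.2,
    }

    spread = statistics.stdev(model_sensitivity.values())
    mean = statistics.mean(model_sensitivity.values())
    print(f"ensemble mean: {mean:.2f} C, spread (std dev): {spread:.2f} C")

    # The metric succeeds if a large spread prompts investigation of why the
    # models differ (e.g., cloud or vegetation feedbacks) and the differences
    # are resolved through better physics. It fails if modelers simply tune
    # parameters toward the ensemble mean, shrinking the spread without
    # improving understanding.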
5. Metrics should be easily understood and broadly accepted by stakeholders. Acceptance is obtained more easily when metrics are derivable from existing sources or mechanisms for gathering information.
It is important to avoid creating requirements for measurements that are difficult to obtain or that will not be recognized as useful by stakeholders. The latter is especially difficult for innovative or multidisciplinary sciences that have yet to establish natural mechanisms of assessment. The following examples illustrate these points:
A metric for measuring change in forest cover is the fraction of land surface covered by forest canopy, which is detectable using remote sensing. An area is considered “forest” when 10 to 70 percent of the land surface is covered by canopy. However, the lower threshold would not be viewed as useful by stakeholders. A metric based on this threshold (essentially, forest or not forest) could mean that an area that once had dense canopy would still be defined as forest, despite being severely logged and degraded, as long as at least 10 percent of the canopy remained.4 The metric becomes more useful when it is associated with information about land-cover types. For example, a 10 percent threshold might be appropriate for savannah areas, whereas higher thresholds would be required for ecosystems with more continuous canopy cover. More detailed measures of forest cover can also be developed, such as selective removal of specific tree types, changes in species composition, or changes in indices (e.g., seed production, primary productivity, leaf density). However, one can quickly reach a point at which the difficulty of measuring the quantities systematically becomes overwhelming, limiting their use as metrics.
4
Intergovernmental Panel on Climate Change, 2000, Land Use, Land-Use Change, and Forestry: A Special Report, R.T. Watson, I.R. Noble, B. Bolin, N.H. Ravindranath, D.J. Verardo, and D.J. Dokken, eds., Cambridge University Press, Cambridge, U.K.
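The sensitivity of the forest-cover metric to the chosen threshold can be illustrated with a small, hypothetical sketch. The canopy fractions below are invented for illustration and do not come from remote sensing data.

    # Hypothetical sketch of a threshold-based forest-cover metric.
    # Canopy fractions are invented for illustration.
    cells = {
        "intact_stand": 0.85,   # dense, unlogged canopy
        "logged_stand": 0.15,   # severely logged but still above 10% canopy
        "savannah":     0.12,   # naturally sparse canopy
        "cleared_plot": 0.03,   # effectively deforested
    }

    def forest_fraction(canopy_by_cell, threshold):
        """Fraction of cells classified as 'forest' at a given canopy threshold."""
        forested = [c for c in canopy_by_cell.values() if c >= threshold]
        return len(forested) / len(canopy_by_cell)

    for threshold in (0.10, 0.30):
        print(f"threshold {threshold:.0%}: "
              f"{forest_fraction(cells, threshold):.0%} of cells counted as forest")

    # At the 10% threshold the severely logged stand still counts as forest, so
    # the metric reports no change after logging; pairing the threshold with
    # land-cover type (savannah vs. closed-canopy forest) makes it more useful.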
The number of users is commonly cited as a metric of the usefulness of holdings in data centers.5 However, it is difficult to gather reliable information to support this metric. With the shift to on-line access, most users find and retrieve data via the Internet. Since the actual number of users is not known, data centers count “hits” on their web sites, which are likely to be several orders of magnitude greater than the actual number of users, or “distinct hosts,” which overcount users accessing the site from several different computers.
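The counting problem can be illustrated with a hypothetical web-log sketch. The requests below are invented, and real logs would record only the host, not the user.

    # Hypothetical sketch: counting "users" of an on-line data center from a web log.
    # Each tuple is (user, host) -- invented data; real logs only record the host.
    requests = [
        ("alice", "10.0.0.1"), ("alice", "10.0.0.1"), ("alice", "10.0.0.1"),
        ("alice", "192.168.5.7"),      # same user connecting from a second computer
        ("bob",   "10.0.0.9"), ("bob", "10.0.0.9"),
        ("crawler", "203.0.113.4"),    # automated traffic inflates hit counts
    ] * 50                             # repeated visits inflate hits further

    hits = len(requests)
    distinct_hosts = len({host for _, host in requests})
    actual_users = len({user for user, _ in requests if user != "crawler"})

    print(f"hits:           {hits}")
    print(f"distinct hosts: {distinct_hosts}")
    print(f"actual users:   {actual_users}")
    # Hits overstate use by orders of magnitude, and distinct hosts count the
    # same user more than once when they connect from several computers; the
    # data center never observes the "actual users" figure directly.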
6. Promoting quality should be a key objective for any set of metrics. Quality is best assessed by independent, transparent peer review.
The success of the scientific enterprise and confidence in its results depend strongly on the quality of the research. Although peer review has well-known limitations (e.g., results depend on the identity of the reviewers, there is a tendency to view research results conservatively), it is the generally accepted mechanism to assess research quality. Review occurs throughout the scientific enterprise in the form of peer review of proposals submitted for funding, peer review of manuscripts submitted for publication in journals, and internal and peer review of programs and program outcomes (Boxes 2.1 and 2.2). Peer review also provides the best mechanism for judging when to change research directions and, thus, make programs more evolutionary and responsive to new ideas.
7. Metrics should assess process as well as progress.
The success of any program depends on many factors, including process (e.g., level of planning, type of leadership, availability of resources, accessibility of information) and progress (e.g., addition of new observations, scientific discovery and innovation, transition of research to practical applications, demonstration of societal benefit). The assessment of process as well as progress is important for every program, but its value is particularly high for large, complex programs.
The sheer diversity and complexity of programs such as the USGCRP and the CCSP defies the application of a few simple metrics. Even the assessment of progress depends on the nature and maturity of the effort. Enhancing an existing data set is different from developing a new way to measure a specific variable. Process studies are different from model improvements. Mission-oriented science is different from discovery science. Metrics should reflect the diversity and complexity of the program and the level of maturity of the research. Comprehensive assessment of the program will include processes taken to achieve CCSP goals, as well as progress on all aspects of the research, from inputs to outputs, outcomes, and impacts.
5
National Research Council, 2003, Review of NOAA’s National Geophysical Data Center, The National Academies Press, Washington, D.C., 106 pp.; National Research Council, 2002, Assessment of the Usefulness and Availability of NASA’s Earth and Space Science Mission Data, National Academy Press, Washington, D.C.
8. A focus on a single measure of progress is often misguided.
The tendency to try to demonstrate progress with a single metric can create an overly simplistic and even erroneous sense of progress. Reliance on a single metric can also result in poor management decisions. These points are illustrated in the following examples:
The predicted increase in globally averaged temperature with a doubling of carbon dioxide has remained in the same range for more than 20 years (see Chapter 4). According to the metric of reducing uncertainty, climate models would seem to have advanced little over that period despite considerable investment of resources. In fact, however, the physics incorporated in climate models has changed dramatically. Incorporation of new processes, such as vegetation changes as a function of climate, is yielding previously unrecognized feedbacks that either amplify or dampen the response of the model to increased carbon dioxide. The result is often greater uncertainty in the range of predicted temperatures until the underlying processes are better understood. New discoveries can also indicate that certain elements of the weather and climate system are not as predictable as once thought. In such cases, significant scientific advance can result in an increase in uncertainty. Rather than relying solely on uncertainty reduction, it may be more appropriate to develop metrics for the three components of uncertainty: (1) success in identifying uncertainties, (2) success in understanding the nature of uncertainties, and (3) success in reducing uncertainties.
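A schematic numerical illustration of this point, using invented warming values rather than actual model output, is sketched below.

    # Schematic illustration (invented numbers): incorporating a newly recognized
    # feedback can widen the predicted temperature range even though the models
    # are scientifically better.
    baseline_runs = [2.4, 2.8, 3.1, 3.3]             # deg C warming, original physics
    with_vegetation_feedback = [2.0, 2.9, 3.6, 4.1]  # same models plus the new feedback

    def spread(values):
        return max(values) - min(values)

    print(f"range without new feedback: {spread(baseline_runs):.1f} C")
    print(f"range with new feedback:    {spread(with_vegetation_feedback):.1f} C")
    # Judged only by "uncertainty reduction", the improved models look like a
    # step backward; metrics for identifying and understanding uncertainties
    # would credit the advance.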
Change in biomass is commonly used as a metric to assess the health of marine fisheries. However, this metric fails to recognize the substitution of one species for another (an important indication of environmental change or degradation), interactions among species, and changes in other parts of the food web that result from fishing. Reliance on biomass alone could lead to the establishment of fishing targets that speed the decline of desirable fish stocks or adversely affect other desired species. For example, early management of Antarctic krill stocks strictly on a biomass basis did not account for two facts: (1) most harvesting was in regions that support feeding by large populations of krill-dependent predators such as penguins, whales, and seals, and (2) predator populations can be adversely affected by krill fishing, especially during their breeding seasons.6 A more complex metric or set of metrics that incorporate species composition (multispecies management), information about dependent species (ecosystem-based management), and species distribution and environmental structure (area-based management) would reflect the state of knowledge and lead to better resource management decisions. Combining a biomass-based metric with information from quota-based or fishing-effort-based management practices would provide an approach for sustaining fishery stocks at levels that are both economically and environmentally desired.
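A hypothetical sketch, using invented survey numbers, shows how an unchanged biomass total can hide the substitution of one species for another.

    # Hypothetical sketch (invented survey data, in thousands of tonnes):
    # total biomass stays flat while the species composition shifts, so a
    # biomass-only metric reports a "healthy" fishery despite degradation.
    survey_1995 = {"target_species": 80, "substitute_species": 20}
    survey_2005 = {"target_species": 25, "substitute_species": 75}

    for year, survey in (("1995", survey_1995), ("2005", survey_2005)):
        total = sum(survey.values())
        target_share = survey["target_species"] / total
        print(f"{year}: total biomass {total} kt, target species {target_share:.0%}")

    # The biomass metric is unchanged (100 kt in both surveys), yet the desirable
    # stock has collapsed; multispecies or ecosystem-based metrics capture the change.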
CHALLENGES IN THE APPLICATION OF METRICS
9. Considerable challenge should be expected in providing useful a priori outcome or impact metrics for discovery science.
The assignment of outcome metrics implies that we can anticipate specific results. This works well at the level of mission-oriented tasks such as increasing the accuracy of a thermometer. However, much of discovery science involves the unexpected, and the outcome is simply unknown. For example, the measurement of atmospheric carbon dioxide concentrations by C.D. Keeling eventually revealed both an annual cycle and a decadal trend in atmospheric composition, neither of which was the original goal of the observation program.7 This remarkable achievement could have been defeated by the strenuous application of outcome metrics aimed at determining whether a reliable “baseline” CO2 level in the atmosphere had been established.
It is difficult to conceive of metrics for serendipity, yet serendipity has resulted in numerous discoveries—from X-rays to Post-it adhesives. Great care must be taken to avoid applying measures that stifle discovery and innovation. The most suitable metrics may be related to process (e.g., the level of investment in discovery, the extent to which serendipity is encouraged, the extent to which curiosity-driven research is supported). The National Science Foundation is highly regarded for its ability to promote discovery science, and its research performance measures focus on processes for developing a scientifically capable work force and tools to enable discovery, learning, and innovation (Table 2.4).
6
For example, see Committee for Conservation of Antarctic Marine Living Resources, 2003, Report of the 22nd Meeting of the Scientific Committee, SC-CCAMLR-XXII, Hobart, Australia, 577 pp.
10. Metrics must evolve to keep pace with scientific progress and program objectives.
The development of metrics is a learning process. No one gets it right the first time, but practice and adjustments based on previous trials will eventually yield useful measures and show what information must be collected to evaluate them. Metrics must also evolve to keep pace with changes in program goals and objectives. Scientific enterprises experience considerable evolution as they move through various phases of exploration and understanding. Metrics for newly created science programs, which focus on data collection, analysis, and model development to increase understanding, will tend to focus on process and inputs. As the science matures and the resulting knowledge is applied to serve society, metrics will focus more on outputs and, finally, on outcomes and impacts. As science transitions from the discovery phase to the operational or mission-oriented phase, the types of metrics should also be expected to evolve.
11. The development and application of meaningful metrics will require significant human, financial, and computational resources.
The development and application of metrics, especially those that focus on quality, is far from a bookkeeping exercise. Efforts to assess programmatic plans, scientific progress, and outcomes require substantial resources, including the use of experts to carry out the reviews. Funding to support the logistics of the reviews is also required. The CCSP strategic plan includes a substantial number of assessments and a growing emphasis on measurable outcomes. As these are implemented, the choice of meaningful measures of progress must be deliberate. If the IPCC process is a representative example, the growing emphasis on assessments has the potential to increasingly divert resources from research and discovery to assessment.