In: Computer Science
Explain the importance of evaluating solutions after development work on the software forming the solution has been completed and the software has been implemented (put into operation/use).
Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics, and outcomes. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.
What happens if we don’t evaluate our solutions?
Once a solution has been decided on and the algorithm designed, it can be tempting to skip the evaluation stage and start programming immediately. However, without evaluation, any faults in the algorithm will not be picked up, and the program may not correctly solve the problem, or may not solve it in the best way.
Faults may be minor and relatively unimportant. For example, if a solution to the question ‘how to draw a cat?’ contained faults, the worst outcome would be that the drawing does not look much like a cat. However, faults can also have huge – and terrible – consequences, e.g. if the solution for an aeroplane autopilot had faults.
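To make the point concrete, here is a minimal, hypothetical sketch of post-implementation evaluation: the solution (a deliberately faulty rounding routine) is checked against the outcomes it was expected to produce. The function names and test data are invented for illustration and are not taken from any particular system.

# Hypothetical sketch: evaluating an implemented solution against expected outcomes.

def round_to_nearest_ten(value: int) -> int:
    # Deliberately faulty implementation: it always rounds down,
    # so 15 becomes 10 instead of 20.
    return (value // 10) * 10

def evaluate(solution, test_cases):
    """Run the solution on (input, expected) pairs and collect any faults."""
    faults = []
    for given, expected in test_cases:
        actual = solution(given)
        if actual != expected:
            faults.append((given, expected, actual))
    return faults

if __name__ == "__main__":
    cases = [(12, 10), (15, 20), (27, 30)]
    for given, expected, actual in evaluate(round_to_nearest_ten, cases):
        print(f"Fault: input {given} should give {expected}, but got {actual}")
    # Without this evaluation step, the faulty rounding would only be
    # discovered once the program misbehaved in real use.

Running the sketch reports the inputs for which the implemented solution and the expected outcome disagree, which is exactly the information that is lost when evaluation is skipped.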
More generally, evaluation can be characterised as follows:
• Evaluation is a task which results in one or more reported outcomes.
• Evaluation is an aid for planning, and therefore the outcome is an assessment of different possible actions.
• Evaluation is goal oriented. The primary goal is to check the results of actions or interventions, in order to improve the quality of the actions or to choose the best alternative.
• Evaluation depends on the current state of scientific knowledge and on methodological standards.
Evaluation as an aid to software development has been applied over the last decade, since the understanding of the role of evaluation within Human-Computer Interaction changed. In one of the most influential models of iterative system design, the Star Life Cycle Model of Hix & Hartson [33], the activities “Task analysis”, “Requirement specification”, “Conceptual and formal design”, “Prototyping” and “Implementation” are each supplemented by an activity “Evaluation”, which helps to decide whether to progress to the next step.
Software can be evaluated with respect to different aspects, for example functionality, reliability, usability, efficiency, maintainability and portability [38]. In this survey, we concentrate on the aspect of usability from an ergonomic point of view. This aspect has gained particular importance during the last decade with the increasing use of interactive software.
Goals and results of evaluation
Any evaluation has pragmatically chosen goals. In the domain of software evaluation, the goal can be characterised by one or more of three simple questions:
1. “Which one is better?” The evaluation aims to compare alternative software systems, e.g. to choose the best-fitting software tool for a given application, to decide among several prototypes, or to compare several versions of a software system. An example of such a strategy can be found in [77].
2. “How good is it?” The evaluation aims to determine the degree to which a finished system possesses the desired qualities. The evaluation of a system with respect to “Usability Goals” [9, 94] is one application of this goal. Other examples are the certification of software and checking conformity with given standards.
3. “Why is it bad?” The evaluation aims to determine the weaknesses of a piece of software so that the result generates suggestions for further development. Typical instances of this procedure are a system development approach using prototypes and the re-engineering of an existing system.
Evaluation criteria
What constitutes an evaluation criterion is a matter of some debate in the human-factors community. We follow Dzida [14], who advised that “criteria” should denote the measurable part of design or evaluation attributes. Although this seems clear enough, the literature on software evaluation shows only a few attempts to establish general principles for design and evaluation criteria.
The concept of usability, which is a general quality concept for software systems, is often used for the determination of evaluation criteria [30, 48, 64].
The international standard ISO 9241 (Part 11), which is the methodological foundation of the HCI standard “Ergonomic requirements for office work with visual display terminals” (ISO 9241), states that

“Usability of a product is the extent to which the product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.”
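The three components of this definition can be operationalised in a usability test. The sketch below is one possible way of doing so, assuming per-task records of goal completion, time on task and a post-task satisfaction rating; the record fields and summary formulas are illustrative assumptions and are not prescribed by ISO 9241.

# Illustrative sketch (not prescribed by ISO 9241-11): summarising usability-test
# records as effectiveness, efficiency and satisfaction measures.
from dataclasses import dataclass

@dataclass
class TaskRecord:
    completed: bool     # did the user achieve the specified goal?
    seconds: float      # time spent on the task
    satisfaction: int   # e.g. a rating from 1 (poor) to 7 (excellent)

def usability_summary(records):
    n = len(records)
    return {
        # Effectiveness: proportion of tasks completed successfully.
        "effectiveness": sum(r.completed for r in records) / n,
        # Efficiency: mean time on task (resources expended per goal).
        "mean_time_s": sum(r.seconds for r in records) / n,
        # Satisfaction: mean post-task rating.
        "mean_satisfaction": sum(r.satisfaction for r in records) / n,
    }

if __name__ == "__main__":
    records = [TaskRecord(True, 42.0, 6), TaskRecord(False, 95.0, 3), TaskRecord(True, 51.0, 5)]
    print(usability_summary(records))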
Evaluation techniques
Evaluation techniques are activities of evaluators which can be precisely defined in behavioural and organisational terms. It is important not to confuse “evaluation techniques” with “evaluation models”, which usually constitute a combination of evaluation techniques.
We classify evaluation techniques into two categories, descriptive evaluation techniques and predictive evaluation techniques, both of which should be present in every evaluation:
Descriptive evaluation techniques are used to describe the status and the actual problems of the software in an objective, reliable and valid way. These techniques are user based and can be subdivided into several approaches:

Behaviour-based evaluation techniques record user behaviour while working with a system, which “produces” some kind of data. These procedures include observational techniques and “thinking-aloud” protocols.

Opinion-based evaluation methods aim to elicit the user’s (subjective) opinions. Examples are interviews, surveys and questionnaires; a questionnaire-scoring sketch follows these subdivisions.

Usability Testing stems from classical experimental design studies. Nowadays, Usability Testing (as a technical term) is understood to be a combination of behaviour-based and opinion-based measures with some amount of experimental control, usually chosen by an expert.
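As an example of an opinion-based measure, the sketch below scores a ten-item questionnaire in the style of the System Usability Scale. The responses shown are invented, and the scoring scheme is given only as one common example, not as a required procedure.

# Illustrative sketch: scoring a ten-item questionnaire in the style of the
# System Usability Scale (SUS). Responses are on a 1-5 scale; the data are invented.

def sus_style_score(responses):
    """Return a 0-100 score from ten 1-5 responses (odd items positively worded,
    even items negatively worded, as in the SUS)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expects ten responses between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

if __name__ == "__main__":
    # One participant's (invented) answers to the ten items.
    print(sus_style_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))   # -> 85.0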
The main aim of predictive evaluation techniques is to make recommendations for future software development and to prevent usability errors. These techniques are expert-based, or at least expertise-based, such as walkthrough or inspection techniques. Even though the expert is the driving force in these methods, users may also participate in some instances.
Note that predictive evaluation techniques must rely on “data”. In many predictive evaluation techniques, such “data” are produced by experts who simulate “real” users. The criteria of objectivity and reliability, which are at the basis of descriptive techniques, are hard to apply in this setting. Because validity must be the major aim of evaluation procedures, there are attempts to prove the validity of predictive evaluation techniques directly, e.g. by comparing the “hit” and “false alarm” rates of the problems detected by a predictive technique [66].
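To make the hit/false-alarm comparison concrete, the sketch below treats the problems confirmed in a descriptive user test as ground truth and scores an expert inspection’s predictions against them. The problem identifiers and the choice of denominators are assumptions for illustration; they do not reproduce the procedure of [66].

# Illustrative sketch: validating a predictive technique against user-test findings.
# Problem identifiers are invented for the example.

def hit_and_false_alarm_rates(predicted, confirmed):
    """predicted: problems flagged by the predictive technique (e.g. an inspection);
    confirmed: problems actually observed with users in a descriptive evaluation."""
    hits = predicted & confirmed
    false_alarms = predicted - confirmed
    # Hit rate: share of confirmed problems that were also predicted.
    hit_rate = len(hits) / len(confirmed) if confirmed else 0.0
    # False-alarm rate: share of predictions never confirmed by users
    # (one of several possible denominators used in the literature).
    false_alarm_rate = len(false_alarms) / len(predicted) if predicted else 0.0
    return hit_rate, false_alarm_rate

if __name__ == "__main__":
    predicted = {"P1", "P2", "P5", "P7"}   # flagged during an expert walkthrough
    confirmed = {"P1", "P2", "P3", "P4"}   # observed in a thinking-aloud test
    print(hit_and_false_alarm_rates(predicted, confirmed))   # -> (0.5, 0.5)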
At the end of this section we shall briefly discuss evaluation techniques which can be used either for predictive or descriptive evaluation (e.g. formal Usability Testing methods) and those which do not fit into the predictive/descriptive classification, such as the “interpretative evaluation techniques”.