Evaluation of the User Testing

This section evaluates the user testing itself, how effective it was, and the shortcomings it had.

The primary goal of the user testing was threefold: to determine whether a pen-based formula entry system is something that people would want to use; to test the new interface concepts, namely the squiggle-select for modify stroke groups mode and the pop-up menu for modify character mode; and to test how well the implementation of a graph rewriting formula parser worked on handwritten input.

For gathering information on these three questions, the user testing worked very well.

The second goal of the user testing was to get statistical measures on the performance of the character recogniser and the automatic stroke grouping algorithm. This information was readily obtained by reviewing the video tapes, though for any future work it would be easier if the system were modified to gather this information automatically instead. Every hour of video tape took almost two hours to review in order to gather all the information desired.
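As an illustration of what gathering this information automatically might look like, the following is a minimal sketch of a per-session tally. It is not part of the system described here: the class `SessionLog`, its methods, and the component names `"recogniser"` and `"grouping"` are all hypothetical. It also assumes the system can tell when a result was wrong, for example because the user corrected it.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Tallies outcomes per component for one test session.

    Hypothetical helper: the tested system had no such logger.
    """
    counts: Counter = field(default_factory=Counter)

    def record(self, component: str, correct: bool) -> None:
        # component is a free-form label, e.g. "recogniser" or "grouping"
        outcome = "correct" if correct else "error"
        self.counts[(component, outcome)] += 1

    def error_rate(self, component: str) -> float:
        # Fraction of recorded events for this component that were errors
        errors = self.counts[(component, "error")]
        total = errors + self.counts[(component, "correct")]
        if total == 0:
            raise ValueError(f"no events recorded for {component!r}")
        return errors / total


if __name__ == "__main__":
    log = SessionLog()
    log.record("recogniser", correct=True)
    log.record("recogniser", correct=False)  # e.g. the user corrected a character
    log.record("grouping", correct=True)
    print(log.error_rate("recogniser"))
    print(log.error_rate("grouping"))
```

A tally like this would replace the manual counting done from video tape, though the tapes would still be needed for observing how participants used the interface.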

There are a number of aspects to the user testing that made it less than ideal:

The observer was myself, the person who wrote the system. Ideally, the observer should have no connection with the system.
The users were allowed to talk freely with the observer. The observer should really be just an observer, and only a last-resort source of help.
The full thinking aloud method was not used. As a result, not every thought users had as they used the system was available.
The entire testing was designed, conducted and analysed by a single person: myself. Having more people involved throughout the user testing process would probably have resulted in a better test design, and would have made it easier to gather and process more data.
Although the participants represented the potential final users well, there was no representative of a final "home user".
The number of participants in the user testing, while giving a good base for opinions, did not supply enough data for good statistical measures of things such as error rates.
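The last point, that a small sample gives poor statistical measures of error rates, can be made concrete with a confidence interval. The sketch below uses the Wilson score interval for a proportion, which is a standard technique but not one used in this testing. The function name `wilson_interval` and the counts in the example are illustrative assumptions, not measured results.

```python
import math


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an error rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= errors <= trials:
        raise ValueError("errors must lie between 0 and trials")
    p = errors / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Illustrative counts only: the same observed 10% error rate
    # from a small sample and from a sample twenty times larger.
    for errors, trials in [(5, 50), (100, 1000)]:
        low, high = wilson_interval(errors, trials)
        print(f"{errors}/{trials}: {low:.3f} to {high:.3f}")
```

For the same observed rate, the interval from the smaller sample is several times wider, which is why more participants, or more recorded characters per participant, would be needed for reliable error-rate figures.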

While these are important issues, they did not seem to have a significant impact on this user testing. The goal was to get general opinions of the system, and this has been achieved.

While Nielsen and Redish state that nine users is sufficient for user testing, increasing the number of participants in the user testing would have been useful for statistical purposes.

In my experience of user testing this system, from about the sixth participant onwards a large proportion of the comments were repeats of ones already made.

The results gained from the user testing were very useful: a number of participants gave valuable feedback on what the system would have to be able to do before it would be of use to them, or offered some very good ideas on how the user interface could be improved. The user testing also exposed the flaws in the system: the weakness of the character recogniser when used by users it had not been trained for, and the issues that the formula processor had with processing handwritten input.

Steve Smithies
1999-11-13