Metrics and Models to Evaluate Future Pervasive Interactive Systems

Prior research has concluded that effectiveness, efficiency, and satisfaction, the established metrics for assessing interactive systems, are not sufficient. It is, however, unclear which metrics and models can describe interaction with systems in varying contexts. Important dimensions that must be covered include effectiveness, workload, social acceptability, physiological and psychological fatigue, transferability scores, and effects on group dynamics. For many years, research has relied on simple standardized questionnaires such as the System Usability Scale and the NASA TLX, as introduced in Section 2. While these standards have also been employed in pervasive computing, they have yet to prove that they capture the whole user experience in a pervasive computing environment as envisioned above. Although there have been approaches that extend metrics towards hedonic quality (AttrakDiff, Hassenzahl) or towards the comfort of wearing a device (Comfort Rating Scale), these remain limited when applied at larger scales. We expect that new models for assessing the quality of interaction paradigms in pervasive computing environments will have to be developed.
For future pervasive computing systems, our measurements and metrics have to reach much further than the available scales and scores. To validate effectiveness, quality, and user experience in pervasive computing, we need adequate and accepted metrics that indicate the success or failure of these systems in field tests. While advanced sensor technology, such as physiological and neurophysiological measures, is available today for detecting user behavior, state, and performance in the field, the use and reliability of these measurements are still not fully understood. We need to investigate how measurement technology can capture efficiency and effectiveness unobtrusively while also capturing the understanding, experience, and emotions that come with interaction paradigms.
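The scoring behind the two standard instruments named above is short enough to show directly. The sketch below follows the standard System Usability Scale scoring rule and the unweighted ("raw") variant of the NASA TLX. It assumes responses have already been collected (SUS items on a 1 to 5 scale in questionnaire order, TLX subscales rated 0 to 100); the function names and subscale keys are hypothetical. It illustrates how each instrument reduces a session to a single post-hoc score per participant.

```python
from typing import Dict, List


def sus_score(responses: List[int]) -> float:
    """System Usability Scale: 10 items on a 1-5 scale -> score from 0 to 100."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects exactly 10 responses in the range 1-5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded (contribution r - 1);
        # even items are negatively worded (contribution 5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 sum to 0-100


# Hypothetical keys for the six NASA TLX subscales.
TLX_SUBSCALES = ("mental", "physical", "temporal",
                 "performance", "effort", "frustration")


def raw_tlx(ratings: Dict[str, float]) -> float:
    """Raw (unweighted) NASA TLX: mean of the six subscale ratings, each 0-100."""
    missing = set(TLX_SUBSCALES).difference(ratings)
    if missing:
        raise ValueError(f"missing TLX subscales: {sorted(missing)}")
    return sum(ratings[s] for s in TLX_SUBSCALES) / len(TLX_SUBSCALES)
```

For example, answering 3 on every SUS item yields 50.0. The full NASA TLX additionally weights the subscales using 15 pairwise comparisons; the raw variant shown here omits that step.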
We expect a multimodal sensor fusion approach in which behavioral and physiological metrics are integrated to detect user state, behavior, actions, mood, performance, and similar variables with higher fidelity, since these signals manifest across several metrics at once. Success criteria are rich metrics that support both post-hoc and real-time assessment of user behavior in complex pervasive computing environments using different interaction paradigms.
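One simple form of the fusion described above is late fusion. Each behavioral or physiological stream is normalized against the user's own running baseline, and the normalized values are combined by a weighted mean. The following is a minimal sketch, not a method proposed in this document. It assumes that a scalar feature per time window has already been extracted from each sensor; the class names, stream names, and weights are hypothetical, and learned fusion models may be preferable in practice. Welford's online algorithm keeps each update constant-time, so the same code can run incrementally for real-time assessment or over a recorded session post hoc.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RunningStats:
    """Welford's online mean/variance for one sensor stream."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        # With fewer than two samples (or zero spread) there is no usable
        # baseline yet, so the value is treated as "at baseline".
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0.0 else (x - self.mean) / std


class LateFusion:
    """Hypothetical late fusion: per-user z-scores combined by a weighted mean."""

    def __init__(self, weights: Dict[str, float]):
        # weights: stream name -> contribution; a negative weight can model a
        # stream that is assumed to fall when the target state rises.
        self.weights = dict(weights)
        self.stats = {name: RunningStats() for name in weights}

    def step(self, sample: Dict[str, Optional[float]]) -> Optional[float]:
        """Fuse one time window; returns None if no known stream reported."""
        num, den = 0.0, 0.0
        for name, value in sample.items():
            if name not in self.weights or value is None:
                continue  # unknown stream or dropped sensor reading
            stats = self.stats[name]
            z = stats.zscore(value)  # compare against history before this window
            stats.update(value)
            num += self.weights[name] * z
            den += abs(self.weights[name])
        return num / den if den > 0 else None
```

In use, one `LateFusion` instance would be kept per participant, and `step` would be called once per time window with placeholder streams such as `heart_rate` or `error_rate`. Skipping missing readings lets the estimate degrade gracefully when a sensor drops out, which is common in unsupervised field settings.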

Research Questions

  1. How do we measure effectiveness and efficiency across ensembles of devices in large pervasive computing environments?
  2. How can we establish unobtrusive measurements to allow for truly unsupervised assessments in the field?
  3. What are suitable metrics to capture individual aspects ranging from task performance, error rate, and workload to user experience, acceptance of technology, and joy of use?
  4. What metrics can measure boredom, mind wandering, and attention, which are important influencing factors in pervasive computing environments?
  5. What generic testbeds and use cases allow for a comparative analysis of interaction in pervasive computing environments?

In this research area, we expect research methods ranging from data collection with different sensor technologies to data analysis and machine learning. These methods should find robust metrics within single data streams, such as eye-gaze behavior, and also fuse sensor streams to obtain more reliable and expressive measurements of a wide range of dependent variables, from task performance, error rate, and workload to user experience, acceptance of technology, and joy of use, that are suitable for, and scale to, pervasive computing environments.
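As an illustration of a metric derived from a single data stream, the sketch below implements the widely used dispersion-threshold identification (I-DT) of fixations from raw gaze samples and derives the mean fixation duration from them. It is a minimal sketch, not a method from this document. The thresholds are assumptions that depend on the eye tracker, sampling rate, and units, and the type and function names are hypothetical. For clarity, the window dispersion is recomputed at every step, which is quadratic in the worst case. Long recordings would call for incremental min/max tracking.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    t: float  # timestamp in milliseconds
    x: float  # horizontal gaze position (e.g. degrees of visual angle)
    y: float  # vertical gaze position, same unit as x


class Fixation(NamedTuple):
    start: float
    end: float
    x: float  # centroid of the fixation
    y: float


def _dispersion(window: List[Sample]) -> float:
    xs = [s.x for s in window]
    ys = [s.y for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: List[Sample], max_dispersion: float,
                  min_duration: float) -> List[Fixation]:
    """Dispersion-threshold (I-DT) fixation identification on time-ordered samples."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Initial window: the shortest run starting at i that spans min_duration.
        j = i
        while j < n and samples[j].t - samples[i].t < min_duration:
            j += 1
        if j >= n:
            break  # remaining data is shorter than one fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Grow the window while the points stay within the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            fixations.append(Fixation(
                start=window[0].t,
                end=window[-1].t,
                x=sum(s.x for s in window) / len(window),
                y=sum(s.y for s in window) / len(window),
            ))
            i = j + 1  # continue after the fixation
        else:
            i += 1  # no fixation here: slide the window by one sample
    return fixations


def mean_fixation_duration(fixations: List[Fixation]) -> float:
    if not fixations:
        return 0.0
    return sum(f.end - f.start for f in fixations) / len(fixations)
```

Per-window fixation counts or mean durations computed this way could serve as one behavioral input to a fusion scheme like the one outlined earlier in this section.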