Interaction modalities

In the past it was easy to classify HSIs as either input devices (keyboards, switches, mice, etc.) or output devices (displays, gauges or printers). With the convergence of modern HSIs this distinction is no longer so simple: many devices now combine input and output in the same device (tablets, smartphones, etc.). Even the distinction between hardware and software is becoming blurred, because many devices contain embedded software and are considered rather in terms of the functions that the user can perform with them. It is now more sensible to classify HSIs in terms of the mode of interaction, or ‘interaction modality’.

Interaction modality can be described as a means of communication between the human and the system or device. The term ‘communication’ implies the process of exchanging information between the human and the system, primarily through the visual, auditory and tactile channels, as well as speech. All HSI technologies can be categorised according to the human sense for which the device is designed. Most devices rely on only two or three of the senses most commonly used to obtain information from the environment: vision, hearing and touch. Some technologies combine these senses in one device; more advanced devices can also enable interaction through other channels, such as speech, smell, motion or even kinaesthesia and proprioception. (Kinaesthesia is the subliminal awareness of the position and movement of parts of the body by means of proprioceptive organs in the muscles and joints; Hale, 2006.) When multiple modalities are available, that is, when more than one sense can be used for some tasks or parts of tasks, the system is said to offer multimodal interaction. A system based on only one modality is called unimodal.

When technology types are categorised in terms of the human sense for which they are designed, it is possible to classify interaction modality as either:

• input — perceiving information produced by the system through a device that allows a human to observe it by means of one or more of the visual, auditory or tactile senses; or

• output — performing an action with a specific device that causes the system to perform a function. This human output in turn becomes input to the system, either as discrete actuations (for example, key presses) or as continuous actions (using a mouse or similar device to select or manipulate objects on a display).
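The two forms of human output described in the second bullet can be sketched as simple event records. This is an illustrative model only; the class and field names (`DiscreteActuation`, `ContinuousAction`, `control`, `device`, the `SCRAM_BUTTON` example) are assumptions for the sake of the example, not part of any real HSI software.

```python
from dataclasses import dataclass

@dataclass
class DiscreteActuation:
    """A single, self-contained action, e.g. a key press."""
    control: str   # identifier of the actuated control
    state: str     # e.g. "pressed" or "released"

@dataclass
class ContinuousAction:
    """An ongoing action, e.g. moving a pointer to manipulate an object."""
    device: str    # e.g. "mouse" or "trackball"
    x: float       # current pointer position
    y: float

# Both kinds of human output become input to the system:
system_inputs = [
    DiscreteActuation(control="SCRAM_BUTTON", state="pressed"),
    ContinuousAction(device="mouse", x=412.0, y=87.5),
]
```

The distinction matters for interface design: discrete actuations map naturally to state changes, while continuous actions require the system to sample and track an evolving value.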

Based on the primary senses used in interacting with a device, HSIs can now be divided into three categories: visual, auditory and mechanical motion. Devices associated with these modalities may be input devices, output devices (that is, devices accepting user input or providing output to the user), or hybrid devices in which both input and output are combined in the same device.

In the control room and in any of the operational domains described earlier, a multimodal interface facilitates HSI through two or more modes of input that go beyond the traditional keyboard and mouse. Multimodal HSIs can incorporate different combinations of speech, gesture, gaze, touch and other non-conventional modes of input. Touch and gesture have become the most commonly supported combination of input methods, as seen in the rapid development of tablets and smartphones (Oviatt, 2003). These are already making an appearance in control rooms for non-control applications, such as procedure following and calculations, and they are likely to become much more prominent in future NPPs, provided that they can be proven reliable.

These combined modalities open up a vast range of possibilities for interacting with the work environment. It is already possible, for example, to interact with displays not only with both hands, but also with all fingers simultaneously or through various combinations of ‘hand waving’.
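The idea of combining modalities can be illustrated with a minimal sketch: events arriving from different input channels are inspected, and the interaction is classed as multimodal when more than one modality is in use. The event dictionaries and the `fuse` function are hypothetical; real multimodal fusion engines handle timing, ambiguity and conflict resolution, which this sketch deliberately omits.

```python
def fuse(events):
    """Classify a set of input events by the modalities they use."""
    modalities = {e["modality"] for e in events}
    return {
        "multimodal": len(modalities) > 1,  # more than one sense in use
        "modalities": sorted(modalities),
    }

# Example: a two-finger pinch on a touchscreen combined with a hand wave.
events = [
    {"modality": "touch", "fingers": 2, "action": "pinch"},
    {"modality": "gesture", "action": "wave"},
]
result = fuse(events)
```

A system in which `fuse` only ever sees one modality would, in the terms used above, be unimodal.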

Based on the description above, it is now possible to define four classes of technology: visual technologies for visual perception, audio technologies for auditory perception, mechanical control devices for providing input to a system, and hybrid devices for multimodal interaction.
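The four technology classes can be summarised as a small enumeration. The class names and the example devices attached to each are illustrative assumptions drawn from the discussion above, not an established taxonomy from any standard.

```python
from enum import Enum

class TechnologyClass(Enum):
    VISUAL = "visual technologies for visual perception"
    AUDIO = "audio technologies for auditory perception"
    MECHANICAL = "mechanical control devices for providing input"
    HYBRID = "hybrid devices for multimodal interaction"

# Example devices per class (assumed for illustration):
EXAMPLES = {
    TechnologyClass.VISUAL: ["display", "gauge"],
    TechnologyClass.AUDIO: ["alarm horn", "speech annunciator"],
    TechnologyClass.MECHANICAL: ["keyboard", "switch", "mouse"],
    TechnologyClass.HYBRID: ["touchscreen tablet", "smartphone"],
}
```

Note that the hybrid class cuts across the input/output distinction, which is precisely why classification by modality is more useful than classification by device type.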