Multimodal Rendering and Perception

The importance of multimodal feedback in computer graphics and interaction has been recognized for a long time and is motivated by our daily interaction with the world. Streams of information coming from different channels complement and integrate each other, with some modality possibly dominating over the remaining ones, depending on the task.

Interactive 3D sound rendering


The task of evaluating the location of a sound source is accomplished by integrating cues for the perception of azimuth and of elevation (i.e., angular directions in the horizontal and vertical planes, respectively). These auditory cues are produced by physical effects of the diffraction of sound waves by the torso, shoulders, head and outer ears (pinnae), which modify the spectrum of the sound that reaches the eardrums. All these effects are captured in the so-called head-related transfer function (HRTF). Effects of a reverberant environment also play a relevant role: early reflections and the change in proportion of reflected to direct energy are important cues especially for distance estimation.
We have developed algorithms for real-time 3D sound rendering that can be employed in interactive settings, where the user can move within a virtual environment. Active head motion can be exploited to generate additional aural information for evaluating the spatial position of sound sources. As an example, an auditory motion parallax effect results from range-dependent azimuth changes: for close sources a small shift of the listener causes a large azimuth change, while for distant sources there is almost no azimuth change.
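The range dependence of the parallax effect can be illustrated with plain geometry. The sketch below is not the rendering algorithm itself; it only assumes a listener and a point source in the horizontal plane, and the function names and the 10 cm shift are illustrative choices.

```python
import math

def azimuth_deg(source_xy, listener_xy):
    """Azimuth of the source as seen from the listener, in degrees.

    Assumed convention: 0 degrees is straight ahead (+y axis),
    positive angles are to the right (+x axis).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    return math.degrees(math.atan2(dx, dy))

def azimuth_change_deg(distance_m, shift_m):
    """Absolute azimuth change when the listener shifts sideways by shift_m.

    The source starts straight ahead at distance_m.
    """
    source = (0.0, distance_m)
    before = azimuth_deg(source, (0.0, 0.0))
    after = azimuth_deg(source, (shift_m, 0.0))
    return abs(after - before)

if __name__ == "__main__":
    # Same 10 cm sideways shift, sources at increasing range.
    for distance in (0.5, 2.0, 20.0):
        change = azimuth_change_deg(distance, 0.1)
        print(f"source at {distance:5.1f} m: azimuth change {change:.2f} deg")
```

For a fixed shift, the printed azimuth change shrinks as the source moves away, which is the cue the listener can exploit to judge distance.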

Haptic-audio interactions

An advantage of physically based approaches is interactivity and the ease of associating motion with sound control, so that the sound feedback responds in a natural way to user gestures and actions. Moreover, they allow the creation of dynamic virtual environments in which sound-rendering attributes are incorporated into data structures that provide a multimodal encoding of object properties: shape, material, elasticity, texture, mass, and so on. In this way, a unified description of the physical properties of an object can be used to control the visual, haptic, and sound rendering. We have studied audio-haptic rendering and perception. Rendering a virtual surface is the prototypical haptic task. Properly designed auditory feedback can be combined with haptics in order to improve the perception of stiffness. Physical limitations of haptic devices constrain the range of stiffness that can be rendered haptically; the addition of auditory feedback can compensate for such limitations and extend the range of perceived stiffness that can be effectively conveyed to the user.
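As a minimal sketch of this idea, the example below stores one stiffness value per object and derives both a haptic and an auditory parameter from it. Everything here is an assumption made for illustration: the `VirtualObject` class, the device limit, and the use of a linear mass-spring contact (whose contact time is half an oscillation period, π·sqrt(m/k)) in place of whatever impact model the actual system uses.

```python
import math
from dataclasses import dataclass

@dataclass
class VirtualObject:
    """Hypothetical unified description shared by the haptic and audio renderers."""
    name: str
    stiffness_n_per_m: float  # contact stiffness of the surface

def haptic_stiffness(obj, device_max_n_per_m):
    """Stiffness the haptic device can actually render (clamped to its limit)."""
    return min(obj.stiffness_n_per_m, device_max_n_per_m)

def contact_duration_s(obj, hammer_mass_kg):
    """Contact time of a mass hitting a linear spring: half a period, pi*sqrt(m/k).

    A shorter contact gives a brighter impact sound, so this value can drive
    the audio rendering even when the haptic stiffness is clamped.
    """
    return math.pi * math.sqrt(hammer_mass_kg / obj.stiffness_n_per_m)

if __name__ == "__main__":
    device_max = 5_000.0  # assumed device limit, N/m
    for obj in (VirtualObject("rubber", 1_000.0), VirtualObject("steel", 1_000_000.0)):
        k_haptic = haptic_stiffness(obj, device_max)
        t_contact = contact_duration_s(obj, hammer_mass_kg=0.01)
        print(f"{obj.name}: haptic k = {k_haptic:.0f} N/m, "
              f"audio contact time = {t_contact * 1000:.3f} ms")
```

In this sketch the haptic channel saturates at the device limit for the stiffer object, while the audio parameter still distinguishes the two surfaces.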

Multimodal interaction and control metaphors

The following are a few examples of recently developed applications that involve multimodal interaction, with a focus on audio input and feedback.

Peek-a-book: playing with an interactive book

In collaboration with the University of Verona, a prototype of a new digitally augmented "lift-the-flap" book for children has been developed. Sensors are used to allow continuous user interaction, much as children interact with rattles or other physical sounding objects, and to generate (not just play back) sounds in real time. The book prototype is designed as a set of scenarios where narration develops through sonic narratives, and where exploration is stimulated through continuous interaction with sensors and auditory feedback. Physical models of impacts and friction have been used to synthesize a variety of sounds: the steps of a walking character, the noise of a fly, the engine of a motorbike, and the sound of an inflatable ball.

The Croaker: a physical model and a Lego controller

The futurist composer Luigi Russolo designed several versions of an instrument called Intonarumori ("Noise Intoner"), which differ in the material of the vibrating string, the shape and material of the exciting wheel, and the number of vibrating strings. In collaboration with the Medialogy department of Aalborg University in Copenhagen, we designed an interface inspired by the Gracidatore (the Croaker). This instrument produced plucked-string sonorities. A wheel, rotating at a speed controlled by an external crank, excited a vibrating string attached to the wooden sound-box. The player controlled the string tension with a lever.
Our digital Croaker is an interface built with Lego blocks and enhanced with sensors. Two potentiometers track the rotation of the crank and the displacement of the moving lever. Both sensors send a continuous stream of data and are attached to a microprocessor connected to a computer. The sensor data are used as input to purpose-built, physically based sound synthesis modules that model the instrument. The Croaker is easy to learn: it is played by controlling the position of the lever with the left hand while rotating the crank with the right hand.
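The mapping from the two sensor streams to synthesis parameters might be sketched as follows. This is not the actual Croaker firmware or synthesis code: the 10-bit reading range, the tension limits, the string length and density, and all function names are assumptions. Only the ideal-string relation f = sqrt(T/μ) / (2L) is standard physics.

```python
import math

ADC_MAX = 1023           # assumed 10-bit potentiometer readings (0..1023)
COUNTS_PER_TURN = 1024   # assumed: the crank potentiometer spans one full turn

def lever_to_tension(adc, t_min_n=20.0, t_max_n=80.0):
    """Map the lever reading linearly to a string tension in newtons."""
    x = min(max(adc, 0), ADC_MAX) / ADC_MAX
    return t_min_n + x * (t_max_n - t_min_n)

def string_frequency_hz(tension_n, length_m=0.5, density_kg_per_m=0.005):
    """Fundamental of an ideal string: f = sqrt(T / mu) / (2 L)."""
    return math.sqrt(tension_n / density_kg_per_m) / (2.0 * length_m)

def crank_speed_rev_per_s(prev_adc, curr_adc, dt_s):
    """Wheel speed from two successive crank readings taken dt_s apart.

    The difference is wrapped into half a turn either way, so a reading
    that jumps from near 1023 back to near 0 still counts as forward motion.
    """
    half = COUNTS_PER_TURN // 2
    delta = ((curr_adc - prev_adc + half) % COUNTS_PER_TURN) - half
    return delta / COUNTS_PER_TURN / dt_s

if __name__ == "__main__":
    tension = lever_to_tension(512)
    print(f"tension {tension:.1f} N -> pitch {string_frequency_hz(tension):.1f} Hz")
    print(f"wheel speed {crank_speed_rev_per_s(1000, 10, 0.01):.2f} rev/s")
```

In a real setup the two resulting values (pitch and wheel speed) would be sent to the synthesis module at every sensor update.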


Auditory display and sonification

The implementation of intuitive and effective audio communication is an important aspect in the development of human-computer interfaces. The auditory system allows sounds to be heard and cognitively processed even when the user is paying attention to other modalities. Auditory alarms and alerts are a typical example of auditory display of information. Within a project coordinated by the University of Verona, and in collaboration with the Centro Maree of Venice, we have proposed a novel auditory alerting system for high tide, in which different expected tide levels are associated with different non-verbal sounds. Based on this study, the new system was implemented in 2008 (note, however, that the final sounds were not designed by us). The work has been hosted in three exhibitions around Europe (curator James Beckett): Amsterdam, Bergen, and Berlin.
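The association between expected tide levels and alert sounds amounts to a threshold lookup, sketched below. The thresholds and sound names are purely hypothetical placeholders and do not reproduce the levels or sounds of the deployed system.

```python
from bisect import bisect_right

# Hypothetical thresholds (cm) and sound identifiers, for illustration only.
THRESHOLDS_CM = [110, 120, 130, 140]
SOUNDS = ["no_alert", "alert_level_1", "alert_level_2", "alert_level_3", "alert_level_4"]

def alert_sound(expected_tide_cm):
    """Return the sound associated with the band containing the expected level."""
    return SOUNDS[bisect_right(THRESHOLDS_CM, expected_tide_cm)]

if __name__ == "__main__":
    for level in (95, 110, 127, 150):
        print(f"expected tide {level} cm -> {alert_sound(level)}")
```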
To design effective auditory displays, the expressive content of sounds can be taken into account, since expression supplies information beyond the symbolic and iconic information provided by explicit messages such as texts or scores. In particular, expressive information can be used to communicate affective content when reacting to the actions of the user. To this end, effective mappings between acoustic parameters and expression need to be found. For musical performances, expression is mainly controlled through timing and intensity, but for non-structured sounds additional features must be considered. Our approach is based on the analysis of recorded non-structured sounds played by musicians. We found that different levels of arousal and valence can be effectively communicated by single tones, using a few perceptually relevant sound features (e.g. attack, spectral centroid and roughness) besides timing and intensity.
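Of the features mentioned above, the spectral centroid is the simplest to compute: it is the magnitude-weighted mean frequency of the spectrum and is commonly used as a correlate of perceived brightness. The sketch below uses a naive DFT from the standard library so that it stays self-contained; it is an illustration for short test frames, not the analysis code used in the study, and the test signals are synthetic.

```python
import cmath
import math

def spectral_centroid_hz(samples, sample_rate_hz):
    """Magnitude-weighted mean frequency of a frame (naive O(n^2) DFT)."""
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        bin_value = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
        magnitude = abs(bin_value)
        weighted_sum += (k * sample_rate_hz / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0

def sine(freq_hz, sample_rate_hz, n):
    """Synthetic test tone of n samples."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

if __name__ == "__main__":
    sr, n = 8000, 256
    dull = sine(500, sr, n)
    bright = [a + b for a, b in zip(sine(500, sr, n), sine(1000, sr, n))]
    print(f"single partial: {spectral_centroid_hz(dull, sr):.1f} Hz")
    print(f"two partials:   {spectral_centroid_hz(bright, sr):.1f} Hz")
```

Adding a higher partial of equal amplitude raises the centroid, which matches the intuition that the second tone sounds brighter.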

