Jakob Nielsen's Alertbox for December 1995:

Guidelines for Multimedia on the Web

Multimedia is gaining popularity on the Web as several new technologies support the use of animation, video, and audio to supplement the traditional media of text and images. These new media provide more design options but also require design discipline. Unconstrained use of multimedia results in user interfaces that confuse users and make it harder for them to understand the information. Not every webpage needs to bombard the user with the equivalent of Times Square in impressions and movement.

Notes about this month's column:
This column is longer than usual and much longer than recommended for a web page. I am doing this on request because many people have asked for advice on how to design for the new dynamic web media. Some of the links in this column point to Javatized pages and will not show anything interesting if your browser does not support the version of Java used on the pages.

Animation

Moving images have an overpowering effect on human peripheral vision. This is a survival instinct from the time when it was of supreme importance to be aware of any saber-toothed tigers before they could sneak up on you. These days, tiger-avoidance is less of an issue, but anything that moves in your peripheral vision still dominates your awareness: it is very hard to, say, concentrate on reading text in the middle of a page if there is a spinning logo up in the corner. Never include a permanently moving animation on a web page, since it will make it very hard for your users to concentrate on reading the text.

Animation is good for:

Showing continuity in transitions. When something has two or more states, changes between states will be much easier for users to understand if the transitions are animated instead of instantaneous. An animated transition allows the user to track the mapping between different subparts through the perceptual system instead of having to involve the cognitive system to deduce the mappings. A great example is the winner of the first Java programming contest, which proves the Pythagorean theorem by moving various squares and triangles around to demonstrate that two areas are the same size (unfortunately, this otherwise good page uses animated text inappropriately: the text moves constantly and is hard to relate to the events in the main animation).
Indicating dimensionality in transitions. Opposite animated transitions can sometimes be used to indicate movement back and forth along some navigational dimension. For example, paging through a series of objects can be shown by an animated sweep from right to left for turning the page forward (in languages that are read from left to right). Turning back to a previous page can then be shown by the opposite animation (sweeping from left to right). If users move orthogonally to the sequence of pages, other animated effects can be used to visualize the transition. For example, following a hypertext link to a footnote might be shown by a "down" animation, and tunneling through hyperspace to a different set of objects might be shown by an "iris open" animation.
Several user interfaces use zooming to indicate that a new object is "grown" from a previous one (e.g., a detailed view or property list opened by clicking on an icon) or that an object is closed or minimized to a smaller representation. Zooming from the small object out to the enlargement is movement in one direction along a navigational dimension, and zooming back in as the enlargement is closed is movement in the opposite direction along that same dimension.
Illustrating change over time. Since an animation is a time-varying display, it provides a one-to-one mapping to phenomena that change over time. For example, deforestation of the rain forest can be illustrated by showing a map with an animation of the covered area changing over time.
Multiplexing the display. Animation can be used to show multiple information objects in the same space. A typical example is client-side imagemaps with explanations that pop up as the user moves the cursor over the various hypertext anchors. It is also possible to indicate the active areas by having them shimmer or by surrounding them with a marquee of "marching ants". As always, objects should only move when appropriate (e.g., when the cursor is over the image).
Enriching graphical representations. Some types of information are easier to visualize with movement than with still pictures. Consider, for example, how to visualize the tool used to remove pixels in a graphics application. The canonical icon is an eraser, as shown on the left in the following figure, but in user testing I have sometimes found that people think the icon is a tool for drawing three-dimensional boxes. Instead, one can use an animated icon as shown on the right in the figure: when the icon animates, the eraser is moved over the background and pixels are removed, clearly showing the functionality of the tool. In icon design, it is always easier to illustrate objects (a box) than operations (removing pixels), but animation provides the perfect support for illustrating any kind of change operation. In an experiment reported at the CHI'91 conference, Baecker, Small, and Mander increased the comprehension of a set of icons from 62% to 100% by animating them. Of course, an icon should only animate when the user indicates a special interest in it (for example, by placing the mouse cursor over it or, if eye-tracking is available, by looking at it for more than a second). Given the preponderance of toolbars in current applications, it would be highly distracting if all icons were to animate at all times.
Visualizing three-dimensional structures. Since the computer screen is two-dimensional, users can never get a full understanding of a three-dimensional structure from a single illustration, no matter how well designed. Animation can be used to emphasize the three-dimensional nature of objects and make it easier for users to visualize their spatial structure. The animation need not spin the object in a full circle: slowly turning it back and forth a little will often be sufficient. The movement should be slow to allow the user to focus on the structure of the object. Three-dimensional objects may be moved under user control, but it is often better if the designer determines in advance how best to animate a movement that provides optimal understanding of the object. The user can then activate this predetermined animation simply by placing the cursor over the object, whereas user-controlled movements require the user to understand how to manipulate the object, which is inherently difficult with a two-dimensional control device like the mouse used with most computers. (To be honest, 3D is never going to make it big time in user interfaces until we get a true 3D control device.)
Attracting attention. Finally, there are a few cases where the ability of animation to dominate the user's visual awareness can be turned to an advantage in the interface. If the goal is to draw the user's attention to a single element out of several or to alert the user to updated information, an animated headline will do the trick. Animated text should be drawn by a one-time animation (e.g., text sliding in from the right, growing from the first character, or smoothly becoming larger) and never by a continuous animation, since moving text is much harder to read than static text. The user should be drawn to the new text by the initial animation and then left in peace to read the text without further distraction.
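Two of the motion rules above can be made concrete as functions of elapsed time: a slow back-and-forth rocking for a 3D object instead of a full spin, and a one-time entrance for a headline that is guaranteed to come to rest. This is a minimal Python sketch, not a rendering engine; the helper names `rock_angle` and `slide_in_offset`, the sinusoidal and cubic ease-out curves, and the numbers (15-degree swing, 6-second period, 200-pixel slide, half-second duration) are all illustrative assumptions rather than values from this column.

```python
import math


def rock_angle(t_seconds: float, amplitude_deg: float = 15.0, period_s: float = 6.0) -> float:
    """Rotation angle in degrees for a slow back-and-forth rocking motion.

    Hypothetical helper: the object swings between -amplitude_deg and
    +amplitude_deg around its resting orientation instead of spinning a
    full 360 degrees; a long period keeps the movement slow enough for the
    user to study the object's structure.
    """
    return amplitude_deg * math.sin(2 * math.pi * t_seconds / period_s)


def slide_in_offset(t_seconds: float, distance_px: float = 200.0, duration_s: float = 0.5) -> float:
    """Horizontal offset (pixels right of the final position) of a sliding headline.

    Hypothetical helper: the text starts distance_px to the right,
    decelerates along a cubic ease-out curve, and reaches its final
    position at duration_s. After that the offset is always 0, so the
    headline never moves again.
    """
    if t_seconds <= 0:
        return distance_px
    if t_seconds >= duration_s:
        return 0.0  # animation finished: leave the user in peace to read
    progress = t_seconds / duration_s   # fraction of the animation elapsed, 0..1
    eased = 1 - (1 - progress) ** 3     # cubic ease-out: fast start, gentle stop
    return distance_px * (1 - eased)


if __name__ == "__main__":
    # Sample both motions as a frame loop might (hypothetical usage).
    for t in (0.0, 0.25, 0.5, 1.5, 3.0, 4.5):
        print(f"t={t:.2f}s  rock={rock_angle(t):+6.1f} deg  slide={slide_in_offset(t):6.1f} px")
```

With these defaults the rocking motion peaks at 15 * 2π / 6, roughly 15.7 degrees per second, which keeps it gentle. The important design choice in `slide_in_offset` is its second branch: once the duration has elapsed, nothing in the function can produce further movement, unlike a looping marquee.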

Video

Due to bandwidth constraints, use of video should currently be minimized on the web. Eventually, video will be used more widely, but for the next few years most videos will be short and will use very small viewing areas. Under these constraints, video will more often supplement text and images than provide the main content of a website.

Currently, video is good for:

Promoting television shows, films, or other non-computer media that traditionally have used trailers in their advertising.
Giving users an impression of a speaker's personality. Unfortunately, most corporate executives project a lot less personality than, say, Captain Janeway from Star Trek, so it is not necessarily a good idea to show a talking head unless the video clip truly adds to the user's experience.
Showing things that move, such as a clip from a ballet. Product demos of physical products (e.g., a coin counter) are also well suited for video, whereas software demos are often better presented as a series of full-sized screendumps that potential customers can study at length.

A major problem with most videos on the web right now is that their production values are much too low. User studies of CD-ROM productions have found that users expect broadcast-quality production values and that users get very impatient with low-quality video.

A special consideration for video (and spoken audio) is that any narration may cause difficulty for international users as well as for users with hearing disabilities. People may be able to understand written text in a foreign language because they have time to read it at their own speed and because they can look up any unknown words in a dictionary. Spoken words are sometimes harder to understand, especially if the speaker articulates sloppily, has a strong dialect, speaks over a distracting soundtrack, or simply speaks very fast. Poor audio quality adds to the difficulty of understanding speech, so use professional-quality audio equipment and/or lavaliere microphones when recording a narrator. The classic solution to these problems is to use subtitles, but as shown in the following figure, subtitles require special attention on the web.

Three screendumps from a videotape with different kinds of subtitles

The figure shows a subtitled frame from Sun's Starfire video. The small subtitles (left image) look good on the original videotape (JPEG, 197 K) but are virtually unreadable at the smaller image size currently used for computerized videos. Using bigger subtitles that have been anti-aliased for computer viewing (middle image) improves readability significantly, but the best results are achieved by the letterbox format (right image). In this example, the letterbox for the subtitles is constructed by extending the video area of the movie file with a 24-pixel-high black band. Doing so does not increase the file size proportionally, since the black area compresses very nicely. Even so, it would be better to transmit the subtitles as ASCII (or Unicode) text and have them rendered in the letterbox on the client machine: a perfect job for an applet. It would even be possible to have the user select the language for the subtitles through a preference setting or a pop-up menu (JPEG, 206 K).
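The client-side alternative described above comes down to a lookup: given the current playback time and the user's chosen language, find the subtitle text to draw into the black letterbox band. This is a minimal Python sketch of that lookup only, assuming subtitles arrive as plain-text cues with start and end times, sorted by start time; the `Cue` type, the `subtitle_at` function, the track layout, and the placeholder lines are all hypothetical, and the actual drawing step is left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cue:
    start_s: float  # when the subtitle appears
    end_s: float    # when it disappears
    text: str       # plain (Unicode) text, rendered on the client


# Hypothetical tracks keyed by language code; cues must be sorted by start_s.
# The lines are placeholders, not text from any real video.
TRACKS: Dict[str, List[Cue]] = {
    "en": [Cue(0.0, 2.5, "Hello."), Cue(3.0, 6.0, "Goodbye.")],
    "fr": [Cue(0.0, 2.5, "Bonjour."), Cue(3.0, 6.0, "Au revoir.")],
}


def subtitle_at(tracks: Dict[str, List[Cue]], language: str, t_seconds: float,
                fallback: str = "en") -> str:
    """Return the subtitle to show at t_seconds, or "" between cues.

    Falls back to another language if the preferred track is missing.
    """
    cues = tracks.get(language) or tracks[fallback]
    starts = [cue.start_s for cue in cues]
    # Index of the last cue that started at or before t_seconds.
    i = bisect_right(starts, t_seconds) - 1
    if i >= 0 and t_seconds < cues[i].end_s:
        return cues[i].text
    return ""


if __name__ == "__main__":
    print(subtitle_at(TRACKS, "fr", 4.0))
    print(subtitle_at(TRACKS, "de", 1.0))  # no German track, so English is used
```

Because the text travels separately from the video frames, offering another language means sending another small text track rather than re-encoding the movie.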

Audio

The main benefit of audio is that it provides a channel that is separate from that of the display. Speech can be used to offer commentary or help without obscuring information on the screen. Audio can also be used to provide a sense of place or mood as done to perfection in the game Myst. Mood-setting audio should employ very quiet background sounds in order not to compete with the main information for the user's attention.

Music is probably the most obvious use of sound. Whenever you need to inform the user about a certain work of music, it makes much more sense to simply play it than to show the notes or to try to describe it in words. For example, if you are out to sell seats at La Scala in Milan, Italy, it is an obvious ploy to allow users to hear a snippet of the opera: yes, Verdi really could write a good tune (AU file, 1.4 MB), so maybe I will go and hear the opera next time I am over there. In fact, the audio clip is superior to the video clip from the same opera, which is too fidgety to impress the user and yet takes much too long to download (QuickTime, 3.6 MB).

Voice recordings can be used instead of video to provide a sense of the speaker's personality (AU file, 1.4 MB): the benefits are smaller files, easier production, and the fact that people often sound good even if they would look dull on television. Speech is also perfect for teaching users the pronunciation of words, as done by the French wine site: it used to be the case that you could buy good wine cheaply by going for chateaus with names that were hard to pronounce (because nobody dared ask for them in shops or restaurants), but no more in the webbed world.

Non-speech sound effects can be used as an extra dimension in the user interface to inform users about background events: for example, the arrival of new information could be signaled by the sound of a newspaper dropping on the floor and the progress of a file download could be indicated by the sound of water pouring into a glass that gradually fills up. These kinds of background sounds have to be very quiet and nonintrusive. Also, there always needs to be a user preference setting to turn them off.

Good-quality sound is known to enhance the user experience substantially, so it is well worth investing in professional-quality sound production. The classic example is the video game study where users claimed that the graphics were better when the sound was improved, even though the exact same graphics were used for the poor-quality sound and the good-quality sound experiments. Simple examples from web user interfaces are the use of a low-key clicking sound to emphasize when users click a button and the use of opposing sounds (cheeeek chooook) when moving in different directions through a navigation space.

Response Time

Many multimedia elements are big and take a long time to download with the horribly low bandwidth available to most users. Whenever you link to a file that would take more than 15 seconds to download with the bandwidth available to most of your users, indicate the file format and size in parentheses after the link. If you don't know what bandwidth your users have, you should do a survey to find out, since this information is important for many other page design issues. At this time, most home users have at most a 28.8 Kbps modem, meaning that files larger than about 50 KB need a size warning. Business users often have higher bandwidth, but you should probably still mark files larger than about 200 KB.
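The arithmetic behind these thresholds is easy to check: at 28.8 Kbps, 15 seconds moves about 432 kilobits, or roughly 54 KB, which the guideline rounds to 50 KB. This is a minimal Python sketch of the size-warning rule, assuming an idealized link with no protocol overhead or latency and decimal units (1 KB = 1,000 bytes = 8 kilobits); the function names are hypothetical.

```python
def download_seconds(size_kb: float, bandwidth_kbps: float) -> float:
    """Idealized transfer time for a file of size_kb kilobytes over a link of
    bandwidth_kbps kilobits per second (decimal units, no overhead or latency)."""
    return size_kb * 8 / bandwidth_kbps


def needs_size_warning(size_kb: float, bandwidth_kbps: float, limit_s: float = 15.0) -> bool:
    """True if the file would take longer than limit_s seconds to download."""
    return download_seconds(size_kb, bandwidth_kbps) > limit_s


def link_label(text: str, file_format: str, size_kb: float, bandwidth_kbps: float = 28.8) -> str:
    """Append '(format, size)' to a link label only when a warning is needed."""
    if needs_size_warning(size_kb, bandwidth_kbps):
        return f"{text} ({file_format}, {size_kb:g} KB)"
    return text


if __name__ == "__main__":
    # Largest file that fits in 15 seconds at 28.8 Kbps: 15 * 28.8 / 8 = 54 KB.
    print(15 * 28.8 / 8)
    # A 1.4 MB (1,400 KB) audio clip at 28.8 Kbps takes about 389 seconds.
    print(round(download_seconds(1400, 28.8)))
    print(link_label("Verdi excerpt", "AU file", 1400))
```

The same function shows why business users get a higher threshold: on an assumed 128 Kbps line (an ISDN-class figure chosen for illustration, not one given in this column), a 200 KB file takes 12.5 seconds, still inside the 15-second limit.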

The 15-second guideline in the previous paragraph was derived from the basic set of response time values that have been known since around 1968. System response needs to happen within about 10 seconds to keep the user's attention, so users should be warned before slower operations. On the web, current users have been trained to endure so much suffering that it may be acceptable to increase the limit value to 15 seconds. If we ever want the general population to start treating the web as more than a novelty, we will have to provide response times within the acceptable ranges, though.

Design of client-side multimedia effects must also consider the other two response time limits:

The feeling of directly manipulating objects on the screen requires 0.1-second response times. Thus, the time from when the user types a key or moves the mouse until the desired effect happens must be less than 0.1 seconds if the goal is to let the user control a screen object (e.g., rotate a 3D figure or get pop-ups while moving over an imagemap).
If users do not need to feel a direct physical connection between their actions and the changes on the screen, then response times of about 1.0 second become acceptable. Any slower response and the user will start feeling that he or she is waiting for the computer instead of operating freely on the data. So, for example, jumping to a new page or recalculating a spreadsheet should happen within a second. When response times surpass a second, users start changing their behavior to a more restricted use of the system (for example, they won't try out as many options or go to as many pages).
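The three limits can be summarized as a small classifier that a designer might run against measured response times. This is a minimal Python sketch; the category labels and the `response_class` name are hypothetical, and the thresholds are the rules of thumb from this column (with the 10-second attention limit adjustable to the 15 seconds suggested for current web users), not sharp cutoffs.

```python
def response_class(seconds: float, attention_limit_s: float = 10.0) -> str:
    """Classify a response time against the 0.1 s / 1.0 s / attention limits.

    Limits are treated as inclusive; they are approximate rules of thumb.
    """
    if seconds <= 0.1:
        return "direct manipulation"  # feels like moving the object itself
    if seconds <= 1.0:
        return "operating freely"     # noticeable delay, but no sense of waiting
    if seconds <= attention_limit_s:
        return "attention kept"       # user waits but stays focused on the task
    return "warn user first"          # e.g., a size warning next to the link


if __name__ == "__main__":
    for t in (0.05, 0.5, 8.0, 12.0):
        print(t, response_class(t), response_class(t, attention_limit_s=15.0))
```

Mapped onto the examples in the text, rotating a 3D figure under mouse control needs the first class, while jumping to a new page or recalculating a spreadsheet should land in the second.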

Next month: Relationships on the Web (no, not about dating).
