Jakob Nielsen's Alertbox for December 1995:
Guidelines for Multimedia on the Web
Multimedia is gaining popularity on
the Web with several technologies to support use of animation, video, and audio
to supplement the traditional media of text and images. These new media provide
more design options but also require design discipline. Unconstrained use of
multimedia results in user interfaces that confuse users and make it
harder for them to understand the information. Not every webpage needs
to bombard the user with the equivalent of Times Square in impressions and
movement.
Notes about this month's column:
This column is
longer than usual and much longer than recommended for a web page. I am doing
this on request because many people have asked for advice on how to design for
the new dynamic web media. Some of the links in this column point to Javatized
pages and will not show anything interesting if your browser does not support
the version of Java used on the pages.
Animation
Moving images have an overpowering effect on the human
peripheral vision. This is a survival instinct from the time when it was of
supreme importance to be aware of any saber-toothed tigers before they could
sneak up on you. These days, tiger-avoidance is less of an issue, but anything
that moves in your peripheral vision still dominates your awareness: it is very
hard to, say, concentrate on reading text in the middle of a page if there
is a spinning logo up in the corner. Never include a permanently moving
animation on a web page since it will make it very hard for your users to
concentrate on reading the text.
Animation is good for:
- Showing continuity in transitions. When something has two
or more states, then changes between states will be much easier for users to
understand if the transitions are animated instead of being instantaneous. An
animated transition allows the user to track the mapping between different
subparts through the perceptual system instead of having to involve the
cognitive system to deduce the mappings. A great example is the winner of the
first Java programming contest: proving the Pythagorean theorem by animating
the movement of various squares and triangles as they move around to
demonstrate that two areas are the same size (unfortunately, this otherwise
good page uses animated text inappropriately: the text moves constantly and is
hard to relate to the events in the main animation).
- Indicating dimensionality in transitions. Sometimes
opposite animated transitions can be used to indicate movement back and forth
along some navigational dimension. For example, paging through a series of
objects can be shown by an animated sweep from the right to the left for
turning the page forward (if using a language where readers start on the
left). Turning back to a previous page can then be shown by the opposite
animation (sweeping from the left to the right). If users move orthogonally to
the sequence of pages then other animated effects can be used to visualize the
transition. For example, following a hypertext link to a footnote might be
shown by a "down" animation and tunneling through hyperspace to a different
set of objects might be shown by an "iris open" animation.
One example
used in several user interfaces is the use of zooming to indicate that a new
object is "grown" from a previous one (e.g., a detailed view or property list
opened by clicking on an icon) or that an object is closed or minimized to a
smaller representation. Zooming out from the small object to the enlargement
is a navigational dimension and zooming in again as the enlargement is closed
down is the opposite direction along that dimension.
- Illustrating change over time. Since an animation is a
time-varying display, it provides a one-to-one mapping to phenomena that
change over time. For example, deforestation of the rain
forest can be illustrated by showing a map with an animation of the
covered area changing over time.
- Multiplexing the display. Animation can be used to show
multiple information objects in the same space. A typical example is client-side
imagemaps with explanations that pop up as the user moves the cursor over
the various hypertext anchors. It is also possible to indicate the active
areas by having them shimmer or by surrounding them with a marquee of
"marching ants". As always, objects should only move when appropriate (e.g.,
when the cursor is over the image).
- Enriching graphical representations. Some types of
information are easier to visualize with movement than with still pictures.
Consider, for example, how to visualize the tool used to remove pixels in a
graphics application. The canonical icon is an eraser as shown on the left in
the following figure, but in user testing I have sometimes found that people
think that the icon is a tool for drawing three-dimensional boxes. Instead,
one can use an animated icon as shown on the right in the figure: when the
icon animates, the eraser is moved over the background and pixels are removed,
clearly showing the functionality of the tool.
In icon design,
it is always easier to illustrate objects (a box) than operations (removing
pixels), but animation provides the perfect support for illustrating any kind
of change operation. In an experiment reported at the CHI'91
conference, Baecker, Small, and
Mander increased the comprehension of a set of icons from 62% to 100% by
animating them. Of course, an icon should only animate when the user indicates
a special interest in it (for example, by placing the mouse cursor over it or
by looking at it for more than a second if eye-tracking is available).
Especially considering the preponderance of toolbars in current applications
it would be highly distracting if all icons were to animate at all times.
- Visualizing three-dimensional structures. Since the
computer screen is two-dimensional, users can never get a full understanding
of a three-dimensional structure by a single illustration, no matter how well
designed. Animation can be used to emphasize the three-dimensional nature of
objects and make it easier for users to visualize their spatial structure. The
animation need not necessarily spin the object in a full circle: just slowly
turning it back and forth a little will often be sufficient. The movement
should be slow to allow the user to focus on the structure of the object.
Three-dimensional objects may be moved under user control, but often it is
better if the designer determines in advance how to best animate a movement
that provides optimal understanding of the object: this pre-determined
animation can then be activated by the user by simply placing the cursor over
the object, whereas user-controlled movements require the user to understand
how to manipulate the object (which is inherently difficult with a
two-dimensional control device like the mouse used with most computers - to be
honest, 3D is never going to make
it big time in user interfaces until we get a true 3D control device).
- Attracting attention. Finally, there are a few cases
where the ability of animation to dominate the user's visual awareness can be
turned to an advantage in the interface. If the goal is to draw the user's
attention to a single element out of several or to alert the user to updated
information then an animated headline will do the trick. Animated text should
be drawn by a one-time animation (e.g., text sliding in from the right,
growing from the first character, or smoothly becoming larger) and never by a
continuous animation since moving text is much harder to read than static
text. The user should be drawn to the new text by the initial animation and
then left in peace to read the text without further distraction.
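The difference between a one-time and a continuous animation can be made concrete with a small sketch. The function below is only an illustration: the name `slide_in_offset`, the half-second duration, the 300-pixel starting offset, and the ease-out curve are all assumptions, not values from this column. The point is that progress is clamped, so the headline stops moving once it has arrived.

```python
def slide_in_offset(elapsed_s: float,
                    duration_s: float = 0.5,
                    start_offset_px: float = 300.0) -> float:
    """Horizontal offset (in pixels) of a headline sliding in from the right.

    Returns start_offset_px at time 0 and exactly 0.0 once duration_s has
    passed, so the text is static from then on. All defaults are
    hypothetical values chosen for illustration.
    """
    if duration_s <= 0:
        return 0.0
    # Clamp progress to [0, 1]: this is what makes the animation one-time.
    progress = min(max(elapsed_s / duration_s, 0.0), 1.0)
    # Ease-out curve: fast start, gentle stop.
    eased = 1.0 - (1.0 - progress) ** 2
    return start_offset_px * (1.0 - eased)
```

A renderer would call this once per frame with the time since the headline appeared and stop scheduling frames when the result reaches zero.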
Video
Due to bandwidth constraints, use of video should currently be
minimized on the web. Eventually, video will be used more widely, but for the
next few years most videos will be short and will use very small viewing areas.
Under these constraints, video has to serve as a supplement to text and images
more often than it will provide the main content of a website.
Currently, video is good for:
- Promoting television shows, films, or other non-computer media that
traditionally have used trailers in their advertising.
- Giving users an impression of a speaker's personality. Unfortunately, most
corporate executives project a lot less personality than, say, Captain Janeway
from Star Trek, so it is not necessarily a good idea to show a talking head
unless the video clip truly adds to the user's experience.
- Showing things that move, for example a clip from a ballet. Product demos
of physical products (e.g., a coin counter) are also well suited for video,
whereas software demos are often better presented as a series of full-sized
screendumps where the potential customer can study the features at length.
A major problem with most videos on the web right now is that their
production values are much too low. User studies of CD-ROM productions have
found that users expect broadcast-quality production values and that users get
very impatient with low-quality video.
A special consideration for video (and spoken audio) is that any narration
may lead to difficulty for international users as well as for users with a
hearing disability. People may be able to understand written text in a foreign
language because they have time to read it at their own speed and because they
can look up any unknown words in a dictionary. Spoken words are sometimes harder
to understand, especially if the speaker is sloppy, has a dialect, speaks over a
distracting soundtrack, or simply speaks very fast. Poor audio quality may
contribute to the difficulty of understanding spoken text: it is recommended to
use professional quality audio equipment and/or lavaliere microphones when
recording a narrator. The classic solution to these problems is to use subtitles,
but as shown in the following figure, subtitles require special attention on the
web.
The figure shows a
subtitled frame from Sun's Starfire video. The small subtitles (left image) look
good on the original
video tape (JPEG, 197 K) but are virtually unreadable on the smaller image
size currently used for computerized videos. Using bigger subtitles that have
been anti-aliased for computer viewing (middle image) improves readability
significantly, but the best results are achieved by the letterbox format (right
image). In this example, the subtitles in the letterbox are constructed by
enlarging the video area for the movie file with a 24-pixel-high black area.
Doing so does not increase the file size proportionally since the black area
compresses very nicely. Even so, it would be better to transmit the subtitles as
ASCII (or Unicode) and have them rendered in the letterbox on the client
machine: a perfect job for an applet. It would even be possible to have the user
select the language for the subtitles through a preference setting or a pop-up menu
(JPEG, 206 K).
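As a rough sketch of what client-side subtitle rendering might involve, the code below looks up the subtitle text for a given playback time and user-selected language. Everything here is an assumption made for illustration: the `Cue` structure, the `active_subtitle` function, the fallback-language rule, and the placeholder cue texts are hypothetical and do not come from the Starfire video. Cues are assumed to be sorted by start time and not to overlap.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional


class Cue(NamedTuple):
    """One subtitle cue: shown from start_s (inclusive) to end_s (exclusive)."""
    start_s: float
    end_s: float
    text: Dict[str, str]  # language code -> subtitle text


def active_subtitle(cues: List[Cue], time_s: float, language: str,
                    fallback: str = "en") -> Optional[str]:
    """Return the text to draw in the letterbox at time_s, or None.

    Assumes cues are sorted by start_s and do not overlap.
    """
    starts = [cue.start_s for cue in cues]
    index = bisect_right(starts, time_s) - 1  # last cue starting at or before time_s
    if index < 0:
        return None
    cue = cues[index]
    if time_s >= cue.end_s:
        return None  # in a gap between cues
    # Fall back to a default language when no translation is available.
    return cue.text.get(language, cue.text.get(fallback))


# Placeholder cues, invented for this example only.
DEMO_CUES = [
    Cue(0.0, 2.0, {"en": "Hello", "fr": "Bonjour"}),
    Cue(3.0, 5.0, {"en": "Goodbye"}),
]
```

Because the subtitles travel as text rather than as pixels, switching language is just a different dictionary lookup; the video file itself does not change.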
Audio
The main benefit of audio is that it provides a channel that is
separate from that of the display. Speech can be used to offer commentary or
help without obscuring information on the screen. Audio can also be used to
provide a sense of place or mood as done to perfection in the game
Myst. Mood-setting audio should employ very quiet background sounds
in order not to compete with the main information for the user's attention.
Music is probably the most obvious use of sound. Whenever you need to inform
the user about a certain work of music, it makes much more sense to simply play
it than to show the notes or to try to describe it in words. For example, if you
are out to sell seats to the La Scala
opera in Milan, Italy, it is an obvious ploy to allow users to hear a
snippet of the opera: yes, Verdi
really could write a good tune (AU file, 1.4 MB), so maybe I will go
and hear the opera next time I am over there. In fact, the audio clip is
superior to the video
clip from the same opera, which is too fidgety to impress the user and yet
takes much too long to download (QuickTime, 3.6 MB).
Voice recordings can be used instead of video to provide a sense of the
speaker's personality
(AU file, 1.4 MB): the benefits are smaller files, easier production, and the
fact that people often sound good even if they would look dull on television.
Speech is also perfect for teaching users the pronunciation of words as done by
the French wine site: it used to be the case that you could buy good wine
cheaply by going for chateaus that were hard to pronounce (because nobody dared
ask for them in shops or restaurants) -- no more in the webbed world.
Non-speech sound effects can be used as an extra dimension in the user
interface to inform users about background events: for example, the arrival of
new information could be signaled by the sound of a newspaper dropping on the
floor and the progress of a file download could be indicated by the sound of
water pouring into a glass that gradually fills up. These kinds of background
sounds have to be very quiet and nonintrusive. Also, there always needs to be a
user preference setting to turn them off.
Good quality sound is known to enhance the user experience substantially so
it is well worth investing in professional quality sound production. The classic
example is the video game study where users claimed that the graphics were
better when the sound was improved, even though the exact same graphics were
used for the poor-quality sound and the good-quality sound experiments. Simple
examples from web user interfaces are the use of a low-key clicking sound to
emphasize when users click a button and the use of opposing sounds (cheeeek
chooook) when moving in different directions through a navigation space.
Response Time
Many multimedia elements are big and take a long time to
download with the horribly low bandwidth available to most users. It is
recommended that the file format and size are indicated in parentheses after the
link whenever you point to a file that would take more than 15 seconds to
download with the bandwidth available to most of your users. If you don't know
what bandwidth your users are using you should do a survey to find out since
this information is important for many other page design issues. At this time,
most home users have at most 28.8 Kbps, meaning that files larger than 50 KB need
a size warning. Business users often have higher bandwidth, but you should
probably still mark files larger than about 200 KB.
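The arithmetic behind the size threshold can be sketched as follows. This is a simplified estimate with hypothetical function names: it assumes 1 KB is 1000 bytes and ignores protocol overhead and modem compression, which is why the raw result for a 28.8 Kbps line comes out slightly above the rounded 50 KB figure.

```python
def max_unwarned_size_kb(bandwidth_kbps: float, limit_s: float = 15.0) -> float:
    """Largest file, in kilobytes, that downloads within limit_s seconds.

    Simplified: 1 KB = 1000 bytes, no protocol overhead, no compression.
    """
    kilobits = bandwidth_kbps * limit_s
    return kilobits / 8.0  # 8 bits per byte


def needs_size_warning(file_kb: float, bandwidth_kbps: float,
                       limit_s: float = 15.0) -> bool:
    """True if the link should state the file format and size."""
    return file_kb > max_unwarned_size_kb(bandwidth_kbps, limit_s)


if __name__ == "__main__":
    # 28.8 Kbps * 15 s = 432 kilobits, or about 54 KB before overhead.
    print(round(max_unwarned_size_kb(28.8), 1))
    # A 1.4 MB audio file clearly needs a warning on a 28.8 Kbps line.
    print(needs_size_warning(1400, 28.8))
```

Passing the bandwidth you measure in a user survey, rather than a guess, gives the threshold that applies to your own audience.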
The 15-second guideline in the previous paragraph was derived from the basic
set of response time
values that have been known since around 1968. System response needs to
happen within about 10 seconds to keep the user's attention, so users should be
warned before slower operations. On the web, current users have been trained to
endure so much suffering that it may be acceptable to increase the limit value
to 15 seconds. If we ever want the general population to start treating the web
as more than a novelty, we will have to provide response times within the
acceptable ranges, though.
Design of client-side multimedia effects has to consider the other two
response time limits also:
- The feeling of directly manipulating objects on the screen requires
0.1 second response times. Thus, the time from when the user types
a key on the keyboard or moves the mouse until the desired effect happens has
to be faster than 0.1 seconds if the goal is to let the user control a screen
object (e.g., rotate a 3D figure or get pop-ups while moving over an
imagemap).
- If users do not need to feel a direct physical connection between their
actions and the changes on the screen, then response times of about
1.0 second become acceptable. Any slower response and the
user will start feeling that he or she is waiting for the computer instead of
operating freely on the data. So, for example, jumping to a new page or
recalculating a spreadsheet should happen within a second. When response times
surpass a second, users start changing their behavior to a more restricted use
of the system (for example, they won't try out as many options or go to as
many pages).
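The three response time limits discussed above (0.1 second, 1.0 second, and about 10 seconds) can be summarized in a small classifier. This is a minimal sketch: the function name and the category labels are made up for illustration, and the boundaries are treated as hard cutoffs even though the limits are approximate.

```python
def response_class(seconds: float, attention_limit_s: float = 10.0) -> str:
    """Classify a measured response time against the three limits.

    attention_limit_s defaults to 10 seconds; the column suggests that
    15 seconds may be tolerable for current web users.
    """
    if seconds <= 0.1:
        return "direct manipulation"   # user feels in control of the object
    if seconds <= 1.0:
        return "uninterrupted flow"    # user operates freely on the data
    if seconds <= attention_limit_s:
        return "keeps attention"       # user waits but stays on task
    return "warn the user first"       # announce size or delay up front


if __name__ == "__main__":
    for measured in (0.05, 0.5, 5.0, 20.0):
        print(measured, "->", response_class(measured))
```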
Next month: Relationships on
the Web (no, not about dating.)