SigGraph 2002 Review

During my time in the film industry, I attended the ACM SigGraph conference several times. In 2002, I wrote a review for the UK Institution of Engineering and Technology, which originally appeared on their website. This is an old review, but I still enjoy reading about the technology that was on display that year. It’s fun to think about how far visual effects and technology have come since then. (Or … has it?)


Review by Andrew Bonello (MEng) – 2002

The annual ACM SigGraph conference was held in San Antonio, Texas from the 21st-26th July, 2002. One of the major annual computer graphics conferences, SigGraph is typically attended by a wealth of contributors and spectators from the academic community, as well as commercial representatives from the visual effects, computer game and graphics industries, to name but a few.

This year’s conference was hosted at the Henry B Gonzalez Conference Center, located in downtown San Antonio in southern Texas. The center is located just a minute’s walk away from the city’s famous Riverwalk, where a myriad of bars and restaurants line the river and are frequented in the evenings by tourists and locals alike.

One of the main events at SigGraph is the exhibition itself. Vendors present their latest products, often performing live demonstrations and inviting spectators to participate in short introductory courses where appropriate. The exhibition floor was a constant hive of activity, with many smaller companies present alongside the more established names.

Every year, dozens of contributors present papers at SigGraph encompassing a wide variety of topics. This year was no exception. The computer graphics industry is increasingly seeing partnerships between university departments and commercial companies marketing products in a related field.

Figure 1 - Part of the main exhibition floor


This enables the investigations of expert researchers to be combined with the real-world requirements encountered in commercial environments to produce solutions to ever more challenging practical problems. Selected papers were presented in the following fields (amongst many others): realistic rendering of natural phenomena through physical simulation (flames and fluids), object-based editing of 2D images, realistic motion synthesis and motion generation using motion capture databases, and advanced illumination models for photorealistic rendering of transparent and translucent materials.

Below, a few outstanding papers from the conference are selected. A brief outline of each is given. This is followed by information about some of the other varied activities which took place during SigGraph 2002.

Trainable Videorealistic Facial Animation

Tony Ezzat, Gadi Geiger, and Tomaso Poggio from the Centre for Biological and Computational Learning at MIT presented a paper on Trainable Videorealistic Facial Animation. A human subject is filmed using a standard video camera. The subject speaks a pre-defined training set consisting of several utterances which might typically be spoken in everyday situations.

Figure 2 - Spectators were encouraged to interact with exhibits and displays wherever appropriate


Through analysis and segmentation of the video data, the system is able to generate new video sequences in which the speaker is seen to speak (or even sing) novel phrases that were not included in the original training sequence.

After the training phase, such a sequence is synthesised by breaking down the new utterance to be generated into a “phone stream”. This represents the phonetic transcription of that utterance on a frame-by-frame basis. The stream is interpreted by the synthesis engine and used as a driver for combining labelled images corresponding to the appropriate phonetic elements in the training database. Corresponding audio data (i.e. a recorded voice speaking the utterance specified by the phone stream) is overlaid onto the synthesised video to give the impression that the subject is actually speaking the phrases that are heard. The results presented were compelling. In tests, viewers were often unable to distinguish between an authentic video sequence (where the subject actually speaks an exact utterance) and a synthesised one.

The realism attained in the synthesised video sequences is augmented by the use of intelligent image compositing. Partial picture elements extracted from images from the training video sequence are carefully composited onto “backing” scenes of the subject such that the natural head movement that occurs when a human subject speaks is accounted for and does not cause continuity errors in the composite. Backing images are also chosen in order to give the subject natural behavioural characteristics over the course of the sequence, such as blinking.

For details please see the full paper.

Synthesis of Complex Dynamic Character Motion from Simple Animations

C. Karen Liu and Zoran Popovic from the University of Washington addressed the problem of synthesising realistic animations for three-dimensional characters. Contemporary approaches to 3D character animation fall into two broad categories: traditional key-framed animation, and motion-capture driven animation. The former involves defining the precise pose of a synthetic character model manually in certain important “key” frames of the sequence. The animation system then interpolates each limb’s movement smoothly between these “anchored” poses. Key-framing is favored by many seasoned animators, who argue that it enables the introduction of artistic subtleties and expression into the resulting movements. Motion-capture, on the other hand, can provide highly realistic animation through the use of cameras linked to computers which observe (and later reproduce) a motion-capture subject’s actual three-dimensional movements. The motion-capture process implicitly records much of the nuance and physical character of a subject’s true movement. When these motions are mapped onto a computer-generated character, the resulting animation is often convincing and pleasing to the human eye. There is currently some debate over whether the use of motion capture techniques augments or damages the artistic process of 3D character animation. However, the two methods can be, and have been, used together successfully in many cases. (For an example, see the feature film “Final Fantasy: The Spirits Within” – Square Pictures, 2001)
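As a rough illustration of the key-framing idea described above, the sketch below interpolates a single joint angle between key poses. The function name, the example keyframe values, and the use of straight linear interpolation are all assumptions made for illustration; animation packages typically use smoother spline curves and interpolate many degrees of freedom at once.

```python
from bisect import bisect_right


def interpolate_pose(keyframes, frame):
    """Linearly interpolate a joint angle (in degrees) at `frame`.

    `keyframes` is a list of (frame_number, angle) pairs sorted by frame.
    Frames outside the keyed range are clamped to the nearest key pose.
    """
    frames = [f for f, _ in keyframes]
    if frame <= frames[0]:
        return keyframes[0][1]
    if frame >= frames[-1]:
        return keyframes[-1][1]
    # Find the pair of key poses that bracket the requested frame.
    i = bisect_right(frames, frame)
    (f0, a0), (f1, a1) = keyframes[i - 1], keyframes[i]
    t = (frame - f0) / (f1 - f0)  # 0.0 at the earlier key, 1.0 at the later
    return a0 + t * (a1 - a0)


# Hypothetical elbow angle keyed at frames 0, 10 and 20.
elbow_keys = [(0, 0.0), (10, 90.0), (20, 45.0)]
```

Calling interpolate_pose(elbow_keys, 5) gives the in-between angle halfway from the first key pose to the second.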

In their paper, Liu and Popovic lean toward the key-framed method of 3D character animation. However, they augment the more traditional animation pipeline by introducing an automated step. This generates complex synthetic motions based upon basic key-framed motions which are specified manually by the animator. During this initial stage, simplified key-poses of an articulated model of the subject are recorded. The exact pose of every degree of freedom of the model does not need to be precisely specified. The model is imbued with centers of gravity for each physical limb, and is placed into a global optimisation framework. This solves for the exact skeletal articulation of the model over the entire animation sequence. The solution of the optimisation is subject to the environmental constraints surrounding the subject. A foot placed on the ground, for example, is an “inverse-kinematic” constraint. Other constraints are the relative momentum of the various moving limbs with respect to one another, and a closeness measure which compares the candidate model pose to the original key-framed pose.

The effect of this synthesis step is to take basic key-framed animations and “render” them realistically, taking into account the physics of the model and its surroundings. The results presented gave convincing renditions of “realistic” motion, even for complicated movements such as a gymnast performing a back-flip off a high-bar, and a figure-skater performing an acrobatic twisting leap on ice. The method could provide an excellent framework for scenarios requiring the creation of large amounts of realistic motion data, for example, computer-generated animation programs for children’s television, or combat-based computer games.

Full details of this research, as well as an online version of the relevant paper, can be found on C. Karen Liu’s homepage.

Video Matting

Chuang, Agarwala, Curless (Uni. of Washington), Salesin (Uni. of Washington/Microsoft Research) and Szeliski (Microsoft Research) presented work on video matting of complex scenes. A common requirement in both television and film special effects production is for a high-quality alpha-matte of an image sequence. Such a matte separates the foreground element of a scene (typically a human actor) from the background. The matte is usually specified using a variable opacity value (say between 0.0 and 1.0), set individually for each pixel. This is required to cater for the semi-transparency that often occurs within a foreground element. Consider, for example, an actor with an intricate hair-style which leaves wisps of hair near the subject boundary, or cigarette smoke being blown over the background. Accurate measurements of each foreground pixel’s transparency are required to enable high-quality digital compositing effects to be applied to the video sequence.
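To make the role of the per-pixel opacity concrete, the sketch below applies the standard compositing equation, C = alpha * F + (1 - alpha) * B, to a single pixel. It is a minimal illustration rather than code from the paper; the function name and the representation of colours as RGB tuples in the range 0.0 to 1.0 are assumptions.

```python
def composite_pixel(foreground, background, alpha):
    """Blend one foreground pixel over one background pixel.

    `foreground` and `background` are (r, g, b) tuples; `alpha` is the
    matte opacity for this pixel, where 1.0 means fully foreground and
    0.0 means fully background.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0.0 and 1.0")
    return tuple(
        alpha * f + (1.0 - alpha) * b
        for f, b in zip(foreground, background)
    )


# A wisp of red hair that is 25% opaque, seen over a blue backdrop.
wisp = composite_pixel((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.25)
```

A semi-transparent pixel such as a wisp of hair therefore carries a mixture of both colours, which is why a simple binary foreground/background mask is not enough.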

The method presented builds on established procedures for alpha-matte extraction. It attempts to dispense with the need for blue-screening, which is commonly used because of the high quality results obtained. Blue-screening has a high cost associated with it due to the requirement for a strictly controlled studio environment. Instead, the proposed method employs tri-maps specified by the user for a few key-frames of the sequence. A tri-map roughly segments the scene into areas which are “definitely foreground”, “definitely background”, and “unknown”. Optical flow (information about how various parts of the image are changing over time) is then used to propagate these tri-maps over the entire sequence.

Bayesian matting has recently been shown to work effectively on the difficult foreground regions described above, and is used as part of the proposed technique. Working from the tri-maps, foreground and background statistics in regions around the “unknown” areas are used to estimate the opacity and foreground and background colours of pixels in those unknown regions. Hence, a high-quality alpha-matte for each frame can be built up. This can be used, for example, to extract a foreground element to be overlaid onto a new background (an oft-used technique). Convincing results for several challenging image sequences were presented. The method was even used on commercial film sequences to extract a foreground matte which enabled new elements to be added to the background scene.
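One small piece of this estimation can be illustrated in isolation. If foreground and background colour estimates F and B are already available for an unknown pixel, the opacity that best explains the observed colour C under the compositing model C = alpha * F + (1 - alpha) * B is found by projecting C onto the line from B to F. The sketch below shows only that projection. It is not the full Bayesian procedure, which also estimates F and B from neighbourhood statistics, and the function name and RGB-tuple colour representation are assumptions.

```python
def estimate_alpha(observed, foreground, background):
    """Estimate the opacity of one pixel from known F and B colours.

    Projects the observed colour onto the line joining the background
    colour to the foreground colour and clamps the result to [0, 1].
    """
    diff_fb = [f - b for f, b in zip(foreground, background)]
    denom = sum(d * d for d in diff_fb)
    if denom == 0.0:
        # Identical colours give no information about the mixing ratio.
        raise ValueError("foreground and background colours are identical")
    numer = sum(
        (c - b) * d for c, b, d in zip(observed, background, diff_fb)
    )
    return min(1.0, max(0.0, numer / denom))


# An observed pixel that mixes a red foreground with a blue background.
alpha = estimate_alpha((0.25, 0.0, 0.75), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

When the foreground and background colours are very similar, the projection becomes poorly conditioned, which is one reason matting is hard in regions where an actor's hair or clothing resembles the backdrop.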

Yung-Yu Chuang’s homepage includes more details on the Digital Matting project, as well as several other related projects.

In addition to the paper presentations, the first four days of SigGraph 2002 saw several courses being run. These typically gave an overview of the state of the art research in a particular topic. Recent advances were described in fields as diverse as perceptual principles for effective computer depiction, image-based segmentation for medical applications, multi-resolution image analysis, and multi-dimensional visualisation.

Attendees enjoying a more interactive environment were also invited to attend several sketches and applications. In one such example, typical procedures involved in performing a motion-capture shoot and processing the resulting data were followed, with pairs of attendees seated in front of computer workstations.

Panels were also held to discuss current hot-topics in computer graphics. Knowledgeable figures from both academic and commercial backgrounds were invited to be quizzed on questions such as “How does motion capture affect animation?” (see above) and “Digital humans: what roles will they play?”.

Figure 3 - Selected works were displayed in the Art Gallery


The conference featured an Art Gallery (Figure 3), where various passive and interactive pieces of work were on display. Exhibitors encouraged the spectators passing through to get involved in the more active pieces. This often involved donning a virtual reality helmet or device in order to influence the virtual environment into which the viewer was “immersed”.

The Emerging Technologies exhibition provided a forum for interesting new technological concepts. The Japan Science and Technology Corporation was presenting a fingernail-mounted sense simulation device called “SmartFinger”. This uses electrical impulses to simulate texture passing under the finger as it is traced along smooth objects.

Sony Computer Science Laboratories demonstrated Block Jam (see Figure 4 below). This is an intriguing system of interconnecting electronic blocks which allow elaborate control-flows to be built up, thus generating a limitless variety of musical compositions through a connected digital audio synthesiser.

Figure 4 "BlockJam" - interconnecting blocks are assembled to create original musical themes


The University of Tokyo showcased a new force-feedback device used in a sword-to-sword combat game. The player dons a Virtual Reality helmet and holds an “active” sword whilst fighting the computer-generated enemy. The sword is equipped with a fly-wheel. This spins in order to simulate the feeling of the weapon impacting with the virtual creature as blows are struck.




Congratulations go to the ACM SigGraph committee who did an excellent job of making this enormous event run so smoothly.

