VISION in VR


As you move through the world, images of the objects that surround you fall onto your retinas. As you move past a fixed object, seeing it from various angles, the size and shape of the images on your retinas change, yet you effortlessly and unconsciously perceive the object to have a stable position, shape, and size.

This innate perceptual ability, honed by daily experience ever since infancy, is so fundamental and habitual to human beings that it seems almost absurd to talk about objects that could change their position, shape, or size depending on how you moved your head. Yet in virtual reality there are many simulated objects that do exactly that: their apparent location changes as the head moves around, and their apparent size and shape change depending on whether they are viewed directly in front of the user's head or off to the side.

In immersive virtual reality, a Head Mounted Display (HMD) and a head tracker are used to rapidly measure head position and create an image for each eye appropriate to its instantaneous viewpoint.
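
To make this concrete, the fragment below sketches how a per-eye viewpoint might be derived from tracker data. It is a minimal illustration only: the HeadPose structure, the single-yaw orientation model, and the fixed interpupillary distance are assumptions made for the example, not the interface of any particular HMD or tracking system.

    /* Minimal sketch of deriving one viewpoint per eye from head-tracker data.
       HeadPose, EyeView, and the single-yaw model are assumptions for this
       example, not the interface of a particular HMD or tracker. */
    #include <math.h>

    typedef struct { float x, y, z, yaw; } HeadPose;   /* tracked position + heading (radians) */
    typedef struct { float x, y, z, yaw; } EyeView;    /* viewpoint used to render one eye */

    /* Convention: at yaw = 0 the head faces along -z, so the "right" vector in
       the horizontal plane is (cos yaw, 0, -sin yaw).  Each eye is offset from
       the head centre by half the interpupillary distance (ipd) along it. */
    void eye_views(HeadPose head, float ipd, EyeView *left, EyeView *right)
    {
        float half = ipd * 0.5f;
        float rx = cosf(head.yaw);
        float rz = -sinf(head.yaw);

        left->x  = head.x - rx * half;  left->y  = head.y;  left->z  = head.z - rz * half;
        right->x = head.x + rx * half;  right->y = head.y;  right->z = head.z + rz * half;
        left->yaw = right->yaw = head.yaw;   /* both eyes share the head's orientation */
    }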

Usually, HMDs come with a built-in tracking system, so that the user needs to wear only a single helmet. In practice, however, at least two technologies are integrated into every HMD: one for visualisation and one for tracking.

1. Head Mounted Displays

2. 3D Graphics


1. Head Mounted Displays

There are two primary technologies used in HMDs to deliver images to the eye: CRTs, essentially the same component as you find in your living room TV, though much smaller, and LCDs, the type of display used in watches, portable computers, and game systems. Each has certain advantages.

A problem with the current generation of headsets is that communication with the VR computer is achieved through direct cable links between the headset and the computer. This limits how far the user can walk. In the near future, cabling is expected to be replaced by wireless links to overcome this problem. Later on, the computer need not even be nearby: data from the user as to where he is, and data from the computer as to what he should be seeing, will be delivered over broadband communications networks. Since this is not yet possible, a key point when evaluating HMDs is their usability, which often suffers from weight and wiring.

The latest generation of HMDs is less bulky than the old motorcycle-helmet designs, and looks more like a set of heavy-duty industrial goggles, such as someone might use when handling dangerous chemicals. In the future we can expect Virtual Reality spectacles, and later on contact lenses; both technologies are already beyond the basic R&D phase of development. These systems can provide all the visual cues without the cumbersome weight and represent a major advance. Two other display systems, while not in common use, bear some examination.

One system has been getting a lot of attention but is a long way from delivery: the Virtual Retinal Laser Scanner, developed by the Human Interface Technology Lab at the University of Washington. This experimental system uses a modulated laser beam (very low power, of course) to paint an image directly onto the retina of the eye. The primary advantage of this technology at this point is its high resolution and update rate. However, the system is years away from HMD use. Currently it exists only as a proof of concept, demonstrating very high resolution with low power requirements, but it faces severe obstacles in reducing its size and providing colour output.

Another technology that is in use in a couple of HMD systems is the "light pipe." This system uses a collimated fibre optic bundle to carry an image to a set of mirrors, which then reflect the image into the eye. The image source can be anything that will connect to the feed end of the fibre optic bundle: a CRT, a slide projector, a microscope, or whatever image generator is appropriate. The system is desirable because of its versatility and the light weight of the HMD itself. It can provide colour and high resolution. However, the collimated bundles are very expensive and somewhat fragile, limiting the application of light pipe systems to the high end of the application spectrum.

Europe is relatively well-represented in the development of headsets, with companies like Division and Virtuality of the UK and VRT GmbH of Germany all offering self-developed headsets.


2. 3D Graphics

From desktop to immersive VR, a key role is played by how spatial characteristics are rendered in the virtual environment. Crucial in this respect are the quality of the 3-D graphics and whether the visual performance of the interface is adequate to the project's goals. Technically speaking, behind all 3-D graphics activity lies the 3-D pipeline, an abstract entity (more precisely, a procedure of data exchange) describing the series of stages responsible for moving graphics from the application, through the hardware, to the screen. This activity is broken into different stages: at each stage, the calls for shapes or objects are processed and passed further down the 3-D pipeline.

Since some of the stages in this sequential procedure rely heavily on floating-point calculations, the system processor takes on great importance. With Pentium-class PCs and other high-end hardware available on the market, 3-D chip and board manufacturers can confidently delegate some of the floating-point workload to the processor, so that the graphics hardware is free to deal with the stages that involve manipulating pixels.

3-D pipelines may differ among products in minor details, but as far as graphics generation is concerned they all share two major stages: geometry and rendering. The geometry engine resides primarily in software and relies most heavily on floating-point processing. The rendering portion of the pipeline instead deals mainly with pixel-based procedures and is best performed with dedicated hardware on the graphics card. This division is a rather natural one on a PC, not only because of the nature of the operations being performed, but also because of the quantity of data involved in each step. The geometry engine deals with geometric figures in three-dimensional space and works primarily with the vertices of these objects. During the rendering stage, these vertices are transformed into individual pixels. Because the rendering portion involves much more data, a 3-D acceleration solution performs the rendering on the graphics card to avoid transferring it across the I/O bus. To move through a 3-D rendering without interruptions, the graphics subsystem needs to be able to execute all of the steps of the pipeline many times per second. The bandwidth challenges are similar to those of digital video in this regard; however, the 3-D graphics pipeline involves much more processing and transformation.

Before going any further into the description of a typical pipeline, we should make clear that system implementations can vary considerably according to their design priorities, depending on which elements are supported in the 3-D graphics accelerator. However, an understanding of the various stages of the process can help to determine the relative power of each piece of 3-D hardware. The process begins, of course, with the application software that displays 3-D graphics. These applications make calls to 3-D APIs (see also 3.2.4), usually using high-level instructions for creating objects. The geometry engine's tessellation stage then breaks these objects into individual triangles or polygons. In a process called transformation, the geometry engine calculates the vertices of each polygon in three-dimensional space and stores them in system memory. The transformation stage also rotates the objects appropriately, depending on the user's point of view; this user-dependent vantage point is known as the viewpoint. A clipping algorithm then discards the polygons that are not visible from the particular viewpoint.
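
As an illustration of the transformation stage, the sketch below moves a single vertex into the viewer's coordinate frame. It assumes a viewer described only by a position and a yaw angle; a real geometry engine would use full 4x4 matrices and handle the complete viewing transformation, but the idea of translating and rotating every vertex according to the viewpoint is the same.

    /* Sketch of the transformation stage: a model-space vertex is translated and
       rotated into the viewer's coordinate frame.  A single yaw angle stands in
       for the full 4x4 viewing matrix used by a real geometry engine. */
    #include <math.h>

    typedef struct { float x, y, z; } Vertex;

    Vertex to_view_space(Vertex v, float view_x, float view_y, float view_z, float yaw)
    {
        /* translate so the viewer sits at the origin */
        float tx = v.x - view_x, ty = v.y - view_y, tz = v.z - view_z;

        /* rotate the world by -yaw so the viewing direction becomes the -z axis */
        Vertex out;
        out.x = cosf(yaw) * tx - sinf(yaw) * tz;
        out.y = ty;
        out.z = sinf(yaw) * tx + cosf(yaw) * tz;
        return out;
    }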

The rendering portion of the pipeline converts the polygons into individual pixel colour values so they can be displayed on the 2-D screen. This process is not as easy as it sounds and is generally performed by the 3-D graphics accelerator. These chips accept the co-ordinates of the polygons and produce the pixel representations in the video board's frame buffer. However, they also perform several operations on these objects along the way to ensure realism and accuracy.
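
A very reduced sketch of this conversion is shown below: a triangle already projected to screen coordinates is filled into a frame buffer using edge functions. The buffer size, the colour format, and the assumption of counter-clockwise vertex order are choices made for the example, not properties of any specific accelerator.

    /* Sketch of rasterisation: a triangle already projected to screen coordinates
       is filled into a frame buffer using edge functions.  Resolution, colour
       format and counter-clockwise vertex order are assumptions for the example. */
    #include <stdint.h>

    #define WIDTH  640
    #define HEIGHT 480

    static uint32_t frame_buffer[WIDTH * HEIGHT];   /* one packed colour per pixel */

    /* signed area test: positive when (px,py) lies on the left of the edge a->b */
    static float edge(float ax, float ay, float bx, float by, float px, float py)
    {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    void fill_triangle(float x0, float y0, float x1, float y1,
                       float x2, float y2, uint32_t colour)
    {
        for (int y = 0; y < HEIGHT; ++y) {
            for (int x = 0; x < WIDTH; ++x) {
                float px = x + 0.5f, py = y + 0.5f;   /* sample at the pixel centre */
                /* a pixel is inside when it is on the same side of all three edges */
                if (edge(x0, y0, x1, y1, px, py) >= 0.0f &&
                    edge(x1, y1, x2, y2, px, py) >= 0.0f &&
                    edge(x2, y2, x0, y0, px, py) >= 0.0f)
                    frame_buffer[y * WIDTH + x] = colour;
            }
        }
    }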

Unlike traditional computer graphics, 3-D graphics display systems must contend with depth, a dimension that does not physically exist on our two-dimensional monitors but is crucial to presenting a 3-D scene accurately. There are two main approaches: z-sorting and z-buffering. Z-sorting is the easiest and cheapest way to handle depth, but it lacks accuracy. An application that uses z-sorting sorts the polygons in a scene according to their depth (z-axis) values to determine the order in which they should be displayed on the screen. The polygons are then displayed from back to front. This approach has the valuable advantage that it can be performed without any special hardware; unfortunately, it can produce degraded images when graphic objects intersect or overlap each other. It has often been used in the design and development of 3-D games, where z-sorting reduces their reliance on 3-D accelerator hardware.
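
The sketch below illustrates the z-sorting idea (the classic painter's algorithm): polygons are sorted by their average depth and drawn from back to front. The Polygon structure and the draw_polygon stub are placeholders for illustration, and larger z is taken to mean farther from the viewer.

    /* Sketch of z-sorting (the painter's algorithm): polygons are ordered by
       average depth and drawn back to front.  Polygon and draw_polygon are
       placeholders; larger z is taken to mean farther from the viewer. */
    #include <stdlib.h>

    typedef struct { float depth; /* average z of the polygon's vertices */ } Polygon;

    static void draw_polygon(const Polygon *p) { (void)p; /* rasterise the polygon here */ }

    /* qsort comparator: farther polygons (larger depth) come first */
    static int farther_first(const void *a, const void *b)
    {
        float za = ((const Polygon *)a)->depth;
        float zb = ((const Polygon *)b)->depth;
        return (za < zb) - (za > zb);
    }

    void painters_algorithm(Polygon *polys, size_t count)
    {
        qsort(polys, count, sizeof(Polygon), farther_first);
        for (size_t i = 0; i < count; ++i)
            draw_polygon(&polys[i]);    /* nearer polygons simply overwrite farther ones */
    }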

Z-buffering is a more sophisticated and complex solution that is slowly becoming a standard feature on most 3-D graphics cards. A z-buffer is a special memory area on the graphics card, functionally similar to the frame buffer, except that instead of storing colour data for each pixel as the frame buffer does, the z-buffer stores a depth value for each pixel. Using this approach, the 3-D accelerator can compare the depth of each pixel it is about to write to the frame buffer against the value at the corresponding location in the z-buffer. If the new pixel lies behind the one already recorded in the z-buffer, the 3-D chip ignores the frame-buffer write, since that pixel should not be visible. This approach is more accurate because depth is resolved pixel by pixel rather than polygon by polygon; the degree of accuracy can vary. Just as the colour depth in the frame buffer (8-, 16-, 24-, or 32-bit) determines the number of colours that can be displayed, the width of the z-buffer (usually 16, 24, or 32 bits) determines the granularity of depth available. Most leisure applications require only 16-bit z-buffers, while more sophisticated programs such as CAD/CAM and scientific visualization can take advantage of the added accuracy of wider z-buffers.
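
The per-pixel test at the heart of z-buffering can be sketched as follows. A 16-bit z-buffer and a convention in which smaller values mean "closer to the viewer" are assumptions chosen for the example.

    /* Sketch of the z-buffer test performed for every candidate pixel.  A 16-bit
       depth buffer is assumed, with smaller values meaning closer to the viewer;
       the z-buffer is cleared to 0xFFFF at the start of each frame. */
    #include <stdint.h>

    #define WIDTH  640
    #define HEIGHT 480

    static uint32_t frame_buffer[WIDTH * HEIGHT];   /* colour per pixel */
    static uint16_t z_buffer[WIDTH * HEIGHT];       /* depth per pixel */

    void write_pixel(int x, int y, uint32_t colour, uint16_t depth)
    {
        int i = y * WIDTH + x;
        if (depth < z_buffer[i]) {      /* new pixel is in front: keep it */
            z_buffer[i]     = depth;
            frame_buffer[i] = colour;
        }                               /* otherwise the frame-buffer write is ignored */
    }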

The graduation of light is another major component of any 3-D graphics pipeline: a realistic presentation of objects and of the characteristics of the environment cannot be achieved without it. Unlike 2-D graphics, realistic 3-D usually entails one or more light sources of various intensities and in different locations. Lighting can be applied in software or hardware, but the best techniques require hardware support. If a rendered object is solid, lighting equations need to be applied to produce realistic shading across its surface. Like z-sorting, flat shading offers a low-cost way of lighting objects that can be performed in software. The flat-shading approach applies a lighting equation to each polygon and determines a single resultant colour to be displayed across the entire face of that polygon.
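
A possible form of that per-polygon lighting equation is sketched below: a simple Lambertian (diffuse) term plus an ambient term, evaluated once for the whole face. The ambient/diffuse split, the directional light, and the [0,1] intensity range are assumptions chosen to keep the example small.

    /* Sketch of a per-polygon (flat-shading) lighting equation: an ambient term
       plus a Lambertian diffuse term, evaluated once for the whole face. */
    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    /* n: unit face normal, l: unit vector pointing towards the light source */
    float flat_shade(Vec3 n, Vec3 l, float ambient)
    {
        float diffuse = dot(n, l);
        if (diffuse < 0.0f) diffuse = 0.0f;              /* faces turned away receive no diffuse light */
        return ambient + (1.0f - ambient) * diffuse;     /* applied uniformly to the entire face */
    }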

While this technique is better than no shading at all, it looks fairly blocky and unrealistic. A more sophisticated lighting technique is Gouraud shading, which calculates the lighting equation at the vertices of each polygon and interpolates the colours in between. This results in a smooth, realistic look. While the vertex colour calculations can be performed in the software portion of the pipeline, a 3-D chip that supports Gouraud shading can interpolate the pixels between them most efficiently. The ultimate lighting technique is Phong shading, in which the graphics pipeline calculates the full lighting equation for every single pixel. The processing and bandwidth demands of this technique make it impractical on PCs. The good news for our kind of applications, however, is that Gouraud shading works very well on the PC platform.
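
The difference between the techniques can be summarised in a small sketch: with Gouraud shading, intensities computed at a triangle's three vertices (for instance with a flat-shading-style equation evaluated per vertex) are simply blended for each interior pixel, here using barycentric weights. Phong shading would instead re-evaluate the full lighting equation at every pixel.

    /* Sketch of Gouraud shading: the lighting equation is evaluated only at the
       three vertices, and each interior pixel blends those results with
       barycentric weights. */

    /* i0, i1, i2: intensities computed at the vertices (e.g. with flat_shade above);
       w0, w1, w2: barycentric weights of the pixel, non-negative and summing to 1. */
    float gouraud_pixel(float i0, float i1, float i2, float w0, float w1, float w2)
    {
        return w0 * i0 + w1 * i1 + w2 * i2;
    }

In a rasteriser like the one sketched earlier, the barycentric weights fall out of the three edge-function values once normalised, so this interpolation adds very little per-pixel work.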

Texture mapping is one of the most challenging aspects of 3-D, and the most important in producing high-quality renderings. Until now, we've discussed three-dimensional objects in terms of vertices of polygons and everything in between. But faces of objects in the real world are seldom uniform; they usually have patterns and textures. To present the illusion of reality, many 3-D accelerator chips offer the capability to map bitmap images, called textures, to the surface of rendered polygons. This is a demanding task, but an important one for high-quality graphics. One problem that occurs with texture mapping surfaces is perspective. If a texture bitmap is applied to a surface that extends along the z-axis, the image may be irregularly contorted if it is not altered to account for perspective. Most accelerators that provide texture mapping also provide perspective correction.
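
A common way to implement perspective correction is to interpolate u/w, v/w, and 1/w linearly in screen space and divide back per pixel, as sketched below. The two-vertex linear blend is a simplification of what a rasteriser does across a whole triangle, and the TexAttr structure is an assumption for the example.

    /* Sketch of perspective-correct texture interpolation: u/w, v/w and 1/w are
       interpolated linearly in screen space and divided back per pixel, so a
       texture on a surface receding along the z-axis is not distorted. */
    typedef struct { float u_over_w, v_over_w, one_over_w; } TexAttr;

    /* blend the perspective-divided attributes of two vertices, t in [0,1] */
    void perspective_correct_uv(TexAttr a, TexAttr b, float t, float *u, float *v)
    {
        float uw = a.u_over_w   + (b.u_over_w   - a.u_over_w)   * t;
        float vw = a.v_over_w   + (b.v_over_w   - a.v_over_w)   * t;
        float w  = a.one_over_w + (b.one_over_w - a.one_over_w) * t;
        *u = uw / w;    /* per-pixel divide recovers the true texture coordinate */
        *v = vw / w;
    }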

3-D chips that support texture mapping must also contend with other subtle problems that occur when textures of various sizes are applied to polygons. When a bitmap is applied directly, pixel by pixel, to a polygon, strange visual effects can occur when the user rotates the image: the image can appear to sparkle or pop out at you for no reason. A common way to deal with this is a process called bilinear filtering, in which the chip samples four adjacent pixels and places their weighted average on the polygon. Other problems occur when a texture is applied to a polygon that is smaller than the texture itself. To deal with this, some chips support MIP mapping, storing multiple versions of the same texture in varying resolutions and sizes; the chip can then select the appropriately sized texture for the depth of the target polygon. Bilinear filtering is often applied to MIP-mapped textures as well, but further complications develop as an object moves along the z-axis and is switched from one size of MIP texture to another: when this occurs, the object may suddenly change focus or resolution. To handle this, some chips support trilinear filtering, which simply interpolates between the different MIP textures for a smooth transition between bitmaps. Texture mapping is a critical part of a 3-D chip, and its quality varies between implementations; support for these features is a good sign of quality.
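
The bilinear-filtering step can be sketched as follows: the texture is sampled at a fractional coordinate by taking the weighted average of the four surrounding texels. The greyscale texel format and the Texture structure are simplifications for the example; trilinear filtering would additionally blend two such samples taken from adjacent MIP levels.

    /* Sketch of bilinear filtering: a texture is sampled at a fractional
       coordinate by taking the weighted average of the four surrounding texels. */
    typedef struct { int width, height; const float *texels; } Texture;

    float bilinear_sample(const Texture *tex, float u, float v)
    {
        /* map (u,v) in [0,1] into texel space */
        float x = u * (tex->width - 1);
        float y = v * (tex->height - 1);
        int x0 = (int)x, y0 = (int)y;
        int x1 = (x0 + 1 < tex->width)  ? x0 + 1 : x0;
        int y1 = (y0 + 1 < tex->height) ? y0 + 1 : y0;
        float fx = x - x0, fy = y - y0;                  /* fractional parts act as weights */

        float t00 = tex->texels[y0 * tex->width + x0];
        float t10 = tex->texels[y0 * tex->width + x1];
        float t01 = tex->texels[y1 * tex->width + x0];
        float t11 = tex->texels[y1 * tex->width + x1];

        float top    = t00 + (t10 - t00) * fx;           /* blend along x on both rows */
        float bottom = t01 + (t11 - t01) * fx;
        return top + (bottom - top) * fy;                /* then blend the rows along y */
    }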

Transparency and fog are other features that are common in 3-D accelerators. These effects are usually implemented through the alpha channel, a separate channel added to the traditional red, green, and blue channels that contains a value indicating the amount of transparency for each pixel. The chip takes this value into account when performing a frame-buffer write, combining the new pixel value with the existing one appropriately to produce the desired effect. Some chips provide various modes of applying transparency that use different formulas to combine the colours.
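
The most common frame-buffer combination is the "source over" blend, sketched below for a packed 32-bit pixel. The 0xAARRGGBB layout is an assumption for the example; as noted above, real chips may expose several other blend formulas.

    /* Sketch of an alpha-blended frame-buffer write using the common
       "source over" formula, for packed 0xAARRGGBB pixels. */
    #include <stdint.h>

    uint32_t blend_pixel(uint32_t src, uint32_t dst)
    {
        unsigned a = (src >> 24) & 0xFF;                  /* source alpha: 255 = fully opaque */
        uint32_t out = 0;
        for (int shift = 0; shift < 24; shift += 8) {     /* blend the B, G and R channels */
            unsigned s = (src >> shift) & 0xFF;
            unsigned d = (dst >> shift) & 0xFF;
            unsigned c = (s * a + d * (255 - a)) / 255;   /* weighted mix of new and existing colour */
            out |= (uint32_t)c << shift;
        }
        return out | 0xFF000000u;                         /* the stored result is opaque */
    }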

