Understanding the Limits of Human Vision for 3D Graphics: A Study by Michael F. Deering, Lecture notes of Computer Graphics

This document, authored by Michael F. Deering of Sun Microsystems, explores the limits of human vision in relation to 3D graphics. The study estimates the number of variable resolution pixels per eye that can be perceived by the human visual system and predicts the rendering rate required to saturate it. The document also compares visual and display parameters for several representative display devices.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

The Limits of Human Vision
Michael F. Deering
Sun Microsystems
ABSTRACT
A model of the perception limits of the human visual system is presented, resulting in an estimate of approximately 15 million variable resolution pixels per eye. Assuming a 60 Hz stereo display with a depth complexity of 6, we make the prediction that a rendering rate of approximately ten billion triangles per second is sufficient to saturate the human visual system. 17 different physically realizable computer display configurations are analyzed to understand their visual perception limits. The displays include direct view CRTs, stereo projection displays, multi-walled immersive stereo projection displays, head-mounted displays, as well as standard TV and movie displays for comparison. A theoretical maximum triangle-per-second rate is also computed for each of these display configurations.
Keywords: Visual Perception, Image quality, virtual reality, stereo displays, immersive projection displays, fishtank stereo.
1 Introduction
With improvements in 3D graphics technology we are on the brink of producing hardware that matches or exceeds the needs of the human visual system. This fact must now be taken into account when designing 3D graphics hardware. As part of an effort to design future hardware and display systems, a study was made of the rendering impact of matching 3D rendering capability to the known limits of the human visual system.
While real-time 3D computer graphics has historically traded off image quality and resolution to meet frame rate and cost constraints, this is becoming less and less the case. The ultimate limits of human visual perception must now be included in hardware trade-offs. A model of such visual limits will be developed, and illustrated in terms of several common display devices. Combined with other rendering assumptions, an estimated theoretical bound on the maximum triangle rendering rate needed to saturate the human visual system will be made.
2 Limits of Human Vision
The eventual consumer of all 3D rendering is the human visual system. With display technology and real-time hardware rendering speeds ever increasing, we are on the threshold of a generation of machines that will surpass the visual system’s input capabilities. On a machine with a single user and a sustained render frame rate of 60 Hz, even present day CRTs exceed the maximum spatial frequency detection capability of the visual system, in regions away from where the fovea is looking. To take advantage of this situation, a hardware rendering architecture could implement some form of variable resolution frame buffer. In such a frame buffer, the spatial resolution is not fixed, but must be per-frame programmable to match the variable-resolution nature of human vision. It is assumed that such pixels must be antialiased, and that the antialiasing filter’s frequency cut-off must also vary dynamically to match the local effective pixel density. An important question is to understand the precise quantitative details of the variable resolution frame buffer configurations necessary to match the variable resolution nature of the human visual system. This section will provide this information in familiar computer
graphics terms, but first starts with some details of human spatial vision limitations. ([2] is a good reference for much of this material.)

Highest resolution perceivable pixels: 28 seconds of arc. Several physical factors limit the highest spatial frequencies that can be perceived by the human eye. The diffraction limit of the pupil, the foveal cone spacing, neural trace and physiological tests all agree on a maximum perceived frequency of approximately one cycle per arc-minute (half arc-minute pixels). This is under optimal (but non-vernier†) conditions, including 100% contrast. While not quite directly comparable, so-called “20/20” vision represents detecting image features twice as large.

Variable resolution: 1/2@±1°, 1/4@±2°, 1/8@±5°, 1/16@±12°. This high resolution, however, applies only to the central 2° of vision. Outside of this, the cone spacing and measured perceptual acuity drop off even faster than the optical limits. In most textbooks (see [2], page 60), this drop-off is plotted as a sharp cusp. However, this representation does not do justice to how small the high spatial frequency perception region of the visual field is. Figure 1a plots an alternate visualization of this data onto the surface of a unit sphere: which portions of the 4π steradian field of view are perceived at what resolution. There are 5 false color bands, each corresponding to a factor of two less perceptual resolution. Figure 1b is a zoom into the central region of the sphere. The centermost green region corresponds to the central ±1° of the fovea. The purple extends from there to ±2°, red to ±5°, orange to ±12°, and yellow to the optical edge caused by the human face. The white represents the non-visible regions. This optical edge has a complex shape, and varies both in the individual and the literature. For our calculations, we used the data of [3], where the maximum field of view varied horizontally from -59° to +110°, and vertically from -70° to +56°. To show both sides of this more than 180° field, two unit spheres are shown, one for a right eye and one for a symmetrically reversed left eye.

Thus if the direction of gaze is known, across the entire visual field, the human visual system can perceive approximately only one fifteenth the visual detail that would be discernible if foveal resolutions were available for the entire field.

To understand the possible impact on 3D graphics systems, Table 1 presents a comparison of visual and display parameters for several representative display devices.

†. A common hyperacuity example is when one can detect shifts as small as three seconds of arc in the angular position of a large visual object. Here the visual system is reconstructing higher spatial frequency information from a large number of lower frequency samples. However, the visual system can do the same for lower frequency rendered 3D graphics images so long as the higher spatial frequencies were present during the antialiasing process.

Display Devices: The rectangular ones are characterized by their diagonal measurement and typical user viewing distance. The bottom two entries are the pure limits of the visual system, and a non-tracked visual system (Full Sphere). The table represents a stereo display table top, either tilted at 45°, or with the user's head and body tilted 45° relative to a flat table. The 3 wall Virtual Portal is a left, center, and right wall projective immersive display. The video resolution numbers are reversed to indicate that this display is taller than wide; in actuality the video projectors are turned sideways, and the computer generated video format is more traditionally wider than tall. The 4 wall case adds a ceiling display; the 5 wall case adds the floor; the 6 wall case is a complete cube for purposes of comparison. The two higher resolution 3 wall configurations assume tiled 2 then 4 video projectors per wall, for a total of 6 then 9 video projectors. Notice that even the 9 projector system only brings the pixel visual angle to 3.4 minutes of arc, still more than four times lower resolution than that of a standard 1280×1024 desktop CRT.

Display Physical Size (per plate): Because diagonal measurements can be misleading, the full rectangular dimensions of each display device (or each screen for the multiple screen Virtual Portal) are given. The size numbers were adjusted to match both physical display bezels (where appropriate) and indicated video aspect ratio (typically 1.25 or 1.333).

Display Pixel Resolution: The pixel resolution of the displays. (The movie resolution is an empirical number for 35 mm production film. The IMAX number is the preferred digital source format from their web site for 15/70 70 mm 15 perf film, and is approximately the same physical pixel density on the film as the 35 mm number.) The aspect ratio of the device is also determined by these numbers.

Display Pixel Size: The maximum angular size of a single display pixel in minutes of arc. For wide displays, outlying pixels can be quite a bit narrower in one dimension than this number.

Display FOV: The total solid angle visual field of view (FOV) of the devices, in units of steradians (independent of eye limits).

Pixels: 0.47 min limit: The maximum human perceivable pixels within the field of view, assuming uniform 28 second of arc perception. This is simply the number of 28 second of arc pixels that fit within the steradians of column 5.

Pixels: 1.5 min limit: The same information as the previous column, but for more practical 1.5 arc-minute perception pixels. (Practical in the sense that for many applications 20/30 visual acuity quality would be more than acceptable.)

Pixels: eye limit: The maximum human perceivable pixels within the field of view, assuming the variable resolution perception of Figure 4.

Pixels: display limit: The pixel limit of the display itself (multiplication of the numbers from column 3).

Pixels: eye & display limit: The number of perceivable pixels taking both the display and eye limits into account. This was computed by checking for each area within the display FOV which was the limit: the eye or the display, and counting only the lesser.

Triangles: eye & display limit: The limits of the previous column converted into maximum triangle rendering rates (in units of billions of triangles per second), using additional models developed in the next section.

To compute the table's numbers, each display was broken up into 102,400 smaller rectangular regions in space, and projected onto the surface of a sphere. Here the local maximum perceptible spatial frequency was computed and summed. Numerical integration was performed on the intersection of these sections and the display FOV edges (or, in the case of the full eye, the edge of the visual field).

The angular size of uniform pixels on a physically flat display is not a constant; they will become smaller away from the axis. The effect is minor for most displays, but becomes quite significant for very large field of view displays. However, for simplicity this effect was not completely taken into account in the numbers in Table 1, as real display systems address this problem with multiple displays and/or optics.

All of these numbers represent a single snapshot in time, as they are computed for a single eye position relative to the display. All the numbers (other than the HMD, human eye, and full sphere) will change as a user leans in/back from the display and moves around. A more exacting calculation computes minimum / maximum numbers based on a presumed limit of user head / body motions.

There are several things to note from this table. The FOV of a single human eye is about one third of the entire 4π steradians FOV. A widescreen movie is only a twentieth of the eye’s FOV, and normal television is less than a hundredth. A hypothetical spherical display about a non-tracked rotating (in place) observer would need over two thirds of a billion pixels to be rendered and displayed (per eye) every frame to guarantee full visual system fidelity. An eye tracked display would only require one forty-fifth as many rendered pixels, as the perception limit on the human eye is only about 15 million variable resolution pixels. This potential factor of forty-five performance requirement reduction was the motivating factor in investigating variable-resolution frame buffer architectures.
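The pixel-count arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's actual procedure: the eye is modeled with a uniform 28 second of arc limit (the variable resolution model is not reproduced here), the display is a flat rectangle viewed on-axis, and all function names and the example dimensions are hypothetical.

```python
import math

ARCSEC = math.pi / (180 * 3600)  # radians per second of arc


def uniform_pixel_count(solid_angle_sr, pixel_arcsec=28.0):
    """Number of square pixels of the given angular size that fit in a solid angle."""
    pixel_sr = (pixel_arcsec * ARCSEC) ** 2
    return solid_angle_sr / pixel_sr


def rect_solid_angle(width, height, distance):
    """Solid angle (steradians) of a flat rectangle seen from a point on its central normal."""
    a, b = width / 2, height / 2
    return 4 * math.atan(a * b / (distance * math.sqrt(distance**2 + a**2 + b**2)))


def eye_and_display_limit(width, height, distance, px_w, px_h,
                          pixel_arcsec=28.0, n=320):
    """Split the display into n*n patches (320*320 = 102,400) and sum, per patch,
    the lesser of the display's pixel count and the eye's perceivable pixel count.
    Assumption: a uniform eye limit stands in for the variable resolution model."""
    patch_area = (width / n) * (height / n)
    display_per_patch = px_w * px_h / (n * n)
    pixel_sr = (pixel_arcsec * ARCSEC) ** 2
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * width / n - width / 2
        for j in range(n):
            y = (j + 0.5) * height / n - height / 2
            r2 = x * x + y * y + distance * distance
            # Solid angle of a small planar patch projected onto the unit sphere.
            patch_sr = patch_area * distance / r2 ** 1.5
            total += min(display_per_patch, patch_sr / pixel_sr)
    return total


if __name__ == "__main__":
    full_sphere = uniform_pixel_count(4 * math.pi)
    print(f"full sphere at 28 arcsec: {full_sphere / 1e6:.0f} million pixels")
    print(f"ratio to 15 million eye-limit pixels: {full_sphere / 15e6:.0f}")
    # Hypothetical 0.40 m x 0.32 m, 1280x1024 display viewed from 0.60 m.
    print(f"eye & display limit: {eye_and_display_limit(0.40, 0.32, 0.60, 1280, 1024):.0f}")
```

Under these assumptions the full sphere comes out at roughly 680 million pixels, in line with the "over two thirds of a billion" figure, and dividing by 15 million gives the factor of about forty-five. For the hypothetical desktop display the display is the limiting factor in every patch, so the sum equals the display's own pixel count.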

All the multi-screen configurations have a larger total FOV than the human visual system, though not necessarily of the same shape. These large screens pay the price in very large individual pixel angular sizes. The visual resolution for the standard resolution cases is equivalent to 20/210 vision, and the older 960×640 stereo video format was even worse.

The 34” direct view CRT (commercially advertised as a 37” display) at 24” viewing distance actually has a 65° field of view, and thus in many ways can be as immersive as the 65° HMD (and a lot less distorted).

3 The Limits of Rendering
The maximum rendering rate that may be needed for a real-time system can be estimated by the following simple model:

$$\Delta/\text{sec} = \text{frame rate} \cdot \#\text{eyes} \cdot \text{screen pixels} \cdot \text{depth complexity} \cdot \Delta/\text{pixel} \qquad (1)$$

An empirical estimate of the last term, triangles per pixel (Δ/pixel), as near unity is made.

The previous section developed estimates of screen pixels based on displays and perception. Frame rate has not been extensively discussed within this paper, other than an assumption that it is at or above 60 Hz. Very little is known about the interaction of rapidly varying complex rendered images with the human visual system. The best we can do at this stage is to pick a number that is hoped to be high enough. Some have even speculated that very high rendering frame rates (in excess of 300 Hz) may interact more naturally with the human visual system to produce motion blur effects than the traditional computer graphics techniques. For the purposes of example in this paper we will use a 60 Hz rate, and hope that more complex visual experiments will be performed in the future to probe this question.
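As a quick check, the rendering-rate model of this section can be evaluated with the figures quoted in the abstract (60 Hz, stereo, about 15 million pixels per eye, depth complexity 6, roughly one triangle per pixel). The function and parameter names below are illustrative only and do not come from the paper.

```python
def triangles_per_second(frame_rate_hz, eyes, screen_pixels,
                         depth_complexity, triangles_per_pixel=1.0):
    """Product of the five terms of the rendering-rate model."""
    return (frame_rate_hz * eyes * screen_pixels
            * depth_complexity * triangles_per_pixel)


# Values taken from the abstract; triangles_per_pixel = 1 reflects the
# "near unity" empirical estimate.
rate = triangles_per_second(frame_rate_hz=60, eyes=2, screen_pixels=15e6,
                            depth_complexity=6)
print(f"{rate / 1e9:.1f} billion triangles/sec")  # prints "10.8 billion triangles/sec"
```

The result, about 10.8 billion triangles per second, matches the abstract's "approximately ten billion" estimate.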