NVIDIA GeForce 8800 GT (G92)
Part 2: Features, Synthetic Tests
Installation and Drivers
Testbed configuration:
- Intel Core 2 Duo (Socket 775) based computer
- CPU: Intel Core 2 Extreme X6800 (2930 MHz) (L2=4096K)
- Motherboard: EVGA nForce 680i SLI on NVIDIA nForce 680i
- RAM: 2 GB DDR2 SDRAM Corsair 1142MHz (CAS (tCL)=5; RAS to CAS delay (tRCD)=5; Row Precharge (tRP)=5; tRAS=15)
- HDD: WD Caviar SE WD1600JD 160GB SATA
- Operating system: Windows XP SP2; DirectX 9.0c
- Operating system: Windows Vista Ultimate; DirectX 10.0
- Monitor: Dell 3007WFP (30").
- Drivers: ATI CATALYST 7.10; NVIDIA Drivers 169.04.
VSync is disabled.
Synthetic tests
Starting with this review, we will use the new RightMark3D 2.0 for Direct3D 10 applications in MS Windows Vista. Some familiar tests were rewritten for DX10, and new types of synthetic tests were added: modified pixel shader tests rewritten for SM 4.0, geometry shader tests, and vertex texture fetch tests. However, previous versions of RightMark will remain in use until low-level fill rate and other tests appear in the new version.
All our synthetic benchmarks can be downloaded here:
- D3D RightMark Beta 4 (1050) with its description on http://3d.rightmark.org
- D3D RightMark Pixel Shading 2 and D3D RightMark Pixel Shading 3 - tests of Pixel Shaders 2.0 and 3.0: link
- RightMark3D 2.0 with a brief description: link
RightMark3D 2.0 requires the MS Visual Studio 2005 runtime and the latest DirectX runtime update.
Synthetic tests were run with the following graphics cards:
- NVIDIA GeForce 8800 GT with standard parameters (GF8800GT)
- NVIDIA GeForce 8800 GTX with standard parameters (GF8800GTX)
- NVIDIA GeForce 8800 GTS with standard parameters (GF8800GTS)
- NVIDIA GeForce 8600 GTS with standard parameters (GF8600GTS)
- RADEON HD 2900 XT with standard parameters (HD2900XT)
We selected them for comparison with the GeForce 8800 GT for the following reasons: the GeForce 8600 GTS is the best previous Mid-End product; the older GeForce 8800 cards will help us evaluate the effect of architectural changes (a different number of ROPs, modified TMUs), higher frequencies, and lower video memory bandwidth; and the comparison with the RADEON HD 2900 XT is interesting because new Mid-End solutions from AMD are based on the R600.
Direct3D 9: Pixel Filling tests
This test determines peak texel rate in FFP mode for different numbers
of textures applied to a pixel:

Many graphics cards demonstrate results close to the theoretical maximum.
In modes with many textures, results of synthetic tests are most often
a tad lower than the theoretical maximum. The old GeForce 8800 and the
top AMD card (with some reservations) come closer to this threshold
than the other cards. The two NVIDIA cards whose GPUs have improved
TMUs nevertheless fail to reach the theoretical maximum in our old
test.
Judging by these results, the GeForce 8800 GT only adds more confusion. The situation with the G92 does not fully repeat what we saw with the G84. Judging by the figures, the new GPU fetches just over 30 texels per clock for 32-bit textures with bilinear filtering. Theoretically, the rate should be higher with bilinear filtering (56) than with trilinear filtering (28). What is especially interesting is that the test with trilinear filtering gave the same results.
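The per-clock figures above turn into absolute texel rates once a core clock is fixed. The sketch below is only an illustration: the 600 MHz core clock is the reference specification of the GeForce 8800 GT and is not stated in this section, so treat it as an assumption. The measured value of 30 texels per clock is the approximate figure quoted above.

```python
def peak_texel_rate(texels_per_clock: float, core_clock_mhz: float) -> float:
    """Theoretical texel rate in megatexels per second."""
    return texels_per_clock * core_clock_mhz


def tmu_efficiency(measured_per_clock: float, theoretical_per_clock: float) -> float:
    """Fraction of the theoretical per-clock rate reached in a test."""
    return measured_per_clock / theoretical_per_clock


# Assumption: reference core clock of the GeForce 8800 GT (not given in the text).
CORE_CLOCK_MHZ = 600.0

BILINEAR_PER_CLOCK = 56   # theoretical figure from the text
TRILINEAR_PER_CLOCK = 28  # theoretical figure from the text
MEASURED_PER_CLOCK = 30   # approximate; the text says "over 30"

if __name__ == "__main__":
    for label, per_clock in (("bilinear", BILINEAR_PER_CLOCK),
                             ("trilinear", TRILINEAR_PER_CLOCK)):
        rate = peak_texel_rate(per_clock, CORE_CLOCK_MHZ)
        print(f"{label}: {rate:.0f} Mtexels/s theoretical")
    share = tmu_efficiency(MEASURED_PER_CLOCK, BILINEAR_PER_CLOCK)
    print(f"measured bilinear rate is about {share:.0%} of the theoretical peak")
```

Under these assumptions the old test reaches only a little over half of the theoretical bilinear rate, which is why the result looks confusing.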
With few textures per pixel, the GeForce 8800 GT looks worse than the
other GeForce 8800 cards: its video memory bandwidth is insufficient
(lower than in the GTX and GTS cards). But in heavier conditions,
the new graphics card starts outperforming all its competitors, thanks
to its higher frequency and the architectural changes in its TMUs.
Have a look at the fill rate test:

The second synthetic test measures the fill rate. It shows the same
situation adjusted for the number of pixels written into the frame
buffer. With 0, 1, and 2 textures, the new Mid-End solution from
NVIDIA is outperformed by the old top cards; it pulls ahead only
with many textures per pixel. Compared to the future competitors from
AMD, the GeForce 8800 GT has a chance to offer faster texel and fill
rates when they are not limited by video memory bandwidth.
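The bandwidth gap mentioned above can be estimated from bus width and memory clock. This is a rough sketch, not data from our tests: the bus widths and effective memory clocks are the reference specifications of these cards as we recall them (they are not listed in this section), and the 4 bytes per pixel used for the fill-rate ceiling is a hypothetical value for a plain 32-bit colour write without Z traffic or compression.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9


def bandwidth_bound_fill_rate(bandwidth_gb_s: float, bytes_per_pixel: float) -> float:
    """Fill-rate ceiling in megapixels/s if memory traffic were the only limit."""
    return bandwidth_gb_s * 1e9 / bytes_per_pixel / 1e6


# Assumptions: reference (bus width in bits, effective memory clock in MHz).
CARDS = {
    "GF8800GT": (256, 1800.0),
    "GF8800GTX": (384, 1800.0),
    "GF8800GTS": (320, 1600.0),
}

# Hypothetical traffic per pixel: one 32-bit colour write, nothing else.
BYTES_PER_PIXEL = 4

if __name__ == "__main__":
    for name, (bus_bits, clock_mhz) in CARDS.items():
        bandwidth = memory_bandwidth_gb_s(bus_bits, clock_mhz)
        ceiling = bandwidth_bound_fill_rate(bandwidth, BYTES_PER_PIXEL)
        print(f"{name}: {bandwidth:.1f} GB/s, at most {ceiling:.0f} Mpixels/s")
```

Real fill rates also depend on the number of ROPs and their clock, so the lower of the two ceilings applies. The sketch covers only the memory side.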
Direct3D 9: Geometry Processing Speed Tests
Let's analyze extreme geometry tests. The first test uses the simplest
vertex shader that shows maximum triangle throughput:

As all the GPUs are based on unified architectures, all unified processors
in this test are busy with geometry processing. So all solutions demonstrate
high results, which are evidently limited not by the peak performance
of the unified processors, but by the performance of other units, for
example, triangle setup.
The efficiency of the various GPUs in the various modes is approximately
the same, and peak performance in FFP, VS 1.1 and VS 2.0 modes differs
little. These results do not show anything for certain. But we
can see that the AMD solution is traditionally faster at processing
geometry than NVIDIA GPUs. Let's see what will change in a more complex
test with a single diffuse light source:

We can see some difference here, although the potential of these solutions
is evidently higher. The GeForce 8600 GTS does not fall far behind
the more powerful solutions. The FFP mode is a tad faster on all
graphics cards this time, except for the G84-based card. GeForce cards
are outperformed by the top RADEON product in all modes, although
the performance difference is not very big. Let's see what will happen
in heavier conditions - complex lighting with a single light source
and glare:

It's a similar situation. The leader in geometry performance is still
the R600. So future Mid-End solutions from AMD will evidently be faster
than the G92 at processing geometry. With a mixed light source,
the effect of optimized FFP emulation is apparent in most solutions.
This time the GeForce 8600 GTS falls even further behind, and the
GeForce 8800 GT is no worse than its siblings. Let's analyze the most
complex geometry task with three light sources, including static and
dynamic branches:

We can see differences between all contenders. The RADEON HD 2900
XT has an even greater advantage here, and even this most complex
geometry task does not seem to reveal its full potential. As usual,
we note opposite weaknesses of vertex units in AMD and NVIDIA
architectures - dynamic branches cause a deeper performance drop in
the former, while static branches do so in the latter.
The GeForce 8800 GT deserves a separate look. The higher frequency of the G92 versus both G80-based cards makes itself felt in FFP mode: this GPU is faster because its triangle setup units run at a higher clock. Other possible reasons for this behavior include new architectural optimizations, larger caches, etc. In all other cases, when the main bottleneck is in the shader units, graphics cards perform strictly in line with their theoretical peak, and the G92 is slightly outperformed by the top G80.
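The two regimes described above can be illustrated with a toy model: setup-limited throughput is assumed to scale with the core clock, and shader-limited throughput with the number of stream processors times the shader clock. The clocks and processor counts below are nominal reference specifications that are not given in this section, so treat them as assumptions. The `Card` class and its fields are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    core_clock_mhz: float    # assumed to drive triangle setup
    shader_clock_mhz: float  # clock of the unified stream processors
    stream_processors: int

    def shader_throughput(self) -> float:
        """Relative shader capacity: processors times shader clock."""
        return self.stream_processors * self.shader_clock_mhz


# Assumptions: nominal reference specifications, not taken from the text.
GT = Card("GF8800GT", 600.0, 1500.0, 112)
GTX = Card("GF8800GTX", 575.0, 1350.0, 128)
GTS = Card("GF8800GTS", 500.0, 1200.0, 96)


def setup_ratio(a: Card, b: Card) -> float:
    """Expected speed of a relative to b when triangle setup is the bottleneck."""
    return a.core_clock_mhz / b.core_clock_mhz


def shader_ratio(a: Card, b: Card) -> float:
    """Expected speed of a relative to b when shader units are the bottleneck."""
    return a.shader_throughput() / b.shader_throughput()


if __name__ == "__main__":
    for other in (GTX, GTS):
        print(f"{GT.name} vs {other.name}: "
              f"setup-limited x{setup_ratio(GT, other):.2f}, "
              f"shader-limited x{shader_ratio(GT, other):.2f}")
```

With these assumed figures, the model puts the 8800 GT slightly ahead of the GTX when setup is the limit and slightly behind it when shaders are the limit, which matches the pattern in the diagrams.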
A brief conclusion on geometry tests: all GPUs perform well in these tests owing to their unified architecture, as they can use all unified stream processors to solve geometry tasks. In real applications, however, unified processors will be busy mostly with pixels. We proceed to such tests now.