Digit-Life Hardware News
09.05.2008
[23:14] Daily Mailbox
[16:28] OCZ Expands on Gaming DDR3 Lineup with Special Ops Urban Elite Edition
[01:03] Plextor Unveils 1TB StorX NAS Drives
[00:52] OCZ Introduces New Additions to the Reaper HPC Series
[00:31] Iomega Announces New Camo Model in eGo Portable Hard Drive Line
[00:16] AMD Server Workstation Roadmap Updated
07.05.2008
[15:06] Daily Mailbox
[14:54] Super Talent Launches MLC SATA-II SSDs for Notebooks
[14:45] NVIDIA Introduces Hybrid SLI
[14:34] JVC Develops 1.75-inch 8K4K D-ILA Device
Your link here

Home Home
Latest News | Platform | Coolers | HDD/DVD | Video | Sound | Network | Imaging | Mobile
Monthly | Rightmark Tools | Search | Forum | Mailing | Links | Advertise | About Us
Digit-Life Articles Feed    Digit-Life News Feed

Latest Articles:

i3DSpeed, April 2008

Biostar TA780G M2+ Motherboard on AMD 780G Chipset (Socket AM2+)

NVIDIA GeForce 9800 GTX Graphics Card

NVIDIA GeForce 9800 GX2 Graphics Card

MSI K9A2 CF Motherboard on AMD 790X Chipset (Socket AM2+)






Analysing NVIDIA G8x Performance in Modern Games

This article was initially intended to continue our analysis of shader units and to show how their number affects NVIDIA G8x performance in modern games. We planned to change parameters of a G80-based graphics card to make it resemble the mid-end G84-based products. Our plan was to use RivaTuner by Alexei Nikolaychuk to leave only 32 active unified processors in the GeForce 8800, and change GPU and memory clock rates.

GPU clock rate was to be reduced to the level of G84, and video memory clock rate had to be adjusted to scale memory bandwidth of the GeForce 8800 down to the level of the GeForce 8600 (memory clock must be three times as low, because the difference between memory bus bit-capacity is 384/128=3 times). Theoretically, such cards would have differed only in the number of ROPs: 24 versus 8, as well as in on-die caches and other optimizations.

Our objective was to determine how much the different number of ROPs affects performance, and how much shader units differ in G84 and G80. We wanted to analyze performance of modern games using NVIDIA PerfKit. Unfortunately, our plans never got to the first base - mandatory hardware performance counters of the G80 do not work, when some shader units are disabled. And such an analysis would have been useless without them. So we decided not to trash our test results and to use them in a brief comparative analysis of performance and some other parameters of modern 3D games.

One more idea was discarded in the process - stream_out_busy readings. This counter proved to be practically useless, even though most our tests are Direct3D 10 applications. The counter indicated that stream output units were used only in one game - Lost Planet: Extreme Condition. Moreover, these units were loaded only by less than 1%, so we decided to discard this counter as well.

Testbed configuration and settings

We used the following testbed configuration:

  • CPU: AMD Athlon 64 X2 4600+ Socket 939
  • Motherboard: Foxconn WinFast NF4SK8AA-8KRS (NVIDIA nForce4 SLI)
  • RAM: 2048 MB DDR SDRAM PC3200
  • Graphics cards: NVIDIA GeForce 8800 GTX 768MB and NVIDIA GeForce 8600 GT 256MB
  • HDD: Seagate Barracuda 7200.7 120 Gb SATA
  • Operating system: Microsoft Windows Vista Home Premium
  • Video driver: NVIDIA ForceWare 163.16 (instrumented)

We used only one video mode with the most popular resolution 1280x1024 (or 1280x960 for games that do not support the former), MSAA 4x and anisotropic filtering 16x. Both features were enabled from game options, nothing was changed in the control panel of the video driver.

Our bundle of game tests includes recent projects. We gave preference to games supporting Direct3D 10 or containing new interesting 3D techniques. Here is the full list: Call of Juarez DX10 benchmark, Company of Heroes, S.T.A.L.K.E.R.: Shadow of Chernobyl, Lost Planet: Extreme Condition DX10 benchmark, Colin McRae Rally: DiRT, PT Boats: Knights of the Sea DX10 benchmark, SEGA Rally Revo, Clive Barker's Jericho. Our tests included several games without built-in ways to run demos, so we had to test them tentatively. Additional software: NVIDIA PerfKit 5, Riva Tuner 2.05, and Microsoft PIX for Windows from DirectX SDK.

Unfortunately, our tests did not include such interesting applications as World in Conflict, BioShock, Medal of Honor: Airborne, Test Drive Unlimited, TimeShift, Call of Duty 4: Modern Warfare, Half-Life 2: Episode 2, etc. Some demos and games did not make it in time, a couple of projects failed to run under PIX debugger: BioShock and Test Drive Unlimited.

Test Results

Lost Planet: Extreme Condition

We start our analysis with one of the most technically advanced games launched in 2007. Lost Planet: Extreme Condition has come to PC from Xbox 360. But its high-tech features are confirmed by many changes in the engine for Direct3D 10 GPUs. Compared to its console version (which is a high-tech game as well), it has some new features: FP16 frame buffer, motion blur and depth of field of higher quality, fur, more samples and improved shadow map filtering, ambient occlusion, soft particles, advanced parallax mapping, etc.




Lost Planet was tested with a built-in demo consisting of two game levels. They differ much - there are not many objects in the first level, so its performance is limited mostly by a graphics card; the second level contains a lot of objects, so the performance is limited by both CPU and GPU.

NVIDIA PerfKit counter GeForce 8800 GTX (G80) GeForce 8600 GT (G84)
FPS (avg) 20.8 4.5
FPS (min) 11.0 2.1
Video memory, MB 507 505
batch count (avg) 738 799
batch count (max) 4375 4336
primitive count (avg) 497858 475061
primitive count (max) 2335540 2104705
setup triangle count (avg) 83318 82474
setup triangle count (max) 311494 292719
gpu_idle, % 0.1 0.1
rop_busy, % 7.9 5.2
texture_busy (avg), % 74.4 77.5
texture_busy (max), % 83.4 90.6
shader_busy (avg), % 84.6 88.3
shader_busy (max), % 89.0 98.0
geom_busy, % 0.2 1.2
input_assembler_busy, % 3.1 0.8

In our tests we used a DirectX 10 demo of the game with a built-in benchmark, which does not reflect real performance of the release with all patches applied. That's why we could evaluate the frame rate only relatively, rendering speed of the latest release is much higher. We can see that the G84 is heavily outperformed by the G80. Let's locate the bottleneck. First of all, the GeForce 8600 can be slowed down by by video memory size, which is half as much as the game uses. Secondly, judging by the results, rendering speed is affected by the number of shader units and TMUs - the difference in performance is proportional to the difference in units.

Let's have a look at interesting results. The average number of draw calls does not reflect the real picture, because it depends on levels: there are not many calls in the first scene, and in the second scene this number reaches over 4000, which is really much, even for state-of-the-art games. The amount of geometry in this game is above average, but I don't quite understand the difference between primitive count and setup triangle count. Both GPUs were constantly working, we can see that performance does not depend on a CPU. Geometry and raster units are not loaded much, while texture and shader units are working at full capacity. We haven't seen such active usage of shader units before. That's what I call good optimization - performance depends on a GPU only.




Latest News | Platform | Coolers | HDD/DVD | Video | Sound | Network | Imaging | Mobile
Monthly | Rightmark Tools | Search | Forum | Mailing | Links | Advertise | About Us

Copyright © by Digit-Life.com, 1997-2008. Produced by iXBT.com
Design by Explosion