ATI RADEON 3850/3870 (RV670)
320 Shader Processors and 256-bit Memory Bus
Direct3D 9: Pixel Shaders Tests
The first group of pixel shaders reviewed here is too simple for modern GPUs. It consists of pixel programs of relatively low complexity, written for shader versions 1.1, 1.4, and 2.0.

These tests are too easy for modern architectures and fail to reveal their true capacity. In such simple tests, performance is limited by texture lookups and fill rate, which shows in the weak results of all AMD cards compared to the GeForce 8800 GT. AMD should not have limited the number of TMUs to 16. However, results become more interesting in the more complex PS 2.0 tests. For example, the GeForce 8800 GT is outperformed by all cards based on the RV670 and the R600 in the most complex test (Phong with three light sources).
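To illustrate why the TMU count matters in texture-bound tests, here is a minimal sketch of the theoretical peak texel rate (TMUs multiplied by core clock). The unit counts and clock rates are assumed reference specifications, not values measured in our tests, and the function name is ours.

```python
# Peak texel rate sketch. The unit counts and clocks below are ASSUMED
# reference specifications; they are not taken from the benchmark results.

def texel_rate_gtexels(tmus: int, core_mhz: float) -> float:
    """Theoretical peak texel rate in gigatexels per second.

    Assumes each TMU delivers one filtered texel per clock cycle.
    """
    return tmus * core_mhz / 1000.0

# (TMU count, core clock in MHz), assumed values
ASSUMED_SPECS = {
    "RADEON HD 3870": (16, 775.0),
    "GeForce 8800 GT": (56, 600.0),
}

def texel_rates(specs):
    """Map each card name to its theoretical texel rate."""
    return {name: texel_rate_gtexels(t, c) for name, (t, c) in specs.items()}

if __name__ == "__main__":
    for name, rate in texel_rates(ASSUMED_SPECS).items():
        print(f"{name}: {rate:.1f} GTexels/s")
```

Under these assumptions the NVIDIA card has more than twice the peak texel rate, which is consistent with its lead in the simplest tests.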
Results of the HD 3850 and the HD 3870 agree well with those of the HD 2900 XT (adjusted for frequencies). Both new graphics cards are naturally much faster than the RADEON HD 2600 XT. The older mid-range solution is heavily outperformed in all tests, scoring less than half as much. This can be explained by the higher fill and texel rates as well as the increased number of unified shader units in the RV670. Let's have a look at results in more complex pixel programs of intermediate versions:

The procedural water test depends heavily on texturing speed. It uses dependent texture lookups of high nesting depth, so all AMD cards are much slower than the sole NVIDIA card. The GeForce 8800 GT is far ahead: it is more than twice as fast as both cards based on the RV670. The HD 3850 and the HD 3870 perform on a par with the HD 2900 XT, and the HD 2600 XT comes last, as usual. Previous mid-range solutions were too weak.
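A dependent texture lookup is one whose coordinates are computed from the result of a previous lookup, so the fetches cannot be issued in parallel. The following is a rough CPU-side sketch of such a chain; it is not the shader used in the water test, and the helper names and the tiny texture are hypothetical.

```python
# Illustrative sketch of dependent texture lookups (NOT the actual test shader).
# A "texture" here is a 2D list of (du, dv) offset pairs.

def sample(texture, u, v):
    """Nearest-neighbour lookup with wrap addressing."""
    height = len(texture)
    width = len(texture[0])
    x = int(u * width) % width
    y = int(v * height) % height
    return texture[y][x]

def dependent_chain(texture, u, v, depth):
    """Perform `depth` lookups, each using coordinates from the previous one."""
    for _ in range(depth):
        du, dv = sample(texture, u, v)  # must finish before the next fetch
        u, v = u + du, v + dv           # next coordinates depend on this result
    return u, v

if __name__ == "__main__":
    # Hypothetical 2x2 offset texture
    tex = [[(0.5, 0.0), (0.25, 0.25)],
           [(0.0, 0.0), (0.0, 0.0)]]
    print(dependent_chain(tex, 0.0, 0.0, depth=2))
```

The deeper the nesting, the more the shader waits on texture units rather than on ALUs, which is why such a test favors the card with faster texturing.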
In the second test, which is arithmetic-intensive, the new cards from AMD shoot forward, and the GeForce 8800 GT is slightly outperformed. This task suits the AMD architecture with its larger number of unified processors. The relation between results of the HD 3850, the HD 3870, and the HD 2900 XT is determined by the differences in GPU clock rates and matches theoretical values.
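The comparison against theoretical values can be sketched as follows: for cards with the same number of shader units, an ALU-bound result should scale with the core clock. The clock rates are assumed nominal values and the scores are hypothetical placeholders, not our measurements.

```python
# Clock-scaling sketch. Clocks are ASSUMED nominal values; the scores used
# in the example are HYPOTHETICAL and only illustrate the calculation.

def expected_ratio(clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Expected score ratio A/B for cards with identical unit counts."""
    return clock_a_mhz / clock_b_mhz

def scaling_efficiency(score_a: float, score_b: float,
                       clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Measured ratio divided by expected ratio; 1.0 means perfect scaling."""
    return (score_a / score_b) / expected_ratio(clock_a_mhz, clock_b_mhz)

if __name__ == "__main__":
    hd3870_clock, hd3850_clock = 775.0, 670.0   # assumed core clocks, MHz
    hd3870_score, hd3850_score = 310.0, 268.0   # hypothetical scores
    print(f"expected ratio: {expected_ratio(hd3870_clock, hd3850_clock):.3f}")
    print("efficiency:",
          round(scaling_efficiency(hd3870_score, hd3850_score,
                                   hd3870_clock, hd3850_clock), 3))
```

An efficiency close to 1.0 means the test is limited by shader throughput; a noticeably lower value points to another bottleneck, such as texturing or memory.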
Direct3D 9: New Pixel Shaders Tests
These tests of DirectX 9 pixel shaders are even more complex. They are divided into two categories. We'll start with the easier shaders, SM 2.0:
- Parallax Mapping - a texturing method used in many games, which is described in detail in our article Modern 3D Graphics Terms
- Frozen Glass - a complex procedural texture that visualizes frozen glass with adjustable parameters
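For readers unfamiliar with parallax mapping, here is a minimal sketch of the texture coordinate shift at its core. It uses the common offset-limiting formulation, which is our assumption; the shader in the test may differ, and the scale and bias defaults are arbitrary illustrative values.

```python
import math

# Parallax mapping sketch (offset-limiting variant, ASSUMED; the test
# shader may use a different formulation). scale/bias are illustrative.

def parallax_offset(uv, view_ts, height, scale=0.04, bias=-0.02):
    """Shift texture coordinates along the tangent-space view direction.

    uv      -- (u, v) original texture coordinates
    view_ts -- (x, y, z) view vector in tangent space, any non-zero length
    height  -- height map sample in [0, 1]
    """
    vx, vy, vz = view_ts
    length = math.sqrt(vx * vx + vy * vy + vz * vz)
    vx, vy = vx / length, vy / length      # normalized x and y components
    h = height * scale + bias              # remap height to a signed offset
    return (uv[0] + vx * h, uv[1] + vy * h)

if __name__ == "__main__":
    # Looking straight down at the surface: no shift
    print(parallax_offset((0.5, 0.5), (0.0, 0.0, 1.0), 1.0))
    # Grazing view along +x: maximum shift for this height
    print(parallax_offset((0.5, 0.5), (1.0, 0.0, 0.0), 1.0))
```

The shifted coordinates are then used for the color and normal map lookups, which is why the technique costs both arithmetic and extra texture fetches.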
There are two modifications of these shaders: arithmetic-intensive and texture sampling intensive. Let's analyze the arithmetic-intensive modifications first, as they are more promising from the point of view of future applications:

The clear leader of the Frozen Glass test is the NVIDIA card. The GeForce 8800 GT is twice as fast as its competitors (the HD 2900 XT and the HD 3870), which indicates that performance is limited primarily by the texel rate.
AMD cards lead in the second test, Parallax Mapping. They line up exactly as they theoretically should, but the GeForce 8800 GT trails only slightly, which is notable because NVIDIA has always been much weaker in this test. There are no unexpected results here: the RV670 performs just like the R600 in these tests. Let's analyze results obtained in the texture sampling intensive tests, where the GeForce 8800 GT should perform even better:

Indeed, the situation has changed. Performance is even more limited by the speed of texture units, so the GeForce 8800 GT is always faster. The RADEON HD 2900 XT, the HD 3850, and the HD 3870 demonstrate close results: they are noticeably slower than the GeForce 8800 GT and faster than the HD 2600 XT by about the same margin. However, arithmetic-intensive shaders run faster on all graphics cards. Texturing-intensive shaders make little sense for modern GPU architectures, as new products from AMD and NVIDIA favor arithmetic operations over texturing. Perhaps we'll drop texturing-intensive tests from our synthetic part in the future.
Let's have a look at results of two more pixel shader tests, this time SM 3.0. They are the most complex of all our tests for Direct3D 9 pixel shaders. The tests load ALUs and texture units heavily. Both shader programs are complex, long, and include a lot of branches:
- Steep Parallax Mapping is a much heavier modification of parallax mapping, which is also described in the article Modern 3D Graphics Terms
- Fur - a procedural shader that visualizes fur

These tests generate a heavy load even for the most powerful graphics cards, although the HD 2600 XT is not even twice as slow. The most interesting thing here is that our new RV670-based cards are outperformed by the GeForce 8800 GT and even the RADEON HD 2900 XT in both tests! This may indicate problems with drivers or the effect of memory bandwidth. The GeForce 8800 GT, which has lower memory bandwidth, is faster than the RADEON HD 3870, although the R600 architecture seemed to execute complex PS 3.0 shaders with many branches more efficiently than the G8x/G9x.
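The memory bandwidth comparison can be checked with a short calculation: peak bandwidth is the bus width in bytes multiplied by the effective memory data rate. The memory data rates below are assumed reference specifications rather than figures from our tests, and the function name is ours.

```python
# Peak memory bandwidth sketch. The effective data rates are ASSUMED
# reference specifications; they are not measured values.

def memory_bandwidth_gbs(bus_width_bits: int, effective_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).

    bus_width_bits -- memory bus width in bits
    effective_mts  -- effective data rate in megatransfers per second
    """
    return bus_width_bits / 8 * effective_mts / 1000.0

# (bus width in bits, effective data rate in MT/s), assumed values
ASSUMED_MEMORY = {
    "RADEON HD 3870": (256, 2250.0),
    "GeForce 8800 GT": (256, 1800.0),
}

if __name__ == "__main__":
    for name, (bits, rate) in ASSUMED_MEMORY.items():
        print(f"{name}: {memory_bandwidth_gbs(bits, rate):.1f} GB/s")
```

Under these assumptions the HD 3870 has the higher peak bandwidth of the two, so bandwidth alone would not explain why it trails the GeForce 8800 GT here.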