ATI RADEON 3850/3870 (RV670)
320 Shader Processors and 256-bit Memory Bus
Part 1: Theory and architecture
We've recently reviewed the GeForce 8800 GT - an excellent Mid-End solution from NVIDIA. Now it's time to review a similar product from AMD. Remember how bad the previous Mid-End solutions from AMD and NVIDIA were (based on RV630 and G84)? Their performance suffered because too much was cut down compared to the top solutions: they had too few execution units (ALUs, TMUs, and ROPs). Their performance was also limited by the narrow 128-bit memory bus. The gap was especially stark for the RADEON HD 2600 XT, whose 128-bit bus was a quarter the width of the 512-bit bus in the top RADEON HD 2900 XT.
The AMD R6xx architecture was announced in May. Only the top R600-based solution was launched at that time. Mid- and Low-End cards based on the unified R6xx architecture were postponed till summer. It became clear that we wouldn't get good Mid-End solutions manufactured by the 80/90-nm fabrication process from either GPU maker. DirectX 10 support and the unified architecture set a floor on GPU complexity: even cheap GPUs had to include quite complex units, for example, thread processors. That left few transistors for execution units, so it was very difficult to design a fast Mid-End processor on the 80/90-nm fabrication process.
Now that NVIDIA has adopted the 65-nm fabrication process and AMD has moved on to 55 nm, truly powerful Mid-End GPUs are appearing. They are actually former top GPUs manufactured by a finer fabrication process, and they demonstrate appropriate performance. The only problem is the narrower memory bus. The main difference between RV670/G92 and R600/G80 is the fabrication process (55 nm and 65 nm respectively). It reduces the production costs of complex GPUs, which is important for inexpensive products. Moreover, unlike the R600, the new GPUs from AMD support DirectX 10.1 and PCI Express 2.0, and include an improved unit for hardware-assisted video playback and processing.
Before you read this article, you should read the baseline reviews - DX Current, DX Next, and Longhorn, which describe various aspects of modern graphics cards and architectural peculiarities of NVIDIA and AMD (ATI) products.
These articles predicted the current situation with GPU architectures, and many of their forecasts about future solutions came true. Detailed information on the unified architecture of AMD R6xx (using earlier GPUs as examples) can be found in the following articles:
So, the overhauled RV670 is based on the RADEON HD 2000 (R6xx) architecture. This GPU includes all the main features of this family, such as the unified shader architecture, full DirectX 10 support (extended to DX 10.1), high-quality anisotropic filtering, a new antialiasing algorithm with an increased number of samples, etc. On top of that, the RV670 adds the improvements mentioned above: DirectX 10.1, PCI Express 2.0, and better video decoding. This GPU is used in Mid-End solutions for 179-229 USD. Again, the key technological innovation is the 55-nm fabrication process, which helps reduce costs and bring these solutions down to this price segment.
One of the important improvements is UVD, which is not available (or it does not work as it should, which is almost the same) in the R600. The situation is exactly like with NVIDIA cards - low-end and mid-end GPUs offer better features for video decoding. We shall return to analyzing performance and quality of video decoding in the new solutions from AMD and NVIDIA in continuation of our older analysis.
RADEON HD 3850 and HD 3870
- Code name: RV670
- Fabrication process: 55 nm
- 666 million transistors
- Unified architecture with an array of common processors for streaming processing of vertices and pixels, as well as other data
- Hardware support for DirectX 10.1, including Shader Model 4.1, geometry generation, and stream output
- 256-bit memory bus: four 64-bit controllers connected with a ring bus
- Core clock: 670-775 MHz
- 320 scalar floating-point ALUs (integer and floating-point formats, support for FP32 precision in compliance with IEEE 754)
- 4 enlarged texture units supporting FP16 and FP32 components in textures
- 32 texture address units (read the details in the baseline article)
- 80 texture fetch units (read the details in the baseline article)
- 16 bilinear filtering units that can filter FP16 textures at full speed and support trilinear and anisotropic filtering for all texture formats
- Dynamic branching in pixel and vertex shaders
- 16 ROPs supporting antialiasing modes with programmable sample patterns (over 16 samples per pixel, including FP16 or FP32 frame buffer formats). Peak performance: up to 16 samples per cycle, 32 samples per cycle in Z only mode
- Up to 8 MRT (multiple render targets)
- Integrated support for two RAMDACs, two Dual Link DVIs, HDMI, HDTV
RADEON HD 3870 Specifications
- Core clock: 775 MHz
- Unified processors: 320
- 16 texture units, 16 blending units
- Effective memory frequency: 2250 MHz (2*1125 MHz)
- GDDR4 memory
- Memory size: 512 MB
- Memory bandwidth: 72 GB/sec.
- Maximum theoretical fill rate: 12.4 gigapixel per second.
- Theoretical texture sampling rate: 12.4 gigatexel per second.
- Two CrossFireX connectors
- PCI Express 2.0 x16
- 2 x DVI-I Dual Link, 2560x1600 video output
- TV-Out, HDTV-Out, HDCP support, HDMI adapter
- Power consumption: up to 105 W
- Recommended price: $219
RADEON HD 3850 Specifications
- Core clock: 670 MHz
- Unified processors: 320
- 16 texture units, 16 blending units
- Effective memory frequency: 1660 MHz (2*830 MHz)
- Memory type: GDDR3
- Memory size: 256 MB
- Memory bandwidth: 53 GB/sec.
- Maximum theoretical fill rate: 10.7 gigapixel per second.
- Theoretical texture sampling rate: 10.7 gigatexel per second.
- Two CrossFireX connectors
- PCI Express 2.0 x16
- 2 x DVI-I Dual Link, 2560x1600 video output
- TV-Out, HDTV-Out, HDCP support, HDMI adapter
- Power consumption: up to 95 W
- Recommended price: $179
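The bandwidth and fill rate figures in both lists follow directly from the bus width, the clock rates, and the unit counts. Here is a minimal Python sketch of that arithmetic. The function names are ours and purely illustrative, and we assume the usual convention that each ROP or texture unit delivers one pixel or texel per clock, and that 1 GB means 10^9 bytes.

```python
# Peak-rate arithmetic for the two cards, using only figures from the spec lists.
# Assumption: one pixel/texel per unit per clock; 1 GB = 10**9 bytes.

def memory_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Bytes per transfer times transfers per second, in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


def fill_rate_g_per_s(units: int, core_mhz: float) -> float:
    """Pixels (for ROPs) or texels (for texture units) per second, in billions."""
    return units * core_mhz * 1e6 / 1e9


BUS_WIDTH_BITS = 256   # four 64-bit controllers
UNITS = 16             # 16 texture units and 16 blending units on both cards

cards = {
    "HD 3870": {"core_mhz": 775, "mem_effective_mhz": 2250},
    "HD 3850": {"core_mhz": 670, "mem_effective_mhz": 1660},
}

for name, card in cards.items():
    bandwidth = memory_bandwidth_gb_s(BUS_WIDTH_BITS, card["mem_effective_mhz"])
    fill = fill_rate_g_per_s(UNITS, card["core_mhz"])
    print(f"{name}: {bandwidth:.1f} GB/s, {fill:.1f} gigapixels (gigatexels) per second")
```

After rounding, the results agree with the values quoted in the lists: 72 GB/sec and 12.4 gigapixels per second for the HD 3870, 53 GB/sec and 10.7 gigapixels per second for the HD 3850. These are theoretical peaks, not measured performance.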
As you can see, AMD decided to change the traditional designation of ATI RADEON cards starting from the RADEON HD 3870 and 3850. For example, in the RADEON HD 3870, the first numeral (3) denotes a generation of graphics cards, probably starting from X1000 (X1800, X1600, etc), the second numeral (8) - a family of cards, the third and fourth numerals (70 in this case) - a given model of a graphics card within the specified generation and family. As usual, a greater numeral suggests higher performance. AMD gives the following translation of the new titles into the old designation scheme: 50 means PRO, 70 stands for XT.
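To make the new naming scheme concrete, here is a small Python sketch that splits a four-digit model number into the parts described above. The function and dictionary names are hypothetical, and the suffix table contains only the two translations AMD has given (50 and 70). Any other ending is reported as unknown rather than guessed.

```python
# Illustrative decoder for the new RADEON HD model numbers (e.g. "3870").
# Only the two suffix translations quoted in the article are included.
SUFFIX_TO_OLD_NAME = {"50": "PRO", "70": "XT"}


def decode_model(model: str) -> dict:
    """Split a four-digit model number into generation, family and variant."""
    if len(model) != 4 or not (model.isascii() and model.isdigit()):
        raise ValueError(f"expected a four-digit model number, got {model!r}")
    variant = model[2:]
    return {
        "generation": int(model[0]),   # first numeral
        "family": int(model[1]),       # second numeral
        "variant": variant,            # third and fourth numerals
        "old_style_suffix": SUFFIX_TO_OLD_NAME.get(variant),  # None if unknown
    }


print(decode_model("3870"))
print(decode_model("3850"))
```

For "3870" this yields generation 3, family 8, variant "70", which corresponds to XT in the old scheme. "3850" maps to PRO in the same way.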
We should publish our traditional digression about the amount of video memory required by modern games. Our recent analysis shows that many modern games have very high requirements for video memory size: they use up to 500-600 MB. It does not mean that all game resources must be stored in local video memory. Resource management may often be left to the API, especially as Direct3D 10 uses video memory virtualization. Nevertheless, modern 3D applications tend to increase their requirements for local video memory size. So 256 MB is the minimum size, and the optimal solution is presently 512 MB. You must take this into account when you choose between the two products of the HD 3800 family. They do not differ that much in price.
So, AMD again enters the market of Mid-End graphics cards as a technological leader: its new GPUs are manufactured by the 55-nm fabrication process. Such transitions matter, because finer process technologies allow smaller cores or more transistors on the same die area, increase the frequency potential of GPUs and the yield of working GPUs at high clock rates, and reduce production costs. If we compare the die areas of the R600 (80 nm) and RV670 (55 nm), which have a similar number of transistors (700 and 666 million respectively), the difference is more than twofold: 408 and 192 square mm respectively!
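The die-area comparison can be checked with a few lines of Python. The sketch below uses only the transistor counts and areas quoted above. The "ideal shrink" figure is our own simplifying assumption that die area scales with the square of the process node; real designs never scale that cleanly, so treat it as a rough reference point only.

```python
# Transistor density of R600 vs RV670, from the figures quoted in the article.
r600 = {"process_nm": 80, "transistors_m": 700, "area_mm2": 408}
rv670 = {"process_nm": 55, "transistors_m": 666, "area_mm2": 192}


def density_m_per_mm2(chip: dict) -> float:
    """Millions of transistors per square millimetre."""
    return chip["transistors_m"] / chip["area_mm2"]


area_ratio = r600["area_mm2"] / rv670["area_mm2"]
density_ratio = density_m_per_mm2(rv670) / density_m_per_mm2(r600)

# Assumption (not from the article): area scales with the square of the node size.
ideal_shrink = (r600["process_nm"] / rv670["process_nm"]) ** 2

print(f"R600 density:  {density_m_per_mm2(r600):.2f} M/mm^2")
print(f"RV670 density: {density_m_per_mm2(rv670):.2f} M/mm^2")
print(f"Area ratio: {area_ratio:.2f}x, density ratio: {density_ratio:.2f}x")
print(f"Ideal 80 nm -> 55 nm area shrink: {ideal_shrink:.2f}x")
```

The area ratio comes out slightly above 2.1, close to what the simple square-law estimate suggests for a move from 80 nm to 55 nm, while transistor density roughly doubles.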
The higher transistor density improved the so-called energy efficiency: the RV670 consumes half as much power as the R600 while demonstrating similar performance! Unfortunately, the advantage of the finer fabrication process is weaker here than in the competing product from NVIDIA. The higher-end of AMD's new Mid-End cards consumes about as much power and dissipates about as much heat as the competing graphics card based on the G92, which is manufactured by the 65-nm fabrication process and has a larger die.