General

It's no secret that a new GPU from NVIDIA, known as the NV20, is due to be announced officially on February 27 at the Intel Developer Forum (IDF). Everyone wants to know what's new in this graphics processor, so we are going to lift the veil of secrecy. Note that we won't give any performance results, since this engineering sample is still rather raw and the drivers are not finished. However, we can already describe the NV20's functional features in full. First comes the NV20 specification (we list only the most interesting parameters, so as not to repeat everything the GeForce2 already supports).

NV20 specification
Even a fleeting glance shows that the NV20 has a completely new architecture that provides full hardware support for DirectX 8.0. I would even say that DX8 support arrives just in time, and this fact will probably be the basis for promoting the NV20 and the cards built on it. NVIDIA did not go for a direct increase in raw power (e.g. by adding rendering pipelines). Instead, the engineers did their best to build a new architecture with full hardware DX8 support, but without a considerable increase in the chip's transistor count. Add to this the 0.15 micron process, which makes it possible to fit around 60 million transistors on the chip without making the die gigantic. A quick transition to the NV20 is assured, since NVIDIA has very good relations with card vendors and reliable channels for supplying memory together with the GPU. The reference design of NV20 cards differs little from the GeForce2 PCB: the memory chips are moved a few mm closer to the graphics core, and the power regulators have been repositioned slightly. The DDR memory is available in volume quantities, so there is nothing to worry about. The NV20's core clock and its numbers of pixel pipelines and texture units match those of the GeForce2 series chips. This means that the peak pixel fillrate of NV20-based cards will equal that of GeForce2-based cards, but that's not a tragedy. In fact, the problem of every GPU since the GeForce 256 lies not in the peak fillrate but in the effective (real) fillrate, which is directly tied to local memory bandwidth. It is local memory that turns out to be the bottleneck of all modern graphics accelerators. So I think NVIDIA was right to increase the efficiency of the architecture instead of simply adding raw power.
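A back-of-the-envelope Python sketch shows why the memory bus, rather than the number of pipelines, sets the real fillrate. All numbers are assumptions for illustration: the 250 MHz core, 4 pipelines and 6.4 GB/s of bandwidth are the XBox figures discussed later in the article, and the 12 bytes of traffic per pixel assume 32-bit color plus a 32-bit Z-buffer with one Z read and one Z write, ignoring textures and compression. The function names are hypothetical.

```python
def peak_fillrate_mpix(core_mhz: float, pipelines: int) -> float:
    """Peak pixel fillrate in Mpixels/s: one pixel per pipeline per clock."""
    return core_mhz * pipelines


def memory_limited_fillrate_mpix(bandwidth_gb_s: float, bytes_per_pixel: int) -> float:
    """Pixel rate the memory bus can sustain, in Mpixels/s."""
    # GB/s -> MB/s, then MB/s divided by bytes per pixel gives Mpixels/s.
    return bandwidth_gb_s * 1000 / bytes_per_pixel


# Assumed traffic per pixel: 4-byte color write + 4-byte Z read + 4-byte Z write,
# with no texture fetches and no compression.
BYTES_PER_PIXEL = 4 + 4 + 4

peak = peak_fillrate_mpix(250, 4)                                    # 1000 Mpix/s
memory_limited = memory_limited_fillrate_mpix(6.4, BYTES_PER_PIXEL)  # about 533 Mpix/s
effective = min(peak, memory_limited)
print(f"peak {peak} Mpix/s, memory-limited {memory_limited:.0f} Mpix/s, "
      f"effective {effective:.0f} Mpix/s")
```

Texture fetches only add to the per-pixel traffic, so under these assumptions the real gap between peak and effective fillrate is even wider.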
Another interesting feature of the NV20 is support for 4 textures per pixel, which will let programmable shaders show their potential and lift 3D realism to a completely new level. But since the chip is limited by memory bandwidth anyway, its developers didn't spend transistors on 4 texture units per pixel pipeline. Instead, they made it possible to keep the results of the two texture units and combine them with the results obtained on the next clock. This means that with 3 or 4 textures the peak performance drops by half. In real applications, however, where everything is limited by memory bandwidth, the penalty won't be as severe. As for video memory, there is nothing new. Final drivers for the NV20 will probably reduce the undesirable effects of the limited memory bus bandwidth by enabling special schemes, e.g. deferred texturing, a hierarchical Z-buffer, etc. All that remains is to get final samples of NV20-based cards and release drivers and check it in practice.

Let's digress a bit to recall the XBox. The official XBox specs sound amazing: a pixel fillrate of 4000 million pixels/s! The clock of the NV2A graphics core used in the XBox will be around 250 MHz. Even with 8 pixel pipelines we wouldn't get this figure (250*8 = 2000 million pixels/s). And 8 pipelines are unlikely anyway: the XBox memory bandwidth, around 6.4 GBytes/s in the current specification, wouldn't be sufficient for them, so there are again 4. Moreover, the NV2A is known to have 2 texture units per pixel pipeline. So where does the figure come from? In fact, it refers to the 4x multisample mode. There, 4 sample positions share one computed color value, so the fillrate is 4 times that of cards without hardware multisampling support. This is what Microsoft and NVIDIA are doing when advertising their new products.
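Both the texture loopback penalty and the XBox arithmetic can be checked in a few lines of Python. This is a simplified model that ignores memory limits; it assumes 4 pipelines with 2 texture units each (the GeForce2-style configuration the article attributes to the NV20), while the 250 MHz clock and 4x multisampling are the XBox figures quoted above. Function names are hypothetical.

```python
import math


def pixels_per_clock(pipelines: int, texture_units_per_pipe: int, textures: int) -> float:
    """Peak pixels per clock when extra textures are applied in additional
    clocks (loopback) instead of by extra texture units."""
    passes = max(1, math.ceil(textures / texture_units_per_pipe))
    return pipelines / passes


def sample_fillrate(core_mhz: float, pipelines: int, samples_per_pixel: int) -> float:
    """Millions of samples per second: each pixel writes `samples_per_pixel`
    positions that share one computed color."""
    return core_mhz * pipelines * samples_per_pixel


for n in range(1, 5):
    print(n, "texture(s):", pixels_per_clock(4, 2, n), "pixels/clock")
# 4.0, 4.0, 2.0, 2.0 pixels/clock for 1..4 textures

print(250 * 8)                     # 2000: even 8 pipelines fall short of 4000
print(sample_fillrate(250, 4, 1))  # 1000: plain pixel fillrate
print(sample_fillrate(250, 4, 4))  # 4000: with 4x multisampling
```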
Without new and/or expensive memory technologies, the effective fillrate can be increased only with modified primitive rendering algorithms that reduce bandwidth requirements. Future games will obviously rely more on T&L and hardware hidden surface removal than on their own efficient algorithms. Besides, the increased complexity of models will increase overdraw. The NV20 implements simple hardware support for HSR (an early Z check). Its efficiency depends greatly on the scene sent to the accelerator. If the primitives are sorted front to back (in order of increasing distance from the viewer), far fewer hidden pixels have to be computed, and the effective fillrate can rise several times. The NV20 also has Z-buffer compression. According to our information, the algorithm differs somewhat from a hierarchical Z-buffer: the compressed buffer is divided into tiles, each of which is read, written, processed and cached on the chip as a whole. How effective these features are in real games can only be checked once final NV20 cards and release drivers are available.

Back to the XBox: it is also doubtful that Microsoft would opt for the exotic, and therefore expensive, memory that 8 pipelines would need. So where does the 4000 figure come from? Another explanation is the effective fillrate: the NV2A (and probably the NV20) will have one or more algorithms that optimize rendering by not drawing invisible pixels. This is likely to be some combination of a hierarchical Z-buffer and one more hidden surface removal method. There are good grounds to expect special modified algorithms that reduce video memory bandwidth requirements.
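The effect of submission order on early Z rejection can be shown with a single-pixel model in Python. This is a simplified sketch, not NVIDIA's implementation: it assumes a "less than" depth test, a Z-buffer cleared to the far plane, and counts a fragment as shaded unless the early check rejects it. The function name is hypothetical.

```python
def shaded_fragments(depths, early_z=True):
    """Count fragments that reach texturing/shading at one screen pixel.

    `depths` lists the depth of every primitive covering the pixel, in
    submission order (smaller = closer to the viewer). With early Z, a
    fragment that is not closer than the stored depth is rejected first.
    """
    z_buffer = float("inf")  # cleared to the far plane
    shaded = 0
    for z in depths:
        if early_z and z >= z_buffer:
            continue  # hidden: rejected before any texturing/shading
        shaded += 1
        if z < z_buffer:  # the ordinary depth test still updates the buffer
            z_buffer = z
    return shaded


layers = [0.2, 0.4, 0.6, 0.8]                               # overdraw of 4 here
print(shaded_fragments(sorted(layers)))                     # front to back: 1
print(shaded_fragments(sorted(layers, reverse=True)))       # back to front: 4
print(shaded_fragments(sorted(layers), early_z=False))      # no early Z: 4
```

Front-to-back order lets the early check discard every hidden layer, while back-to-front order gains nothing, which is why the benefit depends so much on the scene the application sends.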
Now let's show where the claimed 4000 million pixels/s for the XBox could come from. The overdraw factor of many modern games exceeds 2-4. For example, during movement it reaches 3 in Quake3 and 6 (!) in Unreal. But remember that everything depends on how efficiently hidden primitives are removed. The technique used in the NV2A is unlikely to discard 100% of hidden pixels. So, with 4 pipelines and 60% efficiency, the claimed figure corresponds to an overdraw of around 6. This value is the most plausible, and it served as the basis for calculating the XBox specs. Clearly, for a fixed overdraw, the number of pipelines needed depends directly on the efficiency of the hidden pixel removal algorithm! A good algorithm needs only 4 pixel pipelines. Nevertheless, remember that 4000 million pixels/s is a relative figure that can be reached only in theory. Meanwhile, whether these tricks are implemented in the NV20, and how effective they are, remains an open question.

Now for the hardware T&L unit. Its performance has increased considerably compared to the GeForce2, and the NV20 undoubtedly uses an efficient vertex cache scheme, as well as a mechanism for transforming and processing vertex and primitive lists stored in local memory without accessing system memory at all. These measures minimize (and in the second case eliminate entirely) the limitations imposed by AGP bus bandwidth. The more powerful T&L unit will give almost twice the performance in games.
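The overdraw arithmetic above and the vertex cache idea can both be sketched in Python. Both parts rest on assumptions: the linear model (effective fillrate = raw fillrate × overdraw × HSR efficiency) is only our reading of how 4 pipelines at 250 MHz, 60% efficiency and an overdraw of about 6 combine into 4000 Mpix/s, and the FIFO post-transform cache is a generic design, not a documented NV20 detail; the cache sizes used are hypothetical. All function names are illustrative.

```python
from collections import deque
from typing import Iterable


def effective_fillrate(core_mhz: float, pipelines: int,
                       overdraw: float, hsr_efficiency: float) -> float:
    """Effective Mpix/s under an assumed linear model: hidden surface removal
    recovers the fraction `hsr_efficiency` of the ideal gain, which is the
    overdraw factor itself."""
    return core_mhz * pipelines * overdraw * hsr_efficiency


def required_overdraw(claimed_mpix: float, core_mhz: float,
                      pipelines: int, hsr_efficiency: float) -> float:
    """Overdraw at which the same linear model reaches a claimed fillrate."""
    return claimed_mpix / (core_mhz * pipelines * hsr_efficiency)


def transforms_with_fifo_cache(indices: Iterable[int], cache_size: int) -> int:
    """Count T&L transforms for an indexed triangle list with a FIFO
    post-transform vertex cache: a vertex is transformed only on a miss."""
    cache = deque(maxlen=cache_size)  # the oldest entry falls out when full
    transforms = 0
    for index in indices:
        if index not in cache:
            transforms += 1
            cache.append(index)
    return transforms


print(round(required_overdraw(4000, 250, 4, 0.6), 2))  # 6.67
print(round(effective_fillrate(250, 4, 6, 0.6)))        # 3600
two_triangles = [0, 1, 2, 2, 1, 3]                      # a quad split along one edge
print(transforms_with_fifo_cache(two_triangles, 16))    # 4 transforms
print(transforms_with_fifo_cache(two_triangles, 0))     # 6 transforms (no cache)
```

Under this model, raising the HSR efficiency lowers the overdraw, or the pipeline count, needed to reach the same headline number, which is the dependence the paragraph points out.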
This will be especially noticeable in games written for the DirectX 8.0 API, which will offer high detail and shader-based effects. The same applies to games being prepared for the XBox launch: they will run well on the NV20 and on the following DirectX 8.0 accelerators of the "X generation". Old games will run at practically the same speed as on the GeForce2 family. Some frame rate increase will be seen in all games that can use hardware T&L, e.g. Quake3, but only at low resolutions. NVIDIA is counting on new effects and capabilities to make people buy new NV20-based cards, and the XBox will help with that. It will prompt games of a new visual level to appear, and PC users will reach for them by buying accelerators that are practically identical to the XBox (as far as games are concerned). It's a real chance to sell not only a huge number of NV2A chips but also new, expensive NV20-based cards.

For enthusiasts

Now for the internal capabilities of the NV20 (and therefore of the NV2A). The NV20's new capabilities are tightly tied to DX8. Here is a comparison table:
If you haven't stopped reading

So, the comments:
So, what these operations do:
BLENDDIFFUSEALPHA, BLENDTEXTUREALPHA, BLENDFACTORALPHA, BLENDCURRENTALPHA - linear blending of the arguments using one of 4 possible Alpha values: the value from the previous stage (CURRENT), the value from the TEXTURE, the constant value FACTOR, or the value DIFFUSE taken from the vertices and interpolated across the triangle surface. Correspondingly, Out=In1*Alpha+In2*(1-Alpha).
MODULATEINVCOLOR_ADDALPHA - the same as the corresponding previous operation, but with an inverted argument (1 minus the value) in place of the original value.
Many effects using different numbers of textures can be programmed with this mechanism. But the fact that different accelerators support different sets of operations greatly discredits this way of controlling effects (the NV20 can do everything, the Radeon is a good boy, and the GeForce2 lags far behind them both). In any case, pixel shaders are a more flexible and convenient tool. We continue with comments on the table:
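The texture stage blends described above reduce to a per-channel linear interpolation. Below is a minimal Python sketch of the formulas as Direct3D 8 defines these texture operations; the function names, the dictionary of alpha sources and the clamping are assumptions for illustration, not a driver or hardware interface.

```python
from typing import Dict, Tuple

Color = Tuple[float, float, float, float]  # (r, g, b, a), components in [0, 1]


def blend_alpha(op: str, arg1: Color, arg2: Color,
                sources: Dict[str, Color]) -> Color:
    """Sketch of the four BLEND...ALPHA texture stage operations.

    The word between "BLEND" and "ALPHA" in `op` (DIFFUSE, TEXTURE, FACTOR or
    CURRENT) names the input whose alpha drives the blend:
        Out = Arg1 * Alpha + Arg2 * (1 - Alpha)
    For brevity all four channels are blended here; in Direct3D the alpha
    channel is computed by a separate alpha operation.
    """
    source = op[len("BLEND"):-len("ALPHA")]
    alpha = sources[source][3]
    return tuple(a1 * alpha + a2 * (1.0 - alpha) for a1, a2 in zip(arg1, arg2))


def modulate_invcolor_add_alpha(arg1: Color, arg2: Color) -> Tuple[float, float, float]:
    """RGB = (1 - Arg1.rgb) * Arg2.rgb + Arg1.a, clamped to 1."""
    return tuple(min(1.0, (1.0 - c1) * c2 + arg1[3])
                 for c1, c2 in zip(arg1[:3], arg2[:3]))


texture = (1.0, 0.0, 0.0, 0.25)   # red texel, 25% alpha
current = (0.0, 0.0, 1.0, 1.0)    # blue result of the previous stage
print(blend_alpha("BLENDTEXTUREALPHA", texture, current,
                  {"TEXTURE": texture, "CURRENT": current}))
# (0.25, 0.0, 0.75, 0.8125)
```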
Comments:
OK, let's return to the main table:
Here you can see a black box: the version 1.1 vertex shader of the NV20, which the Radeon and GeForce2 lack. In fact, the Radeon and GeForce2 do have a vertex ALU that can interpret shaders, though these are not fully compatible with the final standard. You could call them version 0.5 shaders. By the way, on the Radeon they can be enabled with a special registry key, but then most of the DX8 SDK samples will fail, and only some shaders will execute as they should, because the compatibility is partial. Be that as it may, Microsoft didn't introduce the concept of "version 0.5" shaders, and we will discuss shaders only for the NV20. So, shaders work with constants (the next line of the table, Max Vertex Shader Const, gives their number); the NV20 has 96 of them, plus 16 input and 8 temporary registers. While a shader program runs (its length is limited to 128 operations on the NV20, but the limit differs on other chips), it processes these data and produces 4 sets of texture coordinates, two color values for the vertex and the resulting vertex coordinates. A pixel shader or a user-chosen configuration of texture stages then works with these data while the primitive is rendered:
I'm not going to discuss the performance of this unit. I just want to note that it's very easy to build a parallel unit that processes several vertices simultaneously while executing, in effect, a single shader program. I think that is how it works here. By the way, a shader has many constants available (96 for the NV20). Nothing prevents us from writing a shader that performs matrix blending with an arbitrary degree of flexibility, e.g. using 96/4 = 24 matrices. But remember the limit on the number of shader operations. Taking this into account, we get ~20 matrices per vertex. Though, in the case of one skip, it's useless. Again, back to the main table:
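The matrix blending mentioned above is usually called matrix palette skinning. Here is a minimal Python sketch of the per-vertex math such a vertex shader would perform, assuming each 4x4 matrix occupies 4 constant registers (one float4 per row); the names and the optional `reserved` parameter are hypothetical, and the 128-operation limit is not modeled.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # 4x4, row-major, applied as M @ [x, y, z, 1]


def transform(m: Matrix, p: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by the homogeneous point (x, y, z, 1)."""
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_vertex(palette: List[Matrix], position: Sequence[float],
                bones: Sequence[Tuple[int, float]]) -> List[float]:
    """Blend a vertex with several palette matrices: sum of w_i * (M_i @ p).

    `bones` holds (palette index, weight) pairs; weights should sum to 1.
    """
    out = [0.0, 0.0, 0.0, 0.0]
    for index, weight in bones:
        t = transform(palette[index], position)
        out = [o + weight * c for o, c in zip(out, t)]
    return out


def palette_capacity(constant_registers: int = 96, regs_per_matrix: int = 4,
                     reserved: int = 0) -> int:
    """How many 4x4 matrices fit into the free constant registers."""
    return (constant_registers - reserved) // regs_per_matrix


def translation(tx: float, ty: float, tz: float) -> Matrix:
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


palette = [translation(0, 0, 0), translation(2, 0, 0)]
print(palette_capacity())                                     # 24
print(skin_vertex(palette, (1, 1, 1), [(0, 0.5), (1, 0.5)]))  # [2.0, 1.0, 1.0, 1.0]
```

A vertex weighted half-and-half between an unmoved bone and one shifted by 2 units along x ends up shifted by 1 unit, as expected from linear blending.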
The place of pixel shaders in the overall shading pipeline:
Simple but tasteful. A shader is evaluated at every shaded point of a triangle, so it must execute as quickly as possible. It has access to 8 constants, two colors (interpolated across the surface of the primitive) and the texture stages. There are two temporary registers. The task is to compute the resulting color of the pixel. A large number of operations is available, similar to the texture stage operations described above but more flexible. The operations can proceed along several paths simultaneously. Their maximum number equals the number of texture stages (only 8 for the NV20). A pixel shader is really a configuration of the pipeline stages, and it executes more efficiently, producing one result per clock. And again, nothing prevents the hardware from running several identically configured pipelines in parallel. Well, again the main table:
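To make the register model above concrete, here is a Python emulation of what a tiny pixel shader computes for one pixel. The shader itself is a hypothetical example written in the style of ps.1.1 assembly (only `mul` and `lrp` are used); clamping only the final color to [0, 1] is a simplification of the real register ranges.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]


def mul(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise a * b (the `mul` instruction)."""
    return tuple(x * y for x, y in zip(a, b))


def lrp(f: float, a: Vec4, b: Vec4) -> Vec4:
    """Component-wise f * a + (1 - f) * b (the `lrp` instruction)."""
    return tuple(f * x + (1.0 - f) * y for x, y in zip(a, b))


def saturate(v: Vec4) -> Vec4:
    """Clamp the final color to [0, 1] (simplification of register ranges)."""
    return tuple(min(1.0, max(0.0, x)) for x in v)


def example_pixel_shader(v0: Vec4, t0: Vec4, t1: Vec4, c0: Vec4) -> Vec4:
    """Emulates, for one pixel, a hypothetical shader along the lines of:
        tex t0                ; sample texture stage 0
        tex t1                ; sample texture stage 1
        mul r0, t0, v0        ; base texture modulated by the diffuse color
        lrp r0, c0.a, r0, t1  ; blend with the second texture by a constant
    Two of the available arithmetic slots are used; r0 is one of the two temps.
    """
    r0 = mul(t0, v0)
    r0 = lrp(c0[3], r0, t1)
    return saturate(r0)


print(example_pixel_shader(v0=(0.5, 0.5, 0.5, 1.0), t0=(1.0, 0.5, 0.0, 1.0),
                           t1=(0.0, 0.0, 1.0, 1.0), c0=(0.0, 0.0, 0.0, 0.75)))
# (0.375, 0.1875, 0.25, 1.0)
```

Because the same short program runs for every pixel with no branching, identically configured copies of it can run side by side, which is the parallelism the paragraph mentions.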
Detailed info on operations:
Now comes the main table again:
Here is a table of the possible filtering modes for the three texture types, with and without MIP levels:
Here the NV20 takes the lead... Let's return to the parameters:
Look at the table of texturing parameters:
Comments:
And finally, the possible formats for frame buffers, depth buffers and the different texture types:
Comments on texture and buffer formats:
Well, the NV20 turned out well in terms of capabilities (unlike the GeForce2). NVIDIA proves its rank as a technology leader. It's not yet clear what the speed will be; much will depend on memory and on programmers. I want to note the tighter integration of the API (DX8) and the hardware, and the wide range of technological innovations. Together with the XBox, the NV2x series is obviously capable of provoking a real revolution in special effects in games. The main thing is for game developers to manage to create new games or remake existing ones; it's great that the upcoming XBox guarantees the availability of games designed for the capabilities of the DX8 API. The first NV20-based cards are expected from ASUS, Leadtek, Elsa and Hercules in March, with GigaByte, MSI and many others following in April-May. At the very beginning of sales, NV20-based cards with 64 MBytes will cost around $450-500, but a month later the price will gradually come down. Interestingly, the roadmaps of many respected companies list only cards with 128 MBytes of memory.
Copyright © 1997-2008: Byrds Research & Publishing Ltd. All rights reserved.