Intel Xeon Processor Review
At the beginning of 2002, simultaneously with the release of the
Pentium 4 Northwood, Intel launched a Xeon based on the Prestonia
core (similar to the Pentium 4's), a new version of its CPU for
dual-processor workstations and entry-level servers. The previous
Xeon versions (on the Foster core), which also supported
dual-processor configurations, were almost impossible to find.
And now that a Xeon clocked at 2.2 GHz has arrived at our lab,
everything has fallen into place...
Pentium 4 and Xeon: what's the difference?
As you know, the Xeon Foster is often described as a "Willamette
with SMP support", and the Xeon Prestonia as a "Northwood with SMP
support". Within each pair ("Foster - Willamette" and "Prestonia -
Northwood") the CPUs are based on the same core, are made on the
same fabrication process and have the same L2 cache size. The
processors use different sockets, but the pin assignment diagram of
the Socket 603 processor shows that most of the pins carry either
power or ground, i.e. the difference is not significant.
The Xeon based on the P4 core brings almost nothing new of
importance apart from the Hyper-Threading technology. It simply
inherited some functions of the Pentium III Xeon (Slot 2). Let's
examine them.
Processor Information ROM (P.I. ROM). It contains such data as the
electrical characteristics of the core and L2 cache, the processor
stepping, the electronic signature, etc.
Scratch EEPROM - Intel offers this chip to OEM companies, which can
write any data they wish to it. It can also be used by the system
to store data about the computer, the processor, default settings,
etc. This EEPROM is a general-purpose facility which is available
in any Xeon-based system.
Machine Check Architecture (MCA) is a processor subsystem which
detects and logs faults in the operation of the system logic. It
monitors 5 basic subsystems: the external and internal buses, the
cache, the Translation Look-aside Buffer and the Instruction Fetch
Unit. The MCA can be used in different ways: for example, the
information on failures can be read by the server OS.
Hyper-Threading is an Intel-developed technology for increasing
performance in multitasking systems. It presents two logical CPUs
on one physical processor by executing two threads in parallel,
with the threads simultaneously using different processor units
(e.g., ALU and FPU). The technology first appeared in the
Prestonia-based Xeon and will probably be supported in Intel's
future server processors.
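
As a rough illustration of what "two logical CPUs on one physical
processor" means for software, the sketch below counts the processors
an operating system would expose. The function name `logical_cpus` and
its parameters are hypothetical, and the sketch assumes one core per
socket and that a dual-Xeon system with Hyper-Threading enabled is seen
as four processors; this follows from the description above, not from a
measurement.

```python
import os


def logical_cpus(sockets: int, threads_per_core: int = 2) -> int:
    """Logical processors visible to the OS (one core per socket assumed)."""
    if sockets < 1 or threads_per_core < 1:
        raise ValueError("sockets and threads_per_core must be positive")
    return sockets * threads_per_core


# Dual Xeon with Hyper-Threading enabled (two threads per core, assumed)
print(logical_cpus(sockets=2))
# The same system with Hyper-Threading disabled
print(logical_cpus(sockets=2, threads_per_core=1))
# Logical CPU count reported on the machine running this script
print(os.cpu_count())
```

On a real system, `os.cpu_count()` reports logical rather than physical
processors, which is why a Hyper-Threading system appears to have more
CPUs than sockets.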
What the Xeon's developers were after becomes clear if we look at
the first three of these units. Desktop users may consider them
unnecessary, but today we are talking about a field where
reliability and compatibility matter most, and where redundancy is
preferable to insufficiency. For example, you can install a
processor of a new revision on an old mainboard. But what if it
doesn't work? What if the BIOS can't determine the power supply
parameters correctly? Considering the price of such a system, it's
better to let the system take this information from the P.I. ROM.
In fact, all these functional units are designed to prevent
failures of such an expensive system. This makes clear the key
difference between the Pentium III-S and Athlon MP on the one hand
and the Xeon on the other. The first two are, in fact,
modifications of desktop processors for SMP systems, while the Xeon
contains units typical of powerful server CPUs. It is interesting,
though, that it is Intel's P-III-S which is positioned as a server
processor, while the current versions of the Xeon are meant for
workstations.
Intel i860 chipset
i860 chipset on the Supermicro P4DC6+. From left
to right: Intel 82806AA PCI 64 Hub (Intel P64H), Intel 82860 Memory
Controller Hub (MCH), Intel 82801BA I/O Controller Hub (ICH2)
Although the i860 chipset is meant for high-performance
workstations, it seems to be more suitable for servers. It is, in
fact, an i840 for SMP-capable processors with the Pentium 4 core.
The chipset uses the same Accelerated Hub Architecture, works only
with RDRAM and has a dual-channel memory controller (RIMM modules
are installed in pairs). There is also 64-bit PCI support via a
separate Intel 82806AA PCI 64 Hub (Intel P64H).
The Intel i860 is meant for Intel Xeon processors with a 100 MHz
bus (the bus is quad-pumped, which is equivalent to 400 MHz). The
chipset consists of two main components:
- Intel 82860 Memory Controller Hub (MCH);
- Intel 82801BA I/O Controller Hub (ICH2).
Apart from these two, additional chips can be used to extend the
chipset's capabilities. Here are the key parameters of the i860:
- support for one or two Intel Xeon processors;
- up to 2 GBytes of RDRAM;
- support for AGP 4X, Ultra DMA/100/66/33, the LPC (Low Pin Count)
interface and 4 USB ports (v1.1);
- integrated Intel 10/100 Mbps network controller.
Other standard functions such as ACPI, Suspend-to-RAM/Disk,
Wake-on-LAN are also supported.
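
The "100 MHz, equivalent to 400 MHz" figure for the processor bus can
be reproduced with simple arithmetic. The sketch below is illustrative
only: the function names are hypothetical, and the 64-bit data bus
width is an assumption that the article itself does not state.

```python
def fsb_effective_mhz(base_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Effective transfer rate of a bus that moves data several times per clock."""
    return base_clock_mhz * transfers_per_clock


def fsb_peak_bandwidth_gbps(base_clock_mhz: float,
                            transfers_per_clock: int = 4,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (64-bit data bus assumed)."""
    transfers_per_second = base_clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


print(fsb_effective_mhz(100))        # effective rate in MHz
print(fsb_peak_bandwidth_gbps(100))  # peak bandwidth in GB/s
```

Under these assumptions the peak bus bandwidth comes out equal to the
figure quoted later for dual-channel PC800 RDRAM, which suggests why
this memory type is a natural match for the bus.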
The branched structure of the chipset supports 6 (!) PCI buses. All
of them are marked on the diagram of the chipset (or rather of the
Supermicro P4DC6+ mainboard based on this chipset). After we
installed additional cards and ran the ICDiag utility
(www.icbook.com.ua), all 6 PCI buses could be seen.
The system components (SCSI controller, 32-bit and 64-bit PCI
slots) sit on different buses, which means that the data flows are
well separated. Such an architecture is very important for a system
where data are actively exchanged between components... As such
activity is vital for servers, we call the i860 a server chipset.
Interestingly, the AwardBIOS V6.0 Medallion had not previously been
used either in "heavy" boards or in Supermicro products. However,
the CMOS Setup menu is styled after that of Phoenix Technologies,
which is typical of server solutions from Intel and Tyan.
After the problems with the i840+SDRAM combination, Intel wasn't
going to couple the i860 with SDRAM. Nor was it necessary anymore:
RDRAM prices had fallen to an acceptable level, and PC133 SDRAM
might have crippled the performance of the Xeon processors. The
rated 2 GBytes of memory can be raised to 4 GBytes by using the
Intel 82803AA MRH-R chip (Memory Repeater Hub - RDRAM) installed on
the mainboard between the Memory Controller chip and the two memory
channels.
Supermicro P4DC6+ mainboard
Supermicro has 3 dual-processor Socket 603 models on this chipset:
the P4DCE, P4DC6 and P4DC6+. All of them share a common design in
the Extended ATX format, support up to 2 GBytes of RDRAM (4 RIMM
slots), have the same set of slots (AGP 4X/Pro 1.5 V, 2 PCI
64-bit/66 MHz slots and 4 PCI 32-bit/33 MHz slots) and an
integrated network adapter, the Intel 82559. The P4DCE doesn't have
a SCSI controller, the P4DC6 comes with an integrated Ultra 160
SCSI chip (Adaptec AIC-7899W), and the P4DC6+, in addition to the
SCSI controller, has a SO-DIMM connector for an Adaptec 2005S card,
an inexpensive Zero Channel RAID controller (ZCR) which uses the
integrated SCSI chip.
One more difference is that the P4DCE and P4DC6
have VRM units designed as separate daughter cards. At the same
time, the Supermicro P4DC6+ which we used for testing the Intel
Xeon has both VRMs onboard.
The layout of the P4DC6+ is very good. There are several jumpers to
disable the integrated network and SCSI controllers or to set the
frequency of the 64-bit PCI segment (33/66 MHz). Graphics is not
integrated (which proves once again that the board is designed for
workstations). The use of one (8-pin) or two (4-pin and 8-pin)
additional power supply connectors is a distinguishing feature of
the Supermicro line based on the i860. Besides, as I mentioned
above, it's unusual to see an Award BIOS on a Supermicro board.
The P4DC6+ ships with two processor coolers (with vertical fins and
a fan attached on one side); they are quite noisy (4700 rpm, 16.2
CFM) but efficient. The package also includes cables (FDD, IDE,
50-pin and 68-pin SCSI), a bracket for the rear panel of the case,
a CD-ROM and diskettes with drivers, and a comprehensible user
manual.
Iwill and Tyan have also released boards on the i860 chipset, but
they are not widely available yet.
Test system configuration and testing technique
We happened to get the new and the old Intel Xeons almost
simultaneously, so we ended up with a line of processors clocked at
1.7 GHz (Foster), 1.8 and 2.2 GHz (Prestonia). As a result, we were
able both to test the top Xeon and to compare the performance of
the Foster and Prestonia at close frequencies. We tested
dual-processor systems, as SMP support is the distinguishing
feature of the Xeon. However, we also included a uni-processor
configuration with the Pentium 4 2.2 GHz, as it was interesting to
compare its performance with that of the Xeon with a similar
frequency and core.
A dual-processor system with the Athlon MP 1900+ on the Tyan
Thunder K7 board served as the competitor. We also included a
system based on the Pentium III-S 1.26 GHz (Tualatin, 512 KBytes L2
cache) and a ServerWorks ServerSet III HE-SL based board. This may
be surprising, as the chipset used to have awful driver support (in
particular, the AGP didn't work, so the boards couldn't be used for
graphics stations). Tyan later even removed the AGP port from the
new revision of its ServerSet III HE-SL based board, the Thunder
HEsl-T.
However, Microsoft managed to make Windows XP (or, rather, Windows
2000 starting with Service Pack 2) "understand" this chipset. After
that the AGP port on the ServerSet III HE-SL started to work
flawlessly, and the platform became a normal solution for graphics
stations. So the board was included in the tests: after all,
nothing forbids using the server Pentium III-S in workstations, as
long as an appropriate motherboard is available. And we do have
one: Supermicro left an AGP port on the P3TDE6 board, and we took
it for the tests. The OS we used was Windows XP Professional.
Test system configurations
Testbeds
| CPU | Intel Xeon | Intel Xeon | Intel Xeon | Intel Pentium 4 | Intel Pentium III-S | AMD Athlon MP |
| Core | Foster | Prestonia | Prestonia | Northwood | Tualatin | Palomino |
| Frequency | 1.7 GHz | 1.8 GHz | 2.2 GHz | 2.2 GHz | 1.26 GHz | 1900+ (1600 MHz) |
| FSB frequency, MHz | 400 | 400 | 400 | 400 | 133 | 266 |
| L1 cache, KB | 16 | 16 | 16 | 16 | 32 | 128 |
| L2 cache, KB | 256 | 512 | 512 | 512 | 512 | 256 |
| Chipset | Intel i860 | Intel i860 | Intel i860 | Intel i850 | ServerSet III HE-SL | AMD-760MP |
| Mainboard | Supermicro P4DC6+ | Supermicro P4DC6+ | Supermicro P4DC6+ | Intel D850MD | Supermicro P3TDE6 | Tyan Thunder K7 |
| Memory | 512 MBytes PC800 RDRAM | 512 MBytes PC800 RDRAM | 512 MBytes PC800 RDRAM | 512 MBytes PC800 RDRAM | 512 MBytes Reg'd PC133 SDRAM | 512 MBytes Reg'd DDR SDRAM |
| Video card (all systems) | NVIDIA GeForce3 (ASUS V8200, 64 MBytes DDR SDRAM, Detonator 21.85) |
| Hard disc (all systems) | Seagate Cheetah X15 36LP, 36.4 GBytes, Ultra 160 SCSI |
| OS (all systems) | Windows XP Professional |
So, today we have gathered the top processors for different
platforms. All the systems were equipped identically (see the
table): 512 MBytes of memory of the corresponding type, a 36 GBytes
SCSI Seagate Cheetah X15 36LP HDD and an AGP video card on the
GeForce3 (the GF3 copes rather well with the professional OpenGL
used in workstation tasks).
We used our standard performance estimation method for high-end
systems, with some additions, and applications of the following
classes:
- 2D graphics (a script for Adobe Photoshop 6.0.1);
- 3D modeling - 3D Studio MAX 4.26, Lightwave 7b and A|W Maya
4.0.1;
- 3D visualization with professional OpenGL (SPEC ViewPerf 6.1.2);
- DivX and MP3 encoding (DivX 4.12 and GOGO-no-coda 2.39c), archiving
(WinAce 2.11 with a 4096 KBytes dictionary).
Besides, we included two CAD tests with the design engineering
applications SolidWorks 2001 and Solid Edge V10. We used the
standard SPECapc tests for SolidWorks 2001 and for Solid Edge V10.
Test results
As you remember, the Xeon "Foster" was never widely available on
the market. It shipped only to major assemblers such as Compaq and
Dell. Now that we have its test results in front of us, it is clear
why Intel was in no hurry to promote the Xeon Foster. Its rather
low performance could have spoiled the reputation of the new
family, which is why promotion was put off until the Xeon
Prestonia, a 0.13-micron core with a doubled cache (whose size
matters a lot for the Pentium 4 core) and with high performance.
In the charts, the results of the dual-processor systems are shown
above those of the uni-processor ones and in the same color. For
this reason the second bar for the Pentium 4 2.2 GHz is missing.
Video and audio encoding, archiving
DivX encoding is a very memory-intensive test, and memory becomes
the bottleneck here: all the P4-based processors show the same
results although they run at different frequencies (the maximum gap
is 500 MHz). Since a video card can't influence conversion from one
format to another, only the dual-channel PC800 RDRAM or the hard
drive can be the limiting factor, and the results of the Athlon MP
and Pentium III-S rule out the HDD. Note that we tested both uni-
and dual-processor systems although the DivX codec doesn't support
SMP. Interestingly, while a second processor helps the systems
based on classical cores, the followers of the Pentium Pro
architecture (Pentium III-S and Athlon MP), it actually hinders the
Xeon (P4 core)!
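
The reasoning "same score at different clock speeds means the CPU core
is not the bottleneck" can be expressed as a simple scaling check. The
sketch below is only an illustration: the function name is
hypothetical, and the scores passed to it are placeholders, not the
measured DivX results.

```python
def clock_scaling_efficiency(freq_lo: float, score_lo: float,
                             freq_hi: float, score_hi: float) -> float:
    """Share of the clock increase that shows up as a score increase.

    1.0 means the score grows in proportion to the clock (CPU-bound);
    0.0 means the score does not react to the clock at all, which
    points to a bottleneck outside the core, such as memory.
    """
    if freq_hi <= freq_lo:
        raise ValueError("freq_hi must be greater than freq_lo")
    clock_gain = freq_hi / freq_lo - 1.0
    score_gain = score_hi / score_lo - 1.0
    return score_gain / clock_gain


# Placeholder scores: identical results at 1.7 GHz and 2.2 GHz
print(clock_scaling_efficiency(1.7, 100.0, 2.2, 100.0))
# Placeholder scores: the result grows by half of the clock increase
print(clock_scaling_efficiency(2.0, 100.0, 3.0, 125.0))
```

An efficiency close to zero across the 500 MHz range is what the DivX
observation above amounts to.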
Here the leading group consists of the Pentium 4 2.2 GHz
(uni-processor), Xeon 2.2 GHz and Athlon MP. The junior Xeon merely
keeps up with the Pentium III-S: as the proverb goes, an old horse
won't spoil the furrow, which cannot be said of the young one.
The WinAce results are quite interesting (the data for the uni- and
dual-processor systems are identical): this is the first time the
Athlon MP loses to all the others. We have just one guess: if
WinAce uses SSE, it may have failed to detect SSE support in the
Athlon MP, which sometimes happens when a program is poorly
written. But it's only a guess.
3D modeling programs
Almost everywhere the Xeon 2.2 GHz system is ahead (both uni- and
dual-processor), except in 3D Studio MAX 4.2, where it is on a par
with the Athlon MP 1900+. In the other tests the latter shares
second place with the Xeon 1.8 GHz, sometimes outscoring it.
Interestingly, in LightWave 7b the Athlon XP/MP loses to the
Pentium 4, though in LightWave 6.5 it was the other way round.
However, NewTek (the developer of the package) says that the 7th
version was heavily reworked to take account of the peculiarities
of Intel's architecture. Besides, the Pentium 4 2.2 GHz goes
shoulder to shoulder with the uni-processor Xeon 2.2 GHz, which
confirms that their cores are very similar. The Pentium III-S
competes successfully with the Xeon "Foster" 1.7 GHz except in
LightWave (when a scene is rendered without ray tracing).
Raster graphics
Adobe Photoshop prefers the Pentium III and Athlon MP to Intel's
new core, although there are modified filters supporting SSE2 for
the Pentium 4 based processors. Still, the new Intel processors
were able to win thanks to their clock speed: the dual Xeon 2.2 GHz
outscores the dual Athlon MP, though by a small margin. On average
(over the complete set of Photoshop operations and filters that we
used), SMP doesn't bring a big gain. Most of this editor's filters
are plugins, and they are developed not only by Adobe. With code
from so many different sources, we can't hope for considerable SMP
optimization. Some modules are able to use the second processor,
which is why there is a gain, but there are not many of them, which
is why the gain is small.
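
Why a handful of SMP-aware modules yields only a small overall gain can
be illustrated with Amdahl's law. This is a simplified model we are
bringing in for illustration, not something measured here: the function
name is hypothetical, and the parallel fractions below are arbitrary
examples rather than Photoshop data.

```python
def amdahl_speedup(parallel_fraction: float, processors: int = 2) -> float:
    """Overall speed-up when only part of the work can use all processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Arbitrary example fractions of the workload that can run on both CPUs
for fraction in (0.0, 0.2, 0.5, 1.0):
    speedup = amdahl_speedup(fraction, processors=2)
    print(f"{fraction:.0%} parallel -> {speedup:.2f}x on two CPUs")
```

In this model, even with half of the work parallelized the second
processor adds only about a third to the overall speed, and doubling
requires that everything be parallel.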
SPEC ViewPerf
Here we show the results of the uni-processor systems only, as the
dual ones had the same or even lower scores.
AWadvs-04. All the systems are on one level except the Pentium
III-S in combination with the ServerWorks ServerSet III HE-SL,
which falls behind by 22%. PC133 memory is not enough for
applications which make intensive use of OpenGL and texturing. The
PC2100 DDR SDRAM (Athlon MP + AMD-760MP) is able to bear the load,
though it has a lower bandwidth than the dual-channel PC800 RDRAM
in the i850/i860 systems.
DX-06. We don't show the results of this test although we ran it.
The peculiarity of the IBM Data Explorer core (which DX-06 is based
on) that we noticed when examining the Pentium 4 Northwood showed
up once again: all CPUs on the P4 core with 512 KBytes of L2 cache
turned out to be slower than those with 256 KBytes. Since this
didn't happen anywhere apart from DX-06, we decided not to use the
test for estimating the performance of the Northwood/Prestonia
based processors: there seems to be some error in the program that
affects their results.
DRV-07. This subtest depends on core performance, memory efficiency
and the speed of the 3D accelerator. The first places are taken by
the Northwood/Prestonia based systems, and the difference between
them is not great (it seems that the video card is the bottleneck).
The Xeon "Foster" 1.7 GHz falls behind (because of its small L2
cache). The Athlon MP 1900+ lags behind as well, since the
throughput of the PC2100 DDR SDRAM (2.1 GBps) is much lower than
that of the dual-channel PC800 RDRAM (3.2 GBps). The Pentium III-S
loses because it has the lowest core speed and the slowest memory.
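
The 2.1 GBps and 3.2 GBps figures quoted above follow from the transfer
rate, bus width and channel count of each memory type. The sketch below
reproduces them; the function name is hypothetical, and the bus widths
(16 bits per RDRAM channel, 64 bits for SDRAM and DDR SDRAM) as well as
the nominal transfer rates are assumptions not stated in the article.

```python
def peak_bandwidth_gbps(transfers_mhz: float, bus_width_bits: int,
                        channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mhz * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed nominal parameters for the memory types used in the testbeds
memory_types = {
    "PC800 RDRAM, dual-channel": peak_bandwidth_gbps(800, 16, channels=2),
    "PC2100 DDR SDRAM": peak_bandwidth_gbps(266, 64),
    "PC133 SDRAM": peak_bandwidth_gbps(133, 64),
}

for name, bandwidth in memory_types.items():
    print(f"{name}: {bandwidth:.1f} GB/s")
```

These are theoretical peaks only; real throughput also depends on
latency and on the memory controller, which this sketch ignores.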
MedMCAD-01. Almost all systems are limited by the
graphics accelerator's performance, and the Pentium III-S with its
PC133 falls behind by a great margin.
CAD applications (design engineering)
Although we used two packages from different developers, it makes
no sense to comment on the performance results separately for
SolidWorks 2001 and Solid Edge V10. As we used tests from SPEC
(Standard Performance Evaluation Corporation) in both cases, the
performance categories are the same for both. We chose three of
them: Composite Score, Graphics Score and CPU Score.
Composite Score. In both packages the first places are taken by the
Pentium 4 2.2 GHz (in the uni-processor class), Xeon 2.2 GHz and
Athlon MP, while the Xeon 1.7 GHz and Pentium III-S bring up the
rear. The Xeon 1.8 GHz stays ahead of the 1.7 GHz model in
SolidWorks, while in Solid Edge there is almost no difference
between the 1.7 and 1.8 GHz Xeons although the latter has twice as
much cache. By the way, there is almost no SMP optimization: the
dual systems are only slightly faster.
Graphics Score. This is where the difference between the systems on
different CPUs is the greatest! The video card was the same
everywhere, yet the speed of the graphics subsystem differs by as
much as a factor of two. However, the leaders remain the same (they
even widened the gap): the Pentium 4 2.2 GHz, Xeon 2.2 GHz and
Athlon MP.
CPU Score. Solid Edge prefers the classical architecture: the
Pentium III-S outscored even the faster Athlon MP in computational
performance (but keep in mind that the P-III-S has twice as much L2
cache). Only the Xeon 2.2 GHz was able to outperform it. SolidWorks
2001 has different preferences: the Pentium 4 2.2 GHz, Xeon 2.2 GHz
and Athlon MP.
Conclusion
The new processor Intel has released for dual-processor systems is
rather successful. Intel now has one more highly efficient
processor which, like the Pentium 4, supports SSE/SSE2, has a large
L2 cache and doesn't dissipate too much heat, and which also
supports SMP. There is also the very powerful Intel i860 chipset,
which allows assembling modern and balanced systems, and there are
high-quality and powerful boards on it, like the Supermicro P4DC6+.
Once the Intel Xeon starts shipping, it will be much simpler to
choose a platform for high-performance workstations.
But you should be aware that it makes no sense to build
uni-processor systems on the Xeon. The performance of uni-processor
computers on the Xeon and the Pentium 4 is the same, while the
Xeon+i860 platform is much more expensive. At the same time, a
dual-processor system based on the Xeon can be almost twice as fast
as a Pentium 4 "Northwood" of the same frequency.
The server Pentium III-S 1.26 GHz looks paler than the Xeon, though
in some applications this platform is on a par with the Xeon 1.8
GHz. The Athlon MP has a performance level comparable with that of
the Xeon, and many mainboard makers have already announced support
for dual Athlons and are developing AMD-760MPX based boards, but
currently the market can offer only Tyan's boards.
The Xeon with the Pentium 4 core looks rather promising. This year
the company will release new versions working at an FSB frequency
of 533 MHz, as well as chipsets supporting DDR memory. Besides, in
the first quarter of 2002 we will get a new version of the Intel
Xeon equipped with L3 cache and supporting 4-processor
configurations.