Although a lot of time has passed since Intel released SMP versions of the Pentium 4 for dual-processor systems, and mainboards for them are widely available, the P4 Xeon processors themselves are in short supply. That is why the Pentium III, or rather its server version on the Tualatin core, remains the top Intel CPU for dual-processor systems.
Few big IT companies in the world surprise us as often as Intel. Dual-processor systems on the Intel Xeon "Foster" - a chip with a core close to the Pentium 4, running at 1.7 GHz and higher - have been on sale for a long time. Boards designed for them (i860 chipset, Socket 603) have been available for almost a year. Soon Intel is going to release the 0.13-micron Xeon "Prestonia" operating at 2 and 2.2 GHz, with a 512 KB L2 cache and Hyper-Threading support. So, in theory, the range of platforms for dual-processor systems is quite wide.
But what about in practice? The Intel Xeon is not cheap, and although the processor is in demand, it is not widely available. The Pentium III Xeon is also expensive, and its largest L2 cache comes with the lowest frequency. The Athlon MP is highly efficient and reasonably priced, but it is not very popular yet. So if you believe that a server means Intel plus Intel (processor + chipset, and sometimes a mainboard and a system case/chassis as well), you have only the Pentium III family to choose from (or the P-III Xeon, which has a 2 MByte cache and can work in 4- and 8-processor configurations).
As you know, the Pentium III "Coppermine" topped out at 1 GHz, which is why the line was extended with new variants of the P-III on the Tualatin core.
Originally, Tualatin was the name of Intel's project for the transition to 0.13-micron technology. The CPUs with the new core were in fact the first products of this project, and in all there are 4 processor families codenamed Tualatin (see the table). As a result, the Pentium III line was divided into two classes: desktop and server processors. The desktop version kept a 256 KByte L2 cache, while the server version got twice as much, 512 KBytes; besides, the desktop Tualatin lacked SMP support. The desktop processor didn't live long: it was delivered mainly to big PC assemblers and was then withdrawn from the mass market to make way for the Pentium 4. At the same time, the Celeron line, starting from 1.2 GHz, was moved to the Tualatin core, thus receiving a twice larger L2 cache (256 KBytes) and the 0.13-micron process, although its bus still ran at 100 MHz. There are also Pentium III-M processors on the Tualatin core meant for mobile systems; they support SpeedStep technology and have a surprisingly large 512 KByte L2 cache. Last but not least is the Pentium III-S, which Intel calls a purely server processor. The new 0.13-micron core of the Pentium III has the following advantages:
However, the Tualatin-based processors required a modified VRM (Voltage Regulator Module). The Coppermine chips worked with VRM 8.4, while the Tualatin got a new version, VRM 8.5, which implements the Load Line concept.
The VRM had to be modified precisely at the transition to the 0.13-micron process. As the distance between semiconductor elements shrinks at such clock speeds, leakage currents and induced noise become noticeable. The growing power density changes the parameters of the conductors and the insulator, and the processor becomes less reliable. The die heats up far from uniformly, and failures become more frequent. Besides, the yield of non-defective processors drops sharply, and their final price soars (just recall the first samples of the Pentium Pro).
The Load Line concept was developed to counter all these effects. The "ideal VRM" used earlier keeps the core voltage constant, within a defined tolerance, regardless of the varying current consumption (for the P-III it ranges from 0 to 30 A).
A VRM based on the Load Line concept changes the processor voltage depending on the current consumption.
The 0.18-micron Pentium III Coppermine processors tolerate both power supply types, "ideal VRM" and Load Line. However, the desktop and server versions of the P-III Tualatin, which differ in L2 cache size, have different load line characteristics. Therefore, a processor of one type can't be used in a board designed for the P-III Tualatin of the other type. The processor will hardly burn out, but it will become much less reliable and won't last the 7 years stated in its specification.
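The difference between the two regulator types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the droop is assumed to be linear, the function names are made up for this example, and the no-load voltage and slope are hypothetical numbers, not values from any VRM specification. Only the 0 to 30 A current range comes from the text above.

```python
def ideal_vrm_voltage(v_nominal: float, current_a: float) -> float:
    """'Ideal VRM': the core voltage stays at its nominal value at any load."""
    return v_nominal


def load_line_voltage(v_no_load: float, slope_ohm: float, current_a: float) -> float:
    """Load-line VRM (assumed linear): the voltage droops as the current grows."""
    return v_no_load - slope_ohm * current_a


# Hypothetical values for illustration only, not taken from a datasheet.
V_NO_LOAD = 1.50   # volts at zero current
SLOPE = 0.003      # ohms, i.e. 3 mV of droop per ampere

# Sweep the 0-30 A range mentioned for the P-III.
for current in (0.0, 10.0, 20.0, 30.0):
    ideal = ideal_vrm_voltage(V_NO_LOAD, current)
    drooped = load_line_voltage(V_NO_LOAD, SLOPE, current)
    print(f"{current:4.0f} A: ideal VRM {ideal:.3f} V, load line {drooped:.3f} V")
```

Two processors with different load lines would simply use different `v_no_load` and `slope_ohm` values here, which is why a board tuned for one of them feeds the other a voltage outside its intended range.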
Besides, the developers changed the processor package (in the FC-PGA2 the die is now covered with a heat-spreading plate) and its design. For example, when a Coppermine-based processor overheated (THERMTRIP# signal), it was enough to stop the clock, while in the case of the Tualatin the power supply must be turned off as well.
Tualatin's innovations gave rise to new chipsets - i815 B-Step, VIA Apollo Pro133T/266T, SiS 633T/635T and the revised ServerWorks ServerSet III LE3/HE-SL - and to new mainboards for desktop systems, workstations and servers. Let's take a look at several dual-processor models.
Characteristics of the Tualatin based processors
* In the Maximum Performance/Battery Optimized modes of the SpeedStep technology.
Chipsets and dual-processor boards for Tualatin
Today there are 4 main chipsets supporting the P-III Tualatin in dual-processor configurations: VIA Apollo Pro133T/266T and ServerWorks ServerSet III LE3/HE-SL. All of them are modified versions of previous models that worked with Socket 370 processors. The chipsets' features are compared in the table, so I will point out only the most significant differences. The Apollo Pro133T works with PC133 SDRAM, while the Apollo Pro266T supports both PC133 SDRAM and PC2100 DDR SDRAM (and up to 4 GBytes of memory). Reportedly, Tyan is the only big company producing dual-processor models on the VIA Apollo Pro133T: the Tiger 200T (S2505T) and Tiger 230T (S2507T). And one of the few dual-processor boards on the Apollo Pro266T, the Supermicro P3TDDE, is equipped only with PC133 slots (though the Supermicro P3TDDR and Iwill DVD266u-RN are designed for DDR SDRAM).
Then again, it is well known that Pentium III based systems do not benefit from DDR memory. In the Apollo Pro266T the north and south bridges are connected via the V-Link bus (266 MBps) instead of PCI; it can also work with the VPX-64 companion chip (VT8101), which supports PCI 64-bit/66 MHz.
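A rough back-of-the-envelope calculation helps to put these buses side by side. The sketch below computes theoretical peak bandwidth as bus width times clock; the helper name is hypothetical, and the widths and clocks are nominal figures assumed for this example (a 64-bit, 133 MHz processor bus and 64-bit memory modules) rather than numbers from the article's tables.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s: bytes per transfer x transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Nominal parameters (assumptions for illustration): width, clock, transfers/clock.
buses = {
    "PCI 32-bit/33 MHz": (32, 33.33, 1),
    "PCI 64-bit/66 MHz": (64, 66.66, 1),
    "P-III FSB 133 MHz": (64, 133.33, 1),
    "PC133 SDRAM":       (64, 133.33, 1),
    "PC2100 DDR SDRAM":  (64, 133.33, 2),  # DDR moves data twice per clock
}

for name, params in buses.items():
    print(f"{name:20s} {peak_bandwidth_mb_s(*params):7.0f} MB/s")
```

Under these assumptions the processor bus peaks at the same rate as PC133 memory, so the extra bandwidth of DDR has nowhere to go, which is consistent with the remark that Pentium III systems gain little from it.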
Dual-processor chipsets with the Tualatin support
* depends on a south bridge.
The ServerWorks chipsets were aimed at the high-end market from the very beginning. They support PCI 64-bit/66 MHz, several independent PCI buses, large memory sizes and a number of optimizations (for example, the dual-channel memory controller of the ServerSet III HE-SL). The junior ServerSet III LE3 chipset doesn't support AGP. Note that the second chipset has disappeared from the ServerWorks site (www.serverworks.com), so it is unclear what the new HE-SL chipset with Tualatin support should be called. However, new boards on the ServerSet III HE-SL are being produced and do work.
And now a few words about the boards taking part today. The descriptions will be much shorter than for desktop models, as it makes no sense here to judge whether the design and layout are appropriate. Servers and workstations are usually built in lots of just a couple of dozen, and the cases and other hardware are either those recommended by the board makers or developed to order. All the boards are based on ServerWorks chipsets; as it turned out, dual-processor models for the P-III Tualatin on VIA chipsets are quite rare.
Intel Server Board SCB2 "Coosbay"
The board suits servers more than workstations (the name implies as much, but I am judging by the realities of the market). The lack of an AGP slot (although the ServerWorks ServerSet III HE-SL chipset supports this bus) and the large number of integrated components, such as a dual-channel Ultra 160 SCSI controller, two network adapters and a video adapter, confirm its purpose. By the way, the video chip could have been put on the AGP bus even without installing a slot on the board, but the ATI Rage XL on the Intel SCB2 works on PCI 32-bit/33 MHz. The board has a non-standard 24-pin ATX connector and two connectors resembling PCI 64-bit/66 MHz slots, which are meant for one- or three-slot PCI risers (these are needed to fit PCI expansion cards into compact 1U/2U systems). The board can work with Zero Channel RAID PCI controllers from Intel or Adaptec.
One of the two connectors of the SCSI controller is used for devices inside the system case, and the second is routed to the back panel. Besides, although there is a normal dual-channel IDE controller, the board has only one IDE connector. However, there is another variant of the SCB2 board where the dual-channel SCSI controller is replaced with a UATA/100 IDE RAID controller (Promise PDC20267 chip).
This board is meant squarely for workstations, even heavy ones: it has an AGP Pro slot which supports powerful video cards with increased power consumption (something like the 3Dlabs Wildcat 4210 or Wildcat II 5110). Besides, it has a dual-channel Ultra 160 SCSI controller on the Adaptec AIC-7899W chip, which is equally useful for a server and a workstation. There is only one network controller (the Intel board has two). The PCI 64-bit slots run at only 33 MHz. Both SCSI channels are designed for internal use; by the way, the two 68-pin connectors are accompanied by one 50-pin connector. There are also both IDE channels, two ATX connectors, space for a second flash memory chip and an AGP video card supplied with the board (ATI Rage XL with 8 MBytes of memory).
Two twins based on the ServerWorks ServerSet III LE3 chipset differ only in the single-channel Ultra 160 SCSI controller on the Adaptec AIC-7892 chip (the P3TDL3 has it, the P3TDLE doesn't). Their appearance and size do not allow a powerful server to be assembled on them. Nor do the P3TDL3/P3TDLE boards suit rack-mount 1U solutions, as the DIMMs are not angled at 25°. That is why such boards can be used primarily in small servers or workstations. On the other hand, both boards have an ISA slot. And since there are a lot of "homemade" ISA cards in use, the Supermicro P3TDL3/P3TDLE are unique in that there are no other alternatives for such devices.
The Supermicro P3TDLR board is the same P3TDL3 but with fewer slots, angled DIMMs and an added network controller. That is why this board can be used in 1U/2U rack-mount servers. Other differences can be seen in the photos and the table.
Tyan Thunder HEsl-T
Although the Thunder HEsl-T may look like a successor to the Thunder HEsl, the two differ a lot. The AGP Pro slot is gone, but there are two additional DIMM slots (6 in all) and a different design. On the other hand, this board is very similar to the Intel Server Board SCB2: the same chipset, the same number of DIMM slots and no AGP. The board is therefore meant for a high-performance server. However, one would hardly assemble a rackmount module on it. The board is more conservative than Intel's: there are no PCI risers, just ordinary PCI slots, and both channels of the AIC-7899W Ultra 160 SCSI controller are internal, brought out to two 68-pin connectors (there is no 50-pin one). There are two IDE connectors. Servers assembled on these two boards will be very similar in function. However, there are minor differences which can be an advantage or a disadvantage depending on the task. Take the external SCSI connector, for example: you can connect a tape streamer to it, but you can't assign two arrays of SCSI drives to different channels without an additional controller.
Tyan Thunder LE-T
This board turns out to be very similar to the Supermicro P3TDLR. Both are based on the ServerSet III LE3 chipset and are designed for rackmount servers (angled DIMMs). However, the Supermicro has two additional PCI 32-bit/33 MHz slots, while the Thunder LE-T has a dual-channel SCSI controller (against the single-channel one of the P3TDLR). Besides, the integrated PCI video adapter (ATI Rage XL) on the Tyan Thunder LE-T has 4 MBytes of memory against 8 MBytes on the Supermicro, but this hardly matters for a rackmount server. So the choice between the two boards will depend on the components: one builder may need the additional PCI slots, another the second SCSI channel.
Dual-processor boards supporting P-III Tualatin
* All boards are equipped with the AMI BIOS.
** depends on a type and the number of installed PCI Riser cards.
*** DIMMs are angled at 25°.
Intel positions its P-III only for the server market. But what, then, can we build a powerful workstation on if the performance of two P-III Coppermine 1 GHz is no longer enough? On the P4 Xeon? But where can one get this rare processor? On the P-III Xeon? But it comes only at 700 and 900 MHz and supports only a 100 MHz FSB; besides, its large cache is far from useful in all workstation applications. On the Athlon MP? Why not? However, it is not an easy decision for many. So why not assemble, say, a video editing station on the ServerSet III HE-SL and two P-III Tualatin 1.26 or 1.4 GHz (e.g., with a PCI video card like the Matrox G450)? Or even an inexpensive graphics or CAD station on the same Tualatins and a VIA Apollo Pro133T/266T based board? At least, if we choose the right boards we won't lose the warranty on the processors. With all this in mind, we developed a strategy for estimating the P-III Tualatin performance in dual-processor systems. The applications we used in the tests suit desktop systems or workstations better than servers, but some of them are quite demanding and put a heavy load on the ALU/FPU of both processors and on the memory subsystem.
Our testing technique has undergone some changes:
The aim of the tests is to estimate the performance of the P-III Tualatin 1.13 and 1.26 GHz processors in dual-processor systems in comparison with the top Coppermine, the P-III 1 GHz. The results will tell us whether the new-wave boards are really that good. We will also compare the efficiency of boards on the two ServerWorks chipsets, the ServerSet III LE3 and ServerSet III HE-SL.
The test platform consists of two Tyan motherboards (Thunder LE-T and Thunder HEsl-T), one or two P-III 1/1.13/1.26 GHz processors, 1 GByte of Registered PC133 SDRAM, a Cheetah X15 36LP hard drive (ST336752LW, Ultra320 SCSI, 15000 rpm, 8 MB cache) and the integrated graphics and network adapters. First of all, let me describe one more estimation criterion which is not reflected in the diagrams: large cache efficiency. It is easy to calculate. First, estimate how much faster the Pentium III-S 1.13 GHz is than the Pentium III 1 GHz. Then calculate the same for the P-III-S 1.26 GHz relative to the P-III-S 1.13 GHz, and subtract the second value from the first. The frequency difference between the P-III-S 1.13 GHz and the P-III 1 GHz is the same as between the P-III-S 1.26 GHz and the P-III-S 1.13 GHz. But in the first case both the frequency and the L2 cache grow, while in the second case only the frequency does. This figure therefore lets us estimate how much the doubling of the L2 cache affects performance in comparison with a simple clock speed increase.
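The calculation described above can be written down as a short Python sketch. The function names are made up for this example, the scores are assumed to be "higher is better" (for timed tests such as rendering, the times would have to be converted to rates first), and the sample numbers are hypothetical, not results from the diagrams.

```python
def speedup_percent(new_score: float, old_score: float) -> float:
    """Gain of new_score over old_score in percent (higher score = faster)."""
    return (new_score / old_score - 1.0) * 100.0


def large_cache_efficiency(p3_1000: float, p3s_1133: float, p3s_1266: float) -> float:
    """Gain from frequency plus cache minus gain from frequency alone, in points."""
    # 1 GHz (256 KB) -> 1.13 GHz (512 KB): both the clock and the L2 cache grow.
    freq_and_cache = speedup_percent(p3s_1133, p3_1000)
    # 1.13 GHz (512 KB) -> 1.26 GHz (512 KB): only the clock grows.
    freq_only = speedup_percent(p3s_1266, p3s_1133)
    return freq_and_cache - freq_only


# Hypothetical benchmark scores, for illustration only.
efficiency = large_cache_efficiency(p3_1000=100.0, p3s_1133=125.0, p3s_1266=135.0)
print(f"large cache efficiency: {efficiency:.1f} percentage points")
```

Note that the two clock steps are equal in absolute terms (133 MHz each) but not quite equal in relative terms (about 13% versus about 12%), so the figure should be read as a rough estimate rather than a precise measurement of the cache effect.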
3D Studio MAX
Although we used the new version, there were no surprises: rendering performance gets considerably higher only with the second processor, although higher frequency brings a gain as well. If you compare the figures of the uni- and dual-processor systems, you will see that the gain of each faster CPU over the previous one is much higher on uni-processor systems. I think that on dual-processor systems the performance is so high that the memory subsystem and bus throughput become the determining factors. Unfortunately, these parameters can't be upgraded in dual-processor systems on the Socket 370 platform. The large cache efficiency in 3DS MAX is rather low, about 7%.
We show two diagrams for LightWave. It turned out that SMP affects performance differently not only in different applications but also in different scenes. For example, SMP gives an almost 45% gain in the standard test scene Radiosity_ReflectiveThings, which contains no semitransparent objects. In the Tracer-Radiosity scene, however, the second processor brings only 18% of growth. The LightWave results are very similar to those of 3DS MAX; the only difference is that the benefit from SMP is smaller on average.
The performance gain from SMP in Alias|Wavefront Maya 4.0.1 is as great as in 3DS MAX, about 90%. The diagrams of both applications look similar, including the way the rendering speed in SMP systems with high-frequency processors becomes limited by the processor bus and memory bandwidth. The only difference lies in the relative performance of the CPUs with 256 KByte and 512 KByte L2 caches: the large cache efficiency is 20% in the uni-processor systems and 23% in the dual ones.
The new testing technique in Photoshop 6.0.1 includes three scripts. The first one creates a medium-sized file (about 40 MBytes) and works with it using both filters and Photoshop instructions. The second uses only filters on a relatively large file (80 MBytes), and the third uses only filters on a small file (20 MBytes). By instructions I mean operations which are not in the Filters menu, e.g. image size, rotation, operations with color, layers etc. The results help us decide which Photoshop operations suit SMP better (the largest gain is achieved in the first diagram) and what the larger L2 cache is good for. Here we have the greatest large cache efficiency, 25%. By the way, in contrast to SMP, the twice larger L2 cache performed best in operations with filters.
As expected, there is no performance gain from the second processor. Interestingly, the performance doesn't grow with frequency either: the results of the P-III-S 1.13 GHz and 1.26 GHz are almost the same. The RAM is probably the bottleneck in this case. The excellent results shown by the Pentium 4 with its dual-channel RDRAM (3.2 GBps) in all media-encoding applications support this suggestion. The large cache efficiency for DivX is rather high, 20%.
MP3 codec GOGO (WAV-->MP3)
GOGO-no-coda 2.39c supports MMX, 3DNow!, SSE and even multiprocessor systems! The results are somewhat contradictory: with one processor the large cache efficiency is high (15%) and the performance depends little on frequency, but in the SMP system it is the other way round.
This is a traditional test of memory performance. However, on Socket 370 this subsystem can't provide the required speed. The advantage of the P-III-S 1.13 and 1.26 GHz comes from the 512 KByte cache. As for SMP, WinAce is not multithreaded. But that is natural, as archivers do not widely use this kind of optimization.
ServerSet III: HE-SL vs. LE3
As you can see, the results of the ServerSet III LE3 are shown on only two diagrams, because only in DivX and WinAce does the speed differ noticeably. Elsewhere the performance of the LE3 and HE-SL is identical, although the latter is considered more powerful (and is more expensive). The dual-channel memory controller of the HE-SL failed to give it a win in most applications.
Intel has extended its CPU line for low-end and mid-range multiprocessor systems, and the newcomers scored excellent results. The life cycle of server and workstation platforms is much longer than that of desktop systems (as a rule, about two years), and the new Pentium III-S arrived just in time: it shows excellent performance compared with the aging Coppermine. In the near future we will get a Pentium III-S clocked at 1.4 GHz, so inexpensive servers, including compact 1U/2U ones, won't be left without processors. It is clear that future workstations will be based on the Pentium 4 based Intel Xeon. But if you don't want to pay much today, there is an alternative with a good performance level, a low price and all the advantages of the 0.13-micron technology: low power consumption and low heat output.
The server Tualatins have one more advantage. They follow a classic evolutionary route with all its merits: predictability, a debugged architecture, support from software developers and the good old reputation of a stable solution whose pitfalls are either eliminated or, at least, well known.