As we predicted in the last article dedicated to AMD processors, signs of stability are showing in all their glory: the company's processors are announced at subjectively correct intervals (neither annoying us on the one hand, nor letting us forget on the other), and the clock rates of its series keep growing. Besides, new platforms and sockets also testify to AMD's confidence, as only those who are not afraid of threatening buyers with yet another motherboard upgrade can afford such announcements. So, let's speak of platforms first.

Socket 939: revolution or evolution?

First let's take a look at the previous AMD64 solutions that have existed for a long time: Socket 754 and Socket 940. Their reviews are available in our platform section, so let's just quickly revise their primary technical and marketing features, dividing them into two parts: "conditionally good" and "conditionally bad". Why conditionally? For example, because "bad" is usually a logical continuation of "good". So...

Socket 754

Good:
Bad:
Socket 940

Good:
Bad:
You can see that the disadvantages really are continuations of the benefits. Summing up, we get the general picture: Socket 754 embodies mainstream and rather high performance at a moderate price. Socket 940 personifies high-end, top, but expensive performance. It's easy to see the difference in the approaches of the two major x86 CPU vendors if you compare this pair with Intel's primary platforms - Socket 478 (Pentium 4/Celeron) and Socket 603/604 (Xeon). Let's do it.

Socket 478

Good:
Bad:
Socket 603/604

Good:
Bad:
So we see that the vendors have different focuses. While AMD separates mainstream from servers both by multiprocessor support and by some features that directly affect performance, Intel offers the more or less universal Socket 478 for everything from low-end to top processors. And the "standalone" Socket 604 doesn't provide top performance, but compensates for it with the number of processors. We actually think Intel's approach is the more correct one, but... there is no longer any reason to criticise AMD, because it seems to have understood this as well. Now let's proceed to one of today's heroes, the Socket 939 platform.

Socket 939

Good:
Bad:
We almost get Socket 478, don't we? Just without a clear low-end (like a Duron 64 or something) and without single-channel solutions that imply a low price. But it's easy to see that Socket A/Athlon XP still fills this gap all right. It's not very good for the vendor though, as an additional platform brings additional problems. But this is history. Besides, AMD has been declaring the benefits of Socket A for too long. We guess the answer to the question in the section header is becoming clear. Of course, it's evolution. Socket 939 is actually a correction of market positioning. This is not a reproach, as everyone makes mistakes. On the contrary, Socket 939 has much more potential and vitality than Socket 754, as it provides wider price and feature ranges.

Athlon 64 3800+ processor
We now get to the second hero of today's article - the Athlon 64 3800+ processor, the first for Socket 939. What are its features? Let's put together a table and compare it with AMD's two latest products, Athlon 64 3400+ (Socket 754) and Athlon 64 FX-53 (Socket 940).
AMD processor summary
Simply put, the new Athlon 64 3800+ is the same Athlon 64 FX-53, but with a halved L2 cache, faster HyperTransport, and support for ordinary (non-registered) DDR400. Considering the results of our comparison of AMD Athlon 64 3200+ with other top processors from Intel and AMD, we assume that halving the L2 cache won't be fatal. If the test results prove this true, AMD's actions turn out to be rather reasonable, as the new Athlon 64 for Socket 939 will combine the best of Athlon 64 and Athlon 64 FX at a lower price (due to the almost halved transistor count), while shedding a part of the L2 cache that software doesn't need that much anyway. So now it would be logical to analyse the test results that will prove us either right or wrong.

Testbed configuration

Testbed:
Software:
Test results

CPU RightMark

The new CPU RightMark has finally obtained correct SSE3 support, which affects the charts: Pentium 4 "Prescott" now has two lines instead of one. This didn't change the general picture though: SSE3 support slightly helped Prescott in the math unit, but couldn't help it catch up with the AMD processors. Intel processors (Prescott in particular) did well in the rendering test, though. There's no need to dwell on the overall result, as it has long been clear that it depends directly on performance in the hardest task, which is rendering. So we'll just say that the AMD group scored 12.6 fps, while the Intel group scored 15.2 fps (thanks to Prescott). Comparing the AMD processors with each other shows almost no difference between Athlon 64 3800+ and Athlon 64 FX-53, which supports our assumptions above. But we shouldn't rush, as we have analyzed the results of only one benchmark.

RightMark Memory Analyzer

These are the peak read bandwidth results obtained using the Software Prefetch optimization method, which is widespread in real software. Let's start with AMD. All three Athlon 64 processors showed very good results, achieving almost 100% of the theoretical peak bandwidth in single-channel (Athlon 64 3400+) and dual-channel (Athlon 64 FX-53 and 3800+) modes. However, the new Athlon 64 3800+ fell slightly behind the other two. Now let's take a look at Pentium 4. The new Prescott demonstrates excellent results equal to 100% of the theoretical bandwidth. Unfortunately, Northwood could achieve only 91% of it (but we expected this). Actually, if you look into Intel's papers, you will see that the Software Prefetch algorithm has been considerably improved in Prescott. We have written about this as well.
You may be confused by the fact that in two cases (Athlon 64 3400+ and Pentium 4 3.4E GHz) the actual bandwidth exceeds the theoretical one (!). But there's a simple explanation: the 1.5% "overshoot" fits well within the possible deviation of the actual memory clock rate from the nominal 200 MHz (DDR400). As for reaching theoretical peaks in practice, this is normal for RMMA and is not an error. Besides, this feature may be used in motherboard examinations, because it will clearly indicate both slight overclocking and serious design errors (when bandwidth doesn't reach its peak).
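The arithmetic behind these percentages can be sketched in a few lines of Python. The sketch assumes a 64-bit (8-byte) DDR channel with two transfers per clock, which is what gives DDR400 its nominal 3200 MB/s per channel. The function names are ours and purely illustrative; they are not part of RMMA.

```python
def peak_bandwidth_mb_s(memory_clock_mhz, channels, bus_width_bytes=8):
    """Theoretical peak bandwidth of DDR memory in MB/s (1 MB = 10**6 bytes).

    Assumption: DDR transfers data twice per clock over a 64-bit channel.
    """
    return memory_clock_mhz * 2 * bus_width_bytes * channels


def implied_clock_mhz(nominal_clock_mhz, overshoot_fraction):
    """Memory clock that would explain a measured result exceeding the
    theoretical peak by the given fraction (0.015 means 1.5%)."""
    return nominal_clock_mhz * (1 + overshoot_fraction)


# Nominal DDR400 (200 MHz) peaks for one and two channels.
single_channel = peak_bandwidth_mb_s(200, channels=1)
dual_channel = peak_bandwidth_mb_s(200, channels=2)
print("single-channel DDR400 peak, MB/s:", single_channel)
print("dual-channel DDR400 peak, MB/s:", dual_channel)

# A 1.5% excess over the theoretical peak corresponds to a clock
# only about 3 MHz above the nominal 200 MHz.
print("implied memory clock, MHz:", round(implied_clock_mhz(200, 0.015), 1))
```

In other words, a board that runs its memory at roughly 203 MHz instead of 200 MHz is enough to account for the excess, which is why the same measurement can expose slight factory overclocking.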
Now let's look at the peak write bandwidth, measured using the Non-Temporal Store optimization method, which is widespread when writing large blocks of data to memory while bypassing the processor cache to avoid polluting it. In this test the Athlon 64 processors are clear leaders. It's interesting that all three show almost the same results, about 96% of the theoretical peak. It seems that the integrated memory controller benefits Athlon 64, especially considering that both Pentium 4 processors do much worse, reaching only 67% of the theoretical maximum.
After the bandwidth, let's try to evaluate the minimal and peak latencies. But first, a few words about our measurement method. We took the "true" memory latency to be the latency of a pseudo-random walk over a 4MB memory block (deliberately larger than the L2 cache of any processor tested). "Pseudo-random" means that memory pages are visited in strictly linear, successive order (to minimize D-TLB misses), while the lines within a page are accessed randomly. To obtain the minimal and peak results we offloaded the BIU (Bus Interface Unit, or simply the L2-RAM bus) by inserting "blank" operations that are not related to memory access but introduce a certain time lag between successive accesses.

Enough methodology, let's analyze the data we gathered. First, we'll note the very interesting picture shown by all three Athlon 64 processors and... Pentium 4 Prescott: the spread between minimum and peak latency is 3-5 ns for all four. Pentium 4 Northwood, however, behaves differently and shows a whole 19 ns spread. But most surprising are the latency values themselves, which were lowest in the case of Prescott (24 ns). In order of increasing minimal latency it was followed by Northwood and then by the AMD Athlon 64 group. The latencies of the latter were about the same, except for FX-53, which showed a slightly worse result (most likely due to registered memory).

What can we say about the picture in general? The most natural explanation seems to be this: the "true" latencies are apparently much lower than the values indicated by tests, because real-life memory accesses result in BIU timeouts. Comparing Northwood and Prescott, you will see that Northwood has much longer timeouts (also indicated by the large latency spread, which means that the processor needs a considerable number of "blank" operations to offload the bus). At the same time, Prescott's small spread means that it hardly needs any offloading and that its BIU has been considerably improved.
As for the Athlon 64 processors, which lost this test, it seems that AMD's praised integrated memory controller, which did very well in peak write bandwidth, couldn't do the same here.
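To make the "pseudo-random walk" used in the latency measurements more concrete, here is a minimal Python sketch of the access order only. It is not RMMA's code: the 4 KB page size, the 64-byte line size and the function name are our assumptions, and Python cannot measure nanosecond latencies, so the sketch just builds the list of offsets that a low-level test would then traverse.

```python
import random

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB block, as in the article
PAGE_SIZE = 4096              # assumption: 4 KB memory pages
LINE_SIZE = 64                # assumption: 64-byte cache lines


def pseudo_random_walk(block_size=BLOCK_SIZE, page_size=PAGE_SIZE,
                       line_size=LINE_SIZE, seed=0):
    """Return byte offsets in pseudo-random walk order.

    Pages are visited strictly one after another (to keep D-TLB misses
    low), while the cache lines inside each page are visited in a
    shuffled order (to defeat hardware prefetching).
    """
    rng = random.Random(seed)
    lines_per_page = page_size // line_size
    offsets = []
    for page in range(block_size // page_size):
        order = list(range(lines_per_page))
        rng.shuffle(order)  # random order within the current page
        offsets.extend(page * page_size + line * line_size for line in order)
    return offsets


walk = pseudo_random_walk()
print("accesses in one pass:", len(walk))
print("first offsets:", walk[:8])
```

Every line of the block is touched exactly once per pass, so the whole 4MB is covered while no two consecutive accesses are predictable within a page.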
3ds max

The result is quite predictable: Athlon 64 3800+ could hardly have outdone Athlon 64 FX-53 (just how?), so, taking into account the results of the latter, it was reasonable to assume that the general picture would not change and both Pentium 4 processors would win slightly. Just slightly. The clear victory of Athlon 64 3800+ over the single-channel Athlon 64 3400+ indicates that halving L2 wasn't fatal, while the additional clock rate came in handy. It's a pity, but neither the A64 3400+ vs. A64 3800+ comparison nor the A64 3800+ vs. A64 FX-53 one lets us unambiguously name the new processor's decisive benefit - the increased clock rate or the dual-channel memory controller. So we'll assume it's the clock rate, because 3ds max's traditionally modest appetite for bandwidth is confirmed by many other test results.

Lightwave

In this case the picture is typical and is dictated by the benchmark itself: Lightwave's considerable optimization (in versions 7+) for Intel Pentium 4 has repeatedly been confirmed by test results. Therefore we weren't surprised by the victory of Northwood 3.4 GHz. However, as we have written before, this optimization seems to have been honed for Northwood so much that Prescott's advantages are almost negated. We can see that the newest cores at the same clock rate enabled Athlon 64 3800+ and Athlon 64 FX-53 to catch up with Socket 478. The former did almost as well as the latter despite its halved cache.

Adobe Photoshop

This benchmark is sensitive both to cache size (this is clearly seen if you compare the results of Athlon 64 3800+ and Athlon 64 FX-53, and less clearly in the case of Northwood 3.4 GHz vs. Prescott 3.4 GHz) and to memory performance (the victory of the dual-channel Athlon 64 FX-53 over the single-channel Athlon 64 3400+ can't be explained by the clock rate difference alone, because the gain is proportionally larger than this difference). It's a pity, but cache size seems to have more influence, because the Athlon 64 3800+ results are closer to those of its single-channel brother.
Neither the increased clock rate nor the dual-channel controller could help it win; they just compensated for the halved cache. Besides, the large cache gave Prescott a good result as well.

LAME

The LAME MP3 codec is rather indifferent to cache size (Athlon 64 3800+ and Athlon 64 FX-53 are even) and is sensitive only to clock rate. Prescott is an unfortunate exception though. Formally the winner is Northwood 3.4 GHz again, but AMD's top processors managed to catch up anyway. Note that it's not the first time that the Socket 478 platform is saved by the good old Northwood, while Prescott (which should be more advanced) loses to AMD's high-end despite having the same clock rate as Northwood.

OGG Encoder

It's general parity here as well, but this time AMD's top processors lead. And Prescott lags behind again. Intel just has to do something about it. Either provide a higher clock rate, or... Anyway, something has to be done, because in its current state this processor doesn't look like "NetBurst's radiant future". It wins occasionally, but a seemingly harmless application still trips it up.

DivX

Both processor groups are almost even. Moreover, in each group the two top CPUs have different cache sizes at the same clock rate, which unambiguously proves that this codec is indifferent to L2 cache size. Athlon 64 3400+ lags behind, but this lag matches the clock rate gap between this processor and A64 3800+ and A64 FX-53. The difference between the Intel and AMD top CPUs is not considerable enough to speak about.

Windows Media Video 9

In the Intel group Prescott is the champion in this test. The Windows Media Video 9 codec supports Hyper-Threading, and we have already written that many applications optimized for this technology feel good on processors with this core. In general, it's parity between the top rivals, only Northwood is behind.

Canopus ProCoder

We can clearly see sensitivity to cache here.
Prescott wins over Northwood, and even Athlon 64 3400+ wins over Athlon 64 3800+ despite its single-channel controller and lower clock rate. Athlon 64 FX-53 is the leader, but right now we are more interested in A64 3800+. The latter did rather well and at least outran all three Intel processors.

Mainconcept MPEG Encoder

This resembles Lightwave with all its benefits for Socket 478 but without any of the disadvantages. This application, highly optimized for Hyper-Threading, feels even better on Intel's new core, and AMD is powerless here for now. Maybe the encoder developers just don't know there's another CPU vendor? :)

(Win)RAR

Large-cache processors triumph, but Prescott did worst of all the "megabyters". Athlon 64 3800+ did its simple job of catching up with larger-cache AMD products and not losing to Intel.

7-zip

Again multi-threading affects the test results: both Intel processors take the lead thanks to their virtual multiprocessor nature. We deliberately included their results with Hyper-Threading disabled to make this clear. What can we say? AMD just doesn't have a similar technology yet...

Games

It's a convincing victory of the AMD64 platform in general. It's good that Athlon 64 3800+ is closer to Athlon 64 FX-53 than to the single-channel Athlon 64 3400+. This again supports our assumption that a 512KB cache is enough for many applications. Prescott doesn't disappoint much, but doesn't impress either, being slower than Northwood in three games at once.

Conclusions

Present

Here is the paradox: the Socket 478 platform generally withstands the force of Athlon 64 3800+, but... only piecemeal. In some cases AMD's new processor loses to Northwood, in others it loses to Prescott. But these cases don't coincide too often. In other words, if we take the best Socket 478 result in each test and compare it with the single result of Athlon 64 3800+, it would look like parity. (You can see there's no Intel Pentium 4 eXtreme Edition in this test, so it would be logical to exclude Athlon 64 FX-53 as well.)
But a user won't buy two S478 processors at once to swap them depending on the task! Taken separately, both Northwood 3.4 GHz and Prescott 3.4 GHz lose to the new AMD processor. So this time we consider Athlon 64 3800+ a convincing winner and, as a complete solution, name it the most powerful "truly desktop" x86 processor at the moment (excluding extreme variants like Athlon 64 FX and Pentium 4 XE). Well, isn't this a major event? However, you shouldn't overestimate it, because market inertia is great and the main rival has enough time to answer. We guess today's champion has about half a year or slightly more. But AMD won't be sitting still all this time either... Intel has two ways to give a serious answer: either release a processor that will outrun (or catch up with) AMD's product through a performance leap, or... abruptly turn aside and explain that it's still at the front (it's just that the front is elsewhere). Considering the latest events (the cancellation of Tejas and rumours about the possible arrival of a Banias/Dothan-like core in Intel's desktop camp), the second variant seems more probable. In this case, it would be interesting to imagine where the front is now and why it's there. :)

Future?..

If we consider Socket 939 an analog of the "universal desktop" Socket 478 (which proved the success of this approach by its long happy life), its main problem is the "low-end problem" we mentioned before. It has two main parts: competition between AMD's own platforms, and the mismatch between some technical features of current Socket 939 processors and the very meaning of low-end. Let's describe these in order. Platform competition will obviously mean that inexpensive Socket 939 processors will almost negate the attraction of Socket 754 (at least for those without S754 boards). Besides, cheap S939 processors will surely kill Socket A as well. We don't think it's a problem, but AMD might think differently...
The second part of the problem is that S939 was initially designed for a dual-channel memory controller, which increases the production cost of S939 processors and, with it, their market price. However... who said that it would be impossible to release a single-channel processor for Socket 939? Even today all AMD64 products can work with a single memory module, simply not using the second channel. And now let's think about the general sense of Socket 754 and Socket A existing alongside Socket 939. And what about half a year after the announcement? There's just no sense in continuing to mess with Socket 754. At least, it's not logical. It's not good for those users who have already bought S754-based systems, but a la guerre comme a la guerre, and Socket 754 is to pass away... The only "advantage" of S754 is the availability of processors combining a large L2 cache with a single-channel memory controller. But against the background of the new Athlon 64 (dual-channel ordinary DDR400 memory + 512KB L2) and the future Athlon 64 FX (the same dual-channel ordinary DDR400 memory + doubled L2), a single-channel Athlon 64 with a large L2 cache will look like an unfortunate hybrid of high-end and low-end that took everything useless (for the mid-range) from both segments. Moreover, even if AMD keeps this ambiguous branch alive, nothing will prevent it from releasing such processors for Socket 939! The possibility of using a dual-channel memory controller doesn't imply an obligation! There could even be "special" single-channel Socket 939 boards. Therefore, the only reasonable argument for S754 could be its price, lower for some reason. But there is no such reason. S939 also has 4-layer boards, and the packages are similar. And we don't believe in a considerable price increase due to 25% more pins. Especially considering that single-channel S939 processors might have fewer pins, because this won't prevent inserting them into the socket. There's one argument left: the support of current Socket 754 users.
But they are few, because AMD didn't hurry to ship S754 processors, as if foreseeing this platform's death (they knew! :)). Besides, while caring for users, one shouldn't forget that users don't forgive mistakes. Supporting yet another (futureless!) platform will mean additional expenses of money and effort that will negatively affect new developments, Socket 939 production, etc. And users won't be glad that all these delays are caused by Socket 754 philanthropy. So it's better to raise a storm in a teacup once and get rid of a platform with no prospects than to accumulate problems that will affect all users in general.

But things are more complicated with Socket A. First, we can safely assume that such processors have a considerably cheaper die and package. It will hardly be possible to bring the production cost of a hypothetical low-end AMD64 processor down to that of Athlon XP. At least, not before the move to the 90nm process. We don't know if AMD can hold the bottom of the processor price range while increasing production cost and therefore losing some profit. Perhaps it's just not ready for such expenses at all. In this case, saving Socket A has nothing to do with philanthropy and is purely pragmatic. Besides, it's not clear whether all Athlon XP production lines are ready to move to AMD64 products.

Another aspect is purely political. As mentioned above, AMD has long been promising lengthy support for Socket A. But... who needs a "chainsaw massacre" when the patient might die of old age anyway? We are almost 100% sure that if any Socket 939 processor is offered for about $70, the meaninglessness of Socket A will become obvious to everyone, so chipset and motherboard makers, alongside users, will make at least the same effort as AMD to speed up the replacement. And AMD will just have to respond with pathos and say something about its willingness to support going unwanted.
Therefore the end of Socket A depends exclusively on either AMD's readiness for some profit loss on low-end CPU sales or the transition to the 90nm process. We'll just add that we think the above way is not only possible, but also the most strategically correct. The most successful platforms have always been universal ones, providing the same processor socket for a secretary's typewriter, a heavyweight graphics workstation, and an advanced gaming system alike. The first such platform was Slot 1, followed by Socket 370. AMD has its own variant - Socket A - which has become hopelessly obsolete. Now only Intel has a suitable universal platform - Socket 478. So it's high time for AMD to obtain such a product as well. Especially at a time when the main rival is wobbling slightly, the future of Socket 478 is uncertain, and everyone is waiting for a fully-fledged solution capable of living a more or less long life. We'd like to hope this is the actual purpose of Socket 939. As for the support of Socket 754 and Socket A, no one has ever managed to embrace the boundless. We believe it would be strategically incorrect to disperse efforts at the moment. It would be even worse to make the separation of low-end from mid-range + high-end a concept for the future. It's just not comfortable for anyone, including AMD itself, chipset and motherboard makers, distributors and users. We're not saying that everything except Socket 939 (and 940, for high-level workstations and servers) should be dropped right now. This should just be made the strategy.

Conclusion

In general, the Socket 939 platform and the first processor for it arouse only positive emotions. Without introducing anything truly innovative, they are simply a correct reshuffle of previously separate processor and chipset features based on AMD64. This reshuffle enabled AMD to make Socket 939 and the updated Athlon 64 even more attractive than Socket 754 + Athlon 64 and the single-processor Socket 940 + Athlon 64 FX.
It's a very good reserve for the future that looks even better against the background of the rival camp. But let's not draw high-flown conclusions, as Intel still has time to prepare. Not much, but it does...

Stanislav Garmatyuk (nawhi@ixbt.com)
Dmitry Mayorov (destrax@ixbt.com)
Dmitry Besedin (dmitri_b@ixbt.com)

01.06.2004
Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.