14. The Athlon XP
This is AMD’s competitor to the Pentium. The competition is concentrating the minds of both companies and greatly benefiting the rest of us.
AMD has been creeping up on Intel for several years and, finally, the Athlon’s 37 million transistors are giving the Pentium a serious problem. It is usually cheaper and, in many tests, faster. The thought behind the Athlon is not to compete on clock speed but to go for real speed by doing more work in each clock cycle. Even so, the Athlon XP is now competing head-to-head on speed, having matched the Pentium at 2.8 GHz using the same 0.13 micron technology, though with a different internal design that ensures (of course) that the two microprocessors are not pin-for-pin compatible. The Athlon includes a system of protection against thermal overload similar to that in the Pentium.
An outline of the Athlon XP is shown in Figure 14.1.
Figure 14.1 The Athlon XP processor
For maximum speed the caches are on-chip. This eliminates the travel-time delay that would otherwise occur as the data is moved.
Incoming information from the external memory and the surrounding hardware arrives on the system bus and is fed into a 64 kB instruction cache and a separate 64 kB data cache. The data cache feeds data into the L2 cache, which is somewhat larger at 256 kB. The design ensures that the L2 cache does not duplicate any of the information stored in the data cache, so we effectively have a 384 kB local high-speed storage area (64 + 64 + 256 kB).
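The arithmetic behind the 384 kB figure can be sketched in a few lines of Python. The cache sizes come from the text; the function name and the 'inclusive' alternative (where the L2 cache also keeps a copy of everything in the L1 caches) are assumptions added purely for comparison.

```python
# Cache sizes in kB, as quoted in the text.
L1_INSTRUCTION_KB = 64
L1_DATA_KB = 64
L2_KB = 256

def effective_capacity_kb(l1_kb, l2_kb, exclusive):
    """Unique information the two cache levels can hold between them."""
    # Exclusive: nothing is stored in both levels, so the sizes add.
    # Inclusive (assumed alternative): L2 also holds a copy of L1,
    # so the larger level sets the limit.
    return l1_kb + l2_kb if exclusive else max(l1_kb, l2_kb)

l1_total = L1_INSTRUCTION_KB + L1_DATA_KB
exclusive_total = effective_capacity_kb(l1_total, L2_KB, exclusive=True)
inclusive_total = effective_capacity_kb(l1_total, L2_KB, exclusive=False)

print(exclusive_total)  # 384
print(inclusive_total)  # 256
```

Avoiding duplication is worth half as much storage again as the L2 cache alone would provide.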
As with all current microprocessors, great care is taken to guess the likely result of each branch instruction. Such instructions usually produce two alternative routes for the program. They answer questions like ‘is the result zero?’ and the answer will determine what happens next. If we always wait until the question is answered and only then load the instructions for the next bit of the program, much time is wasted, as we saw in the earlier microprocessor designs. If we guess correctly, we can pre-load the next part of the program and get started on it. The branch prediction circuitry does the guessing. If it gets it wrong, the old data is ditched and replaced.
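To see how such guessing can pay off, here is a minimal Python sketch of one common textbook scheme, a two-bit saturating counter. This is not a description of the Athlon’s actual prediction circuitry, which the text does not detail; the function name and the example branch pattern are hypothetical.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Return how many branch outcomes were guessed correctly.

    The counter holds a value from 0 to 3. Values 0 and 1 predict
    'not taken'; values 2 and 3 predict 'taken'.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step towards the real outcome, stopping at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# A hypothetical loop branch: taken nine times, then it falls through once.
loop_branch = [True] * 9 + [False]
hits = simulate_two_bit_predictor(loop_branch)
print(f"{hits} of {len(loop_branch)} predicted correctly")
```

For a branch that nearly always goes the same way, such as the one closing a loop, even this simple scheme guesses wrongly only when the loop finally exits.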
Hardware data prefetch
This is a further form of prediction, similar to branch prediction, in which the incoming instructions are monitored and, while they are still arriving, the data that will be needed is guessed at and loaded into the data cache. The Athlon therefore loads data before it knows that it will be needed. As with branch prediction, incorrect data has to be overwritten but, on balance, it speeds up the data flow.
To make the most of each cycle of its slower clock, the Athlon has three instruction decoders that can run independently. Each of these can handle three operations per clock cycle, giving an overall throughput of nine operations per clock cycle, which is significantly greater than the six operations per clock cycle of the Pentium.
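The idea can be illustrated with a small Python sketch of a stride-based prefetcher. The policy used here (fetch one step ahead once two consecutive accesses are the same distance apart) is an assumption chosen for simplicity, not the Athlon’s documented mechanism, and all the names are hypothetical.

```python
def run_stride_prefetcher(addresses):
    """Return (hits, wasted) for a sequence of data addresses.

    hits   = accesses whose address had already been prefetched
    wasted = prefetched addresses the program never touched
    """
    prefetched = set()
    hits = 0
    last_address = None
    last_stride = None
    for address in addresses:
        if address in prefetched:
            hits += 1
        if last_address is not None:
            stride = address - last_address
            # Same non-zero step twice in a row: guess it will continue.
            if stride != 0 and stride == last_stride:
                prefetched.add(address + stride)
            last_stride = stride
        last_address = address
    wasted = len(prefetched - set(addresses))
    return hits, wasted

# Walking through memory in regular 64-byte steps.
regular = list(range(0, 64 * 8, 64))
print(run_stride_prefetcher(regular))    # (5, 1)

# An irregular pattern gives the prefetcher nothing to go on.
irregular = [0, 100, 7, 350, 20]
print(run_stride_prefetcher(irregular))  # (0, 0)
```

In the regular case, five of the eight accesses find their data already waiting, at the cost of one wasted fetch beyond the end of the sequence.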
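The trade-off between work per cycle and clock speed can be put into numbers. In the Python sketch below, the operations-per-cycle figures come from the text, but the two clock speeds are hypothetical values picked only to show the break-even point. These are peak figures; real programs fall short of them whenever a cache miss or a wrong branch guess stalls the processor.

```python
ATHLON_DECODERS = 3
OPS_PER_DECODER = 3
PENTIUM_OPS_PER_CYCLE = 6

athlon_ops_per_cycle = ATHLON_DECODERS * OPS_PER_DECODER

def peak_ops_per_second(ops_per_cycle, clock_hz):
    """Theoretical ceiling: operations per cycle times cycles per second."""
    return ops_per_cycle * clock_hz

# How much faster the Pentium clock must run to match the Athlon's peak.
break_even_ratio = athlon_ops_per_cycle / PENTIUM_OPS_PER_CYCLE

# Hypothetical clock speeds, chosen to sit exactly at break-even.
athlon_peak = peak_ops_per_second(athlon_ops_per_cycle, 2_000_000_000)
pentium_peak = peak_ops_per_second(PENTIUM_OPS_PER_CYCLE, 3_000_000_000)

print(athlon_ops_per_cycle)         # 9
print(break_even_ratio)             # 1.5
print(athlon_peak == pentium_peak)  # True
```

On these figures, the Pentium would need a clock half as fast again as the Athlon’s just to draw level on peak throughput.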
Pipelines and instructions
The Athlon has three independent integer pipelines and three similar floating-point pipelines, whereas the Pentium has four pipelines for integers but only two for floating point. The three floating-point execution units simultaneously handle:
(a) store and load functions
(b) add functions
(c) multiply functions, including all the Intel MMX (multimedia extensions) instructions plus AMD’s own SIMD (single instruction multiple data) instructions, to provide full support for SSE (streaming SIMD extensions) and more lifelike 3D imaging and graphics. AMD’s name for these new instructions is ‘3D NOW!’ technology. (MMX is an Intel trademark; 3D NOW! is an AMD trademark.)
The state of the competition
The Pentium had a ‘rapid execution engine’ with two ALUs (arithmetic and logic units) for the integer instructions, each clocked at twice the core processor speed, and it ran a front side bus (FSB) at 533 MHz, whereas the Athlon XP had only a 333 MHz FSB. This continues the pattern of the Pentium claiming the headline figure for speed. However, on balance, the Athlon is, by most tests, slightly faster than the Pentium.
That was written yesterday. This morning came the news that Intel has burst through the 3 GHz barrier (just) with a 3.06 GHz device. This, they say, includes hyper-threading, a technique that involves splitting a program into units that can be run simultaneously. It allows the micro to run multiple applications at the same time, with the processor appearing to be two processors. Such multitasking is available in Windows XP and Linux and probably all their successors. So where does this leave the future? Are we going to go for greater and greater speeds, or will we develop multitasking so that we effectively have greater and greater numbers of micros sharing the work? I have a feeling that task sharing will be the answer.
It seems likely that Intel is now back out in front.
Exciting times ahead…
Almost immediately, AMD has replied with what appears to be another significant step forward: 64-bit computing.
The microprocessor, which until now has been living under the codename ‘Hammer’, will be sold under the more user-friendly name of ‘AMD Athlon 64’. It will be available in mid-2003 and will join the PowerPC 970 in the ‘64’ club. It will be able to run 64-bit, 32-bit and 16-bit applications without any speed penalty and so avoid the cost of buying new software.
The only technical information included in the initial announcement is a new bus system using ‘hypertransport’ technology, which AMD claims increases throughput by 50% over existing designs. Intel will have something to say about that claim, I expect. The clock speed of the first batch will be little different from the XP, at around 2.8 GHz, but the design will provide more scope for development and will be able to run programs at a higher speed.
Really exciting times ahead… over 3 GHz clock speeds, 64-bit computing and multiple instructions being carried out simultaneously. Sounds good.
The desktop speed Olympics is contested by the PowerPC 970, the Pentium 4 and the Athlon 64, although the computer market is dominated by the IBM clones, leaving just a minor role for the PowerPC 970 in the Apple Mac. As we saw earlier, the result of any speed test depends on the nature of the test. Having said that, and at the risk of irritating the fans of each, in the race for the overall speed title the Athlon 64, when it is available, looks set to be the winner, with the other two virtually shoulder to shoulder a pace behind. But it depends on the test chosen, and we know that any speed king will be dethroned very quickly.
A (very) approximate comparison based on the currently available information is shown in Table 14.1.
Table 14.1 Approximate comparison [surviving entries: 0.13/0.09 microns; OSX, IBM, Linux]