Design
At the most basic level, the Itanium design is similar to RISC. That is, the core logic consists of a small set of instructions that are designed to run very fast. Like most modern CPUs, the Itanium runs several execution units (called cores here) in parallel for extra speed, a design known as superscalar. Where the Itanium breaks with current RISC design philosophy is in how it feeds instructions into those core units.
In a traditional design a complex decoder system examines the instructions as they flow through the pipeline and sees which can be fed off to operate in parallel across the cores. For instance, the two instructions
A = B + C
D = F + G
do not affect each other, and so they can be fed into two cores to be run at the same time.
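Predicting which code can and cannot be split up this way is in fact a very complex task. In many cases the inputs to one line depend on the output of another, but only if some other condition is true. For instance, consider this slight modification of the example above:
A = B + C; IF A==5 THEN D = F + G
In this case the two calculations remain independent of each other, but the second command requires the result of the first calculation in order to know whether it should be run at all.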
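The two cases can be told apart mechanically by comparing what each instruction reads and writes. The following is a minimal sketch of that check, not the logic of any real decoder; the `Instr` class, its `guard` field, and the `independent` function are names invented for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Instr:
    """A toy instruction: one destination, some sources, an optional guard."""
    dest: str
    srcs: FrozenSet[str]
    guard: Optional[str] = None  # variable tested by the IF, if any (assumption)

    def reads(self) -> FrozenSet[str]:
        # A guarded instruction also reads the variable its condition tests.
        return self.srcs | ({self.guard} if self.guard else frozenset())


def independent(first: Instr, second: Instr) -> bool:
    """True if `second` may run in parallel with `first` in this toy model."""
    raw = first.dest in second.reads()   # second reads what first writes
    war = second.dest in first.reads()   # second overwrites an input of first
    waw = first.dest == second.dest      # both write the same variable
    return not (raw or war or waw)


a = Instr("A", frozenset({"B", "C"}))               # A = B + C
d = Instr("D", frozenset({"F", "G"}))               # D = F + G
d_if = Instr("D", frozenset({"F", "G"}), guard="A")  # IF A==5 THEN D = F + G

print(independent(a, d))     # True
print(independent(a, d_if))  # False
```

The guarded version is rejected only because its condition reads `A`; the addition itself still uses nothing that the first instruction produces.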
In these cases the circuitry on the CPU typically "guesses" what the condition will be. In something like 90% of all cases, an IF will be taken, suggesting that in our example the second half of the command can be safely fed into another core. However, getting the guess wrong can cause a significant performance hit, because the result has to be thrown out and the CPU must wait for the results of the "right" command to be calculated. Much of the performance improvement in modern CPUs is due to better prediction logic, but lately the improvements have begun to slow.
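The cost of a wrong guess can be put into rough numbers. The sketch below treats the 90% figure as the fraction of correct guesses and charges a fixed penalty for each wrong one. The one-cycle base cost and ten-cycle penalty are illustrative assumptions, not figures for any particular processor.

```python
def average_branch_cost(hit_rate: float, base_cycles: float,
                        penalty_cycles: float) -> float:
    """Expected cycles per branch when each wrong guess costs extra cycles."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return base_cycles + (1.0 - hit_rate) * penalty_cycles


# Assumed values: 1 cycle for a correctly guessed branch, 10 extra if wrong.
for rate in (0.90, 0.95, 0.99):
    cost = average_branch_cost(rate, base_cycles=1.0, penalty_cycles=10.0)
    print(f"{rate:.0%} correct: about {cost:.2f} cycles per branch")
```

Under these assumptions, even a predictor that is right nine times in ten roughly doubles the average cost of a branch, which is why small gains in accuracy matter.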
Itanium instead relies on the compiler for this task. Even before the program is fed into the CPU, the compiler examines the code and makes the same sorts of decisions that would otherwise happen at "run time" on the chip itself. Once it has decided what paths to take, it gathers up the instructions it knows can be run in parallel, bundles them into one larger instruction, and then stores it in that form in the program, hence the name VLIW, or "very long instruction word".
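The bundling step can be pictured as a greedy pass over the instruction stream. This is a simplified sketch under stated assumptions, not how a production IA-64 compiler schedules code: instructions are modelled as hypothetical `(destination, sources)` pairs, and a new bundle is started whenever the next instruction conflicts with one already in the current bundle. The default width of three matches the three instruction slots of an IA-64 bundle but is only a parameter here.

```python
from typing import List, Set, Tuple

# (destination, sources): a toy stand-in for a real instruction.
Instr = Tuple[str, Set[str]]


def conflicts(a: Instr, b: Instr) -> bool:
    """True if a and b share a variable in a way that forces an order."""
    a_dest, a_srcs = a
    b_dest, b_srcs = b
    return a_dest in b_srcs or b_dest in a_srcs or a_dest == b_dest


def bundle(program: List[Instr], width: int = 3) -> List[List[Instr]]:
    """Greedily pack consecutive, mutually independent instructions."""
    if width < 1:
        raise ValueError("width must be at least 1")
    bundles: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in program:
        full = len(current) == width
        if full or any(conflicts(prev, instr) for prev in current):
            bundles.append(current)  # close the bundle and start a new one
            current = []
        current.append(instr)
    if current:
        bundles.append(current)
    return bundles


program = [
    ("A", {"B", "C"}),
    ("D", {"F", "G"}),
    ("E", {"A", "D"}),  # needs both earlier results
    ("H", {"B", "F"}),
]
for group in bundle(program):
    print([dest for dest, _ in group])
# ['A', 'D']
# ['E', 'H']
```

A real compiler would also reorder instructions to fill bundles more tightly; this sketch keeps program order to stay short.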
Moving this task from the CPU to the compiler has several advantages. Firstly, the compiler can spend considerably more time examining the code, a luxury the chip itself doesn't have because it has to finish as quickly as possible. Thus the compiler's decisions can be considerably more accurate than those made by the chip's circuitry. Secondly, the prediction circuitry is quite complex, and this system reduces that complexity enormously. The chip no longer has to examine anything; it simply breaks the instruction apart again and feeds the pieces off to the cores.
The downside in this case is that a running program's behaviour is not always obvious in the code used to generate it. That means that it is possible for the compiler to "get it wrong", perhaps (in theory) even more often than the same logic placed on the CPU. Thus the design relies heavily on the performance of the compilers, the trade-off being to decrease microprocessor hardware complexity by increasing compiler software complexity.
Implementation
Design of the Itanium series started in 1994, based on pioneering research by Hewlett-Packard into VLIW designs. The original HP design was "clean", but that is to be expected from a design that was never meant to be used in a production setting. After Intel became involved, the cleanliness of the original design was marred by the addition of several new capabilities needed for "real world" use, notably the ability to run IA-32 instructions, and HP added its own features to ease migration from the HP-PA.
The project to produce a production-quality Itanium is still ongoing. Originally planned for release in 1997, the schedule has slipped several times. In 2001 the first version, code-named Merced, shipped. Speeds of 733 and 800 MHz were offered, with a choice of 2 MB or 4 MB cache. Prices ranged from US$1200 to over US$4000. However, performance was disappointing. In IA-64 mode, it performed only slightly better than an equivalently clocked x86 design, and when running x86 code, performance was extremely poor, about 1/8th that of a similarly clocked x86 processor. Soon even Intel suggested it wasn't a "real" release.
The main (though by no means only) problem with the original Itanium was that the latency of its third-level cache was extremely high, which greatly reduced the amount of usable bandwidth. Intel was forced to use an on-die solution for the next design, and at the same time lowered the primary and secondary cache latencies to the lowest of any modern design (apart from IBM's Power4). It also upgraded the Itanium's 64-bit 266 MHz bus to a 128-bit 400 MHz bus, tripling system bandwidth.
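The "tripling" figure follows directly from the bus widths and clock rates. As a quick check, assuming the quoted MHz values are effective transfer rates with one transfer per quoted clock (the function name is made up for this example):

```python
def peak_bandwidth_mb_s(bus_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per quoted clock."""
    return (bus_bits / 8) * clock_mhz


old = peak_bandwidth_mb_s(64, 266)    # 8 bytes x 266 MHz  = 2128 MB/s
new = peak_bandwidth_mb_s(128, 400)   # 16 bytes x 400 MHz = 6400 MB/s
print(old, new, round(new / old, 2))  # 2128.0 6400.0 3.01
```

Doubling the width and raising the clock by about half gives a peak rate almost exactly three times higher.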
The second-generation Itanium chips (Itanium 2) were launched in July 2002. In IA-64 mode, integer performance was the best of any design at the time of launch, while floating-point performance was second only to Power4. Available clock speeds and L3 sizes were 1 GHz with 3 MB and 900 MHz with 1.5 MB. Unfortunately, x86 performance, while improved, was still vastly slower than that of current x86 processors; Itanium 2's x86 performance is similar to that of a Pentium II.
Approximately one year later the second revision of the Itanium 2 design was released. Available versions are 1.5 GHz with 6 MB L3, 1.4 GHz with 4 MB, and 1.3 GHz with 3 MB. At the time of release, the 1.5 GHz version posted the highest uniprocessor SPECfp and SPECint scores of any shipping chip.
The most recent members of the Itanium family, released in 3Q 2003, are a low-cost Itanium 2 at 1.4 GHz with 1.5 MB L3 and a low-power version at 1 GHz with 1.5 MB L3. The former is targeted at workstations, lower-end servers, and HPC clusters, while the latter is targeted at blade servers and other "dense" computers.
A number of other CPU lines have been end-of-lifed in favor of Itanium. HP's DEC Alpha and PA-RISC lines are slated for retirement in favor of Itanium hardware. As of 2003, HP plans to continue supporting the older lines for about five years. SGI originally intended to phase out its MIPS architecture CPUs in favor of Itanium as soon as possible, but its plans are now unclear and a two-architecture product line is likely for the near future. SGI's Itanium line is doing well, but its IRIX technology and installed base are significant.
Software support has improved considerably since the release of the Itanium 2. Ported operating systems include HP-UX, Linux, and Microsoft Windows. Ports of OpenVMS and FreeBSD are in progress. HP eventually wants to move Tru64 customers to HP-UX on Itanium rather than porting Tru64 itself. Oracle and DB2 ports are available, among others.
Concerns
As of 2002, the Itanium is the second most expensive computing project in history, behind only the IBM System/360 (which, it's important to note, was a huge success). Nevertheless there are serious doubts about the future of the product, centering mainly on two problems.
The first is that the benefits in simplicity, one of the main goals of the VLIW design, are not at all evident in the Itanium. The second-generation Itanium has a massive 221 million transistors drawing an equally massive 130 watts of power. For this same sort of budget the IBM POWER delivers four whole 64-bit CPUs on a single processor module. The power problem is beginning to be addressed as of 2003, but with the addition of more and more L3 cache, the transistor count is only increasing.
The second is that designing a compiler which allows the Itanium to perform up to its potential has proved to be a difficult task and a very serious issue. Improvements are steadily being made; still, porting software to Itanium has a reputation for difficulty.
The next step for the Itanium family should be an Itanium 2 with 9 MB L3 at perhaps 1.8 GHz. After that, a dual-core (à la POWER4), billion-transistor design is expected in 2005, followed in perhaps 2007 by a chip codenamed "Tanglewood", which is being designed by many of the engineers from the cancelled Alpha EV8 project and which could outperform the current (1.3 GHz and up) Itanium 2 by a factor of 10.
Critics of the Itanium processor have labeled it the "Itanic". Intel will be in a difficult position if the Itanium processor is a disappointment, as the need for 64-bit architecture in commodity servers is now pressing, and the need for a 64-bit architecture in personal computers is only a few years away.
A possible architectural threat to Intel now exists in the form of AMD's AMD64 architecture. AMD64 follows Intel's earlier practice of extending a single architecture, starting with the 16-bit 8086, then moving from 16 bits to the 32-bit 80386 and beyond, without ever removing backwards compatibility. The AMD64 architecture extends the 32-bit x86 architecture by adding 64-bit registers, with full 32-bit and 16-bit compatibility modes for earlier software. AMD64 systems began shipping in mid-2003. Performance is very good, but the processor, called the Opteron, appears at this time to be more of a competitor to Intel's 32-bit server chips than to the Itanium. The largest non-clustered systems currently being shipped have 4 processors (versus Itanium 2's 64), and the only native 64-bit server OS currently available for them is Linux.
The failure of Itanium would also have a substantial impact on manufacturers such as HP who have announced that they will abandon their proprietary CPU architectures for the Itanium.
See also List of Intel microprocessors
While some efforts have been made to improve the execution speed of x86 code, it remains too slow for many purposes. How much this matters is debatable, since not many people are buying Itanium systems to run x86 code on. However, Intel plans to replace the hardware x86 translation unit with a software emulation package in the spirit of Digital's FX!32 for Alpha. Faster execution and decreased chip complexity are expected. Software emulation of legacy processors has precedent in enterprise computing, being used in VAX and S/390 machines, among others.