The i860 combined a number of features that were unusual at the time, most notably its powerful support for issuing several operations in parallel. The design included a 32-bit ALU along with a 64-bit FPU that was itself built in three parts: an adder, a multiplier, and a graphics unit. The system had separate pipelines for the ALU, adder and multiplier, and could issue up to three operations per clock.
All of the buses were at least 64 bits wide; the internal bus to the cache, for instance, was 128 bits wide. Both units had thirty-two 32-bit registers, but the FPU could also use its set as sixteen 64-bit registers. Instructions for the ALU were fetched two at a time to use the full external bus. Intel always referred to the design as the "i860 64-Bit Microprocessor".
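The FPU's dual view of its register file can be modeled in a few lines. This is a minimal sketch that assumes 64-bit register n is formed from the 32-bit pair (2n, 2n+1), with the even register holding the low-order word; the class name FPRegisterFile and that word ordering are assumptions for illustration, not taken from Intel documentation.

```python
import struct


class FPRegisterFile:
    """Toy model of 32 x 32-bit FP registers that can also be read as 16 x 64-bit pairs.

    Assumed (hypothetical) layout: 64-bit register n = 32-bit registers 2n and 2n+1,
    with the even register holding the low-order 32 bits.
    """

    def __init__(self) -> None:
        self.regs = [0] * 32  # each entry stores a raw 32-bit pattern

    def read32(self, r: int) -> int:
        return self.regs[r]

    def write32(self, r: int, value: int) -> None:
        self.regs[r] = value & 0xFFFFFFFF

    def read64(self, n: int) -> int:
        low, high = self.regs[2 * n], self.regs[2 * n + 1]
        return (high << 32) | low

    def write64(self, n: int, value: int) -> None:
        self.write32(2 * n, value)            # low-order word
        self.write32(2 * n + 1, value >> 32)  # high-order word

    def write_double(self, n: int, x: float) -> None:
        # Reinterpret the IEEE 754 double as its 64-bit integer bit pattern.
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        self.write64(n, bits)

    def read_double(self, n: int) -> float:
        (x,) = struct.unpack("<d", struct.pack("<Q", self.read64(n)))
        return x


if __name__ == "__main__":
    rf = FPRegisterFile()
    rf.write_double(1, 1.0)  # occupies 32-bit registers 2 and 3
    print(hex(rf.read32(2)), hex(rf.read32(3)))
```

Under this assumed layout, writing 1.0 (bit pattern 0x3FF0000000000000) into 64-bit register 1 leaves 0x0 in register 2 and 0x3ff00000 in register 3; both views always refer to the same underlying storage.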
The graphics unit was unique for the era. It was essentially a 64-bit integer unit that operated on the FPU registers, and in addition to basic 64-bit integer math it supported a number of SIMD-like instructions. In this respect it foreshadowed the MMX functionality Intel later added to its x86 processors.
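To illustrate what "SIMD-like" means for a 64-bit integer unit, the sketch below treats one 64-bit word as eight packed 8-bit lanes and adds two such words lane by lane, with each lane wrapping modulo 256 and no carry crossing into its neighbor. This is the generic SIMD-within-a-register (SWAR) technique, shown only to illustrate the idea; it does not reproduce the semantics of any specific i860 graphics instruction, and the helper names are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of every 8-bit lane
HIGH = 0x8080808080808080  # top bit of every 8-bit lane


def packed_add8(a: int, b: int) -> int:
    """Add the eight 8-bit lanes of two 64-bit words, each lane wrapping mod 256."""
    # Adding only the low 7 bits of each lane can carry into bit 7 of that lane
    # but never into the next lane (0x7F + 0x7F = 0xFE).
    partial = (a & LOW7) + (b & LOW7)
    # Each lane's top bit is a7 XOR b7 XOR carry-in; no carry-out is produced.
    return partial ^ ((a ^ b) & HIGH)


def pack8(lanes: list[int]) -> int:
    """Pack eight byte values into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word


def unpack8(word: int) -> list[int]:
    """Split a 64-bit word back into its eight byte lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


if __name__ == "__main__":
    a = pack8([250, 1, 2, 3, 4, 5, 6, 7])
    b = pack8([10, 1, 1, 1, 1, 1, 1, 1])
    print(unpack8(packed_add8(a, b)))  # lane 0 wraps to 4; lane 1 receives no carry
```

A single 64-bit add thus does the work of eight byte-sized adds, which is the kind of pixel-oriented parallelism the graphics unit and, later, MMX were aimed at.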
The chip was released in two versions: the basic XR and the XP (code name N11). The XP added larger on-chip caches, support for a second-level cache, faster buses, and hardware support for bus snooping to keep caches consistent in parallel computing systems. The XR ran at 25 or 40 MHz, and a process shrink for the XP (from 1 micron to 0.8 micron) raised its clock to 40 and 50 MHz. Both ran the same instruction set.
Paper performance was impressive for a single-chip solution; real-world performance, however, was anything but. Although the XP versions could theoretically peak at about 60 MFLOPS, hand-coded assembly reached only about 40 MFLOPS, and most compilers had difficulty reaching even 10. This was due primarily to the state of compiler technology at the time: the i860 exposed its pipelines to software, so the compiler had to schedule instructions to keep the integer unit, adder and multiplier busy, and the compilers then available left most of that capacity unused.
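One way to see why instruction scheduling mattered so much is a toy cycle count for a pipelined unit. The sketch below assumes a hypothetical 3-stage floating-point adder that can accept one new independent operation per cycle; the depth and the two scheduling policies are illustrative assumptions, not measured i860 behavior. A scheduler that keeps the pipeline full finishes n independent adds in n + depth - 1 cycles, while one that waits for each result before issuing the next needs n × depth cycles. The last helper simply expresses the MFLOPS figures quoted above as fractions of the peak.

```python
def cycles_pipelined(n_ops: int, depth: int) -> int:
    """Cycles to finish n independent ops when a new op enters every cycle."""
    # The first result appears after `depth` cycles; each later one a cycle after that.
    return 0 if n_ops == 0 else n_ops + depth - 1


def cycles_serialized(n_ops: int, depth: int) -> int:
    """Cycles when each op waits for the previous result to leave the pipeline."""
    return n_ops * depth


def utilization(n_ops: int, cycles: int) -> float:
    """Fraction of cycles in which the unit completes a result."""
    return n_ops / cycles if cycles else 0.0


def fraction_of_peak(achieved_mflops: float, peak_mflops: float) -> float:
    """Achieved throughput as a fraction of the theoretical peak."""
    return achieved_mflops / peak_mflops


if __name__ == "__main__":
    DEPTH = 3  # hypothetical pipeline depth, for illustration only
    n = 100
    for name, cycles in (("scheduled", cycles_pipelined(n, DEPTH)),
                         ("serialized", cycles_serialized(n, DEPTH))):
        print(f"{name:>10}: {cycles} cycles, utilization {utilization(n, cycles):.0%}")
    # Figures quoted in the text: about 60 MFLOPS peak, 40 hand-coded, 10 compiled.
    for label, mflops in (("hand-coded", 40), ("compiled", 10)):
        print(f"{label:>10}: {fraction_of_peak(mflops, 60):.0%} of peak")
```

With 100 independent adds, the scheduled case takes 102 cycles and the serialized case 300, roughly a threefold difference from scheduling alone in this simplified model.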
Another serious problem was the lack of any mechanism for fast context switching. The i860 had several pipelines (for the ALU and FPU parts), and an interrupt forced their contents to be saved and all of them to be reloaded afterward. This took 62 cycles in the best case and almost 2000 cycles in the worst. At 40 MHz the latter is 1/20000th of a second (50 microseconds), an eternity for a CPU. This largely ruled out the i860 as a general-purpose CPU.
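The conversion from cycle counts to wall-clock time is a single division by the clock rate. The sketch below applies it to the 62- and 2000-cycle figures at the 40 and 50 MHz clock speeds mentioned for the chip; the function name is hypothetical.

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Wall-clock time for a given number of cycles at a fixed clock rate."""
    return cycles / clock_hz


if __name__ == "__main__":
    # Cycle counts from the text, evaluated at two of the i860's clock speeds.
    for clock_mhz in (40, 50):
        for label, cycles in (("best case", 62), ("worst case", 2000)):
            t = cycles_to_seconds(cycles, clock_mhz * 1e6)
            print(f"{clock_mhz} MHz, {label}: {t * 1e6:.2f} us")
```

At 40 MHz the 2000-cycle case comes to 50 microseconds, matching the 1/20000th of a second figure, and the 62-cycle case to 1.55 microseconds; at 50 MHz the worst case drops to 40 microseconds.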
At first the i860 was used only in a small number of very large machines, such as the iPSC/860 at Los Alamos National Laboratory. As compilers improved, so did the i860's real-world performance, but by then most other RISC designs had already overtaken it.
The i860 did see some use in the workstation world as a graphics accelerator. It was used, for instance, in the NeXTdimension, where it ran a cut-down version of the Mach kernel hosting a complete PostScript implementation. This kind of use also gradually disappeared.
In the late 1990s Intel replaced its entire RISC line with ARM-based designs, known as the XScale. Confusingly, the i860 name has since been reused for a motherboard chipset for Intel Xeon (high-end Pentium) systems.