Conventional multithreading operating systems allow multiple processes and threads to use the processor one at a time, giving exclusive ownership of it to a particular thread for a time slice on the order of milliseconds. Quite often, a thread will stall for hundreds of cycles while waiting for some external resource (for example, a load from RAM), thus wasting processor time.
A further improvement is super-threading, in which the processor can execute instructions from a different thread each cycle. Cycles left unused by one thread can thus be used by another thread that is ready to run.
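The effect of filling stall cycles can be illustrated with a toy model. The Python sketch below does not describe how any real processor schedules threads. It assumes a single-issue processor and represents each thread as a list of latencies, where a latency is the number of cycles that must pass before the same thread can issue its next instruction (1 means no stall). The function names and the example workload are hypothetical.

```python
def run_one_at_a_time(threads):
    """Each thread owns the processor until it finishes, so stall cycles are wasted."""
    return sum(sum(latencies) for latencies in threads)


def run_super_threaded(threads):
    """Each cycle, issue one instruction from some thread that is ready (round-robin)."""
    count = len(threads)
    next_index = [0] * count  # index of the next instruction to issue, per thread
    ready_at = [0] * count    # first cycle at which each thread may issue again
    cycle = 0
    start = 0                 # where the round-robin search begins this cycle

    while any(next_index[t] < len(threads[t]) for t in range(count)):
        for offset in range(count):
            t = (start + offset) % count
            has_work = next_index[t] < len(threads[t])
            if has_work and ready_at[t] <= cycle:
                # Issue the instruction; the thread is then busy for its latency.
                ready_at[t] = cycle + threads[t][next_index[t]]
                next_index[t] += 1
                start = t + 1
                break
        cycle += 1  # the cycle passes whether or not anything was issued

    # The run ends when the last issued instruction has completed.
    return max(ready_at, default=0)


# Two threads, each with one long stall (latency 4) among short instructions.
workload = [[1, 4, 1], [1, 1, 4]]
print(run_one_at_a_time(workload))   # stalls of both threads are paid in full
print(run_super_threaded(workload))  # one thread's stall overlaps the other's work
```

With a single thread the two functions return the same count, because there is no other thread to fill the stall cycles.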
Even so, a single thread almost never uses all of the execution units of a modern processor at the same time. Simultaneous multithreading (SMT) allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left idle. The number of concurrent threads is chosen by the chip designers, but practical restrictions on chip complexity usually limit it to 2, 4 or sometimes 8 concurrent threads.
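The difference between issuing from one thread per cycle and sharing the execution units within a cycle can be sketched with another toy model in Python. It is only an illustration under stated assumptions: the processor has `width` identical execution units, each thread can issue at most `ilp` instructions per cycle (a stand-in for its instruction-level parallelism), and both `width` and `ilp` are at least 1. The `Thread` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    remaining: int  # instructions still to be issued
    ilp: int        # most instructions this thread can issue in one cycle


def cycles_one_thread_per_cycle(threads, width):
    """Only one thread may use the execution units in any given cycle."""
    threads = [Thread(t.remaining, t.ilp) for t in threads]  # work on a copy
    cycles = 0
    turn = 0
    while any(t.remaining > 0 for t in threads):
        # Round-robin: find the next thread that still has work.
        for offset in range(len(threads)):
            t = threads[(turn + offset) % len(threads)]
            if t.remaining > 0:
                t.remaining -= min(t.ilp, width, t.remaining)
                turn = (turn + offset + 1) % len(threads)
                break
        cycles += 1
    return cycles


def cycles_smt(threads, width):
    """Units left idle by earlier threads are offered to later ones in the same cycle."""
    threads = [Thread(t.remaining, t.ilp) for t in threads]  # work on a copy
    cycles = 0
    while any(t.remaining > 0 for t in threads):
        free = width
        for t in threads:  # fixed order: the first thread gets first pick
            issued = min(t.ilp, free, t.remaining)
            t.remaining -= issued
            free -= issued
        cycles += 1
    return cycles


# Two threads that can each keep only 2 of the 4 execution units busy.
workload = [Thread(remaining=12, ilp=2), Thread(remaining=12, ilp=2)]
print(cycles_one_thread_per_cycle(workload, width=4))
print(cycles_smt(workload, width=4))
```

In this model the simultaneous scheme helps only when no single thread can fill all the units by itself. If `ilp` is at least `width` for every thread, both functions return the same number of cycles.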
The only commercial processor using simultaneous multithreading is the Intel Pentium 4, starting with the 3.06 GHz model; Intel has since introduced the feature into a number of its processors. Intel calls the technology hyper-threading, which is essentially a two-thread SMT engine. A speed improvement of up to 30% has been measured against an otherwise identical, non-SMT Pentium 4.
The DEC Alpha EV8 was to be equipped with an even more powerful four-thread SMT engine, but Compaq, which owned DEC, terminated the project before it could be commercialized. The latest MIPS architecture designs include a two-thread SMT system known as MIPS MT.
The IBM POWER5, due to be released in 2004, will probably be a dual-core processor, with each core including a two-thread SMT engine. IBM's implementation is more powerful than the previous ones: it will be able to assign different priorities to the various threads, and its SMT engine can be turned on and off dynamically, so that workloads that would not benefit from SMT can run with it disabled.