An example of an application that can take advantage of SIMD is one where the same value is being added to a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three 8-bit values for the brightness of the red, green and blue portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) each of them, and the resulting values are written back out to memory.
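The per-value procedure described above can be sketched in C as a simple loop. This is a minimal illustration rather than code from any particular program: the names `adjust_channel` and `brighten_scalar` are hypothetical, the image is assumed to be stored as a flat array of 8-bit R, G and B values, and clamping the result to the range 0 to 255 is an added assumption that the description above does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Add `delta` to one 8-bit channel value.
   Assumption: results are clamped to 0..255 rather than allowed to wrap. */
static uint8_t adjust_channel(uint8_t value, int delta)
{
    int result = (int)value + delta;
    if (result < 0)   return 0;
    if (result > 255) return 255;
    return (uint8_t)result;
}

/* Scalar version: every R, G and B byte is read, adjusted and
   written back one at a time. */
void brighten_scalar(uint8_t *pixels, size_t n_bytes, int delta)
{
    for (size_t i = 0; i < n_bytes; i++) {
        pixels[i] = adjust_channel(pixels[i], delta);
    }
}
```

For an image of `width` by `height` pixels (both hypothetical names), a call such as `brighten_scalar(pixels, 3 * width * height, 20)` would brighten every channel by 20, and a negative `delta` would darken it.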
With a SIMD processor there are two improvements to this process. First, the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get all of these pixels" (where "all" is a number that varies from design to design). For a variety of reasons, this can take much less time than loading the values one by one, as a traditional CPU design would.
The other advantage is that SIMD systems typically include only those instructions that can be applied to all of the data in one operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
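Both improvements can be imitated in portable C by treating a 64-bit integer as a block of eight 8-bit values. The sketch below is a software emulation of the idea, not a real SIMD instruction set, and the names `add_to_all_lanes` and `add_blockwise` are hypothetical. It also assumes that each value simply wraps modulo 256 on overflow; real SIMD instruction sets often provide saturating additions as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* eight 8-bit values fit in one 64-bit block */

/* Add `delta` to each of the eight bytes packed in `block` in one pass.
   Each byte wraps modulo 256, and no carry leaks into a neighbouring byte. */
static uint64_t add_to_all_lanes(uint64_t block, uint8_t delta)
{
    const uint64_t high_bits = 0x8080808080808080ULL;
    uint64_t deltas = 0x0101010101010101ULL * delta;  /* copy delta into every byte */

    /* Add the low 7 bits of every byte; the carry stays inside its own byte. */
    uint64_t low_sum = (block & ~high_bits) + (deltas & ~high_bits);

    /* Fix up the top bit of every byte without generating a carry. */
    return low_sum ^ ((block ^ deltas) & high_bits);
}

void add_blockwise(uint8_t *data, size_t n, uint8_t delta)
{
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        uint64_t block;
        memcpy(&block, data + i, LANES);         /* "get all of these values" */
        block = add_to_all_lanes(block, delta);  /* one add covers all eight  */
        memcpy(data + i, &block, LANES);         /* write the block back      */
    }

    for (; i < n; i++) {                         /* leftovers, one at a time  */
        data[i] = (uint8_t)(data[i] + delta);
    }
}
```

The main loop issues one load, one add and one store per eight values, which mirrors the "get all of these pixels" and "add to all eight at once" behaviour described above. Hardware SIMD units do the same job with dedicated instructions rather than bit masking.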
Many SIMD designers are hampered by design considerations outside their control. One of these considerations is the cost of adding registers for holding the data to be processed. Ideally one would want the SIMD units of a CPU to have their own registers, but many are forced for practical reasons to re-use existing CPU registers, typically the floating point registers. These tend to be 64 bits in size, which is smaller than optimal for SIMD use, and sharing them leads to problems if the code attempts to use both SIMD and normal floating point instructions at the same time, at which point the units fight over the registers.
In the past there were a number of dedicated processors for this sort of task, commonly referred to as Digital Signal Processors, or DSPs. The main difference between SIMD and DSPs is that DSPs were complete processors with their own (often difficult to use) instruction sets, whereas SIMD designs rely on the general-purpose portions of the CPU to handle the program details, and the SIMD instructions handle the data manipulation only.
SIMD instructions were first used in vector supercomputers, and were especially popularized by Cray in the 1970s. More recently, small-scale (64- or 128-bit) SIMD has become popular on general-purpose CPUs, starting in 1994 with PA-RISC's MAX instruction set. Today SIMD instructions can be found to one degree or another on most CPUs, including the PowerPC's AltiVec, Intel's MMX and SSE, AMD's 3DNow!, SPARC's VIS, and the MIPS MDMX and MIPS-3D. The vast majority of software, however, does not exploit these instructions, and the main benefits come in specialized applications. One touted application has been graphics, but the benefit there is questionable in the face of increasingly sophisticated graphics cards.