Intel was generally disappointed with its earlier SIMD effort, MMX. MMX had two main problems: it re-used the existing floating point registers, making the CPU unable to work on both floating point and SIMD data at the same time, and it worked only on integers.
SSE adds eight new 128-bit registers (XMM0 through XMM7), each of which holds four 32-bit floating point numbers. These are in addition to the eight existing floating point registers that MMX re-uses. SSE also adds a number of instructions for working on floating point data. Floating point SIMD sees much more use than the earlier integer-only MMX, now that graphics cards handle integer math internally.
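The four-numbers-per-register layout can be illustrated with a short C sketch. This is only an illustration under stated assumptions: the function name `add4` is hypothetical, the intrinsics (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps` from `<xmmintrin.h>`) are the ones commonly exposed by C compilers for SSE, and a plain scalar loop is used as a fallback when the compiler does not report SSE support.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper (not part of any library):
 * computes out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);

#if defined(__SSE__)
    /* Each __m128 value models one 128-bit SSE register
     * holding four 32-bit floats. Unaligned loads are used
     * so the caller's arrays need no special alignment. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* One packed add operates on all four lanes at once. */
    __m128 sum = _mm_add_ps(va, vb);

    _mm_storeu_ps(out, sum);
#else
    /* Fallback for targets without SSE: same result, one lane at a time. */
    for (size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Calling `add4` with the arrays {1, 2, 3, 4} and {10, 20, 30, 40} fills the output array with {11, 22, 33, 44}. On the SSE path this takes a single packed add rather than four separate scalar additions.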
Oddly, SSE is implemented using the same circuitry as the FPU, meaning that, once again, the CPU cannot issue both FPU and SSE instructions at the same time for pipelining. The separate registers nevertheless allow the two kinds of instructions to be mixed together without the performance hit incurred with MMX.
Intel's Pentium 4 implements SSE2, an extension to the basic SSE instruction set.