In the discrete-time case, the data to be transformed is broken up into chunks, or frames, which usually overlap each other. Each chunk is typically multiplied by a window function and then Fourier transformed, and the complex result is added as one column of a matrix, which records magnitude and phase for each point in time and frequency.
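The chunk-and-transform procedure can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than a reference implementation: the function name `stft`, the frame length of 256 samples, the hop of 128 samples (50% overlap), the Hann window, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Return a complex matrix of shape (n_freqs, n_frames).

    frame_len and hop are illustrative defaults; hop < frame_len
    makes consecutive chunks overlap.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)            # taper each chunk (assumed choice)
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for m in range(n_frames):
        chunk = x[m * hop : m * hop + frame_len]
        out[:, m] = np.fft.rfft(chunk * window)   # one column per chunk
    return out

# Hypothetical test signal: one second of a 1 kHz sine sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)

S = stft(signal)
magnitude = np.abs(S)     # magnitude at each (frequency, time) point
phase = np.angle(S)       # phase at each (frequency, time) point

peak_bin = int(np.argmax(magnitude[:, 0]))
peak_hz = peak_bin * fs / 256   # convert bin index to hertz
```

Each column of `S` is the transform of one chunk, so row indices correspond to frequency and column indices to time. For real-valued input, `np.fft.rfft` keeps only the non-negative frequencies, which is why the matrix has `frame_len // 2 + 1` rows.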
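One of the pitfalls of the STFT is that it has a fixed resolution. The width of the window sets the trade-off: a wide window gives good frequency resolution but poor time resolution, while a narrow window gives good time resolution but poor frequency resolution, and whichever is chosen applies at every frequency alike. This is one of the reasons for the creation of the wavelet transform, which gives good time resolution for high-frequency events and good frequency resolution for low-frequency events.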
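To make the resolution trade-off concrete, the following sketch computes the frequency-bin spacing and the window duration of an STFT for a few window lengths. The helper name `stft_resolution`, the 8 kHz sample rate, and the window lengths are assumptions chosen for illustration.

```python
def stft_resolution(fs, frame_len):
    """Return (bin spacing in Hz, window duration in seconds).

    fs is the sample rate in Hz and frame_len the window length in samples.
    """
    return fs / frame_len, frame_len / fs

fs = 8000  # assumed sample rate
for n in (64, 256, 1024):
    df, dt = stft_resolution(fs, n)
    print(f"N={n:5d}  bin spacing={df:8.3f} Hz  window={dt * 1000:7.2f} ms")
```

Lengthening the window narrows the bin spacing but stretches the time span each column covers, and the product of the two quantities stays equal to 1 whatever window length is chosen.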