With regard to PSRLB, PSRLW, PMADDUBSW, and PMOVMSKB, I must say I loved assembly much more back when each instruction mnemonic was only 2-3 characters.
Oh, the real fun begins with SSE4.2 and things like PCMPISTRM (the whole PCMPxSTRx family), where you not only have the mnemonics but also an 8-bit immediate whose bit fields each select a different aspect of how the string comparison engine operates.
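To make that immediate a bit less opaque, here is a rough sketch in plain C that splits the control byte into its fields as I understand them from the Intel SDM: bits 1:0 pick the element format, bits 3:2 the aggregation operation, bits 5:4 the polarity, and bit 6 the output selection (bit 7 is reserved). The `pcmpstr_ctl` struct and the `pcmpstr_decode` / `pcmpstr_describe` helpers are names I made up for illustration; they are not part of any intrinsics header.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the fields of the PCMPxSTRx imm8 control byte. */
typedef struct {
    unsigned format;    /* bits 1:0, element type and width            */
    unsigned aggregate; /* bits 3:2, comparison / aggregation mode     */
    unsigned polarity;  /* bits 5:4, how the intermediate result is
                           negated before it is returned               */
    unsigned output;    /* bit 6, index: 0 = least, 1 = most significant;
                           mask: 0 = bit mask, 1 = byte/word mask      */
} pcmpstr_ctl;

static const char *const format_names[] = {
    "unsigned bytes", "unsigned words", "signed bytes", "signed words"
};
static const char *const aggregate_names[] = {
    "equal any", "ranges", "equal each", "equal ordered"
};
static const char *const polarity_names[] = {
    "positive", "negative", "masked positive", "masked negative"
};

/* Split the immediate into its fields; bit 7 is reserved and ignored. */
static pcmpstr_ctl pcmpstr_decode(uint8_t imm8)
{
    pcmpstr_ctl c;
    c.format    = imm8 & 0x3u;
    c.aggregate = (imm8 >> 2) & 0x3u;
    c.polarity  = (imm8 >> 4) & 0x3u;
    c.output    = (imm8 >> 6) & 0x1u;
    return c;
}

/* Write a human-readable summary of the immediate into buf. */
static void pcmpstr_describe(uint8_t imm8, char *buf, size_t n)
{
    pcmpstr_ctl c = pcmpstr_decode(imm8);
    snprintf(buf, n, "%s, %s, %s polarity, output select %u",
             format_names[c.format], aggregate_names[c.aggregate],
             polarity_names[c.polarity], c.output);
}
```

For instance, an immediate of 0x0C decodes to unsigned bytes with "equal ordered" aggregation, which is the substring-search mode, while 0x44 selects "ranges" with the output-selection bit set.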