It's more that sound waves aren't simple 1D sine waves varying over time. Sound is a 3D pressure field that varies over time, so a single microphone on the outside of the can doesn't give you a full picture of the noise that actually gets inside the headphone. The nature of the noise, how the headphone fits on the user's head, and even the shape of the user's ears all influence what the noise field looks like inside the headphones. ANC headphones use 1-3 microphones on the outside and 1 or more microphones inside the can. The DSP makes a guess as to what it should do based on the signal from the outside microphones, then corrects that guess based on what the internal microphones hear, all while filtering out your music signal (which the microphones also pick up) so it knows what is noise and what isn't. This is why the system can't be instantaneous: the overall system response is time-varying, so the correction filters have to be adaptive, and adapting takes time. It's also why they do a really good job at blocking continuous noise, e.g. aircraft cabin noise, where they can zero in on an ideal correction over time, but don't do a great job with 'spiky' sounds, like speech.
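To make the "guess, then correct" loop concrete, here is a minimal sketch using a plain LMS adaptive filter. This is an illustration under stated assumptions, not how any particular headphone works: textbook ANC usually uses filtered-x LMS variants that also model the speaker-to-microphone path, and this toy ignores that path and the music signal entirely. The leak paths, tap count, and step size `mu` are made-up values, and `leak_path`, `lms_residual`, and `rms` are hypothetical helper names.

```python
import math
import random

def leak_path(signal, h):
    """Hypothetical acoustic path from the outside mic to the ear (a fixed FIR filter)."""
    return [sum(h[k] * signal[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(signal))]

def lms_residual(reference, at_ear, taps=8, mu=0.02):
    """Plain LMS: return what the inside mic would still hear at each sample."""
    w = [0.0] * taps        # adaptive filter weights, starting with no knowledge
    recent = [0.0] * taps   # latest outside-mic samples, newest first
    residual = []
    for x, d in zip(reference, at_ear):
        recent = [x] + recent[:-1]
        guess = sum(wi * xi for wi, xi in zip(w, recent))    # predicted noise at the ear
        e = d - guess                                        # leftover after cancellation
        w = [wi + mu * e * xi for wi, xi in zip(w, recent)]  # correct the guess
        residual.append(e)
    return residual

def rms(xs):
    return math.sqrt(sum(v * v for v in xs) / len(xs))

rng = random.Random(0)
outside = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # steady broadband noise

# Made-up leak paths: the headphone fit "shifts" at sample 2000.
fit_a = leak_path(outside, (0.0, 0.6, 0.3, 0.1))
fit_b = leak_path(outside, (0.0, 0.4, 0.4, 0.2))
at_ear = fit_a[:2000] + fit_b[2000:]

res = lms_residual(outside, at_ear)
before = rms(res[1900:2000])  # converged on the first fit
after = rms(res[2000:2020])   # just after the fit changes
end = rms(res[3900:])         # re-converged on the second fit
print(before, after, end)
```

In this toy, the residual is tiny once the filter has had time to settle on a steady noise and a fixed fit, jumps as soon as the fit changes, and shrinks again only after the filter has re-adapted. That lag is the same reason a real system copes better with a steady drone than with anything that changes quickly.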