Most of them are not equivalent. dict() is not the same as {}, because someone might have rebound the name dict to something completely different. Proving that dict() and {} are in fact equivalent in a particular case requires whole-program analysis, which PyPy may be capable of.
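To make that concrete, here is a minimal sketch of the rebinding problem. The function names and the replacement lambda are made up for illustration; the point is only that dict() goes through a name lookup at call time, while {} does not.

```python
def empty_via_call():
    # `dict` is looked up at call time: module globals first, then builtins.
    return dict()


def empty_via_literal():
    # The literal involves no name lookup, so rebinding `dict` cannot affect it.
    return {}


def demo():
    global dict
    results = [empty_via_call(), empty_via_literal()]
    # Hypothetical rebinding: shadow the builtin with a module-level name.
    dict = lambda: "surprise"
    try:
        results += [empty_via_call(), empty_via_literal()]
    finally:
        # Remove the shadowing global so the builtin is visible again.
        del dict
    return results
```

Calling demo() shows the two functions agreeing before the rebinding and disagreeing after it, which is why a compiler that only sees one function cannot safely rewrite dict() as {}.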
At least the following can be optimized without running into this issue, and possibly more: tests 3, 4, 5, and 6; the first three cases of test 7; the first and third cases of test 11; tests 13, 16, 17, 18, and 20; cases 1 and 2 of test 21; test 22; the first two cases of test 24; tests 25, 26, 27, and 28; cases 2 and 4 of test 29; and tests 30 and 32.
As I said, "some of these are not in fact equivalent, but at least some of them are".
That's not right: #5 depends on what True is defined as:
True = 0
def a():
    a = True
    if a:
        return True
    return False
print a()
There are likely similar subtleties with the other examples, but I don't know the Python spec well enough to see them. You asked why the Python compiler didn't optimize these snippets more and that is the answer.
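For what it's worth, this particular subtlety is specific to Python 2. In Python 3, True is a keyword and assigning to it is a syntax error. A quick way to check which behaviour an interpreter has (the helper name is made up for illustration):

```python
def true_is_assignable():
    # Compile, but do not run, an assignment to True.
    # Python 3 rejects it at compile time; Python 2 accepts it.
    try:
        compile("True = 0", "<example>", "exec")
    except SyntaxError:
        return False
    return True
```

Where this returns False, a compiler can treat True as a constant; where it returns True, it has to treat it as an ordinary name lookup.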
The examples are for Python 2, not 3. But it's blatantly obvious that you are more interested in "not being wrong" than in discussing Python's compiler, so I'll stop here.
(I am aware that some of these are not in fact equivalent, but at least some of them are)
On an unrelated note, the website would be easier to grok if it did diff-like highlighting of changed lines.