Additionally, Josh's first generics implementation in Java was reified (like C#), but the necessary JVM changes were delayed, so he had to quickly do a second implementation using erasure.
One of Java's original goals was to work in embedded systems. As such, I think type erasure actually helps reduce the size of compiled artefacts: because generic functions aren't monomorphised, you avoid the potential bloat of one copy per type argument.
As in all of engineering, most things are a tradeoff.
The other proposals I've heard for Java generics didn't monomorphize at compile time but instead modified the .class format to represent generic classes/methods closer to their source representation. Because the embedded systems at the time that cared about code size in Java would have been structured as bytecode interpreters, they arguably would have had smaller binaries without type erasure. The casts that erasure inserts under the hood are emitted as additional bytecode at most call sites, but they would have been implicit if generic functions had first-class JVM support.
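To make the point about call-site casts concrete, here is a minimal sketch, assuming a standard javac/JVM toolchain. The class name ErasureDemo and its method names are made up for illustration. It shows that both parameterizations share one runtime class, and that the cast lives at the call site rather than inside the generic code.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // After erasure, List<String> and List<Integer> are both plain ArrayList,
    // so a single class is loaded for every parameterization.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // get() returns Object after erasure; the compiler adds a cast to String
    // at the call site. Polluting the list through a raw reference makes that
    // hidden cast observable as a ClassCastException.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsAtCallSite() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw view of the same list
        raw.add(42);          // unchecked: an Integer goes into a List<String>
        try {
            String s = strings.get(0); // compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(castFailsAtCallSite());
    }
}
```

Running `javap -c ErasureDemo` on the compiled class should show a `checkcast` instruction after the `get` call, which is the extra per-call-site bytecode mentioned above.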
Sometimes, reality bites.