Author here: I'm equally surprised it worked so well with so little training data. Each image contained one example of each character (18 characters * 10 images = 180 examples total). That said, I don't think it would generalize well to other people's handwriting without a lot more training data.
I experimented with a few regularization factors and ultimately settled on a lambda of 0.1. Rather than using a stopping criterion, I ran a fixed number of training iterations (~100) and just eyeballed the cost function results. Since my total training time was fairly brief (~2 minutes, tops), I had the luxury of designing the ANN somewhat heuristically.
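For anyone curious what that loop looks like in practice, here's a minimal sketch (not my actual code): a one-hidden-layer network with an L2 penalty (lambda = 0.1), run for a fixed 100 iterations while printing the cost so you can eyeball convergence. The data shapes, hidden-layer size, and learning rate are placeholder assumptions, not values from the original project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 180 examples, 400 input features (e.g. 20x20 pixel images),
# 18 character classes -- these shapes are assumptions for illustration.
X = rng.standard_normal((180, 400))
y = rng.integers(0, 18, size=180)
Y = np.eye(18)[y]                    # one-hot labels, shape (180, 18)

lam, alpha, iters = 0.1, 0.5, 100    # lambda, learning rate, fixed iterations
W1 = rng.standard_normal((400, 25)) * 0.01   # hidden layer size is assumed
W2 = rng.standard_normal((25, 18)) * 0.01

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = X.shape[0]
for i in range(iters):
    # Forward pass (biases omitted to keep the sketch short)
    A1 = sigmoid(X @ W1)
    A2 = sigmoid(A1 @ W2)

    # Cross-entropy cost plus the L2 regularization term
    eps = 1e-9
    cost = -np.sum(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)) / m
    cost += (lam / (2 * m)) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backpropagation with regularized gradients
    d2 = A2 - Y
    dW2 = A1.T @ d2 / m + (lam / m) * W2
    d1 = (d2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ d1 / m + (lam / m) * W1

    W1 -= alpha * dW1
    W2 -= alpha * dW2

    # No stopping criterion: just print the cost and eyeball it
    if i % 10 == 0:
        print(f"iter {i:3d}  cost {cost:.4f}")
```

With a training set this small, the whole run finishes in well under a minute, which is why I could afford to tune things by hand rather than automate the stopping decision.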