I agree. I'm in the pool of people who don't really understand p-values even after taking two statistics courses. The main issue I've noticed is that, compared to many other calculations, you just have no idea whether your result is correct. You can make wrong assumptions, botch a calculation, or apply the wrong method, but in the end you get a number... and that's it. It may be the wrong number and you'll never know. This is completely different from many other practical applications of math, where you can verify your result in various ways, validate your answer against the initial assumptions, or test your program against lots of inputs.
> The main issue I've noticed is that compared to many other calculations, you just have no idea if your result is correct. You can make wrong assumptions, wrong calculation, apply wrong methods, but in the end you get a number... and that's it.
The same is true in Bayesian statistics, and even in simple formal reasoning with no statistics in sight. If you make wrong assumptions, you'll get the wrong result.
The only thing you can expect statistics to do is help you change your opinion about the relative merits of opposing theories. If both your opposing theories are wrong, you will still be equally wrong.
The true flaw of frequentist statistics is that it goes out of its way to hide this fact from you. Bayesian statistics, in contrast, forces you to explicitly choose a prior, enumerate your assumptions, and accept that your conclusion rests on those things.
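To make the "explicitly choose a prior" point concrete, here is a minimal sketch (not from the thread; the counts and prior parameters are made up for illustration) of a conjugate Beta-Binomial update, where two analysts with different priors must state them up front and reach different conclusions from the same data:

```python
# Minimal sketch: Beta-Binomial conjugate update. The data and the two
# priors below are hypothetical, chosen only to show that the posterior
# depends on the explicitly stated prior.

def posterior(prior_a, prior_b, successes, failures):
    """Beta(a, b) prior + binomial data -> Beta(a + s, b + f) posterior."""
    return prior_a + successes, prior_b + failures

data = (7, 3)  # 7 successes, 3 failures (hypothetical)

# A "uniform" analyst and a "skeptical" analyst update on the same data;
# each must declare their prior, and their posterior means differ.
for name, (a, b) in {"uniform": (1, 1), "skeptical": (2, 18)}.items():
    pa, pb = posterior(a, b, *data)
    print(f"{name}: posterior mean = {pa / (pa + pb):.3f}")
```

With the uniform Beta(1, 1) prior the posterior mean is 8/12 ≈ 0.667; with the skeptical Beta(2, 18) prior it is 9/30 = 0.300. Neither answer is "hidden" behind a procedure: the disagreement is traceable entirely to the stated priors.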
It's actually entirely possible to do some "checking of your answer" for p-values as well. As you mentioned, in practical math you can often validate your answer against the initial assumptions. The same holds for statistical testing, which typically relies on many theoretical assumptions. So in practice you can propose a different set of assumptions, perform the hypothesis test in a manner that follows those new assumptions, and see whether you obtain a similar result. For any given hypothesis you want to test, there are usually several possible methods for performing that test, so you can redo your test many times.

This is one type of robustness checking, which also includes many other things (e.g. running your test over subsamples or resamples of the data, checking for sensitivity to outliers, etc.). Good statisticians generally like to do lots and lots of robustness checking.