I took some ML courses during my university studies and have also implemented models in other programming languages, so I didn't have to start from scratch. Even so, I have been working on this project for about three weeks now.
Flink handles failures pretty well with its automated checkpointing mechanism. On top of that, there is another feature called "Savepoints", which allows you to take a manual snapshot of the current state of the streaming pipeline and restore from it later, for example after a failure.
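To make the idea concrete, here is a minimal toy sketch in plain Python. It is not Flink's API and does not reflect how Flink implements snapshots internally: `CountingPipeline`, `checkpoint_every`, `snapshot`, `restore`, and `run_with_recovery` are hypothetical names made up for illustration. It only shows the general pattern of snapshotting state together with the input position, then restoring and replaying after a failure.

```python
import copy


class CountingPipeline:
    """Toy stand-in for a stateful streaming job (hypothetical, not Flink's API)."""

    def __init__(self, checkpoint_every=3):
        self.checkpoint_every = checkpoint_every
        # State holds both the results and the input position they correspond to.
        self.state = {"offset": 0, "counts": {}}
        self.last_checkpoint = copy.deepcopy(self.state)

    def snapshot(self):
        # Manual, user-triggered snapshot: the role a savepoint plays.
        return copy.deepcopy(self.state)

    def restore(self, snap):
        # Replace the current state with a previously taken snapshot.
        self.state = copy.deepcopy(snap)

    def process(self, events, fail_at=None):
        # Start at the stored offset so a restored job replays only
        # the events that are not yet reflected in its state.
        for i in range(self.state["offset"], len(events)):
            if fail_at is not None and i == fail_at:
                raise RuntimeError(f"simulated failure at event {i}")
            key = events[i]
            self.state["counts"][key] = self.state["counts"].get(key, 0) + 1
            self.state["offset"] = i + 1
            if self.state["offset"] % self.checkpoint_every == 0:
                # Automatic periodic snapshot: the role a checkpoint plays.
                self.last_checkpoint = copy.deepcopy(self.state)


def run_with_recovery(events, fail_at):
    """Run the job, and on failure roll back to the last checkpoint and resume."""
    job = CountingPipeline(checkpoint_every=3)
    try:
        job.process(events, fail_at=fail_at)
    except RuntimeError:
        job.restore(job.last_checkpoint)
        job.process(events)
    return job.state["counts"]
```

Because the snapshot stores the offset alongside the counts, work done after the last checkpoint is discarded on failure and then replayed, so no event is counted twice. In this toy version the final counts match those of an uninterrupted run.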