Very inconsistent machine learning model training

I'm trying to train a machine learning model to detect if an image is blurred or not.

I have 11,798 unblurred images, and I have a script to blur them and then use that to train my model.

However when I run the exact same training 5 times the results are wildly inconsistent (as you can see below). It also only gets to 98.67% accuracy max.

I'm pretty new to machine learning, so maybe I'm doing something really wrong. But coming from a software engineering background and just starting to learn machine learning, I have tons of questions. It's a struggle to know why it's so inconsistent between runs. It's a struggle to know how good is good enough (ie. when should I deploy the model). It's a struggle to know how to continue to improve the accuracy and make the model better.

Any advice or insight would be greatly appreciated.

View all the code: https://gist.github.com/fishcharlie/68e808c45537d79b4f4d33c26e2391dd