Very inconsistent machine learning model training
I'm trying to train a machine learning model to detect if an image is blurred or not.
I have 11,798 unblurred images and a script that blurs them, and I then use the output to train my model.
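For readers who haven't opened the gist: a blur-augmentation script like this typically turns each sharp image into a labeled pair. Below is a minimal pure-Python sketch. It assumes a Gaussian blur (the reply below mentions one) applied to grayscale images stored as lists of pixel rows. `gaussian_kernel`, `blur`, `make_pairs`, the 0/1 labels and the sigma value are all hypothetical, and a real script would use an image library instead.

```python
import math


def gaussian_kernel(sigma):
    """1D Gaussian weights spanning about +/- 3 sigma, normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(img, kernel):
    """Convolve every row with the 1D kernel, clamping indices at the image border."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for i, k in enumerate(kernel):
                xx = min(max(x + i - radius, 0), width - 1)  # repeat edge pixels
                acc += k * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(img):
    return [list(col) for col in zip(*img)]


def blur(img, sigma=1.5):
    """Separable Gaussian blur: filter along rows, then along columns."""
    kernel = gaussian_kernel(sigma)
    return _transpose(_blur_rows(_transpose(_blur_rows(img, kernel)), kernel))


def make_pairs(images, sigma=1.5):
    """Label each sharp image 0 and its blurred copy 1 (hypothetical labeling scheme)."""
    dataset = []
    for img in images:
        dataset.append((img, 0))
        dataset.append((blur(img, sigma), 1))
    return dataset


# Usage: an 8x8 black/white checkerboard loses most of its contrast when blurred.
checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
blurred = blur(checker)
print(max(map(max, blurred)) - min(map(min, blurred)))
```

Clamping at the borders is just one of several reasonable padding choices.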
However, when I run the exact same training five times, the results are wildly inconsistent (as you can see below). It also only reaches 98.67% accuracy at most.
I'm pretty new to machine learning, so maybe I'm doing something really wrong. Coming from a software engineering background and just starting to learn machine learning, I have tons of questions. It's a struggle to know why it's so inconsistent between runs. It's a struggle to know how good is good enough (i.e., when I should deploy the model). It's a struggle to know how to keep improving the accuracy and make the model better.
Any advice or insight would be greatly appreciated.
View all the code: https://gist.github.com/fishcharlie/68e808c45537d79b4f4d33c26e2391dd
Your training loss is going fine, but your validation numbers aren't really moving with it at all. That suggests either there's some flaw that makes the training and validation setups differ, or your model is overfitting rather than solving the problem in a general way. From a quick scan of the code, nothing too serious jumps out at me. The convolution size seems a little small (is a 3x3 kernel large enough to recognize blur?), but I haven't thought much about how strongly a Gaussian blur shows up in such a small neighborhood. Increasing the kernel size could help. If it doesn't, I'd look into reducing the number of filters or the size of the dense layer. Reducing the available capacity can force an overfitting network to find more general solutions.
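To put rough numbers on the 3x3 question, one way to think about it is the receptive field: how many input pixels a single unit in the network can "see". The sketch below uses the standard recurrence for a chain of layers. The architecture in it (blocks of one convolution followed by 2x2 max pooling) is hypothetical and not necessarily what the gist uses, and sigma = 1.5 in `gaussian_extent` is just an example blur strength.

```python
import math


def receptive_field(layers):
    """Receptive field, in input pixels, of one unit after a chain of layers.

    Each layer is a (kernel_size, stride) pair. Standard recurrence:
      r_out = r_in + (k - 1) * j_in   (j = spacing of adjacent units, in input pixels)
      j_out = j_in * stride
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r


def conv_pool_stack(kernel, n_blocks):
    """Hypothetical architecture: n_blocks x (k x k conv, stride 1, then 2x2 max pool, stride 2)."""
    layers = []
    for _ in range(n_blocks):
        layers.append((kernel, 1))
        layers.append((2, 2))
    return layers


def gaussian_extent(sigma):
    """Width in pixels of the +/- 3 sigma window holding almost all of a Gaussian's weight."""
    return 2 * math.ceil(3 * sigma) + 1


print(receptive_field([(3, 1)]))               # first 3x3 conv alone: 3
print(receptive_field(conv_pool_stack(3, 3)))  # three 3x3 blocks: 22
print(receptive_field(conv_pool_stack(5, 3)))  # three 5x5 blocks: 36
print(gaussian_extent(1.5))                    # blur with sigma = 1.5 spans 11
```

Under these assumptions a single first-layer 3x3 filter only sees a small slice of a wide blur, while stacking and pooling widen the view quickly. That is one argument for experimenting with the first-layer kernel size.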
Lastly, I bet someone else has solved the same or a similar problem, perhaps as an exercise, and you could check out their network architecture to see whether yours is in the ballpark of something that works.
Thanks so much for the reply!
I changed the kernel size from 3 to 5, and it's hard to tell whether that made much of an improvement. It's still pretty inconsistent between training runs.
I'll try reducing the dense layer from 128 to 64 next.
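For a sense of scale: a fully connected layer has `n_in * units` weights plus one bias per unit, so going from 128 to 64 units roughly halves it. A convolution layer's parameter count, by contrast, doesn't depend on the image size at all. Here is a quick sketch of the arithmetic. The flattened input size (18 x 18 x 64) and the 3-channel, 32-filter conv layer are hypothetical, since the real numbers depend on the image size and layers in the gist.

```python
def dense_params(n_in, units):
    """Parameters in a fully connected layer: an n_in x units weight matrix plus one bias per unit."""
    return n_in * units + units


def conv2d_params(kernel, in_channels, filters):
    """Parameters in a k x k 2D convolution: one k x k x in_channels kernel plus a bias per filter."""
    return kernel * kernel * in_channels * filters + filters


flat = 18 * 18 * 64  # hypothetical flattened feature size feeding the dense layer

print(dense_params(flat, 128))  # 2654336
print(dense_params(flat, 64))   # 1327168
print(conv2d_params(3, 3, 32))  # 896
print(conv2d_params(5, 3, 32))  # 2432
```

With numbers like these, the dense layer holds most of the parameters, which is why shrinking it is a common first move against overfitting. Going from a 3x3 to a 5x5 first-layer kernel adds comparatively little.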
This is a great idea. I did a quick Google search and nothing stood out at first, but I'll dig deeper.
It's still super weird to me how variable the results can be with zero changes. I don't change anything, yet one run improves consistently for a few epochs, while the next starts out much less accurate and declines after the first epoch.
It looks like 5x5 led to an improvement: validation moves with training for longer before hitting a wall and starting to overfit. I'd try a bigger kernel to see if that trend continues.
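One common way to act on "validation hits a wall, then overfits" is early stopping: keep the weights from the epoch with the best validation loss and stop once it hasn't improved for a few epochs. If the training code uses Keras, it ships this as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`. The sketch below shows only the underlying logic in plain Python on a list of per-epoch validation losses. `early_stopping_epoch` is a hypothetical helper, and the loss values in the usage line are made up.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Decide where early stopping would halt, given per-epoch validation losses.

    Returns (best_epoch, stop_epoch), both 0-based:
      best_epoch: epoch with the lowest validation loss seen before stopping
                  (the weights you would keep)
      stop_epoch: epoch at which `patience` consecutive epochs have passed
                  without beating the best loss by more than min_delta;
                  the last epoch if that never happens
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: improving, then overfitting after epoch 3.
losses = [0.60, 0.45, 0.38, 0.36, 0.37, 0.39, 0.42, 0.47]
print(early_stopping_epoch(losses))  # (3, 6)
```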
The difference between runs is mostly due to the random initialization of the weights (other random elements, such as data shuffling or dropout if you use them, add to it). Some runs just randomly start nearer to a solution that works well, priming them to reduce loss quickly. Generally, you don't want to rely on that and just cherry-pick the one run that looked best in validation. A good solution will almost always reach a roughly similar end result, even if some runs take longer to get there.
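Following on from that, a fairer way to compare two settings (say 3x3 vs 5x5 kernels) is to train each several times with different seeds and compare the average and spread instead of single runs. Here is a minimal sketch. `train_fn` is a hypothetical function that seeds every source of randomness, trains the model, and returns final validation accuracy; `fake_train` only stands in for it. If the gist uses TensorFlow 2.7 or later, `tf.keras.utils.set_random_seed(seed)` seeds Python, NumPy and TensorFlow in one call, though some GPU operations can still be nondeterministic.

```python
import random
import statistics


def evaluate_over_seeds(train_fn, seeds):
    """Train once per seed and summarize the final validation metric across runs."""
    scores = [train_fn(seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "worst": min(scores),
    }


def fake_train(seed):
    """Stand-in for a real training run: same seed -> same 'accuracy'."""
    rng = random.Random(seed)  # private RNG, so the result depends only on the seed
    return 0.95 + rng.uniform(-0.02, 0.02)


summary = evaluate_over_seeds(fake_train, seeds=range(5))
print(f"mean={summary['mean']:.4f} stdev={summary['stdev']:.4f} worst={summary['worst']:.4f}")
```

If two settings' means differ by less than their run-to-run spread, a handful of runs probably isn't enough to call one of them better.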