The first problem you face when starting with machine learning:
Finding the right balance between training for too long and not training long enough.
Get it wrong in either direction and your model will be trash.
Here is the simplest and one of the most effective ways to work around this:
First of all, remember this:
• Overfitting will likely happen if you train for too long.
• Underfitting will likely happen if you don't train long enough.
It's not really about time, but about the number of iterations, or epochs.
We need a way to find the correct number.
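One common way to find it is early stopping: watch the validation loss after each epoch and stop once it stops improving. Here is a minimal sketch, where the loss values are synthetic stand-ins for a real train/validate loop:

```python
# A minimal early-stopping sketch: stop once validation loss
# hasn't improved for `patience` consecutive epochs.

def early_stopping(val_losses, patience=3):
    """Return the epoch (0-indexed) with the best validation loss,
    stopping after it fails to improve `patience` times in a row."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch

# Typical shape: loss drops (underfitting zone), bottoms out,
# then climbs back up (overfitting zone).
losses = [0.90, 0.55, 0.40, 0.35, 0.33, 0.36, 0.41, 0.48, 0.60]
print(early_stopping(losses))  # best epoch is index 4
```

The patience window is the knob: too small and you stop on noise, too large and you waste training time.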
This is a step-by-step guide to building your first deep learning model.
Let's get started:
Building the model is essential, but it's just the beginning.
I want to give you everything you need to understand what's going on:
• A way to make changes
• A way to experiment
• A way to keep track of everything you did
We are going to use a neat tool for that.
A lesson some people learn too late:
Building a machine learning system is not like building regular software.
But unless you've done it before, you'd think they're pretty much the same.
Here is a reason they are different and what you can do about it:
For the most part, regular software looks like this:
• Build once
• Run forever*
I know there's no such thing as *forever*. Things change all the time.
But this rate of change pales in comparison with machine learning systems. Here is an example:
Everyone knows they need to replace missing values in their dataset.
Most people, however, miss one critical step.
Here is what you aren't doing and how you can fix it:
I'll start with an example.
A company surveys a bunch of people.
Some people leave one particular question unanswered.
When we collect the data, and before using it to build a model, we must take care of these missing values.
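The most basic way to take care of them is mean imputation: replace each missing answer with the average of the answered ones. A minimal sketch in plain Python, with made-up survey answers:

```python
# Mean imputation: fill missing answers (None) with the mean
# of the non-missing ones.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

answers = [4, None, 5, 3, None, 4]
print(impute_mean(answers))  # → [4, 4.0, 5, 3, 4.0, 4]
```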
I'm sure you've used data augmentation before.
Most people think of data augmentation as a technique to improve their model during training.
This is true, but they are missing something.
Here's a brilliant approach to improve the predictions of your model:
Let's start with a quick definition of data augmentation:
Starting from an initial dataset, you can generate synthetic copies of each sample that will make a model resilient to variations in the data.
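With images, for example, you can flip each picture and keep every copy. A minimal sketch using a tiny 2x2 grid of pixel values as a stand-in for a real image:

```python
# Data augmentation sketch: each original sample yields extra
# flipped copies, so the model sees more variation in the data.

def augment(image):
    """Return the original image plus horizontally and
    vertically flipped copies."""
    h_flip = [row[::-1] for row in image]  # mirror left-right
    v_flip = image[::-1]                   # mirror top-bottom
    return [image, h_flip, v_flip]

img = [[1, 2],
       [3, 4]]
for copy in augment(img):
    print(copy)
```

Real pipelines add rotations, crops, and noise too, but the principle is the same: one sample in, several varied samples out.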
This is awesome. But there's more we can do with it.
I've helped hundreds of people start with machine learning.
Everyone asks me the same fundamental question.
But they all hate my answer. Engineers even more so.
Let me try again, but this time I'll show you a few lines of code that will 10x your process:
People always ask: "How do you know what to do now?"
The answer is simple, but nobody likes to hear it: "Well, you don't know."
After a few seconds, I follow up: "You need to experiment to find what works best."
Here is the reality they don't want to hear about:
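In practice, "experiment to find what works best" can be as simple as a loop over configurations. A minimal sketch, where `score` is a made-up stand-in for training and evaluating a real model:

```python
# A tiny experimentation loop: try every configuration in a grid,
# record each result, and keep the best one.
import itertools

def score(learning_rate, batch_size):
    # Stand-in for a real train/evaluate run (lower is better here).
    return 1.0 / (learning_rate * batch_size)

grid = {
    "learning_rate": [0.1, 0.01],
    "batch_size": [16, 32],
}

results = []
for lr, bs in itertools.product(grid["learning_rate"], grid["batch_size"]):
    results.append({"learning_rate": lr, "batch_size": bs,
                    "score": score(lr, bs)})

best = min(results, key=lambda r: r["score"])
print(best)
```

The loop is trivial; the discipline of recording every run so you can compare them later is the part that pays off.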
One of the most common problems in machine learning:
How do you deal with imbalanced datasets?
Not only does this happen frequently, but it's also a popular interview question.
Here are seven different techniques to deal with this problem:
What's an imbalanced dataset?
Imagine you have pictures of cats and dogs. Your dataset has 950 cat pictures and only 50 dog pictures.
That's an imbalanced dataset.
There's a significant difference in the number of samples for each class.
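One of the simplest techniques is random oversampling: duplicate minority-class samples until both classes match. A minimal sketch using the cat/dog counts above:

```python
# Random oversampling: grow the minority class by sampling it
# with replacement until it matches the majority class.
import random

random.seed(0)  # reproducible sampling

labels = ["cat"] * 950 + ["dog"] * 50

def oversample(samples, minority, majority_count):
    """Duplicate minority samples until they reach majority_count."""
    minority_samples = [s for s in samples if s == minority]
    extra = random.choices(minority_samples,
                           k=majority_count - len(minority_samples))
    return samples + extra

balanced = oversample(labels, "dog", 950)
print(balanced.count("cat"), balanced.count("dog"))  # 950 950
```

On real data you'd duplicate the actual dog pictures (ideally with augmentation), not just the labels, but the counting logic is the same.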
Many people new to machine learning have no idea that labeling data is a problem they need to think about.
To be clear: "labeled datasets" aren't a thing in the real world.
Here is an excellent approach to getting past this problem:
I've found that getting access to lots of data is rarely the issue.
Getting that data labeled is.
Sometimes you can solve this by throwing people at the problem, but that's not always an option.