Machine Learning Regression Model Selection in Python
This blog is based on the six regression models I have taught so far.
Hello, welcome back to ML! So far, we have covered regression, and this blog is about choosing the right regression model. Which one should you apply to your dataset? How do you choose? You'll find the answers in this blog.
I can confidently say this will be the ultimate guide to regression model selection!
Resources
https://drive.google.com/drive/folders/19gdkL2xaEsvCkRHbLgSYkkDEGEcX5cp_
This drive folder includes all six regression model templates and a data sheet with almost 10k observations, no missing data, and no categorical data.
The UCI Machine Learning Repository is a great website with many data sets you can download and practice on. The provided data sheet is also from UCI.
Preparation of the regression code template
If you have a dataset with no missing data and no categorical data, you can use any of these regression models by simply changing the name of your dataset. If your dataset has missing data or categorical data, you need to use your data preprocessing tools to handle this first, and then you can apply these models.
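As a rough illustration of that preprocessing step, here is a minimal sketch using pandas and scikit-learn. The toy DataFrame, its column names, and the choice of mean imputation plus one-hot encoding are all assumptions made for this example; your own dataset may call for different strategies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric column with a gap, one categorical column.
df = pd.DataFrame({
    "temperature": [14.9, np.nan, 5.1, 20.8],
    "region": ["north", "south", "north", "east"],
    "output": [463.2, 444.3, 488.5, 446.4],
})

X = df.iloc[:, :-1]  # features: every column except the last
y = df.iloc[:, -1]   # dependent variable: the last column

numeric_cols = ["temperature"]
categorical_cols = ["region"]

# Fill missing numbers with the column mean and one-hot encode the categories.
# sparse_threshold=0 asks for a dense NumPy array as the result.
preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="mean"), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

X_clean = preprocess.fit_transform(X)
print(X_clean.shape)  # 1 numeric column + 3 one-hot columns
```

Once the features look like X_clean (all numeric, no gaps), they can be passed to any of the regression templates.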
The first task is to make a copy of each regression template. Then rename the data sheet to Data.csv, upload the file, and run all six templates one by one. In each one you will see an "Evaluating the model performance" section. This demo works for any dataset, regardless of the number of features, as long as the features are in the first columns and the dependent variable is in the last column. It also assumes that any missing or categorical data has already been handled using your data preprocessing toolkit.
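The column layout described above can be sketched as follows. This is only an assumption about how such a template might read its data, not a copy of the templates in the drive folder. The helper name load_features_and_target is hypothetical, and a small in-memory CSV stands in for Data.csv so the snippet runs on its own.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def load_features_and_target(source):
    """Read a CSV and split it into features (all but last column) and target (last column)."""
    dataset = pd.read_csv(source)
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values
    return X, y


# Stand-in for Data.csv (made-up numbers); with a real file you would pass "Data.csv".
demo_csv = io.StringIO(
    "f1,f2,f3,target\n"
    "1.0,2.0,3.0,10.0\n"
    "2.0,1.0,0.5,7.5\n"
    "0.5,0.5,2.0,6.0\n"
    "3.0,2.5,1.0,12.0\n"
    "1.5,3.0,2.5,13.5\n"
)

X, y = load_features_and_target(demo_csv)

# Hold out 20% of the rows for evaluating the model performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X.shape, y.shape, X_train.shape, X_test.shape)
```

Because the split relies only on column positions, the same code works no matter how many feature columns the data sheet has.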
After running all of them, your random forest regression should score about 0.96, which should be the highest of all the evaluations. With a final R squared coefficient of 0.96, random forest regression is the best fit for the given data. [You may try changing the degree of the polynomial regression and check how it performs.]
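For reference, the R squared coefficient compares the model's squared errors with those of a baseline that always predicts the mean: R squared = 1 - SS_res / SS_tot. The small function below is a from-scratch sketch of that formula with made-up numbers; in practice, scikit-learn's r2_score computes this for you.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared residuals: how far the predictions are from the real values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far the real values are from their own mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

score = r_squared(y_true, y_pred)
print(round(score, 4))  # 0.9625
```

A score of 1 means perfect predictions, and a score near 0 means the model does no better than predicting the mean every time.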
So, how do you find the best fit? You simply try all of the models (run the templates) and compare them using the R squared coefficient to make your decision. And remember, if your dataset has missing or categorical data, you just need to use your data preprocessing tools to handle these situations first. Once that's done, you can deploy your code templates.
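If you would rather compare everything in one script, the idea can be sketched like this. The five estimators and their settings are assumptions standing in for the templates, and the data is synthetic (generated with NumPy) so the snippet runs without Data.csv. Do not expect the scores or the winner to match the 0.96 from the provided data sheet.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a real data sheet: 4 features, a nonlinear target, some noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models (assumed settings; change them to match your own templates).
models = {
    "multiple linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "support vector (RBF)": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=10, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:25s} R^2 = {score:.3f}")
print("Best fit on this data:", best_name)
```

To experiment with the polynomial model, change degree=2 to another value and rerun the comparison.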
Conclusion
Pros and cons of regression
That’s all for regression. Our regression part is done! Next we will learn classification.
Enjoy Machine Learning.