Experimental Study on Multiple Machine Learning Algorithms
This blog post is more code than explanation; still, I did my best to make the code concise, clear, and readable whatever your level of experience, so kindly bear with me.
One of the crucial parts of building a model is the choice of algorithm, which depends on the dataset. Here we will be working with:
- Breast cancer data
- Car data
- Ecoli data
- Letter recognition data
- Mushroom data
Each of them can be found in my GitHub repository, linked below.
Each dataset will be evaluated on mean accuracy and its standard deviation using:
- ID3
- Adaboost on Tree Stumps
- Random Forest
- Naive Bayes
- Bagging with Naive Bayes
- K-Nearest Neighbors with two different distance functions.
First, we define the Python helper methods we will be using:
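As one possible shape of those helpers (the names `entropy` and `accuracy` are my assumption, not necessarily the notebook's), the tree-based algorithms need Shannon entropy and every algorithm needs an accuracy score:

```python
# Hypothetical helper functions the later algorithms rely on.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, a perfectly balanced two-class sample has entropy 1.0 bit, and a pure sample has entropy 0.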
To make parsing the datasets easier, we create a Python class with regression and classification methods:
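A sketch of what such a class might look like (the class name `Data` and its interface are hypothetical): it takes any iterable of comma-separated records — an open file handle or a list of strings — and returns features and targets, keeping labels as strings for classification or coercing everything to floats for regression.

```python
# Hypothetical parsing helper; the notebook's actual class may differ.
import csv

class Data:
    def __init__(self, lines, target_index=-1):
        # `lines` is any iterable of comma-separated records.
        self.rows = [row for row in csv.reader(lines) if row]
        self.target_index = target_index

    def classification(self):
        """Return (features, labels) with the label kept as a string."""
        X, y = [], []
        for row in self.rows:
            t = self.target_index % len(row)
            X.append([v for i, v in enumerate(row) if i != t])
            y.append(row[t])
        return X, y

    def regression(self):
        """Like classification(), but with every value coerced to float."""
        X, y = self.classification()
        return [[float(v) for v in r] for r in X], [float(v) for v in y]
```

Usage would then be, for instance, `X, y = Data(open("car.data")).classification()`.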
Now comes the messy part: data preparation, starting with the breast cancer dataset:
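A hedged sketch of the clean-up this dataset typically needs, assuming the UCI Breast Cancer Wisconsin file (an id column, `?` marking missing values in the bare-nuclei attribute, and a 2/4 class code); the function name and pandas-based approach are mine, not necessarily the post's:

```python
# Assumed clean-up for the UCI Breast Cancer Wisconsin (Original) layout.
import pandas as pd

cols = ["id", "clump", "cell_size", "cell_shape", "adhesion", "epithelial",
        "bare_nuclei", "chromatin", "nucleoli", "mitoses", "class"]

def prepare_breast_cancer(df):
    df = df.drop(columns=["id"])            # sample id carries no signal
    df = df.replace("?", pd.NA).dropna()    # drop rows with missing values
    df = df.astype(int)
    df["class"] = df["class"].map({2: 0, 4: 1})  # 2 = benign, 4 = malignant
    return df
```

Dropping the handful of incomplete rows is the simplest choice here; imputation would be an alternative.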
Moving on to the next dataset on our list, the car data:
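For the car evaluation data, every attribute is categorical but ordered, so one reasonable preparation (assuming the six UCI attribute names and value sets; the encoding choice is mine) is to map each value to its rank:

```python
# Assumed ordinal encoding for the UCI Car Evaluation attributes.
import pandas as pd

ORDINAL = {
    "buying":   ["low", "med", "high", "vhigh"],
    "maint":    ["low", "med", "high", "vhigh"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}

def prepare_car(df):
    df = df.copy()
    for col, order in ORDINAL.items():
        # Replace each categorical value with its position in rank order.
        df[col] = df[col].map({v: i for i, v in enumerate(order)})
    return df
```

An ordinal encoding preserves the natural order (`low < med < high`), which one-hot encoding would discard.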
Next, the Ecoli data:
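A sketch of a plausible preparation step, assuming the UCI Ecoli layout (first column is a sequence name, last column the class) — dropping the very small classes, and the `min_class_size` threshold itself, are my assumptions, made so that stratified cross-validation stays well defined:

```python
# Assumed clean-up for the UCI Ecoli layout.
import pandas as pd

def prepare_ecoli(df, min_class_size=5):
    df = df.drop(columns=df.columns[0])   # sequence name is not a feature
    labels = df.iloc[:, -1]
    counts = labels.value_counts()
    keep = counts[counts >= min_class_size].index
    # Keep only classes with enough examples for k-fold evaluation.
    return df[labels.isin(keep)].reset_index(drop=True)
```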
Letter recognition data:
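This one needs little work, assuming the UCI layout where the first column is the capital letter to predict and the 16 remaining columns are already small integers (the function name is hypothetical):

```python
# Assumed split for the UCI Letter Recognition layout.
import pandas as pd

def prepare_letters(df):
    y = df.iloc[:, 0]                 # target: the capital letter 'A'..'Z'
    X = df.iloc[:, 1:].astype(int)    # 16 integer features, no encoding needed
    return X, y
```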
Mushroom data:
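A sketch of one plausible preparation, assuming the UCI layout (first column is the edible/poisonous label, the remaining 22 attributes are categorical, and `?` appears in the stalk-root attribute); treating `?` as its own category and using one-hot encoding are my choices, not necessarily the post's:

```python
# Assumed encoding for the UCI Mushroom layout.
import pandas as pd

def prepare_mushroom(df):
    y = (df.iloc[:, 0] == "p").astype(int)  # 1 = poisonous, 0 = edible
    # One-hot encode every categorical attribute; '?' simply becomes
    # its own indicator column rather than being dropped.
    X = pd.get_dummies(df.iloc[:, 1:])
    return X, y
```

One-hot encoding fits here because, unlike the car attributes, mushroom attribute values such as cap shape have no natural order.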
Using the methods above, we can measure how well each algorithm performs on each dataset, and this in turn guides the choice of algorithm when building a model.
The Jupyter notebook, the data, and the code samples above can all be found in my GitHub repository:
If you spot an error or would love to contribute to making this more robust, kindly reach out to me. I appreciate your perspective. Ciao!