Experimental Study on Multiple Machine Learning Algorithms

stillbigjosh
2 min read · Aug 11, 2019


This blog post is more code than explanation. However, I did my best to make the code concise, clear and readable whatever your level of knowledge, so kindly bear with me.

One of the crucial parts of building a model is the choice of algorithm, which depends on the dataset. Here we will be working with:

  1. Breast cancer data
  2. Car data
  3. Ecoli data
  4. Letter recognition data
  5. Mushroom data

Each of them can be found in my GitHub repository.

The following algorithms will be evaluated on each dataset, based on accuracy and standard deviation:

  1. ID3
  2. AdaBoost on Tree Stumps
  3. Random Forest
  4. Naive Bayes
  5. Bagging with Naive Bayes
  6. K-Nearest Neighbors with two different distance functions.
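To make the evaluation idea concrete, here is a minimal sketch of how a mean accuracy and its standard deviation can be obtained for K-Nearest Neighbors under two distance functions. It is not the code from the notebook. The choice of Euclidean and Manhattan distance, the 5-fold split, the population standard deviation, and the toy data are all my assumptions for illustration.

```python
import math
import random
import statistics
from collections import Counter


def euclidean(a, b):
    """Straight-line (L2) distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two numeric feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, distance):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k_neighbors, distance, n_folds=5, seed=0):
    """Return (mean accuracy, standard deviation of accuracy) over n_folds folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Every n_folds-th shuffled index goes into the same fold.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbors, distance) == y[i]
            for i in fold
        )
        scores.append(correct / len(fold))
    return statistics.mean(scores), statistics.pstdev(scores)


# Toy, made-up data: two well-separated clusters, for illustration only.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
     (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.3, 5.2), (5.1, 5.0)]
y = ["a"] * 5 + ["b"] * 5

for name, dist in [("euclidean", euclidean), ("manhattan", manhattan)]:
    mean_acc, std_acc = cross_validate(X, y, k_neighbors=3, distance=dist)
    print(f"{name}: accuracy={mean_acc:.2f} std={std_acc:.2f}")
```

The same `cross_validate` pattern would apply to any of the other algorithms in the list by swapping out `knn_predict`; a low standard deviation suggests the accuracy is stable across folds.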

First, we import the Python methods we will be using:

For ease of parsing the datasets, a Python class with regression and classification methods is created:
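As a rough idea of what such a helper might look like, here is a small sketch. The class name `DatasetParser`, its parameters, and the sample rows are hypothetical and are not taken from the notebook. It assumes comma-separated rows with the target in the last column.

```python
import csv
import io


class DatasetParser:
    """Hypothetical helper: splits delimited rows into features and a target column."""

    def __init__(self, text, target_index=-1, delimiter=","):
        rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
        self.targets = [row[target_index] for row in rows]
        # Keep every column except the target; the modulo resolves negative indices.
        self.features = [
            [value for i, value in enumerate(row) if i != target_index % len(row)]
            for row in rows
        ]

    def classification(self):
        """Return (features, labels) with the target kept as a class label string."""
        return self.features, self.targets

    def regression(self):
        """Return (features, targets) with the target converted to float."""
        return self.features, [float(t) for t in self.targets]


sample = "1.0,2.0,yes\n3.0,4.0,no\n"  # made-up rows for illustration
features, labels = DatasetParser(sample).classification()
print(features, labels)
```

Keeping the two methods on one class means every dataset goes through the same parsing step, and only the treatment of the target column differs.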

Now comes the dirty part, data preparation, starting with the breast cancer dataset:

Moving on to the next dataset on our list, the car data:

Next, the Ecoli data:

Letter recognition data:

Mushroom data:
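Distance-based algorithms such as K-Nearest Neighbors need numeric inputs, so categorical columns have to be encoded first. Below is a minimal sketch of one-hot encoding, assuming the mushroom features are categorical codes. The function name `one_hot_encode` and the sample rows are made up for illustration, and this is not necessarily the encoding used in the notebook.

```python
def one_hot_encode(rows):
    """Turn rows of categorical values into rows of 0/1 indicator columns."""
    n_columns = len(rows[0])
    # Sorted distinct values per column give a stable column order.
    categories = [sorted({row[c] for row in rows}) for c in range(n_columns)]
    encoded = []
    for row in rows:
        vector = []
        for c, value in enumerate(row):
            # One indicator per known category of this column.
            vector.extend(1 if value == category else 0 for category in categories[c])
        encoded.append(vector)
    return encoded, categories


# Made-up single-letter codes, for illustration only.
rows = [["x", "n"], ["b", "y"], ["x", "y"]]
encoded, categories = one_hot_encode(rows)
print(categories)
print(encoded)
```

Each original column expands into as many 0/1 columns as it has distinct values, so no artificial ordering is imposed on the categories.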

Using the above methods, we can determine how well each algorithm performs on each dataset, and this will inform the choice of algorithm when building a model.

The Jupyter notebook, data and the above code samples can all be found on my GitHub.

If you spot an error or would love to contribute to make this more robust, kindly reach out to me. I appreciate your perspective. Ciao.
