
Showing posts from October, 2018

Find best hyperparameters using Grid Search in SVM (Support Vector Machine)

Hyperparameter optimization is a big part of machine learning nowadays. In this post, we'll use the grid search capability from the sklearn library to find the best parameters for an SVM. We'll be using the wine dataset for this post ( Link ). Here, I have divided the whole dataset into train and test data randomly. Grid search is a hyperparameter optimization technique, provided as GridSearchCV in sklearn. Let's start by importing packages and loading the data.

```python
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit, GridSearchCV
from sklearn import svm
import numpy as np

# loading the data (column 0 holds the class label, the remaining columns the features)
wine_train = np.loadtxt('wine.train', delimiter=',')
wine_test = np.loadtxt('wine.test', delimiter=',')
x = wine_train[:, 1:13]
y = wine_train[:, 0]
X_Test = wine_test[:, 1:13]
x_train, x_test, y_train, y_test = train_test_split(x, y)
```

Now, we will
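The excerpt above cuts off before the grid search itself is run. As a minimal sketch of how GridSearchCV might be applied to the SVM here: the parameter grid values below are illustrative assumptions (not taken from the post), and sklearn's built-in wine dataset stands in for the `wine.train` file, which is not available here.

```python
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split, GridSearchCV

# stand-in data: sklearn's bundled wine dataset (assumption; the post loads wine.train from disk)
x, y = datasets.load_wine(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# illustrative grid: these C and gamma ranges are assumptions, not the post's values
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1e-4, 1e-3, 1e-2, 'scale']}

# 5-fold cross-validated search over every (C, gamma) combination
grid = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(x_train, y_train)

print(grid.best_params_)           # best hyperparameter combination found
print(grid.score(x_test, y_test))  # accuracy of the refit model on held-out data
```

After fitting, `grid.best_estimator_` is the SVM refit on the full training set with the winning parameters, so it can be used directly for prediction on the test file.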

What is Cross-Validation? Perform K-fold cross-validation without sklearn's cross_val_score function on SVM and Random Forest

Cross-validation is a model evaluation method. Generally, we use it to check whether the trained model overfits. If the whole dataset is divided only into train and test data, the chances are high that the trained model might not perform well on unseen (test) data. We can tackle this problem by dividing the whole dataset into three sets: train, validation, and test. The validation set is used to check performance before the test data is applied. There are several ways to perform cross-validation, such as holdout, k-fold, and leave-one-out cross-validation. There are plenty of machine learning tools in Python, ranging from scikit-learn and TensorFlow to Keras, Theano, and Microsoft CNTK. All of these provide excellent support and tools to build your models and work with datasets. In this post, we will see an alternative way to do k-fold cross-validation, without sklearn's cross_val_score function. Further, we will use the script with an SVM and a Random Forest classifier.

K-Fold Cross Validation

We divide the whole dataset
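The excerpt stops just as the k-fold procedure begins. As a minimal sketch of k-fold cross-validation implemented without cross_val_score (the data loading, fold count, and model settings below are assumptions, not the post's own script): shuffle the indices, split them into k folds, then train on k-1 folds and score on the held-out fold each time.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.ensemble import RandomForestClassifier

# stand-in data: sklearn's bundled wine dataset (assumption; the post's data is not shown)
x, y = datasets.load_wine(return_X_y=True)

def k_fold_cv(model, x, y, k=5, seed=0):
    """Manual k-fold CV: each fold serves once as the validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # shuffle so folds are not ordered by class
    folds = np.array_split(idx, k)         # k (roughly) equal index groups
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(x[train_idx], y[train_idx])
        scores.append(model.score(x[test_idx], y[test_idx]))
    return float(np.mean(scores))          # average accuracy across the k folds

print(k_fold_cv(svm.SVC(), x, y))
print(k_fold_cv(RandomForestClassifier(n_estimators=100, random_state=0), x, y))
```

The same helper works for both classifiers because it relies only on the common `fit`/`score` estimator interface.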