Linear Classifier:
The most common supervised learning tasks are regression and classification. Regression predicts a continuous value, while classification predicts a class.
If the label has only two classes, the model is called a binary classifier.
If the label has more than two classes, it is called a multiclass classifier.
Linear Classifier with TensorFlow:
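Let's say our input data x is an m×n matrix (m examples, n features) and the labels y form an m×1 vector. The cross-entropy loss between the actual labels y and the predicted probabilities ŷ is defined as:
$$L = -\sum_{i=1}^{m}\sum_{k=1}^{K} y_{ik} \log \hat{y}_{ik}$$
where K is the number of classes and y_{ik} is the one-hot encoded label: 1 if example i belongs to class k and 0 otherwise.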
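Our goal is to minimize the cross-entropy loss L. The softmax function turns the model's raw scores into a probability distribution over all the classes. The true distribution puts all of its probability on the correct class, e.g. p = [1, 0, 0, …, 0]; this distribution is called a one-hot vector.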
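To make the softmax and the loss concrete, here is a minimal NumPy sketch, independent of TensorFlow. The helper names `softmax` and `cross_entropy` and the example scores are made up for illustration; the loss sums over examples and classes, as in the equation above.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: turns raw class scores into probabilities."""
    # Subtracting the row max does not change the result but avoids overflow in exp.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy(y_true, y_pred):
    """Summed cross-entropy between one-hot labels and predicted probabilities."""
    return -np.sum(y_true * np.log(y_pred))

scores = np.array([[2.0, 0.0],    # example 1: the model favours class 0
                   [0.5, 1.5]])   # example 2: the model favours class 1
y_true = np.array([[1.0, 0.0],    # one-hot: the correct class is 0
                   [0.0, 1.0]])   # one-hot: the correct class is 1

probs = softmax(scores)
print(probs.sum(axis=1))          # each row sums to 1
print(cross_entropy(y_true, probs))
```

Only the probability assigned to the correct class contributes to the loss, because the one-hot label zeroes out every other term.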
Importing libraries:
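import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session); on TF 2.x use: import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()
import csv
import matplotlib.pyplot as plt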
Global variables:
NUM_LABELS = 2  # y is 1 or 0: binary classifier
BATCH_SIZE = 100
NUM_EPOCHS = 100
Let's read the data from the CSV file:
def readDataFromCSV(fileName):
    labels = []
    features = []
    with open(fileName) as csvFile:
        readCsv = csv.reader(csvFile, delimiter=',', skipinitialspace=True)  # skip spaces after commas
        for row in readCsv:
            labels.append(row[0])     # first column: class label
            features.append(row[1:])  # remaining columns: features
    features = np.matrix(features).astype(np.float32)
    labels = np.array(labels).astype(np.uint8)
    # convert the labels to one-hot vectors
    labels = (np.arange(NUM_LABELS) == labels[:, None]).astype(np.float32)
    return features, labels
Our data consists of two classes and two features. Let's call the features x1 and x2, so each line of the training data file has the format y, x1, x2: the first value is the class y (1 or 0), which we call the label, the second is feature x1 and the third is feature x2, for example 0, 0.14141324833, 0.2430820358. The labels are returned as one-hot vectors: every value in a row is 0 except the one at the index of the true class, which is 1.
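The parsing and one-hot step can be tried without a file. This sketch assumes two made-up rows held in memory with `io.StringIO`; it mirrors the body of `readDataFromCSV` but uses a plain `np.array` instead of `np.matrix`.

```python
import csv
import io
import numpy as np

NUM_LABELS = 2

# Two made-up rows in the "y, x1, x2" format described above.
csv_text = "0, 0.14141324833, 0.2430820358\n1, 0.85, 0.9\n"

labels, features = [], []
# skipinitialspace=True drops the space that follows each comma.
for row in csv.reader(io.StringIO(csv_text), delimiter=',', skipinitialspace=True):
    labels.append(row[0])      # first column: class label (a string such as '0')
    features.append(row[1:])   # remaining columns: x1 and x2

features = np.array(features).astype(np.float32)
labels = np.array(labels).astype(np.uint8)

# labels[:, None] is a column and np.arange(NUM_LABELS) is a row; broadcasting
# compares every label with every class index, so each row is True only at
# the index of its own class: a one-hot matrix.
one_hot = (np.arange(NUM_LABELS) == labels[:, None]).astype(np.float32)
print(one_hot)
```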
Now, let's call the function to read the data:
trainData, trainLabels = readDataFromCSV('sudata/linear_data_train.csv')  # 1000x2, 1000x2 (one-hot)
testData, testLabels = readDataFromCSV('sudata/linear_data_eval.csv')
trainSize, numFeatures = np.shape(trainData)
print(trainSize, numFeatures)
print(trainLabels.shape)
Let's visualize our training data:
plt.scatter(trainData[:, 0], trainData[:, 1], c=trainLabels[:, 1], edgecolor='w', linewidth=0.25)
The plot shows that the data is fairly linearly separable. Now we want to build a classifier that separates the two clusters, and for this we need to train our model.
Train the model:
To train our model, we first have to choose a cost (loss) function; training means minimizing it. Here we use the softmax cross-entropy loss. TensorFlow has the softmax function built in, so the model's predicted class probabilities are computed directly as:
tf.nn.softmax(tf.matmul(X, W) + b)
Here X is the input matrix, W is the weight matrix and b is the bias vector.
Keeping track of the matrix shapes helps to understand how the algorithm works. Let nF = number of features, nE = number of examples and nC = number of classes. Then:
sizeof(X) = nE x nF
sizeof(W) = nF x nC
sizeof(b) = nC x 1
sizeof(y) = nE x 1 (nE x nC after one-hot encoding)
so XW + b, and therefore the softmax output, has size nE x nC.
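A quick way to confirm these shapes is to build tiny arrays and look at the result. The sizes below are hypothetical (5 examples, 2 features, 2 classes), and NumPy broadcasting stands in for TensorFlow's addition of b to every row.

```python
import numpy as np

nE, nF, nC = 5, 2, 2            # hypothetical sizes: 5 examples, 2 features, 2 classes

X = np.random.rand(nE, nF)      # inputs: nE x nF
W = np.zeros((nF, nC))          # weights: nF x nC
b = np.zeros(nC)                # biases: nC entries, broadcast over every row

scores = X @ W + b              # (nE x nF) @ (nF x nC) -> nE x nC, plus b on each row
print(scores.shape)

labels = np.array([0, 1, 1, 0, 1])                          # raw labels: nE entries
one_hot = (np.arange(nC) == labels[:, None]).astype(float)  # one-hot labels: nE x nC
print(one_hot.shape)
```

The scores and the one-hot labels have the same shape, which is what lets the loss multiply them element by element.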
Now we train the linear model:
x = tf.placeholder("float", shape=[None, numFeatures]) # 1000x2, data
y_ = tf.placeholder("float", shape=[None, NUM_LABELS]) # 1000x2, labels and true distribution
W = tf.Variable(tf.zeros([numFeatures, NUM_LABELS])) # 2x2, weights
b = tf.Variable(tf.zeros([NUM_LABELS])) # 2x1, biases
y = tf.nn.softmax(tf.matmul(x, W) + b)
crossEntropy = - tf.reduce_sum(y_ * tf.log(y))
trainStep = tf.train.GradientDescentOptimizer(0.1).minimize(crossEntropy)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for step in range(NUM_EPOCHS * trainSize // BATCH_SIZE):
    offset = (step * BATCH_SIZE) % trainSize
    batchData = trainData[offset:(offset+BATCH_SIZE), :]  # e.g. from 100 to 200, all cols/features
    batchLabels = trainLabels[offset:(offset+BATCH_SIZE)]
    sess.run(trainStep, feed_dict={x: batchData, y_: batchLabels})
print('Weight matrix W: '); print(sess.run(W))
print('Bias vector b:'); print(sess.run(b))
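To see what `GradientDescentOptimizer` is doing inside this loop, here is a NumPy sketch of the same mini-batch procedure. It is not the author's code: the data is a hypothetical linearly separable toy set, the gradient X^T(softmax - y) is written out by hand, and the loss is averaged over the batch rather than summed as in the TensorFlow code, so the assumed learning rate of 1.0 is not directly comparable to the 0.1 used above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not the tutorial's CSV): class 1 when x1 + x2 > 1.
X = rng.random((1000, 2))
labels = (X.sum(axis=1) > 1).astype(int)
Y = np.eye(2)[labels]                        # one-hot labels, 1000 x 2

NUM_LABELS, BATCH_SIZE, NUM_EPOCHS = 2, 100, 100
learning_rate = 1.0                          # assumed value for this sketch
trainSize, numFeatures = X.shape
W = np.zeros((numFeatures, NUM_LABELS))
b = np.zeros(NUM_LABELS)

def softmax(scores):
    exps = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

for step in range(NUM_EPOCHS * trainSize // BATCH_SIZE):
    offset = (step * BATCH_SIZE) % trainSize    # same cycling offset as the TensorFlow loop
    xb = X[offset:offset + BATCH_SIZE]
    yb = Y[offset:offset + BATCH_SIZE]
    probs = softmax(xb @ W + b)
    # Gradient of the batch-averaged cross-entropy with respect to the scores.
    grad = (probs - yb) / BATCH_SIZE
    W -= learning_rate * (xb.T @ grad)        # dL/dW = X^T (probs - y) / batch size
    b -= learning_rate * grad.sum(axis=0)     # dL/db = column sums of the same term

pred = np.argmax(X @ W + b, axis=1)           # softmax is monotonic, so argmax of scores suffices
accuracy = np.mean(pred == labels)
print('Accuracy on the toy data:', accuracy)
```

As in the TensorFlow run, starting from zeros keeps the second column of W equal to the negative of the first, because the two columns receive opposite gradients.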
After training, the learned weights and biases are:
Weight matrix W:
[[-10.72918034  10.7291851 ]
 [-11.53842258  11.53842354]]
Bias vector b:
[ 8.89335346 -8.89334583]
We can also train the model with a squared-error loss instead, to see how the choice of loss function affects prediction performance. You probably won't see a significant difference on this linearly separable dataset. With non-linear models, however, different cost functions can lead to noticeably different performance, and for some datasets a specific cost function has to be chosen over the others. (The code below sums the squared errors; tf.reduce_mean would give the mean squared error, MSE.)
y = tf.sigmoid(tf.matmul(x, W) + b)
loss = tf.reduce_sum(tf.square(y - y_))
optimizer = tf.train.GradientDescentOptimizer(0.01)
trainStep = optimizer.minimize(loss)
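One concrete way the two losses differ is how hard they punish a confident mistake. This sketch uses made-up probability vectors for a single example whose correct class is 0; the helper names are our own.

```python
import numpy as np

y_true = np.array([1.0, 0.0])             # one-hot: the correct class is 0

def cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred))

def squared_error(y_true, y_pred):
    return np.sum(np.square(y_pred - y_true))

for y_pred in (np.array([0.9, 0.1]),      # confident and right
               np.array([0.01, 0.99])):   # confident and wrong
    print(y_pred, cross_entropy(y_true, y_pred), squared_error(y_true, y_pred))
```

With probabilities in [0, 1], each squared-error term is at most 1, so the penalty for a badly wrong prediction is bounded, while the cross-entropy -log(p) grows without limit as the probability of the correct class approaches 0.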
The learning rate (0.01 in the squared-error code, 0.1 in the softmax model) is a hyperparameter that we tune. Values such as 0.01 are a common starting point, but there is no universal standard value. The learning rate is the step size gradient descent takes toward a minimum of the loss: a low learning rate makes training slow, taking a long time to settle at the lowest point, while a high learning rate can overshoot the solution.
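The effect of the step size is easy to see on a one-dimensional toy problem. This sketch, not taken from the tutorial, runs plain gradient descent on f(w) = w², whose minimum is at w = 0 and whose gradient is 2w; the three learning rates are illustrative.

```python
def gradient_descent(learning_rate, steps=50, w=1.0):
    """Minimise f(w) = w**2 (gradient 2*w) starting from w; return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w   # each step multiplies w by (1 - 2 * learning_rate)
    return w

for lr in (0.01, 0.4, 1.1):
    print(lr, gradient_descent(lr))
```

With 0.01, w is still about 0.36 after 50 steps (slow progress); with 0.4 it is essentially 0; with 1.1 every step overshoots the minimum by more than it corrects, and w grows into the thousands.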
Plot the decision boundary:
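Our classifier should find a separating hyperplane between the two classes. For now, consider only the first column of W, w1 = −10.72918034 and w2 = −11.53842258, and the corresponding bias b = 8.89335346. Here w1 is the weight of feature x1 and w2 is the weight of feature x2, which we plot on the y-axis. Because the second column of W and b is almost exactly the negative of the first, the boundary where the two class scores are equal is (up to rounding) the line where the first-column score is zero. The hyperplane satisfies Wx + b = 0, which we can rearrange into the slope-intercept form y = mx + c: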
$$Wx + b = 0$$
$$w_1 x + w_2 y + b = 0$$
$$y = -\frac{w_1}{w_2} x - \frac{b}{w_2}$$
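Plugging the trained values into this equation gives the actual line. The sketch below copies W and b from the printed output above and adds a small hypothetical `predict` helper that picks the class with the larger score, checked on the example row from the data description.

```python
# Trained parameters copied from the printed output above.
W = [[-10.72918034, 10.7291851],
     [-11.53842258, 11.53842354]]
b = [8.89335346, -8.89334583]

w1, w2, b0 = W[0][0], W[1][0], b[0]   # first column of W and first bias
slope = -w1 / w2                      # m in y = m*x + c
intercept = -b0 / w2                  # c in y = m*x + c
print('decision boundary: y = %.4f * x + %.4f' % (slope, intercept))

def predict(x1, x2):
    """Hypothetical helper: return the class k with the larger score x1*W[0][k] + x2*W[1][k] + b[k]."""
    scores = [x1 * W[0][k] + x2 * W[1][k] + b[k] for k in range(2)]
    return scores.index(max(scores))

print(predict(0.14141324833, 0.2430820358))  # the example row "0, 0.1414..., 0.2430..."
```

The line comes out at roughly y = -0.93x + 0.77, and the example point, which lies below it, is assigned class 0, matching its label.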
Now, plot the decision boundary:
Wv, bv = sess.run([W, b])  # fetch the trained parameters as NumPy arrays
plot_x = np.array([np.min(trainData[:, 0]) - 0.10, np.max(trainData[:, 0]) + 0.250])  # 2 points, extend the line to left and right
plot_y = -1 / Wv[1, 0] * (Wv[0, 0] * plot_x + bv[0])  # y = -(w1*x + b)/w2; a similar line can be found using the second cols of W and b
print(plot_x)
print(plot_y)
plt.scatter(trainData[:, 0], trainData[:, 1], c=trainLabels[:, 1], edgecolor='w', linewidth=0.25)
plt.plot(plot_x, plot_y, color='k', linewidth=1.5)
plt.xlim([-0.2, 1.1]); plt.ylim([-0.4, 1.0])
plt.title('on Training data')
plt.show()
Evaluate on the test set:
It's not surprising that our model fits the training data very well. But what about the test data, which the model has never seen? Let's evaluate the softmax model on the test data and look at both the accuracy and the decision boundary.
def plotDecisionBoundary(X, Y, predFunc):
    # find the plot boundary, padded slightly beyond the data
    # (X is an np.matrix, so np.amin/np.amax return 1 x 2 matrices, hence mins[0, 0])
    mins = np.amin(X, 0)
    mins = mins - 0.1 * np.abs(mins)
    maxs = np.amax(X, 0)
    maxs = maxs + 0.1 * maxs
    # evaluate the classifier on a 300 x 300 grid covering the data
    xs, ys = np.meshgrid(np.linspace(mins[0, 0], maxs[0, 0], 300), np.linspace(mins[0, 1], maxs[0, 1], 300))
    Z = predFunc(np.c_[xs.flatten(), ys.flatten()])
    Z = Z.reshape(xs.shape)
    plt.contourf(xs, ys, Z, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=Y[:, 1], s=50, edgecolor='w', linewidth=0.25)
    plt.title('on Test set')
    plt.show()
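The grid trick in `plotDecisionBoundary` (meshgrid, flatten into one point per row, predict, reshape back) can be checked without TensorFlow or plotting. In this sketch a NumPy stand-in for `predFunc` uses the trained W and b copied from the printed output above, and a coarse 3 x 3 grid replaces the 300 x 300 one.

```python
import numpy as np

# Trained parameters copied from the printed output above.
W = np.array([[-10.72918034, 10.7291851],
              [-11.53842258, 11.53842354]])
b = np.array([8.89335346, -8.89334583])

def predFunc(points):
    """NumPy stand-in for the TensorFlow predictor: argmax of the class scores."""
    return np.argmax(points @ W + b, axis=1)

# A coarse 3 x 3 grid instead of the 300 x 300 one used for plotting.
xs, ys = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 3))
Z = predFunc(np.c_[xs.flatten(), ys.flatten()])   # one (x1, x2) point per row
Z = Z.reshape(xs.shape)                           # back to grid layout for contourf
print(Z)   # row i corresponds to y = ys[i, 0], column j to x = xs[0, j]
```

Class 0 occupies the lower-left corner of the grid and class 1 the rest, consistent with the boundary line y ≈ -0.93x + 0.77; `contourf` simply colours these grid cells.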
Now compute the predicted classes and the accuracy on the test set:
predClass = tf.argmax(y, 1)
correctPred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correctPred, "float"))
print('Accuracy: ', accuracy.eval(session=sess, feed_dict={x: testData, y_: testLabels}))
predFunc = lambda X: predClass.eval(session=sess, feed_dict={x: X})
plotDecisionBoundary(testData, testLabels, predFunc)
As the plot shows, the softmax classifier reaches 100% accuracy on this test set. In the real world, however, data is usually not linearly separable, is more complicated and can have thousands of dimensions.
Such data needs additional techniques, such as data preprocessing, more sophisticated cost functions, and regularization methods to control overfitting.


