TensorFlow Linear Classifier

Linear Classifier:

The most common supervised learning tasks are regression and classification. Regression predicts a continuous value, while classification predicts a class.

If the label has only two classes, it is called a binary classifier.

If the label has more than two classes, it is called a multiclass classifier.

 

Linear Classifier with TensorFlow:

Let's say our input X is an m x n matrix (m examples, n features) and the labels y form an m x 1 vector. The cross-entropy loss between the actual y and the predicted y is defined as:

L = -Σᵢ yᵢ log(ŷᵢ)

 

Our goal is to reduce the cross-entropy loss L. The softmax function turns the model's outputs into a probability distribution over all classes, while the true distribution puts all the probability on the correct class, i.e. p = [1, 0, 0, ..., 0]. Such a distribution is called a one-hot vector.
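To make this concrete, here is a minimal NumPy sketch (not part of the tutorial code; the scores are made-up values) showing how softmax turns raw scores into probabilities and how the cross-entropy is computed against a one-hot vector:

import numpy as np

scores = np.array([2.0, 1.0, 0.1])               # raw model outputs for 3 classes (illustrative)
probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax: probabilities that sum to 1
p = np.array([1.0, 0.0, 0.0])                    # one-hot vector for the correct class
loss = -np.sum(p * np.log(probs))                # cross-entropy L, ~0.417 here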

 

Importing libraries:

import numpy as np
import tensorflow as tf
import csv
import matplotlib.pyplot as plt

 

Global variables:

NUM_LABELS = 2   # y is 1 or 0: binary classifier
BATCH_SIZE = 100
NUM_EPOCHS = 100

 

Let’s read the data from csv:

 

def readDataFromCSV(fileName):
  labels = []
  features = []
  with open(fileName) as csvFile:
    readCsv = csv.reader(csvFile, delimiter=',')
    for row in readCsv:
      labels.append(row[0])      # first column is the class label y
      features.append(row[1:])   # remaining columns are the features
  features = np.matrix(features).astype(np.float32)
  labels = np.array(labels).astype(np.uint8)
  # convert the labels to one-hot vectors, e.g. 0 -> [1, 0] and 1 -> [0, 1]
  labels = (np.arange(NUM_LABELS) == labels[:, None]).astype(np.float32)
  return features, labels

 

Our data consists of two classes and two features. Let's call the features x1 and x2; each line of the training data file then has the format y, x1, x2. The first value is the class y (1 or 0), which we call the label, the second is the feature x1, and the third is x2, for example: 0, 0.14141324833, 0.2430820358. The labels are returned as one-hot vectors, i.e. every value in a row is 0 except the i-th (correct) class, which is 1.
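As a quick illustration (with hypothetical labels, not taken from the dataset), the one-hot conversion inside readDataFromCSV works like this:

labels = np.array([0, 1, 1, 0], dtype=np.uint8)
oneHot = (np.arange(NUM_LABELS) == labels[:, None]).astype(np.float32)
# oneHot is [[1, 0], [0, 1], [0, 1], [1, 0]]: each row is 0 everywhere
# except at the position of that example's class, which is 1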

 

Now, let's call the function to read the data:

 

trainData, trainLabels = readDataFromCSV('sudata/linear_data_train.csv')  # 1000x2 features, 1000x2 one-hot labels
testData, testLabels = readDataFromCSV('sudata/linear_data_eval.csv')
trainSize, numFeatures = np.shape(trainData)
print(trainSize, numFeatures)
print(trainLabels.shape)

 

A visualization of our data:

plt.scatter(trainData[:, 0], trainData[:, 1], c=trainLabels[:, 1], edgecolor='w', linewidth=0.25)

[Scatter plot of the training data, colored by class label]

If you observe the visualization, it clearly shows that the data is fairly linearly separable. Now we want to build a classifier that separates the two clusters; for this we need to train our model.

 

Train the model:

To train our model, we first have to choose a cost function (loss function); the basic goal of training is to minimize it. Here we use the softmax cross-entropy loss. The softmax is built into TensorFlow and can be invoked directly:

tf.nn.softmax(tf.matmul(x, W) + b)

Here x is the input matrix, W is the weight matrix, and b is the bias vector.
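As a side note, computing the softmax and the logarithm separately can be numerically unstable for large logits. TensorFlow 1.x also provides a combined op; here is a sketch of how the cross-entropy loss could be expressed with it instead (using the same x, W, b and y_ defined in the training code below):

logits = tf.matmul(x, W) + b
crossEntropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))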

To understand how the algorithm works, it helps to keep track of the shapes involved. Let nF = number of features, nE = number of examples, and nC = number of classes. Then:

sizeof(X) = nE x nF
sizeof(W) = nF x nC
sizeof(b) = nC x 1
sizeof(y) = nE x nC (both the one-hot labels and the predicted distribution)
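A quick NumPy sanity check of this shape bookkeeping (with made-up sizes matching our data):

nE, nF, nC = 1000, 2, 2
X = np.zeros((nE, nF)); W = np.zeros((nF, nC)); b = np.zeros(nC)
print((X.dot(W) + b).shape)   # (1000, 2) = nE x nC: one score per class per example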

Now we are going to train the linear model:

 

x = tf.placeholder("float", shape=[None, numFeatures])  # 1000x2, data
y_ = tf.placeholder("float", shape=[None, NUM_LABELS])  # 1000x2, labels (true distribution)
W = tf.Variable(tf.zeros([numFeatures, NUM_LABELS]))    # 2x2, weights
b = tf.Variable(tf.zeros([NUM_LABELS]))                 # 2x1, biases
y = tf.nn.softmax(tf.matmul(x, W) + b)                  # predicted distribution
crossEntropy = -tf.reduce_sum(y_ * tf.log(y))
trainStep = tf.train.GradientDescentOptimizer(0.1).minimize(crossEntropy)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for step in range(NUM_EPOCHS * trainSize // BATCH_SIZE):
  offset = (step * BATCH_SIZE) % trainSize
  batchData = trainData[offset:(offset + BATCH_SIZE), :]  # e.g. rows 100 to 200, all cols/features
  batchLabels = trainLabels[offset:(offset + BATCH_SIZE)]
  sess.run(trainStep, feed_dict={x: batchData, y_: batchLabels})

print('Weight matrix W:'); print(sess.run(W))
print('Bias vector b:'); print(sess.run(b))

 

After training completes, the weights and biases look like this (here x is the input and y is the prediction):

Weight matrix W:
[[-10.72918034  10.7291851 ]
 [-11.53842258  11.53842354]]
Bias vector b:
[ 8.89335346 -8.89334583]

Now we can also train the model with a mean squared error (MSE) loss, to see how the choice of loss function affects prediction performance. For this linearly separable dataset we probably won't see a significant difference. For datasets that require a non-linear model, however, you will notice differences in performance between cost functions; in fact, sometimes a specific cost function has to be chosen over the others for a specific dataset.

 

y = tf.sigmoid(tf.matmul(x, W) + b)
loss = tf.reduce_sum(tf.square(y - y_))
optimizer = tf.train.GradientDescentOptimizer(0.01)
trainStep = optimizer.minimize(loss)

Here the learning rate is a hyperparameter that we tune; a common starting value is 0.01. The learning rate is the step size used when searching for the minimum of the loss: a lower learning rate takes a long time to settle at the lowest point, approaching the minimum slowly, while a higher learning rate can overshoot the solution.
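A minimal sketch of how one might compare a few learning rates with the same training loop (the values here are illustrative, not tuned):

for lr in [0.001, 0.01, 0.1]:
  optimizer = tf.train.GradientDescentOptimizer(lr)
  trainStep = optimizer.minimize(loss)
  sess.run(tf.global_variables_initializer())  # reset W and b before each run
  for step in range(NUM_EPOCHS * trainSize // BATCH_SIZE):
    offset = (step * BATCH_SIZE) % trainSize
    sess.run(trainStep, feed_dict={x: trainData[offset:offset + BATCH_SIZE],
                                   y_: trainLabels[offset:offset + BATCH_SIZE]})
  print(lr, sess.run(loss, feed_dict={x: trainData, y_: trainLabels}))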

 

Plot the decision boundary:

First, consider the learned weights and biases: our classifier should define a separating hyperplane between the two classes. For now, take only the first column, w1 = -10.72918034 and w2 = -11.53842258, with the corresponding b = 8.89335346. Here w1 is the parameter of x1 and w2 is the parameter of x2 (which we plot on the y axis). The hyperplane satisfies the equation Wx + b = 0, which can be rearranged into the line form y = mx + c:

 

Wx + b = 0
w1*x + w2*y + b = 0
y = (-w1/w2)*x - b/w2
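Plugging in the learned values as a quick check (w1, w2 and the bias are copied from the printout above, renamed b1 here to avoid clashing with the TensorFlow variable b):

w1, w2, b1 = -10.72918034, -11.53842258, 8.89335346
slope = -w1 / w2       # ~ -0.93
intercept = -b1 / w2   # ~  0.77
# so the decision boundary is approximately y = -0.93*x + 0.77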

 

Now, plot the decision boundary:

 

plot_x = [np.min(trainData[:, 0]) - 0.10, np.max(trainData[:, 0]) + 0.25]  # 2x1, extend the line to the left and right
plot_y = -1 / W[1, 0] * (W[0, 0] * plot_x + b[0])  # 2x1, a similar line can be found using the second columns of W and b
plot_x = tf.cast(plot_x, tf.float32)  # convert to a tensor
print(sess.run(plot_x))
print(sess.run(plot_y))

plt.scatter(trainData[:, 0], trainData[:, 1], c=trainLabels[:, 1], edgecolor='w', linewidth=0.25)
plt.plot(sess.run(plot_x), sess.run(plot_y), color='k', linewidth=1.5)
plt.xlim([-0.2, 1.1]); plt.ylim([-0.4, 1.0])
plt.title('on Training data')

 

[Decision boundary plotted over the training data]

 

Evaluate our test set:

It is not surprising that our model fits the training data very well. But what about the test data, which our model has never seen? Let's evaluate the test set with the softmax model and look at the accuracy and the decision boundary.

 

def plotDecisionBoundary(X, Y, predFunc):
   # find the plot boundaries, padded by 10%
   mins = np.amin(X, 0)
   mins = mins - 0.1 * np.abs(mins)
   maxs = np.amax(X, 0)
   maxs = maxs + 0.1 * maxs
   # evaluate the prediction function on a 300x300 grid covering the data
   xs, ys = np.meshgrid(np.linspace(mins[0, 0], maxs[0, 0], 300), np.linspace(mins[0, 1], maxs[0, 1], 300))
   Z = predFunc(np.c_[xs.flatten(), ys.flatten()])
   Z = Z.reshape(xs.shape)
   # draw the predicted class regions and overlay the data points
   plt.contourf(xs, ys, Z, cmap=plt.cm.Paired)
   plt.scatter(X[:, 0], X[:, 1], c=Y[:, 1], s=50, edgecolor='w', linewidth=0.25)
   plt.title('on Test set')
   plt.show()

 

And here are the predictions:

predClass = tf.argmax(y, 1)
correctPred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correctPred, "float"))
print('Accuracy: ', accuracy.eval(session=sess, feed_dict={x: testData, y_: testLabels}))

predFunc = lambda X: predClass.eval(session=sess, feed_dict={x: X})
plotDecisionBoundary(testData, testLabels, predFunc)

[Decision boundary on the test set]

As you can observe from the plot, the softmax model gives 100% accuracy on the test set. In practice, however, real-world data is rarely this cleanly separable; it is more complicated and can have thousands of dimensions. Such data needs special techniques to work with: data preprocessing, more sophisticated cost functions, and regularization methods to control overfitting.
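For example, one common regularization method is to add an L2 penalty on the weights to the loss. A minimal sketch in the style of the code above (regLambda is an illustrative value that would need tuning):

regLambda = 0.01  # illustrative regularization strength
regularizedLoss = crossEntropy + regLambda * tf.nn.l2_loss(W)
trainStep = tf.train.GradientDescentOptimizer(0.1).minimize(regularizedLoss)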