Blog for Coders

Institute of Coding

Introduction to TensorFlow 2.0 and Keras with Face Recognition

In this notebook, we will continue on our Face Recognition with SVM notebook and replicate the work has been done using the Google's TensorFlow 2.0 library. We will create a Convolutional Neural Network model for face recognition, train it on the same data we used earlier and test it against the test set.

If you don't have a decent hardware, you can run this notebook inside the Google Colab.

When running in the Colab, we need to switch to Tensorflow 2.0+. This can be done easily using the magic function: %tensorflow_version 2.x.

If you run this code on your local machine, you can skip or remove the following cell.

In [1]:
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    %tensorflow_version 2.x
    import tensorflow as tf
    print(f"Running in Colab with Tensorflow version: {tf.__version__}")
TensorFlow 2.x selected.
Running in Colab with Tensorflow version: 2.0.0

Let's import the face dataset we previously used using the scikit-learn. This is exactly the same and nothing changed here.

In [ ]:
from sklearn import datasets
data = datasets.fetch_olivetti_faces()

Then, we need to prepare our data for deep learning model. Colored images,most of the time, are represented with 3 different matrixes consist of different colours/channel information. These colours are Red, Green, and Blue, or RGB in short. Therefore, an image with the size of 128 x 128 can be represented as a 128x128x3, or 3x128x128. The number of channels/colours can be represented either as the first or last dimension.

Since all the face images are grayscale, we only have one channel/colour which is black. The numbers inside this matrix indicates how bright or how dark is each pixel. The closer to 1 the brigher the pixel, and the closer to 0, the darker the pixel.

In [ ]:
import tensorflow as tf
import numpy as np
import cv2

# rename dataset for easy access
X = data["images"]
y = data["target"]

num_class = len(set(y)) # number of different people in the dataset
X = np.expand_dims(X, -1) # add an axis for channel information

We can identify people inside this data with the y variable. y is an ordinally encoded variable which every people is represented with a unique number starting from 0.

For this deep learning model, we need to convert these y variable into a vector which will be unique for each person. This vector will be made of zeros and only one 1 value, representing the class/person. This is called One-hot Encoding.

Suppose that we have 3 people in this dataset and we represent those people as 1, 2, and 3. When converting these into one-hot encoded vectors 1 becomes [1 0 0], 2 becomes [0 1 0], and 3 becomes [0 0 1]. The order of the number becomes 1 while rest of the vector is filled with zero.

How can we convert our y variable to one-hot encoded vectors? Well, we can use Tensorflow's one_hot function as follows:

In [ ]:
y = tf.one_hot(y, depth=num_class).numpy() # convert y to one hot vectors

In order to have reproducible results, we can fix the seed value for random number generators of both numpy and tensorflow libraries.

In [ ]:
np.random.seed(1)
tf.random.set_seed(2)

Then we can split our data and labels for training and testing. This is exactly the same procedure as the previous SVM notebook.

In [ ]:
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# split data randomly into train & test sets by preserving train/test ratio across classes
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=0)

# get the train and test indexes
train_index, test_index = next(sss.split(X, y))

# split X and y into train & test sets
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]

When training our deep learning model, we can see how it performs during the training. Since the training will be used to create deep learning model and test set will be used to calculate how the final model performs, we need additional set which will not be a part of either training or test sets. That dataset is called as validation set. We can follow above procedure to split training set into training and validation set.

In [7]:
# split training set into training and validation sets
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.15, random_state=0)

train_index, val_index = next(sss.split(X_train, y_train))
X_train, X_val = X[train_index], X[val_index]
y_train, y_val = y[train_index], y[val_index]

# Print statistics about it
print(f"Train data size: {len(y_train)}")
print(f"Validation data size: {len(y_val)}")
print(f"Test data size : {len(y_test)}")
Train data size: 306
Validation data size: 54
Test data size : 40

Convolutional Neural Network Model

We have our data ready, let's create a deep learning model. To define a model, we'll use the Keras library comes with TensorFlow as it provides easy-to-use API for defining deep learning models.

We'll use several deep learning layers to define a convolutional neural network. Let's peek Keras' documentation to find out what these layers do.

Dense

A densely-connected NN layer. All the neurons/elements are connected to all the neurons/elements in the previous and next layer.

Dropout

Drops the connection randomly in order to prevent memorizing/over-fitting the data.

Convolution (Covn2D)

Learns spatially-correlated features. For low level features, it can be edges, corners, etc. For high level features, it can be eyes, mouth or nose of a human.

Sample Convolution operation.

Ref: Narges Khatami, Wikipedia

Pooling (MaxPooling2D)

Combines several neurons into one neuron and reduces the dimension. There are several types of pooling layers: Max pooling selects the neuron with the maximum value, Average pooling calculates the average value of all neurons, Min Pooling selects the the neuron with the minimum value. You can find a sample MaxPooling operation below:

Sample Max Pooling operation

Ref: Aphex34, Wikipedia

Flatten

Gather all the matrix/tensor elements into a vector.

In [ ]:
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras import Sequential


model = Sequential()
model.add(Conv2D(16, (2, 2), activation='relu', input_shape=X_train[0].shape))
model.add(Conv2D(16, (2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(32, (2, 2), activation='relu'))
model.add(Conv2D(16, (2, 2), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))

We have the model and we need an output from that model. For output, we can use Dense layer with a softmax activation. The length of output vector will be the same as one hot vectors. Then to have a outline of how model looks like, we can use model.summary() and print the overall structure of CNN model.

In [9]:
model.add(Dense(num_class, activation='softmax'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 63, 63, 16)        80        
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 62, 62, 16)        1040      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 31, 31, 16)        0         
_________________________________________________________________
dropout (Dropout)            (None, 31, 31, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 30, 30, 32)        2080      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 29, 29, 16)        2064      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 16)        0         
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               401536    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 40)                5160      
=================================================================
Total params: 411,960
Trainable params: 411,960
Non-trainable params: 0
_________________________________________________________________

Since we have model ready, we need to find out how can we train that model. Unlike the previous machine learning notebook, we need to define every parameter and option when training deep learning models.

To train a model, we need an optimizer, and a cost/lost function. Cost function defines how bad the model works. It gets smaller as it makes better predictions. Our aim is to reduce the cost and that's where optimizer help us.

There are different cost functions for different purposes. In this face recognition problem, we try to tackle this as a classification problem and we have more than 2 classes. In that multi-class classification cases, we can use categorical cross entropy as our cost function.

Like cost functions, there are many optimizers, too. We'll use the Adam optimizer for training our model. We also define metrics to see the performance of our model. To see how accurate our predictions are, we cam pass accuracy as a metric.

We'll put all these together into model.compile function where cost function is indicated as loss and the rest remains the same.

In [ ]:
model.compile(optimizer="RMSProp", loss="categorical_crossentropy", metrics=['accuracy'])

Training

Let's train the model and see how that works. To do that, we can call model.fit function with train data and labels, namely X_train and y_train, respectively. The training data will be iterated over and over again for epochs times where we call each run as epoch.

The model will not iterate all of the training data at once. Therefore it will be feeded in smaller chunks, defined as batch_size. In order to see how the training phase goes, we can specify validation data we prepared above as validation_data parameter to model.fit function. Calling this function will take time depending on how powerful your machine is. If you're running this on Colab, you can leverage hardware acceleration for faster training times from the Runtime - Change Runtime menu. Select either GPU or TPU enabled runtime and run all the cells again.

In [ ]:
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_val, y_val));

Now we have trained our model. We know how it behaves on training and validation data. At that point, we have a ready-to-use model. We can deploy this model to a real world application. Before that, we need to test how it performs on the test set. The following procedure will predict the test set classes (person in the dataset). Then compares predictions with the ground truth values. After that we have outputs for each metric we compiled the model with. (Note that, loss comes as a built-in metric)

To see accuracy results, we can call model.evaluate function and provide test data and ground truth labels, namely y_test. Then we can print the accuracy as percentage by multiplyting the accuracy value by 100.

In [12]:
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print(f"Accuracy: {accuracy*100:.2f}%")
Accuracy: 90.00%