Tutorial: Using keras for deep learning (And speeding it up with a GPU).

This is a tutorial on how to use deep learning to solve a popular image-classification problem: Kaggle's "First Steps with Julia" character-recognition challenge.

There is not a lot of innovation happening here; the takeaways are the pre-processing steps and the tuning of the training process. I did this with two objectives:

Firstly, to get up to speed with existing libraries (i.e. tensorflow and keras).

Secondly, to get back in touch with the practical aspects of using a GPU to reduce training time; I ended up purchasing a GTX 1070, which has 1920 CUDA cores.

Now, what’s better than putting all the models I build into a structured form that I can publish as a blog?! So here is a notebook describing a model I built for a Kaggle competition that got me onto the Top-10 leaderboard (as of 09.30.2016).

Setting up the system

I used this link (primarily) to decide which GPU to buy (and, of course, the pricing).

http://timdettmers.com/2014/08/14/which-gpu-for-deep-learning/

Hardware Config:

$ uname -a
Linux xxxx 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

$ cat /proc/meminfo
MemTotal:       32884660 kB
MemFree:        24282104 kB
MemAvailable:   27538148 kB

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 60
model name      : Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:01:00.0      On |                  N/A |
|  0%   44C    P2    39W / 180W |   7909MiB /  8107MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4216    G   /usr/lib/xorg/Xorg                              19MiB |
|    0      5135    G   /usr/lib/xorg/Xorg                             125MiB |
|    0      5464    G   /usr/bin/gnome-shell                           180MiB |
|    0      7207    C   /usr/bin/python                               7579MiB |
+-----------------------------------------------------------------------------+

To run this notebook, you will need the following dependencies installed; in brackets, I’ve given the versions I’m using:

Libraries + Software needed:

  • scipy (0.17.0)
  • numpy (1.11.1)
  • sklearn (0.17)
  • keras (1.1.0)
  • pandas (0.18.0)
  • matplotlib (1.5.1)
  • tensorflow (0.10.0)

Also, I’ve installed the following additional packages:

  • cuDNN (5.1) using the .run file (you can get this after registering for the NVIDIA developer program)
  • CUDA Toolkit 7.5 (I skipped installing the bundled drivers; instead, I installed them separately with the packages given below).
  • nvidia-367 libcuda1-367 nvidia-modprobe

Note that I had installed tensorflow (with GPU support) even before installing the graphics card.

Building a ConvNet for the dataset “First steps with Julia”

I picked up some ideas from http://ankivil.com/kaggle-first-steps-with-julia-chars74k-first-place-using-convolutional-neural-networks/

First, let’s import all the stuff we need.

In [18]:
from os import listdir, makedirs
from shutil import rmtree

from functools import wraps
from time import clock

from os.path import join, exists
from fnmatch import fnmatch

from scipy.misc import imread, imresize, imsave
from pandas import read_csv, DataFrame
from numpy import array, zeros
import pickle

from numpy import vstack, ones, stack, concatenate
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cross_validation import train_test_split
from random import shuffle

import keras as kr
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint

import matplotlib.pyplot as plt
import matplotlib.cm as cm

Next, let’s define a few housekeeping functions.

The timed_function can be used as a decorator on any function definition to measure the time the function takes to execute. I was just curious about the time taken by some of the operations (rescaling, etc.).

The file_iterator behaves just like glob.glob, but is case-insensitive. I had to do this because the image files had .Bmp as their extension and somehow, my eyes twitched when I used it in glob.glob… 🙂

rescale_images just rescales all images to a given size. If an image is grayscale, it also replicates it into three channels, so that all output images have the same shape.
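The channel trick can be sketched in isolation. This is a minimal, standalone illustration (using a made-up toy array, not the actual dataset) of how a 2-D grayscale image is replicated into three identical channels so it matches the (H, W, 3) shape of the colour images:

```python
import numpy as np

# Toy 4x4 "grayscale image" standing in for an imread() result.
gray = np.arange(16, dtype=np.uint8).reshape(4, 4)

# Replicate the single channel three times along a new last axis,
# exactly as stack([img]*3, axis=2) does in rescale_images.
rgb = np.stack([gray] * 3, axis=2)

print(gray.shape)  # (4, 4)
print(rgb.shape)   # (4, 4, 3)
```

All three channels hold the same pixel values, so the image still looks grayscale; the extra channels only exist to make every array in the dataset the same shape.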

The load_dataset function loads all the images into a pandas dataframe. Strictly speaking, I could have done away with pandas, but it just made my life easier (as you’ll see in a couple of steps).

In [3]:
def timed_function(func):
    def _decorator(*args, **kwargs):
        start = clock()
        response = func(*args, **kwargs)
        print "--"
        print "Time taken: ", clock() - start, "seconds"
        print "--"
        return response
    return wraps(func)(_decorator)

def file_iterator(d, t):
    for f in listdir(d):
        if fnmatch(f.lower(), t):
            yield f, join(d, f)
            
@timed_function            
def rescale_images(input_dir, output_dir, dim=28, redo=False):
    print_cnt = 0
    for f, full_path_f in file_iterator(input_dir, "*.bmp"):
        if print_cnt%1000==0:
            print "Rescaling...", f, " : ", print_cnt
        print_cnt += 1
        
        output_filename = join(output_dir, f)
        if exists(output_filename) and not redo:
            continue
        img = imread(full_path_f)
        img = imresize(img, (dim, dim, 3), interp='bilinear')
        if len(img.shape)==2:
            img = stack([img]*3, axis=2)
        imsave(output_filename, img)
        
    print "Done."

@timed_function
def load_dataset(input_dir):
    training_data = []
    print_cnt = 0
    for f, full_path_f in file_iterator(input_dir, "*.bmp"):
        img_id = int(f.split(".")[0])
        img = imread(full_path_f)
        img = img/255.0
        #print imread(full_path_f).shape
        training_data.append({"data": img, "ID": img_id})
        if print_cnt%1000==0:
            print "Loading...", f, ": ", print_cnt
        print_cnt += 1
    training_data = DataFrame(training_data)
    # Note: "ID" is deliberately kept as a regular column (not the index),
    # because the merge with the labels file later joins on it.
    return training_data

All constants are defined here. You’ll probably have to modify them to your needs.

In [4]:
TRAINING_IMAGES_DIR = "/opt/data/firstStepsWithJulia/train"
TRAINING_LABELS_FILE = "/opt/data/firstStepsWithJulia/trainLabels.csv"
WORKING_DIR = "/opt/tmp/firstStepsWithJulia"
MODEL_NAME = "cnnv4"

TESTING_IMAGES_DIR = "/opt/data/firstStepsWithJulia/test"

RESCALE_DIM = 32
VALIDATION_SIZE = 0.1
NO_RUNS = 1
EPOCS = 50
SAMPLES_PER_EPOCH_FACTOR = 20
BATCH_SIZE = 128

RESAMPLED_OUTPUT_DIR_TRAIN = join(WORKING_DIR, "train/resampled")
MODEL_DIR = join(WORKING_DIR,"models",MODEL_NAME)

RESAMPLED_OUTPUT_DIR_TEST = join(WORKING_DIR, "test/resampled")

Creating necessary directories

In [5]:
if not exists(WORKING_DIR):
    makedirs(WORKING_DIR)

if exists(RESAMPLED_OUTPUT_DIR_TRAIN):
    rmtree(RESAMPLED_OUTPUT_DIR_TRAIN)

if exists(RESAMPLED_OUTPUT_DIR_TEST):
    rmtree(RESAMPLED_OUTPUT_DIR_TEST)
    
if not exists(MODEL_DIR):
    makedirs(MODEL_DIR)
#    rmtree(MODEL_DIR)

makedirs(RESAMPLED_OUTPUT_DIR_TRAIN)
makedirs(RESAMPLED_OUTPUT_DIR_TEST)

Next, we rescale images to the desired size…

In [6]:
rescale_images(TRAINING_IMAGES_DIR, RESAMPLED_OUTPUT_DIR_TRAIN, dim=RESCALE_DIM, redo=True)
Rescaling... 5632.Bmp  :  0
Rescaling... 1852.Bmp  :  1000
Rescaling... 3748.Bmp  :  2000
Rescaling... 1430.Bmp  :  3000
Rescaling... 5430.Bmp  :  4000
Rescaling... 2286.Bmp  :  5000
Rescaling... 5205.Bmp  :  6000
Done.
--
Time taken:  4.954766 seconds
--

Now, we load the training data and the labels. The labels are merged with the training data.
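With no `on=` argument, pandas `merge()` joins on the columns the two frames have in common, which here is just "ID". A toy illustration (with made-up values, not the real dataset) of how each image row picks up its label:

```python
import pandas as pd

# Stand-ins for the loaded images and the trainLabels.csv contents.
images = pd.DataFrame({"ID": [1, 2, 3], "data": ["img1", "img2", "img3"]})
labels = pd.DataFrame({"ID": [1, 2, 3], "Class": ["A", "B", "C"]})

# Joins on the shared "ID" column by default.
merged = images.merge(labels)
print(merged)
```

This is why load_dataset keeps "ID" as a regular column: it is the join key.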

In [7]:
training_data = load_dataset(RESAMPLED_OUTPUT_DIR_TRAIN)
training_labels = read_csv(TRAINING_LABELS_FILE, delimiter=",")
training_data = training_data.merge(training_labels)
nb_classes = len(training_data["Class"].unique())
print training_data.head()
Loading... 5632.Bmp :  0
Loading... 1852.Bmp :  1000
Loading... 3748.Bmp :  2000
Loading... 1430.Bmp :  3000
Loading... 5430.Bmp :  4000
Loading... 2286.Bmp :  5000
Loading... 5205.Bmp :  6000
--
Time taken:  0.642379 seconds
--
     ID                                               data Class
0  5632  [[[0.227450980392, 0.235294117647, 0.313725490...     O
1  2916  [[[0.164705882353, 0.164705882353, 0.172549019...     I
2  5753  [[[0.105882352941, 0.109803921569, 0.082352941...     A
3  4546  [[[0.301960784314, 0.396078431373, 0.607843137...     R
4  5534  [[[0.988235294118, 0.905882352941, 0.258823529...     S

Now let’s see how the images look; this also helps verify (albeit roughly) whether the labels attached to the images are correct.

In [8]:
%matplotlib inline

def display_image(data):
    
    plt.axis('off')
    plt.imshow(data, cmap=cm.seismic)
    plt.show()

cv = CountVectorizer(analyzer='char', lowercase=False)
encoded_labels = cv.fit_transform(training_data["Class"]).todense()
        
import numpy as np

training_data_values = np.array(list(training_data["data"].values))
for ii in xrange(3):
    display_image(training_data_values[ii*4])
    print "Label: ", cv.inverse_transform(encoded_labels[ii*4])[0][0]
Label:  O
Label:  S
Label:  E
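The CountVectorizer call above (with analyzer='char' on single-character class labels) is really just a one-hot encoder with an invertible vocabulary. A hand-rolled sketch of the same idea, with made-up labels, shows what it buys us:

```python
import numpy as np

labels = ["O", "S", "E", "S"]            # toy single-character class labels
vocab = sorted(set(labels))              # ['E', 'O', 'S']

# One-hot encode: one row per label, one column per distinct character.
encoded = np.zeros((len(labels), len(vocab)))
for i, c in enumerate(labels):
    encoded[i, vocab.index(c)] = 1

# Invert the encoding, as cv.inverse_transform does.
decoded = [vocab[row.argmax()] for row in encoded]
print(decoded)  # ['O', 'S', 'E', 'S']
```

Using CountVectorizer instead of this hand-rolled version means the vocabulary is learned, stored, and inverted by one object, which is convenient to pickle alongside the model.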

I have created a wrapper for a ConvNet defined with keras, so that I don’t keep writing the same lines over and over again. I will abstract it better as I go through more image classification datasets…

The cnn_model function defines the model for this particular dataset.

The experiment function re-organizes the training or test data (as the case may be), feeds it into the CNN model, and calls the train() or predict() functions.

In [15]:
class SimpleImageSequentialNN:
    def __init__(self, data_generator, image_size, model_path, model_name):
        self.data_generator = data_generator
        self.model = kr.models.Sequential()
        self.image_size = image_size
        self.input_layer_defined = False
        self.model_path = model_path
        self.model_name = model_name
        
    def compile(self, loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]):
        self.model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    
    def train(self, X_train, y_train, X_validation, y_validation, batch_size, epochs, samples_per_epoch_factor=20):
        saveBestModel = ModelCheckpoint(join(self.model_path, self.model_name), monitor="val_acc", verbose=0, save_best_only=True)
        self.model.fit_generator(self.data_generator.flow(X_train, y_train, batch_size=batch_size),
                                samples_per_epoch=len(X_train)*samples_per_epoch_factor,
                                nb_epoch=epochs,
                                validation_data=(X_validation, y_validation),
                                callbacks=[saveBestModel],
                                verbose = 1)
    
    def addLayer(self, name, *args, **kwargs):
        method = "layer_%s" %(name)
        if method in dir(self):
            method = getattr(self, method)
            layer = method(*args, **kwargs)
            if not type(layer) is list:
                layer = [layer]
            for l in layer:
                self.model.add(l)
        print "Added Layer to give output:", self.model.output_shape
        
    def layer_flatten(self):
        return kr.layers.core.Flatten()
    
    def layer_dense(self, neurons, activation="relu", dropout=0.5):
        layers = []
        layers.append(kr.layers.core.Dense(neurons, init="he_normal", activation=activation))
        if dropout>0:
            layers.append(kr.layers.core.Dropout(dropout))
        return layers
        

    def layer_pooling(self, pool_size=2):
        return kr.layers.convolutional.MaxPooling2D(pool_size=(pool_size, pool_size))
    
    def layer_conv2D(self, maps, size, activation="relu"):
        input_shape = None
        if not self.input_layer_defined:
            input_shape = self.image_size
            self.input_layer_defined = True
        if input_shape is not None:
            return kr.layers.convolutional.Convolution2D(maps, size, size, border_mode="same", init="he_normal", activation=activation, input_shape=input_shape, dim_ordering="tf")
        else:
            return kr.layers.convolutional.Convolution2D(maps, size, size, border_mode="same", init="he_normal", activation=activation, dim_ordering="tf")
    
    def load(self):
        self.model = kr.models.load_model(join(self.model_path, self.model_name))
    
    def predict(self, X_test):
        return self.model.predict_classes(X_test)
    
def cnn_model(X_train, y_train, X_validation, y_validation, data_generator, nb_classes, model_path, model_name):
    model = SimpleImageSequentialNN( data_generator, (RESCALE_DIM, RESCALE_DIM, X_train.shape[-1]), model_path, model_name)
    model.addLayer("conv2D", 64, 3)
    model.addLayer("conv2D", 64, 3)
    model.addLayer("conv2D", 64, 3)
    model.addLayer("pooling", 2)
    model.addLayer("conv2D", 128, 3)
    model.addLayer("conv2D", 128, 3)
    model.addLayer("pooling", 2)
    model.addLayer("conv2D", 256, 3)
    model.addLayer("conv2D", 256, 3)
    model.addLayer("pooling", 2)
    model.addLayer("flatten")
    model.addLayer("dense", 2048, dropout=0.75)
    model.addLayer("dense", 4096, dropout=0.75)
    model.addLayer("dense", nb_classes, activation="softmax", dropout=0)
    return model

def experiment(data, encoded_labels, nb_classes, model_name, predict_only=False):
    encoder_filename = join(MODEL_DIR, "%s.encoder" %(model_name))
    data_values = array(list(data["data"].values))
    
    print "Data size", data_values.shape
    
    if not predict_only:
        cv = CountVectorizer(analyzer='char', lowercase=False)
        encoded_labels = cv.fit_transform(data["Class"]).todense()
        pickle.dump(cv, open(encoder_filename, 'wb'))
    
        X_train, X_validation, y_train, y_validation = train_test_split(data_values, encoded_labels, test_size=VALIDATION_SIZE)
    else:
        X_train = data_values
        X_validation = None
        y_train = None
        y_validation = None
    data_generator = ImageDataGenerator(
        rotation_range = 30,
        width_shift_range = 0.2,
        height_shift_range = 0.2,
        shear_range = 0.1,
        zoom_range = 0.4,                    
        channel_shift_range = 0.1, dim_ordering='tf')
    
    model = cnn_model(X_train, y_train, X_validation, y_validation, data_generator, nb_classes, MODEL_DIR, model_name)
    if not predict_only:
        optimizer = kr.optimizers.Adam(lr=1e-4)
        model.compile(optimizer=optimizer)
        print X_train.shape, y_train.shape, X_validation.shape, y_validation.shape
        model.train(X_train, y_train, X_validation, y_validation, BATCH_SIZE, EPOCS, SAMPLES_PER_EPOCH_FACTOR)
    else:
        model.load()
        predictions = model.predict(data_values)
        predictions_one_hot = zeros((predictions.shape[0], nb_classes))
        for ii in xrange(predictions.shape[0]):
            predictions_one_hot[ii, predictions[ii]] = 1
        cv = pickle.load(open(encoder_filename, 'rb'))
        return cv.inverse_transform(predictions_one_hot)
            
    

Now, let’s run the experiment.

I’m not showing the whole output, as you can probably guess what happens.

In [19]:
for ii in xrange(NO_RUNS):
    experiment(training_data, encoded_labels, nb_classes, "round_%s" %(ii))
    
Data size (6283, 32, 32, 3)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 16, 16, 64)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 8, 8, 128)
Added Layer to give output: (None, 8, 8, 256)
Added Layer to give output: (None, 8, 8, 256)
Added Layer to give output: (None, 4, 4, 256)
Added Layer to give output: (None, 4096)
Added Layer to give output: (None, 4096)
Added Layer to give output: (None, 2048)
Added Layer to give output: (None, 62)
(5654, 32, 32, 3) (5654, 62) (629, 32, 32, 3) (629, 62)
Epoch 1/50
  1536/113080 [..............................] - ETA: 134s - loss: 4.1249 - acc: 0.0599
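The 113080 in the progress bar is not the dataset size; it is the number of augmented samples fit_generator is told to draw per epoch, namely len(X_train) * SAMPLES_PER_EPOCH_FACTOR. A quick check of the arithmetic:

```python
n_train = 5654            # training rows left after the 10% validation split
factor = 20               # SAMPLES_PER_EPOCH_FACTOR
samples_per_epoch = n_train * factor
print(samples_per_epoch)  # 113080, matching the progress bar above
```

So each epoch sees every training image roughly 20 times, each time with a fresh random augmentation from the ImageDataGenerator.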

Now that the model(s) has/have been built, let’s prepare the test set and the submission 🙂

First, let’s rescale the images, just like we did with the training set.

In [11]:
rescale_images(TESTING_IMAGES_DIR, RESAMPLED_OUTPUT_DIR_TEST, dim=RESCALE_DIM, redo=True)
Rescaling... 9772.Bmp  :  0
Rescaling... 9008.Bmp  :  1000
Rescaling... 10541.Bmp  :  2000
Rescaling... 8694.Bmp  :  3000
Rescaling... 9206.Bmp  :  4000
Rescaling... 11601.Bmp  :  5000
Rescaling... 8245.Bmp  :  6000
Done.
--
Time taken:  5.227374 seconds
--

Next, we load the test set.

In [12]:
testing_data = load_dataset(RESAMPLED_OUTPUT_DIR_TEST)
print testing_data.head()
Loading... 9772.Bmp :  0
Loading... 9008.Bmp :  1000
Loading... 10541.Bmp :  2000
Loading... 8694.Bmp :  3000
Loading... 9206.Bmp :  4000
Loading... 11601.Bmp :  5000
Loading... 8245.Bmp :  6000
--
Time taken:  0.606241 seconds
--
      ID                                               data
0   9772  [[[0.749019607843, 0.211764705882, 0.239215686...
1   6609  [[[0.81568627451, 0.811764705882, 0.8156862745...
2  11672  [[[0.698039215686, 0.674509803922, 0.674509803...
3  11483  [[[0.862745098039, 0.894117647059, 0.803921568...
4  12424  [[[0.729411764706, 0.56862745098, 0.0352941176...

… and see a few examples …

In [13]:
%matplotlib inline


import numpy as np
testing_data_values = np.array(list(testing_data["data"].values))
for ii in xrange(3):
    display_image(testing_data_values[ii*4])
    

Lastly, we compute predictions and store them in a CSV file.

In [16]:
for ii in xrange(NO_RUNS):
    predictions = experiment(testing_data, encoded_labels, nb_classes, "round_%s" %(ii), predict_only=True)
    predictions = [p[0] for p in predictions]
    output_file = join(MODEL_DIR,"test_output_%s.csv" %(ii))
    testing_data["predictions"] = predictions
    result = testing_data[["ID", "predictions"]].rename(columns={"predictions": "Class"})
    result.to_csv(output_file, index=False)
    print "Result written to ", output_file
Data size (6220, 32, 32, 3)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 32, 32, 64)
Added Layer to give output: (None, 16, 16, 64)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 16, 16, 128)
Added Layer to give output: (None, 8, 8, 128)
Added Layer to give output: (None, 8, 8, 256)
Added Layer to give output: (None, 8, 8, 256)
Added Layer to give output: (None, 4, 4, 256)
Added Layer to give output: (None, 4096)
Added Layer to give output: (None, 4096)
Added Layer to give output: (None, 2048)
Added Layer to give output: (None, 62)
6220/6220 [==============================] - 1s     
Result written to  /opt/tmp/firstStepsWithJulia/models/cnnv4/test_output_0.csv