
Experiment 1.3.1

ResNet transfer learning for the orange-brown-neither dataset.

This experiment started as a copy of experiment 1.3. The code from 1.3 has been moved into a Python package, and work continues from there, because JupyterLab doesn’t make it easy to share or test code.

The aim is to see whether we can retrain some of the top layers of a ResNet model that was pretrained on ImageNet.


First, we need to organise our orange-brown dataset in a form that PyTorch can consume. We will then try to train the model to distinguish orange from brown.

Most of the work for this has been done by the nncolor.data module. Below we just test it out to make sure it’s working.

The dataset is meant to use the color data from experiment 1.1.1 to draw circles against a background. Here, though, the circles are smaller and are placed in one of 4x4=16 grid positions. This forces the model to learn an answer that is invariant to where the circle appears in the image.
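To make the grid placement concrete, here is a minimal sketch of how a position index could map to a circle centre. The function name `grid_cell_center`, the 224-pixel image size and the row-major ordering are all assumptions made for illustration; the real logic lives in nncolor.data and may differ.

```python
GRID_SIZE = 4  # 4x4 = 16 possible positions


def grid_cell_center(position, img_size=224, grid_size=GRID_SIZE):
    """Return the (x, y) pixel centre of a grid cell.

    `position` is a 0-based, row-major index into the grid.
    The image size of 224 is an assumed value, not taken from nncolor.
    """
    if not 0 <= position < grid_size * grid_size:
        raise ValueError(f"position must be in [0, {grid_size * grid_size})")
    row, col = divmod(position, grid_size)
    cell = img_size / grid_size  # width and height of one grid cell
    return (int((col + 0.5) * cell), int((row + 0.5) * cell))


# All 16 centres, one per grid cell.
centers = [grid_cell_center(p) for p in range(GRID_SIZE * GRID_SIZE)]
```

Position 5, for example, lands in the second row and second column. Sampling the position at random for every image means the label cannot be predicted from location alone.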

import cv2
import numpy as np
from enum import Enum
import colorsys
import moviepy.editor as mpe
import moviepy
from typing import *
import random
import pandas as pd
import json
import torch
from icecream import ic
import nncolor as nc
import nncolor.data
import IPython
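def imshow(img):
    # Expects an RGB float image with values in [0, 1].
    img = (img * 255).astype(np.uint8)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    _, ret = cv2.imencode('.jpg', img)
    i = IPython.display.Image(data=ret.tobytes())
    IPython.display.display(i)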
def demo_data():
    FPS = 2
    #frames = [circle_img(p, (255, 255, 255), (50, 20, 20)) for p in range(NUM_POSITIONS)]
    #labels = ['WB-0']*len(frames)
    frames, labels = nc.data.create_samples(30)
    frames = [f*255 for f in frames]
    x_clip = mpe.ImageSequenceClip(frames, fps=FPS)

    class FrameText(mpe.VideoClip):
        def __init__(self, text, fps):
            def make_frame(f):
               return mpe.TextClip(text[int(f)], font='DejaVu-Sans', color='white').get_frame(f)
            self.duration = 1.0 * len(text) / fps
            mpe.VideoClip.__init__(self, make_frame=make_frame, duration=self.duration)

    y_clip =   FrameText(labels, FPS)
    label_clip = mpe.CompositeVideoClip([mpe.ImageClip(np.zeros(nc.data.DEFAULT_IMG_SHAPE), duration=5), y_clip])
    comp_clip = mpe.clips_array([[y_clip],[x_clip]])
    return comp_clip
clip = demo_data() 
def test_dataset():
    train, test, val = nc.data.load_datasets()



Let’s fine tune a Resnet model.

# Copied from: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

import torchvision as tv
import torchvision.datasets
import torchvision.models
import torchvision.transforms
import torch.nn
import torch.optim
import time
import copy
import os

# Only normalization is applied (with the ImageNet channel means and
# standard deviations); there is no data augmentation.
data_transform = tv.transforms.Compose([
        tv.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
# Haven't coded up a test loop yet, so ignore test set for now.
colors = nncolor.data.filter_colors(nncolor.data.exp_1_1_data, 
                                    include_colors={'orange', 'brown', 'neither'})
train_ds, test_ds, val_ds = nncolor.data.train_test_val_split(colors, split_ratio=(11, 0, 7))
train_ds.transform = data_transform
val_ds.transform = data_transform
ds = {'train': train_ds, 'val': val_ds}
dataloaders = {x: torch.utils.data.DataLoader(ds[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(ds[x]) for x in ['train', 'val']}

device = (torch.device("cuda:0") if torch.cuda.is_available() else "cpu")

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            #for inputs, labels in dataloaders[phase]:
            for batch in dataloaders[phase]:
                inputs = batch['image'].to(device)
                labels = batch['label'].to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())


    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

# Load & train
model_ft = tv.models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Replace the final fully connected layer. Here the size of each output sample is set to 4.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = torch.nn.Linear(num_ftrs, 4)
model_ft = model_ft.to(device)
criterion = torch.nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
num_epochs = 10 
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=num_epochs)
Epoch 0/9
train Loss: 0.6732 Acc: 0.7424
val Loss: 0.5161 Acc: 0.8229

Epoch 1/9
train Loss: 0.4866 Acc: 0.8310
val Loss: 1.0469 Acc: 0.6317

Epoch 2/9
train Loss: 0.3988 Acc: 0.8561
val Loss: 0.6116 Acc: 0.8430

Epoch 3/9
train Loss: 0.3556 Acc: 0.8774
val Loss: 0.6171 Acc: 0.8155

Epoch 4/9
train Loss: 0.2975 Acc: 0.8887
val Loss: 0.9877 Acc: 0.6615

Epoch 5/9
train Loss: 0.2989 Acc: 0.8991
val Loss: 0.5163 Acc: 0.8817

Epoch 6/9
train Loss: 0.2722 Acc: 0.9044
val Loss: 0.6749 Acc: 0.8296

Epoch 7/9
train Loss: 0.1893 Acc: 0.9257
val Loss: 0.8103 Acc: 0.8527

Epoch 8/9
train Loss: 0.1692 Acc: 0.9347
val Loss: 0.7714 Acc: 0.8304

Epoch 9/9
train Loss: 0.1460 Acc: 0.9432
val Loss: 0.7465 Acc: 0.8363

Training complete in 1m 41s
Best val Acc: 0.881696
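
The scheduler configured above multiplies the learning rate by 0.1 every 7 epochs. As a rough sketch, the schedule can be written out by hand to see which rate each of the 10 epochs used. `step_lr` is a hypothetical helper, not part of PyTorch, and it assumes the scheduler is stepped once at the end of each training epoch.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate used during `epoch` (0-based) under a step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the run above: lr=0.001, step_size=7, gamma=0.1.
schedule = [step_lr(0.001, epoch) for epoch in range(10)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

Under these assumptions, epochs 0 to 6 train at 0.001 and epochs 7 to 9 at 0.0001, which lines up with the drop in training loss at epoch 7 in the log.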


The longest run tested was 24 epochs. On the run shown here, the best validation accuracy is 88.2%, so roughly 90%. There is therefore enough information reaching the last layer for it to do a reasonable job of distinguishing brown, orange and neither. What is this information? Having said that, 90% is low enough to wonder where the model is having difficulty.

Next steps

Try to get more details on how the model is working.

There are a few ways we can probe. Some ideas:

  1. randomly choose a set of activations to zero. How big can we make the set?
  2. estimate the order of the activation importance, and start from the top or bottom.
  3. look at the layer before final pooling. We are interested in seeing whether the network keeps some degree of spatial resolution in how it represents color.

For the first method, we need a way to aggregate the results of multiple trials in order to draw conclusions about particular activations. For the second method, we need a way to estimate the activation importance, maybe like how it's done in Optimal Brain Damage.
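
For the first method, one simple way to aggregate trials is to compare, for each activation, the mean accuracy over trials where it was zeroed against the mean over trials where it was kept. The sketch below is only an illustration: `ablation_scores` and the `evaluate(zeroed)` callback are hypothetical names, and a toy stand-in replaces the real model so that the code runs on its own.

```python
import random


def ablation_scores(evaluate, num_units, num_trials=200, drop_fraction=0.25, seed=0):
    """Estimate per-unit importance from random ablation trials.

    `evaluate(zeroed)` must return an accuracy given a set of unit indices
    that are zeroed. The score for a unit is the mean accuracy when it is
    kept minus the mean accuracy when it is zeroed.
    """
    rng = random.Random(seed)
    k = max(1, int(num_units * drop_fraction))
    sums = {True: [0.0] * num_units, False: [0.0] * num_units}
    counts = {True: [0] * num_units, False: [0] * num_units}
    for _ in range(num_trials):
        zeroed = set(rng.sample(range(num_units), k))
        acc = evaluate(zeroed)
        for unit in range(num_units):
            was_zeroed = unit in zeroed
            sums[was_zeroed][unit] += acc
            counts[was_zeroed][unit] += 1
    scores = []
    for unit in range(num_units):
        if counts[True][unit] == 0 or counts[False][unit] == 0:
            scores.append(float("nan"))  # unit never landed in one of the groups
            continue
        mean_kept = sums[False][unit] / counts[False][unit]
        mean_zeroed = sums[True][unit] / counts[True][unit]
        scores.append(mean_kept - mean_zeroed)
    return scores


# Toy stand-in for the model: only unit 2 matters to the accuracy.
def toy_evaluate(zeroed):
    return 0.9 - 0.3 * (2 in zeroed)


scores = ablation_scores(toy_evaluate, num_units=8)
most_important = scores.index(max(scores))
```

In the toy case the score picks out unit 2. With a real network the accuracies are noisy and units interact, so many more trials would likely be needed before the ranking is trustworthy.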

The 3rd method is the most similar to this experiment, so let’s do that first (experiment 1.4).
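
As a starting point for the third idea, a forward hook is one way to capture the activations that feed the final pooling layer. The sketch below uses a small stand-in network rather than the fine-tuned ResNet, an assumption made to keep it self-contained. With torchvision's ResNet-18 the hook would presumably be registered on `model_ft.layer4`, whose output feeds the average pool.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a ResNet: conv trunk, global pooling, linear head.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)

captured = {}


def save_activation(module, inputs, output):
    # Keep a detached copy of the feature map that enters the pooling layer.
    captured['pre_pool'] = output.detach()


# Hook the layer whose output is the input of the pooling layer.
handle = model[1].register_forward_hook(save_activation)
with torch.no_grad():
    logits = model(torch.zeros(2, 3, 32, 32))
handle.remove()

# The captured tensor still has spatial dimensions: (batch, channels, height, width).
pre_pool_shape = tuple(captured['pre_pool'].shape)
```

Because the captured feature map keeps its height and width, it can be compared against the grid position of the circle to see how localised the color response is.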