\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)
deepdream of a sidewalk

Experiment 2.2.1

Testing whether a simple high-pass filter can be used to improve classification accuracy in the presence of atypical illumination.

Sections 1, 2 and 3 of this notebook are practically identical to those of 2.1.2. The only, and critical, difference is that when creating the dataset, all images are run through the following high-pass filter. The filter was chosen to match Photoshop’s high-pass filter functionality, which in turn is the filter used in the paper “Kitaoka’s Tomato: Two Simple Explanations Based on Information in the Stimulus” by Shapiro et al. The discussion in section 4 is new and reflects on the results in this notebook compared to those of experiment 2.1.2.

def high_pass_filter(img):
    """High-pass filter replicating Photoshop's high-pass filter.

    Expects a float HWC image with values in 0-255 and returns the same."""
    filter_sd = 100
    # Kernel dimensions must be odd for cv2.GaussianBlur.
    filter_size = (199, 199)
    # Pass sigmaY by keyword: the 4th positional argument of GaussianBlur is dst.
    blurred = cv2.GaussianBlur(img, filter_size, sigmaX=filter_sd, sigmaY=filter_sd)
    # Photoshop-style high pass: mid-gray plus the detail removed by the blur.
    hp = 127.0 + (img - blurred)
    hp = np.clip(hp, 0, 255.0)
    return hp
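
As a quick usage sketch (illustrative only; the image path below is hypothetical), the filter takes and returns a float HWC image with values in the range 0-255:

# Illustrative usage only; the path is hypothetical.
import numpy as np
import PIL.Image

img = np.asarray(PIL.Image.open('resources/example.jpg'), dtype=np.float32)  # HWC, 0-255
filtered = high_pass_filter(img)  # same shape and value range as the input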

Until section 4, all text and code below is duplicated from experiment 2.1.2.

1. Experiment improvements

This time, I make sure each input image contains an object that corresponds to an ImageNet class. Many of the images in 2.1.1 contained no objects with corresponding ImageNet labels.

I also noticed that the images have an embedded colorspace, “Linear Rec2020 RGB”, and so the images need to be converted in order to be in the colorspace expected by ResNet (sRGB). This was not done in 2.1.1, but is done here in 2.1.2.

import tempfile
import zipfile
import urllib
import numpy as np
import torch
import pathlib
import torchvision as tv
import torchvision.datasets
import torchvision.models
import torchvision.transforms
import pandas as pd
from icecream import ic
import json
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt
from collections import namedtuple
import ipyplot
import cv2
import einops
import PIL
import PIL.ImageCms
import IPython
with open('./resources/imagenet-simple-labels.json') as f:
    labels = json.load(f)
    labels_to_id = {s:i for (i,s) in enumerate(labels)}
    
    
NUM_CLASSES = 1000
assert NUM_CLASSES == len(labels)
    
    
def class_id_to_label(cid):
    assert int(cid) == cid
    cid = int(cid)
    return labels[cid]


def label_to_class_id(label):
    return labels_to_id[label]
def imshow(img):
    """Show image. 
    
    Image is a HWC numpy array with values in the range 0-1."""
    # Convert to an 8-bit BGR image, which is what cv2.imencode expects.
    img = np.clip(img * 255, 0, 255).astype(np.uint8)
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    # cv2.imencode takes images in HWC dimension order.
    _, ret = cv2.imencode('.jpg', img)
    i = IPython.display.Image(data=ret)
    IPython.display.display(i) 
    
    
def imlist(images, labels=None, use_tabs=False):
    if use_tabs:
        ipyplot.plot_class_tabs(images, labels, max_imgs_per_tab=300)
    else:
        ipyplot.plot_images(images, labels)
# Choose CPU or GPU.
device = torch.device('cuda:0')
#device = "cpu"

# Choose small or large (standard) model variant
#model_name = "resnet18"
model_name = 'resnet50'
def model_fctn():
    if model_name == 'resnet18':
        return tv.models.resnet18(pretrained=True)
    elif model_name == 'resnet50':
        return tv.models.resnet50(pretrained=True)
model = model_fctn()
state = torch.hub.load_state_dict_from_url(tv.models.resnet.model_urls[model_name])
model.load_state_dict(state)
model = model.to(device)
model.eval()


def model_name_str():
    """Returns the printable string form of the model."""
    res = None
    if model_name == 'resnet18':
        res = 'ResNet-18'
    elif model_name == 'resnet50':
        res = 'ResNet-50'
    else:
        raise Exception('Unexpected model.') 
    return res

IMG_SHAPE = (224, 224, 3)
ds_path = pathlib.Path('resources/exp_2/mls_dataset')
def is_empty(path):
    return not any(path.iterdir())
is_downloaded = ds_path.is_dir() and not is_empty(ds_path)
if not is_downloaded:
    ds_path.mkdir(exist_ok=True)
    zip_path, _ = urllib.request.urlretrieve('ftp://vis.iitp.ru/mls-dataset/images_preview.zip')
    with zipfile.ZipFile(zip_path, "r") as f:
        f.extractall(ds_path)

2. Dataset

The dataset is constructed by extracting crops from the following 24 scenes, each captured under 18 different illuminations. The dataset is the mls-dataset: https://github.com/Visillect/mls-dataset.

The crops are hand chosen to ensure that each image can be meaningfully labeled with 1 of the 1000 ImageNet labels. The cropped images are transformed again as a form of data augmentation: a 5-crop transform outputs 5 cropped images for each input image.
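
As a minimal sketch of the 5-crop transform (illustrative only, using a random tensor rather than a dataset image), torchvision’s FiveCrop returns the four corner crops plus the center crop:

import torch
import torchvision as tv

five_crop = tv.transforms.FiveCrop(size=224)
dummy = torch.rand(3, 336, 336)  # a CHW image, the same size as the hand-chosen crops below
crops_out = five_crop(dummy)     # tuple: (top-left, top-right, bottom-left, bottom-right, center)
assert len(crops_out) == 5
assert crops_out[0].shape == (3, 224, 224)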

scenes_overview.png

The next code section prepares the dataset. The dataset images are printed at the end.

# Transforms
normalize_transform =  tv.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
crop = tv.transforms.FiveCrop(size=IMG_SHAPE[0])
pre_norm_transform = tv.transforms.Compose([crop])
norm_transform = tv.transforms.Compose([normalize_transform])

# Data details
num_scenes = 24
ignore_scenes = {1,} # Scene 1: the Macbeth chart.
scenes = [s for s in range(1, num_scenes+1) if s not in ignore_scenes]
crops = ['topleft', 'topright', 'bottomleft', 'bottomright', 'center']
Crop = namedtuple('Crop', 'y, x, h, w')
Scene = namedtuple('Scene', 'imagenet_label, id, crop') 
# Choose a crop size that allows for 5 sub-crops of size 224.
crop_size = int(224 * 3/2) # 336 
scenes = [
    Scene('bath towel',      11, Crop(157, 264, crop_size, crop_size)),
    Scene('ping-pong ball',  12, Crop(230, 110, 263,       292)),       # Image made smaller to keep ball in all crops.
    Scene('cup',             13, Crop(157,   0, crop_size, crop_size)), # Not sure if 'cup' or 'coffee mug' is the best class.
    Scene('pot',             14, Crop(117,  50, crop_size, crop_size)), 
    Scene('Granny Smith',    15, Crop(185, 264, crop_size, crop_size)), 
    Scene('bell pepper',     16, Crop(157, 216, crop_size, crop_size)), 
    Scene('banana',          17, Crop(157,   0, crop_size, crop_size)), # This image also contains an orange and apple.
    Scene('coffee mug',      20, Crop(197,  15, 296,       crop_size)), # Image made smaller to keep the cup in all crops.
    Scene('water bottle',    22, Crop( 65, 264, crop_size, crop_size)), 
    Scene('water bottle',    24, Crop( 65,   0, crop_size, crop_size)), # This image also contains a banana.
    Scene('banana',          24, Crop(197,  50, 296,       crop_size))] # This image also contains a water bottle.
illuminants = [
    '2HAL_DESK_LED-B025',
    '2HAL_DESK_LED-B050',
    '2HAL_DESK_LED-B075',
    '2HAL_DESK_LED-B100',
    '2HAL_DESK_LED-BG025',
    '2HAL_DESK_LED-BG050',
    '2HAL_DESK_LED-BG075',
    '2HAL_DESK_LED-BG100',
    '2HAL_DESK_R025',
    '2HAL_DESK_R050',
    '2HAL_DESK_R075',
    '2HAL_DESK_R100',
    '2HAL_DESK_RG025',
    '2HAL_DESK_RG050',
    '2HAL_DESK_RG075',
    '2HAL_DESK_RG100',
    '2HAL_DESK',
    '2HAL']


def img_key(scene_id, imagenet_label, crop_label, illuminant):
    return f'{scene_id}.{imagenet_label}.{crop_label}.{illuminant}'


def rec_2020_to_sRGB(rgb_img):
    """Convert image in REC2020 linear colorspace to an sRGB colorspace image.
    
    This method didn't actually seem to work, and I'm not exactly sure why, so 
    I'm keeping it here so I can come back and try to understand what's wrong. 
    Below, I ended up using the inbuilt features of PIL instead.
    """
    # Rec 2020 to CIE XYZ
    to_xyz_mat = torch.tensor([[6.36953507e-01, 1.44619185e-01, 1.68855854e-01], 
                               [2.62698339e-01, 6.78008766e-01, 5.92928953e-02], 
                               [4.99407097e-17, 2.80731358e-02, 1.06082723e+00]])
    def dot_vector(m, v):
        return  torch.einsum('...ij,...j->...i', m, v)
    xyz = dot_vector(to_xyz_mat, rgb_img)
    
    # CIE XYZ to sRGB
    to_linear_rgb_mat = torch.tensor([
        [3.2406, -1.5372, -0.4986],
        [-0.9689, 1.8758, 0.0415],
        [0.0557, 0.2040, 1.057]])
    linear_rgb = dot_vector(to_linear_rgb_mat, xyz)
    
    def to_srgb(c):
        res = 12.92*c if c <= 0.0031308 else (1.055 * c**(1/2.4) - 0.055)
        return res
    s_rgb = linear_rgb.apply_(to_srgb)
    s_rgb_chw = einops.rearrange(s_rgb, 'h w c -> c h w')
    return s_rgb_chw


def open_as_srgb(img_path):
    """Open an image and convert it to sRGB.
    
    The image must have an embedded ICC color profile."""
    img = PIL.Image.open(img_path)
    icc = tempfile.mkstemp(suffix='.icc')[1]
    with open(icc, 'wb') as f:
        f.write(img.info.get('icc_profile'))
    srgb = PIL.ImageCms.createProfile('sRGB')
    img = PIL.ImageCms.profileToProfile(img, icc, srgb)
    return img

def mixed_pass(img):
    """Blend the image with a strong high-pass and a mild low-pass version of itself.
    
    Expects a float HWC image with values in 0-255 and returns the same."""
    # Pass sigmaY by keyword: the 4th positional argument of GaussianBlur is dst.
    blurred0 = cv2.GaussianBlur(img, (999, 999), sigmaX=400, sigmaY=400)
    blurred1 = cv2.GaussianBlur(img, (999, 999), sigmaX=30, sigmaY=30)
    t0 = 0.05  # weight given to the original image vs. the high-pass component
    t1 = 0.9   # weight given to the result vs. the mild low-pass blend
    hp = img*t0 + (1-t0)*(img - blurred0 + 127.0)
    hp = hp*t1 + (1-t1)*(0.5*(hp + blurred1))
    hp = np.clip(hp, 0, 255.0)
    return hp
    
def open_img(scene, illuminant):
    """Open the image corresponding to the given scene and illuminant."""
    img_path = ds_path / 'images_preview' / f'{scene.id:02d}' / f'{scene.id:02d}_{illuminant}.jpg'
    img = open_as_srgb(img_path) 
    img = np.array(img, dtype=np.float32)
    cropped = img[scene.crop.y:scene.crop.y+scene.crop.h, scene.crop.x:scene.crop.x+scene.crop.w, :]
    return cropped


def create_dataset():
    """
    Dataset as a dict. Keys are of the form: 04-topleft-2HAL_DESK_LED-B025.
    Images are 0-1 tensors.
    """
    images = dict()
    for s in scenes:
        for ill in illuminants:
            img = open_img(s, ill)
            #img = np.asarray(img, dtype=np.float32) / 255.0
            img = np.asarray(img, dtype=np.float32)
            #img = high_pass_filter(img)
            img = mixed_pass(img)
            img = img / 255.0
            img = torch.tensor(einops.rearrange(img, 'h w c -> c h w'))
            cropped_images = pre_norm_transform(img)
            for crop_label, ci in zip(crops, cropped_images):
                images[img_key(s.id, s.imagenet_label, crop_label, ill)] = ci
    return images

ds = create_dataset()


def get_ds_image(ds, scene, illuminant, subcrop):
    # img is torch in CxHxW format.
    img = ds[img_key(scene.id, scene.imagenet_label, subcrop, illuminant)]
    img = einops.rearrange(img, 'c h w -> h w c').numpy()
    return img


def print_originals():
    """Print the images before the 5-crop transformation.
    
    Only images for one illumination are printed."""
    images = []
    labels = []
    for s in scenes:
        img = open_img(s, illuminants[-2])
        labels.append(s.imagenet_label)
        images.append(img)
    imlist(images, labels)
            

def print_dataset(inc_illuminants=None):
    """Print the dataset images.
    
    Args:
        inc_illuminants (set): restrict the illuminants to this set. Without
                               setting this option, the number of images printed
                               will be quite large (11x5x18).
   """
    if not inc_illuminants:
        inc_illuminants = set(illuminants)
    ds = create_dataset()
    tab_labels = []
    images = []
    custom_labels = []
    for k,v in ds.items():
        sid, imagenet_label, crop, illuminant = k.split('.')
        if not illuminant in inc_illuminants:
            continue
        tab_labels.append(illuminant)
        images.append(einops.rearrange(v.numpy(), 'c h w -> h w c'))
        custom_labels.append(f'{imagenet_label} ({crop})')
        
    ipyplot.plot_class_tabs(images, tab_labels, custom_labels, max_imgs_per_tab=200)

The following images are the 11 hand-chosen 336x336 crops, under 2HAL illumination (two halogen lights).

One aspect of the dataset worth pointing out is the naming convention used for the illuminants: 2HAL means 2 halogen lights, DESK means a tungsten desk lamp, and LED-B025 means a blue LED light at 25% power. Check the dataset source for more details.

print_originals()

[Image grid: the 11 cropped scenes under 2HAL illumination, labeled bath towel, ping-pong ball, cup, pot, Granny Smith, bell pepper, banana, coffee mug, water bottle, water bottle, banana.]

Below is a subset of the whole dataset. The 11 336x336 images are each cropped into 5 different 224x224 images, and this is done for all 18 illuminations. Only 3 of the 18 illuminations are shown below.

inc_illuminants={'2HAL_DESK', '2HAL_DESK_RG075', '2HAL_DESK_LED-B100'}
print_dataset(inc_illuminants)

[Image grid with one tab per illuminant (2HAL_DESK, 2HAL_DESK_RG075, 2HAL_DESK_LED-B100). Each tab shows the 55 dataset images: the 11 scenes, each cropped at topleft, topright, bottomleft, bottomright and center.]

2. Method

Same as experiment 2.1.1.

Pass each of the 11x18x5 (11 scenes, 18 illuminations, 5 crops) preview images into a ResNet model for classification. The ResNet model is pretrained for ImageNet classification. Record every classification.

def run_all_images_through_ResNet():
    """Input the images into the ResNet model and collect results.
    
    Returns: results as an xarray.DataArray.
    """
    ds = create_dataset()
    raw_data = np.zeros((len(scenes), len(crops), len(illuminants), NUM_CLASSES))
    # Gradients aren't needed for inference.
    with torch.no_grad():
        for idx, s in enumerate(scenes):
            for crop_idx, crop in enumerate(crops):
                for ill_idx, ill in enumerate(illuminants):
                    img = ds[img_key(s.id, s.imagenet_label, crop, ill)]
                    input_ = torch.unsqueeze(normalize_transform(img), 0).to(device)
                    model_out = model(input_)
                    raw_data[idx][crop_idx][ill_idx][:] = model_out.squeeze().detach().cpu()
    xdata = xr.DataArray(raw_data,  
                         coords={'scene':range(len(scenes)), 'crop':crops,  'illuminant':illuminants, 
                                 'class_id':np.arange(0, NUM_CLASSES)})
    return xdata
data = run_all_images_through_ResNet()

3. Inspect data, part 1. Some figures.

The data is inspected to see how illumination affects the ResNet outputs.

The data is a 4-dimensional table of floats. Each group of 1000 floats along the last dimension, “class_id”, collectively represents the 1000 outputs of ResNet for a single input image. The following summary from xarray shows some more details:

data
<xarray.DataArray (scene: 11, crop: 5, illuminant: 18, class_id: 1000)>
array([[[[-4.29903626e-01,  9.21166301e-01,  8.81139278e-01, ...,
          -1.87220252e+00,  1.87858179e-01,  1.91726017e+00],
         [ 1.09480464e+00,  2.08758259e+00,  1.14955485e+00, ...,
          -7.93911934e-01,  3.84823143e-01,  1.50083888e+00],
         [ 2.66796851e+00,  3.98082662e+00,  1.57907903e+00, ...,
          -7.87773609e-01, -3.67825618e-03,  1.86860979e+00],
         ...,
         [-3.30766797e-01,  8.17212284e-01,  3.17816913e-01, ...,
          -2.02598977e+00,  2.53353930e+00,  1.99032009e+00],
         [-1.93771315e+00,  7.33527422e-01,  4.27033484e-01, ...,
          -3.20578170e+00,  1.78447938e+00,  1.63053477e+00],
         [-1.95729876e+00, -7.73610473e-02,  8.97646919e-02, ...,
          -2.88542867e+00,  1.01842630e+00,  1.84713519e+00]],
        [[ 3.26670885e-01,  1.18755805e+00,  1.45558274e+00, ...,
          -1.35615468e+00, -7.96209097e-01,  2.75238490e+00],
         [ 1.42138445e+00,  2.30918431e+00,  2.16223526e+00, ...,
          -8.16984057e-01, -1.12978137e+00,  2.13594365e+00],
         [ 1.94081175e+00,  3.02392721e+00,  2.38246727e+00, ...,
          -5.31695962e-01, -1.60632801e+00,  2.75798750e+00],
...
         [ 2.74941742e-01,  1.94216102e-01,  1.26644158e+00, ...,
           3.08545280e+00,  7.25672781e-01,  2.78782511e+00],
         [ 5.94751954e-01,  8.42354596e-01,  1.14717877e+00, ...,
           5.70019674e+00,  1.10939133e+00,  1.51744628e+00],
         [ 1.52889836e+00,  1.75904572e+00,  1.43566942e+00, ...,
           6.89457655e+00,  1.12486601e+00,  6.88156962e-01]],
        [[ 5.31372279e-02,  2.20003223e+00, -3.59411165e-02, ...,
           3.31279707e+00,  1.82042837e+00,  4.23886955e-01],
         [-1.26856244e+00,  2.12585020e+00, -4.50966895e-01, ...,
           2.86830544e+00,  1.54614854e+00,  1.96660471e+00],
         [-2.21372819e+00,  2.38486147e+00, -6.58210635e-01, ...,
           1.99152052e+00,  8.11449707e-01,  2.64881468e+00],
         ...,
         [ 9.16015744e-01,  2.54737282e+00,  2.70456135e-01, ...,
           2.58635926e+00,  1.83936763e+00,  2.85220480e+00],
         [ 1.12276042e+00,  2.35252476e+00,  4.38723594e-01, ...,
           3.81984282e+00,  2.00980163e+00,  5.44086218e-01],
         [-5.05281448e-01,  1.62945235e+00, -3.17298412e-01, ...,
           4.87919569e+00,  1.48574662e+00,  1.11035967e+00]]]])
Coordinates:
  * scene       (scene) int64 0 1 2 3 4 5 6 7 8 9 10
  * crop        (crop) <U11 'topleft' 'topright' ... 'bottomright' 'center'
  * illuminant  (illuminant) <U19 '2HAL_DESK_LED-B025' ... '2HAL'
  * class_id    (class_id) int64 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999
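
For example (an illustrative snippet, not part of the original analysis), the prediction for a single image can be read off the table by selecting a scene, crop and illuminant, then taking the argmax over class_id:

# Illustrative only: the 1000 activations for scene 0, center crop, under 2HAL,
# and the corresponding predicted label.
single = data.sel(scene=0, crop='center', illuminant='2HAL')
predicted_label = class_id_to_label(int(single.argmax(dim='class_id')))
print(predicted_label)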

I’ll skip explaining the code below, and delay explanations until the figures are printed.

# Organise the data into some useful formats. 
# These results are used throughout the rest of the notebook. 
#
# 1. classifications
#    Flatten the class_id dimension so that the table is now a table of the 
#    class ids chosen by the ResNet model (the max class output).
# 2. is_equal_ground_truth 
#    The classifications table is converted to true/false depending on whether 
#    the output matches the ground truth (ground truth is decided by me).
# 3. classification_hist
#    A pseudo histogram. For each scene-crop image set, count how often each 
#    distinct class is predicted. That is, over all 18 illuminations, how many 
#    different classes are assigned by ResNet, and how often?
# 4. diversities
#    The classification_hist can be seen as a list of distributions. For each
#    distribution, calculate a measure of diversity. The measure of diversity 
#    chosen is entropy.
classifications = data.argmax(dim='class_id')
is_equal_groundtruth = []
for idx, group in classifications.groupby('scene'):
    is_equal_groundtruth.append(xr.where(group == label_to_class_id(scenes[idx].imagenet_label), True, False))
is_equal_groundtruth = xr.concat(is_equal_groundtruth, dim='scene')

def entropy(x):
    """Shannon entropy."""
    x = x / x.sum()
    # Filled allows log(0) to be ignored and set as 0.
    l = np.ma.log2(x).filled(0)
    entr = -np.sum(x*l)
    return entr
diversity_measure = entropy

def unique_counts(x):
    _, unique_counts = np.unique(x, return_counts=True)
    # There can be as many as 18 unique classes; however in reality, 
    # the most I saw was 7, so I'd like to set the number of bins 
    # to be about 8 to make the graphs clearer. Unfortunately,
    # xarray forces the output axis to have the same size as 
    # the original, so we are stuck with 18.
    bin_count = len(illuminants)
    res = np.zeros([bin_count])
    res[0:len(unique_counts)] = unique_counts
    return res
classification_hist = classifications.reduce(
    lambda x,axis: np.apply_along_axis(unique_counts, axis, x), 
    dim='illuminant')
diversities = classification_hist.reduce(
    lambda x,axis: np.apply_along_axis(diversity_measure, axis, x),
    dim='illuminant')
ave_scores = data.max(dim='class_id').mean(dim='illuminant')
def show_pseudo_distribution_grid():
    # The most unique classifications observed for a given scene-crop was 8 (maximum 
    # possible is 18, the number of illuminations). Cutting off the extra x axis makes
    # the charts less compacted, so hopefully easier to read.
    max_unique = 8 
    fg = classification_hist.isel(illuminant=slice(0, max_unique+1)).plot.step(x='illuminant', col='crop', row='scene')
    fg.set_axis_labels('classes', 'classification counts')
    for ax in fg.axes.flat:
        ax.tick_params(labelbottom=False)
    fg.fig.tight_layout()
    IPython.display.display(IPython.display.Markdown('### Classification histograms for each scene-crop pair'))
    IPython.display.display(IPython.display.Markdown('The x-axis represents _unordered_ ImageNet classes that '
                                                     'the machine thought was most likely for _at least one illuminant_.'))
    fg.fig.show()
   

Classification histograms

First up we have a grid of distribution-like figures. The grid of figures tries to express the variation in ResNet output classification as the illuminant is varied.

The creation of the grid is explained in more detail underneath.

show_pseudo_distribution_grid()

Classification histograms for each scene-crop pair

The x-axis represents unordered ImageNet classes that the machine thought was most likely for at least one illuminant.

png

Figure details

There are 11x5 figures (11 scenes, 5 crops). Each figure represents a single (scene, crop) pair and tallies the 18 classifications, binned by ImageNet class id. However, if processing were stopped here, each figure would have an x-axis with 1000 bins, which would make it hard to view. So, I fixed a sufficient domain size (8 bins) and, for each figure, removed the class-id entries that had a tally of zero, removing enough of them that all figures have the same number of bins. The maximum number of non-zero bins was 8, hence 8 was chosen as the domain size.

Some issues with the figure

The grid of figures is a bit hard to read. In future, I’ll avoid using xarray’s plot functionality, as it is difficult to configure, which makes it hard to create nice, readable figures. Most frustratingly, xarray doesn’t support bar charts, so I have used a step chart, which is as close as I can get.

In addition, because of the removal of classes from each figure, the information about what class each x-axis position corresponds to is lost. Furthermore, there is no x-axis correspondence between the figures.

Entropy

The above 55 (11x5) figures show the classification distribution as illumination is varied (for a fixed scene-crop). We can condense each of these distributions into a single number that measures the diversity inherent in the distribution: more diversity means ResNet was less decisive in its classification as illumination varied. The specific measure of diversity I’ll use is entropy (Shannon entropy). Below, the 55 distributions are plotted against their entropy (x-axis). To spread the points in 2D, and to add some more information to the chart, I’ve plotted the points against a measure of the model’s confidence: the maximum output activation of ResNet, averaged over the 18 illuminations.

(Note: there are multiple ways one can measure diversity; entropy seems fine for this situation. Diversity measures are an interesting topic in themselves; Tom Leinster’s ideas on diversity are great, for example: https://arxiv.org/abs/2012.02113. I try to find any excuse at all to read something by Tom Leinster.)
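
As a quick illustration of the measure (a small sketch, not part of the original analysis), here is the entropy of a few hypothetical classification-count vectors, computed with the entropy function defined above:

entropy(np.array([18.0]))      # all 18 illuminants agree on one class -> 0 bits
entropy(np.array([9.0, 9.0]))  # an even split between two classes -> 1 bit
entropy(np.array([1.0] * 18))  # every illuminant gives a different class -> log2(18) ≈ 4.17 bits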

plt.scatter(diversities, ave_scores)
plt.title('Classification Confidence vs Classification Diversity for\nEach Cropped Image (Illumination is Varied)')
plt.xlabel('Classification diversity (entropy, base 2) over the 18 images')
plt.ylabel('Confidence\n(mean of the 18 max output\nactivations of ResNet)')
plt.show()

png

The mean diversity measure is:

mean_diversity = diversities.mean()
mean_diversity.data
array(0.68719282)

Mean diversity lower for experiment 2.1.2

We have a mean diversity of 1.13 for ResNet-18 and 0.91 for ResNet-50. This mean is lower than what was obtained in experiment 2.1.1 (1.43 for ResNet-18 and 1.37 for ResNet-50). So, if images have relatively obvious ImageNet labels, the illumination has less of an effect on the output. This doesn’t seem surprising; the more uncertain the ResNet model is, the more likely that the maximum-confidence label has relatively low confidence, and that there are one or more other labels with not too dissimilar confidence. This means that a change in illumination only needs to cause a small change in label confidence for the maximum-confidence label to switch to another label.
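
A toy illustration of this point (the numbers below are made up purely to show the mechanism):

# When the top two activations are close, a small illumination-induced shift flips the argmax.
confident = np.array([9.0, 2.0, 1.0])   # clear winner
uncertain = np.array([4.1, 4.0, 1.0])   # two candidates with similar confidence
shift     = np.array([-0.2, 0.2, 0.0])  # small change caused by a different illuminant
print(np.argmax(confident), np.argmax(confident + shift))  # 0 0 -> label unchanged
print(np.argmax(uncertain), np.argmax(uncertain + shift))  # 0 1 -> label flips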

Classification accuracy vs. illuminants

Below is a 2D table showing whether ResNet correctly labeled each image. Green represents a correct classification, and black represents an incorrect classification. Illuminants vary over the y-axis, and scene-crop pairs vary over the x-axis.

green = '#80b696'
gray = '#565656'
def plot_by_illuminant(condense=False):
    data = is_equal_groundtruth.stack(scene_crop=('scene', 'crop'))
    if condense:
        by_scene_crop = data.groupby('scene_crop')#, squeeze=True)
        non_zero = []
        for _,g in by_scene_crop:
            if g.any() and not g.all():
                non_zero.append(g)
        data = xr.concat(non_zero, dim='scene_crop')
    y_ticks = data.illuminant.data
    x_ticks = [f'{c}-{s}' for (s,c) in data.scene_crop.data]
    fig, ax = plt.subplots(figsize=(15,15))
    ax.set_title(f'{model_name_str()} classification (correct/incorrect) for illumination vs. crop-scene pair')
    ax.set_ylabel('scene illumination')
    ax.set_xlabel('crop-scene pair')
    ax.set_xticks(np.arange(data.shape[1]))
    ax.set_yticks(np.arange(data.shape[0]))
    plt.setp(ax.get_xticklabels(), rotation=90, ha='right', rotation_mode='anchor')
    ax.set_xticklabels(x_ticks)
    ax.set_yticklabels(y_ticks)
    cmap = mpl.colors.ListedColormap([gray, green])
    ax.imshow(data, cmap=cmap)
    fig.show()
plot_by_illuminant(condense=False)

png

The 2D table is repeated below; this time, columns whose outcome was the same under every illumination (all correct or all incorrect) are removed. When investigating the effects of illumination, it may be interesting to ignore these cases.

plot_by_illuminant(condense=True)

png

Effect of illuminants

The above figure is transformed by summing along the x-axis, so the scene-crop axis disappears and the illumination axis stays the same. Below is a figure with the number of successful classifications summed across scenes and crops, then plotted against illumination. From this we see that there is actually quite a difference between the illuminants, something the 2D table above doesn’t make very apparent.

by_illuminant = is_equal_groundtruth.sum(dim=['scene', 'crop'])
#ans.plot()
fig, ax = plt.subplots()
ax.set_title(f'Correct classification counts for {model_name_str()} under different illumination')
ax.set_ylabel('scene illumination')
ax.set_xlabel('number of correct classifications (max. possible is 55)')
ax.set_yticks(np.arange(0, len(illuminants)))
ax.set_yticklabels(illuminants)
ax.barh(np.arange(len(illuminants)), by_illuminant.data, color=green)
fig.show()

png

3. Inspect data, part 2. Good illumination, bad illumination.

The above figures make it clear that illumination has an effect on classification accuracy. To get a better feel for the details of this effect, this section compares the best and worst illuminants.

On average, the best illumination was the one where both the red and green LED lights were enabled and set to 75% power.

The worst illumination occurred with the LED set to blue only, at 100% power. The next two worst illuminations were the same except with the LED power set to 75% and 50%. The blue-only LED setup at 25% power managed to just eclipse the blue-green LED setup at 100% power.

In both of these setups there were also 2 halogen lights and 1 desk lamp.

Best was red-green, worst was blue.

worst_illum, best_illum = illuminants[3], illuminants[14]
print(f'Best illuminant: {best_illum}')
print(f'Worst illuminant: {worst_illum}')
example_best = np.asarray(open_as_srgb(ds_path / 'images_preview' / '03' / f'03_{best_illum}.jpg'))
example_worst = np.asarray(open_as_srgb(ds_path / 'images_preview' / '03' / f'03_{worst_illum}.jpg'))
imlist([example_best, example_worst], ['best illuminant', 'worst illuminant'])
Best illuminant: 2HAL_DESK_RG075
Worst illuminant: 2HAL_DESK_LED-B100

best illuminant

worst illuminant

It might be useful to refer to the dataset printout shown towards the top of the page; both of these illuminants were included as tabs.

Inspect the best and worst illumination

Below, we inspect the classifications for the two illuminations by dividing all scene-crop pairs into 4 tabs, depending on whether the image was correctly or incorrectly labeled under each illuminant. Within each tab, images come in pairs: best illumination followed by worst illumination.

Interestingly, there are actually only 3 tabs, as there were no images for which ResNet failed under the good illumination but succeeded under the bad illumination.

def compare_classifications():
    images = []
    tab_labels = []
    class_labels = []

    for s in is_equal_groundtruth.scene:
        for c in is_equal_groundtruth.crop:
            res_best_illum = is_equal_groundtruth.sel(scene=s, crop=c, illuminant=best_illum)
            res_worst_illum = is_equal_groundtruth.sel(scene=s, crop=c, illuminant=worst_illum)
            label = None
            if res_best_illum:
                if res_worst_illum:
                    label = 'both correct'
                else:    
                    label = 'best illum correct, worst illum incorrect'
            else:
                if res_worst_illum:
                    label = 'best illum incorrect, worst illum correct'
                else:
                    label = 'both incorrect'
            #images.append(open_img(scenes[s.data], best_illum))
            images.append(get_ds_image(ds, scenes[s.data], best_illum, c.data))
            images.append(get_ds_image(ds, scenes[s.data], worst_illum, c.data))
            class_labels.append(class_id_to_label(classifications.sel(scene=s, crop=c, illuminant=best_illum)))
            class_labels.append(class_id_to_label(classifications.sel(scene=s, crop=c, illuminant=worst_illum)))
            tab_labels.append(label)
            tab_labels.append(label)
    ipyplot.plot_class_tabs(images, tab_labels, class_labels)
compare_classifications()            

[Image tabs comparing each scene-crop pair under the best and worst illuminant. Images come in pairs (best illuminant, then worst illuminant), each labeled with ResNet’s predicted class, and the pairs are grouped into tabs by outcome. Example pairs where only the worst illuminant failed: bath towel → ant, cup → candle, pot → mixing bowl, Granny Smith → candle, bell pepper → bubble, coffee mug → teapot, banana → coil.]

4. Discussion

Below are the two correct classification tallies for experiment 2.1.2 and experiment 2.2.1.

Experiment 2.1.2:

image.png

Experiment 2.2.1:

image.png

The scenes whose illumination led to poor results in experiment 2.1.2 show much better classification accuracy in this experiment, where the images are passed through the high-pass filter. The scenes that already had good results in experiment 2.1.2 did not improve much in this experiment. Thus, it seems as though the basic high-pass filter helps to improve classification accuracy for scenes whose illumination is negatively affecting classification.

There are still differences between the illuminants, however, so there is room for improvement. The filter used in this experiment was very basic, so investigating alternatives is an interesting direction.