Using the fastai library in computer vision.
from fastai.vision.all import *

This tutorial highlights how to quickly build a Learner and fine-tune a pretrained model on most computer vision tasks.

Single-label classification

For this task, we will use the Oxford-IIIT Pet Dataset, which contains images of cats and dogs from 37 different breeds. We will first show how to build a simple cat-vs-dog classifier, then a slightly more advanced model that can classify all breeds.

The dataset can be downloaded and decompressed with this line of code:

path = untar_data(URLs.PETS)

It will only do this download once, and it returns the location of the decompressed archive. We can check what is inside with the .ls() method:

path.ls()
(#3) [Path('/home/sgugger/.fastai/data/oxford-iiit-pet/models'),Path('/home/sgugger/.fastai/data/oxford-iiit-pet/annotations'),Path('/home/sgugger/.fastai/data/oxford-iiit-pet/images')]
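To make the "download only once" behaviour concrete, here is a minimal sketch of the caching pattern using only the standard library. `fetch_once` and `fake_download` are hypothetical names, not fastai functions, and this is not how `untar_data` is implemented; it only shows the idea of returning an existing folder instead of downloading again.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

def fetch_once(name: str, root: Path, download: Callable[[Path], None]) -> Path:
    """Hypothetical helper: fill root/name via `download` only if it does not exist yet."""
    target = root / name
    if not target.exists():        # first call: do the expensive download/extract step
        target.mkdir(parents=True)
        download(target)
    return target                  # every call returns the same location

calls: List[Path] = []

def fake_download(folder: Path) -> None:
    """Stand-in for fetching and extracting an archive; records each call."""
    calls.append(folder)
    (folder / "images").mkdir()

with tempfile.TemporaryDirectory() as tmp:
    first = fetch_once("oxford-iiit-pet", Path(tmp), fake_download)
    second = fetch_once("oxford-iiit-pet", Path(tmp), fake_download)
    contents = sorted(p.name for p in first.iterdir())

print(first == second, len(calls), contents)  # True 1 ['images']
```

A real implementation would also need to detect a download that failed halfway; this sketch would treat such a partial folder as already cached.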

We will ignore the annotations folder for now and focus on the images folder. get_image_files is a fastai function that helps us grab all the image files (recursively) in one folder.

files = get_image_files(path/"images")
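As a rough illustration of what "grab all the image files recursively" means, the sketch below walks a folder with `pathlib.Path.rglob` and keeps files with an image extension. `list_image_files` and the `IMAGE_EXTS` set are assumptions for this example only; fastai's `get_image_files` builds its own, broader list of extensions.

```python
import tempfile
from pathlib import Path
from typing import List

# Assumed extension set for this sketch; fastai builds its own, broader list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def list_image_files(folder: Path) -> List[Path]:
    """Hypothetical stand-in for get_image_files: recursively collect image files."""
    return sorted(p for p in Path(folder).rglob("*")
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTS)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images" / "sub").mkdir(parents=True)
    for name in ["Abyssinian_1.jpg", "sub/beagle_2.JPG", "notes.txt"]:
        (root / "images" / name).touch()   # empty placeholder files
    found = [p.relative_to(root).as_posix() for p in list_image_files(root / "images")]

print(found)  # ['images/Abyssinian_1.jpg', 'images/sub/beagle_2.JPG']
```

The extension check is case-insensitive, so `.JPG` is kept, and the subfolder is searched as well; `notes.txt` is skipped.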

Cats vs dogs

To label our data for the cats vs dogs problem, we need to know which filenames are of dog pictures and which ones are of cat pictures. There is an easy way to distinguish them: the name of the file begins with a capital letter for cats and a lowercase letter for dogs.


We can then define an easy label function:

def label_func(f): return f[0].isupper()
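The label function only looks at the first character, so it has to receive the bare file name rather than the whole path. As far as I know, `from_name_func` handles this by calling the function on each file's `name` attribute. The sketch below imitates that with `pathlib.Path.name` on a few illustrative file names (made up for the example, not a listing of the real dataset).

```python
from pathlib import Path

def label_func(f): return f[0].isupper()   # True (cat) if the name starts with a capital

# Illustrative paths following the naming rule (capital = cat, lowercase = dog)
paths = [Path("images/Abyssinian_1.jpg"), Path("images/beagle_12.jpg"), Path("images/Siamese_3.jpg")]

labels = {p.name: label_func(p.name) for p in paths}
print(labels)  # {'Abyssinian_1.jpg': True, 'beagle_12.jpg': False, 'Siamese_3.jpg': True}

# Passing the full path instead would test the "i" of "images" and mislabel every cat
wrong = label_func(str(paths[0]))
print(wrong)  # False
```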

To get our data ready for a model, we need to put it in a DataLoaders object. Here we have a function that labels using the file names, so we will use ImageDataLoaders.from_name_func. There are other factory methods of ImageDataLoaders that could be more suitable for your problem, so make sure to check them all.

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))

We have passed to this function the directory we're working in, the files we grabbed, our label_func, and one last piece as item_tfms: a Transform applied to every item of our datasets. Here, Resize(224) takes a random crop along the largest dimension to make each image square, then resizes it to 224 by 224. If we didn't pass this, we would get an error later, because it would be impossible to batch items of different sizes together.
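The crop-then-resize step boils down to a little arithmetic. The sketch below uses a hypothetical helper, `random_square_crop`, rather than fastai's `Resize` code: it picks a random square box whose side is the shorter image dimension, so only the longer dimension is cropped, and reports the scale factor needed to bring that square to 224 pixels. Since every item ends up with the same 224 by 224 shape, the items can be stacked into one batch.

```python
import random
from typing import List, Tuple

def random_square_crop(width: int, height: int, rng: random.Random) -> Tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) box: a random square crop along the longer side."""
    side = min(width, height)              # the square's side is the shorter dimension
    left = rng.randint(0, width - side)    # can only move if width is the longer side
    top = rng.randint(0, height - side)    # can only move if height is the longer side
    return left, top, left + side, top + side

rng = random.Random(0)
results: List[Tuple[int, int, int, float]] = []
for w, h in [(500, 375), (300, 450), (224, 224)]:
    left, top, right, bottom = random_square_crop(w, h, rng)
    side = right - left
    assert side == bottom - top == min(w, h)                        # square, as large as possible
    assert 0 <= left and right <= w and 0 <= top and bottom <= h    # stays inside the image
    results.append((w, h, side, round(224 / side, 3)))             # scale factor to reach 224

print(results)  # [(500, 375, 375, 0.597), (300, 450, 300, 0.747), (224, 224, 224, 1.0)]
```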

We can then check that everything looks okay with the show_batch method (True is for cat, False is for dog):

dls.show_batch()


Then we can create a Learner, which is a fastai object that combines data and a model for training, and use transfer learning to fine-tune a pretrained model in just two lines of code:

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch train_loss valid_loss error_rate time
0 0.141778 0.010346 0.004060 00:12
epoch train_loss valid_loss error_rate time
0 0.049628 0.012845 0.005413 00:14

The first line downloaded a model called ResNet34, pretrained on ImageNet, and adapted it to our specific problem. It then fine-tuned that model, and in a relatively short time we get a model with an error rate of about 0.5%... amazing!
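The `error_rate` metric is the fraction of validation predictions that are wrong, in other words `1 - accuracy`. The plain-Python sketch below spells out that definition (it is not fastai's implementation, which works on tensors) and shows how to read the value from the last results table as a percentage.

```python
from typing import Sequence

def error_rate(preds: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that differ from the targets (i.e. 1 - accuracy)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

print(error_rate([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))  # 0.2 (one mistake out of five)
print(f"{0.005413:.1%}")                              # 0.5% (value from the second epoch table)
```

An error rate of 0.005413 therefore means roughly 5 mistakes per 1,000 validation images.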

If you want to make a prediction on a new image, you can use learn.predict:

('False', tensor(0), tensor([1.0000e+00, 5.8945e-07]))

The predict method returns three things: the decoded prediction (here False, for dog), the index of the predicted class, and the tensor of probabilities for each class, in the order of their indices (here the model is quite confident the image is a dog!). In this case, the method accepts a filename, a PIL image or a tensor directly.
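To see how the three returned values relate, here is a hypothetical `decode_prediction` helper that takes the probabilities, finds the most likely index (argmax) and looks it up in the list of class labels. It is not how fastai implements `predict`; the class order `[False, True]` is an assumption (sorted labels), which matches the output above where `False` has index 0.

```python
from typing import List, Sequence, Tuple

def decode_prediction(probs: Sequence[float], vocab: Sequence) -> Tuple[str, int, List[float]]:
    """Mimic the shape of learn.predict's output: (decoded label, class index, probabilities)."""
    idx = max(range(len(probs)), key=lambda i: probs[i])   # argmax over class probabilities
    return str(vocab[idx]), idx, list(probs)

vocab = [False, True]              # assumed class order: index 0 is False (dog)
probs = [1.0000e+00, 5.8945e-07]   # probabilities from the prediction shown above
print(decode_prediction(probs, vocab))  # ('False', 0, [1.0, 5.8945e-07])
```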

We can also have a look at some predictions with the show_results method:

learn.show_results()