Using `Datasets`, `Pipeline`, `TfmdLists` and `Transform`
from fastai2.data.all import *
from fastai2.vision.core import *
from fastai2.vision.data import *
from nbdev.showdoc import *

Overview

In the second half of this page we'll look at complete examples of loading and processing datasets with the low-level fastai.data APIs. But first, let's look at simple examples of Transform and Pipeline applied to a single filename.

(We won't look at TfmdLists or Datasets in this initial overview since they require a full dataset. Have a look at the second half of the page for examples of those.)

Transform

First we'll have a look at the basic steps using a single MNIST image.

Let's start with a filename, and see step by step how it can be converted into a labelled image that can be displayed and used for modeling.

source = untar_data(URLs.MNIST_TINY)/'train'
items = get_image_files(source)
fn = items[0]; fn
Path('/home/sgugger/.fastai/data/mnist_tiny/train/3/7861.png')

We'll look at each Transform needed in turn. Here's how we can open an image file:

img = PILImage.create(fn); img

Then create a c*h*w tensor:

tconv = ToTensor()
img = tconv(img)
img.shape,type(img)
(torch.Size([3, 28, 28]), fastai2.torch_core.TensorImage)

Now that's done, we can create our labels. First we extract the text label:

lbl = parent_label(fn); lbl
'3'

And then we convert it to an int for modeling:

tcat = Categorize(vocab=['3','7'])
lbl = tcat(lbl); lbl
TensorCategory(0)

We use decode to reverse transforms for display. Reversing the Categorize transform results in a class name we can display:

lbld = tcat.decode(lbl)
lbld
'3'

Pipeline

We can compose our image steps using Pipeline:

pipe = Pipeline([PILImage.create,tconv])
img = pipe(fn)
img.shape
torch.Size([3, 28, 28])

A Pipeline can decode and show an item.

pipe.show(img, figsize=(1,1), cmap='Greys');

The show method works behind the scenes with types. Transforms will make sure the type of an element they receive is preserved. Here PILImage.create returns a PILImage, which knows how to show itself. tconv converts it to a TensorImage, which also knows how to show itself.

type(img)
fastai2.torch_core.TensorImage

Those types are also used to enable different behaviors depending on the input received (for instance you don't do data augmentation the same way on an image, a segmentation mask or a bounding box).
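As a minimal sketch of that type dispatch (the transform below is made up for illustration, not part of fastai), a Transform can declare an encodes method for a specific type; inputs of any other type pass through unchanged:

class FlipExample(Transform):
    "Hypothetical transform: flips images left-to-right, leaves other types (e.g. labels) untouched"
    def encodes(self, o:TensorImage): return o.flip([-1])

flip = FlipExample()
type(flip(img)), type(flip(lbl))
# should be (fastai2.torch_core.TensorImage, fastai2.torch_core.TensorCategory):
# the image was flipped (and kept its type), the category was left alone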

Loading the Pets dataset using only Transform

Let's see how to use fastai.data to process the Pets dataset. We use source to refer to the underlying source of our data (e.g. a directory on disk, a database connection, a network connection, etc).

source = untar_data(URLs.PETS)/"images"

First we grab the items and split indices:

items = get_image_files(source)
split_idx = RandomSplitter()(items)

We'll use this function to create consistently sized tensors from image files:

def resized_image(fn:Path, sz=128):
    x = Image.open(fn).convert('RGB').resize((sz,sz))
    # Convert image to tensor for modeling
    return tensor(array(x)).permute(2,0,1).float()/255.

Before we can create a Transform, we need a type that knows how to show itself. Here we define a TitledImage:

class TitledImage(Tuple):
    def show(self, ctx=None, **kwargs): show_titled_image(self, ctx=ctx, **kwargs)
img = resized_image(items[0])
TitledImage(img,'test').show()

Try 1: State outside class

Here the transform's state (the vocab, the o2i mapping and the labeller) is built outside the class and passed in at construction time.

class PetTfm(Transform):
    def __init__(self, vocab, o2i, lblr): self.vocab,self.o2i,self.lblr = vocab,o2i,lblr
    def encodes(self, o): return [resized_image(o), self.o2i[self.lblr(o)]]
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
labeller = RegexLabeller(pat = r'/([^/]+)_\d+.jpg$')
vals = list(map(labeller, items[split_idx[0]]))
vocab,o2i = uniqueify(vals, sort=True, bidir=True)
pets = PetTfm(vocab,o2i,labeller)
x,y = pets(items[0])
x.shape,y
(torch.Size([3, 128, 128]), 36)
dec = pets.decode([x,y])
dec.show()

Try 2: State inside class

We can now create a Transform that converts from an index to our x and y for modeling.

class PetTfm(Transform):
    def __init__(self, items, train_idx):
        self.items,self.train_idx = items,train_idx
        self.labeller = RegexLabeller(pat = r'/([^/]+)_\d+.jpg$')
        vals = map(self.labeller, items[train_idx])
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)

    def encodes(self, i):
        o = self.items[i]
        return [resized_image(o), self.o2i[self.labeller(o)]]
    
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])

Note that decodes is intended to operate on the return value of encodes. In the case above, decodes takes in a list consisting of a tensor representing the image and an integer class index, and returns an instance of TitledImage.

It's important that fully decoded elements have a type that knows how to show itself, because in a Pipeline, decoding stops as soon as it reaches a type with a show method.

pets = PetTfm(items, split_idx[0])
x,y = pets(0)
x.shape,y
(torch.Size([3, 128, 128]), 36)
dec = pets.decode([x,y])
dec.show()

Using Pipeline to create a Siamese model dataset

A Siamese model needs a dataset with two images as input, and a boolean output that is True if the two images are the "same" (e.g. are pictures of the same breed of pet). Custom structures like this are often easiest to create using your own Pipeline, which makes no assumptions about the structure of your input or output data.

We'll be creating a dataset that returns two images and a boolean. So let's first define a showable type for a tuple with those things:

class SiameseImage(Tuple):
    def show(self, ctx=None, **kwargs): 
        img1,img2,same_breed = self
        return show_image(torch.cat([img1,img2], dim=2), title=same_breed, ctx=ctx)
SiameseImage(img,img,True).show();

We create a SiamesePair transform that builds the tuple we'll need for a SiameseImage.

class SiamesePair(Transform):
    def __init__(self,items,labels):
        self.items,self.labels,self.assoc = items,labels,self
        sortlbl = sorted(enumerate(labels), key=itemgetter(1))
        # dict of (each unique label) -- (list of indices with that label)
        self.clsmap = {k:L(v).itemgot(0) for k,v in itertools.groupby(sortlbl, key=itemgetter(1))}
        self.idxs = range_of(self.items)
        
    def encodes(self,i):
        "x: tuple of `i`th image and a random image from same or different class; y: True if same class"
        othercls = self.clsmap[self.labels[i]] if random.random()>0.5 else self.idxs
        otherit = random.choice(othercls)
        return SiameseImage(self.items[i], self.items[otherit], self.labels[otherit]==self.labels[i])

Then we want to open and resize the image filenames, but not the boolean. This kind of filtering is done with type annotations: SiamesePair.encodes returns a SiameseImage whose elements keep their types, and since resized_image is annotated to take a Path, the Transform we build from it opens and resizes Path objects but leaves other types untouched.

Then the Pipeline will compose our two transforms, and create the SiameseImage.

OpenAndResize = Transform(resized_image)
labeller = RegexLabeller(pat = r'/([^/]+)_\d+.jpg$')
sp = SiamesePair(items, items.map(labeller))
pipe = Pipeline([sp, OpenAndResize])
x,y,z = t = pipe(0)
x.shape,y.shape,z
(torch.Size([3, 128, 128]), torch.Size([3, 128, 128]), False)
for _ in range(3): pipe.show(pipe(0))
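Because resized_image is annotated to take a Path, the OpenAndResize transform only touches the Path elements of the SiameseImage; the boolean at the end passes through unchanged (that's the z we printed above). A quick check, purely for illustration:

test_eq(OpenAndResize(True), True)  # not a Path, so it should be returned untouched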

Using Datasets

Datasets applies a list of lists of transforms (or a list of Pipelines) lazily to items of a collection, creating one output per list of transforms/Pipeline. This makes it easier for us to separate out steps of a process, so that we can re-use them and modify the process more easily. For instance, we could add data augmentation, data normalization, etc. Here we separate out the steps of the basic pets process.
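Before diving in, here is a toy sketch of that "one output per list of transforms" idea (the items and lambda functions below are made up purely for illustration): each inner list becomes a Pipeline, and indexing the Datasets applies every Pipeline to the same raw item.

toy = Datasets([1,2,3], [[lambda o: o*2], [lambda o: o+10, lambda o: str(o)]])
toy[1]
# should give (4, '12'): the first Pipeline doubled the item, the second added 10 then stringified it

Now, the separated steps for pets: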

class ImageResizer(Transform):
    order=0
    "Resize image to `size` using `resample`"
    def __init__(self, size, resample=Image.BILINEAR):
        if not is_listy(size): size=(size,size)
        self.size,self.resample = (size[1],size[0]),resample

    def encodes(self, o:PILImage): return o.resize(size=self.size, resample=self.resample)
    def encodes(self, o:PILMask):  return o.resize(size=self.size, resample=Image.NEAREST)
tfms = [[PILImage.create, ImageResizer(128), ToTensor(), IntToFloatTensor()],
        [labeller, Categorize()]]
dsets = Datasets(items, tfms)
t = dsets[0]
type(t[0]),type(t[1])
(fastai2.torch_core.TensorImage, fastai2.torch_core.TensorCategory)
x,y = dsets.decode(t)
x.shape,y
(torch.Size([3, 128, 128]), 'yorkshire_terrier')
dsets.show(t);

Since the types are properly propagated and dispatched, we can do the same thing with ImageResizer, ToTensor and IntToFloatTensor passed as transforms over the tuple. This is done in TfmdDL by passing them to after_item. They won't do anything to the category, but will only be applied to the inputs.

tfms = [[PILImage.create], [labeller, Categorize()]]
dsets = Datasets(items, tfms)
tdl = TfmdDL(dsets, bs=1, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor()])
t = tdl.one_batch()
x,y = tdl.decode_batch(t)[0]
x.shape,y
(torch.Size([3, 128, 128]), 'yorkshire_terrier')
dsets.show((x,y));

Adding splits

When you create a Datasets object, you can pass split indices representing the split between the training and validation sets (there can be multiple validation sets), on top of the items and tfms.

pets = Datasets(items, tfms, splits=split_idx)
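If you want more than one validation set, you can pass more than two lists of indices; each extra list becomes another subset. A sketch (the three-way split below is arbitrary, just for illustration):

n = len(items)
splits3 = [list(range(n-2000)), list(range(n-2000, n-1000)), list(range(n-1000, n))]
pets3 = Datasets(items, tfms, splits=splits3)
len(pets3.subset(2))
# should be 1000: the third list of indices becomes subset(2)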

To access an element we need to specify the subset (either with train/valid or with subset(i)):

x,y = pets.subset(1)[0]
x.shape,y
((240, 300), TensorCategory(13))

...or equivalently:

x2,y2 = pets.valid[0]
test_eq(x.shape,x2.shape)
test_eq(y,y2)

We can decode an element for display purposes, either passing a tuple to decode or by passing an index to decode_at.

xy = pets.valid.decode((x,y))
xy[1]
'american_pit_bull_terrier'
xy2 = decode_at(pets.valid, 0)
test_eq(type(xy2[1]), Category)
test_eq(xy2, xy)

We're now ready to show our items.

pets.show((x,y));

We can use this to feed a DataLoader and view a batch. It's faster to convert to float on GPU, so we'll do it as a DataLoader transform in after_batch. We have to set the device in TfmdDL as it's None by default.

item_img_tfms = [ImageResizer(128), ToTensor()]
batch_tfms = [IntToFloatTensor()]

trn_dl = TfmdDL(pets.train, bs=9, after_item=item_img_tfms, after_batch=batch_tfms, device=default_device())
b = trn_dl.one_batch()

test_eq(len(b[0]), 9)
test_eq(b[0][0].shape, (3,128,128))
test_eq(b[0].type(), 'torch.cuda.FloatTensor' if default_device().type=='cuda' else 'torch.FloatTensor')
bd = trn_dl.decode_batch(b)

test_eq(len(bd), 9)
test_eq(bd[0][0].shape, (3,128,128))
trn_dl.show_batch()