High level API to quickly get your data in a `DataLoaders`
from nbdev.showdoc import *

class TransformBlock[source]

TransformBlock(type_tfms=None, item_tfms=None, batch_tfms=None, dl_type=None, dls_kwargs=None)

A basic wrapper that links default transforms for the data block API
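
For instance, an image block can be expressed as a TransformBlock whose type transform opens the image and whose batch transform converts integer pixels to floats. The sketch below is only an illustration of the wrapper (the name ImageBlockBW is hypothetical); it mirrors how the vision blocks used later on this page are put together:

def ImageBlockBW():
    #Type transform creates a PILImageBW from a filename, batch transform rescales ints to floats
    return TransformBlock(type_tfms=PILImageBW.create, batch_tfms=IntToFloatTensor)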

CategoryBlock[source]

CategoryBlock(vocab=None, add_na=False)

TransformBlock for single-label categorical targets
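
You normally use this as the target block of a DataBlock (see below). A minimal sketch, assuming images stored in per-class folders named cat and dog, fixing the vocab up front instead of letting it be inferred from the training data (add_na=True would reserve an extra category for unseen labels):

pets = DataBlock(blocks=(ImageBlock, CategoryBlock(vocab=['cat', 'dog'])),
                 get_items=get_image_files,
                 get_y=parent_label,
                 splitter=RandomSplitter())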

MultiCategoryBlock[source]

MultiCategoryBlock(encoded=False, vocab=None, add_na=False)

TransformBlock for multi-label categorical targets
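
Here get_y should return a list of labels per item (or, with encoded=True, a row that is already one-hot encoded). A minimal sketch with a hypothetical get_tags labeller:

#get_tags is a placeholder returning the list of tags attached to each image
def get_tags(fn): return ['outdoors', 'sunny']

planet = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_items=get_image_files,
                   get_y=get_tags,
                   splitter=RandomSplitter())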

RegressionBlock[source]

RegressionBlock(c_out=None)

TransformBlock for float targets
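
Here get_y should return one float (or c_out of them) per item; c_out lets you state how many continuous targets there are. A sketch with a deliberately artificial target, the file size in kB:

import os
#Regress each image onto a single float: its file size in kB (purely illustrative)
def get_size_kb(fn): return os.path.getsize(fn)/1024

sizes = DataBlock(blocks=(ImageBlock, RegressionBlock),
                  get_items=get_image_files,
                  get_y=get_size_kb,
                  splitter=RandomSplitter())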

General API

#For the examples below, so not exported
from fastai2.vision.core import *
from fastai2.vision.data import *

class DataBlock[source]

DataBlock(blocks=None, dl_type=None, getters=None, n_inp=None, item_tfms=None, batch_tfms=None, get_items=None, splitter=None, get_y=None, get_x=None)

Generic container to quickly build Datasets and DataLoaders

To build a DataBlock you need to give the library four things: the types of your inputs/labels, and at least two functions: get_items and splitter. You may also need to include get_x and get_y, or a more generic list of getters that are applied to the results of get_items.
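
For instance, when the source is a dataframe you can leave out get_items (the dataframe itself is then used as the list of items) and provide one getter per block. The sketch below is hypothetical and assumes columns fname and label:

#One getter per block, each applied to the rows of the dataframe passed as source
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   getters=[ColReader('fname', pref='images/'), ColReader('label')],
                   splitter=RandomSplitter())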

Once those are provided, you automatically get a Datasets or a DataLoaders:

DataBlock.datasets[source]

DataBlock.datasets(source, verbose=False)

Create a Datasets object from source

DataBlock.dataloaders[source]

DataBlock.dataloaders(source, path='.', verbose=False, bs=64, shuffle=False, num_workers=None, do_setup=True, pin_memory=False, timeout=0, batch_size=None, drop_last=False, indexed=None, n=None, device=None, wif=None, before_iter=None, after_item=None, before_batch=None, after_batch=None, after_iter=None, create_batches=None, create_item=None, create_batch=None, retain=None, get_idxs=None, sample=None, shuffle_fn=None, do_batch=None)

Create a DataLoaders object from source

You can create a DataBlock by passing functions:

mnist = DataBlock(blocks = (ImageBlock(cls=PILImageBW),CategoryBlock),
                  get_items = get_image_files,
                  splitter = GrandparentSplitter(),
                  get_y = parent_label)

Each type comes with default transforms that will be applied:

  • at the base level to create items in a tuple (usually input, target) from the base elements (like filenames)
  • at the item level of the datasets
  • at the batch level

They are called, respectively, type transforms, item transforms, and batch transforms. In the case of MNIST, the type transforms are the method used to create a PILImageBW (for the input) and the Categorize transform (for the target); the item transform is ToTensor and the batch transforms are Cuda and IntToFloatTensor. You can add any other transforms by passing them to DataBlock.datasets or DataBlock.dataloaders.

test_eq(mnist.type_tfms[0], [PILImageBW.create])
test_eq(mnist.type_tfms[1].map(type), [Categorize])
test_eq(mnist.default_item_tfms.map(type), [ToTensor])
test_eq(mnist.default_batch_tfms.map(type), [IntToFloatTensor])
dsets = mnist.datasets(untar_data(URLs.MNIST_TINY))
test_eq(dsets.vocab, ['3', '7'])
x,y = dsets.train[0]
test_eq(x.size,(28,28))
show_at(dsets.train, 0, cmap='Greys', figsize=(2,2));
test_fail(lambda: DataBlock(wrong_kwarg=42, wrong_kwarg2='foo'))
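
From the same DataBlock you can go straight to a DataLoaders and grab a batch; a short sketch (the batch size is just illustrative):

dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY), bs=32)
xb,yb = dls.one_batch()    #xb is a batch of 28x28 black-and-white images, yb their categories
dls.show_batch(max_n=9, figsize=(4,4))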

Debugging

DataBlock.summary[source]

DataBlock.summary(source, bs=4, **kwargs)
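
summary steps through the assembly of one batch from source, printing what each transform produces along the way, which makes it the quickest way to locate where a DataBlock breaks. A typical call on the MNIST block above (sketch):

mnist.summary(untar_data(URLs.MNIST_TINY))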