Helper functions to get data into a `DataLoaders` for the tabular application, and the higher-level class `TabularDataLoaders`

The main class to get your data ready for model training is TabularDataLoaders and its factory methods. Check out the tabular tutorial for usage examples.

class TabularDataLoaders

TabularDataLoaders(*loaders, path='.', device=None) :: DataLoaders

Basic wrapper around several DataLoaders with factory methods for tabular data

This class should not be used directly; one of the factory methods should be preferred instead. All of these factory methods accept the following arguments:

  • cat_names: the names of the categorical variables
  • cont_names: the names of the continuous variables
  • y_names: the names of the dependent variables
  • y_block: the TransformBlock to use for the target
  • valid_idx: the indices to use for the validation set (defaults to a random split otherwise)
  • bs: the batch size
  • val_bs: the batch size for the validation DataLoader (defaults to bs)
  • shuffle_train: whether to shuffle the training DataLoader
  • n: overrides the number of elements in the dataset
  • device: the PyTorch device to use (defaults to default_device())
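
To make the valid_idx argument concrete, here is a minimal sketch in plain Python of an index-based split. It is not fastai's implementation: split_indices is a hypothetical helper, and the 20% hold-out used when valid_idx is omitted is an assumption made for illustration.

```python
import random

def split_indices(n_rows, valid_idx=None, valid_pct=0.2, seed=None):
    """Hypothetical helper: return (train_idx, valid_idx) for n_rows rows."""
    if valid_idx is None:
        # Assumed fallback: hold out a random fraction of the rows.
        rng = random.Random(seed)
        shuffled = list(range(n_rows))
        rng.shuffle(shuffled)
        valid_idx = shuffled[:int(n_rows * valid_pct)]
    valid = set(valid_idx)
    # Every row that is not in the validation set goes to training.
    train_idx = [i for i in range(n_rows) if i not in valid]
    return train_idx, sorted(valid)

# Explicit validation rows, e.g. the last 200 of 1000.
train_idx, valid_rows = split_indices(1000, valid_idx=list(range(800, 1000)))
print(len(train_idx), len(valid_rows))  # 800 200
```

With explicit indices the split is deterministic, which is useful when the validation rows must stay fixed across runs.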

TabularDataLoaders.from_df

TabularDataLoaders.from_df(df, path='.', procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, valid_idx=None, bs=64, val_bs=None, shuffle_train=True, n=None, device=None)

Create from df in path using procs

Let's look at an example with the adult dataset:

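from fastai.tabular.all import *

# Download the sample of the adult dataset and load it into a DataFrame
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df.head()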

   age  workclass         fnlwgt  education    education-num  marital-status      occupation       relationship   race                sex     capital-gain  capital-loss  hours-per-week  native-country  salary
0  49   Private           101320  Assoc-acdm   12.0           Married-civ-spouse  NaN              Wife           White               Female  0             1902          40              United-States   >=50k
1  44   Private           236746  Masters      14.0           Divorced            Exec-managerial  Not-in-family  White               Male    10520         0             45              United-States   >=50k
2  38   Private           96185   HS-grad      NaN            Divorced            NaN              Unmarried      Black               Female  0             0             32              United-States   <50k
3  38   Self-emp-inc      112847  Prof-school  15.0           Married-civ-spouse  Prof-specialty   Husband        Asian-Pac-Islander  Male    0             0             40              United-States   >=50k
4  42   Self-emp-not-inc  82297   7th-8th      NaN            Married-civ-spouse  Other-service    Wife           Black               Female  0             0             50              United-States   <50k

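# Columns to treat as categorical and continuous inputs
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
# Preprocessing steps (procs) applied to the data
procs = [Categorify, FillMissing, Normalize]
# Rows 800-999 form the validation set; the remaining rows are used for training
dls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names,
                                 y_names="salary", valid_idx=list(range(800,1000)), bs=64)
dls.show_batch()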

   workclass         education     marital-status      occupation         relationship   race   education-num_na  age        fnlwgt         education-num  salary
0  Private           HS-grad       Never-married       Other-service      Not-in-family  Black  False             30.000000  144593.001229  9.0            <50k
1  Private           Some-college  Married-civ-spouse  Exec-managerial    Husband        White  False             57.999999  289363.997582  10.0           >=50k
2  Self-emp-not-inc  Bachelors     Married-civ-spouse  Craft-repair       Husband        White  False             35.000000  241998.001274  13.0           <50k
3  Private           10th          Never-married       Other-service      Unmarried      White  False             28.000000  66434.001077   6.0            <50k
4  Private           Bachelors     Divorced            Sales              Not-in-family  Black  False             48.000000  149209.999774  13.0           >=50k
5  Private           Some-college  Married-civ-spouse  Handlers-cleaners  Husband        White  False             65.000000  83800.005693   10.0           <50k
6  Private           Some-college  Divorced            Adm-clerical       Unmarried      White  False             43.000000  35210.003687   10.0           <50k
7  Private           HS-grad       Married-civ-spouse  Craft-repair       Wife           White  False             40.000000  87770.998000   9.0            <50k
8  Self-emp-not-inc  Some-college  Married-civ-spouse  Exec-managerial    Husband        White  False             42.000000  99185.002887   10.0           >=50k
9  Private           Bachelors     Never-married       Craft-repair       Not-in-family  White  False             25.000000  308144.002521  13.0           <50k
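
The batch above contains an education-num_na column that is not in the original data; it comes from the FillMissing proc. As a rough illustration of what the three procs do conceptually, here is a plain-Python sketch. It is not fastai's code: fill_missing, normalize and categorify are hypothetical stand-ins, and the choices of median fill, population standard deviation and code 0 for missing categories are assumptions of this sketch.

```python
from statistics import mean, median, pstdev

def fill_missing(values):
    """Replace None with the median of the observed values and flag the filled rows."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    flags = [v is None for v in values]  # plays the role of the "<name>_na" column
    return [fill if v is None else v for v in values], flags

def normalize(values):
    """Rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def categorify(values):
    """Map each category to an integer code; 0 is kept for missing values."""
    vocab = sorted({v for v in values if v is not None})
    codes = {v: i + 1 for i, v in enumerate(vocab)}
    return [codes.get(v, 0) for v in values], vocab

# Values taken from the first five rows of the adult sample shown earlier.
education_num = [12.0, 14.0, None, 15.0, None]
occupation = [None, "Exec-managerial", None, "Prof-specialty", "Other-service"]

filled, was_missing = fill_missing(education_num)
print(filled)       # [12.0, 14.0, 14.0, 15.0, 14.0]
print(was_missing)  # [False, False, True, False, True]
print(normalize(filled))
print(categorify(occupation)[0])  # [0, 1, 0, 3, 2]
```

Missing values are filled before normalizing in this sketch, so the mean and standard deviation are computed on complete columns.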

TabularDataLoaders.from_csv

TabularDataLoaders.from_csv(csv, path='.', procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, valid_idx=None, bs=64, val_bs=None, shuffle_train=True, n=None, device=None)

Create from csv file in path using procs

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, procs=procs, cat_names=cat_names, cont_names=cont_names, 
                                  y_names="salary", valid_idx=list(range(800,1000)), bs=64)