All the functions necessary to build a `Learner` suitable for transfer learning in NLP

The most important functions of this module are `language_model_learner` and `text_classifier_learner`. They will help you define a `Learner` using a pretrained model. See the text tutorial for examples of use.

Loading a pretrained model

In text, to load a pretrained model, we need to adapt the embeddings of the vocabulary used for the pre-training to the vocabulary of our current corpus.

match_embeds[source]

match_embeds(old_wgts, old_vocab, new_vocab)

Convert the embedding in old_wgts to go from old_vocab to new_vocab.

For words in new_vocab that don't have a corresponding match in old_vocab, we use the mean of all pretrained embeddings.

wgts = {'0.encoder.weight': torch.randn(5,3)}
new_wgts = match_embeds(wgts.copy(), ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
old,new = wgts['0.encoder.weight'],new_wgts['0.encoder.weight']
test_eq(new[0], old[0])       # 'a' keeps its pretrained embedding
test_eq(new[1], old[2])       # 'c' moves from index 2 to index 1
test_eq(new[2], old.mean(0))  # 'd' has no match, so it gets the mean embedding
test_eq(new[3], old[1])       # 'b' moves from index 1 to index 3

load_ignore_keys[source]

load_ignore_keys(model, wgts)

Load wgts in model ignoring the names of the keys, just taking parameters in order
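A minimal sketch of what this means in practice (the toy model and the key names 'foo'/'bar' are made up): the saved keys don't have to match the model's own keys, only the order and shapes of the parameters matter.

from torch import nn

model = nn.Linear(3, 4)
# Key names differ from the model's ('weight', 'bias'); only the order and
# shapes of the parameters are used.
wgts = {'foo': torch.randn(4, 3), 'bar': torch.randn(4)}
load_ignore_keys(model, wgts)
test_eq(model.weight.data, wgts['foo'])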

class TextLearner[source]

TextLearner(model, dls, alpha=2.0, beta=1.0, moms=(0.8, 0.7, 0.8), loss_func=None, opt_func='Adam', lr=0.001, splitter='trainable_params', cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True) :: Learner

Basic class for a Learner in NLP.

Adds a ModelReseter and an RNNRegularizer with alpha and beta to the callbacks; the rest is the same as the Learner init.

This Learner adds functionality to the base class:

TextLearner.load_pretrained[source]

TextLearner.load_pretrained(wgts_fname, vocab_fname, model=None)

Load a pretrained model and adapt it to the data vocabulary.

wgts_fname should point to the weights of the pretrained model and vocab_fname to the vocabulary used to pretrain it.
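A minimal sketch of calling it directly (assuming dls is a language-model DataLoaders; the file names are placeholders for your own pretrained weights and vocabulary):

learn = language_model_learner(dls, AWD_LSTM, pretrained=False)
# 'my_wgts.pth' and 'my_vocab.pkl' are hypothetical file names
learn.load_pretrained('my_wgts.pth', 'my_vocab.pkl')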

TextLearner.save_encoder[source]

TextLearner.save_encoder(file)

Save the encoder to file in the model directory

The model directory is Learner.path/Learner.model_dir.
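For instance, after fine-tuning a language model you would typically save its encoder to reuse it in a classifier (the file name 'finetuned' is just an example):

learn.save_encoder('finetuned')
# saved to learn.path/learn.model_dir/'finetuned.pth'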

TextLearner.load_encoder[source]

TextLearner.load_encoder(file, device=None)

Load the encoder file from the model directory, optionally ensuring it's on device
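Continuing the sketch above, a classifier Learner built on the same vocabulary can then pick up that encoder (dls_clas and 'finetuned' are illustrative names; see text_classifier_learner below):

learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5)
learn = learn.load_encoder('finetuned')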

Language modeling predictions

For language modeling, the predict method is quite different from the other applications, which is why it needs its own subclass.

decode_spec_tokens[source]

decode_spec_tokens(tokens)

Decode the special tokens in tokens

test_eq(decode_spec_tokens(['xxmaj', 'text']), ['Text'])                        # xxmaj: capitalize the next token
test_eq(decode_spec_tokens(['xxup', 'text']), ['TEXT'])                         # xxup: uppercase the next token
test_eq(decode_spec_tokens(['xxrep', '3', 'a']), ['aaa'])                       # xxrep n: next token repeated n times, joined
test_eq(decode_spec_tokens(['xxwrep', '3', 'word']), ['word', 'word', 'word'])  # xxwrep n: next word repeated n times

class LMLearner[source]

LMLearner(model, dls, alpha=2.0, beta=1.0, moms=(0.8, 0.7, 0.8), loss_func=None, opt_func='Adam', lr=0.001, splitter='trainable_params', cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True) :: TextLearner

Add functionality to TextLearner when dealing with a language model

LMLearner.predict[source]

LMLearner.predict(text, n_words=1, no_unk=True, temperature=1.0, min_p=None, no_bar=False, decoder='decode_spec_tokens')

Return text and the n_words that come after

The words are picked randomly among the predictions, depending on the probability of each index. no_unk means we never pick the UNK token, and temperature is applied to the predictions. If min_p is passed, we don't consider indices with a probability lower than it. Set no_bar to True if you don't want any progress bar, and you can pass along a custom decoder to process the predicted tokens.
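For example, assuming learn is an LMLearner such as the one built in the next section, you can sample more conservatively by lowering the temperature and discarding unlikely tokens (the values below are illustrative):

learn.predict('This movie is about', n_words=20, temperature=0.75, min_p=0.02, no_bar=True)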

Learner convenience functions

language_model_learner[source]

language_model_learner(dls, arch, config=None, drop_mult=1.0, pretrained=True, pretrained_fnames=None, loss_func=None, opt_func='Adam', lr=0.001, splitter='trainable_params', cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))

Create a Learner with a language model from dls and arch.

You can use config to customize the architecture used (change the values from awd_lstm_lm_config for this). pretrained will use fastai's pretrained model for this arch (if available), or you can pass specific pretrained_fnames containing your own pretrained model and the corresponding vocabulary. All other arguments are passed to Learner.

path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', is_lm=True, valid_col='is_valid')
learn = language_model_learner(dls, AWD_LSTM)
learn.predict('This movie is about', n_words=20)
'This movie is about having a fictional history as a teen tragedy , the exact source is for The Down Heavy'
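To customize the architecture, you could pass a modified copy of the default configuration. This is a hedged sketch reusing the dls from above, with purely illustrative values (changing the sizes means the fastai pretrained weights no longer fit, hence pretrained=False):

config = awd_lstm_lm_config.copy()
config.update({'emb_sz': 200, 'n_hid': 800})  # illustrative overrides
learn = language_model_learner(dls, AWD_LSTM, config=config, pretrained=False, drop_mult=0.7)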

text_classifier_learner[source]

text_classifier_learner(dls, arch, seq_len=72, config=None, pretrained=True, drop_mult=0.5, n_out=None, lin_ftrs=None, ps=None, max_len=1440, y_range=None, loss_func=None, opt_func='Adam', lr=0.001, splitter='trainable_params', cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))

Create a Learner with a text classifier from dls and arch.

You can use config to customize the architecture used (change the values from awd_lstm_clas_config for this), and pretrained will use fastai's pretrained model for this arch (if available). drop_mult is a global multiplier applied to control all the dropouts. n_out is usually inferred from the dls, but you may pass it.

The model uses a SentenceEncoder, which means the texts are passed seq_len tokens at a time, and will only compute the gradients on the last max_len steps. lin_ftrs and ps are passed to get_text_classifier.

All other arguments are passed to Learner.

path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)
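From here, a typical next step (sketched with an illustrative schedule and an arbitrary text) is to train briefly and run a prediction:

learn.fit_one_cycle(1, 2e-2)
learn.predict("I really liked that movie!")  # returns (label, label index, probabilities)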