Experiment configuration file format¶
Intro¶
Configuration files are in YAML format.
At the top level, a config file consists of a dictionary where keys are experiment
names and values are the experiment specifications. By default, all experiments
are run in lexicographical order, but xnmt_run_experiments can also be told
to run only a selection of the specified experiments. An example template with
2 experiments looks like this:
exp1: !Experiment
exp_global: ...
preproc: ...
model: ...
train: ...
evaluate: ...
exp2: !Experiment
exp_global: ...
preproc: ...
model: ...
train: ...
evaluate: ...
!Experiment is YAML syntax specifying a Python object of the same name, and its
parameters will be passed on to the Python constructor.
There can be a special top-level entry named defaults; this experiment will
never be run, but can be used as a template where components are partially shared
using YAML anchors or the !Ref mechanism (more on this later).
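For example, a defaults entry might look like the following sketch, where a training
regimen is defined once under a YAML anchor and reused by the actual experiments (the
settings shown here are only illustrative):
defaults: !Experiment
  exp_global: ...
  train: &shared_train !SimpleTrainingRegimen
    run_for_epochs: 2
    src_file: examples/data/head.ja
    trg_file: examples/data/head.en
exp1: !Experiment
  exp_global: ...
  model: ...
  train: *shared_train
  evaluate: ...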
The usage of exp_global, preproc, model, train, and evaluate is explained below.
Not all of them need to be specified, depending on the use case.
Experiment¶
The exp_global entry specifies settings that are global to this experiment. An example:
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
Note that for any strings used here or anywhere in the config file, {EXP} will be
overwritten by the name of the experiment, {EXP_DIR} will be overwritten by the
directory the config file lies in, {PID} by the process id, and {GIT_REV} by the
current git revision.
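For instance, one might keep the logs of separate runs apart by including these
placeholders in the log file name (an illustrative sketch):
exp_global: !ExpGlobal
  log_file: '{EXP_DIR}/logs/{EXP}.{GIT_REV}.{PID}.log'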
To obtain a full list of allowed parameters, please check the documentation for ExpGlobal.
Preprocessing¶
xnmt supports a variety of data preprocessing features. Please refer to Preprocessing for details.
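A minimal sketch, mirroring the fuller Preprocessing example further below (the file
paths refer to the toy data shipped with the repository):
preproc: !PreprocRunner
  overwrite: False
  tasks:
  - !PreprocTokenize
    in_files:
    - examples/data/train.ja
    out_files:
    - examples/preproc/train.tok.ja
    specs:
    - filenum: all
      tokenizers:
      - !UnicodeTokenizer {}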
Model¶
This specifies the model architecture. A typical example looks like this:
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn_layer: !UniLSTMSeqTransducer
layers: 1
transform: !NonLinear
output_dim: 512
bridge: !CopyBridge {}
The top level entry is typically DefaultTranslator, which implements a standard attentional sequence-to-sequence model. It allows flexible specification of encoder, attender, source / target embedder, and other settings. Again, to obtain the full list of supported options, please refer to the corresponding class in the API Doc.
Note that some of these Python objects are passed to their parent object’s initializer method, which requires that the children are initialized first. xnmt therefore uses a bottom-up initialization strategy, where siblings are initialized in the order they appear in the constructor. Among other things, this guarantees that preprocessing is carried out before model training.
Training¶
A typical example looks like this:
train: !SimpleTrainingRegimen
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
The expected object here is a subclass of TrainingRegimen. Besides
xnmt.training_regimen.SimpleTrainingRegimen, multi-task style training regimens are supported.
For multi-task training, each training task uses its own model, so in this case models must be
specified as sub-components of the training regimen; a minimal sketch is shown below, and the
Multi-task example configuration can be referred to for more details on this.
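A rough sketch of such a multi-task regimen, with most component values elided (see the full
Multi-task example further below for a complete configuration):
train: !SameBatchMultiTaskTrainingRegimen
  tasks:
  - !SimpleTrainingTask
    src_file: ...
    trg_file: ...
    model: !DefaultTranslator
      ...
  - !SimpleTrainingTask
    src_file: ...
    trg_file: ...
    model: !DefaultTranslator
      ...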
Evaluation¶
If specified, the model is tested after training has finished.
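A typical example, mirroring the evaluation block of the standard example below, looks like this:
evaluate:
- !AccuracyEvalTask
  eval_metrics: bleu
  src_file: examples/data/head.ja
  ref_file: examples/data/head.en
  hyp_file: examples/output/{EXP}.test_hyp
One or several evaluation tasks can be given as a list.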
Config files vs. saved model files¶
Saved model files are written out in the exact same YAML format as the config files (with the
addition of some .data directories that contain DyNet weights). This means that it is possible
to specify a saved model as the configuration file. There is one subtle difference: in a config
file, placeholders such as {EXP_DIR} are resolved based on the current context, which will be
different when directly specifying the saved model file as config file. For this purpose, a
--resume option exists that makes sure to use the context from the saved model file:
xnmt --resume /path/to/saved-model.mod.
This feature is currently implemented only in a very basic form: When resuming a crashed experiment, this will cause the whole experiment to be carried out from the start. When resuming a finished experiment, xnmt will return without performing any action. In the future, this will be extended to support resuming from the most recent saved checkpoint, etc.
Examples¶
Here are more elaborate examples from the GitHub repository.
Standard¶
# A standard setup, specifying model architecture, training parameters,
# and evaluation of the trained model
!Experiment # 'standard' is the name given to the experiment
name: standard # every experiment needs a name
# global parameters shared throughout the experiment
exp_global: !ExpGlobal
# {EXP_DIR} is a placeholder for the directory in which the config file lies.
# {EXP} is a placeholder for the experiment name (here: 'standard')
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
scorer: !Softmax {}
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Minimal¶
# Most entries in the config file have default values and don't need to be
# specified explicitly. This config file produces the same results as
# 01_standard.yaml.
# Default parameters are specified and documented directly in the __init__()
# method of the corresponding classes.
# For example, xnmt.translator.DefaultTranslator.__init__()
# specifies MlpAttender as the default attender, which will be used in this
# example since nothing is specified.
!Experiment
name: minimal
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Multiple experiments¶
# A config file can contain multiple experiments.
# These are run in sequence.
# It's also possible to run experiments in parallel:
# by default, experiments are skipped when the corresponding log file already
# exists, i.e. when the experiment is currently running or has already finished.
# That means it's safe to run ``xnmt my_config.yaml`` on the same config file
# multiple times.
#
# This particular example runs the same experiment, changing only the amount
# of dropout. The model, train, and evaluate settings are shared using YAML anchors,
# see here for more information: http://yaml.readthedocs.io/en/latest/example.html
#
# There are two ways of specifying multiple experiments: the dictionary-way and the
# list-way. The dictionary-way is shown below. Here, dictionary keys are experiment
# names and the values are !Experiment objects. The order is determined by lexicographic
# ordering of the experiment names.
exp1_dropout: !Experiment
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2_no_dropout: !Experiment
exp_global: !ExpGlobal
dropout: 0.0
model: *my_model
train: *my_train
evaluate: *my_eval
# This example demonstrates specifying multiple experiments as a list.
# Here, the list makes the order of experiments explicit.
# Experiment names have to be passed as arguments to !Experiment
- !Experiment
name: exp1_dropout
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
- !Experiment
name: exp2_no_dropout
exp_global: !ExpGlobal
dropout: 0.0
model: *my_model
train: *my_train
evaluate: *my_eval
# Finally, it's possible to specify a single experiment as top-level entry,
# where again the experiment name has to be passed as an argument.
!Experiment
name: exp1_dropout
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Settings¶
# The basic XNMT behavior can be controlled via predefined configurations.
# These are defined under xnmt/settings.py, and include "standard", "debug", and "unittest" settings.
# These specify things like verbosity, default paths, whether experiments should be skipped if the log file already
# exists, and whether to activate the DyNet check_validity and immediate_compute options.
#
# As the name suggests, e.g. when debugging one might use XNMT as follows:
# ``xnmt --settings=debug examples/04_settings.yaml``
#
# It is easy to change behavior by either changing these configurations, or adding a new configuration to the module.
!Experiment
name: settings-exp
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Preprocessing¶
# XNMT supports various ways to preprocess data as demonstrated in this example.
# Note that some preprocessing functionality relies on third-party tools.
!Experiment
name: preproc
exp_global: !ExpGlobal
# define some named strings that can be used throughout the experiment config:
placeholders:
DATA_IN: examples/data/
DATA_OUT: examples/preproc/
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocTokenize
in_files:
- '{DATA_IN}/train.ja'
- '{DATA_IN}/train.en'
- '{DATA_IN}/dev.ja'
- '{DATA_IN}/dev.en'
- '{DATA_IN}/test.ja'
- '{DATA_IN}/test.en'
out_files:
- '{DATA_OUT}/train.tok.ja'
- '{DATA_OUT}/train.tok.en'
- '{DATA_OUT}/dev.tok.ja'
- '{DATA_OUT}/dev.tok.en'
- '{DATA_OUT}/test.tok.ja'
- '{DATA_OUT}/test.tok.en'
specs:
- filenum: all
tokenizers:
- !UnicodeTokenizer {}
- !PreprocNormalize
in_files:
- '{DATA_OUT}/train.tok.ja'
- '{DATA_OUT}/train.tok.en'
- '{DATA_OUT}/dev.tok.ja'
- '{DATA_OUT}/dev.tok.en'
- '{DATA_OUT}/test.tok.ja'
- '{DATA_OUT}/test.tok.en'
- '{DATA_IN}/dev.en'
- '{DATA_IN}/test.en'
out_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
- '{DATA_OUT}/dev.tok.norm.ja'
- '{DATA_OUT}/dev.tok.norm.en'
- '{DATA_OUT}/test.tok.norm.ja'
- '{DATA_OUT}/test.tok.norm.en'
- '{DATA_OUT}/dev.norm.en'
- '{DATA_OUT}/test.norm.en'
specs:
- filenum: all
normalizers:
- !NormalizerLower {}
- !PreprocFilter
in_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
out_files:
- '{DATA_OUT}/train.tok.norm.filter.ja'
- '{DATA_OUT}/train.tok.norm.filter.en'
specs:
- !SentenceFiltererLength
min_all: 1
max_all: 60
- !PreprocVocab
in_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
out_files:
- '{DATA_OUT}/train.vocab.ja'
- '{DATA_OUT}/train.vocab.en'
specs:
- filenum: all
filters:
- !VocabFiltererFreq
min_freq: 2
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab
vocab_file: examples/preproc/train.vocab.ja
trg_reader: !PlainTextReader
vocab: !Vocab
vocab_file: examples/preproc/train.vocab.en
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
bridge: !NoBridge {}
inference: !AutoRegressiveInference
post_process: join-piece
train: !SimpleTrainingRegimen
run_for_epochs: 20
src_file: '{DATA_OUT}/dev.tok.norm.ja'
trg_file: '{DATA_OUT}/dev.tok.norm.en'
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: '{DATA_OUT}/dev.tok.norm.ja'
ref_file: '{DATA_OUT}/dev.norm.en'
hyp_file: examples/output/{EXP}.dev_hyp
- !LossEvalTask
src_file: '{DATA_OUT}/dev.tok.norm.ja'
ref_file: '{DATA_OUT}/dev.tok.norm.en'
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: '{DATA_OUT}/test.tok.norm.ja'
ref_file: '{DATA_OUT}/test.norm.en'
hyp_file: examples/output/{EXP}.test_hyp
Early stopping¶
# Early stopping is achieved by configuring SimpleTrainingRegimen, with the following options:
# - run_for_epochs
# - lr_decay
# - lr_decay_times
# - patience
# - initial_patience
# - dev_tasks (to configure the metric used to determine lr decay or early stopping)
!Experiment
name: minimal-early-stopping
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 100 # maximum number of epochs, but might stop earlier depending on the following settings.
lr_decay: 0.5
lr_decay_times: 3
patience: 1
initial_patience: 2
dev_tasks: # the first metric (here: bleu) is used for checking whether LR should be decayed.
- !AccuracyEvalTask
eval_metrics: bleu,gleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Fine-tuning¶
# Saving and loading models is a key feature demonstrated in this config file.
# This example shows how to load a trained model for fine-tuning.
# First, the pretrained model:
exp1-pretrain-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
default_layer_dim: 64
dropout: 0.3
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-model: !LoadSerialized
# This will load the contents of the above experiment that were saved to the
# YAML file specified after filename:
# This will carry out the exact same thing, except that {EXP} is resolved to
# a different value (making sure we don't overwrite the previous model),
# and except for the things explicitly overwritten in the overwrite: section.
# It's possible to change any settings as long as these don't change the number
# or nature of DyNet parameters allocated for the component.
filename: examples/models/exp1-pretrain-model.mod
path: ''
overwrite: # list of [path, value] pairs. Value can be scalar or an arbitrary object
- path: train.trainer
val: !AdamTrainer
alpha: 0.0002
- path: exp_global.dropout
val: 0.5
- path: train.dev_zero
val: True
- path: status
val: null
Beam search¶
# This example shows how to configure beam search, and how to use the loading mechanism for the purpose of evaluating a
# model.
exp1-train-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
dropout: 0.5
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference
search_strategy: !BeamSearch
beam_size: 5
len_norm: !PolynomialNormalization
apply_during_search: true
m: 0.8
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-eval-model: !LoadSerialized
filename: examples/output/exp1-train-model.mod
overwrite: # list of [path, value] pairs. Value can be scalar or an arbitrary object
- path: train # skip the training loop
val: null
- path: status
val: null
- path: model.inference.search_strategy.beam_size # try some new beam settings
val: 10
- path: evaluate
val: # (re-)define test data and other evaluation settings
- !AccuracyEvalTask
eval_metrics: bleu,gleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Programmatic usage¶
# It is also possible to configure model training using Python code rather than
# YAML config files. This is less convenient and usually not necessary, but there
# may be cases where the added flexibility is needed. This basically works by
# using XNMT as a library of components that are initialized and run from this
# Python script.
#
# This demonstrates a standard model training, including set up of logging, model
# saving, etc.; models are saved into YAML files that can again be loaded using
# the standard YAML way (examples/07_load_finetune.yaml) or the Python way
# (10_programmatic_load.py)
#
# To launch this, use ``python -m examples.09_programmatic``, making sure that XNMT
# setup.py has been run properly.
import os
import random
import numpy as np
from xnmt.modelparts.attenders import MlpAttender
from xnmt.batchers import SrcBatcher, InOrderBatcher
from xnmt.modelparts.bridges import CopyBridge
from xnmt.modelparts.decoders import AutoRegressiveDecoder
from xnmt.modelparts.embedders import SimpleWordEmbedder
from xnmt.eval.tasks import LossEvalTask, AccuracyEvalTask
from xnmt.experiments import Experiment
from xnmt.inferences import AutoRegressiveInference
from xnmt.input_readers import PlainTextReader
from xnmt.transducers.recurrent import BiLSTMSeqTransducer, UniLSTMSeqTransducer
from xnmt.modelparts.transforms import AuxNonLinear
from xnmt.modelparts.scorers import Softmax
from xnmt.optimizers import AdamTrainer
from xnmt.param_collections import ParamManager
from xnmt.persistence import save_to_file
import xnmt.tee
from xnmt.train.regimens import SimpleTrainingRegimen
from xnmt.models.translators.default import DefaultTranslator
from xnmt.vocabs import Vocab
seed=13
random.seed(seed)
np.random.seed(seed)
EXP_DIR = os.path.dirname(__file__)
EXP = "programmatic"
model_file = f"{EXP_DIR}/models/{EXP}.mod"
log_file = f"{EXP_DIR}/logs/{EXP}.log"
xnmt.tee.set_out_file(log_file, EXP)
ParamManager.init_param_col()
ParamManager.param_col.model_file = model_file
src_vocab = Vocab(vocab_file="examples/data/head.ja.vocab")
trg_vocab = Vocab(vocab_file="examples/data/head.en.vocab")
batcher = SrcBatcher(batch_size=64)
inference = AutoRegressiveInference(batcher=InOrderBatcher(batch_size=1))
layer_dim = 512
model = DefaultTranslator(
src_reader=PlainTextReader(vocab=src_vocab),
trg_reader=PlainTextReader(vocab=trg_vocab),
src_embedder=SimpleWordEmbedder(emb_dim=layer_dim, vocab_size=len(src_vocab)),
encoder=BiLSTMSeqTransducer(input_dim=layer_dim, hidden_dim=layer_dim, layers=1),
attender=MlpAttender(hidden_dim=layer_dim, state_dim=layer_dim, input_dim=layer_dim),
decoder=AutoRegressiveDecoder(input_dim=layer_dim,
embedder=SimpleWordEmbedder(emb_dim=layer_dim, vocab_size=len(trg_vocab)),
rnn=UniLSTMSeqTransducer(input_dim=layer_dim, hidden_dim=layer_dim,
decoder_input_dim=layer_dim, yaml_path="decoder"),
transform=AuxNonLinear(input_dim=layer_dim, output_dim=layer_dim,
aux_input_dim=layer_dim),
scorer=Softmax(vocab_size=len(trg_vocab), input_dim=layer_dim),
bridge=CopyBridge(dec_dim=layer_dim, dec_layers=1)),
inference=inference
)
train = SimpleTrainingRegimen(
name=f"{EXP}",
model=model,
batcher=batcher,
trainer=AdamTrainer(alpha=0.001),
run_for_epochs=2,
src_file="examples/data/head.ja",
trg_file="examples/data/head.en",
dev_tasks=[LossEvalTask(src_file="examples/data/head.ja",
ref_file="examples/data/head.en",
model=model,
batcher=batcher)],
)
evaluate = [AccuracyEvalTask(eval_metrics="bleu,wer",
src_file="examples/data/head.ja",
ref_file="examples/data/head.en",
hyp_file=f"examples/output/{EXP}.test_hyp",
inference=inference,
model=model)]
standard_experiment = Experiment(
name="programmatic",
model=model,
train=train,
evaluate=evaluate
)
# run experiment
standard_experiment(save_fct=lambda: save_to_file(model_file, standard_experiment))
exit()
Programmatic loading¶
# This demonstrates how to load the model trained using ``09_programmatic.py``
# in a programmatic way, for the purpose of evaluating the model.
import os
import xnmt.tee
from xnmt.param_collections import ParamManager
from xnmt.persistence import initialize_if_needed, YamlPreloader, LoadSerialized, save_to_file
EXP_DIR = os.path.dirname(__file__)
EXP = "programmatic-load"
model_file = f"{EXP_DIR}/models/{EXP}.mod"
log_file = f"{EXP_DIR}/logs/{EXP}.log"
xnmt.tee.set_out_file(log_file, EXP)
ParamManager.init_param_col()
load_experiment = LoadSerialized(
filename=f"{EXP_DIR}/models/programmatic.mod",
overwrite=[
{"path" : "train", "val" : None},
{"path": "status", "val": None},
]
)
uninitialized_experiment = YamlPreloader.preload_obj(load_experiment, exp_dir=EXP_DIR, exp_name=EXP)
loaded_experiment = initialize_if_needed(uninitialized_experiment)
# if we were to continue training, we would need to set a save model file like this:
# ParamManager.param_col.model_file = model_file
ParamManager.populate()
# run experiment
loaded_experiment(save_fct=lambda: None)
Parameter sharing¶
# This illustrates component and parameter sharing. This is useful for making
# config files less verbose, and more importantly makes it possible to realize
# weight-sharing between components, which will also be demonstrated in the
# multi-task example later.
#
# There are 2 ways to achieve sharing:
# - YAML's anchor system where '&' denotes a named anchor, '*' denotes a reference to an anchor.
# This essentially copies values or subcomponents from one place to another.
# It can be combined with the << operator that allows copying parts of a dictionary, but overwriting other parts; a short sketch of this follows after this example.
# More info is found here: http://yaml.readthedocs.io/en/latest/example.html
# - XNMT's !Ref object creates a reference, meaning both places will point to the exact same Python object,
# and that DyNet parameters will be shared.
# References can be made by path or by name, as illustrated below. The name refers to a _xnmt_id that can
# be set in any component and must be unique.
# Note that references do not work across experiments (e.g. we cannot refer to exp2.load from within exp1.pretrain)
exp1.pretrain: !Experiment
exp_global: !ExpGlobal
default_layer_dim: 32
model_file: 'examples/output/{EXP}.mod'
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 32
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender {}
# reference-sharing between softmax projection and target embedder. This means both layers share DyNet parameters!
decoder: !AutoRegressiveDecoder
embedder: !DenseWordEmbedder
_xnmt_id: trg_emb # this id must be unique and is needed to create a reference-by-name below.
emb_dim: 32
rnn: !UniLSTMSeqTransducer
layers: 1
scorer: !Softmax
output_projector: !Ref { name: trg_emb }
# alternatively, the same could be achieved like this,
# in which case model.decoder.embedder._xnmt_id is not required:
# !Ref { path: model.decoder.embedder }
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: &dev_src examples/data/head.ja # value-sharing between train.training_corpus.dev_src and inference.src_file
ref_file: &dev_trg examples/data/head.en # value-sharing between train.training_corpus.dev_trg and evaluate.ref_file
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: *dev_src # Copy over the file path from the dev tasks using YAML anchors.
ref_file: *dev_trg # The same could also be done for more complex objects.
hyp_file: examples/output/{EXP}.test_hyp
exp2.load: !LoadSerialized
filename: examples/output/exp1.pretrain.mod
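As mentioned in the comments above, YAML's << merge key can copy the contents of an anchored
mapping while overwriting individual values. The following is only a rough sketch of the syntax;
the component values are illustrative and this snippet is not part of the repository example
above, and whether a given component accepts merged keys depends on how it is loaded:
exp1_base: !Experiment
  exp_global: !ExpGlobal
    <<: &common_globals
      default_layer_dim: 512
      dropout: 0.3
  model: ...
  train: ...
exp2_variant: !Experiment
  exp_global: !ExpGlobal
    <<: *common_globals
    dropout: 0.1
  model: ...
  train: ...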
Multi-task¶
# XNMT offers a very flexible way of multi-task training by specifying multiple
# models and using the !Ref mechanism for weight sharing, as demonstrated
# in this config file.
# The possible multi-task training strategies can be looked up in
# xnmt/regimens.py and include same-batch, alternating-batch, and serial
# strategies.
exp1-multi_task: !Experiment
exp_global: !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
train: !SameBatchMultiTaskTrainingRegimen
trainer: !AdamTrainer {}
n_task_steps: [2,1]
tasks:
- !SimpleTrainingTask # first task is the main task: it will control early stopping, learning rate schedule, model checkpoints, etc.
name: first_task
run_for_epochs: 6
batcher: !SrcBatcher
batch_size: 6
src_file: examples/data/head.ja
trg_file: examples/data/head.en
model: !DefaultTranslator
_xnmt_id: first_task_model
src_reader: !PlainTextReader
vocab: !Vocab
_xnmt_id: src_vocab
vocab_file: examples/data/head.ja.vocab
trg_reader: !PlainTextReader
vocab: !Vocab
_xnmt_id: trg_vocab
vocab_file: examples/data/head.en.vocab
src_embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: src_vocab}
encoder: !BiLSTMSeqTransducer # the encoder shares parameters between tasks
_xnmt_id: first_task_encoder
layers: 1
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: trg_vocab}
rnn: !UniLSTMSeqTransducer
layers: 1
hidden_dim: 64
bridge: !CopyBridge {}
scorer: !Softmax
vocab: !Ref {name: trg_vocab}
dev_tasks:
- !AccuracyEvalTask
model: !Ref { name: first_task_model }
src_file: &first_task_dev_src examples/data/head.ja # value-sharing between first task dev and final eval
ref_file: &first_task_dev_trg examples/data/head.en # value-sharing between first task dev and final eval
hyp_file: examples/output/{EXP}.first_dev_hyp
eval_metrics: bleu # tasks can specify different dev_metrics
- !SimpleTrainingTask
name: second_task
batcher: !SrcBatcher
batch_size: 6
src_file: examples/data/head.ja
trg_file: examples/data/head.en
model: !DefaultTranslator
_xnmt_id: second_task_model
src_reader: !PlainTextReader
vocab: !Ref {name: src_vocab}
trg_reader: !PlainTextReader
vocab: !Ref {name: trg_vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: src_vocab}
encoder: !Ref { name: first_task_encoder }
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: trg_vocab}
bridge: !CopyBridge {}
scorer: !Softmax
vocab: !Ref {name: trg_vocab}
dev_tasks:
- !AccuracyEvalTask
model: !Ref { name: second_task_model }
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.second_dev_hyp
eval_metrics: gleu # tasks can specify different dev_metrics
evaluate:
- !AccuracyEvalTask
model: !Ref { name: first_task_model }
eval_metrics: bleu
src_file: *first_task_dev_src
ref_file: *first_task_dev_trg
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-model: !LoadSerialized
filename: examples/output/exp1-multi_task.mod
Speech¶
# This config file demonstrates how to specify a speech recognition model
# using the Listen-Attend-Spell architecture: https://arxiv.org/pdf/1508.01211.pdf
# Compared to the conventional attentional model, we remove the input embeddings and
# instead directly read in feature vectors. The pyramidal LSTM reduces the length of
# the input sequence by a factor of 2 per layer (except for the first layer).
# Output units should be characters according to the paper.
!Experiment
name: speech
exp_global: !ExpGlobal
save_num_checkpoints: 2
default_layer_dim: 32
dropout: 0.4
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocExtract
in_files:
- examples/data/LDC94S13A.yaml
out_files:
- examples/data/LDC94S13A.h5
specs: !MelFiltExtractor {}
model: !DefaultTranslator
src_embedder: !NoopEmbedder
emb_dim: 40
encoder: !PyramidalLSTMSeqTransducer
layers: 3
downsampling_method: concat
reduce_factor: 2
input_dim: 40
hidden_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
bridge: !CopyBridge {}
src_reader: !H5Reader
transpose: True
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/char.vocab}
output_proc: join-char
train: !SimpleTrainingRegimen
run_for_epochs: 1
batcher: !SrcBatcher
pad_src_to_multiple: 4
batch_size: 3
trainer: !AdamTrainer {}
src_file: examples/data/LDC94S13A.h5
trg_file: examples/data/LDC94S13A.char
dev_tasks:
- !LossEvalTask
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.char
- !AccuracyEvalTask
eval_metrics: cer,wer
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.char
hyp_file: examples/output/{EXP}.dev_hyp
inference: !AutoRegressiveInference
batcher: !InOrderBatcher
_xnmt_id: inference_batcher
pad_src_to_multiple: 4
batch_size: 1
evaluate:
- !AccuracyEvalTask
eval_metrics: cer,wer
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.words
hyp_file: examples/output/{EXP}.test_hyp
inference: !AutoRegressiveInference
batcher: !Ref { name: inference_batcher }
Reporting attention matrices¶
# XNMT supports writing out reports, such as attention matrices generated during inference or difference highlighting
# between outputs and references.
# These are generally created by setting exp_global.compute_report to True, and adding one or several reporters
# to the inference class.
!Experiment
name: report
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
inference: !AutoRegressiveInference
reporter:
- !AttentionReporter {} # plot attentions
- !ReferenceDiffReporter {} # difference highlighting
- !CompareMtReporter {} # analyze MT outputs
- !OOVStatisticsReporter # report on recovered OOVs, fantasized new words, etc.
train_trg_file: examples/data/head.en
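Note that, as mentioned in the comments at the top of this example, report generation is
generally switched on via exp_global.compute_report; if an experiment does not already set it,
the corresponding fragment would look roughly like this (a sketch based on the comment above):
exp_global: !ExpGlobal
  compute_report: True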
Scoring N-best lists¶
# Using a trained model to add hypothesis scores to an n-best list.
# First, exp1-model trains a model which is saved at examples/output/exp1-model.mod.
# Then, exp2-score loads exp1-model and uses it to score an n-best list.
# The n-best list example used here is located at examples/data/head.nbest.en.
# exp2-score outputs a new n-best list with hypothesis scores.
# The output file will be in examples/output/exp2-score.test_hyp.
exp1-model: !Experiment
exp_global: !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
dropout: 0.5
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-score: !LoadSerialized
filename: examples/output/exp1-model.mod
overwrite:
- path: train
val: ~
- path: model.inference
val: !AutoRegressiveInference
mode: score
ref_file: examples/data/head.nbest.en
src_file: examples/data/head.ja
- path: evaluate.0
val: !AccuracyEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.nbest.en
hyp_file: examples/output/{EXP}.test_hyp
Ensembling¶
# This example shows different ways to perform model ensembling
# First, let's define a simple experiment with a single model
exp1-single: !Experiment
exp_global: &globals !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 32
# Just use default model settings here
model: &model1 !DefaultTranslator
src_reader: &src_reader !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: &trg_reader !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# Another single model, but with a different number of layers and some other
# different settings
exp2-single: !Experiment
exp_global: *globals
model: &model2 !DefaultTranslator
src_reader: *src_reader
trg_reader: *trg_reader
encoder: !BiLSTMSeqTransducer
layers: 3
hidden_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !DenseWordEmbedder
_xnmt_id: dense_embed
emb_dim: 64
rnn: !UniLSTMSeqTransducer
hidden_dim: 64
transform: !AuxNonLinear
output_dim: 64
scorer: !Softmax
output_projector: !Ref {name: dense_embed}
train: *train
# Load the previously trained models and combine them into an ensemble
exp3-ensemble-load: !Experiment
exp_global: *globals
model: !EnsembleTranslator
src_reader: !Ref {path: model.models.0.src_reader}
trg_reader: !Ref {path: model.models.0.trg_reader}
models:
- !LoadSerialized
filename: 'examples/output/exp1-single.mod'
path: model
- !LoadSerialized
filename: 'examples/output/exp2-single.mod'
path: model
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
# Alternatively, we can also hook up the models already at training time
exp4-ensemble-train: !Experiment
exp_global: *globals
model: !EnsembleTranslator
src_reader: *src_reader
trg_reader: *trg_reader
models:
- *model1
- *model2
train: *train
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Minimum risk training¶
# Saving and loading models is a key feature demonstrated in this config file.
# This example shows how to load a trained model and fine-tune it with minimum risk training.
# First, the pretrained model:
exp1-pretrain-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
default_layer_dim: 64
dropout: 0.3
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-minrisk: !LoadSerialized
# This will perform minimum risk training with SamplingSearch.
# Same as above, the pretrained model will be loaded and an appropriate search_strategy
# will be used during minimum risk training.
filename: examples/models/exp1-pretrain-model.mod
path: ''
overwrite:
- path: train.loss_calculator
val: !MinRiskLoss
alpha: 0.005
- path: model.inference.search_strategy
val: !SamplingSearch
sample_size: 10
max_len: 50
- path: train.run_for_epochs
val: 1
Biased Lexicon¶
(this is currently broken)
lexbias: !Experiment # 'lexbias' is the name given to the experiment
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
bridge: !CopyBridge {}
scorer: !LexiconSoftmax
lexicon_file: examples/data/head-ja_given_en.lex
# can choose between bias/linear
lexicon_type: bias
# The small epsilon value to be added to the bias
lexicon_alpha: 0.001
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.en
trg_file: examples/data/head.ja
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.en
ref_file: examples/data/head.ja
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.en
ref_file: examples/data/head.ja
hyp_file: examples/output/{EXP}.test_hyp
Subword Sampling¶
# Sampling subword units for subword regularization
# Note that this requires 'sentencepiece' as an extra dependency
!Experiment
name: subword_sample
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
model: !DefaultTranslator
# Here we set the sample_train and alpha parameters to turn on sampling
src_reader: !SentencePieceTextReader
sample_train: True
alpha: 0.1
vocab: !Vocab
vocab_file: examples/data/big-ja.vocab
sentencepiece_vocab: True
model_file: examples/data/big-ja.model
trg_reader: !SentencePieceTextReader
sample_train: True
alpha: 0.1
vocab: !Vocab
vocab_file: examples/data/big-en.vocab
sentencepiece_vocab: True
model_file: examples/data/big-en.model
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
inference: !AutoRegressiveInference
post_process: join-piece
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 20
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Self Attention¶
# A setup using self-attention
!Experiment
name: self_attention
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
placeholders:
DATA_IN: examples/data
DATA_OUT: examples/preproc
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocVocab
in_files:
- '{DATA_IN}/train.ja'
- '{DATA_IN}/train.en'
out_files:
- '{DATA_OUT}/train.ja.vocab'
- '{DATA_OUT}/train.en.vocab'
specs:
- filenum: all
filters:
- !VocabFiltererFreq
min_freq: 2
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: '{DATA_OUT}/train.ja.vocab'}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: '{DATA_OUT}/train.en.vocab'}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !ModularSeqTransducer
modules:
- !PositionalSeqTransducer
input_dim: 512
max_pos: 100
dropout: 0.1
- !ModularSeqTransducer
modules: !Repeat
times: 2
content: !ModularSeqTransducer
modules:
- !ResidualSeqTransducer
input_dim: 512
child: !MultiHeadAttentionSeqTransducer
num_heads: 8
dropout: 0.1
layer_norm: True
dropout: 0.1
- !ResidualSeqTransducer
input_dim: 512
child: !TransformSeqTransducer
transform: !MLP
activation: relu
layer_norm: True
dropout: 0.1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !NoamTrainer
alpha: 1.0
warmup_steps: 4000
run_for_epochs: 2
src_file: examples/data/train.ja
trg_file: examples/data/train.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Char Segment¶
# Examples of using SegmentingSeqTransducer
# Available composition functions can be found in xnmt/specialized_encoders/segmenting_encoder/segmenting_composer.py
# Looking up characters from a word vocabulary.
# Basically this is the same as 01_standard.yaml.
seg_lookup: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
# Can be produced by script/vocab/make_vocab.py --char_vocab < [CORPUS]
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
# It reads in characters and produces word embeddings
encoder: !SegmentingSeqTransducer
segment_composer: !LookupComposer
word_vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Summing together character embeddings as the composition function.
seg_sum: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SumComposer {}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using BiLSTM to predict word embeddings.
seg_bilstm: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SeqTransducerComposer
seq_transducer: !BiLSTMSeqTransducer {}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using CHARAGRAM composition function
seg_charagram: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !CharNGramComposer
ngram_size: 4
word_vocab: !Vocab {vocab_file: examples/data/head.ngramcount.ja}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using Composition of CHARAGRAM and Lookup
seg_lookup_charagram: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SumMultipleComposer
composers:
- !LookupComposer
word_vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
- !CharNGramComposer
ngram_size: 4
word_vocab: !Vocab {vocab_file: examples/data/head.ngramcount.ja}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
Switchout¶
# Implements SwitchOut, a data augmentation strategy for NMT
# RAML corrupts target side only, while SwitchOut corrupts both source and target
# https://arxiv.org/pdf/1808.07512.pdf
switchout: !Experiment
# global parameters shared throughout the experiment
exp_global: !ExpGlobal
# {EXP_DIR} is a placeholder for the directory in which the config file lies.
# {EXP} is a placeholder for the experiment name (here: 'switchout')
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !RamlTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
tau: 0.8
trg_reader: !RamlTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
tau: 0.8
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
scorer: !Softmax {}
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp