Experiment configuration file format¶
Intro¶
Configuration files are in YAML format.
At the top level, a config file consists of a dictionary where keys are experiment
names and values are the experiment specifications. By default, all experiments
are run in lexicographical order, but xnmt_run_experiments can also be told
to run only a selection of the specified experiments. An example template with
2 experiments looks like this:
exp1: !Experiment
exp_global: ...
preproc: ...
model: ...
train: ...
evaluate: ...
exp2: !Experiment
exp_global: ...
preproc: ...
model: ...
train: ...
evaluate: ...
!Experiment is YAML syntax specifying a Python object of the same name, and its
parameters will be passed on to the Python constructor.
There can be a special top-level entry named defaults; this experiment will
never be run, but can be used as a template where components are partially shared
using YAML anchors or the !Ref mechanism (more on this later).
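For example, a defaults entry might look like the following sketch, where a training
regimen is defined once under a YAML anchor and reused by the actual experiments (the
settings shown here are only illustrative):
defaults: !Experiment
  exp_global: ...
  train: &shared_train !SimpleTrainingRegimen
    run_for_epochs: 2
    src_file: examples/data/head.ja
    trg_file: examples/data/head.en
exp1: !Experiment
  exp_global: ...
  model: ...
  train: *shared_train
  evaluate: ...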
The usage of exp_global, preproc, model, train, and evaluate is explained below.
Not all of them need to be specified, depending on the use case.
Experiment¶
The exp_global entry specifies settings that are global to this experiment. An example:
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
Note that for any strings used here or anywhere in the config file, {EXP} will be
overwritten by the name of the experiment, {EXP_DIR} will be overwritten by the
directory the config file lies in, {PID} by the process id, and {GIT_REV} by the
current git revision.
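For instance, one might keep the logs of separate runs apart by including these
placeholders in the log file name (an illustrative sketch):
exp_global: !ExpGlobal
  log_file: '{EXP_DIR}/logs/{EXP}.{GIT_REV}.{PID}.log'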
To obtain a full list of allowed parameters, please check the documentation for ExpGlobal.
Preprocessing¶
xnmt supports a variety of data preprocessing features. Please refer to Preprocessing for details.
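A minimal sketch, mirroring the fuller Preprocessing example further below (the file
paths refer to the toy data shipped with the repository):
preproc: !PreprocRunner
  overwrite: False
  tasks:
  - !PreprocTokenize
    in_files:
    - examples/data/train.ja
    out_files:
    - examples/preproc/train.tok.ja
    specs:
    - filenum: all
      tokenizers:
      - !UnicodeTokenizer {}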
Model¶
This specifies the model architecture. A typical example looks like this:
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn_layer: !UniLSTMSeqTransducer
layers: 1
transform: !NonLinear
output_dim: 512
bridge: !CopyBridge {}
The top level entry is typically DefaultTranslator, which implements a standard attentional sequence-to-sequence model. It allows flexible specification of encoder, attender, source / target embedder, and other settings. Again, to obtain the full list of supported options, please refer to the corresponding class in the API Doc.
Note that some of these Python objects are passed to their parent object’s initializer method, which requires that the children are initialized first. xnmt therefore uses a bottom-up initialization strategy, where siblings are initialized in the order they appear in the constructor. Among other things, this guarantees that preprocessing is carried out before model training.
Training¶
A typical example looks like this:
train: !SimpleTrainingRegimen
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
The expected object here is a subclass of TrainingRegimen. Besides
xnmt.training_regimen.SimpleTrainingRegimen, multi-task style training regimens are supported.
For multi-task training, each training task uses its own model, so in this case models must be
specified as sub-components of the training regimen; a minimal sketch is shown below, and the
Multi-task example configuration can be referred to for more details on this.
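A rough sketch of such a multi-task regimen, with most component values elided (see the full
Multi-task example further below for a complete configuration):
train: !SameBatchMultiTaskTrainingRegimen
  tasks:
  - !SimpleTrainingTask
    src_file: ...
    trg_file: ...
    model: !DefaultTranslator
      ...
  - !SimpleTrainingTask
    src_file: ...
    trg_file: ...
    model: !DefaultTranslator
      ...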
Evaluation¶
If specified, the model is tested after training has finished.
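A typical example, mirroring the evaluation block of the standard example below, looks like this:
evaluate:
- !AccuracyEvalTask
  eval_metrics: bleu
  src_file: examples/data/head.ja
  ref_file: examples/data/head.en
  hyp_file: examples/output/{EXP}.test_hyp
One or several evaluation tasks can be given as a list.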
Config files vs. saved model files¶
Saved model files are written out in the exact same YAML format as the config files (with the
addition of some .data directories that contain DyNet weights). This means that it is possible
to specify a saved model as the configuration file. There is one subtle difference: in a config
file, placeholders such as {EXP_DIR} are resolved based on the current context, which will be
different when directly specifying the saved model file as config file. For this purpose, a
--resume option exists that makes sure to use the context from the saved model file:
xnmt --resume /path/to/saved-model.mod.
This feature is currently implemented only in a very basic form: When resuming a crashed experiment, this will cause the whole experiment to be carried out from the start. When resuming a finished experiment, xnmt will return without performing any action. In the future, this will be extended to support resuming from the most recent saved checkpoint, etc.
Examples¶
Here are more elaborate examples from the GitHub repository.
Standard¶
# A standard setup, specifying model architecture, training parameters,
# and evaluation of the trained model
!Experiment # 'standard' is the name given to the experiment
name: standard # every experiment needs a name
# global parameters shared throughout the experiment
exp_global: !ExpGlobal
# {EXP_DIR} is a placeholder for the directory in which the config file lies.
# {EXP} is a placeholder for the experiment name (here: 'standard')
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
scorer: !Softmax {}
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Minimal¶
# Most entries in the config file have default values and don't need to be
# specified explicitly. This config file produces the same results as
# 01_standard.yaml.
# Default parameters are specified and documented directly in the __init__()
# method of the corresponding classes.
# For example, xnmt.translator.DefaultTranslator.__init__()
# specifies MlpAttender as the default attender, which will be used in this
# example since nothing is specified.
!Experiment
name: minimal
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Multiple experiments¶
# A config file can contain multiple experiments.
# These are run in sequence.
# It's also possible to run experiments in parallel:
# by default, experiments are skipped when the corresponding log file already
# exists, i.e. when the experiment is currently running or has already finished.
# That means it's safe to run ``xnmt my_config.yaml`` on the same config file
# multiple times.
#
# This particular example runs the same experiment, changing only the amount
# of dropout. The model, train, and evaluate settings are shared using YAML anchors,
# see here for more information: http://yaml.readthedocs.io/en/latest/example.html
#
# There are two ways of specifying multiple experiments: the dictionary-way and the
# list-way. The dictionary-way is shown below. Here, dictionary keys are experiment
# names and the values are !Experiment objects. The order is determined by lexicographic
# ordering of the experiment names.
exp1_dropout: !Experiment
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2_no_dropout: !Experiment
exp_global: !ExpGlobal
dropout: 0.0
model: *my_model
train: *my_train
evaluate: *my_eval
# This example demonstrates specifying multiple experiments as a list.
# Here, the list makes the order of experiments explicit.
# Experiment names have to be passed as arguments to !Experiment
- !Experiment
name: exp1_dropout
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
- !Experiment
name: exp2_no_dropout
exp_global: !ExpGlobal
dropout: 0.0
model: *my_model
train: *my_train
evaluate: *my_eval
# Finally, it's possible to specify a single experiment as top-level entry,
# where again the experiment name has to be passed as an argument.
!Experiment
name: exp1_dropout
exp_global: !ExpGlobal
dropout: 0.5
model: &my_model !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &my_train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate: &my_eval
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Settings¶
# The basic XNMT behavior can be controlled via predefined configurations.
# These are defined under xnmt/settings.py, and include "standard", "debug", and "unittest" settings.
# These specify things like verbosity, default paths, whether experiments should be skipped if the log file already
# exists, and whether to activate the DyNet check_validity and immediate_compute options.
#
# As the name suggests, e.g. when debugging one might use XNMT as follows:
# ``xnmt --settings=debug examples/04_settings.yaml``
#
# It is easy to change behavior by either changing these configurations, or adding a new configuration to the module.
!Experiment
name: settings-exp
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Preprocessing¶
# XNMT supports various ways to preprocess data as demonstrated in this example.
# Note that some preprocessing functionality relies on third-party tools.
!Experiment
name: preproc
exp_global: !ExpGlobal
# define some named strings that can be used throughout the experiment config:
placeholders:
DATA_IN: examples/data/
DATA_OUT: examples/preproc/
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocTokenize
in_files:
- '{DATA_IN}/train.ja'
- '{DATA_IN}/train.en'
- '{DATA_IN}/dev.ja'
- '{DATA_IN}/dev.en'
- '{DATA_IN}/test.ja'
- '{DATA_IN}/test.en'
out_files:
- '{DATA_OUT}/train.tok.ja'
- '{DATA_OUT}/train.tok.en'
- '{DATA_OUT}/dev.tok.ja'
- '{DATA_OUT}/dev.tok.en'
- '{DATA_OUT}/test.tok.ja'
- '{DATA_OUT}/test.tok.en'
specs:
- filenum: all
tokenizers:
- !UnicodeTokenizer {}
- !PreprocNormalize
in_files:
- '{DATA_OUT}/train.tok.ja'
- '{DATA_OUT}/train.tok.en'
- '{DATA_OUT}/dev.tok.ja'
- '{DATA_OUT}/dev.tok.en'
- '{DATA_OUT}/test.tok.ja'
- '{DATA_OUT}/test.tok.en'
- '{DATA_IN}/dev.en'
- '{DATA_IN}/test.en'
out_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
- '{DATA_OUT}/dev.tok.norm.ja'
- '{DATA_OUT}/dev.tok.norm.en'
- '{DATA_OUT}/test.tok.norm.ja'
- '{DATA_OUT}/test.tok.norm.en'
- '{DATA_OUT}/dev.norm.en'
- '{DATA_OUT}/test.norm.en'
specs:
- filenum: all
normalizers:
- !NormalizerLower {}
- !PreprocFilter
in_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
out_files:
- '{DATA_OUT}/train.tok.norm.filter.ja'
- '{DATA_OUT}/train.tok.norm.filter.en'
specs:
- !SentenceFiltererLength
min_all: 1
max_all: 60
- !PreprocVocab
in_files:
- '{DATA_OUT}/train.tok.norm.ja'
- '{DATA_OUT}/train.tok.norm.en'
out_files:
- '{DATA_OUT}/train.vocab.ja'
- '{DATA_OUT}/train.vocab.en'
specs:
- filenum: all
filters:
- !VocabFiltererFreq
min_freq: 2
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab
vocab_file: examples/preproc/train.vocab.ja
trg_reader: !PlainTextReader
vocab: !Vocab
vocab_file: examples/preproc/train.vocab.en
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
bridge: !NoBridge {}
inference: !AutoRegressiveInference
post_process: join-piece
train: !SimpleTrainingRegimen
run_for_epochs: 20
src_file: '{DATA_OUT}/dev.tok.norm.ja'
trg_file: '{DATA_OUT}/dev.tok.norm.en'
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: '{DATA_OUT}/dev.tok.norm.ja'
ref_file: '{DATA_OUT}/dev.norm.en'
hyp_file: examples/output/{EXP}.dev_hyp
- !LossEvalTask
src_file: '{DATA_OUT}/dev.tok.norm.ja'
ref_file: '{DATA_OUT}/dev.tok.norm.en'
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: '{DATA_OUT}/test.tok.norm.ja'
ref_file: '{DATA_OUT}/test.norm.en'
hyp_file: examples/output/{EXP}.test_hyp
Early stopping¶
# Early stopping is achieved by configuring SimpleTrainingRegimen, with the following options:
# - run_for_epochs
# - lr_decay
# - lr_decay_times
# - patience
# - initial_patience
# - dev_tasks (to configure the metric used to determine lr decay or early stopping)
!Experiment
name: minimal-early-stopping
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 100 # maximum number of epochs, but might stop earlier depending on the following settings.
lr_decay: 0.5
lr_decay_times: 3
patience: 1
initial_patience: 2
dev_tasks: # the first metric (here: bleu) is used for checking whether LR should be decayed.
- !AccuracyEvalTask
eval_metrics: bleu,gleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Fine-tuning¶
# Saving and loading models is a key feature demonstrated in this config file.
# This example shows how to load a trained model for fine-tuning.
# First, the pretrained model:
exp1-pretrain-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
default_layer_dim: 64
dropout: 0.3
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-model: !LoadSerialized
# This will load the contents of the above experiment that were saved to the
# YAML file specified after filename:
# This will carry out the exact same thing, except that {EXP} is resolved to
# a different value (making sure we don't overwrite the previous model),
# and except for the things explicitly overwritten in the overwrite: section.
# It's possible to change any settings as long as these don't change the number
# or nature of DyNet parameters allocated for the component.
filename: examples/models/exp1-pretrain-model.mod
path: ''
overwrite: # list of [path, value] pairs. Value can be scalar or an arbitrary object
- path: train.trainer
val: !AdamTrainer
alpha: 0.0002
- path: exp_global.dropout
val: 0.5
- path: train.dev_zero
val: True
- path: status
val: null
Beam search¶
# This example shows how to configure beam search, and how to use the loading mechanism for the purpose of evaluating a
# model.
exp1-train-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
dropout: 0.5
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference
search_strategy: !BeamSearch
beam_size: 5
len_norm: !PolynomialNormalization
apply_during_search: true
m: 0.8
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-eval-model: !LoadSerialized
filename: examples/output/exp1-train-model.mod
overwrite: # list of [path, value] pairs. Value can be scalar or an arbitrary object
- path: train # skip the training loop
val: null
- path: status
val: null
- path: model.inference.search_strategy.beam_size # try some new beam settings
val: 10
- path: evaluate
val: # (re-)define test data and other evaluation settings
- !AccuracyEvalTask
eval_metrics: bleu,gleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Programmatic usage¶
# It is also possible to configure model training using Python code rather than
# YAML config files. This is less convenient and usually not necessary, but there
# may be cases where the added flexibility is needed. This basically works by
# using XNMT as a library of components that are initialized and run from this
# Python script.
#
# This demonstrates a standard model training, including set up of logging, model
# saving, etc.; models are saved into YAML files that can again be loaded using
# the standard YAML way (examples/07_load_finetune.yaml) or the Python way
# (10_programmatic_load.py)
#
# To launch this, use ``python -m examples.09_programmatic``, making sure that XNMT
# setup.py has been run properly.
import os
import random
import numpy as np
from xnmt.modelparts.attenders import MlpAttender
from xnmt.batchers import SrcBatcher, InOrderBatcher
from xnmt.modelparts.bridges import CopyBridge
from xnmt.modelparts.decoders import AutoRegressiveDecoder
from xnmt.modelparts.embedders import SimpleWordEmbedder
from xnmt.eval.tasks import LossEvalTask, AccuracyEvalTask
from xnmt.experiments import Experiment
from xnmt.inferences import AutoRegressiveInference
from xnmt.input_readers import PlainTextReader
from xnmt.transducers.recurrent import BiLSTMSeqTransducer, UniLSTMSeqTransducer
from xnmt.modelparts.transforms import AuxNonLinear
from xnmt.modelparts.scorers import Softmax
from xnmt.optimizers import AdamTrainer
from xnmt.param_collections import ParamManager
from xnmt.persistence import save_to_file
import xnmt.tee
from xnmt.train.regimens import SimpleTrainingRegimen
from xnmt.models.translators.default import DefaultTranslator
from xnmt.vocabs import Vocab
seed=13
random.seed(seed)
np.random.seed(seed)
EXP_DIR = os.path.dirname(__file__)
EXP = "programmatic"
model_file = f"{EXP_DIR}/models/{EXP}.mod"
log_file = f"{EXP_DIR}/logs/{EXP}.log"
xnmt.tee.set_out_file(log_file, EXP)
ParamManager.init_param_col()
ParamManager.param_col.model_file = model_file
src_vocab = Vocab(vocab_file="examples/data/head.ja.vocab")
trg_vocab = Vocab(vocab_file="examples/data/head.en.vocab")
batcher = SrcBatcher(batch_size=64)
inference = AutoRegressiveInference(batcher=InOrderBatcher(batch_size=1))
layer_dim = 512
model = DefaultTranslator(
src_reader=PlainTextReader(vocab=src_vocab),
trg_reader=PlainTextReader(vocab=trg_vocab),
src_embedder=SimpleWordEmbedder(emb_dim=layer_dim, vocab_size=len(src_vocab)),
encoder=BiLSTMSeqTransducer(input_dim=layer_dim, hidden_dim=layer_dim, layers=1),
attender=MlpAttender(hidden_dim=layer_dim, state_dim=layer_dim, input_dim=layer_dim),
decoder=AutoRegressiveDecoder(input_dim=layer_dim,
embedder=SimpleWordEmbedder(emb_dim=layer_dim, vocab_size=len(trg_vocab)),
rnn=UniLSTMSeqTransducer(input_dim=layer_dim, hidden_dim=layer_dim,
decoder_input_dim=layer_dim, yaml_path="decoder"),
transform=AuxNonLinear(input_dim=layer_dim, output_dim=layer_dim,
aux_input_dim=layer_dim),
scorer=Softmax(vocab_size=len(trg_vocab), input_dim=layer_dim),
bridge=CopyBridge(dec_dim=layer_dim, dec_layers=1)),
inference=inference
)
train = SimpleTrainingRegimen(
name=f"{EXP}",
model=model,
batcher=batcher,
trainer=AdamTrainer(alpha=0.001),
run_for_epochs=2,
src_file="examples/data/head.ja",
trg_file="examples/data/head.en",
dev_tasks=[LossEvalTask(src_file="examples/data/head.ja",
ref_file="examples/data/head.en",
model=model,
batcher=batcher)],
)
evaluate = [AccuracyEvalTask(eval_metrics="bleu,wer",
src_file="examples/data/head.ja",
ref_file="examples/data/head.en",
hyp_file=f"examples/output/{EXP}.test_hyp",
inference=inference,
model=model)]
standard_experiment = Experiment(
name="programmatic",
model=model,
train=train,
evaluate=evaluate
)
# run experiment
standard_experiment(save_fct=lambda: save_to_file(model_file, standard_experiment))
exit()
Programmatic loading¶
# This demonstrates how to load the model trained using ``09_programmatic.py``
# in a programmatic way, for the purpose of evaluating the model.
import os
import xnmt.tee
from xnmt.param_collections import ParamManager
from xnmt.persistence import initialize_if_needed, YamlPreloader, LoadSerialized, save_to_file
EXP_DIR = os.path.dirname(__file__)
EXP = "programmatic-load"
model_file = f"{EXP_DIR}/models/{EXP}.mod"
log_file = f"{EXP_DIR}/logs/{EXP}.log"
xnmt.tee.set_out_file(log_file, EXP)
ParamManager.init_param_col()
load_experiment = LoadSerialized(
filename=f"{EXP_DIR}/models/programmatic.mod",
overwrite=[
{"path" : "train", "val" : None},
{"path": "status", "val": None},
]
)
uninitialized_experiment = YamlPreloader.preload_obj(load_experiment, exp_dir=EXP_DIR, exp_name=EXP)
loaded_experiment = initialize_if_needed(uninitialized_experiment)
# if we were to continue training, we would need to set a save model file like this:
# ParamManager.param_col.model_file = model_file
ParamManager.populate()
# run experiment
loaded_experiment(save_fct=lambda: None)
Parameter sharing¶
# This illustrates component and parameter sharing. This is useful for making
# config files less verbose, and more importantly makes it possible to realize
# weight-sharing between components, which will also be demonstrated in the
# multi-task example later.
#
# There are 2 ways to achieve sharing:
# - YAML's anchor system where '&' denotes a named anchor, '*' denotes a reference to an anchor.
# This essentially copies values or subcomponents from one place to another.
# It can be combined with the << operator that allows copying parts of a dictionary, but overwriting other parts; a short sketch of this follows after this example.
# More info is found here: http://yaml.readthedocs.io/en/latest/example.html
# - XNMT's !Ref object creates a reference, meaning both places will point to the exact same Python object,
# and that DyNet parameters will be shared.
# References can be made by path or by name, as illustrated below. The name refers to a _xnmt_id that can
# be set in any component and must be unique.
# Note that references do not work across experiments (e.g. we cannot refer to exp2.load from within exp1.pretrain)
exp1.pretrain: !Experiment
exp_global: !ExpGlobal
default_layer_dim: 32
model_file: 'examples/output/{EXP}.mod'
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 32
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender {}
# reference-sharing between softmax projection and target embedder. This means both layers share DyNet parameters!
decoder: !AutoRegressiveDecoder
embedder: !DenseWordEmbedder
_xnmt_id: trg_emb # this id must be unique and is needed to create a reference-by-name below.
emb_dim: 32
rnn: !UniLSTMSeqTransducer
layers: 1
scorer: !Softmax
output_projector: !Ref { name: trg_emb }
# alternatively, the same could be achieved like this,
# in which case model.decoder.embedder._xnmt_id is not required:
# !Ref { path: model.decoder.embedder }
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: &dev_src examples/data/head.ja # value-sharing between train.training_corpus.dev_src and inference.src_file
ref_file: &dev_trg examples/data/head.en # value-sharing between train.training_corpus.dev_trg and evaluate.ref_file
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: *dev_src # Copy over the file path from the dev tasks using YAML anchors.
ref_file: *dev_trg # The same could also be done for more complex objects.
hyp_file: examples/output/{EXP}.test_hyp
exp2.load: !LoadSerialized
filename: examples/output/exp1.pretrain.mod
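As mentioned in the comments above, YAML's << merge key can copy the contents of an anchored
mapping while overwriting individual values. The following is only a rough sketch of the syntax;
the component values are illustrative and this snippet is not part of the repository example
above, and whether a given component accepts merged keys depends on how it is loaded:
exp1_base: !Experiment
  exp_global: !ExpGlobal
    <<: &common_globals
      default_layer_dim: 512
      dropout: 0.3
  model: ...
  train: ...
exp2_variant: !Experiment
  exp_global: !ExpGlobal
    <<: *common_globals
    dropout: 0.1
  model: ...
  train: ...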
Multi-task¶
# XNMT offers a very flexible way of multi-task training by specifying multiple
# models and using the !Ref mechanism for weight sharing, as demonstrated
# in this config file.
# The possible multi-task training strategies can be looked up in
# xnmt/regimens.py and include same-batch, alternating-batch, and serial
# strategies.
exp1-multi_task: !Experiment
exp_global: !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
train: !SameBatchMultiTaskTrainingRegimen
trainer: !AdamTrainer {}
n_task_steps: [2,1]
tasks:
- !SimpleTrainingTask # first task is the main task: it will control early stopping, learning rate schedule, model checkpoints, etc.
name: first_task
run_for_epochs: 6
batcher: !SrcBatcher
batch_size: 6
src_file: examples/data/head.ja
trg_file: examples/data/head.en
model: !DefaultTranslator
_xnmt_id: first_task_model
src_reader: !PlainTextReader
vocab: !Vocab
_xnmt_id: src_vocab
vocab_file: examples/data/head.ja.vocab
trg_reader: !PlainTextReader
vocab: !Vocab
_xnmt_id: trg_vocab
vocab_file: examples/data/head.en.vocab
src_embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: src_vocab}
encoder: !BiLSTMSeqTransducer # the encoder shares parameters between tasks
_xnmt_id: first_task_encoder
layers: 1
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: trg_vocab}
rnn: !UniLSTMSeqTransducer
layers: 1
hidden_dim: 64
bridge: !CopyBridge {}
scorer: !Softmax
vocab: !Ref {name: trg_vocab}
dev_tasks:
- !AccuracyEvalTask
model: !Ref { name: first_task_model }
src_file: &first_task_dev_src examples/data/head.ja # value-sharing between first task dev and final eval
ref_file: &first_task_dev_trg examples/data/head.en # value-sharing between first task dev and final eval
hyp_file: examples/output/{EXP}.first_dev_hyp
eval_metrics: bleu # tasks can specify different dev_metrics
- !SimpleTrainingTask
name: second_task
batcher: !SrcBatcher
batch_size: 6
src_file: examples/data/head.ja
trg_file: examples/data/head.en
model: !DefaultTranslator
_xnmt_id: second_task_model
src_reader: !PlainTextReader
vocab: !Ref {name: src_vocab}
trg_reader: !PlainTextReader
vocab: !Ref {name: trg_vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: src_vocab}
encoder: !Ref { name: first_task_encoder }
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
vocab: !Ref {name: trg_vocab}
bridge: !CopyBridge {}
scorer: !Softmax
vocab: !Ref {name: trg_vocab}
dev_tasks:
- !AccuracyEvalTask
model: !Ref { name: second_task_model }
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.second_dev_hyp
eval_metrics: gleu # tasks can specify different dev_metrics
evaluate:
- !AccuracyEvalTask
model: !Ref { name: first_task_model }
eval_metrics: bleu
src_file: *first_task_dev_src
ref_file: *first_task_dev_trg
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-model: !LoadSerialized
filename: examples/output/exp1-multi_task.mod
Speech¶
# This config file demonstrates how to specify a speech recognition model
# using the Listen-Attend-Spell architecture: https://arxiv.org/pdf/1508.01211.pdf
# Compared to the conventional attentional model, we remove the input embeddings and
# instead directly read in feature vectors. The pyramidal LSTM reduces the length of
# the input sequence by a factor of 2 per layer (except for the first layer).
# Output units should be characters according to the paper.
!Experiment
name: speech
exp_global: !ExpGlobal
save_num_checkpoints: 2
default_layer_dim: 32
dropout: 0.4
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocExtract
in_files:
- examples/data/LDC94S13A.yaml
out_files:
- examples/data/LDC94S13A.h5
specs: !MelFiltExtractor {}
model: !DefaultTranslator
src_embedder: !NoopEmbedder
emb_dim: 40
encoder: !PyramidalLSTMSeqTransducer
layers: 3
downsampling_method: concat
reduce_factor: 2
input_dim: 40
hidden_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
bridge: !CopyBridge {}
src_reader: !H5Reader
transpose: True
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/char.vocab}
output_proc: join-char
train: !SimpleTrainingRegimen
run_for_epochs: 1
batcher: !SrcBatcher
pad_src_to_multiple: 4
batch_size: 3
trainer: !AdamTrainer {}
src_file: examples/data/LDC94S13A.h5
trg_file: examples/data/LDC94S13A.char
dev_tasks:
- !LossEvalTask
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.char
- !AccuracyEvalTask
eval_metrics: cer,wer
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.char
hyp_file: examples/output/{EXP}.dev_hyp
inference: !AutoRegressiveInference
batcher: !InOrderBatcher
_xnmt_id: inference_batcher
pad_src_to_multiple: 4
batch_size: 1
evaluate:
- !AccuracyEvalTask
eval_metrics: cer,wer
src_file: examples/data/LDC94S13A.h5
ref_file: examples/data/LDC94S13A.words
hyp_file: examples/output/{EXP}.test_hyp
inference: !AutoRegressiveInference
batcher: !Ref { name: inference_batcher }
Reporting attention matrices¶
# XNMT supports writing out reports, such as attention matrices generated during inference or difference highlighting
# between outputs and references.
# These are generally created by setting exp_global.compute_report to True, and adding one or several reporters
# to the inference class.
!Experiment
name: report
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
inference: !AutoRegressiveInference
reporter:
- !AttentionReporter {} # plot attentions
- !ReferenceDiffReporter {} # difference highlighting
- !CompareMtReporter {} # analyze MT outputs
- !OOVStatisticsReporter # report on recovered OOVs, fantasized new words, etc.
train_trg_file: examples/data/head.en
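Note that, as mentioned in the comments at the top of this example, report generation is
generally switched on via exp_global.compute_report; if an experiment does not already set it,
the corresponding fragment would look roughly like this (a sketch based on the comment above):
exp_global: !ExpGlobal
  compute_report: True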
Scoring N-best lists¶
# Using a trained model to add hypothesis scores to an n-best list.
# First, exp1-model trains a model which is saved at examples/output/exp1-model.mod.
# Then, exp2-score loads exp1-model and uses it to score an n-best list.
# The n-best list example used here is located at examples/data/head.nbest.en.
# exp2-score outputs a new n-best list with hypothesis scores.
# The output file will be in examples/output/exp2-score.test_hyp.
exp1-model: !Experiment
exp_global: !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 64
dropout: 0.5
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-score: !LoadSerialized
filename: examples/output/exp1-model.mod
overwrite:
- path: train
val: ~
- path: model.inference
val: !AutoRegressiveInference
mode: score
ref_file: examples/data/head.nbest.en
src_file: examples/data/head.ja
- path: evaluate.0
val: !AccuracyEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.nbest.en
hyp_file: examples/output/{EXP}.test_hyp
Ensembling¶
# This example shows different ways to perform model ensembling
# First, let's define a simple experiment with a single model
exp1-single: !Experiment
exp_global: &globals !ExpGlobal
model_file: examples/output/{EXP}.mod
log_file: examples/output/{EXP}.log
default_layer_dim: 32
# Just use default model settings here
model: &model1 !DefaultTranslator
src_reader: &src_reader !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: &trg_reader !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
train: &train !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# Another single model, but with a different number of layers and some other
# different settings
exp2-single: !Experiment
exp_global: *globals
model: &model2 !DefaultTranslator
src_reader: *src_reader
trg_reader: *trg_reader
encoder: !BiLSTMSeqTransducer
layers: 3
hidden_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !DenseWordEmbedder
_xnmt_id: dense_embed
emb_dim: 64
rnn: !UniLSTMSeqTransducer
hidden_dim: 64
transform: !AuxNonLinear
output_dim: 64
scorer: !Softmax
output_projector: !Ref {name: dense_embed}
train: *train
# Load the previously trained models and combine them into an ensemble
exp3-ensemble-load: !Experiment
exp_global: *globals
model: !EnsembleTranslator
src_reader: !Ref {path: model.models.0.src_reader}
trg_reader: !Ref {path: model.models.0.trg_reader}
models:
- !LoadSerialized
filename: 'examples/output/exp1-single.mod'
path: model
- !LoadSerialized
filename: 'examples/output/exp2-single.mod'
path: model
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
# Alternatively, we can also hook up the models already at training time
exp4-ensemble-train: !Experiment
exp_global: *globals
model: !EnsembleTranslator
src_reader: *src_reader
trg_reader: *trg_reader
models:
- *model1
- *model2
train: *train
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Minimum risk training¶
# Saving and loading models is a key feature demonstrated in this config file.
# This example shows how to load a trained model and fine-tune it with minimum risk training.
# First, the pretrained model:
exp1-pretrain-model: !Experiment
exp_global: !ExpGlobal
# The model file contains the whole contents of this experiment in YAML
# format. Note that {EXP} expressions are left intact when saving.
default_layer_dim: 64
dropout: 0.3
weight_noise: 0.1
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 64
encoder: !BiLSTMSeqTransducer
layers: 2
input_dim: 64
attender: !MlpAttender
state_dim: 64
hidden_dim: 64
input_dim: 64
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 64
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 64
input_feeding: True
bridge: !CopyBridge {}
inference: !AutoRegressiveInference {}
train: !SimpleTrainingRegimen
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.dev_hyp
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
exp2-finetune-minrisk: !LoadSerialized
# This will perform minimum risk training with SamplingSearch.
# Same as above, the pretrained model will be loaded and an appropriate search_strategy
# will be used during minimum risk training.
filename: examples/models/exp1-pretrain-model.mod
path: ''
overwrite:
- path: train.loss_calculator
val: !MinRiskLoss
alpha: 0.005
- path: model.inference.search_strategy
val: !SamplingSearch
sample_size: 10
max_len: 50
- path: train.run_for_epochs
val: 1
Biased Lexicon¶
(this is currently broken)
lexbias: !Experiment # 'lexbias' is the name given to the experiment
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
bridge: !CopyBridge {}
scorer: !LexiconSoftmax
lexicon_file: examples/data/head-ja_given_en.lex
# can choose between bias/linear
lexicon_type: bias
# The small epsilon value to be added to the bias
lexicon_alpha: 0.001
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.en
trg_file: examples/data/head.ja
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.en
ref_file: examples/data/head.ja
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.en
ref_file: examples/data/head.ja
hyp_file: examples/output/{EXP}.test_hyp
Subword Sampling¶
# Sampling subword units for subword regularization
# Note that this requires 'sentencepiece' as an extra dependency
!Experiment
name: subword_sample
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
model: !DefaultTranslator
# Here we set the sample_train and alpha parameters to turn on sampling
src_reader: !SentencePieceTextReader
sample_train: True
alpha: 0.1
vocab: !Vocab
vocab_file: examples/data/big-ja.vocab
sentencepiece_vocab: True
model_file: examples/data/big-ja.model
trg_reader: !SentencePieceTextReader
sample_train: True
alpha: 0.1
vocab: !Vocab
vocab_file: examples/data/big-en.vocab
sentencepiece_vocab: True
model_file: examples/data/big-en.model
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
inference: !AutoRegressiveInference
post_process: join-piece
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 20
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Self Attention¶
# A setup using self-attention
!Experiment
name: self_attention
exp_global: !ExpGlobal
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
placeholders:
DATA_IN: examples/data
DATA_OUT: examples/preproc
preproc: !PreprocRunner
overwrite: False
tasks:
- !PreprocVocab
in_files:
- '{DATA_IN}/train.ja'
- '{DATA_IN}/train.en'
out_files:
- '{DATA_OUT}/train.ja.vocab'
- '{DATA_OUT}/train.en.vocab'
specs:
- filenum: all
filters:
- !VocabFiltererFreq
min_freq: 2
model: !DefaultTranslator
src_reader: !PlainTextReader
vocab: !Vocab {vocab_file: '{DATA_OUT}/train.ja.vocab'}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: '{DATA_OUT}/train.en.vocab'}
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !ModularSeqTransducer
modules:
- !PositionalSeqTransducer
input_dim: 512
max_pos: 100
dropout: 0.1
- !ModularSeqTransducer
modules: !Repeat
times: 2
content: !ModularSeqTransducer
modules:
- !ResidualSeqTransducer
input_dim: 512
child: !MultiHeadAttentionSeqTransducer
num_heads: 8
dropout: 0.1
layer_norm: True
dropout: 0.1
- !ResidualSeqTransducer
input_dim: 512
child: !TransformSeqTransducer
transform: !MLP
activation: relu
layer_norm: True
dropout: 0.1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !NoamTrainer
alpha: 1.0
warmup_steps: 4000
run_for_epochs: 2
src_file: examples/data/train.ja
trg_file: examples/data/train.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp
Char Segment¶
# Examples of using SegmentingSeqTransducer
# Available composition functions can be found in xnmt/specialized_encoders/segmenting_encoder/segmenting_composer.py
# Looking up characters from a word vocabulary.
# Basically this is the same as 01_standard.yaml.
seg_lookup: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
# Can be produced by script/vocab/make_vocab.py --char_vocab < [CORPUS]
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
# It reads in characters and produces word embeddings
encoder: !SegmentingSeqTransducer
segment_composer: !LookupComposer
word_vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Summing together character embeddings as the composition function.
seg_sum: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SumComposer {}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using BiLSTM to predict word embeddings.
seg_bilstm: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SeqTransducerComposer
seq_transducer: !BiLSTMSeqTransducer {}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using CHARAGRAM composition function
seg_charagram: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !CharNGramComposer
ngram_size: 4
word_vocab: !Vocab {vocab_file: examples/data/head.ngramcount.ja}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
# Using Composition of CHARAGRAM and Lookup
seg_lookup_charagram: !Experiment
exp_global: !ExpGlobal {}
model: !DefaultTranslator
src_reader: !CharFromWordTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.charvocab}
trg_reader: !PlainTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
encoder: !SegmentingSeqTransducer
### Pay attention to this part
segment_composer: !SumMultipleComposer
composers:
- !LookupComposer
word_vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
- !CharNGramComposer
ngram_size: 4
word_vocab: !Vocab {vocab_file: examples/data/head.ngramcount.ja}
###
final_transducer: !BiLSTMSeqTransducer {}
train: !SimpleTrainingRegimen
run_for_epochs: 1
src_file: examples/data/head.ja
trg_file: examples/data/head.en
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu,wer
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: test/tmp/{EXP}.test_hyp
inference: !AutoRegressiveInference {}
Switchout¶
# Implements SwitchOut, a data augmentation strategy for NMT
# RAML corrupts target side only, while SwitchOut corrupts both source and target
# https://arxiv.org/pdf/1808.07512.pdf
switchout: !Experiment
# global parameters shared throughout the experiment
exp_global: !ExpGlobal
# {EXP_DIR} is a placeholder for the directory in which the config file lies.
# {EXP} is a placeholder for the experiment name (here: 'switchout')
model_file: '{EXP_DIR}/models/{EXP}.mod'
log_file: '{EXP_DIR}/logs/{EXP}.log'
default_layer_dim: 512
dropout: 0.3
# model architecture
model: !DefaultTranslator
src_reader: !RamlTextReader
vocab: !Vocab {vocab_file: examples/data/head.ja.vocab}
tau: 0.8
trg_reader: !RamlTextReader
vocab: !Vocab {vocab_file: examples/data/head.en.vocab}
tau: 0.8
src_embedder: !SimpleWordEmbedder
emb_dim: 512
encoder: !BiLSTMSeqTransducer
layers: 1
attender: !MlpAttender
hidden_dim: 512
state_dim: 512
input_dim: 512
decoder: !AutoRegressiveDecoder
embedder: !SimpleWordEmbedder
emb_dim: 512
rnn: !UniLSTMSeqTransducer
layers: 1
transform: !AuxNonLinear
output_dim: 512
activation: 'tanh'
bridge: !CopyBridge {}
scorer: !Softmax {}
# training parameters
train: !SimpleTrainingRegimen
batcher: !SrcBatcher
batch_size: 32
trainer: !AdamTrainer
alpha: 0.001
run_for_epochs: 2
src_file: examples/data/head.ja
trg_file: examples/data/head.en
dev_tasks:
- !LossEvalTask
src_file: examples/data/head.ja
ref_file: examples/data/head.en
# final evaluation
evaluate:
- !AccuracyEvalTask
eval_metrics: bleu
src_file: examples/data/head.ja
ref_file: examples/data/head.en
hyp_file: examples/output/{EXP}.test_hyp