How does computing loss in shards help reduce memory cost? (1)
SentencePiece vs. BPE (1)
Custom Loss Function Criterion (5)
How to ensemble models with OpenNMT (PyTorch)? (2)
How should I choose parameters? (2)
Corpus level TER averaging (3)
How does ensemble decoding work? (4)
OpenNMT: debugging Lua code (2)
Need help understanding copy_attn_force (3)
BPE options handling in learn_bpe.lua and tokenizer.lua (4)
OpenNMT tagger (CUDA -> CPU) release model (3)
Changing the behaviour of `end_epoch` options when used in combination with `train_from` and `continue` options (5)
Word features with idx_files (2)
Attention only models (2)
[Code Understanding] Where are different models 'used' in the source? (6)
Can dispatching batches with different src_len degrade performance in synchronous training? (5)
OpenCL instead of CUDA (4)
Gradient checking (3)
Adaptive Learning in OpenNMT (11)
How to use multi-GPU parallel training with an old commit from Feb 23 (2)
-train_from and train_from_state_dict (4)
Add support for distributed training on multiple CPU only nodes (2)
Can we use a language translation model for Hindi-to-English translation? (2)
outputs.backward(gradOutput) (7)
What's the difference between model.zero_grad() and optim.zero_grad()? (2)
Need a proper understanding of the preprocess step (5)
Having an issue while preprocessing the data (1)
Different RNN architecture for each layer of Encoder (3)
Further Memory Optimization (1)
Coding environment for the framework (2)