I want to use a pretrained BPEmb model. I tokenized my data with SentencePiece and generated training.en.vocab and training.bn.vocab, which I use as the vocabulary files.
I also downloaded the BPEmb word2vec (W2V) files and set them as the source and target embedding files.
Is this the correct pipeline for data preprocessing and training?
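One thing worth double-checking before training: the .vocab files that SentencePiece emits have two tab-separated columns (token and log-probability), while OpenNMT-tf expects a plain one-token-per-line vocabulary (recent OpenNMT-tf versions also provide onmt-build-vocab with a SentencePiece conversion mode for this). A minimal sketch of the conversion, assuming the standard SentencePiece .vocab layout; the function name and sample tokens here are illustrative, not from your files:

```python
def convert_sp_vocab(lines):
    """Strip the SentencePiece score column, keeping one token per line."""
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # SentencePiece .vocab format: <token>\t<log-prob>
        tokens.append(line.split("\t")[0])
    return tokens

# Illustrative sample; real input would be read from training.en.vocab.
sample = ["<unk>\t0", "\u2581the\t-3.1", "ing\t-4.2"]
print(convert_sp_vocab(sample))  # ['<unk>', '▁the', 'ing']
```

If the converted tokens end up matching what OpenNMT-tf expects (including any special tokens it manages itself), the files can then be pointed to from source_vocabulary and target_vocabulary.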
OS: Windows 10
model_dir: run/

data:
  train_features_file: TrainEn.txt
  train_labels_file: trainBn.txt
  eval_features_file: trainDevEn.txt
  eval_labels_file: trainDevBn.txt
  source_vocabulary: training.en.vocab
  target_vocabulary: training.bn.vocab
  source_embedding:
    path: en.wiki.bpe.vs25000.d300.w2v.txt
    with_header: False
    case_insensitive: True
  target_embedding:
    path: bn.wiki.bpe.vs25000.d300.w2v.txt
    with_header: False

train:
  max_step: 5000
  batch_size: 40
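Since the SentencePiece vocabulary and the BPEmb embeddings were built independently, it may also be worth checking how many of your vocabulary tokens actually receive a pretrained vector; tokens missing from the w2v file are typically initialized randomly. A small self-contained sketch with made-up in-memory data (in practice the lines would be read from training.en.vocab and en.wiki.bpe.vs25000.d300.w2v.txt):

```python
def embedding_coverage(vocab_tokens, w2v_lines, with_header=False):
    """Return the fraction of vocab tokens present in a word2vec text file."""
    if with_header:
        w2v_lines = w2v_lines[1:]  # skip the "<count> <dim>" header line
    # First whitespace-separated field of each line is the token.
    embedded = {line.split(" ", 1)[0] for line in w2v_lines if line.strip()}
    hits = sum(1 for tok in vocab_tokens if tok in embedded)
    return hits / len(vocab_tokens)

# Illustrative data only: 2 of these 3 tokens have a vector.
vocab = ["\u2581the", "ing", "\u2581xyz"]
w2v = ["3 4", "\u2581the 0.1 0.2 0.3 0.4", "ing 0.5 0.6 0.7 0.8"]
print(embedding_coverage(vocab, w2v, with_header=True))
```

A low coverage number would suggest a mismatch, e.g. the SentencePiece model was trained with different settings than the vs25000 BPEmb vocabulary, so the merges do not line up.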