Restarting training with -train_from gives a "There appear to be 6 leaked semaphores to clean up at shutdown" error

Hi

I'm not sure what I am doing wrong. I have a small Transformer model, using the values from the FAQ.

save_model: ptes/run/model
save_checkpoint_steps: 5000
train_steps: 100000
valid_steps: 10000
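
With these settings a checkpoint is written every 5000 steps under the save_model prefix; the file resumed from later in this post is ptes/run/model_step_15000.pt. As a minimal sketch, assuming checkpoints follow that `<save_model>_step_<N>.pt` naming, the most recent one can be located like this (`latest_checkpoint` is a hypothetical helper, not an OpenNMT-py function):

```python
import re
from pathlib import Path


def latest_checkpoint(save_model):
    """Return (step, path) of the highest-step checkpoint, or None if none exist.

    Assumes files are named '<save_model>_step_<N>.pt', as in the log of this post.
    """
    prefix = Path(save_model)
    if not prefix.parent.is_dir():
        return None
    pattern = re.compile(re.escape(prefix.name) + r"_step_(\d+)\.pt$")
    best = None
    for candidate in prefix.parent.glob(prefix.name + "_step_*.pt"):
        match = pattern.match(candidate.name)
        if match is None:
            continue  # skip files whose step part is not a number
        step = int(match.group(1))
        if best is None or step > best[0]:
            best = (step, candidate)
    return best


if __name__ == "__main__":
    found = latest_checkpoint("ptes/run/model")
    if found is None:
        print("No checkpoint found")
    else:
        print(f"Resume with: onmt_train -config pt_es.yaml -train_from {found[1]}")
```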

I hit Ctrl-C and now I want to restart, so I typed:


oot@43ab6766956b:/u/02_T4T/22_SP_TRANSF_Statclean_100_3.5# onmt_train -config pt_es.yaml -train_from ptes/run/model_step_15000.pt
[2021-07-05 14:19:40,928 INFO] Missing transforms field for corpus_1 data, set to default: [].
[2021-07-05 14:19:40,928 WARNING] Corpus corpus_1's weight should be given. We default it to 1 for you.
[2021-07-05 14:19:40,928 INFO] Missing transforms field for valid data, set to default: [].
[2021-07-05 14:19:40,928 INFO] Parsed 2 corpora from -data.
[2021-07-05 14:19:40,928 INFO] Loading checkpoint from ptes/run/model_step_15000.pt
[2021-07-05 14:19:41,600 INFO] Loading fields from checkpoint...
[2021-07-05 14:19:41,600 INFO]  * src vocab size = 13767
[2021-07-05 14:19:41,600 INFO]  * tgt vocab size = 13512
/opt/conda/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 6 leaked semaphores to clean up at shutdown
  len(cache))
Bus error (core dumped)

Besides -train_from, do I need anything else?

I am using a Docker image built from one of the most recent images, if not the latest (end of June), on an Ubuntu host.

Am I missing something?
Thanks in advance
Miquel Canals

Hi Miquel!

It seems you use Conda. Some users here say that updating Conda helps.

Kind regards,
Yasmin

Hi Yasmin, thanks for your answer.

I have run conda update --all and conda update -n base conda, and my conda version has indeed been upgraded from 4.10.1 to 4.10.3, but the issue is the same. Maybe it is related to how my Docker image was created:

# https://hub.docker.com/r/pytorch/pytorch/tags?page=1&ordering=last_updated
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
# 
# Update the image to the latest packages
RUN apt-get update && apt-get upgrade -y
#
RUN apt install git -y
RUN apt install nano
# Locale UTF8 https://stackoverflow.com/questions/27931668/encoding-problems-when-running-an-app-in-docker-python-java-ruby-with-u
RUN apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
# sentence piece
# from https://github.com/google/sentencepiece
RUN apt-get install cmake build-essential pkg-config libgoogle-perftools-dev -y 
RUN git clone https://github.com/google/sentencepiece.git && \
     cd sentencepiece       && \
     mkdir build            && \
     cd build               && \
     cmake ..               && \
     make -j $(nproc)       && \
     make install           && \
     ldconfig -v          
RUN cd ../..              
RUN  git clone https://github.com/OpenNMT/OpenNMT-py.git && \
     cd OpenNMT-py  && \
     pip install -e .

Thanks anyway.
Have a nice day!
Miquel
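
A possible follow-up check, offered as an assumption and not a diagnosis confirmed in this thread: a "Bus error (core dumped)" from PyTorch inside Docker is often associated with a small shared-memory mount, since Docker gives containers a 64 MB /dev/shm by default and PyTorch's multiprocessing passes tensors through shared memory. The Python sketch below reports the size of /dev/shm from inside the container. The `shm_report` helper and the 1 GiB threshold are illustrative choices, not part of OpenNMT-py.

```python
import os
import shutil

# Assumption: 1 GiB is an arbitrary illustrative threshold,
# not a documented OpenNMT-py or PyTorch requirement.
MIN_SHM_BYTES = 1 * 1024**3


def shm_report(path="/dev/shm", min_bytes=MIN_SHM_BYTES):
    """Return (total_bytes, free_bytes, ok) for the mount at `path`.

    Returns None if `path` is not a directory (e.g. on a non-Linux host).
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.free, usage.total >= min_bytes


if __name__ == "__main__":
    report = shm_report()
    if report is None:
        print("/dev/shm not found")
    else:
        total, free, ok = report
        print(f"/dev/shm total: {total / 1024**2:.0f} MiB, free: {free / 1024**2:.0f} MiB")
        if not ok:
            print("Shared memory looks small; consider enlarging it for the container.")
```

If the mount turns out to be small, the usual ways to enlarge it are to start the container with a larger `docker run --shm-size` value or with `--ipc=host`.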