Introducing jp-translate.cloud

JptoEn · February 23, 2022, 7:26pm

http://jp-translate.cloud is a state-of-the-art open-source Japanese ⇋ English machine translation system built on OpenNMT.

If you are interested in the development, check out the research paper:

If you would like any of the data or development notebooks, just let me know. The source code is on my GitHub.

Many thanks to the OpenNMT community for all the help, and especially to Yasmin Moslem for providing great tutorials and tooling.

yaren · March 3, 2022, 11:08am

Good job. I have some questions:

1. Why is the model so small? The model files on GitHub look smaller than 10 MB.
2. How many bilingual sentences did you train on?
3. Can this model return alignment information?

Thanks for sharing. Really good job.

JptoEn · March 5, 2022, 8:23pm

Hi, thanks for checking the project out.

1.) Actually, those are the SentencePiece models. The CTranslate2 models are too large to put on GitHub (maybe I could use Git LFS, but I am not familiar with it).

2.) 15,847,871 parallel sentences.

3.) I don’t think the CTranslate2 models are able to do so. OpenNMT itself can, but I think inference would be a lot slower.
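
For anyone wondering how the small SentencePiece files and the larger CTranslate2 model fit together at inference time, here is a rough Python sketch. It is not the project's actual code: the function names and any paths you pass to `load_pipeline` are hypothetical, and it assumes the `ctranslate2` and `sentencepiece` packages are installed and the model has already been converted to the CTranslate2 format.

```python
def load_pipeline(model_dir, source_spm_path, target_spm_path):
    """Load a CTranslate2 model plus source/target SentencePiece models.

    All three paths are placeholders; point them at your own files.
    """
    # Imported here so the rest of the sketch can be read (and tested)
    # without the libraries or model files being present.
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device="cpu")
    source_sp = spm.SentencePieceProcessor(model_file=source_spm_path)
    target_sp = spm.SentencePieceProcessor(model_file=target_spm_path)
    return source_sp, translator, target_sp


def translate_sentence(text, source_sp, translator, target_sp):
    """Translate one sentence: subword-encode, translate, then decode."""
    # 1. Split the raw text into subword pieces (strings, not ids).
    tokens = source_sp.encode(text, out_type=str)
    # 2. CTranslate2 works on batches of token lists.
    results = translator.translate_batch([tokens])
    # 3. Take the best hypothesis and merge the pieces back into text.
    best_tokens = results[0].hypotheses[0]
    return target_sp.decode(best_tokens)
```

Usage would be along the lines of `translate_sentence("こんにちは", *load_pipeline(model_dir, ja_spm, en_spm))`, with the three paths replaced by wherever you saved the downloaded files.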

ymoslem · March 5, 2022, 11:30pm

Congratulations, Matthew! Great work.

Recently, I started to add big files under “Releases” on the relevant GitHub repository.

Kind regards,
Yasmin

JptoEn · March 6, 2022, 9:49am

Thanks for the tip, that was easy to do.