<unk> Token in Target File During Translation

I am training a model with OpenNMT-py on a dataset whose source file contains English natural-language statements and whose target file contains the expected Java translation of each statement, one entry per line. (I do not see an option to upload these files for reference, so if they are needed, I will need to know how to share them on the forum.) Preprocessing and training complete without issues. However, when I translate with onmt_translate, every token inside the method call's arguments (in this case System.out.println()) appears as <unk>, as seen below.

SENT 647: ['show', 'me', 'distressed', 'bans', 'macquarie', 'shouting', 'pta']
PRED 647: System.out.println ( "pooh enthusiast udp positively
PRED SCORE: -6.5099
GOLD 647: System.out.println ( <unk> <unk> <unk> <unk> <unk> ) ;
GOLD SCORE: -66.7281

SENT 648: ['output', 'poy', 'mao']
PRED 648: System.out.println ( "parade" )
PRED SCORE: -6.1009
GOLD 648: System.out.println ( <unk> <unk> ) ;
GOLD SCORE: -28.3675

SENT 649: ['output', 'bullock', 'to', 'console']
PRED 649: System.out.println ( "parade" )
PRED SCORE: -5.8722
GOLD 649: System.out.println ( <unk> ) ;
GOLD SCORE: -18.7538

SENT 650: ['display', 'villagers', 'transferable', 'yummy', 'acknowledgments', 'ethiopian', 'on', 'the', 'console']
PRED 650: System.out.println ( "pooh enthusiast udp positively
PRED SCORE: -6.3975
GOLD 650: System.out.println ( <unk> <unk> <unk> <unk> <unk> ) ;
GOLD SCORE: -56.7989

SENT 651: ['show', 'me', 'momma', 'lehigh']
PRED 651: System.out.println ( "fails"
PRED SCORE: -6.9291
GOLD 651: System.out.println ( <unk> <unk> ) ;
GOLD SCORE: -28.7017

I am not sure why this is occurring, and I have tried changing OpenNMT-py command arguments and reinstalling and upgrading OpenNMT-py, but I still get the same behavior.

Below are the preprocessing, training, and translation commands I used:

Does anyone know what the cause of this issue may be and a potential solution?

I’m not sure I understand. The <unk> tokens only appear on the “GOLD” lines. Those sequences come from the reference translation file that you passed to the -tgt option during translation; they are not produced by the model.
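A quick way to check this is to scan the reference file for literal <unk> tokens. The snippet below is only a minimal sketch: the helper name lines_with_unk and the two sample lines are made up for illustration, and it assumes tokens are separated by whitespace.

```python
def lines_with_unk(lines, unk="<unk>"):
    """Return (1-based line number, line) pairs that contain `unk` as a whitespace-separated token."""
    return [(i, line) for i, line in enumerate(lines, start=1) if unk in line.split()]


# Illustrative sample; in practice, read the lines from the file passed to -tgt.
sample = [
    'System.out.println ( "afforded" ) ;',
    'System.out.println ( <unk> <unk> ) ;',
]

hits = lines_with_unk(sample)
print(hits)
```

If this returns an empty list for the real reference file, the <unk> tokens are not stored in the file itself and must be introduced later, when the file is read and displayed.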

That is what I thought, but below are the contents of the -tgt file (…/personal-dataset/data_test_code.txt), and it does not contain the <unk> tokens that show up in the translation output:

System.out.println ( "tic mississauga dialysis filmed" ) ;
System.out.println ( "staten carole noticeable" ) ;
System.out.println ( "cette aesthetics schwarzenegger" ) ;
System.out.println ( "smoker benign hypotheses" ) ;
System.out.println ( "afforded" ) ;
System.out.println ( "aisle dunno blur evidently" ) ;
System.out.println ( "summarizes limbs" ) ;
System.out.println ( "unforgettable punt" ) ;
System.out.println ( "sludge crypto christensen tanned altering" ) ;
System.out.println ( "bunker multiplication paved heavyweight lps" ) ;
System.out.println ( "fabricated zach pdp" ) ;
System.out.println ( "pasture" ) ;
System.out.println ( "phantomnode richest cruelty comptroller scalability" ) ;
System.out.println ( "creatine mormon embl minimizing" ) ;
System.out.println ( "scots genuinely gpo" ) ;
System.out.println ( "neighbouring plugged tyson" ) ;
System.out.println ( "souvenir dq mifflin relativity" ) ;
System.out.println ( "mojo" ) ;
System.out.println ( "econo cucumber occurrences" ) ;
System.out.println ( "shapiro marshal rituals anders" ) ;
System.out.println ( "seize decisive spawn pq" ) ;
System.out.println ( "blanks ub dungeons" ) ;
System.out.println ( "epoxy watercolor" ) ;
System.out.println ( "uncensored sailors stony fayette" ) ;
System.out.println ( "trainees tori shelving" ) ;
System.out.println ( "effluent infousa" ) ;
System.out.println ( "annals storytelling sadness periodical" ) ;
System.out.println ( "polarization moe dime" ) ;
System.out.println ( "losers bombings punta" ) ;
System.out.println ( "flavour smes ionamin fuckin" ) ;
System.out.println ( "crypt charlottesville accomplishment xu" ) ;
System.out.println ( "onwards bogus carp aniston" ) ;
System.out.println ( "prompts" ) ;
System.out.println ( "witches barred skinner" ) ;
System.out.println ( "equities dusk nouveau customary vertically" ) ;
System.out.println ( "crashing cautious" ) ;
System.out.println ( "possessions feeders urging jboss passions" ) ;
System.out.println ( "faded mobil" ) ;
System.out.println ( "scrolling counterpart utensils" ) ;
System.out.println ( "secretly tying lent diode" ) ;
System.out.println ( "kaufman magician" ) ;
System.out.println ( "indulgence aloe johan buckinghamshire" ) ;
System.out.println ( "melted" ) ;
System.out.println ( "lund medford fam nel extremes" ) ;
System.out.println ( "puff underlined whores galileo bloomfield" ) ;
System.out.println ( "obsessed flavored" ) ;
System.out.println ( "gemstones" ) ;
System.out.println ( "bmi viewpoints groceries motto" ) ;
System.out.println ( "exim singled alton" ) ;
System.out.println ( "appalachian staple" ) ;
System.out.println ( "dealings phillies pathetic ramblings" ) ;
System.out.println ( "janis craftsman irritation rulers centric" ) ;
System.out.println ( "collisions militia optionally eis conservatory" ) ;
System.out.println ( "nightclub bananas geophysical fictional adherence" ) ;
System.out.println ( "golfing" ) ;
System.out.println ( "defended rubin handlers" ) ;
System.out.println ( "grille elisabeth claw pushes" ) ;
System.out.println ( "alain flagship" ) ;
System.out.println ( "kittens topeka openoffice illegally" ) ;
System.out.println ( "bugzilla deter tyre" ) ;
System.out.println ( "furry cubes" ) ;
System.out.println ( "transcribed" ) ;
System.out.println ( "bouncing" ) ;
System.out.println ( "wand" ) ;
System.out.println ( "linus taco mcsg humboldt" ) ;
System.out.println ( "scarves cavalier ish rinse" ) ;
System.out.println ( "outfits mla" ) ;
System.out.println ( "charlton" ) ;
System.out.println ( "repertoire respectfully emeritus ulster macroeconomic" ) ;
System.out.println ( "tides chu" ) ;
System.out.println ( "weld venom gundam adaptec" ) ;
System.out.println ( "writ patagonia" ) ;
System.out.println ( "dispensing tailed puppets voyer tapping" ) ;
System.out.println ( "hostname excl bx" ) ;
System.out.println ( "arr typo immersion" ) ;
System.out.println ( "explode toulouse escapes" ) ;
System.out.println ( "berries merchantability" ) ;
System.out.println ( "happier autodesk mummy jn punjab" ) ;
System.out.println ( "stacked winged brighter cries" ) ;
System.out.println ( "speciality warranted attacker ruined catcher" ) ;
System.out.println ( "damp sanity ether suction haynes" ) ;
System.out.println ( "crusade siyabona rumble" ) ;
System.out.println ( "inverter correcting shattered" ) ;
System.out.println ( "abi heroic motivate" ) ;
System.out.println ( "retreats mackay formulate bridgeport assessor" ) ;
System.out.println ( "fullerton cpp sheds blockbuster dz" ) ;
System.out.println ( "amarillo pixmania" ) ;
System.out.println ( "pathfinder anomalies homogeneous bonsai" ) ;
System.out.println ( "windshield humphrey spheres belonged" ) ;
System.out.println ( "tomtom spf assigns croydon" ) ;
System.out.println ( "sofas croix cushions fern" ) ;
System.out.println ( "convection jdbc" ) ;
System.out.println ( "defenders debugger boing" ) ;
System.out.println ( "odessa" ) ;
System.out.println ( "lore ancillary pointless" ) ;
System.out.println ( "whipped vox alibris dinners rosie" ) ;
System.out.println ( "factoring genealogical gyms inhalation terre" ) ;
System.out.println ( "selfish" ) ;
System.out.println ( "eventual" ) ;
System.out.println ( "faucet" ) ;
System.out.println ( "nach" ) ;
System.out.println ( "mitigate" ) ;
System.out.println ( "bitpipe" ) ;
System.out.println ( "jamestown arguably techs electives" ) ;
System.out.println ( "walkman" ) ;
System.out.println ( "midget elisa shelton quan boiled" ) ;
System.out.println ( "commissioning neville experimentation" ) ;
System.out.println ( "saltwater" ) ;
System.out.println ( "natasha cpi endeavour roswell haute" ) ;
System.out.println ( "herring nis unfamiliar" ) ;
System.out.println ( "wacky expectancy deterioration sgml" ) ;
System.out.println ( "proclaimed arid anemia biting" ) ;
System.out.println ( "coincidence idiots mona reits muddy" ) ;
System.out.println ( "nuevo savanna" ) ;
System.out.println ( "crn hitchcock" ) ;
System.out.println ( "cid travestis neighbour mmf" ) ;
System.out.println ( "raspberry" ) ;
System.out.println ( "cancellations paging" ) ;
System.out.println ( "coe nudists illusions fac spikes" ) ;
System.out.println ( "asean airsoft bontril enumeration" ) ;
System.out.println ( "proliant keeling zh accesses" ) ;
System.out.println ( "suche permissible" ) ;
System.out.println ( "yielded nuisance jive siam" ) ;
System.out.println ( "latent" ) ;
System.out.println ( "marcia drowning" ) ;
System.out.println ( "bullshit casper spun shalt libstdc" ) ;
System.out.println ( "ric loch commanding sparrow poorest" ) ;
System.out.println ( "hector xpress" ) ;
System.out.println ( "datasets webdesign nicotine comeback" ) ;
System.out.println ( "brotherhood gannett milling sinking sulphur" ) ;
System.out.println ( "curricular downtime" ) ;
System.out.println ( "takeover wicker lolitas balm" ) ;
System.out.println ( "thessalonians figs upto browne" ) ;
System.out.println ( "nephew confess" ) ;
System.out.println ( "joaquin" ) ;
System.out.println ( "chit chaotic alexandre lays principally" ) ;
System.out.println ( "visor mundo transistor jarvis drip" ) ;
System.out.println ( "traced outright" ) ;
System.out.println ( "melodies spotting myriad stains" ) ;
System.out.println ( "sandal rubbing naive wien skeptical" ) ;
System.out.println ( "wagering" ) ;
System.out.println ( "remembrance detects" ) ;
System.out.println ( "everest" ) ;
System.out.println ( "disregard hanger outkast" ) ;
System.out.println ( "dragged pitbull foreman" ) ;
System.out.println ( "rtf allegiance fairview" ) ;
System.out.println ( "hires conduit alienware" ) ;
System.out.println ( "dependable mainframe echoes indo" ) ;
System.out.println ( "compilers ladders prudent" ) ;
System.out.println ( "glowing guinness heartbeat" ) ;
System.out.println ( "blazer alchemy" ) ;
System.out.println ( "linden timezone merck" ) ;
System.out.println ( "sven tanya geographically" ) ;
System.out.println ( "bmc alternating" ) ;
System.out.println ( "tristan audible folio eia presiding" ) ;
System.out.println ( "mans colleen bbbonline" ) ;
System.out.println ( "participates waterways syndicated lexicon" ) ;
System.out.println ( "aff fractures apprenticeship childbirth" ) ;
System.out.println ( "dumped integers zirconia barre" ) ;
System.out.println ( "shortages plumbers rama johannes fiery" ) ;
System.out.println ( "convex jfk raf richer igor" ) ;
System.out.println ( "hama mop" ) ;
System.out.println ( "urn soleil" ) ;
System.out.println ( "patton pei surfer diapers eas" ) ;
System.out.println ( "waco physiol connor adp" ) ;
System.out.println ( "northamptonshire biscuits disclaims sich outbound" ) ;
System.out.println ( "breakout restless unanswered paired fakes" ) ;
System.out.println ( "stderr" ) ;
System.out.println ( "kev fomit" ) ;
System.out.println ( "vaults injections ahmad" ) ;
System.out.println ( "remortgage yogurt complies tossed caucus" ) ;
System.out.println ( "workaround" ) ;
System.out.println ( "cooke polytechnic pillars" ) ;
System.out.println ( "katy zoe uber" ) ;
System.out.println ( "overwhelmed salute" ) ;
System.out.println ( "shoppe" ) ;
System.out.println ( "parody berlios" ) ;
System.out.println ( "csr" ) ;
System.out.println ( "penthouse compensated synthase lacked circulated" ) ;
System.out.println ( "soo" ) ;
System.out.println ( "pistons emule maltese sauvignon" ) ;
System.out.println ( "acorn bosses pint" ) ;
System.out.println ( "ascension bayer carrera ply mornings" ) ;
System.out.println ( "dvb cation mentioning" ) ;
System.out.println ( "scientology cdma flagstaff maxi" ) ;
System.out.println ( "pretoria thrive" ) ;
System.out.println ( "msm rac" ) ;
System.out.println ( "feminism rightly paragon" ) ;
System.out.println ( "basal topps" ) ;
System.out.println ( "webinar dewalt" ) ;
System.out.println ( "turnout bruins persist wilde indispensable" ) ;
System.out.println ( "clamps illicit firefly liar" ) ;
System.out.println ( "tabletop pledged" ) ;
System.out.println ( "monoclonal pictorial" ) ;
System.out.println ( "curling ares wholesaler smoky" ) ;
System.out.println ( "opus typekey aromatic flirt slang" ) ;
System.out.println ( "emporium princes restricting partnering" ) ;
System.out.println ( "promoters" ) ;
System.out.println ( "soothing freshmen mage departed" ) ;
System.out.println ( "sqrt aristotle israelis" ) ;
System.out.println ( "finch inherently cdp krishna" ) ;
System.out.println ( "forefront" ) ;
System.out.println ( "headlights" ) ;
System.out.println ( "monophonic largo proquest" ) ;
System.out.println ( "amazingly plural dominic sergio" ) ;
System.out.println ( "swapping skipped hereinafter nur" ) ;
System.out.println ( "extracting analogous" ) ;
System.out.println ( "mev" ) ;
System.out.println ( "hebrews particulate tally unpleasant" ) ;
System.out.println ( "uno" ) ;
System.out.println ( "tempted bedfordshire blindness creep" ) ;
System.out.println ( "staining rockport" ) ;
System.out.println ( "nist shaded cot plaster" ) ;
System.out.println ( "novo" ) ;
System.out.println ( "negotiable subcategories" ) ;
System.out.println ( "hearted" ) ;
System.out.println ( "quarterback obstruction" ) ;
System.out.println ( "agility complying" ) ;
System.out.println ( "sudbury otis overture newcomers hectares" ) ;
System.out.println ( "upscale scrabble noteworthy" ) ;
System.out.println ( "agile sdn mta sacks" ) ;
System.out.println ( "docbook kiosk ionic stray runaway" ) ;
System.out.println ( "slowing firstgov" ) ;
System.out.println ( "hoodie hoodia" ) ;
System.out.println ( "payout clinically" ) ;
System.out.println ( "watchers supplemented poppy monmouth" ) ;
System.out.println ( "metacritic obligated frenzy decoding" ) ;
System.out.println ( "jargon kangaroo sleeper elemental presenters" ) ;
System.out.println ( "teal unnamed" ) ;
System.out.println ( "epstein doncaster particulars jerking weblogic" ) ;
System.out.println ( "ity bungalow" ) ;
System.out.println ( "covington bazaar esd interconnect" ) ;
System.out.println ( "predicate recurrence" ) ;
System.out.println ( "chinatown mindless purifier recruits" ) ;
System.out.println ( "sharper kz tablespoons greedy" ) ;
System.out.println ( "rodgers gloryhole supervise" ) ;
System.out.println ( "termed frauen suppl" ) ;
System.out.println ( "stamping coolest reilly hotjobs downing" ) ;
System.out.println ( "gnd libc basque societal astros" ) ;
System.out.println ( "ire halogen" ) ;
System.out.println ( "pegasus" ) ;
System.out.println ( "silhouette wyndham osu" ) ;
System.out.println ( "tuesdays" ) ;
System.out.println ( "dorado daring realms maestro turin" ) ;
System.out.println ( "gus utp superpages forte coaxial" ) ;
System.out.println ( "tipping" ) ;
System.out.println ( "jpy holster" ) ;
System.out.println ( "fiddle crunch leipzig liam" ) ;
System.out.println ( "sesso bard kellogg" ) ;
System.out.println ( "arabidopsis reap argv hanoi ccm" ) ;
System.out.println ( "faucets" ) ;
System.out.println ( "ballistic exemplary payouts rockin caliber" ) ;
System.out.println ( "apostle playful supermarkets bmg icelandic" ) ;
System.out.println ( "multiplied enchanted" ) ;
System.out.println ( "belgrade styled nacional commanders csv" ) ;
System.out.println ( "telstra" ) ;
System.out.println ( "thor" ) ;
System.out.println ( "waive contraception" ) ;
System.out.println ( "bethany polaroid" ) ;
System.out.println ( "vance soprano polishing marquis" ) ;
System.out.println ( "underage cardio" ) ;
System.out.println ( "wen translating frontiers" ) ;
System.out.println ( "timeshares atk qi logger adjoining" ) ;
System.out.println ( "greet acclaim kool" ) ;
System.out.println ( "oki birding" ) ;
System.out.println ( "hardship detainees hast indi lymph" ) ;
System.out.println ( "barrie" ) ;
System.out.println ( "pollutant closeouts miriam cavaliers" ) ;
System.out.println ( "rollers carleton pumped" ) ;
System.out.println ( "tolkien differentiated sonia undp verifying" ) ;
System.out.println ( "jbl" ) ;
System.out.println ( "almighty weekday homecoming" ) ;
System.out.println ( "increments kurdish" ) ;
System.out.println ( "vel intuition" ) ;
System.out.println ( "revoked openness chromium circulating" ) ;
System.out.println ( "bryce ilo latch mccormick" ) ;
System.out.println ( "verbs drank" ) ;
System.out.println ( "pcm confrontation shreveport grower" ) ;
System.out.println ( "frederic darlington" ) ;
System.out.println ( "slippery unpredictable galerie dtd" ) ;
System.out.println ( "capacitor outpost burnett" ) ;
System.out.println ( "hilfiger mda" ) ;
System.out.println ( "litres moroccan" ) ;
System.out.println ( "seville" ) ;
System.out.println ( "mira nightwish" ) ;
System.out.println ( "chatter hess wheaton" ) ;
System.out.println ( "santo lettuce" ) ;
System.out.println ( "raging tidy motorized jong subgroup" ) ;
System.out.println ( "oppression chevelle vets bows" ) ;
System.out.println ( "yielding assays torso occult expeditions" ) ;
System.out.println ( "nok hooker ramon longhorn lorenzo" ) ;
System.out.println ( "beau backdrop subordinate lilies" ) ;
System.out.println ( "aerobic articulate" ) ;
System.out.println ( "vgroup ecstasy sweetheart" ) ;
System.out.println ( "fulfil calcutta thursdays" ) ;
System.out.println ( "dansk" ) ;
System.out.println ( "tenerife hobbs" ) ;
System.out.println ( "mayen mediator oldmedline dunlop" ) ;
System.out.println ( "caa" ) ;
System.out.println ( "tad modernization" ) ;
System.out.println ( "xe cultivated rang disconnected consulate" ) ;
System.out.println ( "fourier" ) ;
System.out.println ( "businessman watersports lucent" ) ;
System.out.println ( "wilkes commuter" ) ;
System.out.println ( "orthopedic" ) ;
System.out.println ( "disagreement hhs" ) ;
System.out.println ( "strands tyrosine sicily compost" ) ;
System.out.println ( "shenzhen adjourned familiarity initiating erroneous" ) ;
System.out.println ( "grabs erickson marlin pulses theses" ) ;
System.out.println ( "stuffing casserole canoeing" ) ;
System.out.println ( "cca jeux wilton" ) ;
System.out.println ( "ophthalmology flooded geile" ) ;
System.out.println ( "clubhouse reverted crackers greyhound corsair" ) ;
System.out.println ( "ironic" ) ;
System.out.println ( "licensees wards unsupported evaluates" ) ;
System.out.println ( "hinge svg ultima cockpit protesters" ) ;
System.out.println ( "fernandez venetian mvc sleazydream" ) ;
System.out.println ( "patti mz sew carrots faire" ) ;
System.out.println ( "laps memorials" ) ;
System.out.println ( "sennheiser resumed sheehan conversely emory" ) ;
System.out.println ( "stunt maven" ) ;
System.out.println ( "excuses commute staged vitae transgender" ) ;
System.out.println ( "hustle stimuli customizing" ) ;
System.out.println ( "subroutine upwards witty pong transcend" ) ;
System.out.println ( "loosely" ) ;
System.out.println ( "anchors hun" ) ;
System.out.println ( "hertz" ) ;
System.out.println ( "atheist capped oro myr" ) ;
System.out.println ( "bridgewater firefighter liking preacher propulsion" ) ;
System.out.println ( "complied intangible westfield compassionate catastrophic" ) ;
System.out.println ( "fuckers blower" ) ;
System.out.println ( "substitutes tata" ) ;
System.out.println ( "flown frau dubbed silky giclee" ) ;
System.out.println ( "groovy vows reusable" ) ;
System.out.println ( "macy actuarial distorted nathaniel attracts" ) ;
System.out.println ( "bern qualifies grizzly helpline micah" ) ;
System.out.println ( "erectile timeliness obstetrics chaired" ) ;
System.out.println ( "agri repay hurting" ) ;
System.out.println ( "homicide prognosis colombian pandemic await" ) ;
System.out.println ( "mpc fob sparse corridors sont" ) ;
System.out.println ( "mcdowell fossils victories dimage" ) ;
System.out.println ( "chemically" ) ;
System.out.println ( "fetus" ) ;
System.out.println ( "determinants compliments durango cider noncommercial" ) ;
System.out.println ( "opteron crooked gangs segregation superannuation" ) ;
System.out.println ( "nemo ifs overcast" ) ;
System.out.println ( "inverted lenny" ) ;
System.out.println ( "achieves haas wimbledon documentaries mpa" ) ;
System.out.println ( "rao remake" ) ;
System.out.println ( "arp braille forehead physiopathology skye" ) ;
System.out.println ( "seperate" ) ;
System.out.println ( "econpapers arxiv pax kalamazoo" ) ;
System.out.println ( "taj percy scratches" ) ;
System.out.println ( "conan lilac sinus maverick" ) ;
System.out.println ( "intellect charmed denny harman hears" ) ;
System.out.println ( "wilhelm" ) ;
System.out.println ( "nationalism pervasive auch enfield" ) ;
System.out.println ( "anabolic" ) ;
System.out.println ( "nie allegra lexar clears videotape" ) ;
System.out.println ( "educ knowingly pivot" ) ;
System.out.println ( "amplification larsen huron" ) ;
System.out.println ( "snippets" ) ;
System.out.println ( "undergraduates conserv digestion dustin wsop" ) ;
System.out.println ( "mixtures composites wolverhampton soaring" ) ;
System.out.println ( "dragging virtues banning flushing" ) ;
System.out.println ( "deprivation cpt delights" ) ;
System.out.println ( "gauteng foreword glide transverse" ) ;
System.out.println ( "ftc watertown pathogens engagements mft" ) ;
System.out.println ( "withstand uefa newbury authorizes blooms" ) ;
System.out.println ( "soar jacking radiohead uniformly" ) ;
System.out.println ( "ooh subsections todos definately bod" ) ;
System.out.println ( "piedmont yin" ) ;
System.out.println ( "tiki empowered homepages asi" ) ;
System.out.println ( "lena" ) ;
System.out.println ( "outlying" ) ;
System.out.println ( "slogan subdivisions" ) ;
System.out.println ( "handouts deducted ezekiel totaling" ) ;
System.out.println ( "elijah cpm marvelous bop asnblock" ) ;
System.out.println ( "compton stretches vigorous biloxi" ) ;
System.out.println ( "flee biscuit creme submits" ) ;
System.out.println ( "woes waltz menace emerges" ) ;
System.out.println ( "classify paige downstairs statesman indymedia" ) ;
System.out.println ( "clapton" ) ;
System.out.println ( "cheerful blush beyonce" ) ;
System.out.println ( "smf leaflet monde weymouth" ) ;
System.out.println ( "nabble spherical intracellular infoworld" ) ;
System.out.println ( "favourable informs" ) ;
System.out.println ( "boyz dramas cher" ) ;
System.out.println ( "waltham" ) ;
System.out.println ( "geisha billiard" ) ;
System.out.println ( "aut dblp" ) ;
System.out.println ( "briefcase malay unseen" ) ;
System.out.println ( "mcmahon optimism cq silica kara" ) ;
System.out.println ( "mcgregor modal marlboro grafton" ) ;
System.out.println ( "unusually phishing addendum widest" ) ;
System.out.println ( "foia" ) ;
System.out.println ( "impotence medley cadet redskins" ) ;
System.out.println ( "kirsten temper yorker memberlistmemberlist gam" ) ;
System.out.println ( "intravenous ashcroft" ) ;
System.out.println ( "asserts" ) ;
System.out.println ( "loren stew newsfeed hereafter carbs" ) ;
System.out.println ( "retiring smashing yakima accumulate realtones" ) ;
System.out.println ( "xtc vdata interpro" ) ;
System.out.println ( "tahiti" ) ;
System.out.println ( "engadget" ) ;
System.out.println ( "tracey wac mariner collier" ) ;
System.out.println ( "hush darfur fragmentation" ) ;
System.out.println ( "behavioural kiev paranormal whispered generosity" ) ;
System.out.println ( "vibrating glossaries sonyericsson lama" ) ;
System.out.println ( "artisan akin raphael dex lola" ) ;
System.out.println ( "embarrassing emoticons carbohydrates aqueous pembroke" ) ;
System.out.println ( "hms norwood" ) ;
System.out.println ( "appetizers" ) ;
System.out.println ( "stockholders webmin lillian stylesheet" ) ;
System.out.println ( "goldstein splinter ibn wnba preferable" ) ;
System.out.println ( "englewood juices ironically morale morales" ) ;
System.out.println ( "solder trench asf persuasion" ) ;
System.out.println ( "hottie stripper practise pfc" ) ;
System.out.println ( "adrenaline mammalian" ) ;
System.out.println ( "opted lodged revolt meteorology analyzes" ) ;
System.out.println ( "renders" ) ;
System.out.println ( "pioneering" ) ;
System.out.println ( "pristine francaise" ) ;
System.out.println ( "ctx shines catalan spreadsheets regain" ) ;
System.out.println ( "resize auditory applause" ) ;
System.out.println ( "medically tweak" ) ;
System.out.println ( "mmm trait popped" ) ;
System.out.println ( "busted" ) ;
System.out.println ( "alicante basins farmhouse pounding" ) ;
System.out.println ( "picturesque ottoman graders shrek" ) ;
System.out.println ( "eater universidad" ) ;
System.out.println ( "tuners" ) ;
System.out.println ( "utopia slider insists cymru" ) ;
System.out.println ( "fprintf" ) ;
System.out.println ( "willard irq lettering" ) ;
System.out.println ( "dads" ) ;
System.out.println ( "marlborough sdl ebusiness" ) ;
System.out.println ( "pouring hays cyrus concentrating soak" ) ;
System.out.println ( "buckingham courtroom" ) ;
System.out.println ( "hides" ) ;
System.out.println ( "goodwin manure savior" ) ;
System.out.println ( "dade secrecy" ) ;
System.out.println ( "wesleyan baht" ) ;
System.out.println ( "duplicated" ) ;
System.out.println ( "dreamed" ) ;
System.out.println ( "relocating fertile hinges" ) ;
System.out.println ( "plausible creepy synth filthy subchapter" ) ;
System.out.println ( "ttf narrator optimizations" ) ;
System.out.println ( "infocus bellsouth sweeney augustus" ) ;
System.out.println ( "aca fpo fahrenheit" ) ;
System.out.println ( "hillside" ) ;
System.out.println ( "standpoint" ) ;
System.out.println ( "layup laundering" ) ;
System.out.println ( "nationalist piazza" ) ;
System.out.println ( "fre" ) ;
System.out.println ( "denoted nazis" ) ;
System.out.println ( "cumfiesta" ) ;
System.out.println ( "oneself royalties newbies mds piles" ) ;
System.out.println ( "abbreviation" ) ;
System.out.println ( "vaginas" ) ;
System.out.println ( "blanco" ) ;
System.out.println ( "critiques" ) ;
System.out.println ( "stroll" ) ;
System.out.println ( "anomaly thighs" ) ;
System.out.println ( "boa expressive infect" ) ;
System.out.println ( "bezel avatars pers" ) ;
System.out.println ( "twiztid dotted frontal havoc" ) ;
System.out.println ( "ubiquitous arsenic synonym facilitation ncr" ) ;
System.out.println ( "xb" ) ;
System.out.println ( "voc yer rts doomed applets" ) ;
System.out.println ( "francs ballad pdfs sling" ) ;
System.out.println ( "contraction cac devised" ) ;
System.out.println ( "teh explorers billie undercover" ) ;
System.out.println ( "substrates evansville joystick knowledgebase forrester" ) ;
System.out.println ( "ravens xoops rican underline" ) ;
System.out.println ( "obscene uptime dooyoo spammers" ) ;
System.out.println ( "mes hymn" ) ;
System.out.println ( "continual" ) ;
System.out.println ( "nuclei gupta tummy axial" ) ;
System.out.println ( "slowed aladdin tolerated quay aest" ) ;
System.out.println ( "outing instruct wilcox topographic westport" ) ;
System.out.println ( "overhaul majordomo peruvian indemnity lev" ) ;
System.out.println ( "imaginative weir wednesdays" ) ;
System.out.println ( "burgers rai remarked portrayed" ) ;
System.out.println ( "watchlist clarendon campers phenotype" ) ;
System.out.println ( "countrywide" ) ;
System.out.println ( "ferris julio affirm directx" ) ;
System.out.println ( "spelled epoch mourning resistor phelps" ) ;
System.out.println ( "aft bhd plaid audubon fable" ) ;
System.out.println ( "rescued commentsblog snowmobile exploded" ) ;
System.out.println ( "publ" ) ;
System.out.println ( "cpg" ) ;
System.out.println ( "padres scars whisky" ) ;
System.out.println ( "tes uptown susie" ) ;
System.out.println ( "subparagraph batter weighting reyes rectal" ) ;
System.out.println ( "vivian" ) ;
System.out.println ( "nuggets silently" ) ;
System.out.println ( "pesos shakes" ) ;
System.out.println ( "dram mckinney impartial hershey" ) ;

Is there something else that could be causing this token substitution in the output? It is likely doing the same thing in the training and validation datasets, which may be why my translation accuracy is reduced.
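One possibility worth ruling out, offered as an assumption rather than a confirmed diagnosis: the GOLD line may be printed after the reference has been mapped through the target vocabulary, so any whitespace-separated token that is not in that vocabulary would display as <unk> even though the file on disk is unchanged. The sketch below imitates that effect in plain Python without calling OpenNMT-py. The helpers build_vocab and mask_oov, the UNK constant, and the two training lines are hypothetical; only the test line is taken from the file above.

```python
from collections import Counter

UNK = "<unk>"


def build_vocab(lines, max_size=None, min_freq=1):
    """Collect whitespace-separated tokens, keeping the most frequent ones."""
    counts = Counter(tok for line in lines for tok in line.split())
    return {tok for tok, c in counts.most_common(max_size) if c >= min_freq}


def mask_oov(line, vocab):
    """Replace every token that is missing from `vocab` with UNK."""
    return " ".join(tok if tok in vocab else UNK for tok in line.split())


# Hypothetical training targets; in practice, read the training target file.
train_tgt = [
    'System.out.println ( "hello" ) ;',
    'System.out.println ( "hello world" ) ;',
]
# One line from the test target file shown above.
test_tgt = ['System.out.println ( "tic mississauga dialysis filmed" ) ;']

vocab = build_vocab(train_tgt)
masked = [mask_oov(line, vocab) for line in test_tgt]
print(masked[0])  # System.out.println ( <unk> <unk> <unk> <unk> ) ;
```

If the real training target file rarely or never contains the words used in the test strings, the same pattern as in the GOLD lines would be expected: the fixed tokens System.out.println, (, ) and ; survive, while each argument word is replaced.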