I was wondering whether there is anything specific about how XML tags are handled in general.
Is something done at the tokenization step, or later?
Or are they just tokenized as regular text (with the < tags > split into separate tokens)?
Hi, I would suggest using tokenize.lua with the -joiner_annotate option. It separates the XML characters [<>"] into their own tokens and marks them with joiners, so that valid XML tags can be regenerated correctly at detokenization.
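
To illustrate the idea behind joiner annotation, here is a minimal Python sketch. It is not the actual tokenize.lua implementation: the function names `tokenize` and `detokenize` are hypothetical, only the characters [<>"] are split, and the joiner is always placed on the left of a token, which is simpler than what the real tokenizer does. It also assumes the input uses single spaces and does not already contain the joiner character.

```python
import re

# Assumed joiner marker (U+FFED); any character absent from the data would do.
JOINER = "\uffed"

def tokenize(text):
    """Split on whitespace, isolate < > " and mark glued pieces with a joiner."""
    tokens = []
    for word in text.split():
        # The capturing group keeps the XML characters as separate pieces.
        pieces = [p for p in re.split(r'([<>"])', word) if p]
        for i, piece in enumerate(pieces):
            # Any piece after the first was attached to the previous one
            # in the source text, so it gets a joiner prefix.
            tokens.append(piece if i == 0 else JOINER + piece)
    return tokens

def detokenize(tokens):
    """Rebuild the text: a token starting with the joiner is glued to its left neighbour."""
    return " ".join(tokens).replace(" " + JOINER, "")

text = 'see <a href="x">link</a> here'
tokens = tokenize(text)
print(" ".join(tokens))
print(detokenize(tokens) == text)
```

Without the joiner marks, joining the tokens with spaces would give `< a href= " x " >`, which is no longer the original tag. The marks are what make the tokenization reversible.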