Yes, I'll have to wait for another file to calculate.
But let's keep in touch if we get a file like that. We may need many tries before getting to a conclusion.
For Thai documents, this type of word count is not available; it only applies to Western languages.
Thank you, nabylm, for your points, but I would like to offer, in a friendly spirit, some different ideas.
You mentioned: "As far as I know, the concept of "Wordcount" for us Westerners does not quite exist in languages such as Thai for instance. Indeed, there is no spacing between words, which makes it very delicate to count for a software". Well, this might not be 100% true: in my own experience, MS Word can count Chinese characters as easily as it counts English words. The concept of "word count" still exists in Asian languages such as Thai, Chinese, Japanese and Korean; more precisely, we would call it "character count" to reflect the actual nature of these texts.
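As a rough illustration of the difference between the two units, here is a minimal Python sketch that counts CJK ideographs one by one and counts everything else by whitespace-delimited tokens. This is an assumed rule for illustration, not MS Word's documented counting algorithm; it only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and it does nothing useful for Thai, which writes words without spaces and would need dictionary-based segmentation.

```python
import re

# Assumed rule (not MS Word's documented algorithm): every ideograph in the
# basic CJK Unified Ideographs block counts as one unit.
CJK = re.compile(r"[\u4e00-\u9fff]")

def count_units(text: str) -> dict:
    """Return CJK character count, whitespace-delimited word count, and their sum."""
    cjk_chars = len(CJK.findall(text))
    # Replace each ideograph with a space so the remaining text splits cleanly.
    non_cjk = CJK.sub(" ", text)
    words = len(non_cjk.split())
    return {"cjk_chars": cjk_chars, "words": words, "total": cjk_chars + words}

print(count_units("Hello world 你好世界"))
# {'cjk_chars': 4, 'words': 2, 'total': 6}
```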
I have been working as a translator in the English<>Chinese language pair and am naturally keen on checking the ratio between the Chinese character count and the English word count. I can say with a reasonable level of certainty that 1,000 Chinese characters translate into approximately 600-700 English words, and 1,000 English words into about 1,500-1,700 Chinese characters, varying with the nature of the source content and the target writing style. I assume Thai<>English should also have some sort of character-to-word ratio like this; perhaps you could get it confirmed by an experienced Thai translator?
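To turn those rule-of-thumb ratios into quick estimates, here is a minimal Python sketch. The ratio ranges are the ones quoted above; the names `ZH_CHARS_TO_EN_WORDS`, `EN_WORDS_TO_ZH_CHARS` and `estimate_range` are illustrative only, and the ranges remain approximate and dependent on content and writing style.

```python
# Approximate English<>Chinese ratios quoted above (content- and style-dependent).
ZH_CHARS_TO_EN_WORDS = (0.60, 0.70)  # 1,000 Chinese chars -> 600-700 English words
EN_WORDS_TO_ZH_CHARS = (1.50, 1.70)  # 1,000 English words -> 1,500-1,700 Chinese chars

def estimate_range(count: int, ratio: tuple) -> tuple:
    """Scale a source count by the low and high ends of a ratio range."""
    low, high = ratio
    return round(count * low), round(count * high)

print(estimate_range(1000, ZH_CHARS_TO_EN_WORDS))  # (600, 700)
print(estimate_range(2500, EN_WORDS_TO_ZH_CHARS))  # (3750, 4250)
```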
Cheers.
Hello lingotext,
Thank you for sharing your ratios and experience.
We are discussing one of the most controversial topics in linguistics, since we cannot define exactly what a word is. My Romance-language mind likes to segment text using the potential pause as the boundary of a word, but that is not enough, and it will not work for Chinese, Thai, Japanese, etc.
I am about to receive a Thai translation and will share the output when it is ready.
Hi lingotext,
If you say that MS Word can count Chinese characters as well as English words, I believe you and will use it in the future.
I think the only way to get an idea of the word count is to establish a relation/ratio between characters and words for each language. In your case, a ratio of about 1.5 for English>Chinese could be used as a reference. What you can do is pretranslate the document and count the target. You can also ask a native speaker to give you an approximate word count, so you can compare it with the word count reported for the target and maybe derive a relation (an equation of sorts) between word count and character count.
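The "equation of sorts" suggested above can be sketched under one assumption: that target characters grow roughly in proportion to source words. In that case a least-squares line through the origin gives the ratio as k = Σ(w·c) / Σ(w²). The helper names and sample pairs below are hypothetical; real pairs would come from pretranslated (or natively counted) documents, ideally from the same domain.

```python
def fit_ratio(pairs):
    """Least-squares slope k for target_chars ~= k * source_words (line through origin)."""
    num = sum(w * c for w, c in pairs)
    den = sum(w * w for w, _ in pairs)
    return num / den

def estimate_target_chars(source_words, k):
    """Predict the target character count for a new source document."""
    return round(source_words * k)

def estimate_source_words(target_chars, k):
    """Invert the relation: approximate word count from a character count."""
    return round(target_chars / k)

# Hypothetical pretranslated samples: (English words, Chinese characters).
samples = [(100, 150), (200, 300), (400, 620)]
k = fit_ratio(samples)
print(f"{k:.2f}")                       # 1.54
print(estimate_target_chars(1000, k))   # 1538
```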
However, the ratio for Thai, Japanese, Hindi or Korean may differ (thanks, danielr, for sharing with us what you get from your ongoing project).
In addition, the 1.5 ratio for English>Chinese doesn't take the subject domain (legal, medical, literature, etc.) into account.
One more question: do you think it would be more accurate to quote those projects by character count instead of word count?
Here is what I got from the Thai project. My version of Word reported 464 words in the English source, while the Thai target file had 126 words and 3,282 characters (without spaces). When I double-clicked a string of text, only one particular segment was selected and counted as a single word.
According to what I’ve read, Thai doesn’t work like Chinese or Japanese, because it combines the notions of character and word.
So approximately a 1 to 7 ratio !!!!
Are you sure?
I won't venture to state a ratio. I can't find a logical proportion.
Here are more numbers to see if we can find a way of calculating it. I have some files in English with 577 words and the target in Thai has 3314 words and 3834 characters.
And as a bonus, I had an English text with 300 words, and the Khmer target had 100 words.
It seems like we are still within that 1 (English word) to 7 (Thai characters) ratio...
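A quick check of that ratio against the two English>Thai files posted in this thread, as a minimal Python sketch. Only the counts come from the posts; the function names are illustrative. Two files are far too few to settle on a ratio, and the second post does not say whether its 3,834 characters include spaces.

```python
# (English source words, Thai target characters) as posted in this thread.
# File 1: 3,282 characters without spaces; file 2: 3,834 characters
# (not stated whether spaces were included).
samples = [(464, 3282), (577, 3834)]

def per_file_ratios(pairs):
    """Thai characters per English word for each file."""
    return [chars / words for words, chars in pairs]

def pooled_ratio(pairs):
    """Total Thai characters divided by total English words."""
    total_words = sum(w for w, _ in pairs)
    total_chars = sum(c for _, c in pairs)
    return total_chars / total_words

for r in per_file_ratios(samples):
    print(f"{r:.2f}")                         # 7.07, then 6.64
print(f"pooled: {pooled_ratio(samples):.2f}")  # pooled: 6.84
```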