What language is this? Chinese? Japanese?
It’s Korean actually. Detecting this manually would have taken me a lot of time. Fortunately, I found some very accurate tools that can do this automatically. They are all listed below.
The experiment: I tested the websites using sample text (1-2 sentences with 8 words) from the following languages: Portuguese, Russian, Korean, Vietnamese, Italian, Turkish, Polish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Dutch, Filipino (Tagalog), Greek, Galician, Czech, Belorussian, Finnish, Tatar and Norwegian.
Overall, I tested 20 different languages.
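For anyone who wants to repeat this kind of comparison in code rather than by pasting text into each website, here is a minimal sketch of the scoring step. It is not how any of the tools below work: `toy_detect` is a hypothetical stand-in that only looks at the Unicode script of the first recognizable character, and the sample strings are my own short greetings, not the texts used in the experiment.

```python
from typing import Callable, Dict, List, Tuple


def score_detector(
    detect: Callable[[str], str], samples: Dict[str, str]
) -> Tuple[int, List[str]]:
    """Return (number of samples passed, list of languages that failed)."""
    failed = [lang for lang, text in samples.items() if detect(text) != lang]
    return len(samples) - len(failed), failed


def toy_detect(text: str) -> str:
    """Hypothetical stand-in detector: guesses from the Unicode script only."""
    for ch in text:
        if "\uac00" <= ch <= "\ud7a3":  # Hangul syllables
            return "Korean"
        if "\u0370" <= ch <= "\u03ff":  # Greek and Coptic block
            return "Greek"
        if "\u0400" <= ch <= "\u04ff":  # Cyrillic block (assumed Russian)
            return "Russian"
    return "unknown"


# Illustrative samples only (expected language -> text).
SAMPLES = {
    "Korean": "안녕하세요",
    "Greek": "Καλημέρα",
    "Russian": "Привет",
    "Ukrainian": "Привіт",
}

passed, failed = score_detector(toy_detect, SAMPLES)
print(f"passed {passed} out of {len(SAMPLES)}, didn't pass {', '.join(failed)}")
```

The toy detector gets Ukrainian wrong because it treats every Cyrillic text as Russian, which is exactly the kind of failure a real test set like the one above is meant to expose.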
3 Tools to Detect Unknown Language Text
1. LangId (passed 18 out of 20 tests, didn’t pass Tatar and Belorussian)
Pros: Overall, a great online tool. It offers basic text detection functionality, and they also have Twitter and email-detection bots for even quicker results.
Cons: Their engine is based on the Google API, although they seem to get better results than the Google detector described below, so they clearly know how to make good use of it. What I didn’t like is that they don’t have their own algorithm for detecting languages.
2. Google Language Detector (passed 17 out of 20 tests, didn’t pass Portuguese, Tagalog and Belorussian)
Pros: Google has one of the world’s best APIs for language detection. The good thing is that you can see the probability that the displayed result is correct. It passed most of the sample tests.
Cons: I was quite surprised that it didn’t pass the Portuguese test. It seems to have a (I hope temporary) bug with this language. The page design could also be much better.
3. What Language Is This (passed 11 out of 20 tests, didn’t pass Russian, Korean, Ukrainian, Azerbaijani, Macedonian, Tagalog, Greek, Galician and Tatar)
Pros: Some languages like the South Slavic ones (Serbian, Croatian, Slovenian) are quite similar. In case you enter some Croatian text, let’s say, this website will tell you that the text could also be Serbian or Slovenian.
Cons: They need to make their detection system more sophisticated. I was thinking of listing Translated.net (another website for language detection) instead of this one, but Translated promised detection of more languages and actually did worse than WhatLanguageIsThis.com.
2 Tools To Detect Websites In Unknown Languages
4. Google Translate with Detect Language as the first option
Passed: 18 out of 20, didn’t pass Belorussian and Tatar.
Pros: This tool does its job very well. The thing I like about Google Translate is that if it doesn’t support a specific language it gives you the following screen:
That’s a great language detector if you ask me!
5. Microsoft Bing Translator with Auto-Detect as the first option.
Passed: 8 out of 20, didn’t pass Dutch, Vietnamese, Turkish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Tagalog, Greek, Galician, Czech and Belorussian
Pros: It supports a limited number of languages. For those languages, it does its job well.
Cons: I am very disappointed with Microsoft. They support a very limited number of languages for detection and translation, and their Auto-Detect feature is terrible. If you enter text in a language they don’t support, you get a wrong result instead of a message saying the language isn’t supported.
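The difference between a wrong guess and an honest "I don’t know" can be sketched in a few lines. This is only an illustration under my own assumptions, not how Bing or Google actually work: `pick_language`, the candidate list format and the 0.5 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple


def pick_language(
    candidates: List[Tuple[str, float]], min_confidence: float = 0.5
) -> Optional[str]:
    """Return the most probable language, or None if no guess is confident enough.

    `candidates` is an assumed format: (language name, probability) pairs.
    """
    if not candidates:
        return None
    language, confidence = max(candidates, key=lambda c: c[1])
    # Refuse to answer instead of returning a low-confidence guess.
    return language if confidence >= min_confidence else None


confident = pick_language([("Dutch", 0.92), ("German", 0.05)])
uncertain = pick_language([("Danish", 0.31), ("Norwegian", 0.29)])
print(confident)  # a clear winner is returned as-is
print(uncertain)  # no candidate reaches the threshold, so the result is None
```

A caller can then show a "language not supported" message whenever the result is `None`, rather than presenting a wrong answer.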