Thursday, March 14, 2024

Improved translation handling provides ~800k more translations

Different Wiktionaries store their data in different ways. WikDict accounts for this and handles each one accordingly. Unfortunately, the data structure is not fully consistent even within a single Wiktionary. Recent changes relax some assumptions about the structure of translations, which allows more translations to find their way into the public WikDict data set. The results are especially beneficial for Turkish, Russian and Bulgarian, but they have a noticeable effect on nearly all language pairs.

See the GitHub issue if you are interested in more technical details of this change.

Friday, December 30, 2022

Compound Word Splitting in WikDict Search

Many languages allow building compound words by combining multiple words into a single word without spaces or dashes in between. Apart from very common compounds, these words are unlikely to be found in a dictionary, even though they are perfectly valid. To look them up, you have to split the word into its parts and look up each part separately. This is cumbersome, and very difficult for novice speakers. WikDict now alleviates this problem by attempting to split a compound word when it is not found directly in the dictionary.
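To illustrate the general idea, here is a minimal sketch of dictionary-based compound splitting. It is not the algorithm used by WikDict or wikdict-compound; the `VOCAB` set, the `split_compound` function and the `min_len` parameter are hypothetical names made up for this example, and a real implementation would look parts up in the actual dictionary data.

```python
# Toy vocabulary (hypothetical); a real lookup would query the dictionary data.
VOCAB = {"haus", "tür", "schlüssel", "boot", "fahrt"}


def split_compound(word, vocab=VOCAB, min_len=3):
    """Return a list of known parts that together spell `word`, or None."""
    word = word.lower()

    def solve(start):
        # The whole word has been consumed: nothing left to split.
        if start == len(word):
            return []
        # Try the longest candidate part first, then shorter ones.
        for end in range(len(word), start + min_len - 1, -1):
            part = word[start:end]
            if part in vocab:
                rest = solve(end)
                if rest is not None:
                    return [part] + rest
        # No combination of known parts covers the remainder.
        return None

    return solve(0)


print(split_compound("Haustürschlüssel"))
print(split_compound("Bootsfahrt"))
```

The first call splits the word into "haus", "tür" and "schlüssel". The second call finds no split, because this naive version does not know about linking elements such as the "s" in "Bootsfahrt"; handling those, and ranking multiple possible splits, is where most of the real work lies.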

The feature is in its early stages and both the list of supported languages (currently German, English, Finnish, Dutch and Swedish) and the accuracy are expected to improve over time. If you want to help out, have a look at the wikdict-compound repository. As always, feedback is very welcome!

Tuesday, November 16, 2021

New results page, inflections, WikDict everywhere

New results page

Here are before (left) and after (right) screenshots and a summary of the main changes compared to the previous non-beta version: