WikDict builds on data extracted by the DBnary project. This project changed the way it stores data, which required adaptations on the part of WikDict. This is why WikDict data has not been updated over the last few months.
Now this work is finally done and new data is available in the web interface. It includes all changes made to the underlying Wiktionaries, as well as fixes for bugs that prevented some translations from showing up properly. Overall, this yields 22% more translations than the previous data from March 2017. As always, please let me know about any problems you encounter or suggestions for improvement.
Saturday, November 11, 2017
Sunday, February 19, 2017
Get translations while typing
Typeahead autocompletion is very helpful when the word you are looking for is long or hard to type. But it gets even better when the translation is shown along with the autocompletion. This is now available on WikDict.
Sunday, January 22, 2017
More Translations, less Noise
There have been a large number of changes to the generation of WikDict dictionaries. While many of them are related, some are included in this post simply to give you a good summary of what happened in recent months.
More Translations
Deriving Translations from Intermediate Languages
When a translation is not found in the dictionary, you could give up and tell the user that there is no such translation. Or you could try to use translations between other languages to give a (hopefully accurate) answer. Here's an example. Let's say the word "dog" can't be found in the English-German dictionary. You could then try to use French as an intermediate language:
dog (en) -> chien (fr)
chien (fr) -> Hund (de)
=> dog (en) -> Hund (de)
While this is useful, it can generate wrong translations due to ambiguities. WikDict tries to get the best of both worlds by applying a score that depends on several factors to rank and filter the results of this approach.
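The idea can be sketched in a few lines of Python. This is only an illustration, not WikDict's actual implementation: the toy dictionaries, the function name `derive_translations` and the scoring rule (counting how many intermediate-language paths lead to the same candidate) are all assumptions made for this example. The factors that go into the real score are not described here.

```python
from collections import defaultdict

# Hypothetical toy dictionaries: source word -> set of target words.
EN_FR = {"dog": {"chien"}, "bank": {"banque", "rive"}}
FR_DE = {"chien": {"Hund"}, "banque": {"Bank"}, "rive": {"Ufer"}}
EN_ES = {"dog": {"perro"}}
ES_DE = {"perro": {"Hund"}}

# Each entry pairs a source->pivot dictionary with a pivot->target dictionary.
PIVOTS = [(EN_FR, FR_DE), (EN_ES, ES_DE)]


def derive_translations(word, pivots, min_score=1):
    """Return (candidate, score) pairs, best first.

    The score is an assumed stand-in for a real scoring function:
    it counts how many pivot paths support each candidate.
    """
    scores = defaultdict(int)
    for src_to_pivot, pivot_to_tgt in pivots:
        for pivot_word in src_to_pivot.get(word, ()):
            for candidate in pivot_to_tgt.get(pivot_word, ()):
                scores[candidate] += 1
    # Rank by score (descending), then alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Filter out candidates with too little support.
    return [(cand, score) for cand, score in ranked if score >= min_score]


print(derive_translations("dog", PIVOTS))
print(derive_translations("bank", PIVOTS, min_score=2))
```

In this sketch, "dog" reaches "Hund" through both French and Spanish, so it gets a higher score than the candidates for the ambiguous "bank", each of which is supported by a single path and is dropped once `min_score` is raised to 2.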
Bug Fixes and Workarounds
Some bugs, especially one in the Virtuoso database, made it necessary to skip some translations. The known bugs have now been either fixed or worked around.
More Recent Data
As always, the people working on Wiktionary and DBnary aren't lazy, either. Their changes trickle down to WikDict with some delay and lead to visible improvements over time.
Less Noise
More Filtering and Better Sorting
The scoring mentioned above has also been used to improve the sorting of words, senses and translations, as well as to filter some less reliable results introduced when reading dictionaries in reverse.
Less Unparsed Markup in Senses
When senses/definitions for words are extracted from Wiktionary, quite a lot of different markup might be left inside those strings. WikDict got better at parsing those texts, so you will see less [[brackets]], <tags> and [1] left 1. over | numbers : or symbols than before. If you still see those, please let me know.
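To give an idea of what such a cleanup involves, here is a minimal Python sketch. It is not the parser WikDict uses: the function name `clean_sense` and the three regular expressions are assumptions for illustration, and they only cover wiki links, HTML-like tags and numeric reference markers.

```python
import re


def clean_sense(text):
    """Strip a few common kinds of left-over markup from a sense string."""
    # [[target|label]] -> label, [[word]] -> word
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove HTML-like tags such as <i> or </sup>, keeping their content.
    text = re.sub(r"<[^>]+>", "", text)
    # Remove numeric reference markers such as [1].
    text = re.sub(r"\[\d+\]", "", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(clean_sense("A [[domestic animal|pet]] that <i>barks</i> [1]"))
```

Real Wiktionary markup is far more varied (templates, nested links, language-specific conventions), so a handful of patterns like these can only remove the most frequent noise.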