Sunday, August 18, 2024

New languages, testers wanted: Catalan, Chinese, Irish, Kurdish

Improvements in dbnary allow WikDict to extend its support to four more languages:

  • Catalan
  • Chinese
  • Irish
  • Kurdish

This brings the total number of translations to an impressive 14.2 million!

Since I don't speak any of the new languages myself, I can't thoroughly check the results without your help. So if you use any of the new dictionaries, please report back with both successes and problems!

Thursday, March 14, 2024

Improved translation handling provides ~800k more translations

Different Wiktionaries store their data in different ways. WikDict knows this and handles the data accordingly. Unfortunately, the data structure is not fully consistent even within a single Wiktionary. Recent changes relax some assumptions about the structure of translations, which allows more translations to find their way into the public WikDict data set. The results are especially beneficial for Turkish, Russian and Bulgarian, but they have a noticeable effect on nearly all language pairs.

See the GitHub issue if you are interested in more technical details of this change.

Friday, December 30, 2022

Compound Word Splitting in WikDict Search

Many languages allow building compound words by combining multiple words into a single word without spaces or dashes in between. With the exception of very common compound words, these are unlikely to be found in a dictionary, even though they are perfectly valid words. To look up such a word in a dictionary, you have to split it into its parts and look up each of them separately. This is cumbersome and very difficult for novice speakers. WikDict now alleviates this problem by attempting to split a compound word when the word is not found directly in the dictionary.

The feature is in its early stages and both the list of supported languages (currently German, English, Finnish, Dutch and Swedish) and the accuracy are expected to improve over time. If you want to help out, have a look at the wikdict-compound repository. As always, feedback is very welcome!
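To illustrate the general idea of compound splitting, here is a minimal sketch in Python. It is not the algorithm used by wikdict-compound; the function name split_compound, the min_part_len parameter and the tiny vocabulary are assumptions made up for this example. It simply tries to cut the word into parts that are all present in a known word list.

```python
def split_compound(word, vocabulary, min_part_len=3):
    """Return a list of known words that concatenate to `word`, or None.

    Hypothetical helper for illustration only: a naive recursive split
    against a plain set of known words.
    """
    word = word.lower()
    if word in vocabulary:
        return [word]
    # Try the longest possible first part first, then shorter ones.
    for i in range(len(word) - min_part_len, min_part_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in vocabulary:
            rest = split_compound(tail, vocabulary, min_part_len)
            if rest is not None:
                return [head] + rest
    return None


# Tiny made-up vocabulary; a real dictionary would be much larger.
vocabulary = {"haus", "tür", "schlüssel"}
parts = split_compound("Haustürschlüssel", vocabulary)
```

A real splitter has to deal with more than this sketch does, for example linking elements between the parts (such as the "s" in many German compounds) and choosing between several possible splits.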

Tuesday, November 16, 2021

New results page, inflections, WikDict everywhere

New results page

Here are before (left) and after (right) screenshots and a summary of the main changes compared to the previous non-beta version:


Stronger grouping

The old page worked well for queries with few results, but for larger results the page became quite long and often contained less important or even redundant information (e.g. repeated pronunciations). To mitigate this, the new page groups different translations for the same word more strongly and shows the word type (noun, verb, adjective, etc.) along with the word.
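As a rough sketch of what such grouping can look like, the following Python snippet collects translations under a (word, word type) key and drops repeated translations. The row layout with the keys "word", "pos" and "translation" is an assumption for this example and not WikDict's actual data model.

```python
from collections import defaultdict


def group_translations(rows):
    """Group translation rows by (word, part of speech), without duplicates."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["word"], row["pos"])
        # Keep the first occurrence of each translation, preserving order.
        if row["translation"] not in groups[key]:
            groups[key].append(row["translation"])
    return dict(groups)


# Made-up sample rows for illustration.
rows = [
    {"word": "catch", "pos": "verb", "translation": "fangen"},
    {"word": "catch", "pos": "verb", "translation": "erwischen"},
    {"word": "catch", "pos": "noun", "translation": "Fang"},
    {"word": "catch", "pos": "verb", "translation": "fangen"},
]
grouped = group_translations(rows)
```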

Clear split between direct matches and idioms

When looking up an unknown word (e.g. "catch") you usually don't know if it is being used in the usual way or as part of an idiom with a different meaning (e.g. "catch up"). Therefore, WikDict has always displayed translations for idioms containing the search word.

In too many cases these idiom translations took up the largest part of the results page, although in the majority of cases they were not the translation the user was looking for. To deal with this, the idiom translations are now shown below the other translations in a more concise form (no pronunciation, no explanatory texts for disambiguation).
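The split between direct matches and idioms can be sketched as follows. This is only an illustration under simple assumptions: entries are dictionaries with a "word" key, and an entry counts as an idiom when it consists of several words and one of them equals the search word. WikDict's real matching is not described here and may well differ.

```python
def split_matches(query, entries):
    """Separate entries into direct matches and idioms containing the query."""
    query = query.lower()
    direct, idioms = [], []
    for entry in entries:
        headword = entry["word"].lower()
        if headword == query:
            direct.append(entry)
        elif query in headword.split():
            # Multi-word entry that contains the search word as one part.
            idioms.append(entry)
    return direct, idioms


# Made-up sample entries for illustration.
entries = [
    {"word": "catch"},
    {"word": "catch up"},
    {"word": "catch a cold"},
    {"word": "catcher"},
]
direct, idioms = split_matches("catch", entries)
```

With a split like this, the direct matches can be rendered in full, while the idioms are listed afterwards in a shorter form.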

Flags

Flags are shown to help identify at first glance which language is on which side. Languages actually don't have any flags, only countries do, and the mapping between countries and languages is far from perfect. But despite this, user feedback has made it clear that people are used to seeing flags for languages and even expect them to be there. So here they are!

Inflections

One of the driving forces behind the grouping and the more concise idioms was to reduce the clutter to make room for new information. I took this opportunity to show inflections for (non-idiom) translations when available. For verbs, this means that important conjugations are shown while for nouns irregular plural forms are displayed. Currently, this is supported for the following languages:

  • German
  • English
  • Swedish (definite forms are shown for nouns in addition to the plural)

This will be expanded to more languages in the future. If you care about a specific language, let me know which inflections would be a good choice. That increases the likelihood of your language being supported.

Feedback wanted!

For the first time since the very early days of WikDict, I made significant changes to the results page. Please send plenty of feedback on these changes (I don't get nearly as much as I would like)! I personally also liked the more minimalist version a lot, which is still available at beta.wikdict.com at the time of writing. But early feedback indicated that users were missing the clear split between the source language on the left and the target language on the right.

WikDict everywhere

Another major reason for changing the results page was that I have been working on the underlying data structures. Initially, WikDict only had a web interface and everything was focused on that one goal. But over time, more and more downloadable data formats have been added and recently I started working on a Gemini site, a DICT protocol server and a basic command line interface.

This made it quite obvious that I need some abstraction over all these different backends, or I won't be able to keep maintaining them all. The result is the changed grouping of translations you see on the new results page and the introduction of the wikdict-query helper library. All of this is still in a pretty early stage, but don't hesitate to reach out if you have any questions or would like to contribute in some way.
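To give an idea of what such an abstraction can look like, here is a small sketch in Python: one shared result structure that several output formats render in their own way. The Entry class and the two render functions are hypothetical names invented for this example; they are not the API of wikdict-query.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical shared result structure: one word with its translations."""
    word: str
    pos: str
    translations: list = field(default_factory=list)


def render_plain(entries):
    """Render entries as plain text, e.g. for a command line interface."""
    return "\n".join(
        f"{e.word} ({e.pos}): {', '.join(e.translations)}" for e in entries
    )


def render_gemtext(entries):
    """Render the same entries as gemtext, e.g. for a Gemini site."""
    lines = []
    for e in entries:
        lines.append(f"## {e.word} ({e.pos})")
        lines.extend(f"* {t}" for t in e.translations)
    return "\n".join(lines)


# Made-up sample data for illustration.
entries = [Entry("catch", "verb", ["fangen", "erwischen"])]
plain = render_plain(entries)
gemtext = render_gemtext(entries)
```

The point of such a design is that the lookup and grouping logic only has to exist once, while each output format stays a thin layer on top of it.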

Smaller changes

  • The translations have been updated with fresh data from Wiktionary, bringing the total number of translations up to 7.6 million (from 7.4 million).
  • The ordering of the results shown while typing has improved after a bug in the underlying data was fixed. Thanks to Sebastian Pipping for reporting this.
  • A crisper logo. Thanks again to Sebastian.