Sunday, July 24, 2016

Links now link to searches in WikDict

Previously, clicking on a term in the dictionary results led to the corresponding Wiktionary page. User feedback has shown that this is not what most users expect. Now every linked term starts a WikDict search, using the clicked term as the search text.

The Wiktionary links can now be found in the sidebar on the right instead. As always, feedback on this change is very welcome!

Sunday, April 24, 2016

Stemming support for English

All English entries are now searched using the Porter stemming algorithm, which means that more translations will be found when you search for a form other than the base form of a word. The most common case is searching for a plural (e.g. "stoats") and getting a translation for the singular ("stoat"), even though the plural form does not appear anywhere in the data set.
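The full Porter algorithm has several steps, but the plural case above is handled by its first step (1a) alone. The following minimal sketch implements only that step and assumes, hypothetically, that headwords are indexed by their stem and that a query is stemmed the same way before lookup; it is an illustration, not WikDict's actual code. A real setup needs the complete Porter stemmer, because later steps (for example turning a final "y" into "i") are required to match pairs such as "ponies" and "pony".

```python
from collections import defaultdict


def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter stemmer: strip common plural endings."""
    if word.endswith("sses"):   # caresses -> caress
        return word[:-2]
    if word.endswith("ies"):    # ponies -> poni
        return word[:-2]
    if word.endswith("ss"):     # caress -> caress
        return word
    if word.endswith("s"):      # stoats -> stoat
        return word[:-1]
    return word


def build_index(headwords):
    """Map each stem to the headwords that reduce to it (hypothetical index)."""
    index = defaultdict(list)
    for headword in headwords:
        index[porter_step_1a(headword.lower())].append(headword)
    return index


def lookup(index, query):
    """Stem the query exactly like the headwords, then look it up."""
    return index.get(porter_step_1a(query.lower()), [])


index = build_index(["stoat", "caress", "cat"])
print(lookup(index, "stoats"))    # ['stoat']
print(lookup(index, "caresses"))  # ['caress']
```

The key design point is symmetry: both the indexed words and the query go through the same stemmer, so "stoats" finds "stoat" even though the plural never appears in the data.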


Sunday, April 17, 2016

WikDict source code on Bitbucket

If you're interested in how WikDict works or you want to contribute any fixes or improvements, head straight to the WikDict Bitbucket page and have a look at the different parts of this project. If you need any help with one of the repositories, feel free to contact me for additional information.

Sunday, January 17, 2016

Filtering of HTML entities and tags

In some cases, HTML tags (<center>, <ref>, etc.) or entities (usually &nbsp;) remain in the data from dbnary, which WikDict uses as its input. I'm now using a basic HTML parser to handle these cases better.

Entities

From now on, all entities should be converted properly, resulting in
Gerät für Turnübungen, auf einem Gestell befestigter 10 cm breiter und 5 m langer Holzbalken
instead of
Gerät für Turnübungen, auf einem Gestell befestigter 10&nbsp;cm breiter und 5&nbsp;m langer Holzbalken
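Python's standard library illustrates what this conversion amounts to. In the sketch below, `html.unescape` decodes named and numeric character references; `decode_entities` is just a hypothetical wrapper name. Note that `&nbsp;` becomes U+00A0 (NO-BREAK SPACE) rather than an ordinary space: it looks the same but keeps "10" and "cm" together on one line.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references such as &nbsp; or &amp; with real characters."""
    return html.unescape(text)


raw = "befestigter 10&nbsp;cm breiter und 5&nbsp;m langer Holzbalken"
decoded = decode_entities(raw)
print(decoded)
# &nbsp; decodes to U+00A0 (NO-BREAK SPACE), not to an ordinary space
print("\u00a0" in decoded)  # True
```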

Stripped Tags

Most HTML tags will be ignored, leaving the text inside the tag untouched. However, some tags will be stripped along with their content, since that content is not relevant for the translation. One such tag is <ref>, resulting in
  • „fiktives Land, in dem absurde Verhältnisse herrschen“ als sinnbildhafte Bezeichnung für „unverständliche (absurde) politische Situationen“, „bestimmte Verhältnisse[, die] nicht nachvollziehbar sind“, für „etwas völlig Absurdes“
instead of
  • „fiktives Land, in dem absurde Verhältnisse herrschen“<ref></ref> als sinnbildhafte Bezeichnung für „unverständliche (absurde) politische Situationen“<ref name="WP"></ref>, „bestimmte Verhältnisse[, die] nicht nachvollziehbar sind“<ref name="WP"/>, für „etwas völlig Absurdes“<ref>, Stichwort »absurd«, Seite 45.</ref>
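One way to do this with Python's built-in `html.parser` is sketched below, as an illustration rather than WikDict's actual code. The parser drops every tag, keeps the text in between, and counts how deeply it is nested inside a tag whose content should be discarded. The set `DROP_CONTENT` is a hypothetical name; only `ref` is taken from the text above. A self-closing `<ref name="WP"/>` needs no special case, because the default `HTMLParser.handle_startendtag` calls the start and end handlers in turn. With `convert_charrefs=True`, entities in the remaining text are decoded as well.

```python
from html.parser import HTMLParser

# Tags whose content is irrelevant for translations (hypothetical list;
# only <ref> is mentioned in the post).
DROP_CONTENT = {"ref"}


class TagStripper(HTMLParser):
    """Drop all tags; also drop the content of tags listed in DROP_CONTENT."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp; etc. in text
        self.parts = []
        self.drop_depth = 0  # > 0 while inside a tag whose content is dropped

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT and self.drop_depth > 0:
            self.drop_depth -= 1

    def handle_data(self, data):
        if self.drop_depth == 0:
            self.parts.append(data)


def strip_tags(text: str) -> str:
    parser = TagStripper()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags('„fiktives Land“<ref></ref> als Bezeichnung<ref name="WP"/>, '
                 'für „etwas völlig Absurdes“<ref>, Stichwort »absurd«, Seite 45.</ref>'))
# „fiktives Land“ als Bezeichnung, für „etwas völlig Absurdes“
```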

Sub- and Superscripts

Sub- and superscript tags get special handling: in the most common cases, their content is converted to the corresponding Unicode characters. This produces beautiful chemical texts like
  • Organische Chemie: eine farblose, viskose Säure mit der Summenformel C₄H₆O₃
instead of
  • Organische Chemie: eine farblose, viskose Säure mit der Summenformel C<sub>4</sub>H<sub>6</sub>O<sub>3</sub>
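A minimal sketch of this conversion, again using Python's `html.parser`: it collects the text inside `<sub>` or `<sup>` and translates it with a character table, but only when every character has a Unicode subscript or superscript form (here digits, +, -, = and parentheses). Anything else, such as a letter without a dedicated form, is kept as plain text. The supported characters and the fallback are assumptions about what "the most common cases" means, and `ScriptConverter` and `convert_scripts` are hypothetical names, not WikDict's code.

```python
from html.parser import HTMLParser

# Subscript forms live at U+2080..U+208E. Superscripts 1-3 are Latin-1
# characters; the others live at U+2070 and U+2074..U+207E.
PLAIN = "0123456789+-=()"
SUB = str.maketrans(PLAIN, "".join(chr(0x2080 + i) for i in range(15)))
SUP = str.maketrans(PLAIN, "\u2070\u00b9\u00b2\u00b3"
                    + "".join(chr(0x2074 + i) for i in range(11)))


class ScriptConverter(HTMLParser):
    """Replace <sub>/<sup> content by Unicode sub-/superscripts where possible."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.mode = None   # "sub" or "sup" while inside such a tag
        self.buffer = []   # text collected inside the current sub/sup tag

    def handle_starttag(self, tag, attrs):
        if tag in ("sub", "sup") and self.mode is None:
            self.mode, self.buffer = tag, []

    def handle_endtag(self, tag):
        if tag == self.mode:
            text = "".join(self.buffer)
            table = SUB if tag == "sub" else SUP
            if all(ord(ch) in table for ch in text):
                text = text.translate(table)
            self.parts.append(text)  # unconvertible text stays as-is
            self.mode = None

    def handle_data(self, data):
        (self.buffer if self.mode else self.parts).append(data)


def convert_scripts(text: str) -> str:
    parser = ScriptConverter()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(convert_scripts("Summenformel C<sub>4</sub>H<sub>6</sub>O<sub>3</sub>"))
# Summenformel C₄H₆O₃
```

Converting all-or-nothing avoids mixed output such as a subscript "2" next to a plain "n" inside the same tag.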

Feedback

Did you find cases where these changes produce bad results, or where obvious improvements are possible? Let me know and I'll try to fix them.

Saturday, January 2, 2016

Word suggestions while typing

When searching in WikDict, you'll now get type-ahead suggestions for words.

This should make searching even faster and also help avoid spelling errors in complicated words. The five most important words starting with the typed text are shown, ranked by a rough importance metric based on the number of translations into different languages.
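A simple way to get this behaviour is to keep the headwords sorted, find the block of words sharing the typed prefix with a binary search, and pick the five highest-scoring ones. The sketch below assumes a precomputed score per word; the scores are made up and only stand in for the translation-based importance metric described above. It is an illustration, not WikDict's implementation.

```python
import bisect
import heapq


def suggest(prefix, sorted_words, importance, limit=5):
    """Return up to `limit` words starting with `prefix`, most important first."""
    # Binary search for the first word >= prefix; all matches follow contiguously.
    i = bisect.bisect_left(sorted_words, prefix)
    matches = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    return heapq.nlargest(limit, matches, key=lambda w: importance[w])


# Hypothetical scores, e.g. the number of languages a word has translations into.
importance = {"hot": 30, "house": 40, "houseboat": 5, "housefly": 3,
              "household": 12, "housewife": 8, "housing": 15}
words = sorted(importance)
print(suggest("hous", words, importance))
# ['house', 'housing', 'household', 'housewife', 'houseboat']
```

Because matching words are contiguous in a sorted list, the prefix lookup costs one binary search plus the number of matches, and `heapq.nlargest` avoids sorting all matches when only the top five are needed.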

As always, feedback is very welcome!