DeutschDB: German dictionary with YouTube native pronunciation

Hi all! I would like to ask for a review of my German dictionary client-server project. I was not satisfied with the features offered by existing resources:

  • Lack of translations for both the words and their usage examples, including the prepositions the words take in German.

  • Absence of compound words in databases, which are very common in German.

  • Other sites use AI-generated audio for pronunciation. I chose to show timestamped YouTube videos of native speakers instead.

  • And some other reasons.

The server can be deployed locally, and the interface is accessible via browser. It can be used from a mobile phone, though it is not yet fully adapted for it.

Currently, only Ukrainian translations are available. I had difficulty finding correct translations for more than a small number of words. I solved this by using the Gemini API to generate translations together with usage examples. :+1:

Full README with the web interface and usage examples:

Thanks. Have a nice day. :blush:

How… why… wait… eh? :ferris_what:

Why is this screenshot in your README showcasing, in the section labelled “Konjunktiv I”, a form that looks like a Konjunktiv I of a Futur II form (if such a thing even were to exist)?



And this one seems highly suboptimal, too:

No German speaker will associate just “lustig machen” on its own with “mock / ridicule”. It always has to be reflexive (with the pronoun “sich” or “mich / dich / uns / euch”) and take the preposition “über” for its object. In the infinitive (translating to/from English because I don’t speak Ukrainian), you should thus never learn something like “mock”=“lustig machen”, but rather “[to] mock someone”=“sich über jemanden lustig machen”. That way you’re not missing crucial information when actually using it. [These infinitives can appear in actual sentences like that, e.g. “With satire, you’re allowed to mock someone.” = “Bei Satire darf man sich über jemanden lustig machen.”]

For English readers: this is a bit as if “shoot oneself in the foot” were learned as “shoot in the foot”, or “make fun of” were learned as “make fun”.


Wiktionary encodes this information like this:

and a German-to-German dictionary would often present it like this:

(here, “jmd.” is short for “jemand” = “someone” [and “jmdn.” is “jemanden”], and “etw.” for “etwas” = “something”; the comma represents alternative options)

One note about dative reflexives:

The addition of “sich” doesn’t fully determine the usage pattern, since “sich” could be either accusative or dative (and e.g. in 1st person “mich” vs “mir” would then need to be different!)
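A tiny illustration of that point as a Python sketch (the table and function names are made up for this example): the reflexive pronoun only differs between accusative and dative in the 1st and 2nd person singular, so an entry that just says “sich” can’t tell a learner whether to say “mich” or “mir”.

```python
# Reflexive pronouns by grammatical person, for accusative and dative.
REFLEXIVE = {
    "Akk": {"ich": "mich", "du": "dich", "er/sie/es": "sich",
            "wir": "uns", "ihr": "euch", "sie/Sie": "sich"},
    "Dat": {"ich": "mir", "du": "dir", "er/sie/es": "sich",
            "wir": "uns", "ihr": "euch", "sie/Sie": "sich"},
}

def reflexive(person: str, case: str) -> str:
    """Return the reflexive pronoun for a person ("ich", "du", ...) and a case ("Akk"/"Dat")."""
    return REFLEXIVE[case][person]

# Persons for which a bare "sich" is ambiguous, i.e. where the two cases differ.
ambiguous = [p for p in REFLEXIVE["Akk"] if REFLEXIVE["Akk"][p] != REFLEXIVE["Dat"][p]]
print(ambiguous)  # ['ich', 'du']
```

So “Ich mache mich über ihn lustig” (accusative) and “Ich stelle mir das vor” (dative) look identical in the dictionary form “sich …”, which is why the case has to be stored somewhere.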

Accusative is much more common, so dictionaries seem to mark the dative case with an additional marker. Here are Wiktionary and DWDS again as examples:

You can also see here that the same verb can have different usage patterns tied to different meanings of the verb; I’m not sure how best to turn this into flash cards.

For “sich über jmd. lustig machen” there is only one grammatical usage pattern anyway, so the DWDS page includes this pattern in the infinitive that labels the entry itself; for “vorstellen” the patterns only appear in the list of “Bedeutungen” (meanings) :wink:
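One possible way to store this in an app, sketched in Python around a hypothetical `UsagePattern` record (not how Wiktionary, DWDS or the app actually store it): keep the case of “sich”, the direct objects and a prepositional object as separate fields, and mark only the dative explicitly when rendering, following the convention described above.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class UsagePattern:
    """Hypothetical model of one usage pattern of a verb entry."""
    infinitive: str                                   # e.g. "lustig machen"
    reflexive: Optional[str] = None                   # None, "Akk" or "Dat"
    objects: list[str] = field(default_factory=list)  # e.g. ["etw."] or ["jmdn."]
    prep_object: Optional[Tuple[str, str]] = None     # e.g. ("über", "jmdn.")

    def label(self) -> str:
        """Render the pattern dictionary-style, marking only a dative "sich"."""
        parts = []
        if self.reflexive == "Dat":
            parts.append("sich [Dat.]")     # the rarer dative gets an explicit marker
        elif self.reflexive == "Akk":
            parts.append("sich")            # accusative is the unmarked default
        parts.extend(self.objects)
        if self.prep_object:
            parts.extend(self.prep_object)  # preposition followed by its object
        parts.append(self.infinitive)
        return " ".join(parts)

print(UsagePattern("vorstellen", reflexive="Dat", objects=["etw."]).label())
# sich [Dat.] etw. vorstellen
```

Since one verb can have several patterns tied to different meanings, an entry would hold a list of these, each linked to its own sense.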


And your 3rd of 3 screenshots is problematic, too:

These kinds of word clouds, especially for the purpose of language learning, should not be made all-lowercase. German capitalizes all nouns; lowercase nouns even look fairly weird to many German native speakers, and this kind of exposure certainly won’t help you get used to the correct spelling rules as a language learner.


I have no context on the state of your project; these are perhaps just mock-ups created before the implementation, but they don’t exactly motivate me to start looking into what the code has to offer :wink:

As a semi-regular user of youglish.com I can certainly appreciate the general project idea here, but if you’re asking me: be careful with erroneous content in language learning applications! This matters especially if you’re planning to rely heavily on AI-generated content. As a user I would appreciate knowing clearly which parts of the content I’m seeing are unreviewed AI-generated material and which parts come from human-created sources; for dictionary content it’s nice to mention the actual source[1].


  1. I don’t know whether there is any free-to-use dictionary data for German-to-Ukrainian in particular; but for grammar information about the word itself, you could probably rely on prior human-made, well-reviewed work, e.g. maybe even the data from Wiktionary.

Thank you very much for such a detailed response! I honestly didn’t expect it. The note about some words in the “trending” category needing to be capitalized is excellent - I hadn’t thought of that at all.

And yes, that was a terrible fuckup :flushed_face: with the Konjunktiv. I confused things during the layout process, and that table was actually supposed to be Konjunktiv Futur II. Thank you so much for pointing it out - I’ll fix it.

Regarding “lustig machen,” you’re absolutely right that it’s reflexive and used with “über.” This is actually indicated in the app itself for vocabulary (see image 1).

An interesting point about labeling AI content; I’ll think about how to incorporate that. This was only an initial version, and I’ll see how things develop from here. Since I’m also a user myself, I hope to reach a more or less reasonable version in the near future.

The project was indeed created under very tight time constraints, and I’m planning to keep developing it further. And yes, I was also inspired by youglish.com ))

Thanks again for your helpful feedback, and have a great day!

Ah, very cool. Late while writing my reply I noticed the case of “auf (Akk.) starren” and already suspected that this might be handled in some places :slight_smile:

Do you also manage to disambiguate the case where “sich” is Dative? (I presented the example of “sich [Dativ] etw. vorstellen” - another example I can come up with would be “sich [Dativ] etw., jmdn. anschauen”)


By the way, another IMO really cool piece of information[1] is the way that Duden presents the most important pronunciation information - place of word stress and vowel length (which sometimes is ambiguous) - by simply marking the length of the stressed vowel (unstressed syllables are almost never “long” anyway, at least outside of compound words or loan words, so it’s a sensible default IMHO). You can see that on their website

Examples

or it could also be combined into a single rendering together with syllable-splitting marks. These are technically mainly meant for spelling, but some spelling/pronunciation rules, e.g. about “st”/“sp” at the start of a syllable or about “d”/“g”/“b”/“s” at the end of a syllable, can also benefit from having this information around (besides, it may help you pronounce a super long German word part by part). [Here’s an official sample PDF from their German spelling dictionary that uses such a combined style: a dot/dash mark for the position and length of the stressed vowel, and “|” for possible hyphenation points.]
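As a rough illustration of the combined notation, here’s a Python sketch that joins hyphenation units with “|” and puts a combining dot below a short stressed vowel, or a combining line below (U+0331) under a long one, approximating the Duden-style marks. The input format (syllable list, stressed syllable index, vowel position, length flag) is a hypothetical data model, and vowel digraphs such as “ie” or diphthongs are not handled.

```python
import unicodedata

DOT_BELOW = "\u0323"   # combining dot below: short stressed vowel
LINE_BELOW = "\u0331"  # combining macron below: long stressed vowel

def duden_style(syllables: list[str], stressed: int, vowel_pos: int, is_long: bool) -> str:
    """Join syllables with '|' and mark the stressed vowel.

    syllables -- hyphenation units, e.g. ["Mut", "ter"]
    stressed  -- index of the stressed syllable
    vowel_pos -- index of the stressed vowel letter inside that syllable
    is_long   -- True if the stressed vowel is long
    """
    mark = LINE_BELOW if is_long else DOT_BELOW
    parts = list(syllables)
    syl = parts[stressed]
    # A combining mark goes directly after the letter it attaches to.
    parts[stressed] = syl[: vowel_pos + 1] + mark + syl[vowel_pos + 1 :]
    # NFC uses a precomposed letter where one exists (e.g. u + dot below).
    return unicodedata.normalize("NFC", "|".join(parts))

print(duden_style(["Mut", "ter"], 0, 1, False))  # short stressed u
print(duden_style(["Bahn", "hof"], 0, 1, True))  # long stressed a
```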


As one example, jpdb.io (which I sometimes use for English-to-Japanese information) has really extensive pitch-accent information for dictionary entries, and I can see an indicator (a symbol with a tooltip on hover) on some entries and not on others:

(In this particular case, the information is correct, the accent pattern of “Kyoto” does change in a compound word.[2])

I love the UI design here because it’s so minimal that it doesn’t get in my way, but it still conveys that some entries come from a dictionary or some other higher-reliability source, while other entries were generated automatically somehow: presumably in a best-effort manner, but still not reviewed individually.

I don’t actually know how they’re generated, but that’s beside the point :slight_smile: …

…I could imagine though: if a website has lots of generated content, some produced algorithmically and some with one LLM invocation per entry, I’d expect different kinds of error patterns to come up[3], and hence it might be useful to have different indicators. The same goes for a general pre-assessment of quality (e.g. if you know it, feel free to also indicate something like “this generated information is usually very spot on” vs “this generated information is sometimes wrong or unnatural”).
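A minimal sketch of such per-source indicators in Python; the three categories, the symbols and the tooltip texts are illustrative assumptions, not taken from jpdb.io or any other site. Reviewed dictionary content stays unmarked (optionally naming its source), while each kind of generated content gets its own small symbol with a hover tooltip.

```python
import html
from enum import Enum
from typing import Optional

class Source(Enum):
    """Where a piece of content came from (illustrative categories)."""
    DICTIONARY = "dictionary"    # human-made, reviewed source, e.g. Wiktionary
    ALGORITHMIC = "algorithmic"  # produced by rule-based code
    LLM = "llm"                  # one LLM call per entry, not individually reviewed

# Symbol plus tooltip per source; reviewed content gets no marker at all.
INDICATORS = {
    Source.DICTIONARY: None,
    Source.ALGORITHMIC: ("\u2699", "Generated by rules; irregular forms may be missed."),
    Source.LLM: ("\u2726", "AI-generated and not reviewed; may be wrong or unnatural."),
}

def render(text: str, source: Source, origin: Optional[str] = None) -> str:
    """Return an HTML snippet for `text`, annotated according to its source."""
    safe = html.escape(text)
    indicator = INDICATORS[source]
    if indicator is None:
        # Reliable content stays unmarked, optionally naming where it came from.
        return f"{safe} [{html.escape(origin)}]" if origin else safe
    symbol, tooltip = indicator
    return f'{safe} <span title="{html.escape(tooltip)}">{symbol}</span>'
```

Keeping the source as data on every entry also makes it easy to change the presentation later, or to let users hide generated content entirely.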


hah, truly it might actually exist (⇦ I do like that website, too, btw :wink:). I’m still struggling to come up with any actual use case of Futur II in Konjunktiv I, but it may just be my lack of imagination :sweat_smile:


  1. and I’m not a learner of German, but I suppose it might be quite useful for learners, too?

  2. I have yet to find more resources explaining the actual rules of how this works and why the accent pattern changes.

  3. The algorithmic approach might have a bug and consistently do some specific thing wrong – and/or an algorithmic approach might have missed certain cases of irregularity/exceptions.

    LLM invocations may just be randomly wrong – depending on the setup they might miss some relevant context – and of course, for example sentences they might be creating some unnatural examples (or – something I’ve also sometimes noticed – produce patterns that are unnaturally similar to English grammar / ways of expressing yourself, in languages that aren’t English) and things like that.

Thank you so much for also sharing links to resources. They will help not only with the app but also with learning the language in general. I have the Duden book at home as well, but I prefer to use the digital version.

I downloaded a sorted JSON dump of all words from the (Wiktionary) German dictionary; it’s about a gigabyte. As for that awkward Futur II Konjunktiv situation, I probably mixed it up myself during formatting, since the tense is indeed very rare :grin:. I’ve never heard it even once.
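If the dump is in JSON Lines format (one JSON object per line, as wiktextract-based exports are, to my knowledge), it can be processed as a stream instead of loading the whole gigabyte into memory. A Python sketch; the `word`/`pos` field names are assumptions about the dump’s schema, and if the file is instead one big JSON array, an incremental parser would be needed.

```python
import json
from typing import Iterable, Iterator

def iter_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line, never holding the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def words_with_pos(lines: Iterable[str], pos: str) -> Iterator[str]:
    """Yield the headwords of entries whose part of speech equals `pos`."""
    for entry in iter_entries(lines):
        if entry.get("pos") == pos:
            yield entry["word"]

# Usage with a (hypothetical) file name; file objects iterate line by line:
# with open("de-dump.jsonl", encoding="utf-8") as fp:
#     for noun in words_with_pos(fp, "noun"):
#         ...
```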

Your point about sich with accusative/dative is really helpful! :+1: I'll do it
Great observation about word stress as well.

I’m not happy with the navigation on the Duden website and the overload of ads there, but it does have some cool features that could be borrowed:

  • As you mentioned, stress marks :ok_hand:
  • I’ve also seen in reviews of other apps that users like when a word has a usage level (A1/A2/B1, etc.)

I was also very disappointed that I couldn’t find a properly working and stable API for German word forms. I think it would be useful to offer not only the UI but also a REST or GraphQL API. I will try.
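To give an idea of how small a read-only REST endpoint can start out, here’s a Python sketch using only a plain WSGI callable (runnable with the standard library’s `wsgiref`). The route `/api/words/<lemma>/forms`, the response shape and the in-memory `FORMS` table are hypothetical placeholders for the real database.

```python
import json

# Hypothetical in-memory data; a real service would query the database.
FORMS = {
    "sehen": {"ich": "sehe", "du": "siehst", "er/sie/es": "sieht",
              "wir": "sehen", "ihr": "seht", "sie/Sie": "sehen"},
}

def app(environ, start_response):
    """Minimal WSGI app: GET /api/words/<lemma>/forms returns stored forms as JSON."""
    # PEP 3333 delivers PATH_INFO decoded as latin-1; re-decode as UTF-8 for umlauts.
    path = environ.get("PATH_INFO", "").encode("latin-1").decode("utf-8")
    parts = path.strip("/").split("/")
    status, payload = "404 Not Found", {"error": "not found"}
    if (environ.get("REQUEST_METHOD") == "GET" and len(parts) == 4
            and parts[0] == "api" and parts[1] == "words" and parts[3] == "forms"):
        lemma = parts[2]
        if lemma in FORMS:
            status, payload = "200 OK", {"lemma": lemma, "present": FORMS[lemma]}
        else:
            payload = {"error": f"unknown lemma: {lemma}"}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    start_response(status, [("Content-Type", "application/json; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]

# Local try-out:
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, app).serve_forever()
```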

Thanks a lot for the detailed answers. On the site, I only used AI for translations. You are totally right that AI might mix things up. I also want to add examples from Wiktionary and Tatoeba.
The words themselves and their usage come from the database mentioned above. One of the advantages of building the app was the ability to find more and more complex or rare compound words and add them to the app. Such words are often missing on other sites with correct translations.
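For finding compounds, a common simple baseline is dictionary-based splitting: try to cover the word with known lexicon entries, allowing a linking element (Fugenelement such as “s” or “es”) between the parts. This Python sketch is only that baseline, not how the app works; the lexicon and the list of linking elements are illustrative, and it prefers the longest first part.

```python
from typing import Optional

# Common linking elements, including "none"; illustrative, not exhaustive.
LINKING = ("", "s", "es", "n", "en", "er", "e")

def split_compound(word: str, lexicon: set[str], min_len: int = 3) -> Optional[list[str]]:
    """Split `word` into lexicon entries joined by optional linking elements.

    Tries the longest possible first part first; returns None if the word
    cannot be fully covered by lexicon entries.
    """
    w = word.lower()
    if w in lexicon:
        return [w]
    # Both the head and the remainder must be at least `min_len` letters long.
    for i in range(len(w) - min_len, min_len - 1, -1):
        head = w[:i]
        if head not in lexicon:
            continue
        rest = w[i:]
        for link in LINKING:
            if rest.startswith(link) and len(rest) - len(link) >= min_len:
                tail = split_compound(rest[len(link):], lexicon, min_len)
                if tail:
                    return [head] + tail
    return None

print(split_compound("Bahnhofsuhr", {"bahnhof", "uhr"}))  # ['bahnhof', 'uhr']
```

Real compounds need more than this: umlauted plural first parts (“Wörterbuch”), a dropped final -e (“Schulbuch” from “Schule”), and genuinely ambiguous words like “Staubecken” (Staub + Ecken, or Stau + Becken).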

I’m planning to finish normalizing the database soon and will definitely upload it to GitHub. It may be quite large, but it will surely be useful to someone else as well!

Have a nice day, and Danke schön!

It was indeed a pain to produce ad-free screenshots.

I’m curious how you’re going about creating all the relevant word forms. If you do want to generate things yourself as much as possible, I guess you could look at Wiktionary, perhaps even at their source code: They use templating for a lot of things, and while the templating language is a pain to read, at least it does offer some view of how (a very completionist picture of) verb forms can actually be workable, in an environment that tries to minimize overly redundant information (to avoid mistakes) but even more highly values that the resulting tables are actually correct, as far as I can tell.

For example, take sehen – Wiktionary and follow the link to the inflection subpage Flexion:sehen – Wiktionary: its source view shows that the whole collection of tables comes from a single template call:

{{Deutsch Verb unregelmäßig|2=seh|3=sah|4=säh|5=gesehen|6=sieh|8=i|vp=ja|zp=ja|gerund=ja
|Imperativ (du)=sieh!<br />siehe}}

That template itself is documented on its own page Vorlage:Deutsch Verb unregelmäßig – Wiktionary, and whilst the source is an absolute mindf*ck to actually read, at least it’s available, and it might inspire some corner cases one could otherwise miss.
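If you want to reuse such data programmatically, the first step is just splitting the template call into its parameters. A Python sketch that handles only flat calls like the one above (no nested templates or links inside arguments); it deliberately does not interpret what the numbered parameters mean, since that is defined by the template’s documentation page.

```python
from typing import Dict, Tuple

def parse_template(call: str) -> Tuple[str, Dict[str, str]]:
    """Split a flat MediaWiki template call into its name and parameters."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call")
    name, *args = text[2:-2].split("|")
    params: Dict[str, str] = {}
    position = 1
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:
            # Named or explicitly numbered parameter; surrounding whitespace is trimmed.
            params[key.strip()] = value.strip()
        else:
            # Anonymous parameter: numbered by position, whitespace kept as-is.
            params[str(position)] = arg
            position += 1
    return name.strip(), params
```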

It would admittedly be much simpler if one can find any preexisting solution for generating the relevant information :sweat_smile: - but this is nonetheless a much more reliable basis than e.g. asking an AI to just come up with a “correct” conjugation logic (it will definitely fail to acknowledge some details.)

I'm for sure not a German teacher, but these are my takes on Futur II.

  1. Konjunktiv I Futur II examples:

    • "Bis heute Abend werde ich 10.000 Schritte gegangen sein."[1]
    • "Zum Abgabezeitpunkt werden wir den Fehler behoben haben."[2]
  2. Konjunktiv II Futur II examples:

    • "Bis heute Abend würde ich 10.000 Schritte gegangen sein, wenn ich genug Zeit gehabt hätte."[3]
    • "Zum Abgabezeitpunkt würden wir den Fehler behoben haben, wenn Peter nicht krank wäre."[4]

(Absolutely not sure about the Konjunktiv II Futur II examples)


  1. By tonight I will have walked 10,000 steps.

  2. We will have fixed the bug by the deadline.

  3. By tonight I would have walked 10,000 steps, if I had had enough time.

  4. We would have fixed the bug by the deadline, if Peter weren't sick.

Hmm. Is it meant to be a personal/Ukrainian-first project, above all? If not, I'm not sure I understand why you wouldn't start with some basic implementation in English. In addition to all the points brought up by @steffahn, English and German share the exact same language branch to begin with. Figuring out the "mapping" between the two is therefore going to be a whole lot simpler.

Design-wise, it looks fairly good? Although, without having tried the actual interface mechanics, it might be hard to say for sure. There's quite a bit of difference between a visually polished UI and an immediately intuitive UI/UX. Then again: do you have any plans to share it with the world at large, or is it mostly a personal learning experience? For the latter, feel free to take it wherever the vision leads. For the former, ironing out the grammatical inconsistencies would definitely be the start.

Another concern, which would be quite a major turn-off for me personally, is the predominance of politically charged vocabulary, which has little to nothing to do with actual, everyday, conversational German. Without some sort of filters, ideally ones people can opt in or out of at will, you'd be turning quite a few people away. Knowing the foreign terms for "Pakistan" and "Ukraine" and "Russia" and "Iran" isn't exactly the highest priority for most language learners, I'd imagine.

Minimizing the input from neural-network-powered autocomplete AI engines in favor of chiefly human-powered databases, akin to Wiktionary, might be another point to consider. Ideally, once again, behind an opt-in/out toggle. The online space is slowly cementing itself into quite a firm divide between those who are outright allergic to anything AI-related, and those who (wish to) believe their glorious linear-algebra-based NLP transformers will make them billionaires, which is bound to happen as soon as they discover the one prompt/agentic workflow to rule them all.
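The opt-in/opt-out idea could be as simple as a per-user preference object applied as a filter. A Python sketch; the `Entry`/`Preferences` fields and the topic tags are hypothetical, and the defaults show everything so that hiding content is an explicit user choice.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    lemma: str
    topics: set[str] = field(default_factory=set)  # e.g. {"politics"}
    ai_generated: bool = False                     # content not from a reviewed source

@dataclass
class Preferences:
    hidden_topics: set[str] = field(default_factory=set)  # topics the user opted out of
    show_ai_content: bool = True                          # toggle for generated content

def visible(entries: list[Entry], prefs: Preferences) -> list[Entry]:
    """Keep entries that match no hidden topic and respect the AI-content toggle."""
    return [
        e for e in entries
        if not (e.topics & prefs.hidden_topics)
        and (prefs.show_ai_content or not e.ai_generated)
    ]
```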

The former are going to have little to no patience when it comes to any inaccuracies an AI is (still) likely to make. The latter might not care as much, but then: why wouldn't they rather rely on their all too familiar chat pal to tokenize them a list of words/vocab/conjugation exercises they happen to care about the most, at that particular time of day? Something to consider, perhaps.

These are Indikativ :wink:

A Konjunktiv I (for “werden”) is only really working well (as a distinguished form) in 2nd and 3rd person singular: “Du werdest …” (different from “wirst”); “Er/sie/es werde” (different from “wird”).
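This can be checked mechanically: the Konjunktiv I is built from the infinitive stem plus the endings -e, -est, -e, -en, -et, -en (for almost every verb, “sein” being the notable exception), so comparing it with the irregular Indikativ Präsens of “werden” shows exactly where the forms differ. A small Python sketch, with the indicative forms listed by hand and illustrative names:

```python
PERSONS = ["ich", "du", "er/sie/es", "wir", "ihr", "sie/Sie"]
KONJ1_ENDINGS = ["e", "est", "e", "en", "et", "en"]

def konjunktiv1(stem: str) -> list[str]:
    """Konjunktiv I forms: infinitive stem plus the Konjunktiv I endings."""
    return [stem + ending for ending in KONJ1_ENDINGS]

# The Indikativ Präsens of "werden" is irregular, so it is listed explicitly.
WERDEN_INDIKATIV = ["werde", "wirst", "wird", "werden", "werdet", "werden"]

werden_k1 = konjunktiv1("werd")
distinct = [p for p, ind, k1 in zip(PERSONS, WERDEN_INDIKATIV, werden_k1) if ind != k1]
print(distinct)  # ['du', 'er/sie/es']
```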


That being said, “beheben” is a good choice, since I can think of a natural sounding use of Konjunktiv I Futur II in indirect speech:

Der Maintainer berichtet, er habe den Fehler bereits erkannt und werde das Problem in Kürze behoben haben.[1]


These sound alright to me :slight_smile:

One other natural use case, identical in form[2] to Konjunktiv II Futur II, is the “würde”-Form that can replace a Konjunktiv II (Plusquam)perfekt (at least/especially in colloquial use):

Wenn ich gestern mehr Zeit gehabt hätte, würde ich den Fehler längst behoben haben.[3]

identical in meaning to:

Wenn ich gestern mehr Zeit gehabt hätte, hätte ich den Fehler längst behoben.

or

Hätte ich gestern mehr Zeit gehabt, dann hätte ich den Fehler längst behoben.


A fun case where forms of Futur (though Futur I much more often than Futur II) will need to appear in Konjunktiv II[4] is for expressing relative future tense in narration/stories about the past (e.g. in a novel):

Damals wussten wir noch nicht, wie lange es noch dauern würde, bis wir diesen Krieg wieder beendet haben würden.[5]


  1. The maintainer reports that he has already identified the error and will have resolved the issue shortly.

  2. I’ll refrain from discussing the question of whether or not these are actually a form of “Futur”. Arguably the forms of “Futur” in German aren’t really grammatical forms to begin with but more like a collection of common use cases of a modal verb and/or helper verb “werden”.

    The main reason it’s called a future tense is probably just the strong influence of Latin grammar. Latin does have actual future verb tenses, you know, with their own suffixes on verbs and all :wink:

  3. If I’d had more time yesterday, I would have sorted it out in no time.

    (translations are just in meaning but not able to preserve or mirror the specifics of the German grammar of course, even though that grammar was the main point of discussion)

  4. and for this use-case they truly are very much future tenses, semantically

  5. At the time, we had no idea how long it would take before we had brought this war to an end.

    (of course, the same could be expressed in German a bit more concisely)

Haha damn. My examples are obviously direct speech. I was wondering why you said it was difficult for you to come up with examples, because those two came quite naturally to me. Especially compared to the other two, which feel contrived and not how anyone I know would speak at all.

Well, I was intrigued by this topic and felt inclined to try and see if maybe in my late 20s I finally understood the grammar of the language I claim to be a native speaker of. Safe to say: no. My school teachers wouldn't be surprised. Really lovely and humbling conversation. :sweat_smile:

Regarding the "starren" example, it's 3rd person plural: "die Schweizer" -> sie. (The conjugation is the same for all verbs and tenses, though. Still, I find the arrow confusing.)