Description
Hi,
I use both stanford-corenlp and the German models, each in version 4.2.2, and found some odd behavior in the POS tags assigned to words such as "für" and "vor". These two words should normally be tagged as "ADP", but I often get "NOUN" or even "PROPN" for them.
For example, in the sentence "Welcher der Befunde ist für eine Gehirnerkrankung typisch?" the word "für" is tagged as "PROPN", and in "Welcher der Befunde ist am ehesten für hier am wahrscheinlichsten vorliegende Gehirnerkrankung typisch?" it is tagged as "NOUN".
However, in the sentences "Für wen ist das Essen?" and "Ich war reif für das Bett." the word "für" is correctly tagged as "ADP".
Here are the properties I used for the initialization:
(.setProperty properties "annotators" "tokenize, ssplit, mwt, pos, lemma, ner, parse, depparse")
(.setProperty properties "coref.algorithm" "neural")
(.setProperty properties "depparse.language" "german")
(.setProperty properties "ner.language" "de")
(.setProperty properties "tokenize.options" "untokenizable=noneDelete")
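To make the behavior easier to reproduce, here is a minimal sketch of how the tags can be printed for a single sentence. It assumes that `properties` is the `java.util.Properties` object configured above and that the CoreNLP jar and the German models jar are on the classpath. The helper name `pos-tags` is made up for illustration and is not part of CoreNLP.

```clojure
(import '(edu.stanford.nlp.pipeline StanfordCoreNLP CoreDocument))

(defn pos-tags
  "Annotates `text` with `pipeline` and returns a vector of [word tag] pairs.
  Hypothetical helper for this report, not a CoreNLP function."
  [pipeline text]
  (let [doc (CoreDocument. text)]
    ;; Runs every annotator configured in the pipeline on the document.
    (.annotate pipeline doc)
    ;; Each token is a CoreLabel; .word is the surface form, .tag the POS tag.
    (mapv (fn [token] [(.word token) (.tag token)])
          (.tokens doc))))

(comment
  ;; Assumption: `properties` was built with the setProperty calls shown above.
  (def pipeline (StanfordCoreNLP. properties))
  (pos-tags pipeline "Welcher der Befunde ist für eine Gehirnerkrankung typisch?")
  (pos-tags pipeline "Für wen ist das Essen?"))
```

Comparing the pair for "für" across the two calls should show the inconsistency described above.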
That is all the information I have. I hope it is helpful.
Best regards
Goldritter