
All params should be optional for ngram tokenizer and edge ngram tokenizer #877

Closed
@39charactersisnotenoughforagoodusername

Description

Java API client version

8.14.3

Java version

17

Elasticsearch Version

8.14.3

Problem description

Hello! I'm running into a MissingRequiredPropertyException when trying to create indices that use ngram/edge ngram tokenizers with the Java client.

Minimal code repro - the settings are the same as in the docs, but with token_chars omitted:

String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\",\"min_gram\": 3,\"max_gram\": 3}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws

Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=162, offset=161)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.util.ApiTypeHelper.unmodifiableRequired(ApiTypeHelper.java:141)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:79)

A similar example throws for a missing maxGram when min_gram, max_gram, and token_chars are all omitted:

String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\"}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws

Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=134, offset=133)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:77)

This seems to be because the spec used to generate the Java client marks min_gram, max_gram, and token_chars as required for the ngram/edge ngram tokenizers, even though the docs list defaults for them (which are also backed by the server code and the Lucene defaults).
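
As a stopgap on my end (just my own workaround sketch, not an official fix), deserialization succeeds if the JSON spells out all three properties explicitly; the values below are simply the ones from the docs example, not a recommendation:

String settingsJson =
        "{\"settings\": {\"analysis\": {"
                + "\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},"
                + "\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\","
                // min_gram/max_gram/token_chars spelled out so the generated
                // NGramTokenizer deserializer finds its "required" properties
                + "\"min_gram\": 3, \"max_gram\": 3, \"token_chars\": [\"letter\", \"digit\"]}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));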

I can also confirm that creating an index via curl without specifying min_gram / max_gram / token_chars works.

kubectl exec es8-data-0 -- curl -XPUT "https://localhost:9200/test-index" -H "Content-Type: application/json" -d '{"settings": {"analysis": {"analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},"tokenizer": {"my_tokenizer": {"type": "ngram"}}}}}'

returns

{"acknowledged":true,"shards_acknowledged":true,"index":"test-index"}

The same is true for "type": "edge_ngram" as well.
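
For what it's worth, the typed builders generated from the same spec show the same constraint: an ngram/edge_ngram TokenizerDefinition can't be built without setting all three properties either. A sketch (using co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition and TokenChar; the values are illustrative only, since the builder gives no way to fall back to the documented defaults):

TokenizerDefinition def = TokenizerDefinition.of(d -> d
        .edgeNgram(e -> e
                .minGram(1)   // docs default is 1
                .maxGram(2)   // docs default is 2
                // docs default is [] (keep all characters), but the builder requires a value
                .tokenChars(TokenChar.Letter, TokenChar.Digit)));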

Metadata

Assignees

No one assigned

Labels

Area: Specification (Related to the API spec used to generate client code)
Category: Bug (Something isn't working)
