Commit b64d86f
Add none chunking strategy to disable automatic chunking for inference endpoints (elastic#129150)
This introduces a `none` chunking strategy that disables automatic chunking when using an inference endpoint. It enables users to provide pre-chunked input directly to a `semantic_text` field without any additional splitting. The chunking strategy can be configured either on the inference endpoint or directly in the `semantic_text` field definition.

**Example:**

```json
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none" <1>
        }
      }
    }
  }
}
```
<1> Disables automatic chunking on `my_semantic_field`.

```json
PUT test-index/_doc/1
{
  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
  ...
}
```
<1> Pre-chunked input provided as an array of strings. Each array element represents a single chunk that will be sent directly to the inference service without further processing.
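The commit message notes that the strategy can also be configured on the inference endpoint itself, though only the field-level variant is shown above. A hypothetical sketch of the endpoint-level configuration (the endpoint id and service settings here are illustrative, not taken from this commit):

```json
PUT _inference/sparse_embedding/my-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "none"
  }
}
```

With this in place, any `semantic_text` field using `my-endpoint` would inherit the `none` strategy unless the field defines its own `chunking_settings`.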
1 parent cb59197 commit b64d86f

File tree

18 files changed (+396, -15 lines)

docs/changelog/129150.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -0,0 +1,6 @@
+pr: 129150
+summary: Add `none` chunking strategy to disable automatic chunking for inference
+  endpoints
+area: Machine Learning
+type: feature
+issues: []
```

docs/reference/mapping/types/semantic-text.asciidoc

Lines changed: 51 additions & 4 deletions

```diff
@@ -100,18 +100,19 @@ If not specified, the {infer} endpoint defined by `inference_id` will be used at
 (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.
 
 .Valid values for `chunking_settings`
 [%collapsible%open]
 ====
 `type`:::
 Indicates the type of chunking strategy to use.
-Valid values are `word` or `sentence`.
+Valid values are `none`, `word` or `sentence`.
 Required.
 
 `max_chunk_size`:::
-The maximum number of works in a chunk.
-Required.
+The maximum number of words in a chunk.
+Required for `word` and `sentence` strategies.
 
 `overlap`:::
 The number of overlapping words allowed in chunks.
@@ -123,6 +124,10 @@ The number of overlapping words allowed in chunks.
 Valid values are `0` or `1`.
 Required for `sentence` type chunking settings.
 
+WARNING: If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+model's limit.
+
 ====
 
 [discrete]
@@ -147,7 +152,49 @@ When querying, the individual passages will be automatically searched for each d
 
 For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.
 
-Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and the `semantic` query.
+You can also pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+[source,console]
+------------------------------------------------------------
+PUT test-index
+{
+    "mappings": {
+        "properties": {
+            "my_semantic_field": {
+                "type": "semantic_text",
+                "chunking_settings": {
+                    "strategy": "none" <1>
+                }
+            }
+        }
+    }
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> Disable chunking on `my_semantic_field`.
+
+[source,console]
+------------------------------------------------------------
+PUT test-index/_doc/1
+{
+    "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
+    ...
+}
+------------------------------------------------------------
+// TEST[skip:Requires inference endpoint]
+<1> The text is pre-chunked and provided as an array of strings.
+Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+** Some services (such as OpenAI) will return an error.
+** Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
+Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text`.
 
 [discrete]
 [[semantic-text-highlighting]]
```

server/src/main/java/org/elasticsearch/TransportVersions.java

Lines changed: 1 addition & 0 deletions

```diff
@@ -239,6 +239,7 @@ static TransportVersion def(int id) {
     public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM_8_19 = def(8_841_0_46);
     public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED_8_19 = def(8_841_0_47);
     public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK_ADDED_8_19 = def(8_841_0_48);
+    public static final TransportVersion NONE_CHUNKING_STRATEGY_8_19 = def(8_841_0_49);
 
     /*
      * STOP! READ THIS FIRST! No, really,
```

server/src/main/java/org/elasticsearch/inference/ChunkingStrategy.java

Lines changed: 2 additions & 1 deletion

```diff
@@ -15,7 +15,8 @@
 
 public enum ChunkingStrategy {
     WORD("word"),
-    SENTENCE("sentence");
+    SENTENCE("sentence"),
+    NONE("none");
 
     private final String chunkingStrategy;
```

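The enum addition above follows a common string-backed pattern: each constant carries its wire/JSON value. A minimal, self-contained sketch of that pattern (class name and the `fromString` helper are illustrative, not the real Elasticsearch API):

```java
import java.util.Locale;

public class ChunkingStrategySketch {
    // String-backed strategy enum mirroring the shape of ChunkingStrategy
    // after this commit: WORD, SENTENCE, and the new NONE.
    enum Strategy {
        WORD("word"),
        SENTENCE("sentence"),
        NONE("none");

        private final String value;

        Strategy(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }

        // Illustrative reverse lookup from the serialized value.
        static Strategy fromString(String s) {
            for (Strategy strategy : values()) {
                if (strategy.value.equals(s.toLowerCase(Locale.ROOT))) {
                    return strategy;
                }
            }
            throw new IllegalArgumentException("Unknown chunking strategy [" + s + "]");
        }
    }

    public static void main(String[] args) {
        System.out.println(Strategy.fromString("NONE")); // prints "none"
    }
}
```

The round trip matters because the strategy is both parsed from mapping JSON (`"strategy": "none"`) and serialized back out in lowercase.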
x-pack/plugin/inference/qa/test-service-plugin/src/main/java/org/elasticsearch/xpack/inference/mock/AbstractTestInferenceService.java

Lines changed: 9 additions & 1 deletion

```diff
@@ -25,6 +25,7 @@
 import org.elasticsearch.inference.TaskSettings;
 import org.elasticsearch.inference.TaskType;
 import org.elasticsearch.xcontent.XContentBuilder;
+import org.elasticsearch.xpack.inference.chunking.NoopChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunker;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 
@@ -126,7 +127,14 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
         }
 
         List<ChunkedInput> chunkedInputs = new ArrayList<>();
-        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
+        if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
+            var offsets = NoopChunker.INSTANCE.chunk(input.input(), chunkingSettings);
+            List<ChunkedInput> ret = new ArrayList<>();
+            for (var offset : offsets) {
+                ret.add(new ChunkedInput(inputText.substring(offset.start(), offset.end()), offset.start(), offset.end()));
+            }
+            return ret;
+        } else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
             WordBoundaryChunker chunker = new WordBoundaryChunker();
             WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
             List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(
```

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceNamedWriteablesProvider.java

Lines changed: 4 additions & 0 deletions

```diff
@@ -26,6 +26,7 @@
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
 import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
 import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
+import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
 import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;
@@ -553,6 +554,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
     }
 
     private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
+        namedWriteables.add(
+            new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
+        );
         namedWriteables.add(
            new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
         );
```

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkerBuilder.java

Lines changed: 1 addition & 0 deletions

```diff
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
     }
 
     return switch (chunkingStrategy) {
+        case NONE -> NoopChunker.INSTANCE;
        case WORD -> new WordBoundaryChunker();
        case SENTENCE -> new SentenceBoundaryChunker();
     };
```
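The builder's strategy-to-chunker dispatch can be pictured with a self-contained sketch. Interfaces are simplified (offsets are plain `int[]` pairs here), and all names are illustrative; only the switch shape comes from the diff above:

```java
import java.util.List;

public class ChunkerDispatchSketch {
    // Simplified chunker contract: return [start, end) offsets into the input.
    interface Chunker {
        List<int[]> chunk(String input);
    }

    enum Strategy { NONE, WORD, SENTENCE }

    // NONE maps to a no-op chunker that yields one chunk spanning the whole
    // input; the word/sentence chunkers are elided in this sketch.
    static Chunker fromStrategy(Strategy strategy) {
        return switch (strategy) {
            case NONE -> input -> List.of(new int[] { 0, input.length() });
            case WORD, SENTENCE -> throw new UnsupportedOperationException("not sketched");
        };
    }

    public static void main(String[] args) {
        int[] offsets = fromStrategy(Strategy.NONE).chunk("pre-chunked text").get(0);
        System.out.println(offsets[0] + ".." + offsets[1]); // prints "0..16"
    }
}
```

Using a singleton (`NoopChunker.INSTANCE` in the real code) rather than a fresh object per call works because the no-op chunker holds no per-request state.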

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/ChunkingSettingsBuilder.java

Lines changed: 1 addition & 0 deletions

```diff
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
         settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
     );
     return switch (chunkingStrategy) {
+        case NONE -> NoneChunkingSettings.INSTANCE;
        case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
        case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
     };
```
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoneChunkingSettings.java

Lines changed: 110 additions & 0 deletions (new file)

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.TransportVersion;
import org.elasticsearch.TransportVersions;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.ValidationException;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.inference.ChunkingStrategy;
import org.elasticsearch.xcontent.XContentBuilder;

import java.io.IOException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class NoneChunkingSettings implements ChunkingSettings {
    public static final String NAME = "NoneChunkingSettings";
    public static NoneChunkingSettings INSTANCE = new NoneChunkingSettings();

    private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
    private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());

    private NoneChunkingSettings() {}

    @Override
    public ChunkingStrategy getChunkingStrategy() {
        return STRATEGY;
    }

    @Override
    public String getWriteableName() {
        return NAME;
    }

    @Override
    public TransportVersion getMinimalSupportedVersion() {
        throw new IllegalStateException("not used");
    }

    @Override
    public boolean supportsVersion(TransportVersion version) {
        return version.isPatchFrom(TransportVersions.NONE_CHUNKING_STRATEGY_8_19)
            || version.onOrAfter(TransportVersions.NONE_CHUNKING_STRATEGY);
    }

    @Override
    public void writeTo(StreamOutput out) throws IOException {}

    @Override
    public Map<String, Object> asMap() {
        return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
    }

    public static NoneChunkingSettings fromMap(Map<String, Object> map) {
        ValidationException validationException = new ValidationException();

        var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
        if (invalidSettings.length > 0) {
            validationException.addValidationError(
                Strings.format(
                    "When chunking is disabled (none), settings can not have the following: %s",
                    Arrays.toString(invalidSettings)
                )
            );
        }

        if (validationException.validationErrors().isEmpty() == false) {
            throw validationException;
        }

        return NoneChunkingSettings.INSTANCE;
    }

    @Override
    public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
        builder.startObject();
        {
            builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
        }
        builder.endObject();
        return builder;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return true;
    }

    @Override
    public int hashCode() {
        return Objects.hash(getClass());
    }

    @Override
    public String toString() {
        return Strings.toString(this);
    }
}
```
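The interesting part of `fromMap` above is that the `none` strategy rejects every other setting key, since options like `max_chunk_size` are meaningless when chunking is off. A self-contained sketch of that validation (using `IllegalArgumentException` in place of Elasticsearch's `ValidationException`; the class name is illustrative):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

public class NoneSettingsValidationSketch {
    // Only the strategy key itself is allowed when chunking is disabled.
    private static final Set<String> VALID_KEYS = Set.of("strategy");

    // Mirrors the fromMap check: collect unknown keys and fail if any exist.
    static void validate(Map<String, Object> settings) {
        Object[] invalid = settings.keySet().stream().filter(k -> VALID_KEYS.contains(k) == false).toArray();
        if (invalid.length > 0) {
            throw new IllegalArgumentException(
                "When chunking is disabled (none), settings can not have the following: " + Arrays.toString(invalid)
            );
        }
    }

    public static void main(String[] args) {
        validate(Map.of("strategy", "none")); // accepted silently
        try {
            validate(Map.of("strategy", "none", "max_chunk_size", 250));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly here, rather than silently ignoring extra keys, surfaces misconfigured mappings at index-creation time instead of at inference time.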
x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/chunking/NoopChunker.java

Lines changed: 38 additions & 0 deletions (new file)

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.common.Strings;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;

import java.util.List;

/**
 * A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
 *
 * <p><b>WARNING:</b> If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
 * may return an error.
 * </p>
 */
public class NoopChunker implements Chunker {
    public static final NoopChunker INSTANCE = new NoopChunker();

    private NoopChunker() {}

    @Override
    public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
        if (chunkingSettings instanceof NoneChunkingSettings) {
            return List.of(new ChunkOffset(0, input.length()));
        } else {
            throw new IllegalArgumentException(
                Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
            );
        }
    }
}
```
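The no-op behavior reduces to one invariant: every input, including the empty string, becomes exactly one chunk covering offsets `[0, length)`. A self-contained sketch of just that invariant (the class name and the `record` stand-in for `Chunker.ChunkOffset` are illustrative):

```java
import java.util.List;

public class NoopChunkerSketch {
    // Half-open offset pair [start, end), standing in for Chunker.ChunkOffset.
    record ChunkOffset(int start, int end) {}

    // The whole input is returned as a single chunk; no splitting occurs.
    static List<ChunkOffset> chunk(String input) {
        return List.of(new ChunkOffset(0, input.length()));
    }

    public static void main(String[] args) {
        System.out.println(chunk("my first chunk")); // a single offset spanning the string
    }
}
```

Returning offsets rather than substrings keeps the contract identical to the word and sentence chunkers, so callers slice the original string themselves, as the test-service change above does with `inputText.substring(offset.start(), offset.end())`.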
