[8.19] Add none chunking strategy to disable automatic chunking for inference endpoints #129324

Merged: 7 commits, Jun 12, 2025
6 changes: 6 additions & 0 deletions docs/changelog/129150.yaml
@@ -0,0 +1,6 @@
pr: 129150
summary: Add `none` chunking strategy to disable automatic chunking for inference
  endpoints
area: Machine Learning
type: feature
issues: []
54 changes: 50 additions & 4 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -100,18 +100,19 @@ If not specified, the {infer} endpoint defined by `inference_id` will be used at
(Optional, object) Settings for chunking text into smaller passages.
If specified, these will override the chunking settings set in the {infer-cap} endpoint associated with `inference_id`.
If chunking settings are updated, they will not be applied to existing documents until they are reindexed.
To completely disable chunking, use the `none` chunking strategy.

.Valid values for `chunking_settings`
[%collapsible%open]
====
`strategy`:::
Indicates the type of chunking strategy to use.
Valid values are `none`, `word`, or `sentence`.
Required.

`max_chunk_size`:::
The maximum number of words in a chunk.
Required for `word` and `sentence` strategies.

`overlap`:::
The number of overlapping words allowed in chunks.
@@ -123,6 +124,10 @@ The number of overlapping words allowed in chunks.
Valid values are `0` or `1`.
Required for `sentence` type chunking settings.

WARNING: If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
model's limit.

====
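
For example, the following sketch configures `sentence` chunking on a `semantic_text` field (the index name, field name, and values are illustrative; `sentence_overlap` is the sentence strategy's overlap option):

[source,console]
------------------------------------------------------------
PUT my-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "sentence",
          "max_chunk_size": 250,
          "sentence_overlap": 1
        }
      }
    }
  }
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]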

[discrete]
@@ -147,7 +152,48 @@ When querying, the individual passages will be automatically searched for each d

For more details on chunking and how to configure chunking settings, see <<infer-chunking-config, Configuring chunking>> in the Inference API documentation.

You can also pre-chunk the input by sending it to Elasticsearch as an array of strings.
Example:

[source,console]
------------------------------------------------------------
PUT test-index
{
"mappings": {
"properties": {
"my_semantic_field": {
"type": "semantic_text",
"chunking_settings": {
"strategy": "none" <1>
}
}
}
}
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]
<1> Disable chunking on `my_semantic_field`.

[source,console]
------------------------------------------------------------
PUT test-index/_doc/1
{
"my_semantic_field": ["my first chunk", "my second chunk"] <1>
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]
<1> The text is pre-chunked and provided as an array of strings.
Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
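
You can then query the field as usual; a minimal sketch using the `semantic` query against the `test-index` example above:

[source,console]
------------------------------------------------------------
GET test-index/_search
{
  "query": {
    "semantic": {
      "field": "my_semantic_field",
      "query": "first chunk"
    }
  }
}
------------------------------------------------------------
// TEST[skip:Requires inference endpoint]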

**Important considerations**:

* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
* If a chunk exceeds the model's token limit, the behavior depends on the service:
** Some services (such as OpenAI) will return an error.
** Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.

Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text`.

[discrete]
[[semantic-text-highlighting]]
@@ -239,6 +239,7 @@ static TransportVersion def(int id) {
public static final TransportVersion SEARCH_SOURCE_EXCLUDE_VECTORS_PARAM_8_19 = def(8_841_0_46);
public static final TransportVersion ML_INFERENCE_MISTRAL_CHAT_COMPLETION_ADDED_8_19 = def(8_841_0_47);
public static final TransportVersion ML_INFERENCE_ELASTIC_RERANK_ADDED_8_19 = def(8_841_0_48);
public static final TransportVersion NONE_CHUNKING_STRATEGY_8_19 = def(8_841_0_49);

/*
* STOP! READ THIS FIRST! No, really,
@@ -15,7 +15,8 @@

public enum ChunkingStrategy {
WORD("word"),
SENTENCE("sentence");
SENTENCE("sentence"),
NONE("none");

private final String chunkingStrategy;

@@ -25,6 +25,7 @@
import org.elasticsearch.inference.TaskSettings;
import org.elasticsearch.inference.TaskType;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xpack.inference.chunking.NoopChunker;
import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunker;
import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;

@@ -126,7 +127,14 @@ protected List<ChunkedInput> chunkInputs(ChunkInferenceInput input) {
}

List<ChunkedInput> chunkedInputs = new ArrayList<>();
if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.NONE) {
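// No chunking: NoopChunker returns a single offset spanning the entire input, so each input string passes through unchanged.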
var offsets = NoopChunker.INSTANCE.chunk(input.input(), chunkingSettings);
List<ChunkedInput> ret = new ArrayList<>();
for (var offset : offsets) {
ret.add(new ChunkedInput(inputText.substring(offset.start(), offset.end()), offset.start(), offset.end()));
}
return ret;
} else if (chunkingSettings.getChunkingStrategy() == ChunkingStrategy.WORD) {
WordBoundaryChunker chunker = new WordBoundaryChunker();
WordBoundaryChunkingSettings wordBoundaryChunkingSettings = (WordBoundaryChunkingSettings) chunkingSettings;
List<WordBoundaryChunker.ChunkOffset> offsets = chunker.chunk(
@@ -26,6 +26,7 @@
import org.elasticsearch.xpack.core.inference.results.TextEmbeddingByteResults;
import org.elasticsearch.xpack.core.inference.results.TextEmbeddingFloatResults;
import org.elasticsearch.xpack.inference.action.task.StreamingTaskManager;
import org.elasticsearch.xpack.inference.chunking.NoneChunkingSettings;
import org.elasticsearch.xpack.inference.chunking.SentenceBoundaryChunkingSettings;
import org.elasticsearch.xpack.inference.chunking.WordBoundaryChunkingSettings;
import org.elasticsearch.xpack.inference.common.amazon.AwsSecretSettings;
@@ -553,6 +554,9 @@ private static void addInternalNamedWriteables(List<NamedWriteableRegistry.Entry
}

private static void addChunkingSettingsNamedWriteables(List<NamedWriteableRegistry.Entry> namedWriteables) {
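// NoneChunkingSettings has no state to deserialize, so reading it from the wire returns the shared INSTANCE.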
namedWriteables.add(
new NamedWriteableRegistry.Entry(ChunkingSettings.class, NoneChunkingSettings.NAME, in -> NoneChunkingSettings.INSTANCE)
);
namedWriteables.add(
new NamedWriteableRegistry.Entry(ChunkingSettings.class, WordBoundaryChunkingSettings.NAME, WordBoundaryChunkingSettings::new)
);
@@ -16,6 +16,7 @@ public static Chunker fromChunkingStrategy(ChunkingStrategy chunkingStrategy) {
}

return switch (chunkingStrategy) {
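// NONE maps to the stateless NoopChunker singleton; the other strategies construct a new chunker per call.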
case NONE -> NoopChunker.INSTANCE;
case WORD -> new WordBoundaryChunker();
case SENTENCE -> new SentenceBoundaryChunker();
};
@@ -45,6 +45,7 @@ public static ChunkingSettings fromMap(Map<String, Object> settings, boolean ret
settings.get(ChunkingSettingsOptions.STRATEGY.toString()).toString()
);
return switch (chunkingStrategy) {
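// The none strategy has no tunable options, so the shared immutable INSTANCE is returned directly.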
case NONE -> NoneChunkingSettings.INSTANCE;
case WORD -> WordBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
case SENTENCE -> SentenceBoundaryChunkingSettings.fromMap(new HashMap<>(settings));
};
@@ -0,0 +1,104 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.TransportVersion;
import org.elasticsearch.TransportVersions;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.ValidationException;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.inference.ChunkingStrategy;
import org.elasticsearch.xcontent.XContentBuilder;

import java.io.IOException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class NoneChunkingSettings implements ChunkingSettings {
public static final String NAME = "NoneChunkingSettings";
public static NoneChunkingSettings INSTANCE = new NoneChunkingSettings();

private static final ChunkingStrategy STRATEGY = ChunkingStrategy.NONE;
private static final Set<String> VALID_KEYS = Set.of(ChunkingSettingsOptions.STRATEGY.toString());

private NoneChunkingSettings() {}

@Override
public ChunkingStrategy getChunkingStrategy() {
return STRATEGY;
}

@Override
public String getWriteableName() {
return NAME;
}

@Override
public TransportVersion getMinimalSupportedVersion() {
return TransportVersions.NONE_CHUNKING_STRATEGY_8_19;
}

@Override
public void writeTo(StreamOutput out) throws IOException {}

@Override
public Map<String, Object> asMap() {
return Map.of(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY.toString().toLowerCase(Locale.ROOT));
}

public static NoneChunkingSettings fromMap(Map<String, Object> map) {
ValidationException validationException = new ValidationException();

var invalidSettings = map.keySet().stream().filter(key -> VALID_KEYS.contains(key) == false).toArray();
if (invalidSettings.length > 0) {
validationException.addValidationError(
Strings.format(
"When chunking is disabled (none), settings can not have the following: %s",
Arrays.toString(invalidSettings)
)
);
}

if (validationException.validationErrors().isEmpty() == false) {
throw validationException;
}

return NoneChunkingSettings.INSTANCE;
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
{
builder.field(ChunkingSettingsOptions.STRATEGY.toString(), STRATEGY);
}
builder.endObject();
return builder;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
return true;
}

@Override
public int hashCode() {
return Objects.hash(getClass());
}

@Override
public String toString() {
return Strings.toString(this);
}
}
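
Below is a hypothetical usage sketch (not part of this PR) showing how the settings parse from a map; only the `strategy` key is accepted, and anything else fails validation:

package org.elasticsearch.xpack.inference.chunking;

import java.util.Map;

import org.elasticsearch.common.ValidationException;

// Hypothetical sketch, not part of this PR: exercising NoneChunkingSettings.fromMap.
public class NoneChunkingSettingsExample {
    public static void main(String[] args) {
        // Only the "strategy" key is allowed for the none strategy.
        var settings = NoneChunkingSettings.fromMap(Map.of("strategy", "none"));
        System.out.println(settings.asMap()); // {strategy=none}

        try {
            // Extra keys such as max_chunk_size are rejected with a ValidationException.
            NoneChunkingSettings.fromMap(Map.of("strategy", "none", "max_chunk_size", 100));
        } catch (ValidationException e) {
            System.out.println(e.getMessage());
        }
    }
}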
@@ -0,0 +1,38 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.xpack.inference.chunking;

import org.elasticsearch.common.Strings;
import org.elasticsearch.inference.ChunkingSettings;
import org.elasticsearch.xpack.inference.services.openai.embeddings.OpenAiEmbeddingsModel;

import java.util.List;

/**
* A {@link Chunker} implementation that returns the input unchanged (no chunking is performed).
*
* <p><b>WARNING:</b> If the input exceeds the maximum token limit, some services (such as {@link OpenAiEmbeddingsModel})
* may return an error.
* </p>
*/
public class NoopChunker implements Chunker {
public static final NoopChunker INSTANCE = new NoopChunker();

private NoopChunker() {}

@Override
public List<ChunkOffset> chunk(String input, ChunkingSettings chunkingSettings) {
if (chunkingSettings instanceof NoneChunkingSettings) {
return List.of(new ChunkOffset(0, input.length()));
} else {
throw new IllegalArgumentException(
Strings.format("NoopChunker can't use ChunkingSettings with strategy [%s]", chunkingSettings.getChunkingStrategy())
);
}
}
}
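
A small hypothetical sketch (not part of this PR) of the chunker's contract; the single offset spans the whole input, and downstream code re-slices the original string:

package org.elasticsearch.xpack.inference.chunking;

// Hypothetical sketch, not part of this PR: NoopChunker emits exactly one offset.
public class NoopChunkerExample {
    public static void main(String[] args) {
        String input = "my first chunk";
        var offsets = NoopChunker.INSTANCE.chunk(input, NoneChunkingSettings.INSTANCE);
        System.out.println(offsets.size()); // 1
        var offset = offsets.get(0);
        // Re-slicing with the offset returns the input unchanged.
        System.out.println(input.substring(offset.start(), offset.end())); // my first chunk
    }
}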
@@ -27,6 +27,13 @@ public void testValidChunkingStrategy() {
}

private Map<ChunkingStrategy, Class<? extends Chunker>> chunkingStrategyToExpectedChunkerClassMap() {
return Map.of(
ChunkingStrategy.NONE,
NoopChunker.class,
ChunkingStrategy.WORD,
WordBoundaryChunker.class,
ChunkingStrategy.SENTENCE,
SentenceBoundaryChunker.class
);
}
}
@@ -20,6 +20,9 @@ public static ChunkingSettings createRandomChunkingSettings() {
ChunkingStrategy randomStrategy = randomFrom(ChunkingStrategy.values());

switch (randomStrategy) {
case NONE -> {
return NoneChunkingSettings.INSTANCE;
}
case WORD -> {
var maxChunkSize = randomIntBetween(10, 300);
return new WordBoundaryChunkingSettings(maxChunkSize, randomIntBetween(1, maxChunkSize / 2));
@@ -37,15 +40,15 @@ public static Map<String, Object> createRandomChunkingSettingsMap() {
chunkingSettingsMap.put(ChunkingSettingsOptions.STRATEGY.toString(), randomStrategy.toString());

switch (randomStrategy) {
case NONE -> {
}
case WORD -> {
var maxChunkSize = randomIntBetween(10, 300);
chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), maxChunkSize);
chunkingSettingsMap.put(ChunkingSettingsOptions.OVERLAP.toString(), randomIntBetween(1, maxChunkSize / 2));

}
case SENTENCE -> chunkingSettingsMap.put(ChunkingSettingsOptions.MAX_CHUNK_SIZE.toString(), randomIntBetween(20, 300));
default -> {
}
}
@@ -46,6 +46,22 @@ public void testEmptyInput_SentenceChunker() {
assertThat(batches, empty());
}

public void testEmptyInput_NoopChunker() {
var batches = new EmbeddingRequestChunker<>(List.of(), 10, NoneChunkingSettings.INSTANCE).batchRequestsWithListeners(
testListener()
);
assertThat(batches, empty());
}

public void testAnyInput_NoopChunker() {
var randomInput = randomAlphaOfLengthBetween(100, 1000);
var batches = new EmbeddingRequestChunker<>(List.of(new ChunkInferenceInput(randomInput)), 10, NoneChunkingSettings.INSTANCE)
.batchRequestsWithListeners(testListener());
assertThat(batches, hasSize(1));
assertThat(batches.get(0).batch().inputs().get(), hasSize(1));
assertThat(batches.get(0).batch().inputs().get().get(0), Matchers.is(randomInput));
}

public void testWhitespaceInput_SentenceChunker() {
var batches = new EmbeddingRequestChunker<>(
List.of(new ChunkInferenceInput(" ")),