Slow peer recovery of vector data #130785

Open
@weizijun

Description

Elasticsearch Version

main

Installed Plugins

No response

Java Version

bundled

OS Version

null

Problem Description

We found that reading vector data is slow when the file is accessed through MemorySegmentIndexInput. This results in slow I/O for flat-type vector search and for peer recovery.
In the peer recovery case, even with indices.recovery.max_bytes_per_sec set to a large value such as "4000mb", the copy rate for vec files is slow.
Each read is only 8 KB, so the readahead mechanism cannot be used.
Is this a known issue? Is it an intended property of MemorySegmentIndexInput, or is readahead unavailable unless MemorySegmentIndexInput.prefetch is called?
This is the iostat output:

Image
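To put the 8 KB read size in perspective, here is a minimal arithmetic sketch of how many read requests per second the disk would have to serve to reach the configured cap. The class and method names are hypothetical, and it assumes "4000mb" means 4000 * 1024 * 1024 bytes per second.

```java
/** Back-of-the-envelope arithmetic for the read pattern described above (hypothetical helper). */
final class RecoveryReadMath {
    private RecoveryReadMath() {}

    /**
     * Number of read requests per second needed to sustain a byte rate
     * when every request transfers a fixed number of bytes (rounded up).
     */
    static long readsPerSecond(long targetBytesPerSec, long bytesPerRead) {
        if (targetBytesPerSec < 0 || bytesPerRead <= 0) {
            throw new IllegalArgumentException("rate must be >= 0 and read size must be > 0");
        }
        return (targetBytesPerSec + bytesPerRead - 1) / bytesPerRead;
    }

    public static void main(String[] args) {
        long cap = 4000L * 1024 * 1024; // "4000mb", assuming 1 mb = 1024 * 1024 bytes
        long readSize = 8L * 1024;      // 8 KB per read, as reported in this issue
        System.out.println(readsPerSecond(cap, readSize) + " reads/s needed to reach the cap");
    }
}
```

At 8 KB per request this works out to 512,000 reads per second, which is why the small read size, rather than the recovery rate limit, would bound the copy speed.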

I added a line of code in RecoverySourceHandler:

currentInput.updateReadAdvice(ReadAdvice.SEQUENTIAL);

With this change, the vec file copy is fast and each read becomes larger.

Image Image
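For anyone comparing their own iostat output, the average size of a read can be derived from the extended statistics as rkB/s divided by r/s (recent sysstat versions also print this as rareq-sz). A minimal sketch; the class and method names are hypothetical, and the sample numbers in `main` are placeholders, not the values from the screenshots.

```java
/** Derives the average read request size from two iostat -x columns (hypothetical helper). */
final class IostatReadSize {
    private IostatReadSize() {}

    /**
     * @param readsPerSec  the r/s column (completed read requests per second)
     * @param readKbPerSec the rkB/s column (kilobytes read per second)
     * @return average kilobytes per read request, or 0 when no reads were issued
     */
    static double averageReadSizeKb(double readsPerSec, double readKbPerSec) {
        if (readsPerSec <= 0) {
            return 0.0;
        }
        return readKbPerSec / readsPerSec;
    }

    public static void main(String[] args) {
        // Placeholder inputs for illustration only; substitute the values from your own iostat run.
        double readsPerSec = 1000.0;
        double readKbPerSec = 8000.0;
        System.out.println("average read size (KB): " + averageReadSizeKb(readsPerSec, readKbPerSec));
    }
}
```

A value close to 8 would match the small-read pattern described above, while a noticeably larger value would suggest that readahead is taking effect.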

Steps to Reproduce

Run a peer recovery of a shard that contains vector data.

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

    Labels

    :Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source.)
    >bug
    Team:Distributed Indexing (Meta label for Distributed Indexing team)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests