Fix autoexpand during node replace #96281

Merged

Conversation

idegtiarenko
Contributor

Prior to this change, NodeReplacementAllocationDecider unconditionally
skipped both the replacement source and target nodes when calculating auto-expand
replicas. This is fixed by auto-expanding to the replacement node if the source node
already had shards of the index.

Closes: #89527
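
A minimal sketch of the rule described above, for illustration only. It uses a hypothetical, simplified model (plain node-id strings, a source-to-target map, and a set of nodes holding shards) rather than the real decider classes. The names `AutoExpandReplaceSketch`, `countsForAutoExpand`, and `countCandidates` are assumptions, as is the choice to keep excluding the replacement source.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the rule in the PR description.
// None of these names are real Elasticsearch classes or methods.
final class AutoExpandReplaceSketch {

    /**
     * Decides whether a node counts as a candidate when sizing auto-expand replicas.
     *
     * @param nodeId          node under consideration
     * @param replacements    in-progress replacements, source node id -> target node id
     * @param nodesWithShards ids of nodes that currently hold shards of the index
     */
    static boolean countsForAutoExpand(String nodeId, Map<String, String> replacements, Set<String> nodesWithShards) {
        // The replacement source is leaving the cluster, so it stays excluded
        // (assumption: this part of the previous behavior is unchanged).
        if (replacements.containsKey(nodeId)) {
            return false;
        }
        // The replacement target counts only if the node it replaces
        // already held shards of this index.
        for (Map.Entry<String, String> replacement : replacements.entrySet()) {
            if (replacement.getValue().equals(nodeId)) {
                return nodesWithShards.contains(replacement.getKey());
            }
        }
        // In this sketch, nodes not involved in a replacement always count.
        return true;
    }

    static long countCandidates(List<String> nodes, Map<String, String> replacements, Set<String> nodesWithShards) {
        return nodes.stream().filter(n -> countsForAutoExpand(n, replacements, nodesWithShards)).count();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = Map.of("node_a", "node_b");
        List<String> nodes = List.of("node_a", "node_b", "node_c");
        // node_a held shards of the index: node_b and node_c count.
        System.out.println(countCandidates(nodes, replacements, Set.of("node_a", "node_c"))); // prints 2
        // node_a held no shards of the index: only node_c counts.
        System.out.println(countCandidates(nodes, replacements, Set.of("node_c"))); // prints 1
    }
}
```

With the previous behavior (both source and target skipped), the first case would count only `node_c`, which is the contraction reported in #89527.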

@idegtiarenko idegtiarenko added the >bug, :Distributed Coordination/Allocation, Team:Distributed (Obsolete), and v8.9.0 labels May 23, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

- shardRouting.currentNodeId(),
- node.nodeId()
+ node.nodeId(),
+ shardRouting.currentNodeId()
Contributor Author

The order was flipped

EmptyClusterInfoService.INSTANCE,
EmptySnapshotsInfoService.INSTANCE,
TestShardRoutingRoleStrategies.DEFAULT_ROLE_ONLY
);
Contributor Author

This was unused test setup, removed in a separate commit to make it easier to review

Contributor

@henningandersen henningandersen left a comment

Thanks for working on this. I left some comments to address.

)
.build();
allocation = new RoutingAllocation(allocationDeciders, state, null, null, 0);
assertThatAutoExpandReplicasDidNotContract(indexMetadata, allocation);
Contributor

Can we add verification of the decider also after the source node is shut down, but with the shutdown record still in place?

);

// when replacing NODE_A with NODE_B
state = ClusterState.builder(state)
.nodes(DiscoveryNodes.builder().add(NODE_A).add(NODE_B).add(NODE_C).build())
Contributor

I think we need to test the scenario where we add the shutdown indication first and then add the node later (verifying that no contraction happens in between).

I specifically think that will fail due to NodeShutdownAllocationDecider saying NO to the source node - and there is no target node yet.
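
The sequence the reviewer describes can be sketched with a hypothetical, simplified model. This illustrates the concern only; it is not a reproduction of the real deciders or the real test. It assumes the source node is rejected as soon as the shutdown record exists and that the target counts only once it has joined. All names (`ShutdownBeforeTargetJoinsSketch`, `countsForAutoExpand`, `candidateCountsPerStep`) are made up for this sketch.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the "shutdown record first, target node later" sequence.
// None of these names are real Elasticsearch classes or methods.
final class ShutdownBeforeTargetJoinsSketch {

    // Assumed rule: the source is rejected once a shutdown record exists;
    // the target counts if its source held shards of the index.
    static boolean countsForAutoExpand(String nodeId, Map<String, String> replacements, Set<String> nodesWithShards) {
        if (replacements.containsKey(nodeId)) {
            return false;
        }
        for (Map.Entry<String, String> replacement : replacements.entrySet()) {
            if (replacement.getValue().equals(nodeId)) {
                return nodesWithShards.contains(replacement.getKey());
            }
        }
        return true;
    }

    static long countCandidates(Set<String> nodes, Map<String, String> replacements, Set<String> nodesWithShards) {
        return nodes.stream().filter(n -> countsForAutoExpand(n, replacements, nodesWithShards)).count();
    }

    // Returns the candidate count before the shutdown record, with the record
    // only, and after the target node has joined the cluster.
    static long[] candidateCountsPerStep() {
        Set<String> nodes = new LinkedHashSet<>(List.of("node_a", "node_c"));
        Map<String, String> replacements = new HashMap<>();
        Set<String> nodesWithShards = Set.of("node_a", "node_c");

        long before = countCandidates(nodes, replacements, nodesWithShards);

        // Step 1: register the replacement of node_a by node_b; node_b has not joined yet.
        replacements.put("node_a", "node_b");
        long recordOnly = countCandidates(nodes, replacements, nodesWithShards);

        // Step 2: node_b joins the cluster.
        nodes.add("node_b");
        long targetJoined = countCandidates(nodes, replacements, nodesWithShards);

        return new long[] { before, recordOnly, targetJoined };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(candidateCountsPerStep())); // prints [2, 1, 2]
    }
}
```

Under these assumptions the count dips from 2 to 1 between the two steps, which is the intermediate contraction the requested test is meant to rule out.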

# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/NodeShutdownAllocationDecider.java
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/decider/NodeReplacementAllocationDeciderTests.java
#	server/src/test/java/org/elasticsearch/cluster/routing/allocation/decider/NodeShutdownAllocationDeciderTests.java
# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/NodeReplacementAllocationDecider.java
@idegtiarenko
Contributor Author

@elasticsearchmachine please run elasticsearch-ci/bwc

@henningandersen henningandersen added the :Core/Infra/Node Lifecycle label Jun 4, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra label Jun 4, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Contributor

@henningandersen henningandersen left a comment

Thanks for the additional work on this. I left a number of smaller comments and test comments.

Contributor

@henningandersen henningandersen left a comment

A couple of comments from the last review seem unaddressed (if not, please point me to the verifications).

Contributor

@henningandersen henningandersen left a comment

LGTM.

@idegtiarenko idegtiarenko merged commit 1bb0fa9 into elastic:main Jun 7, 2023
@idegtiarenko idegtiarenko deleted the fix_autoexpand_during_node_replace branch June 7, 2023 08:49
kingherc pushed a commit to kingherc/elasticsearch that referenced this pull request Aug 25, 2023
Prior to this change NodeReplacementAllocationDecider was unconditionally
skipping both replacement source and target nodes when calculation auto-expand
replicas. This is fixed by autoexpanding to the replacement node if source node
already had shards of the index

Backport of PR elastic#96281 amended for 7.17.x

Closes elastic#89527
kingherc added a commit that referenced this pull request Aug 28, 2023
Prior to this change NodeReplacementAllocationDecider was unconditionally skipping both replacement source and target nodes when calculation auto-expand replicas. This is fixed by autoexpanding to the replacement node if source node already had shards of the index

Backport of PR #96281 amended for 7.17.x

Closes #89527

Co-authored-by: Ievgen Degtiarenko <[email protected]>
@DaveCTurner
Contributor

Added the v7.17.13 label because this has now been backported /cc @kingherc

Labels
>bug, :Core/Infra/Node Lifecycle, :Distributed Coordination/Allocation, Team:Core/Infra, Team:Distributed (Obsolete), v7.17.13, v8.9.0
Development

Successfully merging this pull request may close these issues.

Shutdown replace contracts auto-expand replicas
4 participants