
key pool can cause memtier to hang when there are multiple shards and request count is below pool size #204

Closed
@filipecosta90

Description


This can be easily reproduced on master (or any previous stable version) with an OSS cluster that has more than one shard. With a simple > 1 shard setup:

$ redis-cli
127.0.0.1:6379> cluster slots
1) 1) (integer) 0
   2) (integer) 5461
   3) 1) "127.0.0.1"
      2) (integer) 6379
      3) "ae7607ffdb3519473d83a9420f3a00c162820a8a"
      4) (empty array)
2) 1) (integer) 5462
   2) (integer) 10923
   3) 1) "127.0.0.1"
      2) (integer) 6381
      3) "ffa17c93a302f721c18377c7256053ecb8175e97"
      4) (empty array)
3) 1) (integer) 10924
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6383
      3) "eafe2ee085ca92723662dd18661088f70962c197"
      4) (empty array)

In the output below you can see that there is no assurance that the 4 generated requests go to the connections that are sending them (given we have 3 connections).

$ memtier_benchmark  --threads 1 --clients 1 -n 4 --ratio 0:1 --cluster-mode --hide-histogram -D
Writing results to stdout
[RUN #1] Preparing benchmark client...
client.cpp:111: new client 0x55a181879400 successfully set up.
[RUN #1] Launching threads now...
shard_connection.cpp:372: sending cluster slots command.
shard_connection.cpp:425: cluster slot command successful
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-1168204]
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-6263819]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-3771236]
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-5586315]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
client.cpp:219: nothing else to do, test is finished.
[RUN #1 100%,   0 secs]  1 threads:           4 ops,    7782 (avg:    7782) ops/sec, 303.99KB/sec (avg: 303.99KB/sec),  0.20 (avg:  0.20) msec latency

This is due to the following condition

m_config->requests > 0 && m_reqs_processed >= m_config->requests

and https://github.com/RedisLabs/memtier_benchmark/blob/master/cluster_client.cpp#L352

// store key for other connection, if queue is not full
key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
if (key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}

This can be solved by not pushing keys to other connections' pools when there are not enough remaining requests to fill the pools of all shards.
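
A minimal sketch of that guard, reusing the member names from the snippet above (m_key_index_pools, KEY_INDEX_QUEUE_MAX_SIZE, m_reqs_generated, m_config->requests); the exact budget check, skipping the push once enough requests have already been generated, is an assumption about the shape of the fix, not the actual patch:

// store key for other connection only if its queue is not full and we still
// have requests left in the overall budget (assumed guard, illustration only)
key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
bool have_request_budget = (m_config->requests == 0) ||            // 0 means unlimited requests
                           (m_reqs_generated < m_config->requests); // budget not yet exhausted
if (have_request_budget &&
    key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}

With such a guard, a run like the one above (-n 4 with 3 shards) would stop handing keys to other connections once the request budget is consumed, so no connection is left waiting on requests that will never be generated for it.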
