@Beihao-Zhou
Member

@Beihao-Zhou Beihao-Zhou commented Jun 18, 2024

Implements the proposal in #2316.

Encoding

HNSW vector field metadata encoding:

ns | FIELD_META | index name | field name -> field flag | vector_type | dimension | distance_metric | initial_cap | m | ef_construction | ef_runtime | epsilon | num_levels

HNSW node index encoding:

ns | FIELD | index name | field name | level | NODE | key -> num_neighbours | vector dimension | [vector...]

HNSW edge index encoding:

ns | FIELD | index name | field name | level | EDGE | key1 | key2 -> (nil)

Reference for other index encoding: #2329
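
A minimal C++ sketch of how the node and edge keys above could be assembled. The tag values, the length-prefix scheme, and every helper name (PutSized, PutLevel, LevelPrefix, NodeKey, EdgeKey) are assumptions made for illustration, not the Kvrocks implementation; the value side of each entry is omitted.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tag bytes; the real values are defined in the Kvrocks source.
enum class Tag : uint8_t { kField = 1, kNode = 2, kEdge = 3 };

// Append a 4-byte big-endian length followed by the bytes of s, so that
// variable-length components cannot run into each other.
inline void PutSized(std::string *dst, const std::string &s) {
  auto n = static_cast<uint32_t>(s.size());
  for (int shift = 24; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((n >> shift) & 0xFF));
  }
  dst->append(s);
}

// Append the level as 2 big-endian bytes so keys of one level sort together.
inline void PutLevel(std::string *dst, uint16_t level) {
  dst->push_back(static_cast<char>(level >> 8));
  dst->push_back(static_cast<char>(level & 0xFF));
}

// ns | FIELD | index name | field name | level
// This prefix is shared by the node and edge keys of one level.
inline std::string LevelPrefix(const std::string &ns, const std::string &index,
                               const std::string &field, uint16_t level) {
  std::string key;
  PutSized(&key, ns);
  key.push_back(static_cast<char>(Tag::kField));
  PutSized(&key, index);
  PutSized(&key, field);
  PutLevel(&key, level);
  return key;
}

// ... | level | NODE | key
inline std::string NodeKey(const std::string &ns, const std::string &index,
                           const std::string &field, uint16_t level,
                           const std::string &key) {
  std::string out = LevelPrefix(ns, index, field, level);
  out.push_back(static_cast<char>(Tag::kNode));
  PutSized(&out, key);
  return out;
}

// ... | level | EDGE | key1 | key2 (the stored value is empty)
inline std::string EdgeKey(const std::string &ns, const std::string &index,
                           const std::string &field, uint16_t level,
                           const std::string &key1, const std::string &key2) {
  std::string out = LevelPrefix(ns, index, field, level);
  out.push_back(static_cast<char>(Tag::kEdge));
  PutSized(&out, key1);
  PutSized(&out, key2);
  return out;
}
```

With a layout like this, all edges leaving key1 on a given level share the prefix `... | level | EDGE | key1`, so a prefix scan over the key-value store would enumerate a node's neighbours without reading the node value itself.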

Future steps

  • Add the plan operator and the corresponding executor
  • Add expression node (i.e. SQL/RediSearch parsers) for vector search
  • Modify some passes (e.g. index_selection) to convert the expression node to a plan operator
  • Improve HnswIndex construction (Avoid HnswIndex heavy construction because of mt19937 #2398)
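
As an illustration of the last item: a std::mt19937 carries roughly 2.5 KB of state, so seeding a fresh one for every index object is comparatively expensive. The sketch below shows one possible way to share a generator per thread and draw an HNSW-style random level from it. The names SharedGenerator and RandomLevel are hypothetical, and this is not necessarily the approach taken in #2398.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// One generator per thread, seeded once and then reused.
inline std::mt19937 &SharedGenerator() {
  thread_local std::mt19937 gen{std::random_device{}()};
  return gen;
}

// Draw a level as floor(-ln(u) / ln(m)) with u uniform in (0, 1),
// clamped to max_level. Assumes m >= 2.
inline uint16_t RandomLevel(std::mt19937 &gen, uint16_t m, uint16_t max_level) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  double u = dist(gen);
  if (u <= 0.0) u = std::numeric_limits<double>::min();  // avoid log(0)
  double level = std::floor(-std::log(u) / std::log(static_cast<double>(m)));
  return static_cast<uint16_t>(std::min(level, static_cast<double>(max_level)));
}
```

A caller would then write `RandomLevel(SharedGenerator(), m, max_level)` instead of owning a generator per index instance.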

@Beihao-Zhou Beihao-Zhou changed the title from "[Draft] Add HNSW encoding index & search/insertion algorithm" to "feat: [Draft] Add HNSW encoding index & search/insertion algorithm" on Jun 25, 2024
@Beihao-Zhou Beihao-Zhou changed the title from "feat: [Draft] Add HNSW encoding index & search/insertion algorithm" to "feat(search) Add HNSW encoding index & search/insertion algorithm" on Jul 1, 2024
@Beihao-Zhou Beihao-Zhou changed the title from "feat(search) Add HNSW encoding index & search/insertion algorithm" to "feat(search): Add HNSW encoding index & search/insertion algorithm" on Jul 1, 2024
@Beihao-Zhou Beihao-Zhou marked this pull request as ready for review July 7, 2024 20:35
@Beihao-Zhou
Member Author

Hi @PragmaTwice , this PR is ready for review! :)

Member

@PragmaTwice PragmaTwice left a comment

The rest looks good to me. Thank you!

Also there are still some clang-tidy issues that need to be fixed.

@Beihao-Zhou Beihao-Zhou changed the title from "feat(search): Add HNSW encoding index & search/insertion algorithm" to "feat(search): Add HNSW encoding index & insertion/deletion algorithm" on Jul 9, 2024
Member

@PragmaTwice PragmaTwice left a comment

The code looks fine to me.

But there are some issues in CI:

  • one unit test case failed on macOS arm64,
  • some memory issues (likely use-after-free) reported by ASan/TSan.

Could you try to investigate them?

Contributor

@Yangsx-1 Yangsx-1 left a comment

I wonder if there is any heuristic logic to avoid isolated clusters?

@Beihao-Zhou
Member Author

I wonder if there is any heuristic logic to avoid isolated clusters?

@Yangsx-1 Good question, but not yet. The plan for this PR is just to implement the HNSW construction first, so I didn't think that far ahead. Also, I saw that KQIR cannot be enabled in cluster mode yet [1], so I didn't take this into the scope of this PR.

[1] KQIR: a query engine for Apache Kvrocks that supports both SQL and RediSearch queries

@Beihao-Zhou
Member Author

Beihao-Zhou commented Jul 12, 2024

The code looks fine to me.

But there are some issues in CI:

  • one unit test case failed on macOS arm64,
  • some memory issues (likely use-after-free) reported by ASan/TSan.

Could you try to investigate them?

@git-hulk @PragmaTwice
The issue was that ComputeSimilarity calculates the distance between VectorItems based on the HnswVectorMetadata. In the unit test, I initialized a VectorItem whose vector size was less than metadata->dim, so looping up to metadata->dim read past the end of the vector (an out-of-bounds memory access).

I changed the code to go through a single VectorItem::Create, which does this validation early. Let me know if this still looks good to you <3
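
The idea of validating early can be sketched as below. This is an illustrative stand-in, not the code from the PR: MetadataSketch, VectorItemSketch, and L2Distance are hypothetical names, and std::optional is used in place of whatever status type the real VectorItem::Create returns.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the field metadata; only the dimension matters here.
struct MetadataSketch {
  std::size_t dim;
};

class VectorItemSketch {
 public:
  // Reject any vector whose length disagrees with the declared dimension,
  // so later loops bounded by dim can never run past the end of the data.
  static std::optional<VectorItemSketch> Create(std::vector<double> values,
                                                const MetadataSketch &meta) {
    if (values.size() != meta.dim) return std::nullopt;
    return VectorItemSketch(std::move(values));
  }

  const std::vector<double> &values() const { return values_; }

 private:
  explicit VectorItemSketch(std::vector<double> values)
      : values_(std::move(values)) {}

  std::vector<double> values_;
};

// Euclidean distance. Precondition: both items were created against the
// same metadata, so their lengths are equal.
inline double L2Distance(const VectorItemSketch &a, const VectorItemSketch &b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.values().size(); ++i) {
    double diff = a.values()[i] - b.values()[i];
    sum += diff * diff;
  }
  return std::sqrt(sum);
}
```

Because the constructor is private, the only way to obtain an item is through Create, so a mismatched vector is rejected at the call site instead of surfacing later as a sanitizer report inside the distance loop.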

CR: 224141f
Successful workflow: https://github.com/Beihao-Zhou/kvrocks/actions/runs/9914854795

@sonarqubecloud

Quality Gate failed

Failed conditions
2 Security Hotspots
D Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

@PragmaTwice PragmaTwice merged commit 12269d7 into apache:unstable Jul 13, 2024
@PragmaTwice
Member

PragmaTwice commented Jul 13, 2024

Awesome. Thank you for your contribution!

@PragmaTwice
Member

Hi @Beihao-Zhou , could you also open a tracking issue to track all issues and PRs for vector search in Kvrocks?

@Beihao-Zhou
Member Author

Hi @Beihao-Zhou , could you also open a tracking issue to track all issues and PRs for vector search in Kvrocks?

Sure, will do that later today <3


5 participants