Support one byte int vectors - [MOD-8207] #5377

Merged
merged 10 commits into from
Jan 6, 2025

Collaborator

@GuyAv46 GuyAv46 commented Dec 19, 2024

Describe the changes in the pull request

Implement support for new vector types INT8, UINT8.
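
As a rough illustration of what one-byte element types mean for a vector blob, the sketch below packs integer vectors with Python's standard `struct` module. The helper `pack_vector` and the `FORMATS` table are hypothetical names, not part of this PR. The point is only that an INT8 or UINT8 vector occupies one byte per dimension, versus four for FLOAT32, and that values outside the element type's range cannot be encoded.

```python
import struct

# Hypothetical mapping from vector type name to a struct format code.
# INT8 -> signed byte, UINT8 -> unsigned byte, FLOAT32 for comparison.
FORMATS = {"INT8": "b", "UINT8": "B", "FLOAT32": "f"}


def pack_vector(values, vec_type):
    """Pack a sequence of numbers into a little-endian blob of the given type."""
    code = FORMATS[vec_type]
    # struct.error is raised if a value does not fit the element type.
    return struct.pack(f"<{len(values)}{code}", *values)


int8_blob = pack_vector([-128, 0, 5, 127], "INT8")
uint8_blob = pack_vector([0, 5, 200, 255], "UINT8")
float_blob = pack_vector([0.0, 5.0, 200.0, 255.0], "FLOAT32")

# One byte per dimension for the integer types, four for FLOAT32.
print(len(int8_blob), len(uint8_blob), len(float_blob))
```

A value such as 200 packs as UINT8 but raises `struct.error` as INT8, which is the kind of range check any ingestion path for these types has to make somewhere.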

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@GuyAv46 GuyAv46 requested review from alonre24 and meiravgri January 6, 2025 07:44
@GuyAv46 GuyAv46 changed the title from "Support one byte ints - [MOD-8207]" to "Support one byte int vectors - [MOD-8207]" Jan 6, 2025
@GuyAv46 GuyAv46 marked this pull request as ready for review January 6, 2025 07:46
@GuyAv46 GuyAv46 enabled auto-merge January 6, 2025 07:46

codecov bot commented Jan 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.56%. Comparing base (565633a) to head (d00dfa2).
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5377      +/-   ##
==========================================
+ Coverage   86.63%   87.56%   +0.92%     
==========================================
  Files         195      196       +1     
  Lines       34809    34846      +37     
==========================================
+ Hits        30158    30514     +356     
+ Misses       4651     4332     -319     
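
The figures in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes coverage is hits divided by lines, and that percentages are truncated rather than rounded to two decimals. The truncation is an inference from the numbers shown, not something the report states, and the helper names are illustrative.

```python
import math


def coverage_pct(hits, lines):
    """Coverage as the percentage of lines that were hit."""
    return 100.0 * hits / lines


def truncate2(x):
    """Truncate (not round) a non-negative number to two decimals."""
    return math.floor(x * 100) / 100


base = coverage_pct(30158, 34809)  # master column
head = coverage_pct(30514, 34846)  # PR #5377 column

# Hits + Misses equals Lines in both columns.
assert 30158 + 4651 == 34809
assert 30514 + 4332 == 34846

# The delta is taken from the unrounded percentages, then truncated.
print(truncate2(base), truncate2(head), truncate2(head - base))
```

Under these assumptions the three printed values match the Coverage row of the table.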


Collaborator

@alonre24 alonre24 left a comment

Let's see if we need more tests.
Also, let's bundle the new VecSim tag (v7.99.2).

@GuyAv46 GuyAv46 requested review from meiravgri and alonre24 January 6, 2025 13:49
@GuyAv46 GuyAv46 added this pull request to the merge queue Jan 6, 2025
Merged via the queue into master with commit 0ba0c32 Jan 6, 2025
10 checks passed
@GuyAv46 GuyAv46 deleted the guyav-support_one_byte_int branch January 6, 2025 18:33
redisearch-backport-pull-request bot pushed a commit that referenced this pull request Jan 6, 2025
* [TEMP] define new types enum values

* support new types

* support JSON embeddings

* [TEMP] remove temporary definitions

* [TEMP] follow a VecSim branch with new types implemented

* added basic tests for new int types

* fix unit tests

* Follow VecSim's main

* follow an official VecSim tag (v7.99.2)

* added another test

(cherry picked from commit 0ba0c32)
@redisearch-backport-pull-request
Contributor

Successfully created backport PR for 8.0:

github-merge-queue bot pushed a commit that referenced this pull request Jan 6, 2025
Support one byte int vectors - [MOD-8207] (#5377)

* [TEMP] define new types enum values

* support new types

* support JSON embeddings

* [TEMP] remove temporary definitions

* [TEMP] follow a VecSim branch with new types implemented

* added basic tests for new int types

* fix unit tests

* Follow VecSim's main

* follow an official VecSim tag (v7.99.2)

* added another test

(cherry picked from commit 0ba0c32)

Co-authored-by: GuyAv46 <[email protected]>
@ofiryanai

⚠️ Scope Creep Detection - MODERATE

This PR implements functionality beyond the minimum Definition of Done requirements:

Scope Creep Analysis for PR #5377 against Jira Ticket MOD-8207

1. Gather Information

Jira Ticket MOD-8207 DoD Requirements:

  • JSON Handling:

    • Support for vector array elements as int64.
    • Conversion of int64 arrays into BLOB format.
  • Index Creation Schema API:

    • Development of a new API for index schema creation.
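
For concreteness, here is one way the array-to-blob conversion named in these requirements could look when the target is a one-byte type. This is a sketch under assumptions, not the code in this PR: `json_array_to_blob` and the `RANGES` table are hypothetical names, and rejecting out-of-range or non-integer elements (rather than clamping them) is an assumed policy.

```python
import json

# Inclusive value ranges for the one-byte types (hypothetical table).
RANGES = {"INT8": (-128, 127), "UINT8": (0, 255)}


def json_array_to_blob(text, vec_type, dim):
    """Parse a JSON array of integers and return a one-byte-per-element blob."""
    values = json.loads(text)
    if not isinstance(values, list) or len(values) != dim:
        raise ValueError(f"expected a JSON array of {dim} elements")
    lo, hi = RANGES[vec_type]
    out = bytearray()
    for v in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, int):
            raise ValueError(f"non-integer element: {v!r}")
        if not lo <= v <= hi:
            raise ValueError(f"{v} is out of range for {vec_type}")
        # Two's-complement encoding of v in a single byte.
        out.append(v & 0xFF)
    return bytes(out)


print(json_array_to_blob("[1, -2, 127, -128]", "INT8", 4))
print(json_array_to_blob("[1, 2, 200, 255]", "UINT8", 4))
```

JSON integers arrive as wide integers after parsing, so the narrowing step is where a policy decision (reject, clamp, or wrap) has to be made. The sketch rejects.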

PR #5377 Code Changes:

  • Vector Type Definitions:

    • Added INT8 and UINT8 vector types.
  • JSON Handling:

    • Functions for handling INT8 and UINT8 types.
  • Vector Type Parsing:

    • Parsing logic for INT8 and UINT8.
  • Vector Index Adjustments:

    • Adjustments to handle new vector types.
  • Submodule Update:

    • Updated VectorSimilarity submodule.
  • Testing:

    • Extensive tests for INT8 and UINT8 types.

2. Compare Implementation vs Requirements

  • Mapped to DoD:

    • JSON handling functions for new vector types align with the requirement to support new data types, albeit not specifically int64.
  • Unmapped Changes:

    • Addition of INT8 and UINT8 types is not explicitly required by the DoD, which specifies int64.
    • Submodule update and extensive testing for INT8 and UINT8 types are not directly mapped to the DoD.

3. Categorize Scope Creep

  • Positive Scope Creep:

    • Testing Enhancements: Comprehensive tests for new vector types improve reliability and robustness.
    • Submodule Update: Ensures compatibility and potentially fixes existing issues.
  • Negative Scope Creep:

    • New Vector Types: Introduction of INT8 and UINT8 adds complexity not required by the DoD.
  • Neutral Scope Creep:

    • Parsing Logic: While not required, it aligns with potential future needs.

Scope Creep Assessment: MODERATE

Additional Implementations Beyond DoD:

  • Introduction of INT8 and UINT8 vector types.
  • Extensive testing for these new types.
  • Submodule update.

Recommendations for Handling Scope Creep:

  • Approve with Notes: Acknowledge the beneficial additions but highlight the deviation from the specified DoD.
  • Request Changes: Suggest splitting the PR to separate necessary changes from additional features.
  • Documentation Update: Ensure documentation reflects all new features and changes.

Suggested Actions:

  • Approve: If the additional features are deemed beneficial and align with future goals.
  • Request Changes: If the focus should remain strictly on the specified DoD requirements.

The PR introduces valuable enhancements but deviates from the strict requirements of the DoD. Consider the broader project goals when deciding on the next steps.

Scope Creep Assessment:
This PR should be reviewed as it contains implementations beyond the Definition of Done requirements

Recommendations:

  • Review if additional implementations are necessary for this ticket
  • Consider splitting scope creep into separate tickets/PRs
  • Ensure the core DoD requirements are fully addressed first
  • Document any architectural decisions for additional implementations

Next Steps:

  • If scope creep is intentional and beneficial, please document the rationale
  • If scope creep is unintentional, consider removing or deferring to future tickets
  • Ensure all DoD requirements are met before addressing additional features

This analysis was performed automatically by the PR Verification Agent.

@ofiryanai

⚠️ Scope Creep Detection - MODERATE

This PR implements functionality beyond the minimum Definition of Done requirements:

Scope Creep Analysis for PR #5377 against Jira Ticket MOD-8207

1. Gather Information

Jira Ticket MOD-8207 DoD Requirements:

  • JSON Handling Enhancements: Support int64 vector elements.
  • Data Conversion: Convert int64 arrays into a blob format.
  • Index Creation Schema API: Introduce a new API for index creation schema.

PR #5377 Code Changes:

  • Vector Type Definitions: Added INT8 and UINT8 types.
  • JSON Handling: Functions for INT8 and UINT8 types.
  • Vector Type Parsing: Logic for new vector types.
  • Vector Index Adjustments: Support for new types.
  • Testing: Comprehensive tests for new vector types.

2. Compare Implementation vs Requirements

Mapping Code Changes to DoD Requirements:

  • JSON Handling Enhancements:
    • Implemented support for INT8 and UINT8, which partially aligns with the requirement for int64 support.
  • Data Conversion:
    • No explicit conversion of int64 arrays to blobs was found; focus was on INT8 and UINT8.
  • Index Creation Schema API:
    • No implementation found in the PR.

Identifying Additional Implementations:

  • New Vector Types: INT8 and UINT8 are beyond the specified int64 requirement.
  • Extensive Testing: While beneficial, the level of testing for INT8 and UINT8 goes beyond the DoD.

3. Categorize Scope Creep

  • Positive Scope Creep:

    • Extensive Testing: Ensures robustness and reliability, beneficial for future maintenance.
  • Negative Scope Creep:

    • New Vector Types: Introducing INT8 and UINT8 without explicit requirement for these types adds complexity and deviates from the int64 focus.
  • Neutral Scope Creep:

    • Vector Type Parsing and Adjustments: Necessary for supporting new types but not directly aligned with the DoD.

Scope Creep Assessment: MODERATE

Additional Implementations Beyond DoD:

  • Introduction of INT8 and UINT8 vector types.
  • Comprehensive testing for new vector types.

Recommendations for Handling Scope Creep:

  • Clarify Requirements: Confirm if INT8 and UINT8 were intended enhancements or if they should be deferred.
  • Focus on DoD: Prioritize implementing the int64 support and index creation schema API as per the original requirements.

Suggested Actions:

  • Request Changes: Address the missing DoD elements (e.g., int64 support, index creation schema API).
  • Split PR: Consider separating the new vector types into a different PR if they are not immediately required.

By focusing on the core requirements and addressing the identified scope creep, the PR can be aligned more closely with the original objectives of the Jira ticket.

This analysis was performed automatically by the PR Verification Agent.

@ofiryanai

⚠️ Scope Creep Detection - MODERATE

This PR implements functionality beyond the minimum Definition of Done requirements:

Scope Creep Analysis for PR #5377 against Jira Ticket MOD-8207

1. Gather Information

Jira Ticket MOD-8207 DoD Requirements:

  • JSON Handling: Support for int64 vector elements.
  • Data Conversion: Convert int64 arrays into blobs.
  • Index Creation Schema API: Develop an API for index schema creation.

PR #5377 Code Changes:

  • Vector Type Definitions: Added INT8 and UINT8 vector types.
  • JSON Handling: Functions for INT8 and UINT8 in src/json.c.
  • Vector Index Modifications: Support for new types in src/vector_index.c.
  • Schema Parsing: Logic for new types in src/spec.c.
  • Testing: Extensive tests in C++ and Python for new vector types.

2. Compare Implementation vs Requirements

Mapping Code Changes to DoD:

  • JSON Handling: Implemented for INT8 and UINT8, not int64 as required.
  • Data Conversion: No explicit conversion to blobs for int64 found.
  • Index Creation Schema API: Parsing logic aligns with DoD.

Identified Additional Implementations:

  • New Vector Types: INT8 and UINT8 are beyond the int64 requirement.
  • Extensive Testing: While beneficial, the level of testing exceeds basic DoD needs.

3. Categorize Scope Creep

  • Positive Scope Creep:

    • Extensive Testing: Ensures robustness and reliability, beneficial for long-term maintenance.
  • Negative Scope Creep:

    • New Vector Types: Introduction of INT8 and UINT8 without explicit requirement in DoD adds complexity.
  • Neutral Scope Creep:

    • Additional Type Definitions: While not required, they could be useful for future enhancements.

Scope Creep Assessment: MODERATE

Additional Implementations Beyond DoD:

  • Introduction of INT8 and UINT8 vector types.
  • Comprehensive testing for new types.

Recommendations for Handling Scope Creep:

  • Approve with Notes: Acknowledge the beneficial testing but request clarification on the necessity of new vector types.
  • Documentation Update: Ensure documentation reflects new capabilities if retained.
  • Future Planning: Consider splitting vector type additions into a separate PR if not immediately needed.

Suggested Actions:

  • Request Changes: Clarify the need for INT8 and UINT8 types or adjust the PR to align strictly with DoD.
  • Approve with Conditions: If new types are justified, ensure documentation and benchmarks are updated accordingly.

This analysis identifies areas where the implementation exceeds the defined requirements, providing a balanced view of potential benefits and complexities introduced.

This analysis was performed automatically by the PR Verification Agent.
