Add retries to configureIndex and update operations #318


Merged: aulorbe merged 12 commits into main from add-remaining-retries on Dec 23, 2024

Conversation

@aulorbe (author) commented on Dec 19, 2024

Problem

We first shipped retries to upsert as a POC. Now that we are happy with that, we are expanding retries to configureIndex and update operations.

This PR includes updates to the retry logic itself, too: I found some errors when I applied it to configureIndex, yay!
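
The retry helper itself isn't shown in the excerpts below, so here is a minimal sketch of the pattern being extended to configureIndex and update. The function name, backoff, and retryable status codes are assumptions for illustration, not the client's exact implementation:

// Minimal sketch of the retry pattern (assumed details, not the client's exact code):
// retry an async operation when the server answers with a retryable status (500/503),
// capped at `maxRetries` additional attempts with a simple exponential backoff.
type AsyncOp<T> = () => Promise<T>;

async function withRetries<T>(op: AsyncOp<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (status !== 500 && status !== 503) throw err; // only retry server-side failures
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 100));
    }
  }
  throw lastError;
}

// Hypothetical usage against the operations this PR covers:
// await withRetries(() => pinecone.configureIndex(indexName, { spec: { pod: { podType: 'p1.x2' } } }));
// await withRetries(() => index.update({ id: 'vec-1', values: [0.1, 0.2] }));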

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

CI passes.


@@ -198,6 +198,8 @@ export const mapHttpStatusError = (failedRequestInfo: FailedRequestInfo) => {
       return new PineconeInternalServerError(failedRequestInfo);
     case 501:
       return new PineconeNotImplementedError(failedRequestInfo);
+    case 503:

@aulorbe (author):
Whoops forgot to map this the 1st time around
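
The return for the new 503 arm is cut off in the excerpt above. A hedged sketch of how it presumably completes the mapping; the class and type names below are simplified stand-ins rather than the client's exact ones:

// Sketch only: map 503 to a dedicated "service unavailable" error so the retry
// logic can recognize it, mirroring the neighboring cases.
type FailedRequestInfo = { status: number; message?: string };

class PineconeInternalServerError extends Error {}
class PineconeNotImplementedError extends Error {}
class PineconeUnavailableError extends Error {} // assumed name for the 503 mapping

const mapServerError = (info: FailedRequestInfo): Error => {
  switch (info.status) {
    case 500:
      return new PineconeInternalServerError(info.message);
    case 501:
      return new PineconeNotImplementedError(info.message);
    case 503:
      return new PineconeUnavailableError(info.message); // the newly added case
    default:
      return new Error(`Unmapped HTTP status: ${info.status}`);
  }
};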

Comment on lines -78 to -97
// Scale up podType to x2
let state = true;
let retryCount = 0;
const maxRetries = 10;
while (state && retryCount < maxRetries) {
  try {
    await pinecone.configureIndex(podIndexName, {
      spec: { pod: { podType: 'p1.x2' } },
    });
    state = false;
  } catch (e) {
    if (e instanceof PineconeInternalServerError) {
      retryCount++;
      await sleep(2000);
    } else {
      console.log('Unexpected error:', e);
      throw e;
    }
  }
}

@aulorbe (author):
Don't need this now that we've got retries!
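
With retries built into the client, the scale-up step in this integration test presumably collapses to a single call (sketch based on the deleted code above):

// Sketch: transient 500/503 responses are now retried inside configureIndex itself,
// so the test just awaits the call. `pinecone` and `podIndexName` come from the
// surrounding test setup.
await pinecone.configureIndex(podIndexName, {
  spec: { pod: { podType: 'p1.x2' } },
});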

  expect(callCount).toBe(2);
});

test('Update operation should retry 1x if server responds 1x with error and 1x with success', async () => {

@aulorbe (author):
Figured duplicating this type of test across upsert and update would be enough to justify me not duplicating it again for configureIndex, but lmk if you disagree!


@aulorbe (author):
(We should really centralize this type of thing to avoid duplicating this logic, but I think this is okay for now.)


@austin-denoble (reviewer):
It seems useful to validate that the calls themselves trigger retries as we'd expect, is that what you're talking about centralizing?


@aulorbe (author) commented on Dec 20, 2024:
Mmm I'm not 100% sure we're on the same page -- when you say "the calls themselves," are you talking about the different async funcs that we could pass into the RetryWrapper?

Assuming you answer yes to the above, yes that's what I'd like to centralize.... something like we have a single parameterized test that confirms that < whatever async func > is retried n times
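
Something like the following, as a rough sketch (every helper and name here is hypothetical, not code from this PR):

// Hypothetical parameterized test: each retryable operation runs against a mock
// server that fails once (503) and then succeeds, and we assert exactly one retry.
// The declarations below stand in for helpers/clients defined in the real test file.
declare const pinecone: any;
declare const index: any;
declare const indexName: string;
declare const startMockServer: (shouldSucceedOnSecondCall: boolean) => void;
declare const callCount: number;

const retryableOps: Array<[string, () => Promise<unknown>]> = [
  ['upsert', () => index.upsert([{ id: 'vec-1', values: [0.1, 0.2] }])],
  ['update', () => index.update({ id: 'vec-1', values: [0.3, 0.4] })],
  ['configureIndex', () => pinecone.configureIndex(indexName, { spec: { pod: { podType: 'p1.x2' } } })],
];

test.each(retryableOps)(
  '%s retries 1x if the server responds 1x with error and 1x with success',
  async (_name, op) => {
    startMockServer(true); // 503 on the first call, 200 on the second
    await op();
    expect(callCount).toBe(2); // initial failure + one successful retry
  }
);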

@aulorbe marked this pull request as ready for review on December 19, 2024 at 01:14

@austin-denoble (reviewer) left a comment:
Overall LGTM, it's nice that this is a pretty easy replacement due to how you set things up.


 // Helper function to start the server with a specific response pattern
 const startMockServer = (shouldSucceedOnSecondCall: boolean) => {
   // Create http server
-  server = http.createServer((req, res) => {
+  server = http.createServer({ keepAlive: false }, (req, res) => {

@austin-denoble (reviewer):
Just curious, what happens when setting this to false?


@aulorbe (author) commented on Dec 20, 2024:
It simply ensures any outstanding http connections close once whatever you're doing on the spun-up server concludes. It defaults to true in Node20+, which is a new development and something I thought might be the cause of our failing tests. Unfortunately, setting it to false (so that it stays false across Node 18 and 20) didn't fix the problem.
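
For reference, a self-contained sketch of the kind of mock server these integration tests spin up, with the keepAlive option in place (simplified; the real helper and its response bodies live in the test file):

import http from 'http';

// Sketch of the mock-server helper: the first request gets a 503, and when
// `shouldSucceedOnSecondCall` is true the second request gets a 200.
// `keepAlive: false` asks the server not to hold sockets open after responding.
let server: http.Server;
let callCount = 0;

const startMockServer = (shouldSucceedOnSecondCall: boolean) => {
  server = http.createServer({ keepAlive: false }, (req, res) => {
    callCount += 1;
    if (shouldSucceedOnSecondCall && callCount >= 2) {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'ok' }));
    } else {
      res.writeHead(503, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Service Unavailable' }));
    }
  });
  server.listen(4000); // arbitrary local port for the sketch
};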

if (error?.status) {
  return mapHttpStatusError(error);
}
return error; // Return original error if no mapping is needed

@austin-denoble (reviewer):
Do we know what errors we're seeing that end up without an associated error.status?


@aulorbe (author):
I can't quite remember off the top of my head and definitely need to look into this at some point, but basically: the BasePineconeError class sometimes shows a status field in the JSON object printed to the console (if you print the error), yet it comes back as undefined when you access error.status.


@aulorbe (author) commented on Dec 23, 2024:

Re: CI/CD failures in Node20+:

I did a lot of research into why the http server we spin up in our integration tests fails in Node20+ but passes in Node18. I honestly had to put a time-cap on this research because I kept finding more, but the TL;DR is:

  • In Node20+, the native fetch API stabilized. With that stabilization came a major upgrade to Node's internal undici dependency... from 5.x to 7.x in this PR.
    • undici change log here for v.6+, where they call out that they changed how the lib handles http errors
  • Within this upgrade were changes to how undici handles http connections. You can see more here.

Basically, I had to add a sleep to our final integration test because I was getting the following error when I was debugging in Node20+, which indicates an aborted TCP connection:

TypeError: fetch failed
        at node:internal/deps/undici/undici:13185:13 {
      [cause]: Error: read ECONNRESET
          at TCP.onStreamRead (node:internal/stream_base_commons:216:20) {
        errno: -54,
        code: 'ECONNRESET',
        syscall: 'read'
      }
    }

I tried a bunch of different ways to add a sleep, including adding timeouts to server.listen and http.createServer, as well as sleeps to the final afterEach method, but none of them fixed it. The only fix that resolved the test failure in Node20+ was adding the sleep to the 'Max retries exceeded w/o resolve' test itself.
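
The shape of that fix, as a sketch (the helper and the exact delay are assumptions; only the test name comes from this PR):

// Sketch: a short sleep at the end of the flaky test gives the aborted TCP
// connections time to close before afterEach tears the mock server down in Node20+.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

test('Max retries exceeded w/o resolve', async () => {
  // ... exercise the retry wrapper against a server that keeps returning 503 ...
  await sleep(500); // assumed delay; avoids undici's `TypeError: fetch failed` / ECONNRESET
});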

@aulorbe merged commit f71419d into main on Dec 23, 2024. 32 checks passed.
@aulorbe deleted the add-remaining-retries branch on December 23, 2024 at 19:54.