Conversation

joe-baudisch

Description

The "if-unmodified-since" header field is now taken into account for PATCH requests on the following endpoints:

  • datasets (v3/v4)
  • attachments (v4)
  • instruments
  • origdatablocks (v3/v4)
  • samples
  • proposals

If the "if-unmodified-since" header field is missing, the patch request will still be executed.
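
To illustrate the behaviour described above, here is a minimal sketch of such a precondition check. It is not the code from this PR: the function name `checkIfUnmodifiedSince`, the result type, and the `updatedAt` parameter are assumptions made for the example. Ignoring an unparseable date follows what RFC 9110 asks of recipients; how the PR treats that case is not stated here.

```typescript
// Hypothetical helper for illustration only; names are not taken from the PR.
type PreconditionResult = "proceed" | "precondition-failed";

function checkIfUnmodifiedSince(
  headerValue: string | undefined,
  updatedAt: Date,
): PreconditionResult {
  // Missing header: behave as before and let the patch run.
  if (headerValue === undefined) return "proceed";

  const headerTime = Date.parse(headerValue);
  // An invalid HTTP date is ignored, so the patch runs unconditionally.
  if (Number.isNaN(headerTime)) return "proceed";

  // HTTP dates carry whole seconds only, so compare at that resolution.
  const updatedSeconds = Math.floor(updatedAt.getTime() / 1000);
  const headerSeconds = Math.floor(headerTime / 1000);
  return updatedSeconds <= headerSeconds ? "proceed" : "precondition-failed";
}
```

A caller would map "precondition-failed" to an HTTP 412 response and otherwise carry on with the update.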

Fixes

@Junjiequan
Member

Junjiequan commented Aug 28, 2025

Isn't if-unmodified-since an optional header field? It has to be set on the client side to actually be useful.
I see the issue you are trying to resolve here, but it would be nice to do concurrency control on the server side instead of relying on the client.

@joe-baudisch
Author

joe-baudisch commented Aug 29, 2025

Isn't if-unmodified-since an optional header field? It has to be set on the client side to actually be useful. I see the issue you are trying to resolve here, but it would be nice to do concurrency control on the server side instead of relying on the client.

In my opinion this approach is entirely server-side: the server tracks timestamps and controls concurrency.
This is a form of optimistic concurrency control, where the client assumes the resource hasn't changed and wants to avoid overwriting newer data. Clients still need to send If-Unmodified-Since, but they don’t manage timestamp-based versioning logic. If clients do not want to use the server-side management of this versioning logic, they omit if-unmodified-since.
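
From the client's point of view, the optimistic flow described above could look roughly like the sketch below. The `DatasetApi` interface and `patchWithRetry` are hypothetical names, not part of any existing client, and the transport is synchronous only to keep the example short; a real client would make asynchronous HTTP calls.

```typescript
// Hypothetical transport abstraction; a real client would be async.
interface DatasetApi {
  // Returns the current body plus the Last-Modified header value.
  get(id: string): { body: Record<string, unknown>; lastModified: string };
  // Sends a PATCH with If-Unmodified-Since and returns the HTTP status code.
  patch(
    id: string,
    changes: Record<string, unknown>,
    ifUnmodifiedSince: string,
  ): number;
}

function patchWithRetry(
  api: DatasetApi,
  id: string,
  buildChanges: (current: Record<string, unknown>) => Record<string, unknown>,
  maxAttempts: number = 3,
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Read the resource and remember the timestamp the server reported.
    const { body, lastModified } = api.get(id);
    // Echo that timestamp back; the server decides whether it is still current.
    const status = api.patch(id, buildChanges(body), lastModified);
    if (status !== 412) return status >= 200 && status < 300;
    // 412 Precondition Failed: someone else updated first, so re-read and retry.
  }
  return false;
}
```

The client only echoes the timestamp back; the comparison itself stays on the server.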
Solving this issue is a special request from MLZ.

@cchndl
Contributor

cchndl commented Sep 2, 2025

Looks good! I think it's a good compromise to use If-Unmodified-Since only for now, as using ETags would require computing/storing them and be quite a lot more effort to implement as things are right now.

@Junjiequan to your concerns:

I see the issue you are trying to resolve here, but it would be nice to do concurrency control on the server side instead of relying on the client.

Well the problem is that the time between these requests may be quite long.

Doing locking in the backend is not a good idea:
In my opinion you don't want to hand out locks on resources over a REST endpoint, since it's stateless and you are not guaranteed that a lock gets released in a timely manner. Enforcing timeouts would be bad for expensive operations on the client side and for performance. In the first case, the client could never finish its computation and make the request before the lock is released again; in the latter, waiting for these timeouts prevents other clients from doing anything else.

For doing some locking directly on the database, I think that has the same issues. I don't think you want to hold a database lock while waiting for a client to make a follow-up post/patch request.

Optimistic concurrency with the conditional requests is the standard way to do this, at least to my knowledge.

Isn't if-unmodified-since an optional header field? It has to be set on the client side to actually be useful.

Yes, the client has to set it. For our case at MLZ at least, the clients we write internally for the automatic ingestion will use it. If it's merged, I want to have a look at scitacean to implement support there, and then most use cases should be covered. As we probably won't require clients to send it with every request, there will of course still be a way around it, but well-behaved clients can use it, which is a better situation than now.

Member

@nitrosx nitrosx left a comment

Please switch around the if statements as suggested.
Also make sure to include some API tests covering backward compatibility and the functionality itself.

@nitrosx
Member

nitrosx commented Sep 2, 2025

As long as the feature is backward compatible and does not affect updates when it is not used, I'm open to accepting it.

@joe-baudisch this applies only to updates, correct?

@joe-baudisch
Author

As long as the feature is backward compatible and does not affect updates when it is not used, I'm open to accepting it.

@joe-baudisch this applies only to updates, correct?

@nitrosx: this only applies to updates.

@Junjiequan
Member

@cchndl
Apologies for my misinterpretation of optimistic concurrency control.
What I actually wanted to say is to use ETags with versions to control concurrent requests instead of relying on the date to control patches. For a temporary solution it’s fine, but it is not a very clean solution compared to versioned ETags. First, there might be timing issues (rare, but still). Second, it’s a fix only for clients that are aware of concurrency issues. I could imagine concurrent requests causing data loss if not handled carefully. That being said, I agree that ETags take more effort to implement, though they are also best practice if I’m not mistaken.

@nitrosx
Member

nitrosx commented Sep 2, 2025

Jumping in as a complete ignoramus here.
How would an ETag be more strict than an If-Unmodified-Since?
It all depends on how the information is used by the client.

@cchndl
Contributor

cchndl commented Sep 2, 2025

Apologies for my misinterpretation of optimistic concurrency control.

I mean, you didn't misinterpret it; I just wanted to clarify why this would be preferable to other options. I hope I was not overbearing!

What I actually wanted to say is to use ETags with versions to control concurrent requests instead of relying on the date to control patches. For a temporary solution it’s fine, but it is not a very clean solution compared to versioned ETags. First, there might be timing issues (rare, but still). Second, it’s a fix only for clients that are aware of concurrency issues. I could imagine concurrent requests causing data loss if not handled carefully. That being said, I agree that ETags take more effort to implement, though they are also best practice if I’m not mistaken.

Full agreement with you there. It's good to have both supported in the end.

@cchndl
Contributor

cchndl commented Sep 2, 2025

Jumping in as a complete ignoramus here. How would an ETag be more strict than an If-Unmodified-Since? It all depends on how the information is used by the client.

@nitrosx
Maybe "more strict" is a little bit too strong. The good thing about ETags is that they really are a version "distinguisher", if that makes sense. If it's not the same ETag, something changed.

For Last-Modified, if you read and an update arrives afterwards within the same second, you won't see that. It's not likely, but possible. The specification only provides one-second resolution. For example, a Last-Modified header that the etag library gives could be:
last-modified Wed, 30 Apr 2025 08:37:54 GMT
This should be fine for most resources, but if you want to be sure for writes, the ETag is in that sense "better" to check for.
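
The one-second resolution can be seen with plain JavaScript dates. In this small sketch the two timestamps are made-up values within the same second, and `toHttpDate` and `sameHttpSecond` are illustrative helper names:

```typescript
// Format a Date the way an HTTP Last-Modified header would carry it.
function toHttpDate(d: Date): string {
  return d.toUTCString();
}

// True when two instants become indistinguishable once sent as HTTP dates.
function sameHttpSecond(a: Date, b: Date): boolean {
  return toHttpDate(a) === toHttpDate(b);
}

// Made-up example: a read, then another client's update 800 ms later.
const readAt = new Date("2025-04-30T08:37:54.100Z");
const updatedAt = new Date("2025-04-30T08:37:54.900Z");

// Both format to "Wed, 30 Apr 2025 08:37:54 GMT", so a timestamp-based
// precondition cannot tell that the update happened after the read.
const updateIsInvisible = sameHttpSecond(readAt, updatedAt);
```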

In the end, I believe having both would be ideal. Last-Modified is "free" in the sense that most of the objects already track it, so it can't hurt. Depending on how we do it, it may also be cheaper to first check the timestamp and then the ETag if both are given, but I can't say that now; that's something one would have to check when we build it.

As the etag library generates the Etag by hashing the response body (not the headers I think?), and not the object in the database, we would have to do some work there. This is a nice default behaviour for caching page reads and so on, but not for the objects themselves. We would have to hook Etag generation into there somewhere or save the Etag in the db in or next to the object.
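
One possible shape for keeping the ETag next to the object is a version counter that is bumped on every write. The sketch below is an assumption for illustration, not an existing implementation: `VersionedDoc`, `etagOf`, and `patchIfMatch` are hypothetical names, and the If-Match handling is simplified to a single exact tag (no `*` and no tag lists).

```typescript
// Hypothetical stored shape: the version lives next to the object's data.
interface VersionedDoc {
  data: Record<string, unknown>;
  version: number;
}

// Derive a strong ETag from the stored version rather than a response body.
function etagOf(doc: VersionedDoc): string {
  return `"v${doc.version}"`;
}

// Apply a patch only if the caller's If-Match tag is still current.
function patchIfMatch(
  doc: VersionedDoc,
  changes: Record<string, unknown>,
  ifMatch?: string,
): { status: 200 | 412; doc: VersionedDoc } {
  if (ifMatch !== undefined && ifMatch !== etagOf(doc)) {
    return { status: 412, doc }; // stale tag: leave the object untouched
  }
  const next: VersionedDoc = {
    data: { ...doc.data, ...changes },
    version: doc.version + 1, // every write yields a new tag
  };
  return { status: 200, doc: next };
}
```

Because the tag changes on every write, two updates within the same second are still told apart.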

Hope this helps.

@nitrosx
Member

nitrosx commented Sep 3, 2025

@cchndl thank you for the explanation.
I think there is already ETag functionality in nest.js.
I found the following post that mentions ETag in nest.js:

@cfelder

cfelder commented Sep 3, 2025

If you do not allow sub-second updates on a resource there is no concurrency issue using If-Unmodified-Since.

Example: Assuming we have two clients A and B requesting the same resource R concurrently

  • Client A and B get the same last modified time t0
  • Client A sends an update with If-Unmodified-Since: t0
  • Client B sends an update with If-Unmodified-Since: t0
  • One Client will win, let's assume Client B wins b/c of a shorter round trip time
  • The Application receives update B first and updates its last modified time to t1 (server based timestamp)
  • The Application receives request from A and compares the last modified time t0 with t1 and responds with HTTP 412 Precondition Failed.
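
The sequence above can be replayed with a small in-memory stand-in for the application. This is a sketch under assumptions: the class `ResourceR`, the injected clock, and the concrete timestamps are made up for the example and do not come from the PR.

```typescript
// Hypothetical in-memory stand-in for the application holding resource R.
class ResourceR {
  private value: string;
  private updatedAt: Date;
  private readonly clock: () => Date;

  constructor(value: string, updatedAt: Date, clock: () => Date) {
    this.value = value;
    this.updatedAt = updatedAt;
    this.clock = clock;
  }

  read(): { value: string; lastModified: string } {
    return { value: this.value, lastModified: this.updatedAt.toUTCString() };
  }

  patch(value: string, ifUnmodifiedSince?: string): 200 | 412 {
    if (ifUnmodifiedSince !== undefined) {
      const limit = Date.parse(ifUnmodifiedSince);
      // Compare at whole-second resolution, as HTTP dates do.
      const current = Math.floor(this.updatedAt.getTime() / 1000) * 1000;
      if (!Number.isNaN(limit) && current > limit) return 412;
    }
    this.value = value;
    this.updatedAt = this.clock(); // server-based timestamp
    return 200;
  }
}

const t0 = new Date("2025-04-30T08:37:54Z");
const t1 = new Date("2025-04-30T08:37:59Z");
const resource = new ResourceR("initial", t0, () => t1);

// Clients A and B both read and see the same last modified time t0.
const seenByA = resource.read().lastModified;
const seenByB = resource.read().lastModified;

// B's update arrives first and wins; A's update is then rejected.
const statusB = resource.patch("from B", seenByB); // 200
const statusA = resource.patch("from A", seenByA); // 412
```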

For our use case this simple implementation is good enough and clients can handle http 412 accordingly. I currently do not see a need for sub-second concurrent updates on our end.

@nitrosx
Member

nitrosx commented Sep 4, 2025

I agree with @cfelder

@joe-baudisch
Author

Please switch around the if statements as suggested. Also make sure to include some API tests covering backward compatibility and the functionality itself.

Added a test that can be applied in a similar way to the other affected controllers.

5 participants