OpenAI has an automatic internal prompt cache whose behavior can be influenced to some extent. Each response reports how many prompt tokens were a cache hit.
See https://platform.openai.com/docs/guides/prompt-caching
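A minimal sketch (Python, official `openai` client) of where the cache-hit count shows up in a response. The `prompt_tokens_details.cached_tokens` field is the one described in the guide linked above; the model name and messages are just placeholders, and per the guide caching only kicks in once the prompt prefix reaches a minimum length (1024 tokens), so a real prompt would need a longer shared prefix:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

usage = response.usage
details = usage.prompt_tokens_details  # may be None on older API versions
cached = details.cached_tokens if details else 0
print(f"prompt tokens: {usage.prompt_tokens}, of which cached: {cached}")
```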
ToDo:
- Track cache hits when using OpenAI (see the usage sketch above and the tracking sketch after this list).
- Align the tracking stats and the tracking GUI with the cache tracking already done for Anthropic. Note that caching affects price and rate limits differently than it does for Anthropic:
  - there is no extra cost for caching (Anthropic charges for cache writes), but it still affects rate limits
- Check whether we can improve cache hits when using OpenAI, e.g. by keeping the static parts of the prompt at the front (see the prefix-ordering sketch below).
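A hypothetical sketch of the tracking mentioned in the list above; `CacheStats` and `record_usage` are placeholder names, not existing tracking code:

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Running totals for OpenAI prompt-cache usage."""
    prompt_tokens: int = 0
    cached_tokens: int = 0

    @property
    def hit_rate(self) -> float:
        # Fraction of prompt tokens that were served from the cache.
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0


def record_usage(stats: CacheStats, usage) -> None:
    """Accumulate one response's `usage` object into the running stats."""
    stats.prompt_tokens += usage.prompt_tokens
    details = usage.prompt_tokens_details
    stats.cached_tokens += details.cached_tokens if details else 0
```

Unlike for Anthropic, there is no separate cache-write count to record.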
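Regarding the last item: the guide linked above recommends putting static or repeated content at the beginning of the prompt and dynamic content at the end, since the cache matches on exact prompt prefixes. A minimal sketch of that ordering; all names here are hypothetical:

```python
# Keep static content (system prompt, instructions, tool definitions) first
# and per-request data last, so the longest possible prefix stays
# byte-identical across calls and can be served from the cache.
STATIC_SYSTEM_PROMPT = "..."  # identical between requests -> cacheable prefix


def build_messages(dynamic_context: str, user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # Dynamic, request-specific parts go last so they don't break the prefix.
        {"role": "user", "content": f"{dynamic_context}\n\n{user_question}"},
    ]
```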