Description
Problem
When agents run for extended periods, they accumulate a large history of messages that eventually fills the LLM's context window and causes errors once the token limit is exceeded.
Proposed Solution
Implement automatic compaction of historical messages to prevent context window overflow:
- Enhance the LLM abstraction to track and return (see the first sketch below):
  - Total tokens used in the last completion request
  - Maximum allowed tokens for the model/provider
- Monitor token usage and trigger compaction when it approaches a configurable threshold (e.g., 50% of the maximum)
- When triggered, compact older messages (excluding the most recent ones, perhaps the last 10) into a single summarized message, as shown in the second sketch below
- Use a prompt like: "Provide a detailed but concise summary of our conversation above. Focus on information that would be helpful for continuing the conversation, including what we did, what we're doing, which files we're working on, and what we're going to do next."
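
As a rough illustration of the first step, the LLM abstraction could return token accounting alongside each completion. This is a minimal sketch with hypothetical names (`CompletionResult`, `tokens_used`, `max_tokens`), not an existing API; the real abstraction would map these fields onto whatever usage data each provider reports.

```python
from dataclasses import dataclass


@dataclass
class CompletionResult:
    """Result of one completion call, including token accounting.

    Hypothetical shape: field names are illustrative, not an existing API.
    """
    text: str
    tokens_used: int   # total tokens consumed by the last request (prompt + completion)
    max_tokens: int    # context window size for this model/provider

    @property
    def usage_ratio(self) -> float:
        """Fraction of the context window consumed by the last request."""
        return self.tokens_used / self.max_tokens
```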
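The monitoring and compaction steps could then hang off that ratio. The sketch below assumes the `CompletionResult` above, a message history of role/content dicts, and a hypothetical `llm.complete()` helper that sends messages and returns text; it preserves the most recent messages verbatim and replaces everything older with a single summary message produced by the prompt quoted above.

```python
COMPACTION_THRESHOLD = 0.5   # trigger when 50% of the context window is used (configurable)
KEEP_RECENT = 10             # number of recent messages to preserve from compaction

SUMMARY_PROMPT = (
    "Provide a detailed but concise summary of our conversation above. "
    "Focus on information that would be helpful for continuing the conversation, "
    "including what we did, what we're doing, which files we're working on, "
    "and what we're going to do next."
)


def maybe_compact(messages: list[dict], result: CompletionResult, llm) -> list[dict]:
    """Compact older messages into one summary once usage crosses the threshold."""
    if result.usage_ratio < COMPACTION_THRESHOLD or len(messages) <= KEEP_RECENT:
        return messages  # plenty of headroom, or too little history to compact

    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]

    # Ask the model to summarize the older portion of the conversation.
    # `llm.complete` is a hypothetical helper, standing in for the real LLM call.
    summary_text = llm.complete(older + [{"role": "user", "content": SUMMARY_PROMPT}])

    summary_message = {
        "role": "assistant",
        "content": f"[Summary of earlier conversation]\n{summary_text}",
    }
    return [summary_message] + recent
```

Whether the threshold and the number of preserved messages are hard-coded or configurable is left open in the questions below.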
Benefits
- Prevents context window overflow errors
- Maintains important context for agent operation
- Enables longer-running agent sessions
- Makes the system more robust for complex tasks
Questions
- Should the compaction threshold be configurable?
- How many recent messages should be preserved from compaction?
- Should we implement different compaction strategies for different agent types?