

@caizer0x caizer0x commented Mar 25, 2025

Create multi-turn complex evals

We want to check whether the agent can perform a chain of actions across several conversation turns.
Example:

    turns: [
      { input: "What's the price of KING?" },
      {
        input:
          "The mint address is 5eqNDjbsWL9hfAqUfhegTxgEa3XardzGdVAboMA4pump",
        expectedToolCall: {
          tool: "solana_token_data",
          params: "5eqNDjbsWL9hfAqUfhegTxgEa3XardzGdVAboMA4pump",
        },
      },
      {
        input: "Buy 20 tokens using USDC",
        expectedToolCall: {
          tool: "solana_trade",
          params: {
            inputAmount: 20,
            inputMint: "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v",
            outputMint: "5eqNDjbsWL9hfAqUfhegTxgEa3XardzGdVAboMA4pump",
            slippageBps: 100,
          },
        },
      },
      {
        input: "And check my KING balance",
        expectedToolCall: {
          tool: "solana_balance_other",
          params: {
            walletAddress: "GZbQmKYYzwjP3nbdqRWPLn98ipAni9w5eXMGp7bmZbGB",
            tokenAddress: "5eqNDjbsWL9hfAqUfhegTxgEa3XardzGdVAboMA4pump",
          },
        },
      },
    ],

This test checks whether the agent responds appropriately to each of the 4 user queries.

  1. The agent should ask for the mint address so it can identify the right token.
  2. Once the mint address is provided, it should call solana_token_data to get the token information.
  3. The user asks it to buy the token using 20 USDC. The agent already has all the information needed to execute the trade in its context and should call solana_trade.
  4. The agent should get the user's balance for the token using solana_balance_other.
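
To make the per-turn expectations concrete, here is a minimal sketch of how a single turn could be checked. The names `Turn`, `ToolCall`, `deepEqual`, and `turnPasses` are hypothetical and are not taken from this PR. The sketch also assumes that a turn without `expectedToolCall` (like the first one, where the agent should ask for the mint address) passes only when the agent makes no tool call.

```typescript
// Hypothetical types mirroring the shape of the example turns; not the PR's actual types.
type ToolCall = { tool: string; params: unknown };
type Turn = { input: string; expectedToolCall?: ToolCall };

// Structural equality for JSON-like values (primitives, arrays, plain objects).
// Key order does not matter for objects.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  const bKeys = Object.keys(bObj);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (k) => Object.prototype.hasOwnProperty.call(bObj, k) && deepEqual(aObj[k], bObj[k]),
  );
}

// Assumption: a turn with no expectedToolCall passes only if no tool was called.
function turnPasses(turn: Turn, actual: ToolCall | undefined): boolean {
  if (turn.expectedToolCall === undefined) return actual === undefined;
  if (actual === undefined) return false;
  return (
    actual.tool === turn.expectedToolCall.tool &&
    deepEqual(actual.params, turn.expectedToolCall.params)
  );
}
```

Comparing params structurally rather than as serialized strings means the check does not depend on the order in which the agent emits keys such as `inputAmount` and `slippageBps`.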

Changes Made

This PR adds the following changes:

Implementation Details

  • Added a runComplexEvals function that handles multi-turn evals
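
The PR text does not show the body of runComplexEvals, so the following is only a rough sketch of what a multi-turn runner could look like, not the actual implementation. Every name in it (`runMultiTurnEval`, `Agent`, `ScriptedTurn`, and so on) is hypothetical. It assumes the agent is a function that receives the full conversation history and returns one reply. A real agent call would likely be asynchronous; the sketch is kept synchronous for brevity.

```typescript
// All names here are hypothetical; the PR's runComplexEvals may differ.
type Message = { role: "user" | "assistant"; content: string };
type CalledTool = { tool: string; params: unknown };
type ScriptedTurn = { input: string; expectedToolCall?: CalledTool };
type AgentReply = { text: string; toolCall?: CalledTool };

// Assumed agent interface: takes the whole history, returns a single reply.
type Agent = (history: Message[]) => AgentReply;

type TurnResult = {
  input: string;
  expectedTool: string | undefined;
  actualTool: string | undefined;
  passed: boolean;
};

function runMultiTurnEval(agent: Agent, turns: ScriptedTurn[]): TurnResult[] {
  const history: Message[] = [];
  const results: TurnResult[] = [];

  for (const turn of turns) {
    // Each turn sees everything said before it, so context carries over.
    history.push({ role: "user", content: turn.input });
    const reply = agent([...history]);
    history.push({ role: "assistant", content: reply.text });

    const expected = turn.expectedToolCall;
    const actual = reply.toolCall;
    // Simplification: params are compared via JSON.stringify, which is
    // sensitive to key order. A structural comparison would be more robust.
    const passed =
      expected === undefined
        ? actual === undefined
        : actual !== undefined &&
          actual.tool === expected.tool &&
          JSON.stringify(actual.params) === JSON.stringify(expected.params);

    results.push({
      input: turn.input,
      expectedTool: expected?.tool,
      actualTool: actual?.tool,
      passed,
    });
  }
  return results;
}
```

Returning one result per turn, instead of a single pass/fail flag, makes it easier to see at which point in the chain the agent went wrong.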

Additional Notes

Checklist

  • I have tested these changes locally
  • I have added the prompt used to test it

@caizer0x caizer0x force-pushed the complex-evals branch 3 times, most recently from 64869a1 to 4e68162 Compare March 26, 2025 16:37
@caizer0x caizer0x closed this Mar 26, 2025
