
Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}} #3

@LinZichuan

Description


Hi, when I run `bash scripts/run_generalqa_qwen3_judge.sh`, I get the following errors. Is this due to a missing GPT-4o API key?

```
(main_task pid=388584) ====================================================================================================
(main_task pid=388584) ====================================================================================================
(main_task pid=388584) Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
(main_task pid=388584) messages: [{'role': 'system', 'content': [{'type': 'text', 'text': "You are an intelligent chatbot designed for evaluating the correctness of generative outputs for question-answer pairs.\nYour task is to compare the predicted answer with the correct answer and determine if they match meaningfully. Here's how you can accomplish the task:\n------\n##INSTRUCTIONS:\n- Focus on the meaningful match between the predicted answer and the correct answer.\n- Consider synonyms or paraphrases as valid matches.\n- Evaluate the correctness of the prediction compared to the answer."}]}, {'role': 'user', 'content': [{'type': 'text', 'text': 'I will give you a question related to an image and the following text as inputs:\n\n1. Question Related to the Image: Who is the author of this book?\nAnswer the question with a short phrase.\n2. Ground Truth Answer: Antonio Graceffo\n3. Model Predicted Answer: Antonio Graceffo\n\nYour task is to evaluate the model's predicted answer against the ground truth answer, based on the context provided by the question related to the image. Consider the following criteria for evaluation:\n- Relevance: Does the predicted answer directly address the question posed, considering the information provided by the given question?\n- Accuracy: Compare the predicted answer to the ground truth answer. You need to evaluate from the following two perspectives:\n(1) If the ground truth answer is open-ended, consider whether the prediction accurately reflects the information given in the ground truth without introducing factual inaccuracies. If it does, the prediction should be considered correct.\n(2) If the ground truth answer is a definitive answer, strictly compare the model's prediction to the actual answer. Pay attention to unit conversions such as length and angle, etc. As long as the results are consistent, the model's prediction should be deemed correct.\nOutput Format:\nYour response should include an integer score indicating the correctness of the prediction: 1 for correct and 0 for incorrect. Note that 1 means the model's prediction strictly aligns with the ground truth, while 0 means it does not.\nThe format should be "Score: 0 or 1"'}]}]
(main_task pid=388584) ====================================================================================================
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) Warning: Failed after 3 attempts
(main_task pid=388584) top 10 time consuming in reward fn: []
(main_task pid=388584) there are 512 invalid samples in this batch: ['f3a54872-c197-478b-a3b9-2739c5cca43c', '7ca01723-2205-4575-9a78-852d888451f6', '58bf17d8-3df0-4d16-9e89-bd00ec4d4e1d', 'dc26a6fa-ec1c-4cb7-b563-217c3c3fd459', '6fed4532-d31f-43dd-820a-6a53ff4ac96f']
(main_task pid=388584) total time: 23.259926319122314
(main_task pid=388584) ("Initial validation metrics: {'val/test_score/open_source/total': "
(main_task pid=388584) "np.float64(0.0), 'val/test_score/open_source/format_reward': "
(main_task pid=388584) "np.float64(0.0), 'val/test_score/open_source/acc_reward': np.float64(0.0)}")
(main_task pid=388584) step:0 - val/test_score/open_source/total:0.000 - val/test_score/open_source/format_reward:0.000 - val/test_score/open_source/acc_reward:0.000
```
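For what it's worth, a 404 `Resource not found` from the judge endpoint usually points to a wrong base URL or model/deployment name rather than a missing key (a missing or invalid key typically returns 401/403). A minimal sketch to sanity-check which URL an OpenAI-compatible client would POST to; the environment variable names below are assumptions, so check this repo's script/config for the actual ones:

```python
import os

def judge_chat_url(base_url: str) -> str:
    """Return the chat-completions URL an OpenAI-compatible client posts to."""
    return base_url.rstrip("/") + "/chat/completions"

if __name__ == "__main__":
    # OPENAI_API_BASE / OPENAI_BASE_URL are assumed names; check the repo's config.
    base = os.environ.get("OPENAI_API_BASE") or os.environ.get("OPENAI_BASE_URL")
    if not base:
        print("No base URL set - the client may default to api.openai.com")
    else:
        print("Judge requests will POST to:", judge_chat_url(base))
        # Verify this path actually exists on your server, e.g.:
        #   curl -i -X POST "$OPENAI_API_BASE/chat/completions" \
        #        -H "Authorization: Bearer $OPENAI_API_KEY" ...
```

Note that Azure OpenAI uses a different path shape (`/openai/deployments/<deployment-name>/chat/completions?api-version=...`), and there a 404 very often means the deployment name in the URL does not exist in the resource.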
