fix: prevent topic drift in writeup and harden LaTeX/plotting pipeline by YihanJIANG-lab · Pull Request #88 · SakanaAI/AI-Scientist-v2

YihanJIANG-lab · 2026-03-18T12:39:07Z

During long paper generation runs, the LLM tends to drift away from the original research topic. This PR adds:

- Topic anchor terms extracted from the idea, checked after each reflection round
- latex_matches_idea() validation to detect and retry drifted content
- LaTeX compilation timeout increased from 30s to 300s (large papers need more time)
- UTF-8 encoding fixes for LaTeX subprocess calls
- Missing binary detection for pdflatex/bibtex
- Code snippet validation in plotting to prevent malformed scripts
- Compact plotting guidance to reduce unnecessary figure generation
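
To make the drift check concrete, here is a minimal sketch of how anchor extraction and matching could work. It is not the PR's actual code: the function names come from the description, but the signatures, the stopword list, the word-length filter, `max_terms`, and the substring matching are all assumptions made for illustration. The only details taken from the PR are the idea fields (Name, Title, Short Hypothesis) and the threshold of at least 2 anchor terms.

```python
import re

# Hypothetical stopword list; the PR does not say which words it filters out.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "into", "using", "most"}


def get_topic_anchor_terms(idea: dict, max_terms: int = 8) -> list[str]:
    """Collect distinctive lowercase words from the idea's descriptive fields."""
    text = " ".join(str(idea.get(key, "")) for key in ("Name", "Title", "Short Hypothesis"))
    terms: list[str] = []
    # Assumption: words of 4+ characters are distinctive enough to act as anchors.
    for word in re.findall(r"[a-z][a-z0-9]{3,}", text.lower()):
        if word not in STOPWORDS and word not in terms:
            terms.append(word)
    return terms[:max_terms]


def latex_matches_idea(latex: str, anchors: list[str], min_hits: int = 2) -> bool:
    """Return True when at least min_hits anchor terms occur in the LaTeX source."""
    if not anchors:
        return True  # nothing to check against, so do not block the writeup
    lowered = latex.lower()
    hits = sum(1 for term in anchors if term in lowered)
    return hits >= min(min_hits, len(anchors))


# Made-up idea and drafts, used only to exercise the two functions.
idea = {
    "Name": "sparse_attention_pruning",
    "Title": "Pruning Sparse Attention Heads in Small Transformers",
    "Short Hypothesis": "Most attention heads can be pruned with little accuracy loss.",
}
anchors = get_topic_anchor_terms(idea)
on_topic = r"\section{Method} We study pruning of attention heads in transformers."
off_topic = r"\section{Method} We propose a new diffusion sampler for images."

print(latex_matches_idea(on_topic, anchors))
print(latex_matches_idea(off_topic, anchors))
```

A caller would retry the writeup with a correction prompt when `latex_matches_idea` returns False. Plain substring matching is cheap but coarse: a short anchor can match inside an unrelated longer word, so a stricter variant might match on word boundaries instead.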

A collection of fixes for the paper generation pipeline:

## Topic drift prevention (perform_icbinb_writeup.py)

- Add get_topic_anchor_terms(): extract key terms from idea.json Name/Title/Short Hypothesis to serve as topic anchors
- Add latex_matches_idea(): verify generated LaTeX contains at least 2 anchor terms, preventing the LLM from drifting to unrelated topics
- Writeup now retries once with explicit correction prompt if first draft drifts away from the original research idea
- Add topic anchor reminder to each reflection round prompt
- Add explicit anti-drift instruction in the writeup system prompt

## LaTeX compilation hardening (perform_icbinb_writeup.py)

- Increase compile_latex timeout from 30s to 300s for complex papers
- Add encoding='utf-8', errors='replace' to subprocess calls to prevent encoding crashes on Windows
- Skip PDF compilation gracefully when pdflatex/bibtex not available
- Handle FileNotFoundError for missing LaTeX binaries
- Check reflection PDF exists before attempting VLM review
- Fix summary file path resolution: auto-discover run log dir instead of hardcoding 'logs/0-run/'

## Review encoding fix (perform_llm_review.py)

- Add encoding='utf-8' to load_review() json.load
- Add errors='replace' to paper text file reading

## Plotting robustness (perform_plotting.py)

- extract_code_snippet() now returns empty string instead of raw LLM text when no code block is found, preventing writing natural language as Python
- Add validate_python_snippet() using compile() to catch syntax errors before running aggregator scripts
- Reduce target figure count guidance (4-6 instead of ~12) for more reliable generation on resource-constrained setups
- Add guidance to keep scripts compact and prefer robust numeric plots
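
The plotting robustness items in the commit message can be illustrated with a short sketch. This is an assumption-laden illustration, not the code from perform_plotting.py: the function names match the commit message, but the regular expression, the return types, and the error messages are hypothetical. The two behaviours taken from the source are that extraction returns an empty string when no code block is found, and that validation relies on the built-in `compile()`.

```python
import re

FENCE = "`" * 3  # a markdown code fence, built indirectly to keep this example readable

# Assumed pattern: an optional "python" tag after the opening fence, then the body.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_snippet(llm_text: str) -> str:
    """Return the first fenced code block, or '' when the reply has none."""
    match = _CODE_BLOCK.search(llm_text)
    return match.group(1).strip() if match else ""


def validate_python_snippet(code: str) -> tuple[bool, str]:
    """Check that the snippet parses as Python; compile() never executes it."""
    if not code.strip():
        return False, "empty snippet"
    try:
        compile(code, "<aggregator>", "exec")
    except (SyntaxError, ValueError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


# Made-up LLM replies covering the three cases.
reply_ok = f"Here is the script:\n{FENCE}python\nimport math\nprint(math.pi)\n{FENCE}\nDone."
reply_prose = "I would plot the loss curves first."
reply_bad = f"{FENCE}python\ndef f(:\n    pass\n{FENCE}"

for reply in (reply_ok, reply_prose, reply_bad):
    ok, reason = validate_python_snippet(extract_code_snippet(reply))
    print(ok, reason)
```

Note that `compile()` only catches syntax-level problems. A script that parses cleanly can still fail at runtime, for example on a missing data file or a bad import, so the aggregator run itself still needs its own error handling.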

YihanJIANG-lab wants to merge 1 commit into SakanaAI:main from YihanJIANG-lab:fix/writeup-topic-drift-and-latex
