29 changes: 29 additions & 0 deletions skills/bfts-config-prep/SKILL.md
@@ -0,0 +1,29 @@
---
name: bfts-config-prep
description: Prepare a run directory and BFTS config for experiments from an idea JSON + idea.md. Use before running experiment-bfts-runner.
---

# BFTS Config Prep

## Overview
Create a run folder with a timestamped name, copy a BFTS config template, and fill in the required paths (`desc_file`, `data_dir`, `log_dir`, `workspace_dir`).

## Workflow
1. **Ensure idea files exist**
- `idea.json` follows `references/idea.schema.json`.
- `idea.md` generated by idea-to-markdown.
2. **Prepare run folder**
- `UV_CACHE_DIR=/tmp/uv-cache XDG_CACHE_HOME=/tmp uv run --with pyyaml -s scripts/prep_bfts_config.py --idea-json idea.json --idea-md idea.md --out-root runs`

## Outputs
- `runs/<timestamp>_<idea_name>/`
- `idea.json`, `idea.md`, `bfts_config.yaml`
- `data/`, `logs/`, `workspaces/`

## Safeguards
- Does not modify source idea files.
- Writes only under `--out-root`.

## References
- Idea schema: `references/idea.schema.json`
- BFTS template: `references/bfts_config_template.yaml`
4 changes: 4 additions & 0 deletions skills/bfts-config-prep/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
  display_name: "BFTS Config Prep"
  short_description: "Prepare run dirs + BFTS config"
  default_prompt: "Create a timestamped run directory with idea.json/idea.md and a configured bfts_config.yaml."
87 changes: 87 additions & 0 deletions skills/bfts-config-prep/references/bfts_config_template.yaml
@@ -0,0 +1,87 @@
# path to the task data directory
data_dir: "data"
preprocess_data: False

goal: null
eval: null

log_dir: logs
workspace_dir: workspaces

# whether to copy the data to the workspace directory (otherwise it will be symlinked)
# copying is recommended to prevent the agent from accidentally modifying the original data
copy_data: True

exp_name: run # a random experiment name will be generated if not provided

# settings for code execution
exec:
  timeout: 3600
  agent_file_name: runfile.py
  format_tb_ipython: False

generate_report: True
# LLM settings for final report from journal
report:
  model: gpt-4o-2024-11-20
  temp: 1.0

experiment:
  num_syn_datasets: 1

debug:
  stage4: False

# agent hyperparams
agent:
  type: parallel
  num_workers: 4
  stages:
    stage1_max_iters: 20
    stage2_max_iters: 12
    stage3_max_iters: 12
    stage4_max_iters: 18
  # how many improvement iterations to run
  steps: 5  # if stage-specific max_iters are not provided, this value is used for all stages
  # whether to instruct the agent to use CV (set to 1 to disable)
  k_fold_validation: 1
  multi_seed_eval:
    num_seeds: 3  # should match num_workers when num_workers < 3; otherwise set to 3
  # whether to instruct the agent to generate a prediction function
  expose_prediction: False
  # whether to provide the agent with a preview of the data
  data_preview: False

  # LLM settings for coding
  code:
    model: anthropic.claude-3-5-sonnet-20241022-v2:0
    temp: 1.0
    max_tokens: 12000

  # LLM settings for evaluating program output / tracebacks
  feedback:
    model: gpt-4o-2024-11-20  # gpt-4o
    temp: 0.5
    max_tokens: 8192

  vlm_feedback:
    model: gpt-4o-2024-11-20
    temp: 0.5
    max_tokens: null

  search:
    max_debug_depth: 3
    debug_prob: 0.5
    num_drafts: 3

# Options for summarizing findings and selecting the best node.
# If not specified, the default behavior is used.

# summary:
#   model: gpt-4o
#   temp: 0.3

# select_node:
#   model: gpt-4o
#   temp: 0.3
34 changes: 34 additions & 0 deletions skills/bfts-config-prep/references/idea.schema.json
@@ -0,0 +1,34 @@
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AI Scientist Idea",
  "type": "object",
  "required": [
    "Name",
    "Title",
    "Short Hypothesis",
    "Related Work",
    "Abstract",
    "Experiments",
    "Risk Factors and Limitations"
  ],
  "properties": {
    "Name": {"type": "string", "pattern": "^[a-z0-9_-]+$"},
    "Title": {"type": "string"},
    "Short Hypothesis": {"type": "string"},
    "Related Work": {"type": "string"},
    "Abstract": {"type": "string"},
    "Experiments": {
      "oneOf": [
        {"type": "string"},
        {"type": "array", "items": {"type": ["string", "object"]}}
      ]
    },
    "Risk Factors and Limitations": {
      "oneOf": [
        {"type": "string"},
        {"type": "array", "items": {"type": "string"}}
      ]
    }
  },
  "additionalProperties": true
}
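
The required-field list and the `Name` pattern above can be exercised with a small stdlib-only check. This is a sketch: the helper name `check_idea` is hypothetical, and a real validator would use a JSON Schema library against `references/idea.schema.json` instead.

```python
import re

# Required fields, mirroring the "required" list in idea.schema.json
REQUIRED = ["Name", "Title", "Short Hypothesis", "Related Work",
            "Abstract", "Experiments", "Risk Factors and Limitations"]

def check_idea(idea: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in idea]
    name = idea.get("Name", "")
    # Mirrors the schema pattern ^[a-z0-9_-]+$
    if not re.fullmatch(r"[a-z0-9_-]+", name):
        problems.append(f"Name must match ^[a-z0-9_-]+$: {name!r}")
    return problems

idea = {"Name": "demo_idea", "Title": "t", "Short Hypothesis": "h",
        "Related Work": "r", "Abstract": "a", "Experiments": "e",
        "Risk Factors and Limitations": "l"}
print(check_idea(idea))  # → []
```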
108 changes: 108 additions & 0 deletions skills/bfts-config-prep/scripts/prep_bfts_config.py
@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""
Prepare a BFTS run directory and config from idea JSON + idea.md.
"""
from __future__ import annotations

import argparse
import json
import os
from datetime import datetime
from pathlib import Path

try:
    import yaml  # type: ignore
except Exception:
    yaml = None


def _load_json(path: Path) -> dict:
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        raise SystemExit(f"[ERROR] File not found: {path}")
    except json.JSONDecodeError as e:
        raise SystemExit(f"[ERROR] Invalid JSON: {path}: {e}")


def _extract_idea_name(obj: dict) -> str:
    if "Name" in obj and isinstance(obj["Name"], str):
        return obj["Name"].strip()
    if "idea" in obj and isinstance(obj["idea"], dict) and isinstance(obj["idea"].get("Name"), str):
        return obj["idea"]["Name"].strip()
    return "idea"


def main() -> int:
    ap = argparse.ArgumentParser(description="Prepare BFTS run directory and config.")
    ap.add_argument("--idea-json", required=True, help="Path to idea JSON.")
    ap.add_argument("--idea-md", required=True, help="Path to idea markdown.")
    ap.add_argument("--out-root", required=True, help="Root directory for runs.")
    ap.add_argument(
        "--config-template",
        default=None,
        help="BFTS config template YAML (default: references/bfts_config_template.yaml).",
    )
    args = ap.parse_args()

    if yaml is None:
        raise SystemExit("[ERROR] pyyaml is required. Try: uv run --with pyyaml -s scripts/prep_bfts_config.py --help")

    idea_json = Path(args.idea_json).expanduser().resolve()
    idea_md = Path(args.idea_md).expanduser().resolve()
    out_root = Path(args.out_root).expanduser().resolve()

    if not idea_json.exists():
        raise SystemExit(f"[ERROR] idea JSON not found: {idea_json}")
    if not idea_md.exists():
        raise SystemExit(f"[ERROR] idea markdown not found: {idea_md}")

    obj = _load_json(idea_json)
    if isinstance(obj, list) and obj:
        name = _extract_idea_name(obj[0] if isinstance(obj[0], dict) else {})
    elif isinstance(obj, dict):
        name = _extract_idea_name(obj)
    else:
        name = "idea"

    ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = out_root / f"{ts}_{name}"
    run_dir.mkdir(parents=True, exist_ok=True)

    data_dir = run_dir / "data"
    logs_dir = run_dir / "logs"
    workspaces_dir = run_dir / "workspaces"
    for d in (data_dir, logs_dir, workspaces_dir):
        d.mkdir(parents=True, exist_ok=True)

    # Copy idea files
    (run_dir / "idea.json").write_text(idea_json.read_text(encoding="utf-8"), encoding="utf-8")
    (run_dir / "idea.md").write_text(idea_md.read_text(encoding="utf-8"), encoding="utf-8")

    # Load template
    if args.config_template:
        tpl = Path(args.config_template).expanduser().resolve()
    else:
        tpl = Path(__file__).parent.parent / "references" / "bfts_config_template.yaml"
    if not tpl.exists():
        raise SystemExit(f"[ERROR] Config template not found: {tpl}")

    config = yaml.safe_load(tpl.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise SystemExit("[ERROR] Invalid config template format.")

    config["desc_file"] = str((run_dir / "idea.md").resolve())
    config["data_dir"] = str(data_dir)
    config["log_dir"] = str(logs_dir)
    config["workspace_dir"] = str(workspaces_dir)

    out_cfg = run_dir / "bfts_config.yaml"
    out_cfg.write_text(yaml.safe_dump(config, sort_keys=False), encoding="utf-8")

    print(f"[OK] Prepared run directory: {run_dir}")
    print(f"[OK] Wrote config: {out_cfg}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
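
The timestamped run-directory naming used above can be sketched as a pure helper (hypothetical function, not part of the script), which makes the `<timestamp>_<idea_name>` convention easy to test in isolation:

```python
from datetime import datetime

def run_dir_name(name: str, now: datetime) -> str:
    # Same strftime format as prep_bfts_config.py: YYYY-MM-DD_HH-MM-SS
    return f"{now.strftime('%Y-%m-%d_%H-%M-%S')}_{name}"

print(run_dir_name("my_idea", datetime(2024, 1, 2, 3, 4, 5)))
# → 2024-01-02_03-04-05_my_idea
```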
35 changes: 35 additions & 0 deletions skills/citation-harvest/SKILL.md
@@ -0,0 +1,35 @@
---
name: citation-harvest
description: Query Semantic Scholar to collect citations and generate a deduplicated BibTeX file. Offline by default.
---

# Citation Harvest

## Overview
Collect citations from Semantic Scholar using query strings and output a JSON bundle plus a BibTeX file.

## Workflow
1. Prepare queries (one per line)
2. Run the harvester
~~~bash
UV_CACHE_DIR=/tmp/uv-cache XDG_CACHE_HOME=/tmp uv run -s scripts/citation_harvest.py \
--online --in queries.txt --out-json citations.json --out-bib citations.bib
~~~

## Inputs
- `--in`: text file with one query per line (optional)
- `--query`: repeatable query strings
- `--limit`: results per query (default 5)
- `--online`: enable network calls (required)

## Outputs
- `citations.json`
- `citations.bib`
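
A minimal sketch of the deduplication and BibTeX rendering behind these outputs. The helper names `dedup_key` and `to_bibtex` are illustrative, not the script's actual API; the field names assume records shaped like Semantic Scholar responses (`title`, `year`, `authors`, `externalIds`).

```python
def dedup_key(p: dict) -> str:
    # Prefer a stable external ID (DOI, case-insensitive); fall back to normalized title
    doi = (p.get("externalIds") or {}).get("DOI")
    return doi.lower() if doi else p.get("title", "").strip().lower()

def to_bibtex(p: dict) -> str:
    # Citation key like "smith2021" from first author's surname + year
    first = (p.get("authors") or [{}])[0].get("name", "anon").split()[-1].lower()
    year = p.get("year") or "n.d."
    authors = " and ".join(a.get("name", "") for a in p.get("authors", []))
    return (f"@article{{{first}{year},\n"
            f"  title = {{{p.get('title', '')}}},\n"
            f"  author = {{{authors}}},\n"
            f"  year = {{{year}}}\n"
            f"}}")

papers = [
    {"title": "A Study", "year": 2021, "authors": [{"name": "Ada Smith"}],
     "externalIds": {"DOI": "10.1/abc"}},
    {"title": "A Study", "year": 2021, "authors": [{"name": "Ada Smith"}],
     "externalIds": {"DOI": "10.1/ABC"}},  # duplicate: DOI differs only by case
]
unique = list({dedup_key(p): p for p in papers}.values())
print(len(unique))  # → 1
```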

## Safeguards
- Offline by default; --online required.
- No uploads; only queries sent to Semantic Scholar.
- API key must be provided via S2_API_KEY env var if needed.

## References
- Safeguards: `references/safeguards.md`
4 changes: 4 additions & 0 deletions skills/citation-harvest/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
  display_name: "Citation Harvest"
  short_description: "Query Semantic Scholar and output deduplicated BibTeX"
  default_prompt: "Gather citations using provided queries, deduplicate, and produce citations.json and citations.bib."
3 changes: 3 additions & 0 deletions skills/citation-harvest/references/safeguards.md
@@ -0,0 +1,3 @@
- Do not claim novelty based solely on sparse results.
- Record query strings and the query date in your notes.
- Do not upload private data; only send keyword queries.