Commit ed087a4

sjarmak and claude committed
feat: US-015 - Starter tasks: Category E onboarding comprehension (2 tasks)
- CCX-onboard-041: scipy.stats API audit in pandas-dev/pandas (python-ml-stack fixture)
  - Oracle: 4 files with 'from scipy.stats import' — nanops.py, plotting/misc.py, plotting/hist.py, tests/groupby/test_reductions.py
  - eval.sh: file_set_match + provenance checks
  - Validity gate: VALID (gold=1.0, empty=0.0)
- CCX-onboard-050-ds: End-to-end Deployment creation flow (kubernetes-ecosystem fixture)
  - Deep Search variant — open-ended cross-repo synthesis question
  - Oracle chain: kubernetes-client-go/deployment.go (Create) → kubernetes/pkg/registry/apps/deployment/strategy.go (PrepareForCreate) → etcd/server/storage/mvcc/kvstore_txn.go (Put)
  - eval.sh: dependency_chain + provenance checks
  - criteria.json: 4 AAA quality rubric criteria for narrative quality
  - deepsearch_relevant=true in selection file
  - Validity gate: VALID (gold=1.0, empty=0.0)
- Both tasks registered in configs/selected_mcp_unique_tasks.json (8 total tasks)
- Both tasks in benchmarks/ccb_mcp_onboarding/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 10a5a42 commit ed087a4

20 files changed: +1662 -1 lines changed

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# Clone local checkout repos (baseline config: agent has local access to these)
+RUN git clone --depth 1 --branch 1.6.1 https://github.com/scikit-learn/scikit-learn /workspace/scikit-learn
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# sg_only mode: no repo clones — agent must use Sourcegraph MCP to access all repos
+# Mark sg_only mode so eval.sh can detect it
+RUN touch /tmp/.sg_only_mode
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+ENTRYPOINT []
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Onboarding Audit: scipy.stats API Call Sites in pandas
+
+## Your Task
+
+You are a new engineer joining the pandas-dev team. As part of onboarding, you've been asked
+to audit which pandas source files have runtime dependencies on `scipy.stats`. These call sites
+are important to document because they determine where pandas degrades gracefully when scipy
+is not installed.
+
+**Specific question**: Which Python source files in `pandas-dev/pandas` contain a
+`from scipy.stats import` statement (i.e., directly import functions or classes from
+`scipy.stats` at runtime)?
+
+Include files in any part of the `pandas-dev/pandas` codebase — production code **and** test
+files. Do not include files that only mention `scipy.stats` in docstrings or comments.
+
+## Context
+
+You are onboarding to a polyrepo Python scientific stack. The local `/workspace/` contains
+`scikit-learn/scikit-learn` as a reference implementation of a well-maintained scipy consumer.
+
+**Note:** The `pandas-dev/pandas` repository is accessible via Sourcegraph MCP tools:
+- `pandas-dev/pandas` (dataframe-library)
+- `numpy/numpy` (array-computing)
+- `scipy/scipy` (scientific-computing)
+
+## Output Format
+
+Create a file at `/workspace/answer.json` with your findings:
+
+```json
+{
+  "files": [
+    {"repo": "pandas-dev/pandas", "path": "relative/path/to/file.py"}
+  ],
+  "text": "Narrative summary of your findings, citing the repos and file paths."
+}
+```
+
+List all files that contain `from scipy.stats import`. Your answer is evaluated against
+a closed-world oracle — completeness matters.
+
+## Evaluation
+
+Your answer will be scored on:
+- **File recall and precision**: Did you find all pandas files that `from scipy.stats import`?
+- **Provenance**: Does your narrative cite the repo and file paths found?
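
The file recall and precision criterion can be illustrated with a small sketch. The scoring script `oracle_checks.py` is not part of this excerpt, so the function name `file_set_scores`, the comparison on `(repo, path)` pairs, and the use of F1 to combine the two numbers are assumptions made for illustration, not the benchmark's actual scoring.

```python
def file_set_scores(answer_files, oracle_files):
    """Return (precision, recall, f1) for two lists of {"repo", "path"} dicts."""
    # Compare on (repo, path) pairs so duplicate entries cannot inflate the score.
    answered = {(f["repo"], f["path"]) for f in answer_files}
    expected = {(f["repo"], f["path"]) for f in oracle_files}
    hits = len(answered & expected)
    precision = hits / len(answered) if answered else 0.0
    recall = hits / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


oracle = [
    {"repo": "pandas-dev/pandas", "path": "pandas/core/nanops.py"},
    {"repo": "pandas-dev/pandas", "path": "pandas/plotting/_matplotlib/misc.py"},
]
answer = [
    {"repo": "pandas-dev/pandas", "path": "pandas/core/nanops.py"},
    # Hypothetical wrong guess, included only to show a precision penalty.
    {"repo": "pandas-dev/pandas", "path": "pandas/io/hypothetical.py"},
]
print(file_set_scores(answer, oracle))  # (0.5, 0.5, 0.5)
```

Under this sketch an empty answer scores 0.0 and an answer identical to the oracle scores 1.0, which is consistent with the validity gate values quoted in the commit message (gold=1.0, empty=0.0).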
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+version = "1.0"
+
+[metadata]
+name = "CCX-onboard-041"
+description = "Audit scipy.stats API call sites in pandas"
+license = "Apache-2.0"
+
+[task]
+id = "CCX-onboard-041"
+repo = "scikit-learn/scikit-learn"
+category = "onboarding-comprehension"
+language = "python"
+difficulty = "medium"
+time_limit_sec = 900
+mcp_suite = "ccb_mcp_onboarding"
+use_case_id = 41
+repo_set_id = "python-ml-stack"
+mcp_unique = true
+
+[verification]
+type = "eval"
+command = "bash /tests/eval.sh"
+
+reward_type = "score"
+description = "Audit scipy.stats API call sites in pandas"
+
+[environment]
+build_timeout_sec = 600.0
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+#!/bin/bash
+# eval.sh — MCP-unique benchmark evaluator for CCX-onboard-041
+# Exit-code-first (SWE-Factory pattern):
+# exit 0 — agent produced useful output (composite score > 0)
+# exit 1 — total failure (composite score == 0 or missing answer)
+#
+# Writes /logs/verifier/reward.txt with the composite score [0.0, 1.0]
+
+set -euo pipefail
+
+TASK_ID="CCX-onboard-041"
+ANSWER_PATH="/workspace/answer.json"
+TASK_SPEC_PATH="/tests/task_spec.json"
+ORACLE_CHECKS="/tests/oracle_checks.py"
+REWARD_PATH="/logs/verifier/reward.txt"
+
+mkdir -p /logs/verifier
+
+echo "=== CCX-onboard-041 evaluator ==="
+echo "Task spec: $TASK_SPEC_PATH"
+echo "Answer: $ANSWER_PATH"
+echo ""
+
+# sg_only mode guard: restore full repo if verifier wrapper exists
+if [ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ]; then
+    echo "sg_only mode: sourcing verifier wrapper..."
+    source /tests/sgonly_verifier_wrapper.sh
+fi
+
+# Verify answer file exists
+if [ ! -f "$ANSWER_PATH" ]; then
+    echo "ERROR: answer.json not found at $ANSWER_PATH"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+# Validate answer is valid JSON
+if ! python3 -c "import json; json.load(open('$ANSWER_PATH'))" 2>/dev/null; then
+    echo "ERROR: answer.json is not valid JSON"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "answer.json found and valid JSON"
+
+# Run oracle checks
+if [ ! -f "$ORACLE_CHECKS" ]; then
+    echo "ERROR: oracle_checks.py not found at $ORACLE_CHECKS"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo "Running oracle checks..."
+SCORE=$(python3 "$ORACLE_CHECKS" --answer "$ANSWER_PATH" --spec "$TASK_SPEC_PATH" --verbose 2>&1 | tee /dev/stderr | tail -1)
+
+# Validate score is a number
+if ! echo "$SCORE" | python3 -c "import sys; float(sys.stdin.read().strip())" 2>/dev/null; then
+    echo "ERROR: oracle_checks.py did not return a valid score: $SCORE"
+    echo "0.0" > "$REWARD_PATH"
+    exit 1
+fi
+
+echo ""
+echo "Composite score: $SCORE"
+echo "$SCORE" > "$REWARD_PATH"
+
+# Exit based on score (SWE-Factory exit-code-first pattern)
+python3 -c "import sys; sys.exit(0 if float('$SCORE') > 0 else 1)"
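
The exit-code-first flow in this script (validate the answer, compute a score, write the reward file, exit 0 only when the score is positive) can be sketched in Python. This is an illustrative sketch only: `evaluate` and `score_fn` are hypothetical names, and the real scoring lives in `oracle_checks.py`, which is not shown here. One deliberate difference is that the sketch writes the reward file on every path, including scorer failures.

```python
import json
import os
import tempfile


def evaluate(answer_path, reward_path, score_fn):
    """Write a score to reward_path and return a process exit code (0 or 1)."""
    score = 0.0
    try:
        with open(answer_path) as fh:
            answer = json.load(fh)
        score = float(score_fn(answer))
    except (OSError, ValueError, TypeError) as exc:
        # A missing file, invalid JSON, or a non-numeric score is a total failure.
        print(f"ERROR: {exc}")
    with open(reward_path, "w") as fh:
        fh.write(f"{score}\n")
    return 0 if score > 0 else 1


with tempfile.TemporaryDirectory() as tmp:
    answer_path = os.path.join(tmp, "answer.json")
    reward_path = os.path.join(tmp, "reward.txt")
    with open(answer_path, "w") as fh:
        json.dump({"files": []}, fh)
    # Hypothetical scorer: any non-empty file list earns full credit.
    exit_code = evaluate(answer_path, reward_path, lambda a: 1.0 if a["files"] else 0.0)
    print(exit_code)  # 1
```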
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "files": [
+    {"repo": "pandas-dev/pandas", "path": "pandas/core/nanops.py"},
+    {"repo": "pandas-dev/pandas", "path": "pandas/plotting/_matplotlib/misc.py"},
+    {"repo": "pandas-dev/pandas", "path": "pandas/plotting/_matplotlib/hist.py"},
+    {"repo": "pandas-dev/pandas", "path": "pandas/tests/groupby/test_reductions.py"}
+  ],
+  "text": "Found 4 files in pandas-dev/pandas that contain 'from scipy.stats import': pandas/core/nanops.py (imports kendalltau and spearmanr for correlation methods), pandas/plotting/_matplotlib/misc.py (imports gaussian_kde for KDE plots), pandas/plotting/_matplotlib/hist.py (imports gaussian_kde for histogram density plots), and pandas/tests/groupby/test_reductions.py (imports sem for test assertions). These files represent all runtime call sites where pandas depends on scipy.stats functions.",
+  "_metadata": {
+    "oracle_type": "file_set_match",
+    "discovery_method": "sourcegraph_keyword_search",
+    "query": "repo:^github.com/pandas-dev/pandas$ \"from scipy.stats import\" file:pandas/"
+  }
+}
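
The oracle answer's `text` field names the repo and every listed path, which is what a provenance check would look for. A minimal sketch of such a check follows, assuming provenance means that each listed file's repo and path both appear verbatim in the narrative. The function name `provenance_score` and that definition are assumptions; the benchmark's real check is in `oracle_checks.py`, which this excerpt does not include.

```python
def provenance_score(text, files):
    """Fraction of listed files whose repo and path both appear in the narrative."""
    if not files:
        return 0.0
    cited = sum(1 for f in files if f["repo"] in text and f["path"] in text)
    return cited / len(files)


files = [
    {"repo": "pandas-dev/pandas", "path": "pandas/core/nanops.py"},
    {"repo": "pandas-dev/pandas", "path": "pandas/plotting/_matplotlib/hist.py"},
]
text = "In pandas-dev/pandas, pandas/core/nanops.py imports from scipy.stats."
print(provenance_score(text, files))  # 0.5
```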
