188 changes: 187 additions & 1 deletion AGENTS.md
@@ -2,7 +2,7 @@ You're working on the source code for `miniwdl`, the Workflow Description Langua

Read `CONTRIBUTING.md` for an overview of the codebase and development workflow. In particular:
- If you're not started in a suitable virtualenv, bootstrap one under `venv/`.
- Python code should be linted with `mypy`, `ruff check --fix`, and `ruff format`.
- Formatting and linting (apply routinely for Python diffs): `make pretty` and `make check`.
- Testing guidelines:
  - The test suite assumes access to the Internet and dockerd (via unix socket), so make sure you have the necessary user permissions before proceeding.
  - While iterating on a task, it's usually best to run a targeted set of test cases that turns around quickly.
@@ -16,3 +16,189 @@ These development tutorials under `docs/` introduce a few common ways the codeba
- `wdlviz.md` -- generating graphviz diagrams from WDL source code
- `add_functions.md` -- adding new functions to the standard library
- `assert.md` -- adding a new WDL language feature, with parsing, type-checking, and runtime execution

---

# Adding WDL 1.2 Standard Library Functions

This guide explains how to add new standard library functions from WDL 1.2. It focuses on the functions that still need to be implemented:

- **String functions:** `find`, `matches`
- **File functions:** `join_paths`
- **Array functions:** `contains`, `chunk`
- **Map functions:** `contains_key`, `values`
- **Operators:** `**` (exponentiation)
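
Before touching `WDL/StdLib.py`, it can help to pin down the intended behavior in plain Python. The sketch below models three of the listed functions on ordinary lists and dicts. The `ref_*` names are hypothetical, the code does not use miniwdl's `Value`/`Type` classes, and the handling of a non-positive chunk size is an assumption; the spec remains the authority on edge cases.

```python
from typing import Dict, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")
X = TypeVar("X")


def ref_contains(array: List[X], value: X) -> bool:
    """True if at least one element of the array equals the value."""
    return any(item == value for item in array)


def ref_chunk(array: List[X], size: int) -> List[List[X]]:
    """Split into consecutive, non-overlapping chunks; the last may be shorter."""
    if size <= 0:
        # Assumption for this sketch only: reject non-positive sizes.
        raise ValueError("chunk size must be positive")
    return [array[i : i + size] for i in range(0, len(array), size)]


def ref_values(mapping: Dict[K, V]) -> List[V]:
    """Values in the order the entries appear in the map."""
    return list(mapping.values())
```

A reference model like this is mainly useful for deciding which edge cases (empty array, array length not a multiple of the chunk size) the unit tests should cover.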

## Quick Reference

- **Source code:** `WDL/StdLib.py`
- **Unit tests:** `tests/test_5stdlib.py`
- **Spec tests:** `tests/spec_tests/` (extracted from WDL specification)
- **Tutorial:** `docs/add_functions.md`
- **WDL 1.2 spec:** `spec/wdl-1.2/SPEC.md` (stdlib starts at line 7375)
- **Changelog:** `spec/wdl-1.2/CHANGELOG.md`

## Implementation Process

### 1. Review the Specification

Start by reading the function's definition in `spec/wdl-1.2/SPEC.md`:
- Signature (parameter types, return type)
- Behavior and semantics
- Example WDL code with expected inputs/outputs
- Note: **The spec has many bugs!** Check `tests/spec_tests/config.yaml` for known issues marked as `xfail`

### 2. Choose Implementation Approach

Functions in `WDL/StdLib.py` fall into three categories:

#### a) Static Functions (simple, fixed types)
Use the `@static()` decorator for functions with fixed argument and return types:

```python
@static([Type.Float()], Type.Int())
def floor(v: Value.Float) -> Value.Int:
    return Value.Int(math.floor(v.value))
```

Key points:
- First argument to `@static()` is list of parameter types
- Second argument is return type
- Implementation receives `Value.*` objects, must return `Value.*` objects
- Include PEP 484 type hints for Python types
- The decorator handles type coercion automatically

#### b) Polymorphic Functions (varying types, simple logic)
Use an existing helper class such as `_ArithmeticOperator` when types vary but the type-checking logic is straightforward:

```python
self.min = _ArithmeticOperator("min", lambda l, r: min(l, r))
```

#### c) Complex Polymorphic Functions (custom type inference)
Subclass `EagerFunction` when you need custom type inference:

```python
class _MyFunction(EagerFunction):
    def infer_type(self, expr: "Expr.Apply") -> Type.Base:
        # Validate arguments and determine return type
        # Raise Error.WrongArity, Error.StaticTypeMismatch, etc.
        return return_type

    def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.Base:
        # Implement runtime evaluation
        # arguments are already evaluated
        # Raise Error.EvalError for runtime failures
        return result_value
```

Then instantiate in `Base.__init__`:
```python
self.my_function = _MyFunction()
```

### 3. Version Gating

WDL 1.2 functions should only be available in version 1.2+:

```python
if self.wdl_version not in ["draft-2", "1.0"]:
    # WDL 1.1+ functions
    self.min = ...

# For WDL 1.2+ (assuming version string convention continues):
if self.wdl_version not in ["draft-2", "1.0", "1.1"]:
    # WDL 1.2+ functions
    self.find = ...
```

Check existing version gating patterns around line 141 in `StdLib.py`.
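
To see what the exclusion-list style implies, here is a minimal self-contained sketch. The function `available_functions` and the two tuple constants are hypothetical and exist only for illustration; the real registration happens through attribute assignment in `StdLib.py` as shown above.

```python
from typing import Set

# Versions that predate each feature set, mirroring the exclusion lists above.
PRE_1_1_VERSIONS = ("draft-2", "1.0")
PRE_1_2_VERSIONS = ("draft-2", "1.0", "1.1")


def available_functions(wdl_version: str) -> Set[str]:
    """Return the illustrative set of function names enabled for a version."""
    names = {"floor"}  # stand-in for functions present in every version
    if wdl_version not in PRE_1_1_VERSIONS:
        names |= {"min", "max"}  # WDL 1.1+
    if wdl_version not in PRE_1_2_VERSIONS:
        names |= {"contains", "values"}  # WDL 1.2+
    return names
```

Because the check excludes old versions rather than listing new ones, any version string not in the exclusion list receives the full function set. That is convenient for future versions but also means a misspelled version string would not be gated.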

### 4. Error Handling

Use appropriate error types from `WDL.Error`:

**Static (type-checking) errors:**
- `Error.WrongArity(expr, expected_count)` - wrong number of arguments
- `Error.StaticTypeMismatch(expr, expected_type, actual_type, context)` - type doesn't match
- `Error.IndeterminateType(expr, message)` - can't infer type (e.g., empty array)

**Runtime errors:**
- `Error.EvalError(expr, message)` - runtime evaluation failures
- `Error.InputError(message)` - invalid input data
- `Error.OutOfBounds(expr, message)` - index out of range, key not found
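
The split between the two groups follows from what is known at each phase: type-checking sees only argument types, while evaluation sees argument values. The sketch below illustrates this with a made-up `repeat(String, Int)` function. The exception classes are local stand-ins, not miniwdl's `WDL.Error` classes, and `repeat` is not a WDL function.

```python
from typing import List


class WrongArity(Exception):
    """Stand-in for a static arity error (not miniwdl's class)."""


class StaticTypeMismatch(Exception):
    """Stand-in for a static type error (not miniwdl's class)."""


class EvalError(Exception):
    """Stand-in for a runtime evaluation error (not miniwdl's class)."""


def infer_repeat_type(arg_types: List[str]) -> str:
    """Static phase: only the argument types are available."""
    if len(arg_types) != 2:
        raise WrongArity(f"expected 2 arguments, got {len(arg_types)}")
    if arg_types != ["String", "Int"]:
        raise StaticTypeMismatch(f"expected (String, Int), got {tuple(arg_types)}")
    return "String"


def eval_repeat(text: str, count: int) -> str:
    """Runtime phase: the argument values are available."""
    if count < 0:
        # A negative count cannot be detected from the types alone.
        raise EvalError("count must be non-negative")
    return text * count
```

Anything that can be rejected from types alone belongs in the static phase, so that users see the problem before any task runs.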

### 5. File System Access

Functions that read/write files must use:
- `self._devirtualize_filename(filename)` - convert WDL File path to local path for reading
- `self._virtualize_filename(filename)` - convert local path to WDL File value for output
- `self._write_dir` - directory for creating temporary files

Example:
```python
@static([Type.File()], Type.String())
def my_read_func(file: Value.File) -> Value.String:
    with open(self._devirtualize_filename(file.value), "r") as f:
        content = f.read()
    return Value.String(content)
```

### 6. Testing Strategy

#### a) Unit Tests (`tests/test_5stdlib.py`)

Add tests using the `_test_task()` helper which:
- Parses WDL task source
- Type-checks the document
- Executes in a Docker container
- Returns outputs as JSON

```python
def test_my_function(self):
    # Basic functionality
    outputs = self._test_task(R"""
    version 1.2
    task test_my_function {
        command {}
        output {
            String result = my_function("input")
        }
    }
    """)
    self.assertEqual(outputs["result"], "expected_value")

    # Error cases
    self._test_task(R"""
    version 1.2
    task test_error {
        command {}
        output {
            String result = my_function(42) # wrong type
        }
    }
    """, expected_exception=WDL.Error.StaticTypeMismatch)
```

Test coverage should include:
- Basic functionality with typical inputs
- Edge cases (empty arrays, null values, boundary conditions)
- Type coercion scenarios
- Error conditions (wrong arity, type mismatches, runtime errors)
- Optional parameters if applicable
- Version gating (ensure unavailable in WDL 1.0/1.1)

#### b) Spec Tests

The spec tests in `tests/spec_tests/` are auto-extracted from the WDL specification. Check:
1. `tests/spec_tests/config.yaml` for the function's test status
2. Many tests are marked `xfail` due to **bugs in the spec itself**
3. When your implementation is ready, remove the test from the `xfail` list
4. Document any spec bugs you discover in comments in `config.yaml`

Common spec bugs to watch for:
- Wrong expected output values
- Typos in test code
- Missing struct/type definitions
- Incorrect syntax
- Python-style conditional expressions (`a if cond else b`) instead of WDL's `if cond then a else b`
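
The ternary mix-up in the last item is easy to flag mechanically. The following is a rough heuristic sketch, not part of the miniwdl test tooling: the function name is hypothetical, and the regex works on a single line only, so it will miss multi-line expressions and can misfire on string literals that contain these words.

```python
import re

# Match an `if` that is followed by `else` on the same line with no `then`
# after it, which is the shape of a Python conditional expression.
_PY_TERNARY = re.compile(r"\bif\b(?!.*\bthen\b).*\belse\b")


def looks_like_python_ternary(line: str) -> bool:
    """Heuristically detect `a if cond else b` in a line of WDL source."""
    return bool(_PY_TERNARY.search(line))
```

A check like this is only a triage aid for reading spec examples; the WDL parser is the real arbiter of whether a test file is valid.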
146 changes: 143 additions & 3 deletions WDL/StdLib.py
@@ -143,11 +143,16 @@ def sep(sep: Value.String, iterable: Value.Array) -> Value.String:
            self.max = _ArithmeticOperator("max", lambda l, r: max(l, r))
            self.quote = _Quote()
            self.squote = _Quote(squote=True)
            self.keys = _Keys()
            self.keys = _Keys(wdl_version=self.wdl_version)
            self.as_map = _AsMap()
            self.as_pairs = _AsPairs()
            self.collect_by_key = _CollectByKey()

        if self.wdl_version not in ["draft-2", "1.0", "1.1"]:
            # WDL 1.2+ functions
            self.contains = _Contains()
            self.values = _Values()

    def _read(self, parse: Callable[[str], Value.Base]) -> Callable[[Value.File], Value.Base]:
        "generate read_* function implementation based on parse"

@@ -1051,6 +1056,84 @@ def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.


class _Keys(EagerFunction):
    # Array[P] keys(Map[P, Y])
    # Array[String] keys(Struct|Object) [WDL 1.2+]
    # Returns an array of keys from a Map, Struct, or Object

    def __init__(self, wdl_version: str):
        super().__init__()
        self.wdl_version = wdl_version

    def infer_type(self, expr: "Expr.Apply") -> Type.Base:
        if len(expr.arguments) != 1:
            raise Error.WrongArity(expr, 1)
        arg0ty = expr.arguments[0].type

        # Accept Map, StructInstance, or Object
        if isinstance(arg0ty, Type.Map):
            if expr._check_quant and arg0ty.optional:
                raise Error.StaticTypeMismatch(
                    expr.arguments[0], Type.Map((Type.Any(), Type.Any())), arg0ty
                )
            # For Map[P, Y], return Array[P]
            return Type.Array(arg0ty.item_type[0].copy())
        elif isinstance(arg0ty, (Type.StructInstance, Type.Object)):
            # Struct/Object support added in WDL 1.2
            if self.wdl_version in ["draft-2", "1.0", "1.1"]:
                raise Error.StaticTypeMismatch(
                    expr.arguments[0],
                    Type.Map((Type.Any(), Type.Any())),
                    arg0ty,
                    "keys() does not accept Struct or Object in WDL version {}".format(
                        self.wdl_version
                    ),
                )
            if expr._check_quant and arg0ty.optional:
                raise Error.StaticTypeMismatch(expr.arguments[0], Type.StructInstance(""), arg0ty)
            # For Struct or Object, return Array[String]
            return Type.Array(Type.String())
        else:
            raise Error.StaticTypeMismatch(
                expr.arguments[0],
                Type.Map((Type.Any(), Type.Any())),
                arg0ty,
                "keys() requires Map, Struct, or Object",
            )

    def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.Base:
        arg = arguments[0]

        if isinstance(arg, Value.Map):
            mapty = arg.type
            assert isinstance(mapty, Type.Map)
            return Value.Array(
                mapty.item_type[0], [p[0].coerce(mapty.item_type[0]) for p in arg.value], expr
            )
        elif isinstance(arg, Value.Struct):
            # For structs, return keys in the order they appear in the struct definition.
            # The struct type's members dict maintains insertion order (Python 3.7+).
            #
            # Note: We return ALL keys including optional members, even if they are set to None.
            # This is consistent with the spec's distinction (for contains_key) between "presence"
            # and "defined": optional members are present in the struct but may not be defined.
            # The Value.Struct constructor ensures all optional members exist in the value dict
            # (populated with Null() if omitted in the literal), so all members are always present.
            struct_ty = arg.type
            if isinstance(struct_ty, Type.StructInstance) and struct_ty.members:
                # Use the order from the type definition
                keys = list(struct_ty.members.keys())
            else:
                # Fallback to value dict order (for Object type)
                keys = list(arg.value.keys())
            return Value.Array(Type.String(), [Value.String(k) for k in keys], expr)
        else:
            raise Error.EvalError(expr, f"keys() received unexpected argument type: {type(arg)}")


class _Values(EagerFunction):
    # Array[Y] values(Map[P, Y])
    # Returns an array of values from a Map

    def infer_type(self, expr: "Expr.Apply") -> Type.Base:
        if len(expr.arguments) != 1:
            raise Error.WrongArity(expr, 1)
@@ -1059,14 +1142,16 @@ def infer_type(self, expr: "Expr.Apply") -> Type.Base:
            raise Error.StaticTypeMismatch(
                expr.arguments[0], Type.Map((Type.Any(), Type.Any())), arg0ty
            )
        return Type.Array(arg0ty.item_type[0].copy())
        # For Map[P, Y], return Array[Y]
        return Type.Array(arg0ty.item_type[1].copy())

    def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.Base:
        assert isinstance(arguments[0], Value.Map)
        mapty = arguments[0].type
        assert isinstance(mapty, Type.Map)
        # Return the values (p[1]) from the map
        return Value.Array(
            mapty.item_type[0], [p[0].coerce(mapty.item_type[0]) for p in arguments[0].value], expr
            mapty.item_type[1], [p[1].coerce(mapty.item_type[1]) for p in arguments[0].value], expr
        )


@@ -1158,3 +1243,58 @@ def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.
                raise Error.EvalError(expr, "duplicate keys supplied to as_map(): " + str(k))
            singletons.append((k, vs.value[0]))
        return Value.Map((collectedty.item_type[0], arrayty.item_type), singletons, expr)


class _Contains(EagerFunction):
    # Boolean contains(Array[X], X)
    # Determine whether an array contains a specified value

    def infer_type(self, expr: "Expr.Apply") -> Type.Base:
        if len(expr.arguments) != 2:
            raise Error.WrongArity(expr, 2)

        arr_ty = expr.arguments[0].type
        if not isinstance(arr_ty, Type.Array) or (expr._check_quant and arr_ty.optional):
            raise Error.StaticTypeMismatch(expr.arguments[0], Type.Array(Type.Any()), arr_ty)

        # For empty arrays (Array[Any]), we can accept any element type
        # The runtime will handle empty arrays correctly (always returns false)
        if not isinstance(arr_ty.item_type, Type.Any):
            # Spec defines two signatures:
            # - Boolean contains(Array[P], P)
            # - Boolean contains(Array[P?], P?)
            # The element type must match the array item type including optionality,
            # with standard coercion allowed (T can coerce to T?)
            elem_ty = expr.arguments[1].type

            # Check that element type is compatible with array item type
            if not elem_ty.equatable(arr_ty.item_type):
                raise Error.StaticTypeMismatch(
                    expr.arguments[1],
                    arr_ty.item_type,
                    elem_ty,
                    "for contains() element argument",
                )

            # Additional check: if array item type is non-optional, element must not be optional
            # (but T can coerce to T? is allowed, so T into Array[T?] is ok)
            if not arr_ty.item_type.optional and elem_ty.optional:
                raise Error.StaticTypeMismatch(
                    expr.arguments[1],
                    arr_ty.item_type,
                    elem_ty,
                    "for contains() element argument - cannot check optional value in non-optional array",
                )

        return Type.Boolean()

    def _call_eager(self, expr: "Expr.Apply", arguments: List[Value.Base]) -> Value.Base:
        arr = arguments[0]
        assert isinstance(arr, Value.Array)
        elem = arguments[1]

        # Use Value.__eq__ for proper equality comparison
        for item in arr.value:
            if elem == item:
                return Value.Boolean(True)
        return Value.Boolean(False)
2 changes: 0 additions & 2 deletions tests/spec_tests/config.yaml
@@ -40,9 +40,7 @@ wdl-1.2:
- join_paths_task.wdl
- one_mount_point_task.wdl
- string_to_file.wdl
- test_contains.wdl
- test_find_task.wdl
- test_keys.wdl
- test_matches_task.wdl
- test_runtime_info_task.wdl
- test_select_first.wdl