Implementing secrets env egress schema #430

Merged
merged 7 commits into staging from feature-secrets-env-egress-schema on Jul 11, 2025

Conversation

aryanjassal
Contributor

@aryanjassal aryanjassal commented Jun 24, 2025

Description

A schema will dictate which secrets need to be exported and which do not, better enforcing the principle of least privilege (POLP) and making secrets management easier.
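
For illustration, a minimal egress schema might look like the following (the property names are borrowed from the test output later in this thread; the exact layout expected by the command is an assumption):

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "CLOUDFLARE_API_TOKEN": { "type": "string" },
    "ZETA_HOUSE_ENV": { "type": "string", "default": "development" }
  },
  "required": ["CLOUDFLARE_API_TOKEN"]
}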

Issues Fixed

Tasks

  • 1. Use ajv to validate a schema
  • 2. Use the schema to export a set of secrets
  • 3. Follow POLP to only get required secrets
  • 4. Add schema composition

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@aryanjassal aryanjassal self-assigned this Jun 24, 2025

linear bot commented Jun 24, 2025

ENG-638

@aryanjassal aryanjassal changed the title from "Implementing on env egress schema" to "Implementing secrets env egress schema" on Jun 30, 2025
@aryanjassal aryanjassal force-pushed the feature-secrets-env-egress-schema branch from 52ceb74 to 09cdc52 on June 30, 2025 04:17
@aryanjassal
Contributor Author

Ajv does not support automatically removing additional properties from a composed schema. There is an open issue addressing a way to do so (see ajv-validator/ajv#1346), but as of writing, it has not yet been implemented.
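
A minimal repro of the problem (hypothetical, not code from this PR): with removeAdditional: true, Ajv applies each allOf branch independently, so a property declared in one branch is treated as additional by the other and gets stripped.

import Ajv from 'ajv';

const ajv = new Ajv({ removeAdditional: true });
const schema = {
  allOf: [
    { type: 'object', properties: { FOO: { type: 'string' } }, additionalProperties: false },
    { type: 'object', properties: { BAR: { type: 'string' } }, additionalProperties: false },
  ],
};
const validate = ajv.compile(schema);
const data = { FOO: 'a', BAR: 'b' };
validate(data);
// The first branch strips BAR (it is "additional" there), the second strips FOO,
// so the composed schema cannot be used as a property filter this way.
console.log(data); // {}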

Instead, Ajv will be used without removeAdditional, purely for validation. Once the schema has been validated, I will need to manually extract the relevant properties of the schema as follows, then use the results to filter env vars.

import $RefParser from '@apidevtools/json-schema-ref-parser';

async function loadEnvSchema(schemaPath) {
  const schema = await $RefParser.bundle(schemaPath);
  const props = new Set();
  const required = new Set();
  const defaults = {};
  (function extract(s) {
    if (!s || typeof s !== 'object') return;
    // Collect properties and their defaults
    if (s.properties) {
      for (const [k, p] of Object.entries(s.properties)) {
        props.add(k);
        if (p.default != null) {
          defaults[k] = p.default;
        }
      }
    }
    // Collect required properties
    if (s.required) s.required.forEach((r) => required.add(r));
    // Process composition keywords
    ['allOf', 'anyOf', 'oneOf'].forEach(
      (k) => Array.isArray(s[k]) && s[k].forEach(extract),
    );
  })(schema);
  return {
    allKeys: [...props],
    requiredKeys: [...required],
    defaults: defaults,
  };
}

Finally, this can be processed as follows to yield a populated env object.

const { allKeys, requiredKeys, defaults } = await loadEnvSchema('path/to/schema.json');
const env = {};
for (const key of allKeys) {
  let value = process.env[key];
  if (value == null && defaults[key] !== undefined) {
    value = defaults[key];
  }
  // Note: An explicit empty string in process.env (value === '') will take precedence over a default.
  if (requiredKeys.includes(key) && (value == null || value === '')) {
    throw new Error(
      `Required environment variable ${key} is missing, empty, or its default is invalid.`,
    );
  }
  if (value !== undefined) {
    // Boundary env values must be strings, otherwise it can cause type errors
    env[key] = value.toString();
  }
}

// Example usage
const someSecretValue = env['SECRET_VALUE'];
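
For completeness, a sketch of the validation-only step described above, assuming the same bundled schema object from loadEnvSchema (the Ajv calls are standard API; the wiring is mine, not the PR's):

import Ajv from 'ajv';

const ajv = new Ajv(); // no removeAdditional: Ajv is only validating here
const validate = ajv.compile(schema); // `schema` as returned by $RefParser.bundle
if (!validate(env)) {
  // validate.errors holds structured error objects, e.g. missing required keys
  throw new Error(`Schema validation failed: ${ajv.errorsText(validate.errors)}`);
}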

@CMCDragonkai
Member

> Ajv does not support automatically removing additional properties from a composed schema. [...]
>
> Instead, Ajv will be used without removeAdditional, purely for validation. Once the schema has been validated, I will need to manually extract the relevant properties of the schema, then use the results to filter env vars.

One might consider whether we want to do strictly what the schema says. But remember there's a schema option called additionalProperties (https://json-schema.org/understanding-json-schema/reference/object#additionalproperties).

Without that, we should understand schemas as "gradual schemas": any unmentioned properties should be allowed to just pass through.

@CMCDragonkai
Member

Oh and you should check this https://json-schema.org/understanding-json-schema/reference/object#unevaluatedproperties

@aryanjassal
Contributor Author

> One might consider whether we want to do strictly what the schema says. But remember there's a schema option called additionalProperties.
>
> Without that, we should understand schemas as "gradual schemas": any unmentioned properties should be allowed to just pass through.

> Oh and you should check this https://json-schema.org/understanding-json-schema/reference/object#unevaluatedproperties

Ah, so setting additionalProperties to false will cause any additional fields to fail schema validation, and unevaluatedProperties also seems promising. I will need to look into this.

So far, I have manually unwrapped the schema to extract the relevant keys, avoiding requests for unnecessary keys, then validated the final result while applying defaults before returning it as the final set of environment variables.

Currently, if a schema is provided, all secrets are filtered against it and everything else is trimmed. It should be a fairly trivial change to check additionalProperties and either allow all secrets through or strictly allow only the requested ones. I feel the behaviour of unevaluatedProperties is what one would expect by default, and there is no easy way to get that in Ajv short of adding it as an actual option in the schema, so I unwrapped the schema manually, which gave me more freedom with the keys and properties without relying on this property.

I will look into them both, however, and get my progress reviewed to make sure I'm on the right track.
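
For reference, the difference between the two keywords under composition looks like this (a minimal illustration, not taken from the PR): additionalProperties cannot see across allOf branches, so setting it to false in either branch would reject the other branch's keys, whereas unevaluatedProperties: false at the top level counts properties evaluated by any branch.

{
  "allOf": [
    { "properties": { "FOO": { "type": "string" } } },
    { "properties": { "BAR": { "type": "string" } } }
  ],
  "unevaluatedProperties": false
}

Here { "FOO": "a", "BAR": "b" } passes, but any unmentioned key fails validation.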

@CMCDragonkai
Member

CMCDragonkai commented Jul 2, 2025

The default expected behaviour is to allow any additional properties and only apply the rules to that which is specified.

However, this is why you want to use a pre-existing JSON schema parser that fully understands ALL the rules and is up to date with respect to the JSON Schema spec (they update the spec sometimes). Do not reinvent the wheel here; you're just wasting time if that's the case. It's far more efficient to use a well-known JSON schema tool.

The only time to reinvent the wheel is when you're doing frontier work and you need precise control. Be humble. Learn when speed and stability is the priority. Learn when R&D is the priority. Switch strategies; learn how to minmax.

Just remember Ajv doesn't understand how to "resolve" references. It requires additional tooling to do so; that's why I used the simplified library in my scripts. However, for production tooling, I believe you're supposed to use an Ajv resolver.

Look, I asked ChatGPT this question, and it gives a perfect answer:

Context & Assumptions

Assumptions:

  • You want to use AJV (Another JSON Schema Validator) for validating JSON objects against JSON Schemas, potentially including remote/local $ref references.

  • You want fully dynamic, production-grade resolution: nested, cyclic, local and remote refs, all up-to-date with the latest JSON Schema (2020-12 and 2023-12) spec.

  • You are not interested in writing your own resolver logic or patching together partial solutions.

  • You require a solution that's "SOTA" (State of the Art) as of July 2025—with maximal ecosystem support, stability, and compliance.


1. AJV Core Capabilities and Limitations (as of 2025)

  • AJV by itself supports local references ($ref) within the same schema or within pre-loaded schemas.

  • For dynamic, on-demand remote/local reference resolution (HTTP, file, etc.), AJV does not include a built-in universal resolver.

  • AJV users must supplement with a schema loader/resolver library—especially for recursive, cyclic, or distributed schema graphs.


2. Current SOTA AJV Reference Resolution Ecosystem (July 2025)

Key Packages:

A. @hyperjump/json-schema

  • Widely adopted, full JSON Schema reference resolver and evaluator, supports all official drafts up to the latest (2023-12).

  • Handles $ref resolution for:

    • Local files (filesystem URIs)

    • HTTP(S) resources

    • In-memory schemas

    • Cyclic and recursive references (both local and remote)

  • Acts as both a resolver and validator, but can also act as a resolver only to plug into AJV or other engines.

B. json-schema-ref-resolver (a.k.a. @apidevtools/json-schema-ref-parser)

  • Longstanding, stable tool for dereferencing JSON Schemas, supports local and remote refs.

  • Used by many codegen and validation tools.

  • Sometimes falls short for the very latest JSON Schema drafts and complex edge cases, but often "just works".

C. ajv-formats

  • Not a resolver, but important to support advanced format validation (e.g., uri, date-time, etc.) up to the latest spec.

D. ajv-draft-2020 or AJV built-in draft support

  • Ensures your AJV version is using the latest JSON Schema draft logic.

E. AJV v8+ built-in $refs loader hooks

  • AJV 8+ supports custom loaders via the loadSchema option (async schema loading), allowing you to plug in resolvers.


3. Recommended SOTA Combination (2025)

For a production-grade, fully dynamic, spec-compliant AJV-based validation stack:

Option 1: AJV + @hyperjump/json-schema

  • Use AJV for validation, but delegate all reference resolution to @hyperjump/json-schema.

  • @hyperjump/json-schema is now widely recognized as the most spec-compliant resolver, and can be run in "resolver only" mode.

  • In AJV, set loadSchema to a wrapper that calls @hyperjump/json-schema for schema loading.

import Ajv from "ajv/dist/2020"; // or latest
import { get } from "@hyperjump/json-schema";

const ajv = new Ajv({
  loadSchema: async (uri) => {
    return get(uri); // this uses @hyperjump/json-schema to fetch and resolve
  },
});
// Use ajv.compileAsync for schema compilation

Option 2: AJV + json-schema-ref-resolver

  • Use json-schema-ref-resolver (or @apidevtools/json-schema-ref-parser) to fully dereference your schemas before passing them to AJV.

  • This flattens the schema graph, so AJV doesn't have to resolve any $ref at validation time.

import $RefParser from '@apidevtools/json-schema-ref-parser';
import Ajv from 'ajv/dist/2020';

const parser = new $RefParser();
const schema = await parser.dereference(schemaPathOrObj);
const ajv = new Ajv();
const validate = ajv.compile(schema);

Note: This approach is not as dynamic at runtime for streaming validation of arbitrarily-referenced schemas, but is great for most use-cases where you have a bounded schema set.


4. Which Should You Use?

  • For runtime, dynamic reference resolution (arbitrary, possibly untrusted sources, e.g., API gateways, extensible pipelines):

    Use AJV + @hyperjump/json-schema as your SOTA combo.

  • For static or semi-static schema sets (microservices, codegen, build-time validation):

    Use AJV + json-schema-ref-resolver/@apidevtools/json-schema-ref-parser to flatten the schema tree ahead of time.

In both cases:

  • Always use ajv-formats to cover full JSON Schema format validation.

  • Pin your AJV version to 8.x or later, and explicitly set the draft version.


5. Summary Table

| Use-Case | SOTA Combo | Notes |
| --- | --- | --- |
| Arbitrary/dynamic refs (runtime) | AJV + @hyperjump/json-schema | Best for dynamic/unknown schema graphs, latest spec |
| Flatten refs (static/codegen/CI) | AJV + @apidevtools/json-schema-ref-parser | Best for known schema sets, simpler but less dynamic |
| Complex "format" validation | + ajv-formats | Always include |

6. Caveats & Final Thoughts

  • Do not write your own resolver. All major SOTA tools are robust, fast, and actively maintained (as of July 2025).

  • AJV’s core maintainers and the JSON Schema community have converged on using @hyperjump/json-schema for serious resolver needs.

  • For edge-case enterprise requirements (private schema registries, custom URI schemes, caching), you can extend @hyperjump/json-schema with your own loaders, but this is rare.

  • For full spec-compliance, ensure all packages are at their latest major versions and test against your target draft.


If you want example code for a real-world AJV + resolver stack, or want to compare perf/compat between the two above, specify your scenario.


@CMCDragonkai
Member

[image]

Don't be shy with the AI. Abuse it. Force it to your will.

@aryanjassal
Contributor Author

Previously, I was going a little out of scope, but now I've realigned this PR to better match the expectations.

@aryanjassal
Contributor Author

I have tested this on the schema from zeta.house, and it seems to work as intended. Those schemas do not use any type other than string, so for the time being non-string types have not been tested against a real repo, but the tests do cover that case.

[aryanj@zenith:~/zeta.house]$ polykey secrets env vault --egress-schema schemas/scripts/deploy.schema.json
CLOUDFLARE_ACCOUNT_ID='mno'
CLOUDFLARE_API_TOKEN='stu'
CLOUDFLARE_ZONE_ID='pqr'
ZETA_HOUSE_GOOGLE_MAPS_API_KEY='jkl'
ZETA_HOUSE_SENDGRID_API_KEY='ghi'
ZETA_HOUSE_SUPABASE_KEY='def'
ZETA_HOUSE_SUPABASE_URL='abc'
ZETA_HOUSE_ENV='development'

[aryanj@zenith:~/zeta.house]$ polykey secrets rm vault:CLOUDFLARE_API_TOKEN

[aryanj@zenith:~/zeta.house]$ polykey secrets env vault --egress-schema schemas/scripts/deploy.schema.json
ErrorPolykeyCLISchemaInvalid: The provided JSON schema is invalid - JSON schema validation failed
  data: {"errors":[{"instancePath":"","schemaPath":"#/required","keyword":"required","params":{"missingProperty":"CLOUDFLARE_API_TOKEN"},"message":"must have required property 'CLOUDFLARE_API_TOKEN'"}]}

If this behaviour is fine, then after a quick cleanup, this is ready for merging @CMCDragonkai.

@aryanjassal
Contributor Author

The command can now take a flag, --egress-schema, which needs to point to a valid JSON schema. I used @apidevtools' json-schema-ref-parser to bundle the schema, and Ajv to validate against it.

The schema validation is applied at the end of collecting the secrets, so behaviour controls like duplicate-name handling are unmodified.

The default behaviour for a failing schema validation is an error being thrown, printing no secrets. Other potential behaviour could be printing the secrets but warning the user about each validation error. However, these options have not been incorporated in this iteration of the command.
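
As a hypothetical sketch of how the two behaviours could sit behind a flag (the --lenient option, the printSecrets helper, and the error wiring are all assumptions, not part of this PR):

if (!validate(env)) {
  const report = ajv.errorsText(validate.errors, { separator: '\n' });
  if (options.lenient) {
    // Guideline mode: print the secrets, then warn about each validation error
    printSecrets(env); // hypothetical helper
    process.stderr.write(`Schema validation warnings:\n${report}\n`);
  } else {
    // Strict mode (the current default): throw without printing any secrets
    throw new ErrorPolykeyCLISchemaInvalid('JSON schema validation failed');
  }
}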

We need to settle the default behaviour for a failing validation. A no-op makes no sense here, as you would simply not specify a schema in that case. The two real options are to fail the command and print only the error, or to print the secrets along with the validation errors at the end of the command, treating the schema as more of a guideline than a strict rule. I feel strict validation makes more sense as the default than printing both the secrets and the errors, but this needs more discussion before going ahead.

@CMCDragonkai

@CMCDragonkai
Member

CMCDragonkai commented Jul 11, 2025 via email

@CMCDragonkai
Member

CMCDragonkai commented Jul 11, 2025 via email

@aryanjassal
Contributor Author

> I feel like the error report shouldn't be stuffed into the data... There should be more rich reporting. But it's something we have to deal with at a meta level.

My first instinct was to use an AggregateError to render the errors instead, but it would have taken a lot of extra effort to implement, not to mention that it needs some meta-level decisions, so I used a regular error and added a list of errors at the end.

A similar issue plagues Polykey when it fails to connect to any seednode. In Polykey, the details of all the failing connections are included as part of the error message itself, which I felt was also kinda lacking.

We need a way to extend AggregateError for Polykey and Polykey CLI errors, and then also add separate rendering code for both. Currently, all non-Polykey errors are treated as unexpected errors, including JS native errors like Error, TypeError, or AggregateError.
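
A rough sketch of what an extensible aggregate error might look like (the class name and shape are hypothetical, not Polykey's actual error hierarchy):

class ErrorPolykeyAggregate extends AggregateError {
  constructor(errors, message, data = {}) {
    super(errors, message);
    this.name = this.constructor.name;
    this.data = data; // structured context, rendered separately from the message
  }
}

// Rendering code could then walk `errors` and report each one richly,
// instead of flattening everything into a single message string:
const e = new ErrorPolykeyAggregate(
  [new Error("must have required property 'CLOUDFLARE_API_TOKEN'")],
  'JSON schema validation failed',
);
for (const cause of e.errors) console.error(`- ${cause.message}`);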

@aryanjassal aryanjassal merged commit 501da50 into staging Jul 11, 2025
23 checks passed
Development

Successfully merging this pull request may close these issues.

Add egress schema to polykey secrets env