Skip to content

Python build file tree support for Code Intelligence#160

Merged
anderruiz merged 9 commits into
mainfrom
ander/python-requirements-build-trees
Jun 11, 2026
Merged

Python build file tree support for Code Intelligence#160
anderruiz merged 9 commits into
mainfrom
ander/python-requirements-build-trees

Conversation

@anderruiz

@anderruiz anderruiz commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Replaces MavenProcessor with a generic SimpleProcessor that handles build file tree generation for any file type via BFS over FileDependencies. Adding support for a new file type now requires only a registration call — no new processor code needed.
  • Adds GetArtifact to PyProjectTOMLExtractor so pyproject.toml files report their module identity (reading [project].name or [tool.poetry].name) — enabling Python inter-module relationships in GetBuildFileTrees.
  • Registers SimpleProcessor for requirements.txt and pyproject.toml so both file types participate in build file tree traversal.
  • Renames flag to ExtractArtifactIds (previously ExtractArtifacts) for clarity.
  • Exposes ManifestParsers as an independent field in sbomgen.Options, decoupled from ExtractArtifactIds. Manifest parsing (pyproject.toml as package source when no lockfile exists) and artifact ID extraction are orthogonal capabilities. Internally, ExtractArtifactIds=true still activates manifest parsers as a side-effect (required for GetArtifact to run on manifest extractors).
  • Emits a bare type:file SBOM component for every build file that opts in (by implementing ArtifactExtractor) when ExtractArtifactIds=true, even if GetArtifact returns no identity. Lockfile-only parsers (Cargo.lock, yarn.lock, etc.) that do not implement ArtifactExtractor are unaffected. RequirementsTxtExtractor implements ArtifactExtractor with a trivial GetArtifact returning nil, so requirements.txt files appear in GetBuildFileTrees even when they contain only local editable deps.
  • Fixes findArtifact version matching to also match when the artifact has no declared version — handles local modules that omit the version field.
  • Generalizes parseArtifactID from Maven-only to any purl type.

Changes

File What changed
pkg/sbomgen/simple_processor.go New SimpleProcessor (replaces MavenProcessor); registered for pom.xml, requirements.txt, and pyproject.toml
pkg/sbomgen/build_files.go Generic parseArtifactID; fixed-point cross-type expansion
pkg/sbomgen/sbomgen.go ManifestParsers exposed as independent Options field; ExtractArtifactIds renamed
pkg/extractor/python/parse-pyproject-toml.go GetArtifact: reads [project].name then [tool.poetry].name
pkg/extractor/python/parse-requirements-txt.go Trivial GetArtifact so requirements.txt opts in to file-type component emission
pkg/extractor/python/types.go Added Name field to PyProjectPoetry struct
pkg/extractor/extract.go Bare ScannedArtifact fallback scoped to ArtifactExtractor implementors only
internal/output/sbom/cyclonedx.go addFileDependencies handles name-less artifacts; findArtifact allows empty artifact version
pkg/scanner/datadog_sbom_generator.go Updated flag name reference

Test plan

  • Unit tests for SimpleProcessor (BFS traversal, artifact ID lookup, multi-type registration)
  • Unit tests for PyProjectTOMLExtractor.GetArtifact (PEP project name, poetry name fallback, missing name, parse error)
  • build_files_test.go: cross-type expansion, empty-name artifacts, fixed-point loop
  • Snapshot tests updated for addFileDependencies bare artifact change
  • All existing Maven tests pass with SimpleProcessor
  • go test ./... passes
  • golangci-lint passes clean

🤖 Generated with Claude Code

anderruiz added 2 commits June 8, 2026 15:26
- Rename ProcessorContext.MavenArtifactIDs to ArtifactIDs for ecosystem-agnostic usage
- Replace hand-rolled parseMavenArtifactID with parseArtifactID using packageurl-go
- For purls with a namespace (Maven, npm) return "namespace:name"; for others (PyPI) return name only
- Add comprehensive test cases for parseArtifactID covering Maven, PyPI, npm, and invalid purls

Rationale: The build file tree processor needs to parse artifact IDs from any ecosystem,
not just Maven. Using packageurl-go provides correct purl parsing for all ecosystem types
and eliminates the Maven-specific string manipulation that would break on non-Maven purls.

This commit made by [/dd:git:commit:atomic](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/atomic.md)
- Create SimpleProcessor with identical BFS logic from MavenProcessor
- Register SimpleProcessor for FileTypePomXML (replacing MavenProcessor registration)
- Delete MavenProcessor as its logic is fully absorbed by SimpleProcessor
- Rename test functions from TestMavenProcessor_* to TestSimpleProcessor_Maven_*
- Fix test helper mavenFileComponent to produce realistic purls (separate artifactID param)
- Update test expected IDs to match proper purl parsing (e.g. "com.example:child" not "com.example:child/pom.xml")

Rationale: MavenProcessor's BFS-over-FileDependencies logic is not Maven-specific -- it works
for any file type that has dependency edges and optional artifact IDs. Renaming to SimpleProcessor
and making it generic allows registering it for requirements.txt and other file types without
duplicating the BFS logic. All existing Maven test scenarios pass unchanged under the new name.

This commit made by [/dd:git:commit:atomic](https://github.com/DataDog/claude-marketplace/tree/main/dd/commands/git/commit/atomic.md)
@anderruiz anderruiz requested a review from a team as a code owner June 8, 2026 13:39
@datadog-official

datadog-official Bot commented Jun 8, 2026

Copy link
Copy Markdown

🎯 Code Coverage (details)
Patch Coverage: 83.33%
Overall Coverage: 84.52% (+0.17%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: e4fcd1d | Docs | Datadog PR Page | Give us feedback!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 972f0499f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/sbomgen/build_files.go Outdated
// if a dependency target is a known file-type component that wasn't
// collected, we add it to the same FileType group as its dependent so
// the processor can resolve it during BFS.
if len(filterSet) > 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle cross-type targets without filters

When callers use the documented unfiltered form GetBuildFileTrees(sbom), requirements SBOMs that depend on a file component such as libs/mylib/setup.py lose that edge: the first pass groups the file component under its real basename (setup.py) with the no-op processor, while the requirements.txt SimpleProcessor only indexes files in the requirements group. This reclassification pass is the only code that adds cross-type targets to the dependent's group, but it is skipped unless a filter is provided, so unfiltered results omit the setup.py/pyproject dependency and its artifact ID even though filtered calls work.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in this PR.

@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from 046c3a0 to c3057ab Compare June 9, 2026 12:20

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c3057ab528

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +441 to +445
ArtifactDetail: models.ArtifactDetail{
Name: name,
Filename: filePath,
Ecosystem: models.EcosystemPyPI,
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve Python artifact versions for pinned dependencies

When a requirements.txt entry pins an internal package, e.g. mylib==1.0.0, parseLine records the package version, and internal/output/sbom/cyclonedx.go only links it to a scanned artifact when the artifact has the same version. This new artifact construction leaves Version empty, so the downstream findArtifact check fails and no requirements.txt -> pyproject.toml/setup.py dependency edge is emitted for pinned local modules, which is a common requirements format.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: findArtifact now matches when artifact.Version is empty (local modules without a declared version), via the condition (version == "" || artifact.Version == "" || artifact.Version == version).

@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from c3057ab to ef47bdd Compare June 9, 2026 12:38

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef47bddae8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return nil, nil
}

name := normalizedRequirementName(pyproject.Project.Name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read Poetry project names when extracting artifacts

For pyproject.toml files that use Poetry metadata only ([tool.poetry] name = ..., which this extractor already supports for dependencies), this reads only [project].name and returns no scanned artifact. In that common layout, requirements entries for the local module can never match a file component, so the new requirements.txt build-file tree loses the dependency/ID for the Poetry module.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: GetArtifact now falls back to [tool.poetry].name when [project].name is absent.

Comment thread pkg/sbomgen/sbomgen.go Outdated
Comment on lines +61 to +65
actions := scanner.ScannerActions{
DirectoryPaths: dirs,
ExcludePaths: opts.ExcludePaths,
Recursive: opts.Recursive,
ExtractMavenPomArtifactIds: opts.ExtractMavenPomArtifactIds,
DirectoryPaths: dirs,
ExcludePaths: opts.ExcludePaths,
Recursive: opts.Recursive,
ExtractArtifacts: opts.ExtractArtifacts,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expose manifest parsing through GenerateSBOM

When callers use the documented sbomgen.GenerateSBOM(..., sbomgen.DefaultOptions()) API, this action never enables ScannerActions.ManifestParsers, and DoScan only adds manifest extractors such as pyproject.toml when that flag is set. As a result the new PyProjectTOML artifact extractor cannot run through the public sbomgen API, so requirements.txt → pyproject.toml build-file edges are absent unless users bypass this package and call scanner directly.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: GenerateSBOM now sets ManifestParsers: opts.ExtractArtifactIds so pyproject.toml runs through the public API when artifact extraction is enabled.

@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from ef47bdd to 584b899 Compare June 9, 2026 12:44

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 584b8994c6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +168 to +169
for ft, files := range grouped {
for _, f := range files {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Expand cross-type targets to a fixed point

When a filtered tree has a local module that itself depends on another file component (for example requirements.txt -> libs/a/pyproject.toml -> libs/b/pyproject.toml), this loop only scans the files slice captured before line 177 appends libs/a/pyproject.toml. The newly added target is never examined for its own SBOM dependency edges, so libs/b/pyproject.toml is absent from the processor's filesByPath and SimpleProcessor drops that transitive dependency even though the SBOM contains it. Consider expanding these cross-type targets until no new file components are added.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: replaced the single-pass loop with a fixed-point loop that keeps iterating until no new cross-type targets are added.

@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from 584b899 to 6a9568e Compare June 9, 2026 12:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a9568e915

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread README.md Outdated
| ------------------ | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `Recursive` | `bool` | Scan subdirectories recursively (default: `true`) |
| `ExcludePaths` | `[]string` | Glob patterns to exclude from scanning |
| `ExtractArtifacts` | `bool` | Emit file-type components and dependency edges for build files, enabling `GetBuildFileTrees` dependency and ID resolution (default: `true`) |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the actual artifact option name

The README now documents ExtractArtifacts, but the public sbomgen.Options struct exposes ExtractArtifactIds instead. Users following this example to disable artifact extraction with sbomgen.Options{ExtractArtifacts: false} will get a compile error, so the documented API should match the field name in pkg/sbomgen/sbomgen.go.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: README now documents ExtractArtifactIds (matching the actual struct field).

@anderruiz anderruiz changed the title feat(sbomgen): generic SimpleProcessor + Python requirements.txt build file trees feat(sbomgen): generic SimpleProcessor + PyToml build file trees Jun 9, 2026
@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from 6a9568e to c56993c Compare June 9, 2026 13:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c56993c422

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +66 to +67
RegisterBuildFileProcessor(FileTypePomXML, &SimpleProcessor{})
RegisterBuildFileProcessor(FileTypeRequirementsTxt, &SimpleProcessor{})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Register pyproject.toml with the simple processor

When a scan emits a pyproject.toml file component from GetArtifact, buildProcessorContext can populate ctx.ArtifactIDs and dependency edges for that file, but this init block only registers the simple processor for POMs and requirements files. As a result, callers using GetBuildFileTrees(sbom, FileTypePyprojectToml) fall through to noopProcessor, so the pyproject build-file tree loses its package ID and any local file dependencies even though the SBOM contains the data; register FileTypePyprojectToml here as well.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: FileTypePyprojectToml is now registered with SimpleProcessor in the init() block.

anderruiz and others added 3 commits June 9, 2026 15:09
…or for build file tree support

Adds GetArtifact to PyProjectTOMLExtractor so that pyproject.toml files
expose their project identity ([project].name, normalized per PEP 503)
as a ScannedArtifact. This enables findArtifact to match packages from
sibling lockfiles (e.g. requirements.txt) against the local module,
producing inter-module dependency edges in the SBOM — the same mechanism
already used for Maven pom.xml.

GetArtifact lives on PyProjectTOMLExtractor (the entity that owns
pyproject.toml) rather than on RequirementsTxtExtractor; the lockfile
parser has no knowledge of pyproject.toml. PyProjectProject gains a
Name field so the canonical PyProjectTOML type carries the identity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…integration tests

- Register SimpleProcessor for FileTypeRequirementsTxt alongside FileTypePomXML
- Add integration test covering Python requirements.txt build file tree generation
  with inter-module relationships via PyProjectTOMLExtractor.GetArtifact
The flag gates artifact ID extraction specifically (the name/purl used to
identify internal modules), not general artifact data. The previous name
caused confusion with CI/CD artifact concepts.
@anderruiz anderruiz force-pushed the ander/python-requirements-build-trees branch from c56993c to 3a33496 Compare June 9, 2026 13:13
- Register SimpleProcessor for FileTypePyprojectToml so
  GetBuildFileTrees(sbom, FileTypePyprojectToml) produces results instead
  of falling through to noopProcessor
- Read [tool.poetry].name as fallback in GetArtifact when [project].name
  is absent (Poetry-only layout)
- Enable ManifestParsers in GenerateSBOM so PyProjectTOMLExtractor
  GetArtifact runs through the public sbomgen API
- Match ScannedArtifact with empty Version so pinned requirements entries
  (mylib==1.0.0) link to internal modules that declare no version
- Expand cross-type dependency targets to a fixed point and remove the
  filter guard so chains (requirements.txt → a/pyproject.toml →
  b/pyproject.toml) resolve fully with or without a FileType filter

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54c29ea355

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/sbomgen/sbomgen.go Outdated
Comment on lines +65 to +66
ManifestParsers: opts.ExtractArtifactIds,
ExtractArtifactIds: opts.ExtractArtifactIds,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Decouple manifest scanning from artifact extraction

When a library caller disables ExtractArtifactIds to suppress file-type components/dependency edges, this also leaves ManifestParsers false, so manifest-only extractors such as pyproject.toml and package.json are not enabled by DoScan unless explicitly selected elsewhere. That makes GenerateSBOM(..., Options{ExtractArtifactIds: false}) drop packages from manifest-only projects even though the option is documented as controlling only artifact IDs/build-file edges; keep manifest parser enablement independent from the artifact-output toggle.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: ManifestParsers is now an independent field in sbomgen.Options. Callers can enable manifest parsing without artifact extraction. Internally, ExtractArtifactIds=true still activates manifest parsers as a side-effect (required for GetArtifact to run on manifest extractors like PyProjectTOMLExtractor), via ManifestParsers: opts.ManifestParsers || opts.ExtractArtifactIds.

When ExtractArtifactIds is enabled, every file that matches a registered
extractor is now emitted as a type:file component in the SBOM, even if it
produces zero packages and has no artifact identity (GetArtifact returns nil
or is not implemented).

Previously, a build file only appeared in GetBuildFileTrees if it either
had at least one package with role:"manifest", or its extractor implemented
GetArtifact. Files like a requirements.txt containing only -e ../base (no
parseable packages, no GetArtifact) were invisible.

The bare type:file component has no datadog:maven-package property and does
not participate in findArtifact matching — it only ensures the file appears
as a node in the build file tree.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13c377dd0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/extractor/extract.go Outdated
Comment on lines +211 to +216
// Even if GetArtifact is not implemented or returned nil, emit a bare
// file-type component so every scanned build file appears in GetBuildFileTrees,
// regardless of whether it has an artifact identity or any packages.
if parsedLockfile.Artifact == nil {
parsedLockfile.Artifact = &models.ScannedArtifact{
ArtifactDetail: models.ArtifactDetail{Filename: f.Path()},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid emitting lockfiles as build-file components

When ExtractArtifactIds is enabled, this fallback runs for every extractor that does not implement ArtifactExtractor, including lockfile-only parsers such as Cargo.lock, package-lock.json, or yarn.lock. Those bare ScannedArtifacts become CycloneDX file components, and GetBuildFileTrees collects all file components as build files, so the library default can return lockfiles/unknown file types even when no manifest was present. Restrict this fallback to actual manifest/build-file extractors or skip it for lockfile-only parsers.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: the bare ScannedArtifact fallback is now inside the ArtifactExtractor type-assertion block. Only extractors that explicitly opt in by implementing ArtifactExtractor receive a bare file-type component. Lockfile-only parsers (Cargo.lock, yarn.lock, etc.) are unaffected. RequirementsTxtExtractor was given a trivial GetArtifact returning nil so requirements.txt files still appear in GetBuildFileTrees.

func findArtifact(name string, version string, artifacts []models.ScannedArtifact) *models.ScannedArtifact {
for _, artifact := range artifacts {
if artifact.Name == name && (version == "" || artifact.Version == version) {
if artifact.Name == name && (version == "" || artifact.Version == "" || artifact.Version == version) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not treat unknown artifact versions as universal matches

Fresh evidence: this line now makes an empty artifact version match any versioned package, so a repo-local artifact with the same name but a different or unparsed version is linked anyway. For example, if a pyproject module is version 2.0.0 but a requirements file pins mylib==1.0.0, the empty artifact version currently creates a false requirements.txt -> pyproject.toml dependency edge instead of rejecting the version mismatch. Preserve/extract artifact versions and only wildcard when the dependency itself is unversioned.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving as-is. The condition (version == "" || artifact.Version == "" || artifact.Version == version) is intentional: local modules in Python monorepos commonly omit the version in pyproject.toml, and internal requirements.txt entries for those modules rarely pin an exact version. The scenario in the comment (artifact version 2.0.0 vs pinned 1.0.0) is already handled correctly — the match is rejected when both sides have a non-empty, differing version. The remaining edge case (empty artifact version + pinned requirement) is an acceptable trade-off: false positives are unlikely in practice and false negatives (missing inter-module edges) would be more harmful.

@anderruiz anderruiz changed the title feat(sbomgen): generic SimpleProcessor + PyToml build file trees feat(sbomgen): generic SimpleProcessor + Python build file tree support Jun 10, 2026
anderruiz and others added 2 commits June 10, 2026 15:45
…r implementors

Lockfile-only parsers (Cargo.lock, yarn.lock, etc.) were receiving a bare
ScannedArtifact and appearing as type:file components in GetBuildFileTrees.

Move the fallback inside the ArtifactExtractor type-assertion block so only
extractors that opt in by implementing the interface get a bare component.
Add a trivial GetArtifact to RequirementsTxtExtractor so requirements.txt
files continue to appear in GetBuildFileTrees as intended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

ManifestParsers (parse manifests as package sources when no lockfile exists)
and ExtractArtifactIds (emit file-type SBOM components for build file trees)
are independent capabilities and should be controlled separately.

Expose ManifestParsers as its own field in sbomgen.Options. Internally,
ExtractArtifactIds still enables manifest parsers as a side-effect (needed
to invoke GetArtifact on manifest extractors like PyProjectTOMLExtractor),
but callers can now also enable ManifestParsers independently without
implying any artifact ID extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e4fcd1d912

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if _, isFileComponent := fileComponentPaths[depPath]; !isFileComponent {
continue
}
bf := BuildFile{FileType: ft, FilePath: depPath}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve the target file type for cross-type dependencies

When a requirements file depends on a file component of another type, such as libs/mylib/pyproject.toml or setup.py, this constructs the target BuildFile with the dependent group's ft instead of the target file's actual type. As a result GetBuildFileTrees(sbom, FileTypeRequirementsTxt) can return a pyproject.toml/setup.py path labeled as FileTypeRequirementsTxt, even though the API documents BuildFile.FileType as the file's type; consumers can no longer reliably distinguish the manifest type for these dependency entries.

Useful? React with 👍 / 👎.

@anderruiz anderruiz changed the title feat(sbomgen): generic SimpleProcessor + Python build file tree support Python build file tree support Jun 10, 2026
@anderruiz anderruiz changed the title Python build file tree support Python build file tree support for Code Intelligence Jun 10, 2026
@anderruiz anderruiz merged commit ccede4e into main Jun 11, 2026
11 checks passed
@anderruiz anderruiz deleted the ander/python-requirements-build-trees branch June 11, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants