owasp-dep-scan · prabhu · Apr 28, 2025
diff --git a/contrib/bom-1.6.schema.json b/contrib/bom-1.6.schema.json
diff --git a/contrib/depscanGPT/README.md b/contrib/depscanGPT/README.md
@@ -5,71 +5,84 @@ depscanGPT is [available](https://chatgpt.com/g/g-674f260c887c819194e465d2c65f40
 ## System prompt
 
 ```text
-# System Prompt
+# System Prompt
 
 You are depscan, an application‑security expert in Software Composition Analysis (SCA) and supply‑chain security. Your only sources of truth are:
-- JSON files the user uploads (CycloneDX VDR, SBOM, CBOM, OBOM, SaaSBOM, ML‑BOM, CSAF VEX)
-- Embedded reference docs bundled with this GPT (e.g., PROJECT_TYPES.md)
+	•	JSON files the user uploads (CycloneDX VDR, SBOM, CBOM, OBOM, SaaSBOM, ML‑BOM, CSAF VEX)
+	•	Embedded reference docs bundled with this GPT (e.g., PROJECT_TYPES.md)
 
 If data is missing, reply: “That information isn’t available in the provided materials.”
 
 ## Scope
 
 Answer only questions about:
-- CycloneDX BOM or VDR content
-- OASIS CSAF VEX
-- OWASP depscan, blint, or cdxgen
-
-**BOM generation & CycloneDX authoring**
-
-If the user’s question is about creating a BOM or general CycloneDX mechanics (rather than analysing an existing report), redirect them to cdxgenGPT:
-“For BOM generation, please try the dedicated assistant here → https://chatgpt.com/g/g-673bfeb4037481919be8a2cd1bf868d2-cdxgen ”
-
-For anything else, respond: “I’m sorry, but I can only help with BOM and VDR‑related queries.”
-
-## Interaction flow
-1.	Greeting (first turn only) – “Hello, I’m OWASP depscan — how can I help with your BOM or VDR?”
-2.	Ask for a JSON file or a specific question.
-3.	Never offer to create sample BOM/VDR files.
-
-## Analysis rules
-- VDR: use vulnerabilities, severity, analysis, etc.
-- SBOM/CBOM/OBOM/ML‑BOM: use components, purl, licenses, properties, etc.
-- SaaSBOM: use services, endpoints, authenticated, data.classification.
-- Infer ecosystem from purl (pkg:npm → npm, pkg:pypi → Python).
-- If coverage is unclear, suggest regenerating with depscan `--profile research` or `--reachability-analyzer SemanticReachability`.
-
-## Understanding depscan reports
-
-**Input expectations**
-- If the user’s question involves scan results but no report is attached, ask them to upload `depscan.html` or `depscan.txt` (console output) — whichever they have handy.
-- Accept CycloneDX VDR JSON alongside the HTML/TXT when both are supplied.
-- If key details (e.g., reachable flows, service endpoints, remediation notes) are missing from the uploaded depscan.html or depscan.txt, tell the user: “Please rerun depscan with the `--explain` flag and attach the regenerated report for a detailed analysis.”
-
-**How to analyse the report (JSON, HTML or TXT)**
-    1.  When summarizing a VDR JSON file, if an annotations array exists and any annotator.name is "owasp-depscan", prefer the text field as the primary summary. Choose the latest timestamped annotation if multiple exist.
-	2.	In TEXT and HTML files, locate the “Dependency Scan Results (BOM)” table → extract package, CVE, severity, score and fix version.
-	    1.	Use the “Reachable / Endpoint‑Reachable / Top Priority” sections to explain exploitability and remediation order.
-	    2.	Parse the “Service Endpoints” and “Reachable Flows” tables to highlight insecure routes or code hotspots.
-	    3.	Everything you state must be quoted or paraphrased from the uploaded report; if a datum is absent, say so plainly.
-
-**Response rules**
-- Never guess, extrapolate or add external CVE intelligence.
-- Keep the normal style limits (≤ 2 sentences or ≤ 3 bullets).
-- When advising fixes, repeat only the fix version shown in the report; do not suggest alternative versions.
-
-## Reference look‑ups
-- For supported languages/frameworks, consult PROJECT_TYPES.md and quote it.
-- If unsupported, direct the user to open a “Premium Issue” in the cdxgen GitHub repo (link on request).
-
-## Response style
-- ≤ 2 sentences (or ≤ 3 brief bullet points).
-- No jokes or small talk.
-- Don’t add unsolicited suggestions.
-
-## Feedback nudge
-
-When a user expresses satisfaction, once per session invite them to review depscanGPT on social media or donate to the OWASP Foundation.
+	•	CycloneDX BOM or VDR content
+	•	OASIS CSAF VEX
+	•	OWASP depscan, blint, or cdxgen
+
+## BOM generation & CycloneDX authoring
+
+If the user’s question is about creating a BOM or general CycloneDX mechanics (rather than analyzing an existing report), redirect them:
+
+“For BOM generation, please try the dedicated assistant here → https://chatgpt.com/g/g-673bfeb4037481919be8a2cd1bf868d2-cdxgen”
+
+For any other unrelated request, respond:
+
+“I’m sorry, but I can only help with BOM and VDR-related queries.”
+
+## Interaction Flow
+	1.	Greeting (first turn only): “Hello, I’m OWASP depscan — how can I help with your BOM or VDR?”. Display the ascii logo from "Optional ASCII logo" occasionally.
+	2.	Request a JSON file or specific question.
+	3.	Never offer to create sample BOM/VDR files.
+
+## Analysis Rules
+	•	VDR: Only use vulnerabilities, analysis, annotations, severity.
+	•	SBOM/CBOM/OBOM/ML‑BOM: Only use components, purl, licenses, properties.
+	•	SaaSBOM: Only use services, endpoints, authenticated, data.classification.
+	•	Infer the ecosystem solely from purl fields (e.g., pkg:npm → npm).
+	•	If coverage is unclear, suggest rerunning depscan with --profile research or --reachability-analyzer SemanticReachability.
+
+## Understanding Depscan Reports (TXT/HTML)
+	•	If the user provides a depscan.txt or depscan.html, accept it.
+	•	Prefer annotations array from VDR when summarizing vulnerabilities, picking the latest timestamp if multiple exist.
+	•	Parse and use:
+        •	“Dependency Scan Results (BOM)” table: extract package name, CVE, severity, fix version.
+        •	“Reachable / Endpoint-Reachable / Top Priority” sections: highlight exploitability and remediation order.
+        •	“Service Endpoints” and “Reachable Flows” tables: highlight insecure code paths.
+        •	“Next Steps” section: treat this as **mandatory source of truth** for recommending actions if present.
+	•	**Never extrapolate** beyond what the reports or annotations explicitly state.
+
+## Automatic Build Manager Command Generation
+
+When a “Next Steps” section exists:
+	•	If a “Fix Version” and “Package” are specified, generate a build tool command based solely on:
+        •	the purl format (e.g., pkg:nuget, pkg:npm, pkg:maven)
+        •	any explicitly provided project hints (e.g., .csproj paths).
+	•	Only use standard native command syntax:
+        •	NuGet (.NET projects):
+    dotnet add <path>.csproj package <package-name> --version <fix-version>
+        •	npm projects:
+    npm install <package-name>@<fix-version> --save
+        •	Maven projects:
+    Suggest manually updating pom.xml or using:
+    mvn versions:use-dep-version -Dincludes=<group-id>:<artifact-id> -DdepVersion=<fix-version>
+	•	**Do not infer missing information.**
+	•	**Do not recommend upgrades for packages without a fix version provided.**
+
+## Response Rules
+	•	Never guess, extrapolate, or add external CVE intelligence.
+	•	Responses must match exact data and structure from the uploaded depscan or VDR.
+	•	When advising a fix, **repeat exactly** the “Fix Version” shown in the report — no alternative versions or speculations.
+	•	If multiple “Next Steps” exist, treat them independently.
+
+## Style
+	•	Keep all responses ≤ 2 sentences or ≤ 3 bullets unless user asks for expanded details.
+	•	No jokes, small talk, or promotional suggestions.
+	•	Do not insert external links unless specifically asked.
+
+## Feedback Nudge
+
+When a user expresses satisfaction, invite them once per session to review depscanGPT on social media or donate to the OWASP Foundation.
 
 ## Optional ASCII logo
 

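The command rules in the "Automatic Build Manager Command Generation" section of the prompt can be illustrated with a short Python sketch. `fix_command` and `COMMAND_TEMPLATES` are hypothetical names used only for illustration and are not part of depscan. The sketch covers the NuGet and npm templates quoted in the prompt and returns `None` whenever the fix version is missing or the purl type is not one of those two, mirroring the "do not infer missing information" rule. Maven is deliberately left out because the prompt prefers a manual pom.xml edit.

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical lookup table: purl type -> command template from the prompt.
COMMAND_TEMPLATES = {
    "nuget": "dotnet add {project} package {name} --version {version}",
    "npm": "npm install {name}@{version} --save",
}


def fix_command(
    purl: str, fix_version: Optional[str], project: str = "<path>.csproj"
) -> Optional[str]:
    """Return an upgrade command, or None when no command should be suggested."""
    if not fix_version or not purl.startswith("pkg:"):
        return None
    # purl layout: pkg:<type>/<namespace>/<name>@<version>?<qualifiers>#<subpath>
    purl_type, _, rest = purl[len("pkg:"):].partition("/")
    template = COMMAND_TEMPLATES.get(purl_type)
    if template is None:
        return None
    # Drop subpath and qualifiers, then the version on the final path segment.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    if "@" in rest.rsplit("/", 1)[-1]:
        rest = rest[: rest.rindex("@")]
    # Scoped npm names are percent-encoded in purls (%40angular/core).
    name = unquote(rest)
    return template.format(project=project, name=name, version=fix_version)


print(fix_command("pkg:npm/%40angular/core@11.0.0", "11.0.5"))
```

Only the fix version passed in is ever emitted, so the sketch cannot suggest an alternative version on its own.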
diff --git a/contrib/vex-validate.py b/contrib/vex-validate.py
@@ -23,13 +23,13 @@ def build_args():
 
 
 def vvex(vex_json):
-    schema = os.path.join(os.path.dirname(__file__), "bom-1.5.schema.json")
+    schema = os.path.join(os.path.dirname(__file__), "bom-1.6.schema.json")
     with open(schema, mode="r") as sp:
         with open(vex_json, mode="r") as vp:
             vex_obj = json.load(vp)
             try:
                 validate(instance=vex_obj, schema=json.load(sp))
-                print("VEX file is valid")
+                print("VDR/VEX file is valid")
             except ValidationError as ve:
                 print(ve)
                 sys.exit(1)

diff --git a/depscan/cli.py b/depscan/cli.py
@@ -170,17 +170,22 @@ def vdr_analyze_summarize(
         vdr_file = os.path.join(bom_dir, DEPSCAN_DEFAULT_VDR_FILE)
     if vdr_result.success:
         pkg_vulnerabilities = vdr_result.pkg_vulnerabilities
+        cdx_vdr_data = None
         # Always create VDR files even when empty
         if pkg_vulnerabilities is not None:
             # Case 1: Single BOM file resulting in a single VDR file
             if bom_file:
-                if bom_data := json_load(bom_file, log=LOG):
-                    export_bom(bom_data, ds_version, pkg_vulnerabilities, vdr_file)
+                cdx_vdr_data = json_load(bom_file, log=LOG)
             # Case 2: Multiple BOM files in a bom directory
             elif bom_dir:
-                bom_data = create_empty_vdr(pkg_list, ds_version)
-                export_bom(bom_data, ds_version, pkg_vulnerabilities, vdr_file)
-                LOG.debug(f"The VDR file '{vdr_file}' was created successfully.")
+                cdx_vdr_data = create_empty_vdr(pkg_list, ds_version)
+        if cdx_vdr_data:
+            export_bom(cdx_vdr_data, ds_version, pkg_vulnerabilities, vdr_file)
+            LOG.debug(f"The VDR file '{vdr_file}' was created successfully.")
+        else:
+            LOG.debug(
+                f"VDR file '{vdr_file}' was not created for the type {project_type}."
+            )
         summary = summary_stats(pkg_vulnerabilities)
     elif bom_dir or bom_file or pkg_list:
         LOG.info("No vulnerabilities found for project type '%s'!", project_type)
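The hunk above replaces two separate `export_bom` calls with one decision about which document the VDR is based on, followed by a single export. A minimal standalone sketch of that decision is given here, assuming a plain `json.load` in place of depscan's `json_load` and a bare dictionary in place of `create_empty_vdr`. The name `pick_vdr_base` is hypothetical, and the `pkg_vulnerabilities is not None` guard from the real function is omitted for brevity.

```python
import json
from typing import Any, Dict, List, Optional


def pick_vdr_base(
    bom_file: Optional[str],
    bom_dir: Optional[str],
    pkg_list: Optional[List[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Return the document a VDR would be built on, or None if there is none."""
    # Case 1: a single BOM file is the base of a single VDR file.
    if bom_file:
        try:
            with open(bom_file, encoding="utf-8") as fp:
                return json.load(fp)
        except (OSError, json.JSONDecodeError):
            # An unreadable BOM yields no VDR; it does not fall back to bom_dir.
            return None
    # Case 2: a BOM directory starts from an empty document holding the packages.
    if bom_dir:
        return {"metadata": {}, "components": list(pkg_list or [])}
    return None
```

A caller can then export and log success when the result is truthy, and log that no VDR was created otherwise, which is the shape of the new `if cdx_vdr_data: ... else: ...` block.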
@@ -656,10 +661,13 @@ def run_depscan(args):
             or (vuln_analyzer == "auto" and bom_dir_mode)
         ):
             if args.reachability_analyzer == "SemanticReachability":
-                LOG.info(
-                    "Semantic Reachability analysis requested for project type '%s'. This might take a while ...",
-                    project_type,
-                )
+                if not args.bom_dir:
+                    LOG.info(
+                        "Semantic Reachability analysis requested for project type '%s'. This might take a while ...",
+                        project_type,
+                    )
+                else:
+                    LOG.info("Attempting semantic analysis based on existing data at '%s'", args.bom_dir)
             else:
                 LOG.info(
                     "Lifecycle-based vulnerability analysis requested for project type '%s'. This might take a while ...",
@@ -862,7 +870,9 @@ def run_depscan(args):
         else:
             LOG.debug("Vulnerability database loaded from %s", config.VDB_BIN_FILE)
         if len(pkg_list) > 1:
-            if args.bom:
+            if project_type == "bom":
+                LOG.info("Scanning CycloneDX xBOMs and atom slices")
+            elif args.bom:
                 LOG.info(
                     "Scanning %s with type %s",
                     args.bom,
@@ -935,6 +945,7 @@ def run_depscan(args):
                 project_type,
                 src_dir,
                 args.bom_dir or reports_dir,
+                vdr_file,
                 vdr_result,
                 args.explanation_mode,
             )

diff --git a/depscan/lib/bom.py b/depscan/lib/bom.py
@@ -1,6 +1,9 @@
 import os
 import shutil
 import sys
+import uuid
+from collections import defaultdict
+from datetime import datetime, timezone
 from urllib.parse import unquote_plus
 
 from blint.cyclonedx.spec import CycloneDX
@@ -438,8 +441,8 @@ def create_lifecycle_boms(cdxgen_lib, src_dir, options):
 
 def create_empty_vdr(pkg_list, ds_version):
     components = pkg_list or []
-    metadata = update_tools_metadata(None, None, ds_version)
-    return {"metadata": metadata, "components": components}
+    bom_data = update_tools_metadata(None, None, ds_version)
+    return {**bom_data, "components": components}
 
 
 def update_tools_metadata(tools, bom_data, ds_version):
@@ -451,18 +454,31 @@ def update_tools_metadata(tools, bom_data, ds_version):
     :return: None
     """
     if not bom_data:
-        bom_data = {"metadata": {}}
-    components = tools.get("components", []) if tools else []
-    ds_purl = f"pkg:pypi/owasp-depscan@{ds_version}"
-    components.append(
-        {
-            "type": "application",
-            "name": "owasp-depscan",
-            "version": ds_version,
-            "purl": ds_purl,
-            "bom-ref": ds_purl,
+        now_utc = datetime.now(timezone.utc)
+        bom_data = {
+            "bomFormat": "CycloneDX",
+            "specVersion": "1.6",
+            "serialNumber": f"urn:uuid:{uuid.uuid4()}",
+            "version": 1,
+            "metadata": {
+                "timestamp": now_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
+            },
         }
+    components = tools.get("components", []) if tools else []
+    needs_ds_component = not any(
+        c.get("name") == "owasp-depscan" for c in components
+    )
+    if needs_ds_component:
+        ds_purl = f"pkg:pypi/owasp-depscan@{ds_version}"
+        components.append(
+            {
+                "type": "application",
+                "name": "owasp-depscan",
+                "version": ds_version,
+                "purl": ds_purl,
+                "bom-ref": ds_purl,
+            }
+        )
     bom_data["metadata"]["tools"] = {"components": components}
     return bom_data
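The two ideas in this hunk, a complete CycloneDX envelope when no BOM exists yet and a tool entry that is added at most once, can be tried in isolation with the sketch below. `new_bom_skeleton` and `add_tool_once` are hypothetical helper names. Unlike `update_tools_metadata`, which takes the tools as a separate argument, this simplified version reads the tool list from the BOM itself.

```python
import uuid
from datetime import datetime, timezone


def new_bom_skeleton():
    """Build the minimal top-level fields, as in the hunk above."""
    now_utc = datetime.now(timezone.utc)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "metadata": {"timestamp": now_utc.strftime("%Y-%m-%dT%H:%M:%SZ")},
    }


def add_tool_once(bom_data, name, version):
    """Append a tool component only if no component with that name exists."""
    tools = bom_data["metadata"].setdefault("tools", {})
    components = tools.setdefault("components", [])
    if not any(c.get("name") == name for c in components):
        purl = f"pkg:pypi/{name}@{version}"
        components.append(
            {
                "type": "application",
                "name": name,
                "version": version,
                "purl": purl,
                "bom-ref": purl,
            }
        )
    return bom_data
```

Calling `add_tool_once` repeatedly on the same document leaves a single entry, which matters when a VDR is regenerated from a BOM that depscan has already annotated.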
 
@@ -505,16 +521,34 @@ def trim_vdr_bom_data(bom_data):
     if metadata and metadata.get("properties"):
         del metadata["properties"]
         bom_data["metadata"] = metadata
-    new_components = []
+    new_components = {}
+    component_identities = defaultdict(list)
     for comp in components:
+        identity_evidences = comp.get("evidence", {}).get("identity", []) or []
+        if isinstance(identity_evidences, dict):
+            identity_evidences = [identity_evidences]
         for p in (
             "properties",
             "signature",
+            "url",
+            "vendor",
+            "licenses",  # We need a better logic to retain licenses here
         ):
-            if comp.get(p):
+            if comp.get(p) is not None:
                 del comp[p]
-        new_components.append(comp)
-    bom_data["components"] = new_components
+        ref = comp.get("bom-ref") or comp.get("purl")
+        # This is an error condition really
+        if not ref:
+            continue
+        component_identities[ref] += identity_evidences
+        if not new_components.get(ref):
+            new_components[ref] = comp
+    vdr_components = []
+    for ref, comp in new_components.items():
+        identity_evidences = component_identities[ref]
+        comp["evidence"] = {"identity": identity_evidences}
+        vdr_components.append(comp)
+    bom_data["components"] = vdr_components
     for p in (
         "annotations",
         "signature",
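The component loop above deduplicates by `bom-ref` (falling back to `purl`) while concatenating the identity evidence of every duplicate. A minimal standalone sketch of that merge is given here. `merge_components` is a hypothetical name, and the sketch copies each component instead of mutating it in place, which differs from the diff.

```python
from collections import defaultdict


def merge_components(components):
    """Keep the first component per ref and pool identity evidence across duplicates."""
    merged = {}
    identities = defaultdict(list)
    for comp in components:
        evidence = (comp.get("evidence") or {}).get("identity") or []
        # Older BOMs may store a single identity object instead of a list.
        if isinstance(evidence, dict):
            evidence = [evidence]
        ref = comp.get("bom-ref") or comp.get("purl")
        if not ref:
            # No usable key: skipped, as in the diff.
            continue
        identities[ref].extend(evidence)
        merged.setdefault(ref, dict(comp))
    # As in the diff, "evidence" is rebuilt with only the pooled identity list.
    return [
        {**comp, "evidence": {"identity": identities[ref]}}
        for ref, comp in merged.items()
    ]
```

One consequence worth noting: any other evidence keys on a component, such as occurrences, are dropped by this rebuild, both here and in the hunk.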

diff --git a/depscan/lib/explainer.py b/depscan/lib/explainer.py
@@ -16,12 +16,16 @@
 from depscan.lib.logger import console, LOG
 
 
-def explain(project_type, src_dir, bom_dir, vdr_result, explanation_mode):
+def explain(project_type, src_dir, bom_dir, vdr_file, vdr_result, explanation_mode):
     """
     Explain the analysis and findings based on the explanation mode.
 
     :param project_type: Project type
+    :param src_dir: Source directory
     :param bom_dir: BOM directory
+    :param vdr_file: VDR file
+    :param vdr_result: VDR Result
+    :param explanation_mode: Explanation mode
     """
     pattern_methods = {}
     has_any_explanation = False

diff --git a/packages/analysis-lib/src/analysis_lib/__init__.py b/packages/analysis-lib/src/analysis_lib/__init__.py
@@ -71,6 +71,7 @@ class VDRResult:
     reached_purls: Optional[Dict[str, int]] = None
     reached_services: Optional[Dict[str, int]] = None
     endpoint_reached_purls: Optional[Dict[str, int]] = None
+    purl_identities: Optional[Dict[str, List]] = None
 
 
 class Counts: