
Commit 91bd8fc

paulOsinski authored and Maffooch committed
[docs] expand deduplication / reimport documentation (DefectDojo#14392)
* edit dedupe reimport docs
* Update docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md
* change article name and update links
* Update docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md
* remove weird line

Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com>
1 parent f87f9b7 commit 91bd8fc

File tree

7 files changed: +64 additions, -14 deletions

docs/content/get_started/about/faq.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ If you're looking to add a new tool to your suite, we have a list of recommended
  There are two different methods to import a single report from a security tool:

  - **Import** handles the report as a single point-in-time record. Importing a report creates a Test containing the resulting Findings.
- - **[Reimport](/import_data/import_intro/import_vs_reimport/)** is used to update an existing Test with a new set of results. If you have a more open-ended approach to your testing process, you can continuously Reimport the latest version of your report to an existing Test. DefectDojo will compare the results of the incoming report to your existing data, record any changes, and then adjust the Findings in the Test to match the latest report.
+ - **[Reimport](/import_data/import_intro/reimport/)** is used to update an existing Test with a new set of results. If you have a more open-ended approach to your testing process, you can continuously Reimport the latest version of your report to an existing Test. DefectDojo will compare the results of the incoming report to your existing data, record any changes, and then adjust the Findings in the Test to match the latest report.

  To understand the difference, it’s helpful to think of Import as recording a single instance of a scan event, and Reimport as updating a continual record of scanning.

docs/content/get_started/common_use_cases/common_use_cases.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Each of these report categories can be handled by a separate Engagement, with a
  ![image](images/example_product_hierarchy_bigcorp.png)

  - If a Product has a CI/CD pipeline, all of the results from that pipeline can be continually imported into a single open-ended Engagement. Each tool used will create a separate Test within the CI/CD Engagement, which can be continuously updated with new data.
- (See our guide to [Reimport](/import_data/import_intro/import_vs_reimport/))
+ (See our guide to [Reimport](/import_data/import_intro/reimport/))
  - Each Pen Test effort can have a separate Engagement created to contain all of the results: e.g. "Q1 Pen Test 2024," "Q2 Pen Test 2024," etc.
  - BigCorp will likely want to run their own mock PCI audit so that they're prepared for the real thing. The results of those audits can also be stored as a separate Engagement.

docs/content/import_data/import_intro/import_vs_reimport.md renamed to docs/content/import_data/import_intro/reimport.md

Lines changed: 8 additions & 2 deletions
@@ -1,5 +1,5 @@
  ---
- title: "Import vs Reimport"
+ title: "Reimport"
  description: "Learn how to import data manually, through the API, or via a connector"
  weight: 2
  aliases:
@@ -80,7 +80,13 @@ This header indicates the actions taken by an Import/Reimport.
  * **\# left untouched** shows the count of Open Findings which were unchanged by a Reimport (because they also existed in the incoming report).
  * **\# reactivated** shows any Closed Findings which were reopened by an incoming Reimport.

- ## Reimport via API \- special note
+ ## Reimport Deduplication
+
+ Reimport decides whether an incoming item matches an existing Finding using **[Reimport Deduplication](/triage_findings/finding_deduplication/about_deduplication/)** settings. This is separate from “Same Tool Deduplication” and “Cross Tool Deduplication,” which operate after Findings exist.
+
+ If you are seeing Reimport close old Findings and create new Findings when only a minor attribute changes (for example, a line number shift), tune **Reimport Deduplication** for that tool to use stable identifiers that ignore those attributes (such as Unique ID From Tool).
+
+ ## Reimport via API - special note

  Note that the /reimport API endpoint can both **extend an existing Test** (apply the method in this article) **or create a new Test** with new data \- an initial call to `/import`, or setting up a Test in advance is not required.
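As a sketch of that endpoint in practice, the snippet below posts a report to `/api/v2/reimport-scan/` with the `requests` library. The base URL, token, Test id, and the helper names (`build_reimport_payload`, `reimport`) are illustrative placeholders, not part of DefectDojo itself; check your instance's API docs for the full field list.

```python
def build_reimport_payload(test_id: int, scan_type: str) -> dict:
    """Form fields for a reimport that extends an existing Test."""
    return {
        "test": str(test_id),    # extend this existing Test...
        "scan_type": scan_type,  # ...with a report of this scan type
        "active": "true",
        "verified": "false",
    }

def reimport(base_url: str, token: str, test_id: int, scan_type: str, report_path: str):
    """POST a report file to the reimport endpoint (sketch, untested against a live host)."""
    import requests  # assumes the requests package is installed

    with open(report_path, "rb") as report:
        return requests.post(
            f"{base_url}/api/v2/reimport-scan/",
            headers={"Authorization": f"Token {token}"},
            data=build_reimport_payload(test_id, scan_type),
            files={"file": report},
            timeout=60,
        )
```

To let reimport create the Test for you instead, you would omit `test` and send identifiers such as `product_name` and `engagement_name` along with `auto_create_context` enabled.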

docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md

Lines changed: 5 additions & 1 deletion
@@ -1,5 +1,5 @@
  ---
- title: "Deduplication Tuning"
+ title: "Deduplication Tuning (Open Source)"
  description: "Configure deduplication in DefectDojo Open Source: algorithms, hash fields, endpoints, and service"
  weight: 5
  audience: opensource
@@ -106,6 +106,10 @@ Notes:

  ## After changing deduplication settings

+ After changing algorithms or Hash computation, you will need to **recompute hashes** for the affected parser/test type before the new matching behavior will apply consistently across existing data.
+
+ Note: Recomputing hashes can lead to long wait times on large instances. Plan maintenance windows accordingly.
+
  - Changes to dedupe configuration (e.g., `HASHCODE_FIELDS_PER_SCANNER`, `HASH_CODE_FIELDS_ALWAYS`, `DEDUPLICATION_ALGORITHM_PER_PARSER`) are not applied retroactively automatically. To re-evaluate existing findings you must run the management command below.
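For context on those settings, here is a minimal sketch of what per-scanner dedupe configuration looks like as plain Python dictionaries. The scanner name "Example Scanner" and its field list are made-up illustrations, not shipped defaults; in a real override you would typically update the existing dicts from DefectDojo's settings module rather than replace them.

```python
# Illustrative sketch of dedupe settings (e.g. in a local settings override).
# "Example Scanner" and its fields are hypothetical; real keys must match
# the parser's scan type name exactly.
HASHCODE_FIELDS_PER_SCANNER = {
    "Example Scanner": ["title", "cwe", "severity", "description"],
}

# Fields appended to every scanner's hash computation.
HASH_CODE_FIELDS_ALWAYS = ["service"]

# Which matching strategy each parser uses.
DEDUPLICATION_ALGORITHM_PER_PARSER = {
    "Example Scanner": "hash_code",
}
```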
Run inside the uwsgi container. Example (hash codes only, no dedupe):

docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md

Lines changed: 15 additions & 2 deletions
@@ -1,11 +1,12 @@
  ---
- title: "Deduplication Tuning"
+ title: "Deduplication Tuning (Pro)"
  description: "Configure how DefectDojo identifies and manages duplicate findings"
  weight: 4
  audience: pro
  aliases:
  - /en/working_with_findings/finding_deduplication/tune_deduplication
  ---
+
  Deduplication Tuning is a DefectDojo Pro feature that gives you fine-grained control over how findings are deduplicated, allowing you to optimize duplicate detection for your specific security testing workflow.

  ## Deduplication Settings
@@ -41,6 +42,8 @@ Uses a combination of selected fields to generate a unique hash. When selected,
  #### Unique ID From Tool
  Leverages the security tool's own internal identifier for findings, ensuring perfect deduplication when the scanner provides reliable unique IDs.

+ This algorithm can be useful when working with SAST scanners, or in situations where a Finding can "move around" in source code as development progresses.
+
  #### Unique ID From Tool or Hash Code
  Attempts to use the tool's unique ID first, then falls back to the hash code if no unique ID is available. This provides the most flexible deduplication option.
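The "Unique ID From Tool or Hash Code" fallback can be sketched as a small matching function. This is an illustration of the decision order only, with hypothetical field names, not DefectDojo's internal code.

```python
# Sketch of "Unique ID From Tool or Hash Code" matching (hypothetical fields).
def is_duplicate(existing: dict, incoming: dict) -> bool:
    # Prefer the scanner's own identifier when both sides carry one...
    if existing.get("unique_id_from_tool") and incoming.get("unique_id_from_tool"):
        return existing["unique_id_from_tool"] == incoming["unique_id_from_tool"]
    # ...otherwise fall back to comparing computed hash codes.
    return existing.get("hash_code") == incoming.get("hash_code")
```

Because the tool's ID wins when present, a Finding whose hash changed (for example, it moved to a different line) still matches as long as the scanner kept its identifier stable.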

@@ -60,7 +63,11 @@ Unlike Same Tool Deduplication, Cross Tool Deduplication only supports the Hash

  ## Reimport Deduplication

- Reimport Deduplication Settings are specifically designed for reimporting data using Universal Parsers or the Generic Parser.
+ **⚠️ Reimport processes can completely discard Findings before they are recorded. This can lead to data loss if set incorrectly, so Reimport Deduplication settings should be adjusted with caution.**
+
+ Reimport Deduplication Settings can be used to set an algorithm for Universal Parsers, or for a Generic Findings Import Parser.
+
+ Reimport Deduplication cannot be adjusted for other tools by default. Users who want to adjust the Reimport Deduplication algorithm for other tools in their instance should reach out to [DefectDojo Support](mailto:support@defectdojo.com) for assistance.

  ![image](images/reimport_deduplication.png)

@@ -74,6 +81,8 @@ The same three algorithm options are available for Reimport Deduplication as for
  - Unique ID From Tool
  - Unique ID From Tool or Hash Code

+ Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution.
+
  ## Deduplication Best Practices

  For optimal results with Deduplication Tuning:
@@ -85,3 +94,7 @@ For optimal results with Deduplication Tuning:
  - **Avoid overly broad deduplication**: Cross-tool deduplication with too few hash fields may result in false duplicates

  By tuning deduplication settings to your specific tools, you can significantly reduce duplicate noise.
+
+ ## Locked Findings
+
+ Whenever Deduplication Settings are changed for a given tool, Deduplication hashes are re-calculated for that tool across the entire DefectDojo instance.

docs/content/triage_findings/finding_deduplication/about_deduplication.md

Lines changed: 32 additions & 5 deletions
@@ -26,13 +26,29 @@ By default, these Tests would need to be nested under the same Product for Dedup

  Duplicate Findings are set as Inactive by default. This does not mean the Duplicate Finding itself is Inactive. Rather, this is so that your team only has a single active Finding to work on and remediate, with the implication being that once the original Finding is Mitigated, the Duplicates will also be Mitigated.

- ## Deduplication vs Reimport
+ ## Reimport Deduplication

- Deduplication and Reimport are similar processes but they have a key difference:
+ Deduplication and Reimport are similar processes, but they use different algorithms to identify Finding matches.

- * When you Reimport to a Test, the Reimport process looks at incoming Findings, **filters and** **discards any matches**. Those matches will never be created as Findings or Finding Duplicates.
- * Deduplication is applied 'passively' on Findings that have already been created. It will identify duplicates in scope and **label them**, but it will not delete or discard the Finding unless 'Delete Deduplicate Findings' is enabled.
- * The 'reimport' action of discarding a Finding always happens before deduplication; DefectDojo **cannot deduplicate Findings that are never created** as a result of Reimport's filtering.
+ * When you Reimport to a Test, the Reimport process looks at incoming Findings, **compares hash codes, and then discards any matches**. Those matches will never be created as Findings or Finding Duplicates.
+
+ However, any Findings that remain after Reimport Deduplication are still subject to Same-Tool Deduplication. So if you use a narrower scope for Same-Tool Deduplication, you can end up with Duplicates within a Reimport pipeline.
+
+ ### Example
+
+ Here's a tool with a Reimport Deduplication algorithm which is different from the Same-Tool Deduplication algorithm.
+
+ | Deduplication Algorithm | Hash Code Fields |
+ | ----- | ---- |
+ | Reimport | Title, CWE, Severity, Description, Line Number |
+ | Same-Tool | Title, CWE, Severity, Description |
+
+ Let's say you had a Finding in DefectDojo with a given line number. You re-scanned your environment and the line number of that vulnerability changed. You reimport to the same Test. Here's what will happen during reimport and deduplication:
+
+ * During Reimport, the Finding will not be matched to any Findings that already exist, because the line number is different. So a new Finding will be created in the Test.
+ * After Reimport is complete, the Same-Tool Deduplication algorithm will run. Same-Tool Deduplication does not consider line number in this configuration, so the new Finding will be labelled as a duplicate.
+
+ Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution.
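The line-number example can be sketched numerically: compute each hash over its configured field set and compare. This is a simplified illustration of the idea, not DefectDojo's actual hashing code; the field names and sample Finding are hypothetical.

```python
import hashlib

def hash_code(finding: dict, fields: list) -> str:
    # Join the configured fields in order and hash them (simplified sketch).
    joined = "|".join(str(finding.get(f, "")) for f in fields)
    return hashlib.sha256(joined.encode()).hexdigest()

# Field sets from the example table above.
REIMPORT_FIELDS = ["title", "cwe", "severity", "description", "line"]
SAME_TOOL_FIELDS = ["title", "cwe", "severity", "description"]

old = {"title": "SQL Injection", "cwe": 89, "severity": "High",
       "description": "User input reaches a raw query.", "line": 42}
new = dict(old, line=57)  # same vulnerability, shifted line number

# Reimport hashes over the line number, so the incoming Finding is NOT a match:
assert hash_code(old, REIMPORT_FIELDS) != hash_code(new, REIMPORT_FIELDS)
# Same-Tool Deduplication ignores the line, so the new Finding IS a duplicate:
assert hash_code(old, SAME_TOOL_FIELDS) == hash_code(new, SAME_TOOL_FIELDS)
```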

## When are duplicates appropriate?

@@ -119,3 +135,14 @@ For example, let’s say that you had your Maximum Duplicates field set to ‘1
  ### Applying this setting

  Applying **Delete Deduplicate Findings** will begin a deletion process immediately. This setting can be applied on the **System Settings** page. See Enabling Deduplication for more information.
+
+ ## Troubleshooting Deduplication
+
+ Sometimes, Deduplication does not work as expected. Here are some examples of ways that Deduplication might not be working correctly, along with possible solutions.
+
+ | What you see | Most likely cause | What to tune |
+ | --- | --- | --- |
+ | Reimport closes an old Finding and creates a new one when only the line number changed | Reimport matching uses unstable fields (for example, line number) | **Reimport Deduplication** (prefer stable IDs or stable hash fields) |
+ | Multiple Findings are created in the same Test that you believe should be duplicates | Deduplication matching is not configured for that tool or scope | **Same Tool Deduplication** (and consider “Delete Deduplicate Findings” behavior) |
+ | Duplicates are created across different tools | Cross-tool matching is disabled or too strict | **Cross Tool Deduplication (Pro only)** (hash-based matching) |
+ | Excess duplicates of the same Finding are being created, across Tests | Asset Hierarchy is not set up correctly | [Consider Reimport for continual testing](/triage_findings/finding_deduplication/avoid_excess_duplicates/) |

docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ weight: 4
  aliases:
  - /en/working_with_findings/finding_deduplication/avoiding_duplicates_via_reimport
  ---
- One of DefectDojo’s strengths is that the data model can accommodate many different use\-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow.
+ One of DefectDojo’s strengths is that the data model can accommodate many different use-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow.

  By default, DefectDojo does not delete any duplicate Findings that are created. Each Finding is considered to be a separate instance of a vulnerability. So in this case, **Duplicate Findings** can be an indicator that a process change is required to your workflow.

@@ -46,7 +46,7 @@ DefectDojo has two methods for importing test data to create Findings: **Import*

  Each time you import new vulnerability reports into DefectDojo, those reports will be stored in a Test object. A Test object can be created by a user ahead of time to hold a future **Import**. If a user wants to import data without specifying a Test destination, a new Test will be created to store the incoming report.

- Tests are flexible objects, and although they can only hold one *kind* of report, they can handle multiple instances of that same report through the **Reimport** method. To learn more about Reimport, see our **[article](/import_data/import_intro/import_vs_reimport/)** on this topic.
+ Tests are flexible objects, and although they can only hold one *kind* of report, they can handle multiple instances of that same report through the **Reimport** method. To learn more about Reimport, see our **[article](/import_data/import_intro/reimport/)** on this topic.

  ## Using Reimport for continual Tests
