Add order Hints for Bulk Copy operations #2701

Open
wants to merge 31 commits into
base: main

@divang (Contributor) commented Jul 9, 2025

Description

This PR adds support for specifying order hints during Bulk Copy operations in the Microsoft JDBC Driver for SQL Server. An order hint tells SQL Server how the incoming data is sorted. When that order matches the destination table's clustered index, the server can avoid re-sorting the rows during the load, which can improve bulk load performance.

Changes

  • Introduced a new API to allow clients to specify one or more order hints for Bulk Copy operations.
  • Updated the Bulk Copy implementation to pass order hints to the underlying SQL Server command.
  • Added validation and documentation for supported order hint values.
  • Extended relevant tests to cover order hint scenarios.
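
As an illustration of the second bullet, here is a minimal sketch of how a set of order hints could be rendered into an `ORDER(...)` clause of the kind T-SQL bulk load statements accept. This is not the driver's code: `OrderHintClause`, its nested `SortOrder` enum, and the bracket quoting of column names are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the driver): renders order hints as an ORDER(...) clause. */
final class OrderHintClause {

    /** Illustrative stand-in for the driver's sort-order enum. */
    enum SortOrder {
        ASCENDING("ASC"), DESCENDING("DESC");

        final String keyword;

        SortOrder(String keyword) {
            this.keyword = keyword;
        }
    }

    // LinkedHashMap preserves insertion order; the order of the hints is significant.
    private final Map<String, SortOrder> hints = new LinkedHashMap<>();

    OrderHintClause add(String column, SortOrder order) {
        hints.put(column, order);
        return this;
    }

    /** Returns the ORDER(...) clause, or an empty string when no hints were added. */
    String render() {
        if (hints.isEmpty()) {
            return "";
        }
        StringJoiner joiner = new StringJoiner(", ", "ORDER(", ")");
        for (Map.Entry<String, SortOrder> hint : hints.entrySet()) {
            // Assumption: bracket-quote the identifier and escape any ']' inside it.
            String quoted = "[" + hint.getKey().replace("]", "]]") + "]";
            joiner.add(quoted + " " + hint.getValue().keyword);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String clause = new OrderHintClause()
                .add("id", SortOrder.ASCENDING)
                .add("timestamp", SortOrder.DESCENDING)
                .render();
        System.out.println(clause); // prints ORDER([id] ASC, [timestamp] DESC)
    }
}
```

An insertion-ordered map is used because the clause has to list columns in the same sequence as the clustered index key; a plain `HashMap` would not guarantee that.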

Motivation

Enabling order hints helps users optimize large data transfers when loading sorted data into SQL Server tables, improving performance in ETL and data warehousing scenarios.

Testing

  • New unit and integration tests have been added to verify order hint handling.
  • Manual testing was performed with large datasets to check correctness and measure performance (results below).

Notes

  • Only supported on SQL Server versions that accept order hints in Bulk Copy.
  • Invalid or unsupported hints will result in an exception.
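
The kind of validation the last note describes might look roughly like the sketch below. It is an assumption about the checks involved, not the PR's implementation: `OrderHintValidator` is a hypothetical class, it throws `IllegalArgumentException` as a stand-in for whatever exception type the driver raises, and it compares column names exactly (case-sensitively) for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical validator (not the driver's code) for order hint column names. */
final class OrderHintValidator {

    private final Set<String> destinationColumns;
    private final Set<String> hinted = new LinkedHashSet<>();

    OrderHintValidator(List<String> destinationColumns) {
        this.destinationColumns = new LinkedHashSet<>(destinationColumns);
    }

    /** Accepts a hint only if the column is named, exists in the destination, and is not repeated. */
    void addHint(String column) {
        if (column == null || column.trim().isEmpty()) {
            throw new IllegalArgumentException("Order hint column name must not be null or empty");
        }
        if (!destinationColumns.contains(column)) {
            throw new IllegalArgumentException("Order hint column is not a destination column: " + column);
        }
        if (!hinted.add(column)) {
            throw new IllegalArgumentException("Duplicate order hint for column: " + column);
        }
    }

    /** Columns accepted so far, in the order they were added. */
    List<String> hintedColumns() {
        return new ArrayList<>(hinted);
    }

    public static void main(String[] args) {
        OrderHintValidator validator = new OrderHintValidator(Arrays.asList("id", "timestamp", "data"));
        validator.addHint("id");
        validator.addHint("timestamp");
        try {
            validator.addHint("missing_column");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```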

Test Code

    public static void demonstrateOrderHintUsage() {
        System.out.println("\n--- Demonstration: ASC/DESC Order Hints Usage ---");
        
        String tableName = "OrderHintDemo_" + System.currentTimeMillis();
        
        try (Connection conn = DriverManager.getConnection(CONNECTION_URL)) {
            
            // Create a table with a clustered index on (id ASC, timestamp DESC)
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("IF OBJECT_ID('" + tableName + "', 'U') IS NOT NULL DROP TABLE " + tableName);
                
                String createSQL = "CREATE TABLE " + tableName + " (" +
                                 "id INT, " +
                                 "timestamp DATETIME, " +
                                 "data NVARCHAR(100), " +
                                 "INDEX CI_" + tableName + " CLUSTERED (id ASC, timestamp DESC)" +
                                 ")";
                stmt.execute(createSQL);
                System.out.println("Created table with clustered index: (id ASC, timestamp DESC)");
            }
            
            // Scenario 1: Data matches clustered index order - OPTIMAL
            System.out.println("\n--- Scenario 1: Data matches clustered index order ---");
            try (SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn)) {
                bulkCopy.setDestinationTableName(tableName);
                
                // Add column mappings
                bulkCopy.addColumnMapping("id", "id");
                bulkCopy.addColumnMapping("timestamp", "timestamp");
                bulkCopy.addColumnMapping("data", "data");
                
                // Add order hints that match the clustered index
                bulkCopy.addColumnOrderHint("id", SQLServerSortOrder.ASCENDING);      // Matches clustered index
                bulkCopy.addColumnOrderHint("timestamp", SQLServerSortOrder.DESCENDING); // Matches clustered index
                
                System.out.println(" Added order hints: id ASC, timestamp DESC (matches clustered index)");
                System.out.println(" This should provide optimal performance as data order matches index order");
                
                // Use sorted test data
                SortedTestData sortedData = new SortedTestData();
                bulkCopy.writeToServer(sortedData);
                
                System.out.println(" Bulk copy completed with matching order hints");
            }
            
            // Clear the table for next test
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("TRUNCATE TABLE " + tableName);
            }
            
            // Scenario 2: Data doesn't match clustered index order - SUBOPTIMAL
            System.out.println("\n--- Scenario 2: Data doesn't match clustered index order ---");
            try (SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn)) {
                bulkCopy.setDestinationTableName(tableName);
                
                // Add column mappings
                bulkCopy.addColumnMapping("id", "id");
                bulkCopy.addColumnMapping("timestamp", "timestamp");
                bulkCopy.addColumnMapping("data", "data");
                
                // Add order hints that DON'T match the clustered index
                bulkCopy.addColumnOrderHint("id", SQLServerSortOrder.DESCENDING);   // Opposite of clustered index
                bulkCopy.addColumnOrderHint("timestamp", SQLServerSortOrder.ASCENDING);  // Opposite of clustered index
                
                System.out.println(" Added order hints: id DESC, timestamp ASC (opposite of clustered index)");
                System.out.println(" This may cause SQL Server to perform additional sorting operations");
                
                // Use reverse sorted test data
                ReverseSortedTestData reverseSortedData = new ReverseSortedTestData();
                bulkCopy.writeToServer(reverseSortedData);
                
                System.out.println(" Bulk copy completed with non-matching order hints");
            }
            
            // Verify the rows loaded in Scenario 2 (Scenario 1 rows were truncated above)
            verifyData(conn, tableName);
            
        } catch (Exception e) {
            System.out.println("✗ Order hint demonstration failed: " + e.getMessage());
            e.printStackTrace();
        } finally {
            dropTestTable(tableName);
        }
        
    }
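
The `SortedTestData` source used in Scenario 1 is not shown above. An order hint only helps if the rows really arrive in the declared order, so here is a small sketch of pre-sorting rows to match the `(id ASC, timestamp DESC)` clustered index. The `SortedRowsSketch` class and its `Row` type are hypothetical and independent of the driver.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical example: sort in-memory rows to match a (id ASC, timestamp DESC) clustered index. */
final class SortedRowsSketch {

    static final class Row {
        final int id;
        final LocalDateTime timestamp;
        final String data;

        Row(int id, LocalDateTime timestamp, String data) {
            this.id = id;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // id ascending first, then timestamp descending within equal ids.
    static final Comparator<Row> CLUSTERED_ORDER =
            Comparator.comparingInt((Row r) -> r.id)
                    .thenComparing((Row r) -> r.timestamp, Comparator.reverseOrder());

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Row> sortForClusteredIndex(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(CLUSTERED_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        LocalDateTime base = LocalDateTime.of(2025, 7, 9, 10, 0);
        List<Row> rows = Arrays.asList(
                new Row(2, base, "c"),
                new Row(1, base, "b"),
                new Row(1, base.plusHours(1), "a"));
        for (Row r : sortForClusteredIndex(rows)) {
            System.out.println(r.id + " " + r.timestamp + " " + r.data);
        }
    }
}
```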

Performance result

FINAL 10-MINUTE PERFORMANCE RESULTS
Total iterations: 20
Rows per iteration: 25000
Total rows processed: 1500000

COMPREHENSIVE PERFORMANCE STATISTICS
Baseline (No Hints) : avg=8241.9 ms, min=8042 ms, max=9339 ms, stddev=300.9 ms, samples=20
Optimal (Correct Hints): avg=8173.9 ms, min=7998 ms, max=8746 ms, stddev=196.4 ms, samples=20

PERFORMANCE COMPARISON
Optimal is 0.8% FASTER than baseline
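
For reference, the 0.8% figure can be reproduced from the averages above. The sketch below redoes that arithmetic and also compares the gap between the two averages (about 68 ms) with the reported standard deviations. `SpeedupCheck` is a hypothetical helper written for this note and is not part of the benchmark harness.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes the reported speedup from the published averages. */
final class SpeedupCheck {

    /** Percentage by which the candidate is faster than the baseline (positive means faster). */
    static double percentFaster(double baselineMs, double candidateMs) {
        return (baselineMs - candidateMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the statistics above.
        double baselineAvg = 8241.9;
        double optimalAvg = 8173.9;
        double baselineStdDev = 300.9;
        double optimalStdDev = 196.4;

        double differenceMs = baselineAvg - optimalAvg;
        System.out.printf(Locale.ROOT, "Difference: %.1f ms (%.1f%% faster)%n",
                differenceMs, percentFaster(baselineAvg, optimalAvg));

        // A gap smaller than the run-to-run spread is hard to separate from noise.
        boolean withinNoise = differenceMs < Math.min(baselineStdDev, optimalStdDev);
        System.out.println("Smaller than either stddev: " + withinNoise);
    }
}
```

Because the gap is smaller than either standard deviation, this particular run on its own does not clearly separate the two configurations.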

codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 79.06977% with 9 lines in your changes missing coverage. Please review.

Project coverage is 51.55%. Comparing base (87f0553) to head (c6508ee).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java | 79.06% | 6 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2701      +/-   ##
============================================
+ Coverage     51.50%   51.55%   +0.05%     
- Complexity     4050     4064      +14     
============================================
  Files           149      149              
  Lines         34136    34177      +41     
  Branches       5700     5707       +7     
============================================
+ Hits          17581    17620      +39     
- Misses        14076    14086      +10     
+ Partials       2479     2471       -8     

@divang divang added this to the 13.1.1 milestone Jul 9, 2025