
[QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline #1186

Open
ryankert01 wants to merge 2 commits into apache:main from ryankert01:svhn-iqp

Conversation

@ryankert01
Member

@ryankert01 ryankert01 commented Mar 15, 2026

1. Introduction

QDP (Quantum Data Plane) is a CUDA-accelerated quantum state encoding library that
converts classical feature vectors into quantum state vectors using GPU kernels. This
report compares QDP's one-shot encoding approach against PennyLane's gate-by-gate
encoding for IQP (Instantaneous Quantum Polynomial) circuits in a variational
classification task on the SVHN dataset.

Key insight: PennyLane embeds the IQP encoding circuit (Hadamard-Phase-Hadamard
gates) inside the quantum node, so it re-executes on every forward and backward pass.
QDP encodes all samples once upfront on GPU, then feeds pre-computed state vectors via
StatePrep during training. This eliminates redundant encoding and enables zero-copy
GPU tensor sharing with lightning.gpu.

2. Method

2.1 Task

Binary classification on SVHN (digit 1 vs 7):

  • SVHN (32x32x3) -> Flatten (3072) -> binary filter -> subsample
  • StandardScaler -> PCA to n_qubits dimensions
  • IQP encoding -> variational layers (Rot + ring CNOT) -> expval(PauliZ(0))
  • Square loss, Adam optimizer, batched training
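The preprocessing steps above can be sketched in plain NumPy. This is an illustrative stand-in, not the benchmark's actual code: the scripts presumably use scikit-learn's StandardScaler and PCA, which this reproduces via standardization and an SVD projection.

```python
import numpy as np

def preprocess(X, n_qubits):
    """StandardScaler -> PCA sketch in plain NumPy (the benchmark scripts
    presumably use the scikit-learn equivalents)."""
    X = np.asarray(X, dtype=np.float64)
    # Standardize: zero mean, unit variance per feature
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # PCA via SVD: project onto the top n_qubits principal directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_qubits].T

# Toy stand-in for 200 flattened 32x32x3 SVHN images (3072 features)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3072))
Z = preprocess(X, n_qubits=6)
print(Z.shape)  # (200, 6)
```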

2.2 IQP Encoding

The IQP circuit implements: H^n * Diag(phases) * H^n |0>^n

where phases include single-qubit terms z_i = features[i] and two-qubit terms
z_{ij} = features[i] * features[j] for all pairs i < j. This creates n + n(n-1)/2
parametric gates per sample.
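The same state can be built directly as a dense vector, which is essentially what a one-shot encoder computes. The NumPy sketch below is an illustration of the H^n * Diag(phases) * H^n map; the actual QDP CUDA kernel is assumed to implement the equivalent transformation.

```python
import numpy as np
from itertools import combinations

def iqp_state(features):
    """Build H^n * Diag(phases) * H^n |0>^n as a dense state vector.

    Single-qubit terms z_i = features[i] contribute phase on basis states
    where bit i is set; pair terms z_ij = features[i] * features[j]
    contribute where bits i and j are both set.
    """
    n = len(features)
    dim = 2 ** n
    # First Hadamard layer: H^n |0> is the uniform superposition
    state = np.full(dim, 1 / np.sqrt(dim), dtype=np.complex128)
    # Diagonal phase for each computational basis state
    phase = np.zeros(dim)
    for k in range(dim):
        on = [i for i in range(n) if (k >> i) & 1]
        phase[k] = sum(features[i] for i in on)
        phase[k] += sum(features[i] * features[j] for i, j in combinations(on, 2))
    state *= np.exp(1j * phase)
    # Second Hadamard layer, applied qubit by qubit via tensor contraction
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    psi = state.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(np.tensordot(H, psi, axes=([1], [q])), 0, q)
    return psi.reshape(dim)

features = np.array([0.3, -0.7, 1.1, 0.2])
psi = iqp_state(features)
print(round(np.linalg.norm(psi), 6))  # 1.0 (unitary circuit preserves norm)
```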

2.3 Four Configurations

| Config | Encoding | Transfer | Training Backend |
|---|---|---|---|
| PL-CPU | PennyLane IQP gates (CPU, every step) | -- | default.qubit (CPU) |
| PL-GPU | PennyLane IQP gates (GPU, every step) | -- | lightning.gpu (GPU) |
| QDP-CPU | QDP CUDA kernel (one-shot) | GPU->CPU copy | default.qubit (CPU) |
| QDP-GPU | QDP CUDA kernel (one-shot) | Zero copy | lightning.gpu (GPU) |

2.4 Hardware

  • GPU: NVIDIA GeForce RTX 3090 Ti
  • PennyLane 0.44.1, PennyLane-Lightning-GPU 0.44.0
  • QDP 0.2.0 (qumat-qdp, Rust+CUDA)

3. Results

3.1 Accuracy Parity (Experiment 2)

Goal: Prove identical quantum states by showing matching accuracy across all configs.

Fixed parameters: 6 qubits, 200 samples, 200 iterations, batch size 10, 4 layers, seed 42, 1 trial.

| Config | Test Accuracy | Train Time (s) | Throughput (samples/s) |
|---|---|---|---|
| PL-CPU | 0.6750 | 126.03 | 15.9 |
| QDP-CPU | 0.6750 | 127.02 | 15.7 |
| PL-GPU | 0.6750 | 61.66 | 32.4 |
| QDP-GPU | 0.6750 | 52.74 | 37.9 |

All four configurations produce identical test accuracy (0.6750), confirming that
QDP's CUDA-encoded state vectors match PennyLane's gate-by-gate construction. The CPU
configs (PL-CPU, QDP-CPU) produce exact numerical agreement because both use
default.qubit with deterministic autograd.

QDP encoding time: 9.6 ms to encode all 200 samples (one-shot).

3.2 Qubit Scaling (Experiment 1)

Goal: Show QDP's advantage grows with qubit count.

Fixed parameters: 200 samples, 200 iterations, batch size 10, 4 layers, seed 42, 3 trials.

3.2.1 Training Time (mean of 3 trials, seconds)

| Qubits | State Dim | PL-CPU | QDP-CPU | PL-GPU | QDP-GPU |
|---|---|---|---|---|---|
| 4 | 16 | 89.4 | 87.1 | 44.8 | 36.4 |
| 6 | 64 | 134.7 | 114.4 | 65.2 | 43.2 |
| 8 | 256 | 186.5 | 166.9 | 80.7 | 59.4 |
| 10 | 1024 | 253.4 | 217.7 | 100.1 | 71.1 |

3.2.2 Speedup Ratios (PL / QDP)

| Qubits | CPU Speedup (PL-CPU / QDP-CPU) | GPU Speedup (PL-GPU / QDP-GPU) | Cross Speedup (PL-CPU / QDP-GPU) |
|---|---|---|---|
| 4 | 1.03x | 1.23x | 2.46x |
| 6 | 1.18x | 1.51x | 3.12x |
| 8 | 1.12x | 1.36x | 3.14x |
| 10 | 1.16x | 1.41x | 3.56x |

The GPU speedup (PL-GPU vs QDP-GPU) peaks at 1.51x at 6 qubits and remains
significant across all tested qubit counts. The cross speedup (PL-CPU vs QDP-GPU)
reaches 3.56x at 10 qubits, combining the benefits of one-shot encoding with GPU
training.
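The ratios in the table above follow directly from the mean training times in Section 3.2.1 and can be reproduced with a few lines of arithmetic:

```python
# Recompute the Section 3.2.2 speedup ratios from the Section 3.2.1 timings (s).
pl_cpu  = {4: 89.4, 6: 134.7, 8: 186.5, 10: 253.4}
qdp_cpu = {4: 87.1, 6: 114.4, 8: 166.9, 10: 217.7}
pl_gpu  = {4: 44.8, 6: 65.2, 8: 80.7, 10: 100.1}
qdp_gpu = {4: 36.4, 6: 43.2, 8: 59.4, 10: 71.1}

for q in (4, 6, 8, 10):
    print(q,
          round(pl_cpu[q] / qdp_cpu[q], 2),   # CPU speedup
          round(pl_gpu[q] / qdp_gpu[q], 2),   # GPU speedup
          round(pl_cpu[q] / qdp_gpu[q], 2))   # cross speedup
```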

3.2.3 QDP One-Shot Encoding Time

| Qubits | State Dim | QDP Encode (ms) | Fraction of Total |
|---|---|---|---|
| 4 | 16 | 10.0 | < 0.1% |
| 6 | 64 | 9.4 | < 0.1% |
| 8 | 256 | 10.7 | < 0.1% |
| 10 | 1024 | 12.2 | < 0.1% |

QDP's encoding time is essentially constant (~10 ms) regardless of qubit count
for this dataset size, thanks to GPU parallelism. This is negligible compared to
training time (36-253 seconds).

3.2.4 Throughput (samples/sec, mean of 3 trials)

| Qubits | PL-CPU | QDP-CPU | PL-GPU | QDP-GPU |
|---|---|---|---|---|
| 4 | 22.4 | 23.0 | 44.6 | 54.9 |
| 6 | 14.8 | 17.5 | 30.7 | 46.3 |
| 8 | 10.7 | 12.0 | 24.8 | 33.7 |
| 10 | 7.9 | 9.2 | 20.0 | 28.1 |

QDP-GPU consistently delivers the highest throughput at every qubit count.

3.3 Zero-Copy Advantage

The QDP-CPU vs QDP-GPU comparison isolates the effect of keeping encoded tensors on
GPU (zero copy) versus copying them to CPU after encoding:

| Qubits | QDP-CPU (s) | QDP-GPU (s) | Zero-Copy Speedup |
|---|---|---|---|
| 4 | 87.1 | 36.4 | 2.39x |
| 6 | 114.4 | 43.2 | 2.65x |
| 8 | 166.9 | 59.4 | 2.81x |
| 10 | 217.7 | 71.1 | 3.06x |

The zero-copy advantage grows with qubit count (2.39x to 3.06x), reflecting the
increasing benefit of GPU-native state vector operations as the Hilbert space
dimension grows.

3.4 Test Accuracy Consistency

Test accuracy per configuration at each qubit count (seed = 42):

| Qubits | PL-CPU | QDP-CPU | PL-GPU | QDP-GPU |
|---|---|---|---|---|
| 4 | 0.6417 | 0.6583 | 0.6583 | 0.6333 |
| 6 | 0.6667 | 0.6750 | 0.6750 | 0.6750 |
| 8 | 0.6750 | 0.6750 | 0.6750 | 0.6750 |
| 10 | 0.6750 | 0.6750 | 0.6750 | 0.6750 |

At 8 and 10 qubits, all configs converge to the same accuracy (0.6750). Minor variations at 4 and 6 qubits are due to different circuit structure (QDP uses StatePrep vs PennyLane's gate-by-gate IQP) and floating-point non-determinism on GPU.

4. Discussion

Why QDP is Faster

  1. One-shot encoding: QDP encodes all samples once (~10 ms) regardless of training
    iterations. PennyLane re-runs n + n(n-1)/2 parametric gates per sample per step.

  2. Simpler training circuit: QDP's circuit uses StatePrep (one operation) instead
    of the full H-D-H IQP gate sequence. This reduces per-step circuit complexity.

  3. Zero-copy GPU path: QDP-GPU keeps encoded tensors on GPU, avoiding GPU->CPU
    data transfer. Combined with lightning.gpu for training, the entire pipeline
    stays on GPU.
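The training circuit in point 2 can be reproduced in plain NumPy to make the comparison concrete: load a precomputed state, apply layers of per-qubit Rot gates (ZYZ Euler rotations, matching PennyLane's qml.Rot convention) plus a ring of CNOTs, and measure ⟨PauliZ(0)⟩. This is a minimal sketch of the equivalent math, not the benchmark's actual PennyLane/StatePrep code.

```python
import numpy as np

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def ry(a):
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=np.complex128)

def apply_1q(gate, psi, q, n):
    # Contract a 2x2 gate into qubit axis q of the state tensor
    psi = psi.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cnot(psi, ctrl, tgt, n):
    # Flip the target axis on the control=1 slice of the state tensor
    psi = psi.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[ctrl] = 1
    t = tgt if tgt < ctrl else tgt - 1  # target axis shifts after slicing
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t)
    return psi.reshape(-1)

def variational_expval(state, weights):
    """Rot (ZYZ) on each qubit + ring CNOTs per layer, applied to a
    precomputed state; returns <PauliZ(0)>."""
    n = int(np.log2(len(state)))
    psi = np.asarray(state, dtype=np.complex128)
    for layer in weights:                     # weights: (layers, n, 3)
        for q in range(n):
            phi, theta, omega = layer[q]
            psi = apply_1q(rz(omega) @ ry(theta) @ rz(phi), psi, q, n)
        for q in range(n):                    # ring of CNOTs
            psi = apply_cnot(psi, q, (q + 1) % n, n)
    probs = (np.abs(psi) ** 2).reshape([2] * n)
    return float(probs[0].sum() - probs[1].sum())

# A random normalized state stands in for a QDP-encoded sample
rng = np.random.default_rng(42)
n = 4
state = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
state /= np.linalg.norm(state)
weights = rng.normal(size=(2, n, 3))
val = variational_expval(state, weights)
print(val)  # a value in [-1, 1]
```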

Where QDP Matters Most

  • High qubit counts: The IQP circuit has O(n^2) parametric gates. At 10 qubits,
    that's 55 gates removed from every forward pass.
  • Many training iterations: The savings compound — at 200 iterations with batch
    size 10, PennyLane executes the IQP circuit ~4000 times (200 iters x 2 for
    forward/backward x 10 samples). QDP does it zero times during training.
  • GPU-native workflows: The zero-copy path (QDP-GPU) provides the largest speedup
    (up to 3.56x vs PL-CPU).
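The counts quoted in the bullets above follow from two small formulas:

```python
# Parametric IQP gates per sample: n single-qubit + n(n-1)/2 pair phases
def iqp_gate_count(n):
    return n + n * (n - 1) // 2

print(iqp_gate_count(10))  # 55

# IQP circuit executions PennyLane performs over training:
# iterations x (forward + backward) x batch size
iters, batch_size = 200, 10
print(iters * 2 * batch_size)  # 4000
```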

Limitations

  • At very low qubit counts (4 qubits), the CPU speedup is minimal (1.03x) because
    the IQP circuit is small relative to variational layers.
  • The benchmark uses default.qubit (CPU) and lightning.gpu (GPU) state vector
    simulators. Results on real quantum hardware would differ.
  • Accuracy is limited by the small dataset (200 samples) and binary classification
    task, not by the encoding method.

5. Reproduction

Prerequisites

cd qdp/qdp-python
uv sync --group benchmark

Experiment 2: Accuracy Parity

# PL-CPU
uv run --group benchmark python benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-qubits 6 --n-samples 200 --iters 200 --batch-size 10 --layers 4 \
  --lr 0.01 --optimizer adam --trials 1 --early-stop 0 --seed 42 --backend cpu

# QDP-CPU
uv run --group benchmark python benchmark/encoding_benchmarks/qdp_pipeline/svhn_iqp.py \
  --n-qubits 6 --n-samples 200 --iters 200 --batch-size 10 --layers 4 \
  --lr 0.01 --optimizer adam --trials 1 --early-stop 0 --seed 42 --backend cpu

# PL-GPU
uv run --group benchmark python benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py \
  --n-qubits 6 --n-samples 200 --iters 200 --batch-size 10 --layers 4 \
  --lr 0.01 --optimizer adam --trials 1 --early-stop 0 --seed 42 --backend gpu

# QDP-GPU
uv run --group benchmark python benchmark/encoding_benchmarks/qdp_pipeline/svhn_iqp.py \
  --n-qubits 6 --n-samples 200 --iters 200 --batch-size 10 --layers 4 \
  --lr 0.01 --optimizer adam --trials 1 --early-stop 0 --seed 42 --backend gpu

Experiment 1: Qubit Scaling

# Run all configs for a given qubit count (e.g., 10 qubits):
for script in pennylane_baseline/svhn_iqp.py qdp_pipeline/svhn_iqp.py; do
  for backend in cpu gpu; do
    uv run --group benchmark python benchmark/encoding_benchmarks/$script \
      --n-qubits 10 --n-samples 200 --iters 200 --batch-size 10 --layers 4 \
      --lr 0.01 --optimizer adam --trials 3 --early-stop 0 --seed 42 --backend $backend
  done
done

@ryankert01 ryankert01 changed the title feat: add QDP pipeline for SVHN IQP variational classifier with one-time encoding [QDP] Add SVHN IQP encoding benchmark with PennyLane baseline and QDP pipeline Mar 15, 2026
@ryankert01 ryankert01 marked this pull request as draft March 20, 2026 12:29
@ryankert01
Member Author

ryankert01 commented Mar 20, 2026

It's not production code right now; I'll fix it when I have time.

edit: it's ready now.

@ryankert01 ryankert01 marked this pull request as ready for review March 29, 2026 07:58
@ryankert01 ryankert01 force-pushed the svhn-iqp branch 2 times, most recently from 3d00ce7 to edb9e21 Compare March 29, 2026 07:59
@ryankert01 ryankert01 requested a review from rich7420 March 29, 2026 09:08
@ryankert01
Member Author

Ready for review!

Contributor

Copilot AI left a comment


Pull request overview

Adds a new SVHN (digit 1 vs 7) IQP-encoding benchmark to compare PennyLane’s gate-by-gate embedding against a QDP one-shot encoding + StatePrep training pipeline, including CPU and GPU training backends.

Changes:

  • Add SVHN IQP benchmark scripts for (1) pure PennyLane baseline and (2) QDP encoding + StatePrep pipeline.
  • Update benchmark dependency group to include newer PennyLane/Lightning GPU (py>=3.11) and SciPy (for SVHN .mat loading).
  • Update benchmark README with SVHN IQP usage instructions and refresh lockfiles.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
|---|---|
| uv.lock | Updates locked dependencies and benchmark requirement markers (incl. PennyLane/Lightning, SciPy). |
| qdp/qdp-python/uv.lock | Updates QDP Python workspace lock to include PennyLane 0.44.x + Lightning GPU deps. |
| qdp/qdp-python/pyproject.toml | Adds benchmark deps for PennyLane>=0.44 / Lightning GPU (py>=3.11) and SciPy. |
| qdp/qdp-python/benchmark/encoding_benchmarks/qdp_pipeline/svhn_iqp.py | New QDP pipeline benchmark: one-shot IQP encoding via QDP + StatePrep training (CPU/GPU). |
| qdp/qdp-python/benchmark/encoding_benchmarks/pennylane_baseline/svhn_iqp.py | New baseline benchmark: IQP circuit inside QNode (CPU/GPU). |
| qdp/qdp-python/benchmark/encoding_benchmarks/README.md | Documents how to run the new SVHN IQP benchmarks. |


@ryankert01
Member Author

fixed at f5b91a5

@ryankert01
Member Author

PTAL @guan404ming

- Changed revision from 3 to 2.
- Added conditional specifications for 'pennylane' and 'pennylane-lightning' for Python 3.11 and above.
- Included 'scipy' with a minimum version of 1.11.
