Lossy codec for local-field-potential (LFP) recordings from Neuropixels probes. Eight-stage pipeline: bad-channel detection → dephasing → highpass → bad-channel interpolation → CAR → decimation → Cadzow denoising → adaptive SVD + wavelet-packet thresholding. Achieves >100× compression with median RMSE < 25 µV.
pip install lfpackor with uv:
uv add lfpackDevelopment install:
git clone https://github.com/int-brain-lab/lfpack
cd lfpack
uv sync --extra devfrom lfpack import compress_bin_to_h5, LFPackReader
# Decimate, denoise, and compress a raw LFP binary in one call
compress_bin_to_h5('path/to/lf.cbin', 'lf.h5')
# Decode on demand — same interface as spikeglx.Reader
sr = LFPackReader('lf.h5')
traces = sr[0:1000] # (1000, n_channels) float32, volts
geometry = sr.geometry # {'x': ..., 'y': ...} channel positions| Parameter set | ε (SVD) | α (WP) | File size | CR (median) | RMSE median | RMSE p95 |
|---|---|---|---|---|---|---|
| default | 150 | 28 | 19 MB | 97× | 24 µV | 27 µV |
| aggressive | 450 | 96 | 9 MB | 305× | 49 µV | 54 µV |
Top row: original resampled → Cadzow denoised → default compressed → aggressive compressed. Bottom row: residuals (compressed − Cadzow) for default and aggressive. Colormap ±300 µV.
Same layout as above but displayed as CSD (second spatial derivative). Colormap in A/m³.
Seven sequential stages applied to raw NP1/NP2 LFP data (384 ch, 2500 Hz):
raw LFP (2500 Hz)
└─ 1. Bad-channel detection → per-channel labels (dead / noisy / outside brain)
└─ 2. Dephasing → sample-shift correction (NP1 only)
└─ 3. Highpass filter → 2 Hz zero-phase 3rd-order Butterworth
└─ 4. Bad-channel interpolation → neighbours fill dead / noisy channels
└─ 5. CAR → median common-average reference subtracted
└─ 6. Decimation → 250 Hz (FIR anti-aliasing, Q=10)
└─ 7. Cadzow denoising → spatially denoised LFP
└─ 8. SVD + WP → rank-r approximation → sparse coefficient storage
Automatic per-channel quality labels (0 = good, 1 = dead, 2 = noisy, 3 = outside brain)
are assigned via ibldsp.voltage.detect_bad_channels_cbin before any preprocessing.
Sample-shift correction for NP1 probes: each channel is phase-rotated in the frequency
domain by exp(1j × angle × sample_shift) (ibldsp.fourier.fshift) to align all
channels to a common time reference before filtering.
3rd-order zero-phase Butterworth highpass at 2 Hz (scipy.signal.sosfiltfilt) removes
slow DC drifts.
Dead and noisy channels are replaced by a distance-weighted interpolation from their
neighbours (ibldsp.voltage.interpolate_bad_channels), preventing incoherent channels
from inflating the noise-floor estimate and forcing a higher SVD rank.
Median across all good channels is subtracted sample-by-sample on the full-bandwidth
(2500 Hz) signal before decimation. The removed trace is decimated and saved alongside
the compressed file as <stem>_car.npy.
FIR anti-aliasing (scipy.signal.decimate, Q=10) applied in overlapping chunks
(512-sample filter warmup halo each side) to avoid edge transients.
Spatial denoising via the Cadzow algorithm (ibldsp.cadzow.cadzow_denoiser) run in
640-sample chunks with 64-sample halos (processed window = 768 = 3 × 256, FFT-optimal).
Parameters used: rank=5, niter=1, fmax=None, nswx=64, gap_threshold=2.0, ppca_k=2.0
Each 2048-sample chunk is extended by 128-sample guard bands on each side before SVD. Rank r is selected adaptively:
r = #{k : sv[k] > epsilon × sigma_noise}
where sigma_noise is the median of the upper half of the singular-value distribution
(restricted to sv > 1e-4 × sv[0] to exclude dead channels).
The stored quantity is U_scaled = U[:, :r] * sv[:r] (shape nc × r).
Each of the r temporal rows Vh[k, :] is decomposed with a db4 wavelet-packet tree
(level 5). Per-component hard thresholding:
tau_k = alpha × sigma_noise / sv[k]
Larger singular values (dominant spatial modes) receive a lower threshold, preserving more temporal detail.
cr_total = (nc × ns) / (r × nc + n_kept)
cr_wp = (r × ns) / n_kept # WP contribution to Vh rows only
cr_svd = (nc × ns) / (r × (nc + ns)) # SVD-only CR (no WP)
Rather than storing the full (r, n_wp_slots) float32 array and relying on gzip to
compress zero runs, Vh_hat is stored as two contiguous 1-D datasets:
| Dataset | dtype | length | description |
|---|---|---|---|
vh_indices |
int32 | n_kept | flat indices of non-zero coefficients |
vh_values |
float32 | n_kept | corresponding coefficient values |
vh_shape is stored as a group attribute so the dense array can be reconstructed on
read.
The HDF5 shuffle filter is applied to U_scaled before gzip. It reorders bytes by
significance (all MSBs together, then mantissa bytes), improving gzip compression on
dense float32 data.



