Skip to content

int-brain-lab/lfpack

Repository files navigation

lfpack — LFP codec for Neuropixels recordings

lfpack logo

Lossy codec for local-field-potential (LFP) recordings from Neuropixels probes. Eight-stage pipeline: bad-channel detection → dephasing → highpass → bad-channel interpolation → CAR → decimation → Cadzow denoising → adaptive SVD + wavelet-packet thresholding. Achieves >100× compression with median RMSE < 25 µV.


Installation

pip install lfpack

or with uv:

uv add lfpack

Development install:

git clone https://github.com/int-brain-lab/lfpack
cd lfpack
uv sync --extra dev

Getting started

from lfpack import compress_bin_to_h5, LFPackReader

# Decimate, denoise, and compress a raw LFP binary in one call
compress_bin_to_h5('path/to/lf.cbin', 'lf.h5')

# Decode on demand — same interface as spikeglx.Reader
sr = LFPackReader('lf.h5')
traces = sr[0:1000]      # (1000, n_channels) float32, volts
geometry = sr.geometry   # {'x': ..., 'y': ...} channel positions

Example — PID dab512bd (NP1, 61 min, 384 ch)

Parameter sets

Parameter set ε (SVD) α (WP) File size CR (median) RMSE median RMSE p95
default 150 28 19 MB 97× 24 µV 27 µV
aggressive 450 96 9 MB 305× 49 µV 54 µV

LFP voltage density — original · Cadzow · default · aggressive

LFP density

Top row: original resampled → Cadzow denoised → default compressed → aggressive compressed. Bottom row: residuals (compressed − Cadzow) for default and aggressive. Colormap ±300 µV.

Current-source density — original · Cadzow · default · aggressive

CSD density

Same layout as above but displayed as CSD (second spatial derivative). Colormap in A/m³.

RMSE distribution and CR vs RMSE scatter

RMSE / CR scatter


LFP compression — methods and notes

Pipeline overview

Seven sequential stages applied to raw NP1/NP2 LFP data (384 ch, 2500 Hz):

raw LFP (2500 Hz)
  └─ 1. Bad-channel detection  → per-channel labels (dead / noisy / outside brain)
  └─ 2. Dephasing              → sample-shift correction (NP1 only)
  └─ 3. Highpass filter        → 2 Hz zero-phase 3rd-order Butterworth
  └─ 4. Bad-channel interpolation → neighbours fill dead / noisy channels
  └─ 5. CAR                    → median common-average reference subtracted
  └─ 6. Decimation             → 250 Hz (FIR anti-aliasing, Q=10)
  └─ 7. Cadzow denoising       → spatially denoised LFP
  └─ 8. SVD + WP               → rank-r approximation → sparse coefficient storage

Stage 1 — Bad-channel detection

Automatic per-channel quality labels (0 = good, 1 = dead, 2 = noisy, 3 = outside brain) are assigned via ibldsp.voltage.detect_bad_channels_cbin before any preprocessing.


Stage 2 — Dephasing

Sample-shift correction for NP1 probes: each channel is phase-rotated in the frequency domain by exp(1j × angle × sample_shift) (ibldsp.fourier.fshift) to align all channels to a common time reference before filtering.


Stage 3 — Highpass filter

3rd-order zero-phase Butterworth highpass at 2 Hz (scipy.signal.sosfiltfilt) removes slow DC drifts.


Stage 4 — Bad-channel interpolation

Dead and noisy channels are replaced by a distance-weighted interpolation from their neighbours (ibldsp.voltage.interpolate_bad_channels), preventing incoherent channels from inflating the noise-floor estimate and forcing a higher SVD rank.


Stage 5 — CAR (common-average reference)

Median across all good channels is subtracted sample-by-sample on the full-bandwidth (2500 Hz) signal before decimation. The removed trace is decimated and saved alongside the compressed file as <stem>_car.npy.


Stage 6 — Decimation (2500 → 250 Hz)

FIR anti-aliasing (scipy.signal.decimate, Q=10) applied in overlapping chunks (512-sample filter warmup halo each side) to avoid edge transients.


Stage 7 — Cadzow denoising

Spatial denoising via the Cadzow algorithm (ibldsp.cadzow.cadzow_denoiser) run in 640-sample chunks with 64-sample halos (processed window = 768 = 3 × 256, FFT-optimal).

Parameters used: rank=5, niter=1, fmax=None, nswx=64, gap_threshold=2.0, ppca_k=2.0


Stage 8 — Adaptive SVD + wavelet-packet thresholding

Adaptive SVD

Each 2048-sample chunk is extended by 128-sample guard bands on each side before SVD. Rank r is selected adaptively:

r = #{k : sv[k] > epsilon × sigma_noise}

where sigma_noise is the median of the upper half of the singular-value distribution (restricted to sv > 1e-4 × sv[0] to exclude dead channels).

The stored quantity is U_scaled = U[:, :r] * sv[:r] (shape nc × r).

Wavelet-packet thresholding

Each of the r temporal rows Vh[k, :] is decomposed with a db4 wavelet-packet tree (level 5). Per-component hard thresholding:

tau_k = alpha × sigma_noise / sv[k]

Larger singular values (dominant spatial modes) receive a lower threshold, preserving more temporal detail.

Theoretical compression-ratio formula

cr_total = (nc × ns) / (r × nc + n_kept)
cr_wp    = (r × ns)  / n_kept              # WP contribution to Vh rows only
cr_svd   = (nc × ns) / (r × (nc + ns))    # SVD-only CR (no WP)

HDF5 storage optimisations

Sparse Vh_hat

Rather than storing the full (r, n_wp_slots) float32 array and relying on gzip to compress zero runs, Vh_hat is stored as two contiguous 1-D datasets:

Dataset dtype length description
vh_indices int32 n_kept flat indices of non-zero coefficients
vh_values float32 n_kept corresponding coefficient values

vh_shape is stored as a group attribute so the dense array can be reconstructed on read.

Shuffle filter on U_scaled

The HDF5 shuffle filter is applied to U_scaled before gzip. It reorders bytes by significance (all MSBs together, then mantissa bytes), improving gzip compression on dense float32 data.

About

LFP compression for Neuropixels recordings

Resources

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors