This repository implements and compares 26+ machine learning architectures for ECG classification, including deep learning, state space models, advanced transformers, neuroevolution, and probabilistic/statistical approaches:
Advanced Transformers:
- Longformer (Beltagy et al., 2020) - Efficient transformer with O(n) sliding window attention
- Mixture of Experts (MoE) (Shazeer et al., 2017) - Sparse expert routing with 8 experts
- Big Bird (Zaheer et al., 2020) - Sparse attention (global+window+random)
- Infinite Transformer - 3 variants: Memorizing, Infini, Transformer-XL
- Stacked Transformer - Deep architecture (12-24 layers) with layer scaling
State Space Models:
- MAMBA (Gu & Dao, 2023) - Selective state space model with O(n) complexity
- BAMBA - Bidirectional MAMBA for enhanced temporal context
Neuroevolution:
- HyperNEAT (Stanley et al., 2009) - CPPN-based evolutionary architecture
- Super-NEAT - Advanced NEAT with speciation and novelty search
Differential Equations:
- Neural ODE (Chen et al., 2018) - Continuous-depth networks (Euler, RK4, Dopri5)
- Neural PDE - 3 formulations: Heat, Wave, Reaction-Diffusion
Deep Learning Models:
- Feedforward Neural Network (Lloyd et al., 2001) - A feedforward neural network from scratch using NumPy
- Transformer-based Model (Ikram et al., 2025) - Transformer architecture for ECG classification
- Three-Stage Hierarchical Transformer (3stageFormer) (Tang et al., 2024) - Multi-scale hierarchical transformer
- 1D Convolutional Neural Network (CNN) - Local pattern extraction using convolution
- Long Short-Term Memory (LSTM) - Sequential modeling with recurrent connections
- Hopfield Network (ETASR, 2013) - Energy-based associative memory for pattern recognition
- Variational Autoencoder (VAE) (van de Leur et al., 2022) - Explainable ECG classification using latent factors
- Liquid Time-Constant Network (LTC) (Hasani et al., 2020) - Continuous-time neural ODE with adaptive time constants
Probabilistic and Statistical Models:
- Hidden Markov Model (HMM) - Probabilistic sequence modeling with hidden states
- Hierarchical Hidden Markov Model (Hierarchical HMM) - Multi-level temporal structure modeling
- Dynamic Bayesian Network (DBN) - Temporal dependency modeling with Bayesian networks
- Markov Decision Process (MDP) - Sequential decision-making framework for classification
- Partially Observable MDP (PO-MDP) - MDP with hidden state information
- Markov Random Field (MRF) - Spatial-temporal dependency modeling
- Granger Causality - Causal relationship analysis for time series classification
The feedforward neural network implementation is designed for ECG analysis and heart disease prediction tasks, based on research published in Circulation (2001) on detecting ischemia in electrocardiograms using artificial neural networks.
- Multi-layer perceptron: Configurable architecture
- Backpropagation: Gradient descent with mini-batch support
- Multiple Activation Functions: Sigmoid, tanh, and ReLU
- Training Features:
  - Early stopping
  - Validation monitoring
  - Training history tracking
  - Loss and accuracy visualization
Transformer-based Model
- Multi-head self-attention: Captures temporal dependencies
- Positional encoding: Preserves temporal information
- End-to-end learning: Directly from raw ECG signals
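As a rough illustration of the positional encoding mentioned above, the sketch below builds the fixed sinusoidal encoding from the original Transformer paper in NumPy. Whether `transformer_ecg.py` uses this fixed form or a learned embedding is not stated here, so the function name and the `d_model=64` setting are assumptions made for the example.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encoding; d_model is assumed to be even."""
    positions = np.arange(seq_len)[:, None]          # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # even dimension indices 0, 2, 4, ...
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even columns carry the sine terms
    pe[:, 1::2] = np.cos(angles)                     # odd columns carry the cosine terms
    return pe

# 1000 timesteps matches the raw-signal length quoted in this README.
pe = sinusoidal_positional_encoding(seq_len=1000, d_model=64)
```

The encoding is added to the embedded ECG samples so that attention, which is otherwise order-agnostic, can distinguish early from late positions in the beat.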
Three-Stage Hierarchical Transformer (3stageFormer)
- Multi-scale processing: Three stages at different temporal resolutions
- Hierarchical feature extraction: Captures both local and global patterns
- Feature fusion: Combines multi-scale representations for classification
1D Convolutional Neural Network (CNN)
- Local pattern extraction: Convolutional kernels detect morphological features
- Translation invariance: Recognizes patterns regardless of position
- Efficiency: Fast training and inference with good accuracy
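The translation property above can be checked on a toy signal. This is a minimal sketch, not the code in `cnn_lstm_ecg.py`: the hand-written kernel and the spike positions are assumptions chosen for illustration. The convolution output shifts along with the input (equivariance), and taking a global maximum over it gives a position-independent response (invariance).

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Cross-correlation without padding, the operation CNN layers call convolution."""
    return np.correlate(signal, kernel, mode="valid")

# Hypothetical spike detector: responds strongly to a narrow peak.
kernel = np.array([-1.0, 2.0, -1.0])

signal = np.zeros(50)
signal[10] = 1.0                  # a single spike at sample 10
shifted = np.roll(signal, 15)     # the same spike moved to sample 25

resp_a = conv1d_valid(signal, kernel)
resp_b = conv1d_valid(shifted, kernel)

# Global max pooling discards the position and keeps only the peak response.
pooled_a, pooled_b = resp_a.max(), resp_b.max()
```

Here the peak of `resp_b` sits 15 samples later than the peak of `resp_a`, while `pooled_a` and `pooled_b` are identical.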
Long Short-Term Memory (LSTM)
- Sequential modeling: Processes signals step-by-step with memory
- Bidirectional context: Considers both past and future information
- Gating mechanisms: Selectively remembers important information
Hopfield Network
- Associative memory: Stores and recalls patterns through energy minimization
- Noise robustness: Effective at retrieving patterns from noisy or incomplete inputs
- Pattern completion: Can reconstruct missing or corrupted signal segments
- Energy-based learning: Uses energy function to converge to stable states
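To make the energy-minimization idea concrete, here is a small classical Hopfield network with Hebbian weights and asynchronous updates. It is a sketch under stated assumptions: the two 16-element bipolar patterns are made up, and a real ECG segment would first have to be binarized, which `hopfield_ecg.py` may handle differently.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule: sum of outer products, with self-connections removed."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, state):
    """Hopfield energy; asynchronous updates never increase it."""
    return -0.5 * state @ W @ state

def recall(W, state, max_sweeps=10):
    """Update one unit at a time until a full sweep changes nothing."""
    state = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1.0 if W[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two orthogonal toy patterns (hypothetical stand-ins for binarized beats).
p1 = np.array([1.0] * 8 + [-1.0] * 8)
p2 = np.array([1.0, -1.0] * 8)
W = train_hopfield(np.stack([p1, p2]))

noisy = p1.copy()
noisy[[0, 9]] *= -1               # corrupt two samples
recalled = recall(W, noisy)
```

The corrupted input settles back onto `p1`, and `energy(W, recalled)` is lower than `energy(W, noisy)`, which is the sense in which recall is an energy descent.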
Variational Autoencoder (VAE)
- Explainable latent factors: Compresses ECG signals into 21 interpretable factors (FactorECG approach)
- Dual purpose: Can be used for both reconstruction and classification
- Generative capability: Can generate new ECG signals by sampling from latent space
- Clinical interpretability: Latent factors can be associated with physiologically meaningful processes
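Sampling from the latent space and regularizing it usually rest on two small formulas: the reparameterization trick and the KL divergence to a standard normal prior. The NumPy sketch below shows both for a 21-dimensional latent vector. The function names are illustrative assumptions, not the interface of `vae_ecg.py`.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
latent_dim = 21                              # number of factors quoted above
mu = np.zeros((4, latent_dim))               # encoder means for a batch of 4 (toy values)
log_var = np.zeros((4, latent_dim))          # encoder log-variances (toy values)

z = reparameterize(mu, log_var, rng)
kl_at_prior = kl_to_standard_normal(mu, log_var)
kl_shifted = kl_to_standard_normal(mu + 1.0, log_var)
```

When the encoder output equals the prior the KL term is zero; shifting every mean by 1 costs 0.5 per dimension, so 10.5 in total for 21 factors.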
Liquid Time-Constant Network (LTC)
- Continuous-time dynamics: Models ECG signals as continuous-time processes using neural ODEs
- Adaptive time constants: Learns time constants that adapt to input patterns
- Temporal flexibility: Captures both fast and slow temporal patterns
- Neural ODE integration: Uses differential equations for state evolution
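The adaptive time constant can be seen directly in the LTC state equation from Hasani et al., dx/dt = -(1/tau + f) * x + f * A, where f is a bounded nonlinearity of the state and input. The sketch below integrates it with a plain explicit Euler step, which is a simplification of the solver described in the paper. The parameter dictionary, layer sizes, and step size are all assumptions for illustration.

```python
import numpy as np

def ltc_euler_step(x, inp, params, dt=0.05):
    """One explicit Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    pre = params["W_rec"] @ x + params["W_in"] @ inp + params["bias"]
    f = 1.0 / (1.0 + np.exp(-pre))                    # sigmoid gate in (0, 1)
    dxdt = -(1.0 / params["tau"] + f) * x + f * params["A"]
    return x + dt * dxdt, f

rng = np.random.default_rng(0)
n_hidden, n_in = 8, 1
params = {
    "W_rec": rng.normal(scale=0.5, size=(n_hidden, n_hidden)),
    "W_in": rng.normal(scale=0.5, size=(n_hidden, n_in)),
    "bias": np.zeros(n_hidden),
    "tau": np.ones(n_hidden),                         # base time constants
    "A": rng.uniform(-1.0, 1.0, size=n_hidden),       # per-neuron target values
}

signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))   # stand-in for an ECG trace
x = np.zeros(n_hidden)
for sample in signal:
    x, f = ltc_euler_step(x, np.array([sample]), params)

# Input-dependent effective time constant: tau / (1 + tau * f).
tau_eff = params["tau"] / (1.0 + params["tau"] * f)
```

Because f depends on the current input, `tau_eff` shrinks when the gate is strongly driven and relaxes toward `tau` otherwise, which is what lets the same unit track both fast and slow components.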
Hidden Markov Model (HMM)
- Probabilistic sequence modeling: Models ECG signals as sequences of hidden states
- State transitions: Captures temporal dependencies through state transition probabilities
- Observation modeling: Maps hidden states to observable ECG features
- Efficient inference: Uses Viterbi algorithm for optimal state sequences
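The Viterbi step can be sketched in a few lines of NumPy. The two hidden states and three discretized observation symbols below are hypothetical, as are all the probabilities. The code in `hmm_ecg.py` may use a different number of states or a library implementation.

```python
import numpy as np

def viterbi(obs, start_prob, trans_prob, emit_prob):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    n_steps, n_states = len(obs), len(start_prob)
    with np.errstate(divide="ignore"):                # allow log(0) = -inf
        log_start = np.log(start_prob)
        log_trans = np.log(trans_prob)
        log_emit = np.log(emit_prob)

    log_delta = np.full((n_steps, n_states), -np.inf)
    backptr = np.zeros((n_steps, n_states), dtype=int)
    log_delta[0] = log_start + log_emit[:, obs[0]]

    for t in range(1, n_steps):
        # scores[i, j]: best log-probability of being in i at t-1 and moving to j
        scores = log_delta[t - 1][:, None] + log_trans
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]

    # Trace the best path backwards from the most likely final state.
    path = [int(log_delta[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Hypothetical model: state 0 = normal rhythm, state 1 = abnormal rhythm.
start_prob = np.array([0.6, 0.4])
trans_prob = np.array([[0.7, 0.3],
                       [0.4, 0.6]])
emit_prob = np.array([[0.5, 0.4, 0.1],    # symbol probabilities in state 0
                      [0.1, 0.3, 0.6]])   # symbol probabilities in state 1

best_path = viterbi([0, 1, 2], start_prob, trans_prob, emit_prob)
```

For this toy model the decoded path stays in state 0 for the first two symbols and switches to state 1 on the third.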
Hierarchical Hidden Markov Model (Hierarchical HMM)
- Multi-level structure: Models ECG at multiple temporal scales
- Hierarchical states: Super-states and sub-states for complex pattern recognition
- Multi-scale analysis: Captures both short-term and long-term patterns
- Enhanced modeling: More expressive than standard HMMs
Dynamic Bayesian Network (DBN)
- Temporal Bayesian networks: Extends Bayesian networks to model temporal dependencies
- Graphical model: Represents conditional dependencies between variables over time
- Uncertainty quantification: Provides probabilistic predictions with confidence estimates
- Structural learning: Can learn network structure from data
Markov Decision Process (MDP)
- Sequential decision-making: Models classification as a decision process
- State-action framework: Learns optimal actions (classifications) for each state
- Reward-based learning: Uses Q-learning to optimize classification decisions
- Policy optimization: Learns optimal classification policies
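One way to read "classification as a decision process" is as a one-step episode: the state is a discretized summary of the signal, the action is the predicted class, and the reward is +1 for a correct label and -1 otherwise. The tabular Q-learning sketch below follows that reading. The state encoding, reward values, and hyperparameters are assumptions, not a description of `mdp_ecg.py`.

```python
import numpy as np

def q_learning_classifier(states, labels, n_states, n_classes,
                          episodes=600, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning where each episode is a single classification decision."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_classes))
    for episode in range(episodes):
        idx = episode % len(states)                   # cycle through the training samples
        s = states[idx]
        if rng.random() < epsilon:
            a = int(rng.integers(n_classes))          # explore: random class
        else:
            a = int(Q[s].argmax())                    # exploit: current best class
        reward = 1.0 if a == labels[idx] else -1.0
        # The episode ends after one decision, so the usual bootstrap term
        # gamma * max_a' Q(s', a') is zero and the target is just the reward.
        Q[s, a] += alpha * (reward - Q[s, a])
    return Q

# Toy data: three discretized states, each with one correct class.
states = np.array([0, 1, 2, 0, 1, 2])
labels = np.array([2, 0, 1, 2, 0, 1])

Q = q_learning_classifier(states, labels, n_states=3, n_classes=3)
policy = Q.argmax(axis=1)                             # learned class for each state
```

The greedy policy read off the Q-table assigns each state its correct class. With multi-step episodes the discounted bootstrap term would be added back to the target.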
Partially Observable MDP (PO-MDP)
- Hidden state modeling: Handles cases where true cardiac state is not directly observable
- Belief states: Maintains probability distributions over hidden states
- Observation model: Relates each hidden state to the observations it is likely to produce
- Robust classification: Effective when state information is incomplete
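The belief state is updated with a Bayes filter: predict with the transition model, weight by the likelihood of the new observation, then renormalize. The sketch below assumes two hidden cardiac states, two observation symbols, and transition dynamics that do not depend on the chosen action. All of these are simplifications for illustration.

```python
import numpy as np

def belief_update(belief, trans_prob, obs_prob, observation):
    """Bayes filter step: b'(s') is proportional to P(o | s') * sum_s P(s' | s) * b(s)."""
    predicted = belief @ trans_prob                   # prior over the next state
    unnormalized = predicted * obs_prob[:, observation]
    return unnormalized / unnormalized.sum()

# Hypothetical model: state 0 = normal, state 1 = abnormal.
trans_prob = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
obs_prob = np.array([[0.8, 0.2],     # P(observation | state 0)
                     [0.3, 0.7]])    # P(observation | state 1)

belief = np.array([0.5, 0.5])        # start undecided
new_belief = belief_update(belief, trans_prob, obs_prob, observation=1)
```

After seeing observation symbol 1, which is more likely under the abnormal state, the belief moves from 0.5 to roughly 0.74 for that state.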
Markov Random Field (MRF)
- Spatial-temporal dependencies: Models dependencies between time points and features
- Undirected graphical model: Captures pairwise and higher-order dependencies
- Energy-based: Uses energy functions for pattern recognition
- Inference: Belief propagation for marginal probabilities
Granger Causality
- Causal analysis: Identifies causal relationships between features and time points
- Temporal causality: Determines if one time series helps predict another
- Feature selection: Uses causal relationships as features for classification
- Interpretability: Provides insights into causal mechanisms in ECG signals
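The "helps predict" test can be written as a comparison of two least-squares autoregressions: one using only the target's own lags, and one that also includes lags of the candidate source series. The sketch below computes the resulting F-statistic with NumPy only. The synthetic series, the lag order of 2, and the function names are assumptions, and `granger_ecg.py` may rely on a statistics library instead.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def granger_f_statistic(target, source, lags=2):
    """F-statistic for whether lags of `source` improve prediction of `target`."""
    y = target[lags:]
    ones = np.ones((len(y), 1))
    X_restricted = np.hstack([ones, lagged(target, lags)])
    X_full = np.hstack([X_restricted, lagged(source, lags)])

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(X_restricted), rss(X_full)
    dof = len(y) - X_full.shape[1]
    return ((rss_r - rss_f) / lags) / (rss_f / dof)

# Synthetic example: y follows x with a one-sample delay, not the other way round.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.standard_normal(499)

f_x_to_y = granger_f_statistic(y, x)    # large: past x predicts y
f_y_to_x = granger_f_statistic(x, y)    # small: past y adds nothing for x
```

A large statistic says only that one series carries predictive information about the other. It does not by itself establish a physiological cause.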
Complete deployment stack added:
- REST API - FastAPI server with auto-docs, batch predictions, health checks
- Docker - Multi-stage Dockerfile, Docker Compose, Nginx reverse proxy
- Model Export - ONNX, TorchScript, quantization for cross-platform deployment
- Comprehensive Metrics - ROC-AUC, confusion matrices, PR curves, 15+ metrics
- Documentation - 4 new comprehensive guides
Quick Links:
- New Models Guide - Complete guide to 11 new models
- Quick Start - Get started in 5 minutes
- Deployment Guide - Production deployment
- Implementation Summary - Technical details
pip install -r requirements.txt

New dependencies added:
- FastAPI, Uvicorn, Pydantic (for API)
- ONNX, ONNX Runtime (for model export)
- SciPy (for statistical tests)
from neural_network import NeuralNetwork
import numpy as np
# Prepare your data
# X should be shape (n_samples, n_features)
# y should be shape (n_samples, 1) with binary labels (0 or 1)
# Normalize features
X_train = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
# Initialize network
nn = NeuralNetwork(
    input_size=10,            # Number of input features
    hidden_layers=[16, 8],    # Two hidden layers with 16 and 8 neurons
    output_size=1,            # Binary classification
    activation='sigmoid',     # or 'tanh', 'relu'
    learning_rate=0.01
)
# Train the network
history = nn.train(
    X_train, y_train,
    X_val, y_val,
    epochs=1000,
    batch_size=32,
    early_stopping=True,
    patience=20
)
# Make predictions
predictions = nn.predict(X_test)
probabilities = nn.predict_proba(X_test)
# Evaluate accuracy
accuracy = nn.compute_accuracy(y_test, predictions)

python neural_network.py # Feedforward NN
python transformer_ecg.py # Transformer
python three_stage_former.py # 3stageFormer
python cnn_lstm_ecg.py # CNN and LSTM
python hopfield_ecg.py # Hopfield Network
python vae_ecg.py # VAE
python ltc_ecg.py # LTC
python hmm_ecg.py # HMM
python dbn_ecg.py # DBN
python mdp_ecg.py # MDP / PO-MDP
python mrf_ecg.py # MRF
python granger_ecg.py # Granger Causality

# Efficient Transformers
python longformer_ecg.py # Longformer (O(n) complexity)
python moe_transformer_ecg.py # Mixture of Experts
python bigbird_ecg.py # Big Bird (sparse attention)
# State Space Models
python mamba_ecg.py # MAMBA (fastest!)
python bamba_ecg.py # Bidirectional MAMBA
# Memory-Augmented & Deep
python infinite_transformer_ecg.py # Infinite memory (3 variants)
python stacked_transformer_ecg.py # Deep transformer (12-24 layers)
# Neuroevolution
python hyperneat_ecg.py # HyperNEAT
python superneat_ecg.py # Super-NEAT
# Differential Equations
python neural_ode_ecg.py # Neural ODE (3 solvers)
python neural_pde_ecg.py # Neural PDE (3 formulations)

To compare all 26+ models:
python benchmark.py

This will:
- Generate a synthetic ECG dataset
- Train all 26+ models (15 original + 11 new)
- Evaluate performance with comprehensive metrics
- Generate comparison plots
- Save results to benchmark_results.json
See BENCHMARK_README.md for detailed benchmarking instructions.
docker-compose up -d
# API available at http://localhost:8000
# Docs at http://localhost:8000/docs

python api_server.py

import requests
response = requests.post(
'http://localhost:8000/predict',
json={'signal': ecg_data.tolist(), 'sampling_rate': 250}
)
result = response.json()
print(f"Prediction: {result['class_name']}")
print(f"Confidence: {result['confidence']:.2%}")

from model_export import export_model_wrapper
# Export to ONNX, TorchScript, Quantized
export_model_wrapper(
model=trained_model,
model_name='my_ecg_model',
input_shape=(1, 1, 1000),
output_dir='./exports'
)

from evaluation_metrics import ComprehensiveEvaluator
evaluator = ComprehensiveEvaluator(num_classes=5)
metrics = evaluator.evaluate_model(model, test_loader)
# Generate visualizations
evaluator.plot_confusion_matrix(save_path='confusion.png')
evaluator.plot_roc_curves(save_path='roc.png')
evaluator.plot_metrics_summary(metrics, save_path='summary.png')
# Generate report
report = evaluator.generate_report(metrics, 'MyModel')
print(report)

See DEPLOYMENT_GUIDE.md for complete deployment instructions.
The default architecture follows a typical pattern for medical classification:
- Input Layer: Number of features (e.g., ECG features, patient demographics)
- Hidden Layers: Configurable (default: 16 → 8 neurons)
- Output Layer: Single neuron with sigmoid activation for binary classification
nn = NeuralNetwork(
input_size=20, # Your feature count
hidden_layers=[32, 16, 8], # Add more layers
output_size=1,
activation='relu', # Try different activations
learning_rate=0.001 # Adjust learning rate
)

- epochs: Number of training epochs (full passes over the training data)
- batch_size: Mini-batch size (None for full batch)
- early_stopping: Stop training if validation loss doesn't improve
- patience: Number of epochs to wait before early stopping
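How `early_stopping` and `patience` interact is easiest to see on a list of validation losses. The helper below is a hypothetical stand-alone sketch of the usual rule (stop once the loss has failed to improve for `patience` consecutive epochs). The exact criterion inside `NeuralNetwork.train` may differ, for example by requiring a minimum improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch                      # patience exhausted
    return len(val_losses) - 1                    # ran through every epoch

# Made-up validation curve: improves for three epochs, then stalls.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
```

With `patience=3` training halts at epoch index 5 and never sees the later improvement at index 6, which is the trade-off a larger `patience` is meant to soften.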
- Input (X): NumPy array of shape (n_samples, n_features)
  - Features should be normalized (zero mean, unit variance)
  - Example: ECG features, heart rate variability, patient demographics
- Labels (y): NumPy array of shape (n_samples, 1)
  - Binary labels: 0 (negative) or 1 (positive)
  - Example: 0 = no ischemia, 1 = ischemia detected
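A common way to meet the normalization requirement is to compute the mean and standard deviation on the training split only and reuse them for validation and test data. The sketch below does that and also guards against constant features, whose standard deviation is zero. The helper names and the synthetic arrays are assumptions for illustration.

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-8):
    """Per-feature mean and std from the training split; constant features get std 1."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)           # avoid division by zero
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any split."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
X_train[:, 3] = 1.0                               # a constant feature
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 4))

mean, std = fit_standardizer(X_train)
X_train_norm = standardize(X_train, mean, std)
X_test_norm = standardize(X_test, mean, std)      # reuses the training statistics
```

Fitting the statistics on the training split alone keeps information from the test set out of preprocessing.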
- Weight Initialization: Xavier/Glorot initialization for better convergence
- Loss Function: Binary cross-entropy
- Optimization: Gradient descent with backpropagation
- Activation: Sigmoid for output layer, configurable for hidden layers
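Two of the details above are short enough to write out. The sketch below shows the uniform variant of Xavier/Glorot initialization and a numerically clipped binary cross-entropy. Which Xavier variant (uniform or normal) `neural_network.py` uses is not stated here, so the uniform form and both function names are assumptions.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot uniform: weights drawn from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean BCE; probabilities are clipped so log(0) never occurs."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

rng = np.random.default_rng(0)
W1 = xavier_uniform(10, 16, rng)                  # input layer to first hidden layer

y_true = np.array([[1.0], [0.0]])
loss_uninformed = binary_cross_entropy(y_true, np.array([[0.5], [0.5]]))
loss_confident = binary_cross_entropy(y_true, np.array([[0.99], [0.01]]))
```

Predicting 0.5 for every sample gives a loss of ln 2 (about 0.693), a useful baseline when reading the training curves.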
This implementation is educational and demonstrates neural network fundamentals. For production use with real ECG data, consider:
- Proper feature engineering from raw ECG signals
- Data augmentation techniques
- Cross-validation for robust evaluation
- Hyperparameter tuning
- Integration with medical imaging/ECG processing libraries
| Model | Architecture | Input | Parameters | Training Speed | Best For |
|---|---|---|---|---|---|
| Feedforward NN | Feature-based MLP | Statistical features | Few (100s-1000s) | Fastest | Real-time, edge devices |
| Transformer | Single-scale Transformer | Raw signals | Many (100Ks) | Moderate | High-accuracy, research |
| Three-Stage Former | Multi-scale Hierarchical | Raw signals (3 resolutions) | Many (100Ks+) | Slowest | High-accuracy, multi-scale patterns |
| 1D CNN | Convolutional | Raw signals | Moderate (10Ks-100Ks) | Fast | Local patterns, efficiency |
| LSTM | Recurrent | Raw signals | Moderate (10Ks-100Ks) | Moderate | Sequential patterns, rhythm analysis |
| Hopfield Network | Energy-based Associative Memory | Raw signals | Moderate (10Ks-100Ks) | Moderate | Pattern completion, noise robustness |
| VAE | Variational Autoencoder | Raw signals | Moderate (10Ks-100Ks) | Moderate | Explainable AI, interpretable factors |
| LTC | Continuous-time Neural ODE | Raw signals | Moderate (10Ks-100Ks) | Moderate | Adaptive temporal dynamics, continuous-time modeling |
| HMM | Hidden Markov Model | Raw signals (discretized) | Few (1Ks) | Fast | Probabilistic sequence modeling |
| Hierarchical HMM | Multi-level HMM | Raw signals (discretized) | Few (1.5Ks) | Fast | Multi-scale temporal patterns |
| DBN | Dynamic Bayesian Network | Raw signals | Moderate (50Ks) | Moderate | Temporal dependencies, uncertainty |
| MDP | Markov Decision Process | Raw signals | Few (5Ks) | Moderate | Sequential decision-making |
| PO-MDP | Partially Observable MDP | Raw signals | Moderate (8Ks) | Moderate | Hidden state modeling |
| MRF | Markov Random Field | Raw signals | Moderate (40Ks) | Moderate | Spatial-temporal dependencies |
| Granger Causality | Causal Analysis | Raw signals | Moderate (30Ks) | Moderate | Causal relationship discovery |
| Model | Architecture | Input | Parameters | Complexity | Best For |
|---|---|---|---|---|---|
| Longformer | Sliding Window Attention | Raw signals | Moderate (500Ks) | O(n) Linear | Long sequences, efficiency |
| MoE Transformer | Mixture of 8 Experts | Raw signals | Large (1M+) | O(n²) sparse | Scalability, multi-task |
| Big Bird | Sparse Attention | Raw signals | Moderate (400Ks) | O(n) Linear | Memory efficiency |
| MAMBA | Selective SSM | Raw signals | Moderate (300Ks) | O(n) Linear | Speed, efficiency |
| BAMBA | Bidirectional SSM | Raw signals | Moderate (600Ks) | O(n) Linear | Context modeling |
| Infinite Transformer | Memory-Augmented (3 variants) | Raw signals | Moderate (500Ks) | O(n) | Infinite context |
| Stacked Transformer | Deep (12-24 layers) | Raw signals | Large (2M+) | O(n²) | Maximum accuracy |
| HyperNEAT | CPPN Evolution | Statistical features | Variable | Variable | Architecture search |
| Super-NEAT | Advanced Evolution | Statistical features | Variable | Variable | Topology optimization |
| Neural ODE | Continuous-Depth | Raw signals | Moderate (400Ks) | O(n) | Continuous-time |
| Neural PDE | PDE-based (3 types) | Raw signals | Moderate (300Ks) | O(n) | Physical modeling |
All eight models share common deep learning foundations:
- End-to-end learning: All except FFNN process raw ECG signals directly
- Multi-layer architectures: All use multiple layers of non-linear transformations
- Gradient-based optimization: All trained with backpropagation
- Regularization: All employ dropout or similar techniques
- Classification capability: All can perform multi-class ECG classification
- FFNN: No temporal modeling (feature-based)
- Transformer: Global attention across entire sequence
- 3stageFormer: Multi-scale attention at three resolutions
- CNN: Local convolutional filters with translation invariance
- LSTM: Sequential processing with explicit memory gates
- Hopfield: Energy-based associative memory
- VAE: Latent factor representation with reconstruction
- LTC: Continuous-time dynamics with adaptive time constants (neural ODE)
- FFNN: Requires hand-crafted statistical features (mean, std, FFT, etc.)
- All Others: Process raw ECG signals directly (1000 timesteps)
- FFNN: Manual feature extraction required
- All Others: Automatic feature learning from raw signals
- Single-scale: FFNN, Transformer, CNN, LSTM, Hopfield, VAE, LTC
- Multi-scale: Only 3stageFormer processes at multiple temporal resolutions
- Discriminative: FFNN, Transformer, 3stageFormer, CNN, LSTM, Hopfield
- Generative: VAE (can reconstruct and generate signals)
- 3stageFormer: Highest accuracy (multi-scale hierarchical processing)
- Transformer: Excellent accuracy (global attention)
- LTC, CNN, LSTM, VAE, Hopfield: Competitive accuracy with different strengths
- FFNN: Good accuracy (limited by feature engineering)
- FFNN: Fastest training and inference
- CNN: Fast with good accuracy-efficiency balance
- LSTM, Hopfield, VAE, LTC: Moderate speed
- Transformer: Moderate speed, higher accuracy
- 3stageFormer: Slowest but highest accuracy
| Model | Key Strengths | Key Weaknesses |
|---|---|---|
| FFNN | Fastest, simplest, low memory | Requires features, no temporal modeling |
| Transformer | High accuracy, global attention | Many parameters, slower training |
| 3stageFormer | Best accuracy, multi-scale | Most parameters, slowest |
| CNN | Good balance, local patterns | Limited long-range dependencies |
| LSTM | Sequential modeling, interpretable | Sequential processing, moderate speed |
| Hopfield | Noise robust, pattern completion | Limited capacity, iterative updates |
| VAE | Explainable, generative, dual purpose | Blurry reconstructions, training complexity |
| LTC | Adaptive time constants, continuous-time dynamics | ODE solver overhead, moderate complexity |
- Choose FFNN if: Real-time constraints, edge devices, well-understood features
- Choose Transformer if: High accuracy needed, single-scale patterns sufficient, research setting
- Choose 3stageFormer if: Highest accuracy needed, multi-scale patterns, abundant resources
- Choose CNN if: Balance of accuracy and efficiency, local morphological features important
- Choose LSTM if: Sequential patterns critical, rhythm analysis, interpretable processing
- Choose Hopfield if: Noisy data, pattern completion needed, associative memory beneficial
- Choose VAE if: Explainability required, clinical interpretability, generative capabilities needed
- Choose LTC if: Continuous-time modeling needed, adaptive temporal dynamics, neural ODE benefits
- Accuracy vs. Speed: Higher accuracy models (3stageFormer, Transformer) are slower
- Complexity vs. Simplicity: More powerful models are more complex to implement and train
- Feature Engineering vs. End-to-end: FFNN requires features, others learn automatically
- Single-scale vs. Multi-scale: 3stageFormer is the only model that processes multiple temporal resolutions
- Discriminative vs. Generative: VAE is the only model with generative capabilities
- Explainability vs. Performance: VAE offers highest explainability, 3stageFormer offers best performance
- Noise Robustness: Hopfield excels, others rely on learned representations
- Lloyd, M. D., et al. (2001). "Detection of Ischemia in the Electrocardiogram Using Artificial Neural Networks." Circulation, 103(22), 2711-2716.
- Ikram, Sunnia, et al. (2025). "Transformer-based ECG classification for early detection of cardiac arrhythmias." Frontiers in Medicine, 12, 1600855.
- Tang, Xiaoya, et al. (2024). "Hierarchical Transformer for Electrocardiogram Diagnosis." arXiv preprint arXiv:2411.00755.
- "Electrocardiogram (ECG) Signal Modeling and Noise Reduction Using Hopfield Neural Networks." Engineering, Technology & Applied Science Research (ETASR), Vol. 3, No. 1, 2013.
- van de Leur, Rutger R., et al. (2022). "Improving explainability of deep neural network-based electrocardiogram interpretation using variational auto-encoders." European Heart Journal - Digital Health, 3(3). DOI: 10.1093/ehjdh/ztac038.
- Hasani, Ramin, et al. (2020). "Liquid Time-Constant Networks." arXiv preprint arXiv:2006.04439.
- 11 State-of-the-Art Models
  - Efficient Transformers: Longformer, MoE, Big Bird
  - State Space Models: MAMBA, BAMBA
  - Deep Architecture: Stacked Transformer (12-24 layers)
  - Memory-Augmented: Infinite Transformer (3 variants)
  - Neuroevolution: HyperNEAT, Super-NEAT
  - Differential Equations: Neural ODE, Neural PDE
- Production Infrastructure
  - FastAPI REST server with auto-documentation
  - Docker deployment (Dockerfile + docker-compose)
  - Nginx reverse proxy with load balancing
  - Model export (ONNX, TorchScript, Quantization)
- Comprehensive Evaluation
  - 15+ metrics (ROC-AUC, confusion matrix, PR curves, etc.)
  - Statistical significance tests
  - Computational profiling
  - Beautiful visualizations
- Complete Documentation
  - NEW_MODELS_README.md - All 11 new models detailed
  - QUICK_START.md - Get started in 5 minutes
  - DEPLOYMENT_GUIDE.md - Production deployment
  - IMPLEMENTATION_SUMMARY.md - Technical details
- New Models Guide - Comprehensive model documentation
- Quick Start - 5-minute setup and usage
- Deployment Guide - Docker, K8s, Cloud deployment
- Implementation Summary - Technical deep dive
- Test Report - Validation results
- Total Models: 26+
- Model Categories: 6 (Transformers, SSMs, Evolution, ODEs/PDEs, Probabilistic, Deep Learning)
- Lines of Code: ~20,000+
- Documentation Pages: 18
- API Endpoints: 7
- Export Formats: 3
- Deployment Options: 4+
This implementation is provided for educational and research purposes.