Skip to content

uwmisl/Amino-Scribe

Repository files navigation

AminoScribe

AminoScribe is a Python module for generating simulated nanopore squiggle signals from amino acid sequences. It provides tools for sequence-based signal generation, time warping, noise addition, and signal processing such as filtering, normalization, and downsampling.

Features

  • Generate idealized templates for amino acid sequences.
  • Add time-domain warping and amplitude noise to simulate realistic signals.
  • Apply low-pass Bessel filtering to reduce noise.
  • Normalize signals using min-max scaling.
  • Downsample signals for efficient processing.
  • Fetch protein sequences using UniProt accession numbers.

Installation

Install AminoScribe using pip:

pip install aminoscribe

Usage

Generate a Squiggle Signal

You can generate a simulated squiggle signal from an amino acid sequence or a protein ID:

from aminoscribe.aminoscribe import generate_squiggle

# Generate a squiggle signal from a sequence
signal = generate_squiggle(sequence="MKTLLDLGYTMKTLLLTLVVTMKTLLDLGYTMKTLLLTLVVLLTLVVVTIVCLDLGYTLGYT", 
                           normalize=True, 
                           downsample=True, 
                           downsample_factor=5)

# Generate a squiggle signal from a protein ID
signal = generate_squiggle(protein_id="P12345", 
                           filter_noise=True, 
                           bessel_N=8, 
                           bessel_Wn=0.1)

Generate an Idealized Template

If you only need the idealized template without noise or processing:

sequence = "YYYYYSTSSDGDEEDGDDSTSYYYYYSTSSDGEDDEGDDSTSYYYYYSTSSDGEDEDGDDSTSYYYYYSTSSDGD"
template = generate_squiggle(sequence=sequence, template_only=True)

Fetch Protein Sequence

Retrieve a protein sequence using its UniProt accession number:

from aminoscribe.aminoscribe import get_protein_seq

sequence = get_protein_seq("E2RYF6")

Function Reference

generate_squiggle

Generates a simulated squiggle signal from an amino acid sequence or protein ID.

Parameters:

  • sequence (str, optional): Amino acid sequence.
  • protein_id (str, optional): Protein ID to fetch the sequence.
  • base_template (optional): Base template signal.
  • seed (optional): Random seed for reproducibility.
  • template_only (bool, optional): Return idealized template only.
  • cterm (str, optional): Sequence to append to the C-terminal end.
  • nterm (str, optional): Sequence to prepend to the N-terminal end.
  • filter_noise (bool, optional): Apply low-pass Bessel filter.
  • bessel_N (int, optional): Order of the Bessel filter.
  • bessel_Wn (float, optional): Normalized cutoff frequency.
  • normalize (bool, optional): Apply min-max normalization.
  • norm_cutoff (int, optional): Number of elements for normalization.
  • downsample (bool, optional): Apply linear downsampling.
  • downsample_factor (float, optional): Downsampling factor.

Returns:

  • List of float values representing the processed squiggle signal.

get_protein_seq

Fetches a protein sequence using its UniProt accession number.

Parameters:

  • protein_id (str): UniProt accession number.

Returns:

  • Amino acid sequence as a string.

License

This project is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See the LICENSE file for details.

Contributing

Source code lives at https://github.com/uwmisl/Amino-Scribe. Please submit a pull request or open an issue for any bugs or feature requests.

Contributors

Contributors

  • Melissa Queen — Lead author and maintainer
  • Daphne Kontogiorgos-Heintz — Computed and incorporated the amino acid value and variance numbers

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors