Atlas Voice

Always-listening voice dictation for Linux. Say "Hey Atlas" to start dictating — your speech is transcribed and typed into the active window. Say "break" to end the session. Both the end session trigger word or phrase and the "wake" word are configurable or trainable.

atlas-voice-demo.mp4

Features

Wake word activation — custom-trained "Hey Atlas" model, hands-free triggering. Train your own custom word or phrase.
GPU-accelerated transcription — Whisper large-v3 via faster-whisper on CUDA
Continuous dictation — keeps recording after each transcription without re-triggering the wake word; detects overlapping speech and seamlessly chains utterances
Spoken punctuation — say "period", "comma", "new paragraph", "open paren", etc. (45+ rules, all configurable)
Word replacements — auto-correct common Whisper mishearings (e.g. "cloud" → "Claude")
Console & GUI typing modes — newlines as \n characters (for terminals and CLI tools) or as Enter keypresses (for text editors/LibreOffice); switchable by voice command
Terminal safety — detects when a terminal emulator is focused and copies to clipboard instead of typing, preventing accidental command execution
System tray control — pause/resume listening, enable/disable (GPU vram management), or quit
Systemd user service — starts on login, restarts on failure, manageable via systemctl --user
External configuration — all settings in a single settings.conf file (INI format), no source editing required

Requirements

Linux (tested on Ubuntu 24.04 / Linux Mint 22.x)
NVIDIA GPU with CUDA support (4GB+ VRAM, 8GB+ recommended)
NVIDIA driver installed (CUDA runtime libraries are bundled)
Python 3.12
PulseAudio
X11 desktop environment with system tray support
~6GB disk space (2.5GB venv + 3GB Whisper model)

Installation

From .deb package (recommended)

Download the latest .deb from Releases and install:

sudo apt install ./atlas-voice_2.1.1.deb

The installer will:

Set up a Python virtual environment with all dependencies (including CUDA runtime)
Download the Whisper large-v3 model (~3GB) from huggingface.co
Enable and start a systemd user service

The service starts automatically on login. To manage it:

systemctl --user status atlas-voice
systemctl --user restart atlas-voice
systemctl --user stop atlas-voice
journalctl --user -u atlas-voice -f   # live logs

From source

git clone https://github.com/briankelley/atlas-voice.git
cd atlas-voice

# Create venv with system site-packages (required for GTK/gi bindings)
python3 -m venv venv --system-site-packages
source venv/bin/activate

# Install dependencies
pip install numpy sounddevice faster-whisper openwakeword
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

# Run (Whisper model auto-downloads on first run)
python main.py

Usage

Action	How
Start dictating	Say "Hey Atlas"
End session	Say "break" (types remaining text, presses Enter)
Pause / Resume	Tray menu → Pause / Resume
Unload GPU	Tray menu → Enable / Disable (GPU)
Quit	Tray menu → Quit Atlas
Switch typing mode	Say "switch to console" or "switch to gui"

Continuous Dictation

After your first utterance is transcribed, Atlas stays in recording mode — just keep talking. Each pause triggers a transcription and the result is typed out, then Atlas immediately listens for more. Say "break" when you're done to end the session and press Enter. (Initially developed to use inside console sessions where "enter" was an expected key press.)

Typing Modes

Mode	Newlines sent as	Best for
`console` (default)	`\n` character	Terminals, CLI tools, chat apps
`gui`	Enter keypress	LibreOffice, text editors, form fields

Switch modes on the fly by saying "switch to console" or "switch to gui", or set the default in settings.conf.

Spoken Punctuation

All rules are configurable in settings.conf under [spoken_punctuation]. Defaults include:

Say	Types	Say	Types
"period"	`.`	"open paren"	`(`
"comma"	`,`	"close paren"	`)`
"question mark"	`?`	"open bracket"	`[`
"exclamation point"	`!`	"close bracket"	`]`
"colon"	`:`	"open brace"	`{`
"semicolon"	`;`	"close brace"	`}`
"dash"	`-`	"open quote"	`"`
"new line"	newline	"close quote"	`"`
"new paragraph"	double newline	"apostrophe"	`'`
"ellipsis"	`...`	"ampersand"	`&`
"underscore"	`_`	"asterisk"	`*`
"at sign"	`@`	"hashtag"	`#`
"forward slash"	`/`	"backslash"	`\`
"equals"	`=`	"plus"	`+`
"dollar sign"	`$`

Configuration

All settings live in settings.conf in the installation directory. Edit and restart the service to apply:

# Edit (installed location)
nano /usr/local/lib/atlas-voice/settings.conf

# Restart to pick up changes
systemctl --user restart atlas-voice

Wake Word Detection

[wake_word]
# Detection confidence (0.0–1.0). Lower = more sensitive, more false positives.
threshold = 0.35

Audio Capture

[audio]
silence_threshold = 500     # Amplitude below which audio is "silence"
silence_duration = 2.0      # Seconds of silence before ending capture
max_record_duration = 60    # Hard cap per recording chunk (seconds)
buffer_seconds = 120        # Ring buffer history (seconds of audio kept in memory)

Transcription

[whisper]
device = cuda               # "cuda" or "cpu"
compute_type = float16      # "float16", "int8", or "float32"

Behavior

[behavior]
auto_type = true            # Type transcribed text into the active window
beep_on_wake = true         # Play a sound when wake word detected
debug_mode = false          # Verbose logging (audio health, wake scores, state transitions)
log_transcripts = false     # Log all dictated text to stdout/journald (privacy-sensitive!)
tray_enabled = true         # Show system tray icon
typing_mode = console       # "console" or "gui" (see Typing Modes above)
switch_to_console_phrase = switch to console
switch_to_gui_phrase = switch to gui

Session Control

[session]
end_phrase = break          # Say this word to end session and press Enter

Word Replacements

Correct common Whisper mishearings. Case-sensitive — add variants as needed:

[word_replacements]
cloud = Claude
clawed = Claude
pseudo = sudo
no help = nohup
Brake = break

Spoken Punctuation

Add custom phrase-to-symbol mappings. Multi-word phrases are matched first:

[spoken_punctuation]
new paragraph = \n\n
exclamation point = !
period = .

Architecture

                         ┌──────────────┐
                         │   main.py    │
                         │  (GTK loop)  │
                         └──────┬───────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
       ┌──────▼──────┐  ┌──────▼──────┐  ┌───────▼───────┐
       │   TrayIcon  │  │   Mailbox   │  │ State Worker  │
       │ (GTK thread)│  │(thread-safe)│  │   (thread)    │
       └─────────────┘  └─────────────┘  └───────┬───────┘
                                                  │
                    ┌─────────────────────────────┤
                    │         State Machine       │
                    │                             │
              ┌─────▼─────┐                ┌─────▼─────┐
              │ disabled  │◄───────────────│  paused   │
              └─────┬─────┘                └─────▲─────┘
                    │ (load models)               │
              ┌─────▼─────┐                       │
              │ listening │───────────────────────┘
              │(wake word)│
              └─────┬─────┘
                    │ (wake detected)
              ┌─────▼─────┐
              │ recording │◄─────────────┐
              │ (capture) │              │
              └─────┬─────┘              │
                    │ (silence)          │ (continuous
              ┌─────▼──────┐             │  dictation)
              │transcribing│─────────────┘
              │ (Whisper)  │
              └────────────┘
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
    ┌─────────┐ ┌───────┐ ┌────────┐
    │xdotool  │ │xclip  │ │paplay  │
    │(typing) │ │(clip) │ │(beep)  │
    └─────────┘ └───────┘ └────────┘

Module Map

Module	Responsibility
`main.py`	Entry point, signal handling, GTK main loop, state dispatch
`config.py`	Load and parse `settings.conf` with typed defaults
`context.py`	Shared state container, model load/unload, GPU memory cleanup
`mailbox.py`	Thread-safe request passing between GTK and worker threads
`audio_buffer.py`	Continuous audio capture, ring buffer, chunk queue
`tray.py`	System tray icon — renders icons, posts user actions to mailbox
`logging_utils.py`	Timestamped debug/info/error logging
`text_processing.py`	Spoken punctuation and word replacement pipeline
`text_output.py`	xdotool typing, xclip clipboard, terminal detection
`state_disabled.py`	Models unloaded, GPU free — waits for enable
`state_paused.py`	Models loaded, not listening — waits for resume
`state_listening.py`	Wake word detection loop with audio health checks
`state_recording.py`	Audio capture with silence detection and VAD mode
`state_transcribing.py`	Whisper inference, text output, continuous dictation

Training Your Own Wake Word

Want to use a different wake word? See atlas-voice-training for the dockerized training pipeline.

License

CC BY-NC 4.0

Acknowledgments

OpenWakeWord — wake word detection engine
faster-whisper — CTranslate2-based Whisper inference
Whisper — OpenAI's speech recognition model

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Atlas Voice

Features

Requirements

Installation

From .deb package (recommended)

From source

Usage

Continuous Dictation

Typing Modes

Spoken Punctuation

Configuration

Wake Word Detection

Audio Capture

Transcription

Behavior

Session Control

Word Replacements

Spoken Punctuation

Architecture

Module Map

Training Your Own Wake Word

License

Acknowledgments

About

Uh oh!

Releases 6

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
icons		icons
models/openwakeword		models/openwakeword
packaging		packaging
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SPEC.md		SPEC.md
audio_buffer.py		audio_buffer.py
config.py		config.py
context.py		context.py
logging_utils.py		logging_utils.py
mailbox.py		mailbox.py
main.py		main.py
pytest.ini		pytest.ini
settings.conf		settings.conf
state_disabled.py		state_disabled.py
state_listening.py		state_listening.py
state_paused.py		state_paused.py
state_recording.py		state_recording.py
state_transcribing.py		state_transcribing.py
test_audio_buffer.py		test_audio_buffer.py
test_integration.py		test_integration.py
text_output.py		text_output.py
text_processing.py		text_processing.py
tray.py		tray.py

Folders and files

Latest commit

History

Repository files navigation

Atlas Voice

Features

Requirements

Installation

From .deb package (recommended)

From source

Usage

Continuous Dictation

Typing Modes

Spoken Punctuation

Configuration

Wake Word Detection

Audio Capture

Transcription

Behavior

Session Control

Word Replacements

Spoken Punctuation

Architecture

Module Map

Training Your Own Wake Word

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages