Hush

Speak into any input field on your Mac: instantly, invisibly, everywhere. Even in your terminal.

Overview

Hush is a system-wide invisible dictation layer for macOS. Press a hotkey, speak, press it again, and your words appear in whatever field you were typing in, whether that's a browser prompt box, a native app, a terminal, or a design tool. No mode switching, no app-specific plugins, no UI to manage.

Built to solve the irony of the AI era: people type more natural language than ever into prompt fields and comment boxes, yet typing is still the bottleneck. Hush removes it.

Tech

Python · pynput · sounddevice · faster-whisper · pyautogui. A lightweight daemon that runs at login. No server and no network calls: transcription runs locally via faster-whisper, and text is injected via a simulated Cmd+V.

Solution

Toggle Activation

Press Ctrl+H once to start recording and once more to stop. No holding is required, so it works equally well for a quick phrase or a long prompt.
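The toggle behaviour amounts to a two-state switch. The sketch below is a minimal illustration, not Hush's actual code: the ToggleRecorder class and its on_start/on_stop callbacks are hypothetical names chosen for this example.

```python
from typing import Callable


class ToggleRecorder:
    """Two-state switch: each hotkey press flips between idle and recording."""

    def __init__(self, on_start: Callable[[], None], on_stop: Callable[[], None]) -> None:
        self.recording = False
        self._on_start = on_start
        self._on_stop = on_stop

    def on_hotkey(self) -> bool:
        """Handle one hotkey press and return the new recording state."""
        self.recording = not self.recording
        if self.recording:
            self._on_start()
        else:
            self._on_stop()
        return self.recording


events = []
recorder = ToggleRecorder(lambda: events.append("start"), lambda: events.append("stop"))
recorder.on_hotkey()  # first press: start recording
recorder.on_hotkey()  # second press: stop and hand off to transcription
```

Keeping the state in one place means the hotkey handler never needs to know how long the key was held, which is what makes long prompts as easy as short ones.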

Universal Injection

Text is pasted via a simulated Cmd+V, so it works in any macOS app without per-app integration: browser, native, Electron, or terminal.
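One way to sketch the paste step is to put the transcript on the clipboard and then send Cmd+V. This is an assumption-laden illustration: the source only states that Cmd+V is simulated, so the use of macOS's pbcopy for the clipboard, and the function names inject_text, copy_with_pbcopy, and press_cmd_v, are choices made for this example.

```python
import subprocess
from typing import Callable


def copy_with_pbcopy(text: str) -> None:
    """Put text on the macOS clipboard using the built-in pbcopy tool (assumed here)."""
    subprocess.run(["pbcopy"], input=text.encode("utf-8"), check=True)


def press_cmd_v() -> None:
    """Simulate Cmd+V; pyautogui is imported lazily so this module loads without it."""
    import pyautogui

    pyautogui.hotkey("command", "v")


def inject_text(
    text: str,
    copy: Callable[[str], None] = copy_with_pbcopy,
    paste: Callable[[], None] = press_cmd_v,
) -> bool:
    """Paste text into whatever field has focus. Returns False if there is nothing to paste."""
    text = text.strip()
    if not text:
        return False  # nothing was transcribed, so leave the focused field alone
    copy(text)
    paste()
    return True
```

Calling inject_text("hello world") with the defaults would copy and paste on a Mac; passing fake copy and paste callables makes the flow checkable without touching the clipboard. Note that this sketch overwrites whatever was on the clipboard.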

Zero Footprint

No stored transcripts, no accumulated recordings. Audio lives in memory only and is deleted immediately after paste.
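A memory-only recording path might look like the sketch below. It is illustrative rather than Hush's real implementation: MemoryAudioBuffer, take, and pcm16_to_float32 are hypothetical names, and 16-bit PCM input is an assumption.

```python
class MemoryAudioBuffer:
    """Accumulates raw 16-bit PCM chunks in RAM; nothing is written to disk."""

    def __init__(self) -> None:
        self._pcm = bytearray()

    def callback(self, indata, frames, time, status) -> None:
        """Matches the sounddevice RawInputStream callback signature."""
        self._pcm.extend(bytes(indata))

    def take(self) -> bytes:
        """Return everything recorded so far and wipe the buffer."""
        data = bytes(self._pcm)
        self._pcm.clear()
        return data


def pcm16_to_float32(pcm: bytes):
    """Convert int16 PCM bytes to a float32 array in [-1, 1) for transcription."""
    import numpy as np  # imported lazily; only needed at transcription time

    return np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
```

In a daemon, buffer.callback could be passed to sounddevice.RawInputStream(samplerate=16000, channels=1, dtype="int16", callback=buffer.callback), and the converted array handed to faster-whisper's WhisperModel.transcribe, which accepts a NumPy array as well as a file path. The 16 kHz mono settings are an assumption here.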

The Problem

Modern workflows are increasingly prompt-driven. People type full natural-language sentences into Claude, ChatGPT, Cursor, Notion AI, and dozens of other tools every day. The irony is that natural language is exactly what speech-to-text is best at, but most dictation tools are app-specific, require switching context, or are buried in system menus.

There was no invisible, universal, local-first layer that just worked everywhere without the user thinking about it.

Key Decisions

No focus detection

The naive approach would be to detect the active field and route the transcript there. Instead, I simulate Cmd+V after transcription, because the OS already knows where focus is. This cut complexity in half and made the tool universally compatible without any per-app logic.

Toggle over hold-to-talk

Hold-to-talk is awkward when composing long prompts, the primary use case. Toggle gives full control without physical strain and removes the risk of accidental cutoffs mid-sentence.

suppress=True on the hotkey listener

Without this, macOS forwards the keypress to the active app on every repeat cycle, causing a system click sound. suppress=True intercepts and consumes the event before it propagates, a one-line fix that eliminates the problem entirely.
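As a rough sketch of where that flag goes, the snippet below registers Ctrl+H through pynput's GlobalHotKeys helper. The source does not say which listener class Hush uses, so GlobalHotKeys and the start_hotkey_listener wrapper are assumptions made for illustration.

```python
from typing import Callable


def start_hotkey_listener(on_toggle: Callable[[], None], hotkey: str = "<ctrl>+h"):
    """Start a pynput global hotkey listener with event suppression enabled."""
    # Lazy import: needs pynput installed and macOS input-monitoring permission.
    from pynput import keyboard

    # GlobalHotKeys forwards extra keyword arguments to keyboard.Listener,
    # so suppress=True reaches the underlying listener.
    listener = keyboard.GlobalHotKeys({hotkey: on_toggle}, suppress=True)
    listener.start()  # the listener runs in its own thread
    return listener
```

pynput documents suppress as preventing input events from being passed to the rest of the system, so it is worth confirming that ordinary typing still behaves as expected in your own setup.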

What Broke

Repeated system click sound

Root cause: pynput listens passively by default and doesn't block events, so macOS kept forwarding the keypress to the active app. Fix: suppress=True on the keyboard listener.

Accidental cutoffs mid-sentence

Hold-to-talk caused accidental cutoffs and felt unnatural for longer inputs. Switching to toggle mode resolved both and aligned better with the real use case of composing multi-sentence prompts.

What I Learned

"OS-level simplicity beats app-level cleverness"

I assumed I'd need to detect the active field and route text intelligently. The realization that simulating Cmd+V is sufficient, because the OS already knows where focus is, cut the complexity in half and made the tool more robust.

"Invisible tools live or die on latency"

A 3-second transcription delay breaks the illusion entirely. Choosing faster-whisper and targeting sub-2-second turnaround was not a performance optimization; it was a UX requirement.
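A latency target like this is easier to hold if each run is measured against it. The helper below is a small sketch of that idea; timed, within_budget, and LATENCY_BUDGET_S are hypothetical names, and what counts as "turnaround" (assumed here to be stop-to-paste) is an interpretation.

```python
import time
from typing import Any, Callable, Tuple

LATENCY_BUDGET_S = 2.0  # the sub-2-second target, treated as a hard budget


def timed(step: Callable[[], Any]) -> Tuple[Any, float]:
    """Run step() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = step()
    return result, time.perf_counter() - start


def within_budget(elapsed_s: float, budget_s: float = LATENCY_BUDGET_S) -> bool:
    """True when the measured turnaround meets the target."""
    return elapsed_s < budget_s
```

Wrapping the transcribe-and-paste step in timed(...) and logging whenever within_budget returns False would show how often real dictations miss the target.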

"Good UX is a higher bar than good UI"

With no interface to hide behind, every micro-moment matters: the audio cue, the paste timing, the cleanup of temp files. Designing an invisible tool forced a more rigorous standard of quality than any visual UI would have.