Every detail, tuned for real meetings.
Sync Speak is not a wrapper around a generic translation API. Every stage of the pipeline is engineered around the specific failure modes of voice translation in corporate meetings — mic contention, first-syllable clipping, pronoun drift, self-talk loops, and perceived latency.
Neural Voice Activity Detection
WebRTC Neural VAD separates speech from silence in 10 ms frames. A 500 ms pre-buffer (5-frame ring) means the first syllable of short Hindi words like "kya", "toh", "haan" is never clipped.
Hindi + Hinglish STT
Sarvam Saarika v2.5 handles code-mixed speech natively — "deployment ho gaya" is transcribed correctly, not normalised into pure Hindi. A correction table catches common phonetic mis-mappings before they reach the LLM.
5-utterance rolling context
Groq Llama 3.3 70B sees your last five sentences when translating the current one. Pronouns resolve correctly ("use bhej do" → "send it to him"), and running topics maintain continuity across a meeting.
Sentence-pipelined TTS
Sarvam Bulbul v3 synthesises English audio one sentence at a time. Playback of sentence N starts while sentence N+1 is still being generated, so end-to-end latency stays under 1.2 seconds even for long replies.
Virtual-cable routing
VB-Cable's virtual microphone takes the translated audio and feeds it into any application that accepts a mic: Google Meet, Zoom, Teams, Discord, Slack huddles, Whereby, Gather. Zero plugin required in the meeting app.
Self-talk prevention
A _tts_active flag blocks the microphone while Sync Speak is speaking English, so the translated output is never re-captured and re-translated back to Hindi. VAD state resets cleanly between utterances.
Device resilience
If your selected mic is held by Teams or Zoom, Sync Speak falls back to the system default mic. WASAPI is re-initialised before every stream open, eliminating the classic -9985 stale device error on Windows.
Liquid Glass transparency
A native Tauri v2 shell with genuine desktop transparency — the app is a frosted lens over your screen, not a windowed chrome. Minimal CPU, no Electron overhead.
Keys stay on your machine
Sarvam and Groq API keys are entered in the Settings modal and saved to a local config file. Nothing is uploaded to Sync Speak servers. There are no Sync Speak servers.
Conversation history
Every translated utterance is saved locally so you can scroll back through a meeting. History is yours, on disk, clearable at will.
Voice + prosody control
Choose between multiple Bulbul voices and tune speech rate. The translation tone matches the formality of your meeting.
Open, swappable, extensible
The pipeline is three layers: Rust shell, React UI, Python audio sidecar. Swap any model, add a new language, fork the repo — MIT licensed.