Desktop recommended

ASL Flow uses your webcam and a computer vision model that works best on a desktop or laptop with a larger screen.

ASL Flow

Built for Wispr Flow

Making voice dictation accessible through sign language, using only a webcam.

Demo by Armaan Kazi

What Wispr does today

Wispr Flow's pipeline begins at the microphone. It has no input layer for users who communicate through sign, which means one of the most powerful productivity tools built in years is inaccessible to an entire population by default.

Today: Mic input → Wispr AI → Text (requires spoken audio)

With ASL Flow: Webcam → CV Pipeline → Audio → Wispr AI → Text (works with sign language)

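| Stage    | Today                | ASL Flow                        |
|----------|----------------------|---------------------------------|
| Input    | Microphone audio     | Webcam video feed               |
| Model    | ASR (Whisper / etc.) | CV hand-pose + sign classifier  |
| Output   | Text transcription   | Text transcription              |
| Hardware | Microphone required  | Standard laptop webcam          |
| Access   | Hearing users only   | Deaf & hard-of-hearing users    |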

What this demo adds

Wispr Flow turns speech into polished text across every app, but it requires a voice. For the 30+ million deaf and hard-of-hearing Americans, that's a wall. ASL Flow is a concept integration that removes it: a computer vision pipeline reads American Sign Language from a standard webcam, converts it to synthesized audio, and passes it into Wispr Flow, making every text field on your computer accessible through sign language, with no specialized hardware.

This is a concept demo, not a finished product. The hand recognition model works with static ASL fingerspelling only (individual letters A–Z, excluding J and Z, which require motion), not full ASL vocabulary or phrases. Real ASL is a complete language; this prototype demonstrates the technical pipeline at the letter level. Accuracy varies with lighting and hand position. Hold each sign steady for ~1 second for best results.

Why it's better

Personal Calibration

Unlike fixed gesture libraries, ASL Flow learns your hands. Sign each letter once during setup and the model stores your personal landmark vectors on-device. Your calibration improves the more you use it, and it never leaves your machine.
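One way to picture personal calibration is as nearest-neighbor matching over stored landmark vectors. The sketch below is an illustration under stated assumptions, not the demo's actual code: the names `Landmarks`, `normalize`, and `PersonalCalibration`, the choice of the first point as the wrist, and the `maxDistance` threshold are all hypothetical.

```typescript
// Hypothetical layout: one hand pose is a list of points, each [x, y] or [x, y, z].
type Landmarks = number[][];

// Make poses comparable across hand position and size: shift so the first
// point (assumed here to be the wrist) is the origin, then scale so the
// farthest point sits at distance 1.
function normalize(points: Landmarks): number[] {
  const origin = points[0];
  const shifted = points.map(p => p.map((v, i) => v - origin[i]));
  let scale = 0;
  for (const p of shifted) scale = Math.max(scale, Math.hypot(...p));
  const out: number[] = [];
  for (const p of shifted) for (const v of p) out.push(v / (scale || 1));
  return out;
}

function distance(a: number[], b: number[]): number {
  return Math.hypot(...a.map((v, i) => v - b[i]));
}

class PersonalCalibration {
  // Kept in memory only, mirroring the on-device storage described above.
  private samples: { letter: string; vector: number[] }[] = [];

  // Store one more sample for a letter, during setup or later use.
  add(letter: string, points: Landmarks): void {
    this.samples.push({ letter, vector: normalize(points) });
  }

  // Return the letter of the closest stored sample, or null when even the
  // best match is farther than maxDistance (an assumed tuning value).
  classify(points: Landmarks, maxDistance = 0.5): string | null {
    const query = normalize(points);
    let best: string | null = null;
    let bestDistance = maxDistance;
    for (const sample of this.samples) {
      const d = distance(query, sample.vector);
      if (d < bestDistance) {
        best = sample.letter;
        bestDistance = d;
      }
    }
    return best;
  }
}
```

Because every added sample becomes another reference point, a scheme like this gets more tolerant of natural variation in a user's signing as more samples accumulate.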

Audio Bridge

The key architectural insight: Wispr Flow doesn't need to know sign language. ASL Flow converts sign to synthesized speech upstream, so Wispr receives a clean audio stream in the same form as ordinary microphone input. No changes to Wispr's existing architecture are required.

Intelligent Input Layer

Word prediction, custom shortcuts, and sentence-level reading make ASL Flow more than a translator: it's a full input system. Power users can define shorthand signs for frequently used phrases, reducing signing burden for repetitive communication by an estimated 40-60%.
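Shortcut expansion and word prediction can be sketched as two small lookups. This is a minimal illustration, not the demo's implementation: the `shortcuts` table, the `vocabulary` list, and both function names are hypothetical, and a real predictor would likely rank candidates by frequency rather than list order.

```typescript
// Hypothetical user-defined shorthand: fingerspelled key -> full phrase.
const shortcuts: Record<string, string> = {
  BRB: "be right back",
  TY: "thank you",
};

// Hypothetical vocabulary, assumed to be ordered most likely first.
const vocabulary = ["hello", "help", "here", "thanks"];

// Replace a fingerspelled sequence with its phrase if one is defined;
// otherwise return the sequence unchanged.
function expandShortcut(spelled: string, table: Record<string, string>): string {
  const key = spelled.toUpperCase();
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : spelled;
}

// Suggest up to `limit` words that begin with the letters signed so far.
function predictWords(prefix: string, words: string[], limit = 3): string[] {
  const p = prefix.toLowerCase();
  if (p === "") return [];
  return words.filter(w => w.indexOf(p) === 0).slice(0, limit);
}
```

With tables like these, signing two letters such as T and Y would stand in for a nine-letter phrase, which is the kind of saving the shorthand feature is aiming for.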

Why This Is Different

Most ASL tools stop at recognition. ASL Flow is a full input stack: every layer is designed to make sign language a first-class input method for any app on your computer.

Try it

Hold ASL letter signs in front of your camera. The model reads each letter, speaks it aloud, and builds text in the output field.

Tip: Hold each sign steady for ~1 second. Works best with good lighting and a plain background. Supports letters A–Z, excluding J and Z, which require motion.
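The "hold steady for ~1 second" rule can be read as a debounce over per-frame predictions: a letter is accepted only after the same prediction has persisted for the hold time. The sketch below assumes that reading; `HoldDetector` and its 1000 ms default are hypothetical and are not taken from the demo's code.

```typescript
class HoldDetector {
  private holdMs: number;
  private candidate: string | null = null;
  private sinceMs = 0;
  private committed = false;

  constructor(holdMs = 1000) {
    this.holdMs = holdMs;
  }

  // Feed one per-frame prediction (null when no sign is recognized) with its
  // timestamp in milliseconds. Returns the letter exactly once, when it has
  // been held for holdMs; returns null on every other frame.
  update(prediction: string | null, nowMs: number): string | null {
    if (prediction !== this.candidate) {
      // The prediction changed, so restart the timer for the new candidate.
      this.candidate = prediction;
      this.sinceMs = nowMs;
      this.committed = false;
      return null;
    }
    if (prediction !== null && !this.committed && nowMs - this.sinceMs >= this.holdMs) {
      this.committed = true;
      return prediction;
    }
    return null;
  }
}
```

In this design a sign that keeps being held is emitted only once, so typing a doubled letter means briefly releasing the sign and forming it again. Brief misclassifications between frames reset the timer rather than producing stray letters.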

In production, the CV model would run entirely on-device via TensorFlow.js, so no video data would leave your machine. Wispr Flow would receive only the synthesized audio stream, in the same form as standard microphone input.