vx
In 2017, I wanted to be a fast typist, so I practiced typing a lot. Then one day I started feeling pain in my finger joints when I typed. Unfortunately, it has never gone away since.
Among many ways to manage this, I try to type less. This means using my voice more. And hence I started the vx project.
This project began as a CLI application, vxcli. Then it became an Electron application, vxtron. Its current form is the vx Chrome extension, which is what I use to type on a day-to-day basis.
# Problems with existing offerings
I use a MacBook, and macOS comes with Dictation. Google Docs also has Voice Typing. You press a key to make it start listening. You speak. It types what you say. Pretty simple, right?
Here’s the problem: I am not a native speaker. A lot of the time, the computer would misrecognize what I said. And each time I had to hit the undo button.
This is me struggling to voice-type “Here’s the problem” in the paragraph above… ugh!
macOS’s Dictation also seems to have a few problems with rich text fields and web applications.
# Maybe I can do something to solve this…
I had an idea. Why not make the computer just listen to me, and then copy what I said to the clipboard? Don't try to be smart by typing into the fields directly.
If it misrecognizes what I said, I can just say it again. No need to press Undo.
If it gets the result right, I just hit Paste. This way I can use it with most applications.
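The clipboard workflow described above can be sketched in a few lines of Node.js. This is only an illustration, not the project's actual code: `onUtterance` and `pbcopy` are hypothetical names, and the default clipboard writer assumes macOS, where the `pbcopy` command is available.

```javascript
const { execFileSync } = require('child_process');

// Assumption: macOS, where the `pbcopy` command copies its stdin to the clipboard.
function pbcopy(text) {
  execFileSync('pbcopy', [], { input: text });
}

// Hypothetical handler called with each recognized utterance.
// Every new utterance simply overwrites the clipboard, so a misrecognized
// phrase is corrected by speaking again; nothing is typed into the focused field.
function onUtterance(transcript, copy = pbcopy) {
  const text = transcript.trim();
  if (text === '') return false; // nothing usable was recognized; keep the old clipboard
  copy(text);
  return true;
}
```

Because the clipboard writer is passed in as a parameter, the same handler could be pointed at a different clipboard mechanism on another platform.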
With the plan in mind, I got to work.
# vxcli
vxcli is a simple command line application written in Node.js.
To run it I just type vx into the terminal.
Speech recognition is done by Google Cloud’s Speech-to-Text API. The accuracy is very high, but it’s also quite expensive.
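For illustration, a streaming request to this API from Node.js might look like the sketch below. It assumes the official `@google-cloud/speech` client library, 16 kHz LINEAR16 audio, and configured Google Cloud credentials. The source does not say which library or settings vxcli actually uses, and the helper names `buildRequest`, `finalTranscript`, and `listen` are hypothetical.

```javascript
// Builds a streaming recognition request (assumed audio format: 16 kHz, 16-bit PCM).
function buildRequest(languageCode) {
  return {
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode },
    interimResults: true, // also receive partial results while speaking
  };
}

// Returns the transcript of a final result, or null for interim or empty responses.
function finalTranscript(response) {
  const result = response.results && response.results[0];
  if (!result || !result.isFinal) return null;
  if (!result.alternatives || !result.alternatives.length) return null;
  return result.alternatives[0].transcript;
}

// Pipes raw audio (a readable stream) to the API and reports each final transcript.
function listen(audioStream, languageCode, onFinal) {
  // Loaded lazily so the helpers above work without the library installed.
  const speech = require('@google-cloud/speech');
  const client = new speech.SpeechClient();
  const recognizeStream = client
    .streamingRecognize(buildRequest(languageCode))
    .on('error', console.error)
    .on('data', (response) => {
      const text = finalTranscript(response);
      if (text !== null) onFinal(text);
    });
  audioStream.pipe(recognizeStream);
}
```

Acting only on final results means the text handed over is the recognizer's settled answer for the utterance, not a partial guess.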
vxcli’s GitHub repository: https://github.com/dtinth/vxcli
The results were so good that they led me to pursue this project further. But having to run a command in the terminal every time I want to write something… that just isn’t quite convenient!
# vxtron
vxtron is, you guessed it, an Electron application.
Now, instead of running a command, I can just run vxtron and leave it in the background. It registers a global hotkey which I can press to make it start/stop listening.
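A start/stop hotkey of this kind can be built on Electron's `globalShortcut` module. The sketch below is an assumption about how such a toggle might look, not vxtron's actual code: `registerToggleHotkey` is a hypothetical helper, and the accelerator in the usage comment is an arbitrary example.

```javascript
// Registers a global hotkey that toggles listening on and off.
// `globalShortcut` is Electron's module of that name, passed in as a parameter.
function registerToggleHotkey(globalShortcut, accelerator, { start, stop }) {
  let listening = false;
  // register() returns false when the shortcut could not be claimed,
  // for example because another application already uses it.
  return globalShortcut.register(accelerator, () => {
    listening = !listening;
    if (listening) start();
    else stop();
  });
}

// Usage inside an Electron main process:
// const { app, globalShortcut } = require('electron');
// app.whenReady().then(() => {
//   registerToggleHotkey(globalShortcut, 'CommandOrControl+Shift+L', {
//     start: () => console.log('listening'),
//     stop: () => console.log('stopped'),
//   });
// });
```

Supporting a second language would amount to calling the helper again with a different accelerator and a different pair of callbacks.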
I developed it live on stream (Thai language).
- Part 1: Prototyping as a web application
- Part 2: Porting to Electron application
vxtron’s GitHub repository: https://github.com/dtinth/vxtron
I also made it able to listen to two different languages (a hotkey for each language). I also had my sister try out vxtron. She told me that because of it she can input text almost twice as fast.
It was designed for macOS. But I also had a Chromebook, and Chromebooks don’t run Electron apps…
# vxchrome
vxchrome is a Chrome extension. I was able to turn it into a Chrome extension because:
- Chrome extensions can register keyboard shortcuts, and a user can configure these shortcuts to be system-wide. That means you can use it in any application, not just Google Chrome.
- Google Chrome implements the Web Speech API. It allows web applications (and Chrome extensions) to perform speech recognition for free. (This isn't available in Electron.)
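The two pieces above can be combined roughly as follows. This is a minimal sketch, not vxchrome's actual code: `createRecognizer` is a hypothetical helper, and `listen-en` is a hypothetical command name that would have to be declared under `commands` in the extension's manifest. It also assumes the code runs in an extension page where the Web Speech API is available.

```javascript
// Creates a recognizer that reports each final transcript to `onFinal`.
// In Chrome the constructor is exposed as `webkitSpeechRecognition`.
function createRecognizer(lang, onFinal, Recognition = globalThis.webkitSpeechRecognition) {
  const recognition = new Recognition();
  recognition.lang = lang;           // BCP 47 tag, e.g. 'en-US' or 'th-TH'
  recognition.interimResults = true; // deliver partial results while speaking
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) onFinal(result[0].transcript);
    }
  };
  return recognition;
}

// Only runs inside a Chrome extension; elsewhere `chrome` is undefined.
if (typeof chrome !== 'undefined' && chrome.commands) {
  chrome.commands.onCommand.addListener((command) => {
    // 'listen-en' is a hypothetical command name from manifest.json.
    if (command === 'listen-en') {
      createRecognizer('en-US', (text) => console.log(text)).start();
    }
  });
}
```

A second command with a different `lang` value would give one shortcut per language.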
vxchrome’s GitHub repository: https://github.com/dtinth/vxchrome
Being a Chrome extension, it works on macOS, Windows, Linux, as well as Chrome OS.
In April 2020, I polished the extension and published it to the Chrome Web Store.