Continuing my journey of getting my hands dirty with voice UIs (VUIs) - I wrote down some user-perceived latency numbers I was seeing while building them.
Key points:
- I used the 'pipeline' approach of STT + LLM + TTS (as opposed to the S2S approach, e.g. gpt-realtime)
- This approach (with my specific setup) yielded latency far greater than the 500ms target, below which conversations feel "natural" and there aren't any awkward silences
- With gpt-5-mini as the LLM I saw latency of ~1.4s, and with Llama 3.1-8b on Cerebras as the LLM I saw 1.1s
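For anyone who wants to take similar measurements, here is a rough sketch of timing each stage of one conversational turn. It is not the setup I actually used: the `fake_stt`, `fake_llm` and `fake_tts` functions are hypothetical stand-ins for whatever STT, LLM and TTS clients you call, and it measures the full wall-clock time of each stage, which is a simplification (with streaming, what the user perceives is closer to the time until the first audio plays).

```python
import time
from typing import Callable, Dict, Tuple

TARGET_MS = 500.0  # the "natural conversation" target mentioned above


def timed(stage: Callable[[str], str], payload: str) -> Tuple[str, float]:
    """Run one stage and return its output plus elapsed milliseconds."""
    start = time.perf_counter()
    result = stage(payload)
    return result, (time.perf_counter() - start) * 1000.0


def run_turn(stages: Dict[str, Callable[[str], str]],
             user_audio: str) -> Tuple[str, Dict[str, float]]:
    """Chain the stages in order and record how long each one took."""
    timings: Dict[str, float] = {}
    payload = user_audio
    for name, stage in stages.items():
        payload, timings[name] = timed(stage, payload)
    timings["total"] = sum(timings.values())
    return payload, timings


# Hypothetical placeholders: swap in real STT / LLM / TTS calls here.
def fake_stt(audio: str) -> str:
    return f"transcript({audio})"


def fake_llm(text: str) -> str:
    return f"reply({text})"


def fake_tts(text: str) -> str:
    return f"audio({text})"


stages = {"stt": fake_stt, "llm": fake_llm, "tts": fake_tts}
reply, timings = run_turn(stages, "hello.wav")

for name, ms in timings.items():
    print(f"{name:>5}: {ms:.3f} ms")
print("within target" if timings["total"] <= TARGET_MS else "over target")
```

Breaking the total down per stage makes it easier to see which stage to swap out first when you are over the target.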
I've been trying to get some hands-on experience building voice based experiences and put down some of the lessons I learnt + questions that came up.
Some key points:
- STT (speech to text) models are a new source of errors in input - which has an impact on UX
- Some UI elements don't have equivalents in voice, e.g. drop-downs
- STT + TTS (text to speech) still struggle with names, addresses, food items, etc. which are not Anglo-Saxon in origin
+ a few other points - check out the post
Would be great to hear and learn from your experiences building voice forms and experiences.
Disclaimer: I'm no UX expert and have not read those O'Reilly books that talk about building VUIs
All I really want is to be able to listen to what my friends are listening to - know what the latest track they're jamming to is - and this seems to go in that direction
I had put on a couple of pounds in late 2019 because of too much sugar in my diet (can't resist them &🧁) but just couldn't seem to cut down (you're out with friends and you think - just one bite - and poof - the cookie is over!).
Then a friend and I came up with a "game" - we chose rules (zero dessert on weekdays + 2 pieces allowed on weekends) - and then a price (Rs. 200) - which we would pay the other when one of us slipped up on the rules - and surprisingly - it worked well! Consumption went down over time and I started shedding the extra pounds I had put on.
Things got fun + interesting - a competitive vibe emerged between the 2 of us, which acted as a forcing function to not eat sugar, lots of trash talk happened, and we came up with the concept of a free pass (had a bad day and just wanted some dessert!) to give some flexibility.
Looking to change something about yourself but can't seem to follow through? Like:
1. Too much sugar/desserts? 🧁
2. Too much junk food? (McDonalds fries are my weakness!)
3. Not drinking 8 glasses of water a day?
Anything at all - YOU can pick the rules you + a friend want to follow & then you pick the price that works for you!
NoMo is a way to do that with the help of friends/loved ones - Give it a shot - all feedback, suggestions are welcome :)
Yeah, agreed. But SMS was the quickest way to get started - depending on traction - I can add other channels as well - email/WhatsApp/IVR/website notifications.
Yes, privacy policy seems to be universal feedback. Will add that for sure.
The Problem:
You and a bunch of friends land up at a bar/wine shop, it turns out to be a dry day, and then you go - doh! (One friend even went to Goa for a weekend only to realise it was a dry day.)
Solution:
I decided to build this simple SMS based reminder service which would remind you of all the dry days in the year!
I assumed here that potential end users in India would know what a dry day is. Maybe I'll revisit this assumption if more feedback like this comes in. Thanks!