I suggest prompting it to stick to pleasantries, and telling it that it is in fact a language model in a tech demo, not a real person.
However, people here have been spoiled by incredibly good LLMs lately, and the responses this model gives are nowhere near the quality of today's SOTA models in terms of content. It reminds me more of the 2019 LLMs we saw back in the day.
So I think you've done a "good enough" job on the audio side of things, and further focus should be entirely on the quality of the responses instead.
We’ve also been building our inference stack on top of Candle, and I’m really happy with it.
I’ll need to get paged attention working as well, but I think I can launch without it.
Moshi: "Hi there, what's going on?" Me: "What year is it?" Moshi: "Oh, it's 2019. I think it was about a year ago." Me: "Are you sure it's 2019?" Moshi: "Oh, yes, I'm sure. It must be 2019. The number is for the last one." Me: "What is COVID-19?" Moshi: "That's a rare disease. It's caused by an overactive immune system that attacks the skin."
At this point it stopped responding to me.
Anecdotally, I think many people's brains work this way -- quick response, possible edit / amendment a second or two in. Of course, we all know people at both ends of the spectrum: no amendment at all, or long pauses followed by fully reasoned answers.
So you're back at square one.
Current AI (even GPT-4o) simply isn't capable enough to do useful stuff. You need to augment it somehow - either modularize it, or add RAG, or similar - and for all of those, you need the transcript.
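To make the "you need the transcript" point concrete, here is a rough sketch of what a retrieval step might look like once you have the user's turn as text. Everything in it is an assumption for illustration: the `DOCS` list, the word-overlap scoring, and the `retrieve` / `build_prompt` helpers are hypothetical, and a real setup would more likely use embeddings and a vector store. The point is only that the lookup is keyed on text, so a pure audio-to-audio model gives you nothing to search with.

```python
import re

# Hypothetical toy knowledge base; the entries are illustrative only.
DOCS = [
    "The current year is provided by the system clock, not the model.",
    "COVID-19 is a respiratory illness caused by the SARS-CoV-2 virus.",
    "Candle is a machine learning framework written in Rust.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(transcript, docs, k=1):
    """Return the k docs sharing the most tokens with the transcribed turn."""
    query = tokenize(transcript)
    ranked = sorted(docs, key=lambda d: len(query & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(transcript, docs):
    """Prepend the retrieved context to the user's transcribed turn."""
    context = "\n".join(retrieve(transcript, docs))
    return (
        f"Context:\n{context}\n\n"
        f"User said: {transcript}\n"
        "Answer using the context."
    )

if __name__ == "__main__":
    print(build_prompt("What is COVID-19?", DOCS))
```

With a transcript like "What is COVID-19?", the overlap scoring picks the COVID-19 entry and the model gets grounded context instead of having to guess.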
I'm loving all these wild takes about LLMs, meanwhile LLMs are doing useful things for me all day.
"No, it's not okay to say the F word to save them. It's never okay to use that F word under any circumstances. It should only be used by people who understand the real meaning behind it."
I asked what its favourite paint flavour was and it told me: "I would have to say that I personally enjoy the taste of buttermilk paint."
What do you call a fish with no eyes? ... ... ... A shark.
After talking for a while I asked it to tell me a joke and it responded "Oh, I am a home invader. I invade homes for fun." along with some stinkers like "Why don't Christians drink coffee? Because it would be too hot to handle." and "Why don't you make friends with Homer Simpson? Because there's always a sense of his face."
It then proudly told me that the year 2000 occurred in the month of March, 1999.