“You’ve Yet to Have Your Finest Hour” and the Audio Age of AI
- Jan 4
- 8 min read

When Queen gave us the line “you’ve yet to have your finest hour”, it was an anxious love letter to radio at a moment when the world was turning its head towards screens. Today, that same refrain lands differently. Not as nostalgia, but as a question that Silicon Valley et al. are suddenly spending billions to answer.
Is audio about to have its finest hour after all?
The new battleground is not the screen. It’s the ear
For thirty years, the digital economy has been an eyeballs trade. Attention was something you scrolled, clicked, liked, and watched. The screen was the tollbooth. Now the industry is trying to move the tollbooth into your ear.
Why? Because they believe that audio is the most natural interface we have. It's also the least forgiving. If a voice agent cannot handle interruption, context, accent, noise, or awkward silences, it stops being “the future” and starts being a person trapped in a lift with you. That's why the recent manoeuvres across Big Tech feel less like feature launches, and more like an audio land grab.
OpenAI’s audio pivot is not cosmetic. It is strategic
Reports at the start of January 2026 describe OpenAI reorganising teams across engineering, product, and research to overhaul audio models, with an audio-first personal device reportedly targeted around a year out. The ambition is not merely better text-to-speech. It's to make voice the primary interface for everyday computing.

This lines up with what OpenAI has already put into the market. In March 2025, OpenAI introduced next-generation speech-to-text and text-to-speech models in its API, explicitly framed as building blocks for voice agents. It claimed state-of-the-art transcription performance, particularly in “challenging scenarios involving accents, noisy environments, and varying speech speeds”, and also introduced more steerable text-to-speech so developers can control how something is said, not just what is said.
OpenAI has also been iterating on the human mechanics of conversation. Its Advanced Voice Mode updates have aimed to reduce interruptions, including the very human pause to think that older voice assistants treated as an invitation to talk over you. Put simply, this is a company trying to win turn-taking. If that sounds like a small thing, remember: whoever controls the turn controls the relationship. The relationship controls the platform.
Hardware is the hardest part, and the most revealing
If OpenAI is serious about audio-first, it eventually has to escape the rectangle. That is why the Jony Ive connection matters.
Reuters reported in May 2025 that OpenAI would acquire Ive’s hardware startup (io Products) in a $6.5bn deal, with Ive taking a creative leadership role on devices “tailored for the generative AI era”. It also noted the cautionary tales, including Humane’s AI Pin. TechCrunch later reported that the device effort may be struggling with unresolved issues that are, tellingly, not just engineering. They include what it calls the “personality” of the device, privacy, and an “always on” approach that must speak only when useful, and stop speaking at the right time. That's not a spec sheet problem IMHO. It's a social contract problem. Audio-first devices force the industry to confront a truth screens have politely hidden: if your technology is always present, it must also be well-mannered. Perhaps that pressure will force a better way of working, and more accountability. Fingers crossed.
Big Tech is converging on the same thesis: audio becomes ambient computing
OpenAI is not alone. The pattern is everywhere once you stop looking at screens and start listening for incentives.
Google: turning search into something you can listen to
Google launched an “Audio Overviews” experiment in Search Labs in June 2025, using Gemini models to generate short, conversational audio summaries for some queries, with links out to sources. Google has also scaled Audio Overviews in NotebookLM, expanding to more than 50 languages and positioning the format as a mainstream way to absorb information. The strategy is clear: make Google useful when your eyes are busy, and indispensable when your hands are not free. To me, that's not a convenience feature. That's market access.
Meta: turning your face into a listening device
Meta’s Ray-Ban and Oakley-branded AI glasses updates include “conversation focus”, which amplifies a speaker’s voice in noisy environments using the glasses’ open-ear speakers, plus features that connect what you see to what you hear, including Spotify integration via Meta AI. This is not merely a smart glasses play. It's hearing augmentation plus AI as a companion layer.
Apple: making the earbud a health and interface platform
Apple’s AirPods direction is instructive: it is, in effect, treating the ear as both a computing location and a health surface. Apple Support describes a “clinical-grade” Hearing Aid feature and a Media Assist feature on AirPods Pro models, bringing hearing assistance into mainstream consumer hardware. If your earbuds become health tech, then living without them starts to feel like leaving the house without keys. Platforms love that.
Amazon: voice assistants need a second act
Smart speakers got into homes, then hit the reality of limited use cases and novelty fade. Amazon’s answer has been to retool Alexa around generative AI so it can do more than set timers and misunderstand your child. The Wall Street Journal reported on Amazon preparing an AI-powered Alexa upgrade intended to take more complex actions and remain competitive.
Cars: the most valuable “screenless” room you own
Cars are the great prize because they are time-rich and attention-poor. Edison Research notes that phone integration systems (Apple CarPlay / Android Auto) are now in a large share of vehicles, and usage among those who have them is high. If your AI becomes the voice in the car, it becomes the gatekeeper for navigation, media, commerce, and messaging. It also becomes a safety issue. When TechCrunch reported Tesla integrating xAI’s Grok, it also referenced incidents where Grok produced deeply problematic content in other contexts. That is a reminder that “just make it talk” is not a harmless product brief. Elon’s a genius, but would you want to drive from Wick to Warwick with him in your ear? I thought so.
The demand-side reality: people already live in audio, even if they don’t really call it that
This isn’t happening in a vacuum. The consumer base is already primed. Edison Research reports that voice tech is ubiquitous in the US: 62% of Americans 18+ use a voice assistant across devices, 35% own smart speakers, and 57% use some form of voice command daily. In the UK, Edison’s 2025 findings highlight even higher smart speaker ownership, alongside the continued spread of smart devices and in-car digital audio habits. So I don’t think the bet is “will people speak to machines?” Because we already do. I think the bet is “will we trust machines enough to let them mediate life without the comfort blanket of a screen?”
So here’s the stress test as I see it: what must go right for audio-first to win? The case is compelling, but it is not a certainty. Here's my tuppence worth on what could derail it, and where the industry is quietly admitting the risks.
1) Privacy can’t be a footnote when the product is “always on”
An audio-first device is, by definition, a microphone in your life. The TechCrunch reporting around OpenAI and Ive highlights “always on” ambitions and the difficulty of getting privacy and behavioural boundaries right. If audio is the interface, then privacy has to become the infrastructure. Practical implication: on-device processing, clear signalling when listening, granular controls, short retention defaults, and genuinely comprehensible consent flows become competitive advantages, not compliance chores.
2) Safety and trust risks grow when output is persuasive
A confident voice can smuggle error far more effectively than a paragraph you can re-read and doubt. Boris made a career out of it. The Grok episode is a reminder that the failure mode is not merely “wrong answer”. It can be reputational, social, and potentially harmful, particularly in cars and homes.
3) The open web economy gets squeezed, and that’s probably not sustainable
If AI summarises the web and fewer people click through, publishers lose revenue, investigative journalism weakens, and the information supply chain degrades. The Guardian has reported concerns and research suggesting AI summaries can reduce click-through rates dramatically in some cases, with publishers pushing regulators to respond. Audio overviews risk accelerating this because they are literally designed to keep you from looking at the page.
4) Social acceptability is a real constraint
Talking to your glasses in public will remain strange for longer than technologists admit. Humans are herd animals with shame. Adoption curves care about that.
5) Inclusion cuts both ways
Voice can be radically accessible for many people, but voice-only can exclude Deaf and hard-of-hearing users unless multimodal alternatives are first-class. The “war on screens” slogan only works if it does not become a war on accessibility.
So where does Finn Moray fit in this audio future?
Well, from a scale perspective, basically nowhere. But here's the part I think the tech narrative sometimes forgets, or at least doesn't talk about enough. Audio is not just an interface. Audio is identity. If the next decade is shaped by voice agents, ear-worn computing, in-car assistants, and spoken summaries, then human voices and human words become more valuable, not less. Distinctive regional voices become an asset, not an edge case. That's why Finn Moray is not, for me, a side story. It's a timely response.
Finn Moray’s process is explicitly human-led. The songwriting process page states that every song is written and produced by me under my songwriting pseudonym, Finn Moray, beginning with lyrics and an acoustic core, recorded as basic vocals, guitar, and harmonies. AI is then used as a supporting studio partner to expand sonic texture after the heart of the song is complete, followed by human mixing and mastering with a named producer; in the case of AON: THE CALL, Latin Grammy Award winner Mariano Beyoglonian. And Finn Moray is not just about making music and returning profits to the regions the songs come from. It's also about building a platform for regional talent to step forward.
The “Become a Finnatic” page sets out the intent plainly: spotlight talented, undiscovered artists, invite them to audition to record the live versions for AON: THE GATHERING, and share royalties transparently, alongside, as mentioned above, a profit allocation back to the regions.
That matters more in an audio-first world for three reasons:
Discovery shifts from feeds to voices. When interfaces become spoken, I believe that the “brand” is often the voice. Regional singers don't need to conform to a generic pop vocal. They can sound like where they’re from, and that becomes the point.
AI can scaffold production, but humans deliver meaning. Finn Moray’s model is a good representation of what “human and AI working in unison” should mean in practice: use AI to explore sonic possibilities and production palettes, then hand the final emotional lift to real singers from the regions who can add interpretation, grit, humour, and life.
Then, once we do this, my hope is that accents stop being a bug and start being a moat. When OpenAI talks about improved performance across accents and noisy environments, that's not just a technical flex IMHO. It's a widening of who gets understood by the machine, and therefore who gets to participate at scale.
In other words, audio-first AI will create winners. I firmly believe this. The cynical version is that it'll mostly create winners in California. Finn Moray is a bet against that cynicism. It says the regions are not content farms. They are talent engines, and they deserve a fairer share of the value chain.
Is this audio’s finest hour?
For me, probably yes, or at least audio is making a comeback, but not because screens will disappear. Screens will remain. They'll simply lose monopoly status. Audio is rising because, in my opinion, it fits the real economy of human attention: hands are busy, eyes are tired, and life is not a UX flow. The question is whether we build an audio future that is polite, private, and fair, or one that is always listening, occasionally wrong, and permanently rent-seeking.
Roger Taylor’s words gave us the prediction. The industry is now trying to turn it into a business model.
The rest is governance.
Buy AON: THE CALL now by clicking on the album below.



