There’s a scene in the movie Get Shorty in which John Travolta’s character, Chili Palmer, discusses writing a screenplay with Bo Catlett, Delroy Lindo’s character. Catlett proposes the two of them rewrite a screenplay (the circumstances are too complicated to explain):
Chili: You know how to write one of these?
Bo Catlett: There’s nothing to know. You have an idea, you write down what you wanna say. Then you get somebody to add the commas and shit where they belong, if you aren’t positive yourself. Maybe fix up the spelling where you have some tricky words… although I’ve seen scripts where I know words weren’t spelled right and there was hardly any commas in it at all. So I don’t think it’s too important. Anyway, you come to the last page you write in “Fade Out” and that’s the end. You’re done.
Chili: That’s all there is to it, huh?
The scene is funny on several layers:
1. Two gangsters who think they can write a screenplay is a fish-out-of-water joke. It’s the same genre of humor as the bank robbers in Reservoir Dogs talking about Madonna’s oeuvre at the diner.
2. They’re trolling actual Hollywood screenwriters, suggesting the work they do isn’t that complicated or difficult, and the hardest part is spelling and comma placement. This is especially funny because while Get Shorty is a smart film with a great script, Hollywood has been known to make formulaic crap.
3. We, the audience, are aware that this dialog was in fact written by a screenwriter, so there’s a meta wink to the audience.
But the scene can also ring painfully true sometimes.
Last week, I wrote about Notebook LM’s “Deep Dive Conversation” tool, which will take a series of documents, media assets, or quotes, and create a summary for you as a chatty two-way podcast. I found its output to be simultaneously hilarious and horrifying. After listening to a few “Deep Dives” I felt that I could sense the pattern, and was no longer so impressed. I do find it to be a potentially useful summary tool, especially if you’re REALLY pressed for time, and need a summary in a hands-free situation (like commuting). Others are more impressed. What I find more horrifying is what Notebook LM tells us about how people understand podcasting at the moment.
Looking back on what I wrote, I realize I’ve only partially articulated my ideas about this, and that’s because I’m still turning some of the ideas around in my head. For the past several years, I’ve kept being painfully reminded that most people have no idea what it takes to make a good podcast or radio show.
Maybe this isn’t actually a problem. In fact, I’m able to make a living because I do have an idea of what it takes to make a good podcast, and can sell my services to people who don’t. And the listeners arguably don’t need to know how the thing they’re listening to is made.¹ BUT… I’ve often observed that the people in charge of putting together podcasting departments, hiring producers, hiring hosts, or setting the budgets, timelines, and expectations around a podcast don’t know what it takes to make a good podcast. This has proved a problem, as many newsrooms, funders, studios, investors, and other institutions have thrown themselves into the business of podcasting, only to find it’s costlier and harder than they initially thought. For a while, this massive investment was good clean fun, as lots of my friends and colleagues got decent jobs making podcasts, but then the layoffs came.
One mistake I’ve seen many people make is assuming that the “hard” part of podcasting, the part requiring “skill,” is what is actually the easy part. Just like the gangsters in Get Shorty who think the hard part of writing a screenplay is putting the commas in the right place, some people assume that the skill of assembling narration, interview audio, music, natural sound, archival tape, and other elements into a coherent audio whole is a technical skill. Which it is, but the technical part is, by and large, the easy part.
When I tell people I’m an audio producer, sometimes, they seem to assume I’m something more like an audio engineer: Somebody who is expert at eq-ing voices to make them sound more resonant and present, removing tape hiss or background sound, making beautiful recordings of ambient environments, creating interesting sound design effects with reverb, echo, and pitch shifting. I have a working knowledge of all of these tasks, but when I encounter a challenging version of them, I usually turn to an engineer.
My actual expertise is an understanding of the affordances and weaknesses of the audio medium, plus a set of tools for creating workflows around producing audio shows, features, and stories, and for solving problems that come up in those workflows. Audio producers are generalists. Generally, we’re skilled writers, audio recordists, reporters, project managers, professional communicators, talent coaches, and yes, manipulators of recorded audio.
One way to understand what audio producers do and why it’s hard would be to imagine what would happen if AI could make an actual podcast. Notebook LM’s Deep Dive Conversation tool isn’t really a podcast (nor does it pretend to be). It’s an artificially generated “conversation” between two artificially generated voices in a style that hilariously and creepily imitates a certain style of modern North American podcasting.
Could AI Make a Decent Podcast?
BUT, I bet somebody out there is wondering if AI could make an actual podcast. Especially given Notebook LM’s eerily accurate depiction of some of the surface aesthetics of modern podcasting. Could an AI module do some combination of conducting interviews, writing a script, “voicing” the script, and assembling the elements together, perhaps with music, ambient sound, and archival footage, into a compelling and coherent whole?
As far as I know, it can’t right now. But is it possible that somebody could build an AI module with current capabilities that could achieve that? What discrete “skills” would an AI module need in order to make a decent podcast? I think this is an interesting thought experiment.
To answer, we have to do something quite common in podcasting circles: we have to define what we mean by a “podcast.” So let’s pick a fairly common format, and decide what we want the AI to do and what we want the humans involved to do:
Let’s say we want the AI to produce a forty-five-minute chat show featuring two hosts, each of whom does an “explainer” segment in conversation with the other, plus one interview segment. This is, more or less, the format for Hard Fork, the podcast I rely on for much of my news about the AI industry. Here’s a breakdown of the tasks in very broad terms, and whether the AI or humans will do each one:
Plan the Episode Rundown: Humans
Book the Guest: Humans
Produce the Episode: AI
Be the Actual Voices: Humans
Write the intro and outro script: AI
Title and marketing language: AI
Edit the audio elements: AI
Publish the Episode: Humans
So really, this isn’t an AI making a podcast; it’s doing maybe half the work of making a podcast. But… if AI could manage these tasks, that would be incredibly impressive (and alarming). I’m quite sure that many of these tasks are well beyond the skills of current generative AI models, so for this thought experiment, I want to give the AI an actual chance of being able to plausibly make a podcast.
Next time, we’ll go through each task, and consider how the humans or AI would approach it, and what successes or pitfalls we might encounter.
¹ There is, however, an argument to be made that some basic understanding of the process and craft that goes into any published media is essential to media literacy. I certainly think that working in radio and audio makes me better at spotting hoaxes, satire, false claims, and baseless conspiracy theories than people who don’t work in media or study it.