If you’re too busy to read my last three posts about AI’s capacity to create podcasts, you’re in luck. I had AI make a podcast about it.
If that move sounds familiar, well, my colleague Justin and I did the same thing on a recent episode of Teachlab. I’m a one trick pony this week.
Notebook LM’s Deep Dive tool is garnering a little attention. Mike Pesca played around with it on a recent episode of The Gist, making the point that if AI can easily imitate your show, maybe your show isn’t that good. (I agree.) Pesca also shares my suspicion that Notebook LM was trained on David Greene’s hosting, among others. (And to be clear, I think David is a good host; AI might imitate the sound of his voice, but not his ability to guide a conversation.)
My friend Shawn Allee and I agree that the SOUND of Notebook LM seems designed to emulate the proximity effect of a Shure SM7 microphone, a mic that I think makes voices sound artificially bassier and a bit unhuman. (I’m an EV-RE20 man, myself.)
How Good Is Notebook LM’s Summary?
Well, you can listen for yourself. (Click above.) I actually ran the tool THREE times, and I’m sharing what I consider the best Deep Dive of the three. Notebook LM now lets you prompt the Deep Dive generator, a recent addition, and that makes a difference. The first two versions had a couple of hallucinations and factual errors. So I used the new prompting feature and told it to focus on what tools like Deep Dive mean for the future of podcast producers. Careful listeners may also detect a joke I added to the prompt.
But overall, I think prompting the Deep Dive generator focuses the “podcast” and makes it less rambly, repetitive, and wide-ranging.
Things it does well:
I think it more or less summarizes the point I was trying to make.
It comes up with some good analogies ON ITS OWN. It suggests that AI booking guests for a podcast is like following a recipe to bake a cake, without knowing what flour is. Not bad. At one point, it compares AI to the Gutenberg Press. I think this is a pretty good analogy. I suspect that in its training data, it has encountered that analogy for new technologies before.
It’s hilariously sly at one point, when it alludes to my suspicion about using Greene’s hosting as training data.
This version (the third) is a good length. The first two were a bit longer and more repetitive.
It makes a really interesting point, one that I don’t think was in my original posts:
Host: It’s not just about reacting in the moment. It's about anticipating those moments, you know, knowing how to steer the conversation and seeing those subtle connections that AI might miss completely.
Host: Like when you have a guest on and they say something totally unexpected, and instead of freaking out, you go with it and it ends up being the most interesting part of the whole interview.
This is a really good point. Where did it come from? I didn’t make it. I suppose a reasonable human might infer from my overall viewpoint that I WOULD say something like this.
But my guess is that part of the way Notebook LM works is not simply to interact with the text it’s given, but also to access its training data about podcasts more broadly. Whether that knowledge is baked into the LLM that the Notebook LM tool sits on, or whether it’s trained to search for what writers of similar pieces have said about podcasts, I’m not sure.
Things it doesn’t do well:
It can be repetitive.
It still makes up stuff. Now, unlike earlier iterations, this “podcast” doesn’t have any obvious errors. But it sometimes misinterprets my conclusions. This point might be true, but I didn’t make it:
So let's think about it. What if these AI tools, like this notebook LM thing, actually got good at some of the more technical stuff, like making those first drafts of the episode plan, you know, or transcribing interviews, or even cleaning up the audio?
Yeah, that would free up a ton of time for us to work on the stuff AI can't do, like really digging into the research and making those stories come alive, and getting better at interviewing, you know, so we could pull out those amazing moments from our guests.
And actually, I’m not really sure I agree with this. In my experience, powerful technologies that employ greater efficiencies often just lead to higher expectations from the bosses. But it’s possible the AI is right about this.
It has a predictable rhythm. Usually, you get 1-2 sentences from the male host, and 1-2 sentences from the female host. (Although I’ve noticed this seems to be getting better).
The two hosts switch back and forth between being the “knower” and the “learner”. Deep Dive uses a format that has become rather popular and widespread in the last decade: two regular hosts exploring something through conversation. One host KNOWS the story, the reporting, the topic,1 and we hear the other host LEARN it. The format has been around a while. To my knowledge, it was pioneered in recent times by Jad Abumrad and Robert Krulwich in the original iteration of Radiolab.2 It was refined later by NPR’s Invisibilia and Code Switch and The New York Times’ The Daily, and has now become widespread. Often both hosts actually know the gist of the topic they’re discussing, and one host will act out surprise, confusion, wonder, etc. It’s an illusion, but if done well, it’s based on the true feelings of the “learner” when they originally learned about the topic. You can think of the “learner” as re-enacting their surprise, wonder, etc. But the effect feels weird and fake if the “knower” and “learner” keep randomly switching, which is how Notebook LM currently works.
It seems programmed to be “optimistic” in a way that I (the author it’s summarizing) am not. My tone is sardonic, satirical, and grumpy. But these two “hosts” are much more hopeful:
Who knows, maybe this is what will lead to a whole new wave of amazing podcasts, podcasts that are even more engaging and thought-provoking and human, podcasts that we couldn't have made without AI.
I like that. It's like we're standing at the edge of something brand new, and it's up to us to explore it.
Then maybe, just maybe, this AI thing will help us remember what's really important about podcasting. It's not just about giving people information. It's about creating an experience, a connection, a journey we share with our listeners.
It's about remembering that AI can copy the style, but it's the human touch that gives a podcast its soul.
Is it generally programmed to be optimistic? Or is it optimistic because the subject is AI, and Google wants its tools to align with its techno optimism?
So Should We Be Afraid?
At one point, our “hosts” reflect on fear:
So maybe this whole AI thing isn't something to be afraid of. Maybe it's an opportunity.
I think that's a great way to look at it. It's a challenge, for sure, but it's also a chance to push ourselves creatively and get even better at what we do.
Huh, that’s not my point. I actually am afraid. But I’m not afraid that AI is going to take our jobs. Not exactly. I’m afraid that some executives at podcast companies and in public media are going to waste a lot of money trying to get robots to replace good producers. And in the meantime, they’ll miss out on opportunities to create great programming and build relationships with audiences. And the job market will suffer.
I was texting with Shawn Allee about this earlier, and I said I thought any attempt to create robot podcasts was likely to be a waste of money, and a failure. He made a good counterargument: that tools like this could actually be very popular, and might be a strong revenue source for something I’ll call “Ultra Personalized Podcasts” (or UPP). Shawn’s example:
Ok, I think Shawn’s onto something here. Using AI to generate ultra-personalized podcasts could be really cool. I have to admit, I get a little ego boost hearing the robots talk about my writing. I find it enjoyable to listen to, despite the fact that Deep Dive’s podcast about my writing isn’t actually very good. And to be fair to Google and Notebook LM, this is all they’ve EVER claimed their “Deep Dive” tool is good for.
BUT, now ElevenLabs is on the case, with their new AI podcast tool, something that works very similarly to Notebook LM. And it makes me worried that somebody will try to replace journalists and good production with AI. As I wrote above, I think that will be disastrous. The content will be mediocre (and yes, a lot of content already IS mediocre, but we should fix that), audiences will diminish, the job market will suffer, and the resources that could have gone to improving, experimenting with, and investing in great audio storytelling to thrill audiences will be squandered. Similar things have happened before in our ecosystem.
But Isn’t AI in Podcasting Inevitable?
Nothing technological is “inevitable,” and I think sometimes that assertion of “inevitability” serves the interests of Big Tech entrepreneurs who want to accelerate and release technology in irresponsible ways. It’s always possible to regulate new technology the way we regulate food, drugs, and cars, and it’s possible for society or industries to collectively reject or restrict a technology. In fact, the latter is happening with smartphones right now. But I do expect leadership in podcasting and public radio will be looking for ways to leverage this podcast-generation capacity. And I think it’s likely the technology will continue to improve.
But how much will it improve? We’re not very good at predicting the speed at which new technologies will develop, especially AI-powered technologies that are expected to replace human tasks. Just look at the case of laundry-folding robots. Part of the reason it’s hard is that we don’t realize how amazing our brains are, and how hard it might be to replace parts of what they do. When it comes to generative AI and large language models, there’s an ongoing argument about whether the recent jump in power experienced with GPT-3.5 in 2022 (aka ChatGPT) was a once-in-a-generation shift, or something we’ll see every few years. (I think once a generation.) So while it likely will get better, it may only get a little bit better, and that improvement may come slowly.
By all means, there are good reasons to experiment with powerful new technologies. I think on the whole Digital Audio Workstations have proved an improvement over tape editing with razors, not only allowing faster and more efficient editing, but also unlocking wonderful new kinds of storytelling that would have been difficult prior to Pro Tools et al. Imagine trying to make Radiolab with tape and razors. Digital recorders have, on the whole, allowed us to spend much less time manually importing tape to a computer file, freeing up that time for more interviews.3 Good producers devote part of their brain and time to learning what’s new that might help us make better audio stories.
It’s possible AI will let us build on what we know about audio storytelling in exciting ways, that attract and grow our audiences. I hope that CEOs, VPs, and other leaders at podcasting companies and radio stations engage experienced producers in conversations about how AI might improve our work, rather than unilaterally lead experiments with AI content generation. The events of the last few years do not make me optimistic that they will.
In many cases, it might be more accurate to say a producer or reporter has briefed them such that they seem to know.
I think it’s likely you could find earlier examples of this format. Perhaps in public radio contexts I don’t know, or perhaps in the British or European radio context, perhaps in North American broadcast television, or even commercial radio. Segments where a host “learns” from a guest are extremely common. Notebook LM does something more similar to Hard Fork, in which the hosts take turns briefing each other about their reporting.
There is an argument that time-saving technologies aren’t always good, since they allow us to ingest more information than we can manage. I imagine that the constraints of early audio production techniques might have required producers to make decisions earlier about what tape to use in a story, and what to ignore. I imagine that people who learned audio storytelling in the tape-and-razor era are better at predicting what tape will be in their final story, shortly after their interview. It would be interesting to test that.