
I have a confession: I hate reading long documents.
Not because I’m lazy (okay, maybe a little). But because my brain works better with audio. Give me a dense research paper and my eyes glaze over by page three. Play me a podcast discussing that same paper? I’ll absorb every detail while doing the dishes.
Google’s NotebookLM seemed like the answer. Upload any document, get a surprisingly natural two-host podcast. Magic. I was hooked.
Until I wasn’t.
The Problem With NotebookLM
After a few weeks of heavy use, the cracks started showing:
It’s a black box. Gemini is the only model. You can’t switch to Claude or GPT-4 when Gemini’s interpretation isn’t quite right. You’re stuck with whatever Google decides to give you.
Your documents go to Google. Maybe you’re fine with that. I had some internal docs I wasn’t comfortable uploading to a third-party service with unclear data retention policies.
Zero customization. Want different voices? Too bad. Want longer episodes? Nope. Want to integrate it into your own workflow? Good luck.
No self-hosting option. This one hurt the most. I wanted to run it on my own infrastructure, on my own terms.
So I did what any reasonable person would do: I spent way too many weekends building my own version.

Meet MyNotebookLM
MyNotebookLM is an open-source alternative that does what NotebookLM's Audio Overviews do, plus a bunch of things they don't.
Upload a document. Get a podcast. But now you decide:
- Which AI model generates the script — OpenAI, Azure, Ollama, DeepSeek, or whatever’s next
- Which voices narrate it — ElevenLabs, Azure, Edge TTS, OpenAI, SparkTTS
- How long the episode is — 5 minutes for a quick summary, 30 minutes for a deep dive
- Where it runs — Your laptop, your server, your cloud
No vendor lock-in. No data leaving your network if you don’t want it to.
Watch It In Action
I recorded a quick walkthrough showing how the whole thing works — from upload to finished podcast in under two minutes.
▶️ Watch: MyNotebookLM Demo (2 min)
The Hardest Part Wasn’t The Code
Building the pipeline was straightforward. Extract text. Send it to an LLM. Convert to speech. Done.
The hard part? Making two AI hosts sound like actual humans.
Early versions were painful to listen to:
Host 1: “The document discusses the implementation of neural networks.”
Host 2: “That is an accurate summary. Neural networks are indeed important.”
Nobody wants to listen to two robots agreeing with each other for fifteen minutes.

The fix was all in the prompts. I spent hours crafting instructions that encourage:
- Reactions and interruptions — “Wait, hold on. Are you saying…?”
- Building on ideas instead of just agreeing
- Asking clarifying questions like a real conversation
- Occasional tangents and humor to keep it human
The difference is night and day. The podcasts now sound like two people who actually find the material interesting.
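Here's roughly what that kind of instruction set looks like in code. This is a minimal sketch, not the prompt the repo actually ships: `build_dialogue_prompt`, `CONVERSATION_RULES` and the `Host: line` output format are all hypothetical.

```python
# Hypothetical sketch: assemble a system prompt that nudges two hosts toward
# a natural conversation. Not the actual prompt MyNotebookLM uses.
CONVERSATION_RULES = [
    "React to each other: interrupt, push back, or say 'Wait, hold on...' when something is surprising.",
    "Build on the other host's point instead of simply agreeing with it.",
    "Ask clarifying questions the way a curious listener would.",
    "Allow short tangents and light humor, then steer back to the material.",
]


def build_dialogue_prompt(host_a: str, host_b: str, minutes: int) -> str:
    """Return a system prompt for a two-host script of roughly `minutes` length."""
    rules = "\n".join(f"- {rule}" for rule in CONVERSATION_RULES)
    return (
        f"You are writing a podcast script for two hosts, {host_a} and {host_b}.\n"
        f"The episode should run about {minutes} minutes when read aloud.\n"
        f"Follow these conversation rules:\n{rules}\n"
        f"Format every line as '{host_a}: ...' or '{host_b}: ...'."
    )


if __name__ == "__main__":
    print(build_dialogue_prompt("Alex", "Sam", minutes=10))
```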
What Can It Handle?
Pretty much anything text-based:
- PDFs — Research papers, ebooks, reports
- Word docs and PowerPoints — Meeting notes, presentations
- URLs — Blog posts, news articles (auto-extracts the content)
- YouTube videos — Grabs the transcript and works with that
- Plain text — For when you just want to paste something in
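Podcastfy does the heavy lifting on extraction, but something still has to decide which extractor a given input goes to. A rough sketch of that dispatch step, assuming every input arrives as a single string (URL, file path, or pasted text); `classify_source` and the extension table are made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from file extension to source type. A real pipeline
# would hand each type to a matching extractor (PDF parser, transcript fetcher, ...).
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".docx": "word",
    ".pptx": "powerpoint",
    ".txt": "text",
    ".md": "text",
}

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Guess which extractor should handle `source` (a URL, a file path, or raw text)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.hostname or ""  # hostname is already lowercased, port stripped
        return "youtube" if host in YOUTUBE_HOSTS else "webpage"
    suffix = Path(source).suffix.lower()
    if suffix in EXTENSION_TYPES:
        return EXTENSION_TYPES[suffix]
    # Anything else is treated as pasted plain text.
    return "text"
```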
The Stack (For The Nerds)
If you care about the technical bits:
- Python 3.11 running the show
- Streamlit for the web UI (shipped fast, looks decent)
- Podcastfy as the foundation for content extraction
- Abstraction layers for LLMs and TTS so you can swap providers
- Docker for one-command deployment
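The provider abstraction is the piece that makes swapping possible. One common way to do it in Python is a `typing.Protocol` plus a registry; this sketch assumes that pattern, and `TTSProvider`, `FakeTTS` and `get_tts` are hypothetical names rather than classes from the repo.

```python
from typing import Callable, Protocol


class TTSProvider(Protocol):
    """Anything that can turn one line of dialogue into audio bytes."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


class FakeTTS:
    """Stand-in provider for testing; a real one would call ElevenLabs, Edge TTS, etc."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}] {text}".encode("utf-8")


# Registry of provider factories, keyed by the name a user picks in the UI or config.
TTS_PROVIDERS: dict[str, Callable[[], TTSProvider]] = {
    "fake": FakeTTS,
}


def get_tts(name: str) -> TTSProvider:
    """Instantiate the provider registered under `name`."""
    try:
        return TTS_PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {name!r}") from None
```

With this shape, supporting a new TTS service means writing one class with a `synthesize` method and adding it to the registry; nothing downstream has to change.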
The architecture is simple by design:

Upload a file. Extract the text. Generate a two-host script. Convert each segment to speech. Merge into a podcast. Done.
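Stripped to its bones, that flow is four function calls. In the sketch below each stage is passed in as a plain function so any LLM or TTS backend can be plugged in; `make_podcast`, `parse_script` and the `Speaker: line` script format are assumptions for illustration, not the repo's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    speaker: str
    text: str


def parse_script(script: str) -> list[Segment]:
    """Split 'Speaker: line' rows into segments, skipping blank or malformed rows."""
    segments = []
    for row in script.splitlines():
        speaker, sep, text = row.partition(":")  # split on the first colon only
        if sep and text.strip():
            segments.append(Segment(speaker.strip(), text.strip()))
    return segments


def make_podcast(
    document: str,
    extract: Callable[[str], str],
    write_script: Callable[[str], str],
    speak: Callable[[Segment], bytes],
    merge: Callable[[list[bytes]], bytes],
) -> bytes:
    text = extract(document)                      # 1. extract the text
    segments = parse_script(write_script(text))  # 2. LLM writes the two-host script
    clips = [speak(seg) for seg in segments]     # 3. one TTS call per segment
    return merge(clips)                          # 4. merge into a single episode
```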
Features I’m Proud Of
Episode length control. You pick how deep to go. A 5-minute episode hits the highlights. A 30-minute episode covers everything. The AI restructures the content accordingly — no awkward truncation.
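One way to avoid truncation is to give the LLM a word budget up front instead of cutting the script afterwards. A back-of-the-envelope version, assuming conversational speech runs at about 150 words per minute (a rough figure; the tool may calculate this differently, and `word_budget` is a hypothetical helper):

```python
# Assumption: conversational speech runs at roughly 150 words per minute.
WORDS_PER_MINUTE = 150


def word_budget(minutes: int, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate script length, in words, for an episode of `minutes` minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return minutes * wpm


for minutes in (5, 15, 30):
    print(f"{minutes:>2} min -> ~{word_budget(minutes)} words")
```

At that rate a 5-minute episode is a ~750-word script and a 30-minute one is ~4,500 words, which is the kind of number you'd put straight into the prompt.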
Session history. Every generation gets saved. Load previous settings with one click. Re-generate with different voices if the first attempt wasn’t quite right.
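Saving a generation can be as simple as dumping its settings to JSON. A minimal sketch, assuming one file per session in a local `history` folder; the layout and the `save_session`/`load_sessions` helpers are hypothetical.

```python
import json
import time
import uuid
from pathlib import Path

# Hypothetical layout: one JSON file per generation inside a local history folder.
HISTORY_DIR = Path("history")


def save_session(settings: dict, history_dir: Path = HISTORY_DIR) -> Path:
    """Store the settings used for one generation and return the file path."""
    history_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp first so names sort chronologically; the uuid suffix avoids clashes.
    path = history_dir / f"session-{time.time_ns()}-{uuid.uuid4().hex[:8]}.json"
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path


def load_sessions(history_dir: Path = HISTORY_DIR) -> list[dict]:
    """Return all saved settings, newest first."""
    files = sorted(history_dir.glob("session-*.json"), reverse=True)
    return [json.loads(f.read_text(encoding="utf-8")) for f in files]
```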
Custom intro/outro. Upload your own branding audio. The tool stitches it in automatically.
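If the intro, episode and outro are all WAV files in the same format, Python's standard `wave` module is enough to stitch them together. This is a sketch under that assumption, not how the tool does it; MP3 files or mismatched sample rates would need a decoder and resampler such as ffmpeg (which pydub wraps). `stitch_wavs` is a hypothetical name.

```python
import wave
from typing import BinaryIO


def stitch_wavs(parts: list[BinaryIO], out: BinaryIO) -> None:
    """Concatenate WAV streams (intro, episode, outro, ...) into one WAV.

    All parts must share channel count, sample width and sample rate.
    """
    params = None
    frames = []
    for part in parts:
        with wave.open(part, "rb") as reader:
            current = tuple(reader.getparams()[:3])  # (nchannels, sampwidth, framerate)
            if params is None:
                params = current
            elif current != params:
                raise ValueError(f"format mismatch: {current} != {params}")
            frames.append(reader.readframes(reader.getnframes()))
    if params is None:
        raise ValueError("no audio parts given")
    with wave.open(out, "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        writer.writeframes(b"".join(frames))
```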
Graceful failure handling. TTS APIs fail sometimes. Rate limits hit. The tool retries failed segments, continues with successful ones, and tells you exactly what went wrong.
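The retry-and-continue logic boils down to a loop like this. A minimal sketch: `synthesize_all`, the exponential backoff schedule and the error-report format are assumptions, and `speak` stands in for whichever TTS call you're using.

```python
import time
from typing import Callable


def synthesize_all(
    segments: list[str],
    speak: Callable[[str], bytes],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> tuple[list[bytes], dict[int, str]]:
    """Synthesize each segment, retrying failures with exponential backoff.

    Returns the audio for segments that succeeded (in order) and a map of
    segment index -> last error message for the ones that never did.
    """
    clips: list[bytes] = []
    errors: dict[int, str] = {}
    for i, text in enumerate(segments):
        for attempt in range(attempts):
            try:
                clips.append(speak(text))
                break
            except Exception as exc:  # e.g. a rate limit or timeout from a TTS API
                if attempt == attempts - 1:
                    errors[i] = f"{type(exc).__name__}: {exc}"
                else:
                    time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    return clips, errors
```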
Try It Yourself
Live demo: mynotebooklm.jackhui.com.au
GitHub: github.com/jack-jackhui/myNotebookLm
Self-hosting takes about five minutes:
git clone https://github.com/jack-jackhui/myNotebookLm.git
cd myNotebookLm
cp .env.example .env # Add your API keys
docker compose up -d
Open http://localhost:8501. Start making podcasts.

What’s Coming Next
I’m actively working on:
- RSS feed generation — Publish straight to podcast platforms
- YouTube auto-upload — One-click publishing with auto-generated thumbnails
- Voice cloning — Use SparkTTS to create custom host voices
- Batch processing — Drop a folder of documents, get a folder of podcasts
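RSS support isn't built yet, but the format itself is small: a podcast feed is an RSS 2.0 document where each episode is an `<item>` whose `<enclosure>` points at the audio file. A standard-library sketch, with a hypothetical `build_rss` and episode dict keys; real directories such as Apple Podcasts also expect extra tags (the iTunes namespace) that this leaves out.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_rss(title: str, link: str, description: str, episodes: list[dict]) -> str:
    """Return a minimal RSS 2.0 feed as a string.

    Each episode dict is assumed to have "title", "url" and "size_bytes",
    plus an optional Unix "timestamp" (defaults to now).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # RSS dates use the RFC 822 format, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        ET.SubElement(item, "pubDate").text = formatdate(ep.get("timestamp"), usegmt=True)
        # Podcast apps download the audio from <enclosure>; MP3 is assumed here.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")
```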
Why I Made This Open Source
Tools like this shouldn’t be locked behind corporate walls. Whether you’re:
- A researcher making papers accessible to broader audiences
- A content creator repurposing your blog archive
- A student who learns better through audio
- A developer who wants to customize every detail
You should be able to use it, modify it, deploy it. For free. Forever.
The code is MIT licensed. Do whatever you want with it.
If it helps you out, a GitHub star would be nice. ⭐
Built by Jack Hui — I automate things and occasionally write about it.