Audim First Release
- Author: @mratanusarkar
- Created: May 18, 2025
- Last Updated: May 18, 2025
- Compatible with: Audim v0.0.7
Warning
This blog is still a work in progress.
Info
This is the first release of Audim.
This release brings together all the modules and small changes we have developed so far. Let's take a look at how Audim generates a podcast video.
Overview
For this example, we'll transform a conversation between Grant Sanderson (from 3Blue1Brown) and Sal Khan (from Khan Academy) into a visually engaging podcast video. We'll walk through:
- Setup and Installation
- Preparing the input files
- Extracting the audio from the video
- Generating a transcript from the audio
- Setting up the podcast layout
- Generating the final output video with Audim
Step 00: Setup
- we have set up the project and installed the dependencies.
- see docs/setup/installation.md for more details on how to set up the project and install the dependencies.
- for demo purposes, we have decided to use Sal Khan: Beyond Khan Academy | 3b1b Podcast #2 as the input video.
Note: you will have your own recordings when you use audim for your own podcast video generation.
Step 01: Prepare the input files
- we have downloaded this video podcast from YouTube for demo purposes.
- since the full video is too long for a demo, we will use only the section that starts at 19:39, "The next decades of education".
- other than the video, we need a podcast brand logo and profile images for the speakers. I have used the following images from Google:
Step 02: Extract the audio from the video
- we have extracted the audio from the video using Audim's extract module.
- see docs/audim/utils/extract.md API docs for more details.
- see blog v0.0.6 for more details on how to extract the audio from a video file.
Note: In case you have an audio recording instead of a video, you can skip step 02 and use the audio file directly in step 03.
Here's the audio file we have extracted:
extracted audio snippet from the downloaded youtube video
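For readers who want to see roughly what this step involves outside of Audim, here is a minimal sketch that builds an ffmpeg command to pull the audio track out of a video, optionally starting at an offset such as 19:39. This is not Audim's extract module and makes no claim about how it works internally; the helper name `build_ffmpeg_audio_command` and the path `input/podcast.mp4` are hypothetical, and ffmpeg is assumed to be installed and on the PATH.

```python
import shlex
import subprocess
from typing import List, Optional


def build_ffmpeg_audio_command(
    video_path: str,
    audio_path: str,
    start: Optional[str] = None,
    duration: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg command that writes a video's audio track to a file.

    Hypothetical helper for illustration; not part of Audim.
    """
    command = ["ffmpeg", "-y"]  # -y overwrites the output file if it exists
    if start is not None:
        command += ["-ss", start]  # seek to this position in the input
    command += ["-i", video_path]
    if duration is not None:
        command += ["-t", duration]  # limit the length of the output
    command += ["-vn", audio_path]  # -vn drops the video stream
    return command


if __name__ == "__main__":
    # "input/podcast.mp4" is an assumed file name for the downloaded video.
    cmd = build_ffmpeg_audio_command(
        "input/podcast.mp4", "input/podcast.mp3", start="00:19:39"
    )
    print(shlex.join(cmd))
    # Uncomment to actually run the command (requires ffmpeg on PATH):
    # subprocess.run(cmd, check=True)
```

The end of the section is left unspecified here; pass `duration` once you know how long the clip should be.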
Step 03: Generate a transcript from the audio
- we have generated a transcript from the audio using Audim's aud2sub module.
- see Podcast Transcriber API docs for more details.
- see blog v0.0.5 for more details on how to generate a transcript from an audio file.
Note: In case you have a transcript instead of an audio file, you can skip step 03 and use the transcript directly in step 04.
Here's the transcript we have generated:
transcript generated from the audio snippet
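If you bring your own transcript, it can be useful to sanity-check the SRT file before moving on. The sketch below parses SRT cues with the standard library and pulls out a speaker tag from each cue. It assumes speakers are marked as `[Speaker Name]` at the start of the cue text; that convention is an assumption for illustration, so check it against your own file. `parse_srt` and `Cue` are hypothetical names, not part of Audim.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)
# Assumed speaker-tag convention: "[Speaker Name] spoken text".
SPEAKER = re.compile(r"^\[(?P<name>[^\]]+)\]\s*(?P<text>.*)$", re.DOTALL)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the audio
    end: float  # seconds from the beginning of the audio
    speaker: Optional[str]  # None when the cue has no speaker tag
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert SRT timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Cue]:
    """Parse SRT text into cues, skipping blocks that are malformed."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # need an index line, a timing line, and some text
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        parts = match.groups()
        text = " ".join(line.strip() for line in lines[2:])
        tagged = SPEAKER.match(text)
        speaker = tagged.group("name") if tagged else None
        if tagged:
            text = tagged.group("text")
        cues.append(Cue(_seconds(*parts[:4]), _seconds(*parts[4:]), speaker, text))
    return cues
```

A quick check such as `{cue.speaker for cue in parse_srt(text)}` shows which speaker names appear in the file, which is handy for spotting spelling mismatches with the names you register on the layout.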
Step 04: Set up the podcast layout
- we have set up the podcast layout using Audim's sub2pod module.
- see Podcast Layout API docs for more details.
- see blog v0.0.2 for more details on how to set up the podcast layout.
- also, see blog v0.0.3 for the design philosophy behind the podcast layout, and some more variations on the podcast layout.
Here is the final layout and generation code (mostly using the default settings):
from datetime import datetime
from audim.sub2pod.layouts.podcast import PodcastLayout
from audim.sub2pod.core import VideoGenerator
# Create a podcast layout
print("Creating layout...")
layout = PodcastLayout()
# Add speakers and layout tweaks
print("Adding speakers...")
layout.add_speaker("Grant Sanderson", "input/grant.png")
layout.add_speaker("Sal Khan", "input/sal.png")
layout.set_content_offset(200)
# Generate video
print("Generating video...")
generator = VideoGenerator(layout, fps=30)
generator.generate_from_srt(
    srt_path="input/podcast.srt",
    audio_path="input/podcast.mp3",
    logo_path="input/logo.png",
    title="3b1b Podcast: Sal Khan: Beyond Khan Academy",
    cpu_core_utilization="max"
)
# Export the final video
print("Exporting video...")
# Use a separate name so the imported `datetime` class is not shadowed
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
generator.export_video(f"output/podcast_{timestamp}.mp4")
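Since rendering can take a while, it may help to confirm that every input file referenced in the script exists before starting. Here is a minimal sketch using only the standard library; `missing_inputs` is a hypothetical helper and not part of Audim.

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


if __name__ == "__main__":
    # Same input paths as in the generation script above.
    required = [
        "input/grant.png",
        "input/sal.png",
        "input/podcast.srt",
        "input/podcast.mp3",
        "input/logo.png",
    ]
    missing = missing_inputs(required)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

Running this before the generation script surfaces a misplaced image or transcript right away instead of partway through a render.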
Step 05: Generate the video and export final output
- we have generated the video using Audim's sub2pod module.
- see VideoGenerator API docs for more details.
- see blog v0.0.2 for more details on how to generate a video from a transcript.
Here's the final output video we have generated:
final podcast video generated from the input content