Audim First Release

Author: @mratanusarkar

Created: May 18, 2025

Last Updated: May 18, 2025

Compatible with: Audim v0.0.7

Warning

This blog is still a work in progress.

Info

This is the first release of Audim.

With all the modules and incremental changes developed so far now in one place, let's look at how Audim generates a podcast video from start to finish.

Overview

For this example, we'll transform a conversation between Grant Sanderson (from 3Blue1Brown) and Sal Khan (from Khan Academy) into a visually engaging podcast video. We'll walk through:

Setup and Installation
Preparing the input files
Extracting the audio from the video
Generating a transcript from the audio
Setting up the podcast layout
Generating the final output video with Audim

Step 00: Setup

We have set up the project and installed the dependencies.
See docs/setup/installation.md for details on setting up the project and installing the dependencies.
For demo purposes, we use "Sal Khan: Beyond Khan Academy | 3b1b Podcast #2" as the input video.

Note: When you use Audim to generate your own podcast videos, you will start from your own recordings.

Step 01: Prepare the input files

We downloaded this video podcast from YouTube for demo purposes.

Since the full video is too long for a demo, we use only the section that starts at 19:39, "The next decades of education".
Besides the video, we need a podcast brand logo and a profile image for each speaker. I used images found through Google for these.

Step 02: Extract the audio from the video

We extracted the audio from the video using Audim's extract module.
See the docs/audim/utils/extract.md API docs for details.
See the v0.0.6 blog post for more on extracting audio from a video file.

Note: If you have an audio recording instead of a video, you can skip step 02 and use the audio file directly in step 03.

Here's the audio file we have extracted:

extracted audio snippet from the downloaded youtube video
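
To reproduce the trimming and extraction outside Audim, the same step can be approximated with the ffmpeg command-line tool. The sketch below is not Audim's extract module. The helper names, the file paths, and the optional duration are illustrative assumptions, and it assumes ffmpeg is installed with the libmp3lame encoder available.

```python
import subprocess
from typing import List, Optional


def build_extract_command(
    video_path: str,
    audio_path: str,
    start: str = "00:19:39",
    duration: Optional[str] = None,
) -> List[str]:
    """Build an ffmpeg command that seeks to `start` and writes an audio-only MP3."""
    # -ss before -i seeks in the input; -y overwrites an existing output file.
    cmd = ["ffmpeg", "-y", "-ss", start, "-i", video_path]
    if duration is not None:
        # -t limits how much is written; without it ffmpeg runs to the end of the file.
        cmd += ["-t", duration]
    # -vn drops the video stream; the audio is encoded as variable-bitrate MP3.
    cmd += ["-vn", "-c:a", "libmp3lame", "-q:a", "2", audio_path]
    return cmd


def extract_audio(
    video_path: str,
    audio_path: str,
    start: str = "00:19:39",
    duration: Optional[str] = None,
) -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(
        build_extract_command(video_path, audio_path, start, duration),
        check=True,
    )
```

Calling extract_audio("input/podcast.mp4", "input/podcast.mp3") with a hypothetical video path would write the audio from 19:39 onward. Pass a duration such as "00:05:00" to cut the clip where the section ends.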

Step 03: Generate a transcript from the audio

We generated a transcript from the audio using Audim's aud2sub module.
See the Podcast Transcriber API docs for details.
See the v0.0.5 blog post for more on generating a transcript from an audio file.

Note: If you already have a transcript instead of an audio file, you can skip step 03 and use the transcript directly in step 04.

Here's the transcript we have generated:

transcript generated from the audio snippet
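
Before moving on, it can help to sanity-check the transcript, since the speaker names in it need to line up with the speakers added to the layout. The sketch below parses SRT text with the standard library and lists the speakers it finds. It assumes each subtitle starts with a speaker label in square brackets, which is an assumption about the transcript format rather than something stated in this post. parse_srt is a hypothetical helper, not part of Audim, and the sample text is a made-up placeholder, not the real transcript.

```python
import re
from typing import Dict, List

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:02,500".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")
# Assumed speaker convention: "[Speaker Name] spoken text".
SPEAKER = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$")


def parse_srt(srt_text: str) -> List[Dict[str, str]]:
    """Return one dict per subtitle with start, end, speaker, and text."""
    entries = []
    # Subtitles are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        timing_idx = next(
            (i for i, line in enumerate(lines) if TIMING.search(line)), None
        )
        if timing_idx is None:
            continue  # skip blocks without a timing line
        timing = TIMING.search(lines[timing_idx])
        text = " ".join(lines[timing_idx + 1:])
        labelled = SPEAKER.match(text)
        entries.append({
            "start": timing.group(1),
            "end": timing.group(2),
            "speaker": labelled.group("speaker") if labelled else "",
            "text": labelled.group("text") if labelled else text,
        })
    return entries


# Placeholder subtitles for illustration only.
SAMPLE = """1
00:00:00,000 --> 00:00:02,500
[Grant Sanderson] Placeholder question.

2
00:00:02,500 --> 00:00:05,000
[Sal Khan] Placeholder answer.
"""

entries = parse_srt(SAMPLE)
speakers = sorted({entry["speaker"] for entry in entries})
print(speakers)
```

An empty speaker name, or a name that does not match one passed to add_speaker, is worth fixing in the transcript before rendering.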

Step 04: Set up the podcast layout

We set up the podcast layout using Audim's sub2pod module.
See the Podcast Layout API docs for details.
See the v0.0.2 blog post for more on setting up the podcast layout.
Also see the v0.0.3 blog post for the design philosophy behind the podcast layout and some further layout variations.

Here is the final layout and generation code (mostly using the default settings):

from datetime import datetime
from audim.sub2pod.layouts.podcast import PodcastLayout
from audim.sub2pod.core import VideoGenerator

# Create a podcast layout
print("Creating layout...")
layout = PodcastLayout()

# Add speakers and layout tweaks
print("Adding speakers...")
layout.add_speaker("Grant Sanderson", "input/grant.png")
layout.add_speaker("Sal Khan", "input/sal.png")
layout.set_content_offset(200)

# Generate video
print("Generating video...")
generator = VideoGenerator(layout, fps=30)
generator.generate_from_srt(
    srt_path="input/podcast.srt",
    audio_path="input/podcast.mp3",
    logo_path="input/logo.png",
    title="3b1b Podcast: Sal Khan: Beyond Khan Academy",
    cpu_core_utilization="max"
)

# Export the final video
print("Exporting video...")
# Use a separate name so the imported datetime class is not shadowed
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
generator.export_video(f"output/podcast_{timestamp}.mp4")
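
Because a long render can fail late on a missing file, a small pre-flight check may be worth running first. The sketch below uses only the standard library and the input paths from the script above. find_missing_inputs and prepare_run are hypothetical helpers, not part of Audim, and creating the output directory up front is a precaution rather than a documented requirement.

```python
from pathlib import Path
from typing import Iterable, List

# Paths taken from the generation script above.
REQUIRED_INPUTS = [
    "input/podcast.srt",
    "input/podcast.mp3",
    "input/logo.png",
    "input/grant.png",
    "input/sal.png",
]


def find_missing_inputs(paths: Iterable[str], base_dir: str = ".") -> List[str]:
    """Return the paths (relative to base_dir) that are not existing files."""
    base = Path(base_dir)
    return [path for path in paths if not (base / path).is_file()]


def prepare_run(base_dir: str = ".") -> Path:
    """Fail early on missing inputs, then make sure the output directory exists."""
    missing = find_missing_inputs(REQUIRED_INPUTS, base_dir)
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    output_dir = Path(base_dir) / "output"
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

Calling prepare_run() at the top of the script stops with a clear error before any layout or rendering work starts.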

Step 05: Generate the video and export final output

We generated the video using Audim's sub2pod module.
See the VideoGenerator API docs for details.
See the v0.0.2 blog post for more on generating a video from a transcript.

Here's the final output video we have generated:

final podcast video generated from the input content