Core (Video Generation Module)

The core module is the main Video Generation Engine. It is responsible for the overall structure and flow of the podcast video.

It uses a layout object to define the visual arrangement of the video. The layout, in turn, uses a collection of elements and their effects to define the components of each frame, along with their animations and transitions.

Below is the API documentation for the core module:

audim.sub2pod.core

VideoGenerator

Core engine for generating videos from SRT files

This class generates video frames from an SRT subtitle file. The subtitle file must follow our extended SRT format, which adds speaker identification to standard SRT:

  • Standard SRT format with sequential numbering, timestamps, and text content
  • Speaker identification in square brackets at the beginning of each subtitle text, for example: "[Host] Welcome to our podcast!"

Example of expected SRT format:

1
00:00:00,000 --> 00:00:04,500
[Host] Welcome to our podcast!

2
00:00:04,600 --> 00:00:08,200
[Guest] Thank you! Glad to be here.

The speaker tag is used to visually distinguish different speakers in the generated video, and is mandatory for the core engine to work.
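To make the extended format concrete, here is a minimal parsing sketch. It is not the parser used by the core engine; the names `Cue`, `SPEAKER_RE`, and `parse_extended_srt` are hypothetical, and rejecting untagged cues with a `ValueError` is an assumption based on the statement that the tag is mandatory.

```python
import re
from typing import List, NamedTuple

# Matches a leading "[Speaker]" tag followed by the spoken text.
SPEAKER_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.*)$", re.DOTALL)


class Cue(NamedTuple):
    """One subtitle entry (hypothetical container, not part of audim)."""
    index: int
    start: str
    end: str
    speaker: str
    text: str


def parse_extended_srt(content: str) -> List[Cue]:
    """Split SRT text into cues and extract the speaker tag from each one."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            raise ValueError(f"Incomplete cue: {block!r}")
        # Second line holds "start --> end" timestamps.
        start, _, end = lines[1].partition(" --> ")
        match = SPEAKER_RE.match("\n".join(lines[2:]).strip())
        if match is None:
            raise ValueError(f"Cue {lines[0].strip()} has no [Speaker] tag")
        cues.append(
            Cue(
                index=int(lines[0]),
                start=start.strip(),
                end=end.strip(),
                speaker=match["speaker"].strip(),
                text=match["text"].strip(),
            )
        )
    return cues
```

Running this over the two-cue example above yields one cue for "Host" and one for "Guest", while a cue without a leading bracketed tag is rejected.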

The generator takes a layout object that defines the visual arrangement of the video.

__init__(layout, fps=30, batch_size=300)

Initialize the video generator

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| layout | | Layout object that defines the visual arrangement | required |
| fps | int | Frames per second for the output video | 30 |
| batch_size | int | Number of frames to process in a batch before writing to disk | 300 |
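As a rough guide to how `fps` and `batch_size` interact, the sketch below estimates how many frames and batches a clip of a given length produces. The helper `frame_budget` is hypothetical, and the rounding it applies is an assumption; the generator may count frames differently.

```python
import math


def frame_budget(duration_seconds: float, fps: int = 30, batch_size: int = 300):
    """Return (total_frames, batch_count) for a clip of the given length."""
    # Assumption: a partially covered frame still counts as one frame.
    total_frames = math.ceil(duration_seconds * fps)
    # The last batch may be smaller than batch_size.
    batch_count = math.ceil(total_frames / batch_size)
    return total_frames, batch_count


print(frame_budget(10))    # (300, 1): ten seconds fit in a single default batch
print(frame_budget(1800))  # (54000, 180): a 30-minute episode at the defaults
```

With the defaults, one batch covers ten seconds of video, so a larger `batch_size` means fewer disk writes at the cost of holding more frames in memory at once.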

generate_from_srt(srt_path, audio_path=None, logo_path=None, title=None, cpu_core_utilization='most')

Generate video frames from an SRT file

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| srt_path | str | Path to the SRT file | required |
| audio_path | str | Path to the audio file | None |
| logo_path | str | Path to the logo image | None |
| title | str | Title for the video | None |
| cpu_core_utilization | str | One of 'single', 'half', 'most', 'max'; the options are described below the table | 'most' |

cpu_core_utilization options:

  • single: Uses 1 CPU core
  • half: Uses half of available CPU cores
  • most: (default) Uses all available CPU cores except one
  • max: Uses all available CPU cores for maximum performance
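The `cpu_core_utilization` options can be read as a mapping from a name to a worker count. The following sketch shows one way to express that mapping; `resolve_worker_count` is a hypothetical helper, not the library's implementation, and rounding 'half' down on odd core counts and never returning fewer than one worker are both assumptions.

```python
import os
from typing import Optional


def resolve_worker_count(cpu_core_utilization: str = "most",
                         available: Optional[int] = None) -> int:
    """Translate a utilization option into a number of worker processes."""
    # os.cpu_count() can return None, so fall back to a single core.
    cores = available if available is not None else (os.cpu_count() or 1)
    if cpu_core_utilization == "single":
        return 1
    if cpu_core_utilization == "half":
        return max(1, cores // 2)   # assumption: round down, minimum 1
    if cpu_core_utilization == "most":
        return max(1, cores - 1)    # leave one core free for the system
    if cpu_core_utilization == "max":
        return max(1, cores)
    raise ValueError(f"Unknown cpu_core_utilization: {cpu_core_utilization!r}")


print(resolve_worker_count("most", available=8))  # 7
print(resolve_worker_count("half", available=8))  # 4
```

The `available` argument exists only so the mapping can be checked on any machine; omit it to use the detected core count.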

export_video(output_path, encoder='auto', video_codec=None, audio_codec=None, video_bitrate='8M', audio_bitrate='192k', preset='medium', crf=23, threads=None, gpu_acceleration=True, extra_ffmpeg_args=None)

Export the generated frames as a video

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| output_path | str | Path for the output video file | required |
| encoder | str | Encoding method to use: 'ffmpeg', 'moviepy', or 'auto' | 'auto' |
| video_codec | str | Video codec to use (default: 'h264_nvenc' for GPU, 'libx264' for CPU) | None |
| audio_codec | str | Audio codec to use (default: 'aac') | None |
| video_bitrate | str | Video bitrate | '8M' |
| audio_bitrate | str | Audio bitrate | '192k' |
| preset | str | Encoding preset; CPU and GPU behavior is described under "preset" below the table | 'medium' |
| crf | int | Constant Rate Factor for quality (lower is better quality); details under "crf" below the table | 23 |
| threads | int | Number of encoding threads (default: CPU count - 1) | None |
| gpu_acceleration | bool | Whether to use GPU acceleration if available | True |
| extra_ffmpeg_args | list | Additional FFmpeg arguments as a list | None |

preset:

For CPU encoding (libx264), the options are 'ultrafast', 'superfast', 'veryfast', 'faster', 'fast', 'medium', 'slow', 'slower', and 'veryslow'. Slower presets give better compression/quality at the cost of encoding time.

For GPU encoding (NVENC), the preset is automatically converted to an NVENC preset:

  • 'slow'/'slower'/'veryslow' → 'p1' (highest quality)
  • 'medium' → 'p3' (balanced)
  • 'fast'/'faster' → 'p5' (faster encoding)
  • 'veryfast'/'superfast'/'ultrafast' → 'p7' (fastest encoding)

crf:

  • Range: 0-51, where lower values mean better quality and larger file size
  • Recommended range: 18-28
  • See CRF Guide
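The preset conversion for GPU encoding and the CRF range documented above can be captured in a small lookup. This is only a sketch transcribed from the documented values: `NVENC_PRESET_MAP`, `resolve_preset`, and `validate_crf` are hypothetical names, and raising `ValueError` on invalid input is an assumption about how such input might be handled.

```python
# libx264 preset name -> NVENC preset, as listed in the parameter description.
NVENC_PRESET_MAP = {
    "slow": "p1", "slower": "p1", "veryslow": "p1",
    "medium": "p3",
    "fast": "p5", "faster": "p5",
    "veryfast": "p7", "superfast": "p7", "ultrafast": "p7",
}


def resolve_preset(preset: str = "medium", use_gpu: bool = False) -> str:
    """Return the preset to pass to the encoder for CPU or GPU encoding."""
    if preset not in NVENC_PRESET_MAP:
        raise ValueError(f"Unknown preset: {preset!r}")
    # CPU encoding keeps the libx264 name; GPU encoding uses the mapped value.
    return NVENC_PRESET_MAP[preset] if use_gpu else preset


def validate_crf(crf: int = 23) -> int:
    """Check that a CRF value lies in the documented 0-51 range."""
    if not 0 <= crf <= 51:
        raise ValueError(f"crf must be between 0 and 51, got {crf}")
    return crf


print(resolve_preset("slow", use_gpu=True))   # p1
print(resolve_preset("slow", use_gpu=False))  # slow
```

Keeping the mapping in a single dictionary makes it easy to compare against the documented conversion line by line.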