You survived the install. Now let's make your first AI video — from opening the UI to watching the final rendered file.
After running both terminals (backend + Streamlit), you'll see this page at localhost:8501. Here's what every setting does and what you can safely ignore.
The only field you must fill in. This is the video's theme — one sentence is enough. For better results, use a list format (see Prompt Crafting below).
Determines what language the AI script and voiceover use. Set to English for English videos, Chinese for Chinese. Mismatched language + voice = broken subtitles.
Pick a voice from the dropdown. Do not leave this blank — it will silently fail after 3 retries with no clear error. Recommended defaults:
en-US-AnaNeural-Female or en-US-ChristopherNeural-Male
zh-CN-XiaoxiaoNeural-Female or zh-CN-YunxiNeural-Male
Target duration in minutes. Default is 1. Longer videos = more script paragraphs = longer generation time. 1–3 minutes is the sweet spot; beyond 5 minutes, quality drops noticeably.
Default is 1080p (1920×1080). Use 720p if you want slightly faster renders during testing. Stick with the default for final videos: resolution barely affects total generation time, since the bottleneck is script writing and footage download, not rendering.
Checked by default. Adds burned-in subtitles synced to the voiceover. If you're in China, uncheck this — the subtitle feature can trigger a Whisper model download (~3GB) from Hugging Face, which is painfully slow without a VPN.
Adds royalty-free background music. Leave as default or pick a track. Has negligible impact on generation time. Volume and fade settings below this are fine at defaults.
The topic field is the single biggest factor in video quality. A bad prompt produces a rambling, generic video no matter how well everything else is configured.
| Bad Topic | Problem |
|---|---|
| "AI" | Too vague. The LLM has no direction — it'll produce generic fluff. |
| "How to be happy" | Open-ended. No structure. Output is philosophical rambling with no visual hooks. |
| "The history of the Roman Empire" | Way too broad for a 1-minute video. The script will be a shallow summary. |
| "Why my product is the best" | Promotional tone. AI generates marketing-speak. Pexels has no relevant footage. |
| Good Topic | Why it works |
|---|---|
| "Top 5 facts about black holes that will blow your mind" | List format = natural structure. "Blow your mind" sets an engaging tone. Visual hooks are obvious (space footage). |
| "3 things I wish I knew before visiting Tokyo" | List + personal angle. Easy to find matching Pexels footage. ~60 seconds of content fits naturally. |
| "Why cats sleep so much: the science explained" | Specific question → answer structure. Clear visual direction (cat footage). Curiosity-driven title. |
| "Beginner's guide to investing: 4 things to do first" | Actionable, numbered, clear audience. Works as an educational explainer. |
[Number] + [interesting angle] + [specific topic]. Examples: "7 surprising ways...", "3 mistakes people make when...", "How X actually works (in 60 seconds)". List-based topics produce the most watchable videos because each bullet point becomes a natural scene transition.
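The formula above can be sketched as a tiny helper. This is a hypothetical illustration, not part of MPT: `build_topic` and its parameter names are made up here, and the result is just a string you would paste into the topic field.

```python
def build_topic(count: int, angle: str, subject: str) -> str:
    """Assemble a list-style topic: [Number] + [interesting angle] + [specific topic]."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Trim stray leading/trailing whitespace so the pieces join cleanly.
    parts = [str(count), angle.strip(), subject.strip()]
    return " ".join(p for p in parts if p)


topic = build_topic(3, "mistakes people make when", "visiting Tokyo")
print(topic)
```

Keeping the number, the angle, and the subject as separate inputs makes it easy to try several angles on the same subject and compare the resulting videos.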
Video Language: English (US)
Voice: en-US-AnaNeural-Female (clear, natural)
Subtitle: ON (default)
Stick with en-US- voices. UK voices (en-GB-) are fine but sound slightly more formal. Avoid en-IN- (Indian accent) unless your audience expects it — the accent is thick and may reduce perceived quality.
Video Language: Chinese (Mandarin) / 中文
Voice: zh-CN-XiaoxiaoNeural-Female (xiaoxiao, natural)
Subtitle: OFF (skip Whisper download)
Chinese videos work well with MPT, but two things to know: (1) Pexels search is English-only, so the footage may not match Chinese topics as well; use specific English keywords in your topic to help. (2) Edge TTS Chinese voices are good quality, but there are fewer of them than English voices.
MPT supports Japanese, Korean, German, French, and more via Edge TTS. Check the voice dropdown for available options. The LLM prompt is sent in English regardless of the video language setting, so script quality varies for languages other than English and Chinese.
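Since a mismatched language and voice breaks subtitles, it can help to sanity-check the pair before you hit generate. The sketch below is an assumption-laden illustration, not MPT code: it assumes voice names follow the `<lang>-<REGION>-<Name>-<Gender>` pattern used in this guide (e.g. `en-US-AnaNeural-Female`), and `voice_locale` and `voice_matches_language` are hypothetical helper names.

```python
def voice_locale(voice: str) -> str:
    """Return the locale prefix of a voice name, e.g. 'en-US' from 'en-US-AnaNeural-Female'."""
    parts = voice.split("-")
    if len(parts) < 3:
        raise ValueError(f"unexpected voice name: {voice!r}")
    return "-".join(parts[:2])


def voice_matches_language(voice: str, language_code: str) -> bool:
    """True when the voice's locale starts with the language code ('en', 'zh', ...)."""
    return voice_locale(voice).lower().startswith(language_code.lower() + "-")


print(voice_matches_language("en-US-AnaNeural-Female", "en"))       # True
print(voice_matches_language("zh-CN-XiaoxiaoNeural-Female", "en"))  # False
```

The check only compares the language part of the locale, so en-US- and en-GB- voices both pass for English.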
Edge TTS connects to speech.platform.bing.com, which is blocked in China. Keep your VPN on during video generation or you'll get a 403 error. See the edge-tts fix guide.
Understanding the pipeline helps you debug when things go wrong. Here's the sequence, in order:
moviepy.editor not found at this stage means you need v1.0.3 (see moviepy fix).
MoneyPrinterTurbo/output/ as an MP4 file. Streamlit shows a download link and a preview player.
The result is a slideshow-video, not AI-generated footage. MPT doesn't create video from scratch; it assembles existing stock clips. Think of it as an automated version of: find stock footage → write narration → record voiceover → edit together in a timeline.
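If you hit the `moviepy.editor not found` error mentioned above, a quick version check can confirm the cause before you start reinstalling things. This is a minimal sketch, not part of MPT: the helper names are hypothetical, and it relies on the assumption that moviepy 1.x ships the `moviepy.editor` module while 2.x dropped it. Run it inside the same Python environment that runs MPT.

```python
from importlib import metadata
from typing import Optional


def has_editor_module(version: str) -> bool:
    """Assumption: moviepy 1.x includes moviepy.editor, 2.x and later do not."""
    major = int(version.split(".")[0])
    return major < 2


def installed_moviepy_version() -> Optional[str]:
    """Return the installed moviepy version, or None if it is not installed."""
    try:
        return metadata.version("moviepy")
    except metadata.PackageNotFoundError:
        return None


version = installed_moviepy_version()
if version is None:
    print("moviepy is not installed in this environment")
elif has_editor_module(version):
    print(f"moviepy {version}: moviepy.editor should be importable")
else:
    print(f"moviepy {version}: too new, pin v1.0.3 as the moviepy fix describes")
```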
The first video is rarely the best. Try these adjustments:
config.toml. Dramatically faster footage search with a key vs without.
Once you have a winning formula, batch-produce videos by reusing settings. MPT doesn't have built-in batch mode, but you can: