Overview
Create professional-quality talking videos by animating a still photo of a person with realistic lip-sync and natural expressions. Choose Realistic mode for high-fidelity results with accurate lip-sync, or Expressive mode for a more animated, expression-guided style. Both suit marketing videos, presentations, social media, and personalized messages.
Realistic is the default and recommended option for best results. It provides sharper details, more realistic skin textures, and superior lip-sync accuracy compared to Expressive mode.
Prerequisites
A Magic Hour account (free or paid)
A clear, front-facing photo of a person (JPG, PNG, or similar image format)
Audio source: text to convert to speech, a cloned voice, or an audio file (MP3, WAV, etc.)
Available credits (free plan provides 400 credits; generation cost varies by duration and mode)
Before You Begin
Plan your photo: For best results, use a high-quality, well-lit image where the person's face is clearly visible and facing forward. Blurry, side-angled, or heavily obscured faces will produce lower-quality results.
Credit usage: Credit cost is based on video duration, frame rate, and the generation mode selected. Realistic mode costs more per second than Expressive mode — check the credit estimate in the UI before generating. Free-plan users generate watermarked outputs at 576px resolution; upgrading unlocks higher resolutions and removes watermarks.
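To budget before opening the tool, the dependence of cost on duration, frame rate, and mode can be sketched as a simple linear model. This is only an illustration: the formula and the `credits_per_frame` parameter are assumptions, not Magic Hour's actual pricing, and this guide does not publish real rates. Treat the estimate shown in the UI as the authoritative number.

```python
import math


def estimate_credits(duration_s: float, fps: int, credits_per_frame: float) -> int:
    """Hypothetical linear cost model: credits scale with the number of frames.

    credits_per_frame is a placeholder the caller must supply (it would differ
    between Realistic and Expressive mode). It is NOT a published rate.
    """
    if duration_s <= 0 or fps <= 0 or credits_per_frame <= 0:
        raise ValueError("duration, fps, and rate must all be positive")
    # Round up so the estimate never understates the cost.
    return math.ceil(duration_s * fps * credits_per_frame)


def fits_budget(cost: int, available_credits: int) -> bool:
    """True if the estimated cost is covered by the available credits."""
    return cost <= available_credits
```

For example, `fits_budget(estimate_credits(10, 25, 0.5), 400)` checks a 10-second clip against the free plan's 400 credits, using a made-up frame rate and per-frame rate.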
Step-by-Step Guide
Step 1: Access the Talking Photo Tool
Sign in to Magic Hour at magichour.ai.
Click Create in the top navigation.
Select Talking Photo from the dropdown menu. You'll be taken to the creation page.
Step 2: Select Your Generation Mode
On the creation page, look for the Generation Mode selector.
Choose your mode:
Realistic (default) — Realistic detail, accurate lip sync, and faster generation
Expressive — Animated style with prompt-guided expression and movement
Optional: Click the info icon next to "Generation Mode" to see a side-by-side comparison video.
Realistic mode is pre-selected for the best quality output, faster generation, and optimal lip-sync accuracy.
Step 3: Upload Your Photo
In the center of the screen, click the upload area or drag and drop your photo file.
Select a clear, front-facing image of the person you want to animate.
The photo will preview immediately below the upload area.
Alternative: If you don't have a photo ready, scroll down to the Preset Faces Carousel and select one of the sample faces to test the feature.
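If uploads fail, it can help to check the file locally first. The sketch below is a minimal pre-upload check using only the Python standard library. The function name `photo_problems` and the accepted-extension set are assumptions for illustration: this guide names JPG and PNG but also allows "similar" formats, so the set may be narrower than what the tool accepts.

```python
from pathlib import Path

# Formats named in this guide; ".jpeg" is included as the common alias of JPG.
# Assumption: the real tool may accept additional formats.
SUPPORTED_PHOTO_SUFFIXES = {".jpg", ".jpeg", ".png"}


def photo_problems(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks OK."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED_PHOTO_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file does not exist")
    elif p.stat().st_size == 0:
        problems.append("file is empty")
    return problems
```

This only inspects the file name and size. It cannot tell whether the face is sharp, well lit, or front-facing, so the photo guidance above still applies.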
Step 4: Review Advanced Settings (Optional)
Below your photo preview, expand the Advanced Settings accordion.
In Expressive mode, you can enter a custom prompt to guide facial expression and movement.
In Realistic mode, expression is handled automatically — no additional adjustments are needed.
Step 5: Select Your Audio Source
Choose one of three audio options:
Option A: Upload an Audio File
Click Upload from device and select an MP3, WAV, or similar audio file.
The file uploads and displays in the audio player.
Option B: Use a Preset Audio
Browse and select from the available preset audio clips.
Option C: Use a Cloned Voice / Text to Speech
Select a voice and enter your script to generate a voiceover using an AI voice.
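When writing a script for text to speech, a rough length estimate helps you stay within the 60-second maximum. The sketch below assumes a speaking rate of 150 words per minute, a common rule of thumb for conversational English and not a property of Magic Hour's voices. Actual AI voice pacing will vary, so treat the result as approximate.

```python
MAX_DURATION_S = 60.0  # maximum supported duration stated in this guide


def estimate_speech_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Rough spoken length of a script; 150 wpm is an assumed speaking rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(script.split())  # whitespace-separated words
    return word_count * 60.0 / words_per_minute


def fits_duration_limit(script: str, words_per_minute: float = 150.0) -> bool:
    """True if the estimated spoken length is within the 60-second maximum."""
    return estimate_speech_seconds(script, words_per_minute) <= MAX_DURATION_S
```

At the assumed rate, a script of roughly 150 words sits right at the limit, so shorter scripts leave a safer margin.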
Step 6: Trim Audio Duration (Optional)
Below the audio player, adjust the Start Seconds and End Seconds sliders to select the portion of audio you want to use.
The maximum supported duration is 60 seconds.
The UI displays the trimmed duration and estimated credit cost before you generate.
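Trim error: If End Seconds is set beyond the length of the audio, you'll see the error "The selected end seconds is past the end of the audio." Drag the End Seconds slider back to within the audio duration to fix it.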
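The trimming rules in this step can be expressed as a small check you run before opening the tool. This is a sketch of the constraints as this guide describes them, not Magic Hour's own validation code. The function names are hypothetical, and `wav_duration_seconds` works only for WAV files because it relies on Python's standard `wave` module.

```python
import wave

MAX_DURATION_S = 60.0  # maximum supported duration stated in this guide


def wav_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds (frame count divided by sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_trim(start_s: float, end_s: float, audio_duration_s: float,
                  max_duration_s: float = MAX_DURATION_S) -> float:
    """Return the trimmed length in seconds, or raise ValueError if invalid."""
    if start_s < 0:
        raise ValueError("start seconds cannot be negative")
    if end_s <= start_s:
        raise ValueError("end seconds must be greater than start seconds")
    if end_s > audio_duration_s:
        # Mirrors the error message shown in the UI.
        raise ValueError("The selected end seconds is past the end of the audio")
    trimmed = end_s - start_s
    if trimmed > max_duration_s:
        raise ValueError(f"trimmed audio exceeds {max_duration_s:g} seconds")
    return trimmed
```

For a WAV file, `validate_trim(0, 20, wav_duration_seconds("voiceover.wav"))` would confirm that a 20-second selection is valid, where `voiceover.wav` is a placeholder file name.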
Step 7: Review and Generate
Review your selections:
Generation mode is set (Realistic or Expressive)
Photo is uploaded and visible
Audio is selected and, if needed, trimmed
Check the estimated credit cost displayed on the Generate button.
Click Generate Talking Photo.
You'll see loading status messages as your video is processed.
Step 8: Download Your Video
Once generation completes, you'll be redirected to your project page with a success notification.
The video plays automatically.
Click Download Video to save the MP4 file to your device.
Verify the Result
To confirm your talking photo was created successfully:
Video plays: The generated video shows your photo animating with clear lip movements.
Lip-sync accuracy: Mouth movements sync to the audio — words and syllables align visually.
Natural expressions: The face shows natural, realistic movements (Realistic mode) or prompt-guided expressions (Expressive mode).
Audio clarity: The audio plays clearly with no gaps or distortion.
Download file: The MP4 saves to your device and plays in any standard video player.
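If the downloaded file will not play, a quick local check can show whether the download is at least a plausible MP4. The sketch below relies on the fact that MP4 files normally begin with an `ftyp` box, whose type code sits at bytes 4 to 7. It is a heuristic only: the function name `looks_like_mp4` is hypothetical, and passing the check does not prove the video is complete or playable.

```python
def looks_like_mp4(path: str) -> bool:
    """Heuristic: True if the file starts with an ISO base media 'ftyp' box."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3 hold the box size, bytes 4-7 hold the box type.
    return len(header) >= 8 and header[4:8] == b"ftyp"
```

A result of `False` on a freshly downloaded file suggests the download was truncated or saved incorrectly, so downloading again is a reasonable first step.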
Troubleshooting
| Issue | Likely Cause | Solution |
| --- | --- | --- |
| Lip movements look unnatural or don't match audio | Photo is side-angled, blurry, or face is obscured | Upload a clear, front-facing photo with good lighting. Regenerate. |
| "Insufficient credits" error during generation | Not enough credits for the video length | Purchase a credit pack, or trim the audio to a shorter duration. |
| "The selected end seconds is past the end of the audio" error | End Seconds slider exceeds audio length | Drag the End Seconds slider back to within the audio duration. |
| Upload fails for image or audio file | File is too large or unsupported format | Ensure image is JPG/PNG and audio is MP3/WAV. Try uploading again. |
| Video quality is low or blurry | Free plan limits resolution to 576px; watermark present | Upgrade to a paid plan (Creator, Pro, or Business) to unlock higher resolution and remove watermarks. |
| Generation is taking a long time or stuck | Video is queued during high platform load | Wait a few minutes. Try a shorter clip. If stuck after 10 minutes, contact support. |
| Audio is distorted or cuts off | Audio file is corrupted or trimmed incorrectly | Verify the audio plays correctly on your device. Re-upload and check Start/End Seconds sliders. |
Realistic vs. Expressive Mode
| Feature | Realistic | Expressive |
| --- | --- | --- |
| Lip-sync accuracy | Highly accurate | Good (less precise) |
| Visual fidelity | Sharp details, realistic skin texture | Animated, stylized |
| Facial expressions | Natural, realistic movements | Prompt-guided expression and movement |
| Generation speed | Faster | Standard |
| Custom prompt | N/A (auto-optimized) | Available |
| Best for | Professional videos, marketing, presentations | Creative experimentation, stylized effects |
Realistic is ideal if you want the most realistic, polished results. Switch to Expressive if you want to guide expressions with a text prompt or prefer a more stylized, animated look.
Limitations
Video duration: Maximum supported duration is 60 seconds.
Face visibility: Realistic mode works best with clear, front-facing faces.
Resolution limits: Free-plan videos generate at 576px; paid plans unlock 1024px and higher.
Watermarks: Free-plan outputs include a watermark. Upgrade to remove watermarks.
No batch processing: Generate one video at a time.
No audio edits after generation: You cannot change the audio of a finished video; to use different audio, regenerate with a new audio file.
Best Practices
Use clear photos: High-quality, well-lit images with the face centered produce the best results.
Test audio quality: Ensure your audio is clear, without background noise, before generating.
Keep videos short: 10–30 seconds is optimal. Longer videos consume more credits.
Avoid extreme angles: Straight-on or slightly angled photos work best.
Use natural audio: Clear speech with normal pacing syncs better than mumbled or very fast audio.
Regenerate if needed: If the first result doesn't meet your needs, regenerate with updated inputs (costs additional credits).
What's Next
Explore combining Talking Photo with other tools: generate a Talking Photo, then use Lip Sync or Face Swap for advanced effects.
Experiment with different audio sources: voice clone, text-to-speech, or uploaded audio.
Getting Help
If you encounter issues or have questions:
Email support: [email protected]
Community: Join the Magic Hour Discord to share feedback and ask questions.
When contacting support, include:
The photo and audio files you used (or descriptions of them)
Steps you took when the issue occurred
Any error messages you received
Your account email address
