
How to Create a Talking Photo

Written by Runbo (CEO of Magic Hour)

Overview

Create professional-quality talking videos by animating a still photo of a person with realistic lip-sync and natural expressions. Choose Realistic mode for high-fidelity results with accurate lip-sync, or Expressive mode for a more animated style guided by a text prompt. Talking photos work well for marketing videos, presentations, social media, and personalized messages.

Realistic is the default and recommended option for best results. It provides sharper details, more realistic skin textures, and superior lip-sync accuracy compared to Expressive mode.


Prerequisites

  • A Magic Hour account (free or paid)

  • A clear, front-facing photo of a person (JPG, PNG, or similar image format)

  • Audio source: text to convert to speech, a cloned voice, or an audio file (MP3, WAV, etc.)

  • Available credits (free plan provides 400 credits; generation cost varies by duration and mode)
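If an upload keeps failing, it can help to confirm that a file really is one of the formats this guide names. Renaming a file's extension does not change its contents. The Python sketch below checks a file's leading "magic" bytes for JPG, PNG, WAV, and MP3. It is not a Magic Hour tool: `sniff_format`, `check_upload`, and `ALLOWED` are hypothetical names. It also deliberately accepts only the four formats listed under Troubleshooting, even though the tool may accept other similar formats.

```python
from typing import Optional

# Hypothetical pre-upload check; not part of Magic Hour.
# Accepts only the formats named in this guide's Troubleshooting section.
ALLOWED = {"image": {"jpg", "png"}, "audio": {"mp3", "wav"}}


def sniff_format(header: bytes) -> Optional[str]:
    """Identify JPG, PNG, WAV, or MP3 from a file's first bytes; None if unrecognized."""
    if header.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "jpg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":  # RIFF container of type WAVE
        return "wav"
    if header.startswith(b"ID3"):  # MP3 beginning with an ID3v2 tag
        return "mp3"
    # Untagged MP3: MPEG frame sync (0xFF, then a byte whose top 3 bits are set)
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return None


def check_upload(path: str, kind: str) -> bool:
    """True if the file at `path` looks like an allowed `kind` ('image' or 'audio')."""
    with open(path, "rb") as f:
        header = f.read(12)
    return sniff_format(header) in ALLOWED[kind]
```

For example, `check_upload("portrait.png", "image")` returns True for a genuine PNG (the filename is only an illustration). A False result suggests re-exporting the file from an image or audio editor before uploading.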


Before You Begin

Plan your photo: For best results, use a high-quality, well-lit image where the person's face is clearly visible and facing forward. Blurry, side-angled, or heavily obscured faces will produce lower-quality results.

Credit usage: Credit cost is based on video duration, frame rate, and the generation mode selected. Realistic mode costs more per second than Expressive mode — check the credit estimate in the UI before generating. Free-plan users generate watermarked outputs at 576px resolution; upgrading unlocks higher resolutions and removes watermarks.
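Because cost grows with clip length, you can estimate how long a clip your balance covers. The sketch below assumes, for illustration only, that cost is linear in seconds at a single per-second rate that you read from the in-app estimate. Actual pricing also depends on frame rate and mode, so the estimate shown in the UI remains authoritative. `max_affordable_seconds` is a hypothetical helper, and it caps the result at the 60-second maximum stated in this guide.

```python
import math

MAX_SECONDS = 60  # maximum supported clip length stated in this guide


def max_affordable_seconds(balance: int, credits_per_second: float) -> int:
    """Longest whole-second clip the balance covers, capped at 60 seconds.

    Assumes (hypothetically) that cost = duration * credits_per_second,
    where credits_per_second is read from the in-app estimate.
    """
    if credits_per_second <= 0:
        raise ValueError("credits_per_second must be positive")
    return min(MAX_SECONDS, math.floor(balance / credits_per_second))
```

For example, with the free plan's 400 credits and a made-up rate of 10 credits per second, `max_affordable_seconds(400, 10)` returns 40. This gives you a trim target if you hit an "Insufficient credits" error.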


Step-by-Step Guide

Step 1: Access the Talking Photo Tool

  1. Sign in to Magic Hour at magichour.ai.

  2. Click Create in the top navigation.

  3. Select Talking Photo from the dropdown menu. You'll be taken to the creation page.

Step 2: Select Your Generation Mode

  1. On the creation page, look for the Generation Mode selector.

  2. Choose your mode:

    • Realistic (default): High-fidelity detail, accurate lip-sync, and faster generation

    • Expressive: Animated style with prompt-guided expression and movement

  3. Optional: Click the info icon next to "Generation Mode" to see a side-by-side comparison video.

Realistic mode is pre-selected for the best quality output, faster generation, and optimal lip-sync accuracy.

Step 3: Upload Your Photo

  1. In the center of the screen, click the upload area or drag and drop your photo file.

  2. Select a clear, front-facing image of the person whose photo you want to animate.

  3. A preview of the photo appears immediately below the upload area.

Alternative: If you don't have a photo ready, scroll down to the Preset Faces Carousel and select one of the sample faces to test the feature.

Step 4: Review Advanced Settings (Optional)

  • Below your photo preview, expand the Advanced Settings accordion.

  • In Expressive mode, you can enter a custom prompt to guide facial expression and movement.

  • In Realistic mode, expression is handled automatically — no additional adjustments are needed.

Step 5: Select Your Audio Source

Choose one of three audio options:

Option A: Upload an Audio File

  • Click Upload from device and select an MP3, WAV, or similar audio file.

  • The file uploads and displays in the audio player.

Option B: Use a Preset Audio

  • Browse and select from the available preset audio clips.

Option C: Use a Cloned Voice / Text to Speech

  • Select a voice and enter your script to generate a voiceover using an AI voice.

Step 6: Trim Audio Duration (Optional)

  1. Below the audio player, adjust the Start Seconds and End Seconds sliders to select the portion of audio you want to use.

  2. The maximum supported duration is 60 seconds.

  3. The UI displays the trimmed duration and estimated credit cost before you generate.

Trim error: If you set "End Seconds" beyond the audio duration, you'll see an error. Adjust the slider to fix it.
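If you prepare audio outside Magic Hour, you can check a trim before uploading. The rules are the ones stated in this step: End Seconds must not exceed the audio length, and the trimmed clip must not exceed 60 seconds. The sketch below reads a WAV file's length with Python's standard `wave` module (frames divided by sample rate) and checks a Start/End selection against those rules. `wav_duration` and `check_trim` are hypothetical helpers, and their messages paraphrase the rules rather than reproduce the app's exact error text. Only WAV is handled, because the standard library has no MP3 reader.

```python
import wave
from typing import List

MAX_SECONDS = 60.0  # maximum supported duration stated in this guide


def wav_duration(path: str) -> float:
    """Length of a WAV file in seconds: number of frames / frames per second."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def check_trim(start: float, end: float, audio_seconds: float) -> List[str]:
    """Return problems with a Start/End Seconds selection (empty list if OK)."""
    problems = []
    if start < 0:
        problems.append("Start Seconds cannot be negative.")
    if end > audio_seconds:
        problems.append("End Seconds is past the end of the audio.")
    if end <= start:
        problems.append("End Seconds must be after Start Seconds.")
    if end - start > MAX_SECONDS:
        problems.append(f"Trimmed clip exceeds {MAX_SECONDS:g} seconds.")
    return problems
```

For example, `check_trim(0, 6, wav_duration("voiceover.wav"))` on a 5-second file reports that End Seconds is past the end of the audio. This is the same condition that triggers the trim error in the app. The filename is only an illustration.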

Step 7: Review and Generate

  1. Review your selections:

    • Generation mode is set (Realistic or Expressive)

    • Photo is uploaded and visible

    • Audio is selected and trimmed

  2. Check the estimated credit cost displayed on the Generate button.

  3. Click Generate Talking Photo.

You'll see loading status messages as your video is processed.

Step 8: Download Your Video

  1. Once generation completes, you'll be redirected to your project page with a success notification.

  2. The video plays automatically.

  3. Click Download Video to save the MP4 file to your device.


Verify the Setup

To confirm your talking photo was created successfully:

  • Video plays: The generated video shows your photo animating with clear lip movements.

  • Lip-sync accuracy: Mouth movements sync to the audio — words and syllables align visually.

  • Natural expressions: The face shows natural, realistic movements (Realistic mode) or prompt-guided expressions (Expressive mode).

  • Audio clarity: The audio plays clearly with no gaps or distortion.

  • Download file: The MP4 saves to your device and plays in any standard video player.
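To sanity-check the downloaded file without opening a player, you can look at its header. MP4 files (the ISO base media file format) normally begin with an `ftyp` box: a 4-byte size followed by the ASCII bytes `ftyp`. The sketch below uses this as a quick heuristic. `looks_like_mp4` is a hypothetical helper. A True result only means the header looks right, so playing the video remains the real check for lip-sync and audio clarity.

```python
import os


def looks_like_mp4(path: str) -> bool:
    """Heuristic: True if bytes 4-8 of the file are b'ftyp' (usual MP4 header).

    This checks only the header; it does not prove the video decodes or
    that its audio is intact.
    """
    if os.path.getsize(path) < 8:  # too small to hold a box header
        return False
    with open(path, "rb") as f:
        header = f.read(8)
    return header[4:8] == b"ftyp"
```

A False result for a file you just downloaded usually points to an interrupted or incomplete download. Downloading again from the project page is a reasonable first step.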


Troubleshooting

Lip movements look unnatural or don't match audio

  • Likely cause: Photo is side-angled, blurry, or the face is obscured

  • Solution: Upload a clear, front-facing photo with good lighting, then regenerate.

"Insufficient credits" error during generation

  • Likely cause: Not enough credits for the video length

  • Solution: Purchase a credit pack, or trim the audio to a shorter duration.

"The selected end seconds is past the end of the audio" error

  • Likely cause: The End Seconds slider exceeds the audio length

  • Solution: Drag the End Seconds slider back within the audio duration.

Upload fails for image or audio file

  • Likely cause: File is too large or in an unsupported format

  • Solution: Make sure the image is JPG or PNG and the audio is MP3 or WAV, then try uploading again.

Video quality is low or blurry

  • Likely cause: The free plan limits resolution to 576px and adds a watermark

  • Solution: Upgrade to a paid plan (Creator, Pro, or Business) to unlock higher resolution and remove watermarks.

Generation is taking a long time or is stuck

  • Likely cause: The video is queued during high platform load

  • Solution: Wait a few minutes, or try a shorter clip. If it is still stuck after 10 minutes, contact support.

Audio is distorted or cuts off

  • Likely cause: The audio file is corrupted or was trimmed incorrectly

  • Solution: Verify that the audio plays correctly on your device. Re-upload it and check the Start/End Seconds sliders.


Realistic vs. Expressive Mode

Lip-sync accuracy

  • Realistic: Highly accurate

  • Expressive: Good (less precise)

Visual fidelity

  • Realistic: Sharp details, realistic skin texture

  • Expressive: Animated, stylized

Facial expressions

  • Realistic: Natural, realistic movements

  • Expressive: Prompt-guided expression and movement

Generation speed

  • Realistic: Faster

  • Expressive: Standard

Custom prompt

  • Realistic: N/A (auto-optimized)

  • Expressive: Available

Best for

  • Realistic: Professional videos, marketing, presentations

  • Expressive: Creative experimentation, stylized effects

Choose Realistic for the most lifelike, polished results. Switch to Expressive if you want to guide expressions with a text prompt or prefer a more stylized, animated look.


Limitations

  • Video duration: Maximum supported duration is 60 seconds.

  • Face visibility: Realistic mode works best with clear, front-facing faces.

  • Resolution limits: Free-plan videos generate at 576px; paid plans unlock 1024px and higher.

  • Watermarks: Free-plan outputs include a watermark. Upgrade to remove watermarks.

  • No batch processing: Generate one video at a time.

  • No audio edits after generation: You cannot change the audio of a finished video. To use different audio, regenerate with a new audio file.


Best Practices

  • Use clear photos: High-quality, well-lit images with the face centered produce the best results.

  • Test audio quality: Ensure your audio is clear, without background noise, before generating.

  • Keep videos short: 10–30 seconds is optimal. Longer videos consume more credits.

  • Avoid extreme angles: Straight-on or slightly angled photos work best.

  • Use natural audio: Clear speech with normal pacing syncs better than mumbled or very fast audio.

  • Regenerate if needed: If the first result doesn't meet your needs, regenerate with updated inputs (costs additional credits).


What's Next

  • Explore combining Talking Photo with other tools: generate a Talking Photo, then use Lip Sync or Face Swap for advanced effects.

  • Experiment with different audio sources: voice clone, text-to-speech, or uploaded audio.


Getting Help

If you encounter issues or have questions, contact Magic Hour support. When contacting support, include:

  • The photo and audio files you used (or descriptions of them)

  • Steps you took when the issue occurred

  • Any error messages you received

  • Your account email address
