
How to Create a Talking Photo with Pro Mode

Written by Runbo (CEO of Magic Hour)
Updated over 2 months ago

Overview

Create professional-quality talking videos by animating a still photo of a person with realistic lip-sync and natural expressions. Compared with Standard Mode, Pro Mode delivers higher visual fidelity, more accurate lip-sync, and faster generation. This makes it well suited to marketing videos, presentations, social media, and personalized messages.

Pro Mode is the default and recommended option for best results. It provides sharper details, more realistic skin textures, and superior lip-sync accuracy compared to Standard Mode.

Prerequisites

  • A Magic Hour account (free or paid)

  • A clear, front-facing photo of a person (JPG, PNG, or similar image format)

  • Audio source: text to convert to speech, recorded voice, or an audio file (MP3, WAV, etc.)

  • Available credits (free plan provides 400 credits; generation cost varies by duration)

Before You Begin

Plan your photo: For best results, use a high-quality, well-lit image where the person's face is clearly visible and facing forward. Blurry, side-angled, or heavily obscured faces will produce lower-quality results.

Credit usage: Pro Mode generation consumes credits based on video duration and frame rate. A 10-second video typically costs around 10 credits. Free-plan users generate watermarked outputs at 512px resolution; upgrading unlocks higher resolutions and removes watermarks.
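As a rough planning aid, the sketch below estimates credit cost from clip length. It assumes a flat rate of about 1 credit per second, inferred only from the "10 credits for 10 seconds" example in this article, and it rounds partial credits up. Both are assumptions, not Magic Hour's published pricing; the estimate shown on the Generate button is authoritative.

```python
import math

# Assumption: ~1 credit per second, inferred from this article's
# "10 credits for 10 seconds" example. Real pricing may vary by plan or frame rate.
CREDITS_PER_SECOND = 1.0
FREE_PLAN_CREDITS = 400  # free-plan balance stated in this article


def estimate_credits(duration_seconds: float, rate: float = CREDITS_PER_SECOND) -> int:
    """Estimate credits for a clip; partial credits are rounded up (assumption)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return math.ceil(duration_seconds * rate)


def max_affordable_seconds(balance: int, rate: float = CREDITS_PER_SECOND) -> float:
    """Longest clip a given balance could cover at the assumed rate."""
    return balance / rate


if __name__ == "__main__":
    print(estimate_credits(10))                       # 10
    print(FREE_PLAN_CREDITS // estimate_credits(10))  # 40
    print(max_affordable_seconds(25))                 # 25.0
```

Under this assumed rate, the free plan's 400 credits cover about forty 10-second videos, and `max_affordable_seconds` suggests how far to trim the audio if you hit an "Insufficient credits" error.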

Step-by-Step Guide

Step 1: Access the Talking Photo Tool

  1. Sign in to Magic Hour at magichour.ai.

  2. Click Create in the top navigation.

  3. Select Talking Photo from the dropdown menu. You'll be taken to the creation page.

Step 2: Confirm Pro Mode is Selected

  1. On the first step (titled "Upload Person Photo"), look for the Generation Mode toggle at the top.

  2. Verify that Pro is selected (it's the default).

  3. Optional: Click the info icon next to "Generation Mode" to see a side-by-side comparison video showing Pro Mode vs. Standard Mode.

Pro Mode is pre-selected for the best quality output, faster generation, and optimal lip-sync accuracy.

Step 3: Upload Your Photo

  1. In the center of the screen, click the upload area or drag and drop your photo file.

  2. Select a clear, front-facing image of the person whose photo you want to animate.

  3. The photo will preview immediately below the upload area, scaled to fit the screen.

Alternative: If you don't have a photo ready, scroll down to the Preset Faces Carousel and select one of the sample faces to test the feature.

Step 4: Review Advanced Settings (Optional)

  1. Below your photo preview, expand the Advanced Settings accordion.

  2. In Pro Mode, the Intensity slider is disabled and labeled "Standard Mode Only". This is expected.

  3. For Pro Mode, no other adjustments are needed; the mode handles expressiveness automatically.

  4. Click the accordion to collapse this section and continue.

About Intensity: The Intensity setting (for controlling facial expressiveness) is only available in Standard Mode. Pro Mode optimizes expressiveness automatically for realistic, natural movements.

Step 5: Select Your Audio Source

  1. Click the Select Your Audio step in the sidebar (or scroll down).

  2. Choose one of three audio options:

Option A: Upload an Audio File

  • Click Upload Audio and select an MP3, WAV, or similar audio file from your device.

  • The file uploads and displays in the player below.

Option B: Record Your Voice

  • Click Record Voice.

  • Click the Record button to start capturing audio.

  • Speak clearly into your microphone.

  • Click Stop to finish.

  • Preview your recording and confirm or re-record.

Option C: Convert Text to Speech

  • Click Text to Speech.

  • Enter your script or message in the text field.

  • Select a voice and language from the dropdown menus.

  • Click Generate Voice. Magic Hour will create an AI voiceover of your text.

Step 6: Trim Audio Duration (Optional)

  1. Below the audio player, you'll see two sliders: Start Seconds and End Seconds.

  2. The default range is 0–10 seconds, but you can adjust either end:

    • Start Seconds: Drag the left slider to skip the beginning of the audio.

    • End Seconds: Drag the right slider to set the end point (do not exceed the audio's total length).

  3. A progress bar displays the trimmed duration and estimated credits needed (e.g., "10 credits for 10 seconds").

Trim error: If you set "End Seconds" beyond the audio duration, you'll see an error: "The selected end seconds is past the end of the audio. Please reduce the end seconds." Adjust the slider to fix it.
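To catch the trim error before generating, you can check your Start/End values against the audio length locally. This is a sketch with hypothetical function names: it uses Python's standard `wave` module, which reads uncompressed PCM WAV files only (not MP3), and its error text mirrors the message quoted above rather than calling any Magic Hour API.

```python
import wave
from typing import List


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frames / frames-per-second)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def validate_trim(start: float, end: float, audio_length: float) -> List[str]:
    """Return problems with a Start/End Seconds selection; empty list means OK."""
    problems = []
    if start < 0:
        problems.append("Start Seconds cannot be negative.")
    if end <= start:
        problems.append("End Seconds must be greater than Start Seconds.")
    if end > audio_length:
        # Same condition the UI reports as "past the end of the audio".
        problems.append(
            "The selected end seconds is past the end of the audio. "
            "Please reduce the end seconds."
        )
    return problems
```

For example, a 2-second recording trimmed to End Seconds = 10 yields exactly one problem, the "past the end of the audio" message.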

Step 7: Review and Generate

  1. Scroll to the bottom of the sidebar to see the Generation Button (e.g., "Generate Talking Photo").

  2. The button displays the estimated credit cost (e.g., "10 credits for 10 seconds").

  3. Review your selections:

    • Pro Mode is active

    • Photo is uploaded and visible

    • Audio is selected and trimmed

  4. Click Generate Talking Photo.

  5. You'll see a series of loading messages: "Starting" → "Uploading Image" → "Uploading Audio" → "Creating."

Step 8: Download Your Video

  1. Once generation completes, you'll be redirected to your project page, and you'll see a success toast notification: "Your Talking Photo is ready!"

  2. The video starts playing automatically, with your original photo shown as its poster (thumbnail) image.

  3. Click the Download Video button to save the MP4 file to your device.

  4. Use your video on social media, in presentations, or anywhere you need a talking avatar.

Verify the Setup

To confirm your talking photo was created successfully:

  • Video plays: The generated video should show your photo animating with clear lip movements matching your audio.

  • Lip-sync accuracy: Mouth movements should sync to the audio—words and syllables align visually.

  • Natural expressions: In Pro Mode, the face should show natural, realistic expressions and micro-movements.

  • Audio clarity: The audio should play clearly through the video with no gaps or distortion.

  • Download file: The MP4 file should save to your device and play in any standard video player.

If your video meets all the above criteria, your Pro Mode Talking Photo is complete and ready to share!
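If you want a quick programmatic check that a downloaded file really is an MP4 container (for example, when saving many files), the heuristic below looks for the `ftyp` box that MP4 files normally begin with. It is a generic sketch, not a Magic Hour tool, and it confirms only the container header, so still play the file to verify lip-sync and audio.

```python
def looks_like_mp4(path: str) -> bool:
    """Heuristic: does the file start with an ISO base media 'ftyp' box?

    An MP4 file is a sequence of boxes; each begins with a 4-byte size and a
    4-byte type, and the first box is normally 'ftyp'. This checks only that
    header, not whether the video actually decodes or plays.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b"ftyp"
```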

Troubleshooting

| Issue | Cause | Solution |
|---|---|---|
| Lip movements look unnatural or don't match audio | Photo is side-angled, blurry, or the face is obscured | Upload a clear, front-facing photo with good lighting, then regenerate the video. |
| "Insufficient credits" error during generation | Your account doesn't have enough credits for the video length | Purchase a credit pack from your Billing page, or trim the audio to a shorter duration. |
| "The selected end seconds is past the end of the audio" error | The End Seconds slider was set beyond the audio's total length | Drag the End Seconds slider back so it stays within the audio duration shown in the UI. |
| Upload fails for image or audio file | File is too large or in an unsupported format | Compress the file to under 10 MB, make sure the image is JPG/PNG and the audio is MP3/WAV, then try uploading again. |
| Video quality is low or blurry | The free plan limits resolution to 512px and adds a watermark | Upgrade to a paid plan (Creator, Pro, or Business) to unlock HD/4K resolution and remove watermarks. |
| Generation is taking a long time or seems stuck | The video is queued during high platform load, or the upload size exceeds limits | Wait a few minutes for the queue to clear, or try a shorter video or smaller file. If it is still stuck after 10 minutes, contact support. |
| Audio is distorted or cuts off | The audio file is corrupted or was trimmed incorrectly | Verify the audio file plays correctly on your device, re-upload it, check that the Start/End Seconds sliders are correct, and regenerate. |
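Many of the upload failures above come down to file size or format. The sketch below checks a file locally before you upload it, using the 10 MB limit and the JPG/PNG and MP3/WAV formats named in the troubleshooting entries, plus standard file signatures ("magic bytes"). Treating 10 MB as 10,000,000 bytes is an assumption, the function names are hypothetical, and Magic Hour's own server-side checks may differ (the uploader may also accept other formats).

```python
import os
from typing import List, Optional

# Assumption: "under 10 MB" taken as 10,000,000 bytes (decimal megabytes).
MAX_UPLOAD_BYTES = 10_000_000

# Formats recommended in the troubleshooting entries, grouped by upload type.
ALLOWED = {"image": {"jpg", "png"}, "audio": {"mp3", "wav"}}


def detect_format(header: bytes) -> Optional[str]:
    """Identify a file type from its first bytes ("magic bytes")."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"ID3"):
        return "mp3"  # MP3 with a leading ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # heuristic: MPEG audio frame sync (11 set bits)
    return None


def check_upload(path: str, kind: str) -> List[str]:
    """Return reasons a file may fail to upload; empty list means it looks OK.

    kind is "image" or "audio".
    """
    problems = []
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"File is {size / 1_000_000:.1f} MB; compress it below 10 MB.")
    with open(path, "rb") as f:
        fmt = detect_format(f.read(12))
    if fmt not in ALLOWED[kind]:
        problems.append(f"Detected format {fmt!r}; expected one of {sorted(ALLOWED[kind])}.")
    return problems
```

Checking the leading bytes rather than the file extension catches files that were renamed (for example, a HEIC photo saved with a `.jpg` name).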

Pro Mode vs. Standard Mode

Here's how Pro Mode compares to Standard Mode:

| Feature | Pro Mode | Standard Mode |
|---|---|---|
| Lip-sync accuracy | Highly accurate | Good (less precise) |
| Visual fidelity | Sharp details, realistic skin texture | Good, but slightly softer |
| Facial expressions | Natural, realistic movements | Good, more stylized |
| Generation speed | Faster | Standard |
| Intensity slider | N/A (auto-optimized) | Available (customize expressiveness) |
| Best for | Professional videos, marketing, presentations | Creative experimentation, stylized effects |

Pro Mode is ideal if you want the most realistic, polished results. Switch to Standard Mode if you want to experiment with different expression intensities or prefer a more stylized look.

Limitations

  • Video duration: Maximum video length is limited by your audio duration. Free-plan users are capped at shorter durations (typically 10 seconds); paid subscribers can generate longer videos (up to 60+ seconds depending on plan).

  • Face visibility: Pro Mode works best with clear, front-facing faces. Side-angled or heavily obscured faces produce lower-quality results.

  • Resolution limits: Free-plan videos generate at 512px; paid plans unlock 1024px and higher resolutions.

  • Watermarks: Free-plan outputs include a watermark. Upgrade to remove watermarks.

  • No batch processing: Generate one video at a time. To create multiple videos, repeat the process for each photo/audio combination.

  • Audio-only edits: You cannot edit the audio after generation; you must regenerate with a new audio file or recording.

Best Practices

  • Use clear photos: High-quality, well-lit images with the face centered produce the best results.

  • Test audio quality: Ensure your audio (recorded voice or uploaded file) is clear, without background noise, before generating.

  • Keep videos short: 10–30 seconds is optimal. Longer videos consume more credits and take longer to generate.

  • Avoid extreme angles: Straight-on or slightly angled photos work best; heavily side-angled photos produce unnatural lip movements.

  • Use natural audio: Clear speech with normal pacing syncs better than mumbled, heavily accented, or overly fast audio.

  • Regenerate if needed: If the first result doesn't meet your needs, regenerate with the same or updated inputs (costs additional credits).

What's Next

  • Learn about Magic Hour's other tools to enhance your content (e.g., Face Swap, Lip Sync, or Video-to-Video).

  • Explore combining Talking Photo with other tools: Generate a Talking Photo, then use Lip Sync or Face Swap for advanced effects.

  • Experiment with different audio sources: recorded voice, text-to-speech, or uploaded music.

  • Share your creations on social media with #MagicHour for a chance to be featured.

Getting Help

If you encounter issues or have questions:

  • In-app support: Contact Magic Hour support directly via the app or your account dashboard.

  • Email support: Reach out to [email protected].

  • Community: Join the Magic Hour Discord to share feedback, ask questions, and see what others are creating.

When contacting support, include:

  • The photo and audio files you used (or descriptions of them).

  • Steps you took when the issue occurred.

  • Any error messages you received.

  • Your account email address.
