DeepFaceLab Tutorial Series Part 3 – Step-by-Step Workflow and Detailed Explanation

Silicon Gamer

10/06/2024

updated 21/05/2025


1. DeepFaceLab Software Installation

The official build from the original author ships as an .exe file, which is essentially a self-extracting 7z archive. Choose an appropriate path and extract it there.

  1. Antivirus False Positives
    • A few users have reported antivirus alerts and file deletions, which leave project files incomplete and cause errors.
      • DeepFaceLab is a large open-source project on GitHub, verified safe by countless users.
      • While unlikely, third-party tampering (“poisoning”) is possible. This guide uses the official original version.
  2. Choosing an Installation Path
    • Avoid non-English characters in the path.
    • Avoid overly deep directory hierarchies to simplify navigation and operations.
  3. Key Notes
    • For AMD GPUs: Use the DirectX12 version, not the RTX 3000 edition.
    • Update your graphics drivers.

2. Convert Source Video to Images

  1. Double-click the batch file: extract images from video data_src.bat.
  2. A command window will open:
    • FPS: Enter 10 (frames per second) and press Enter. A 30 FPS video, for example, will then yield 10 frames for every second of footage. Extracting fewer frames saves processing time and resources.
    • Format: Enter jpg (recommended). JPG balances quality and file size; use PNG only for lossless requirements. (A standalone ffmpeg equivalent of this step is sketched after this list.)
  3. After processing, images will appear in workspace/data_src.
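
Under the hood, this step drives ffmpeg. For readers who want to reproduce it outside DeepFaceLab, here is a minimal sketch, assuming ffmpeg is on your PATH and a source video named workspace/data_src.mp4 (the filename is an assumption; DeepFaceLab accepts any data_src.* video):

    # extract_frames.py - rough standalone equivalent of "extract images from video data_src.bat"
    import subprocess
    from pathlib import Path

    out_dir = Path("workspace/data_src")
    out_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run([
        "ffmpeg",
        "-i", "workspace/data_src.mp4",  # assumed filename; point this at your actual source video
        "-vf", "fps=10",                 # sample 10 frames per second, as chosen above
        "-q:v", "2",                     # high JPEG quality (ffmpeg scale: lower is better)
        str(out_dir / "%05d.jpg"),       # numbered output frames
    ], check=True)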

3. Convert Target Video to Images

  1. Double-click the batch file: extract images from video data_dst FULL FPS.bat.
  2. No FPS adjustment here; every frame is extracted. Choose jpg format.
  3. Output images will populate workspace/data_dst.

 

4. Extract Source Faces

  1. Run the Batch File: Double-click the script data_src faceset extract.bat. (Note: this step uses the data_src script; the data_dst script is for the next section.)
  2. A command window will open and prompt for the options described below.
  3. Face Type (Options: h/mf/f/wf/head | ?:help for details):
    • What is Face Type? It determines the size of the facial area extracted from the original image.
    • Definitions:
      • h (half-face): Covers half the face, used in early Deepfake models (3+ years ago) due to limited VRAM.
      • mf (mid-half-face): Slightly larger than h.
      • f (full-face): Captures the full face.
      • wf (whole-face): Includes the entire face and some surrounding area. At 256px resolution, the effective facial area is about half, but modern GPUs can handle higher resolutions.
      • head: Extracts the entire head, which may waste VRAM unnecessarily.
    • Recommendation: Use wf for the best balance: it captures the full face while remaining compatible with f-type datasets. Avoid h/mf (too restrictive) and head (overkill).
  4. Max Faces per Image (Default: 0 | ?:help for details):
    • If your images contain multiple faces, set this to limit extraction (e.g., 3 speeds up processing).
    • 0 = No limit (extract all detected faces).
  5. Image Size (Range: 256–2048 | Default: 512 | ?:help for details):
    • Use 512px unless your source images are exceptionally high quality and require no enhancement.
    • Higher resolutions improve facial details but demand more resources.
  6. Image Quality (JPEG Quality: 1–100):
    • Higher values (e.g., 90–100) produce larger files but retain more detail.
  7. Post-Processing:
    • After extraction, check the output folder workspace\data_src\aligned.
    • Cleanup Tips (for complex datasets; see the sharpness-check sketch after this list):
      • Delete blurry or low-resolution faces.
      • Remove non-target individuals.
      • Discard incomplete or partially obscured faces.
      • Eliminate images with extreme lighting variations or heavy obstructions (e.g., hair, hands).
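
Culling blurry faces by hand gets tedious on large datasets. As a hypothetical helper (not part of DeepFaceLab), a variance-of-Laplacian sharpness check with OpenCV can flag candidates for deletion:

    # flag_blurry.py - hypothetical triage helper for the cleanup step above
    import cv2
    from pathlib import Path

    ALIGNED = Path("workspace/data_src/aligned")
    THRESHOLD = 100.0  # tune per dataset; lower Laplacian variance = blurrier image

    for img_path in sorted(ALIGNED.glob("*.jpg")):
        gray = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue  # skip unreadable files
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        if sharpness < THRESHOLD:
            print(f"{img_path.name}: sharpness={sharpness:.1f} (candidate for deletion)")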

 

5. Extract Target Faces

  1. Run the Batch File: Double-click the script data_dst faceset extract.bat to begin extraction.
  2. Process Details: The steps here mirror those for extracting source faces (see previous section).
  3. Output & Filtering:
    • Extracted faces are saved in the data_dst/aligned folder. After extraction, curate the dataset:
      • Delete files ending with _1: these are extra faces detected in the same frame, often duplicates or false positives. (A scripted version of this cleanup is sketched at the end of this section.)
      • Core Rule: Keep only the faces you intend to replace (e.g., the target person). Remove all unrelated faces.
  4. Debug Folder (data_dst/aligned_debug)​:
    • Open any image in this folder to visualize the face detection results. Three colored overlays are displayed:
      • Red: The region cropped for the face image.
      • Blue: The algorithm’s detected facial area.
      • Green: Facial landmarks (contours and key points like eyes, nose, mouth).
    • Use these debug images to verify that faces are accurately identified and aligned. Misaligned or missed detections may require manual cleanup or adjusted extraction settings.

This workflow ensures precise targeting of faces for replacement while minimizing noise in the dataset.
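
For reference, the _1 cleanup above is easy to script. A minimal sketch (dry-run by default; the path assumes the standard workspace layout):

    # prune_secondary_faces.py - hypothetical helper for the "_1" cleanup rule above
    from pathlib import Path

    ALIGNED = Path("workspace/data_dst/aligned")
    DRY_RUN = True  # set to False to actually delete files

    for img_path in sorted(ALIGNED.glob("*_1.jpg")):  # suffix _1 = an extra face found in a frame
        print(("would delete " if DRY_RUN else "deleting ") + img_path.name)
        if not DRY_RUN:
            img_path.unlink()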

 

6. Training the Model

This is the most time-consuming phase of the process. The latest DeepFaceLab version offers three model types: Quick96, SAEHD, and AMP. For this tutorial, we’ll use Quick96, a lightweight model optimized for lower VRAM requirements and faster training times. The trade-offs are limited customization options, lower resolution, and slightly reduced output quality compared to the advanced models.

6.1 Step-by-Step Guide

  1. Launch Training
    • Double-click the batch file: train Quick96.bat.
  2. Initial Setup
    1. First run: If no existing model is found, you’ll be prompted to name your new model. Press Enter to confirm.
    2. Existing models: Choose between resuming training on a saved model or creating a new one.
  3. Select Hardware
    1. Input 0 to select your GPU (default choice) and press Enter.
  4. Monitor Training Progress
    1. After initialization, the command window displays real-time metrics:
    2. [Timestamp] [Iteration] [Time/Iter] [Src Loss] [Dst Loss]
      [16:25:30] [#000002] [0059ms] [4.2341] [3.7194]
    3. Key metric: Focus on Dst Loss (destination loss). Lower values indicate a better match; values in the 0.1x range (roughly 0.10–0.19) are a practical target.
  5. Preview Window
    1. Auto-launch: The preview window opens automatically.
    2. Refresh previews: Hover your mouse over the window and press P (some systems may require an additional Enter press).
    3. Shortcuts:
      1. Enter: Stop training and save progress.
      2. Space: Toggle views (helpful for debugging).
      3. S: Manually save without stopping.
      4. P: Refresh previews.
    4. Preview layout:
      1. Column 1: Source faces.
      2. Column 2: Model-generated approximations of the source faces.
      3. Column 3: Target faces.
      4. Column 4: Model-generated approximations of the target faces.
      5. Column 5: Blending result (expression alignment).
  6. Iteration Benchmarks
    1. As iterations accumulate, the loss values gradually decrease, but 20,000 iterations are far from sufficient; when training a model from scratch, decent results typically require over 1 million iterations.
      • 20k iterations: Barely scratches the surface.
      • 100k+ iterations: Minimum for basic usability.
      • 1M+ iterations: Ideal for high-quality outputs.
  7. When to Stop Training?
    1. Loss Value Check
      • Aim for a Dst Loss in the 0.1x range (e.g., 0.12–0.15); pushing below 0.1 yields diminishing returns. (A toy plateau check is sketched after this list.)
    2. Visual Inspection
      • Column 2 should closely resemble Column 1 (source face replication).
      • Column 4 should align with Column 3 (target face integration).
      • Column 5 must show natural expression transfer and sharp details.
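
As a toy illustration of these stopping heuristics (not a DeepFaceLab feature): if you capture the console output to a file, e.g. train Quick96.bat > training.log, you can parse out the Dst Loss column and check for a plateau:

    # plateau_check.py - toy illustration of the stopping heuristic above (not a DFL feature)
    import re

    # Matches console lines like: [16:25:30] [#000002] [0059ms] [4.2341] [3.7194]
    LINE = re.compile(r"\[#\d+\]\s+\[\d+ms\]\s+\[[\d.]+\]\s+\[([\d.]+)\]")

    def dst_losses(log_path):
        """Pull the Dst Loss column out of captured console output."""
        with open(log_path, encoding="utf-8", errors="ignore") as f:
            return [float(m.group(1)) for m in map(LINE.search, f) if m]

    def has_plateaued(losses, window=1000, min_improvement=0.005):
        """True when the last window's mean loss barely improves on the previous window's."""
        if len(losses) < 2 * window:
            return False
        prev = sum(losses[-2 * window:-window]) / window
        last = sum(losses[-window:]) / window
        return prev - last < min_improvement

    # Example with made-up readings:
    readings = [0.30 - 0.0001 * i for i in range(1500)] + [0.155] * 2500
    print(has_plateaued(readings))  # True: the loss has flattened near 0.15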

 

 

7. Apply the Model

  1. Launch the Merge Process
    • Double-click the batch file: merge Quick96.bat.
      • Answer the input prompts that appear in the command window.
  2. Shortcut Reference Interface
    • A window will pop up listing the available keyboard shortcuts.
      • Note: This screen has no functional purpose; it simply lists shortcuts. For detailed parameter explanations, refer to future articles.
      • Critical step: Ensure your input method is switched to English, then press Tab to enter the preview/editing interface.
  3. Post-Processing Workflow
    • This phase resembles Photoshop-style retouching to enhance facial blending realism. Key tools include:
      • Feathering (soften edges)
      • Brightness/Contrast adjustments
      • Sharpening/Denoising (refine details)
    • Key controls:
      • W/S: Adjust face opacity/blending intensity.
      • E/D: Tweak sharpening strength.
  4. Preview Comparison
    1. Left panel: Raw output (faces appear “pasted on” with visible seams).
    2. Right panel: Processed result (natural integration achieved through adjustments).
  5. Apply Settings to All Frames
    • Press Shift+? to apply the current adjustments to all subsequent frames.
    • Manual frame navigation: Use the < and > keys to scrub through frames.
  6. Start Automated Merging
    • Press Shift+> to begin full-video processing.
    • Completion: Close the CMD window manually once the progress bar hits 100%.
  7. Output Results
    • Two new folders will appear:
      • merged: Final blended frames.
      • merged_mask: Grayscale masks for advanced editing (e.g., selective adjustments in compositing software). A compositing sketch follows this list.
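
Those grayscale masks make manual recompositing possible outside DeepFaceLab. A minimal OpenCV sketch (the frame filenames and extensions are assumptions; match them to your own workspace):

    # composite_with_mask.py - hypothetical use of the merged_mask output
    import cv2

    frame = cv2.imread("workspace/data_dst/00001.jpg")          # original target frame (name assumed)
    merged = cv2.imread("workspace/data_dst/merged/00001.png")  # DFL's blended frame (name assumed)
    mask = cv2.imread("workspace/data_dst/merged_mask/00001.png", cv2.IMREAD_GRAYSCALE)

    alpha = cv2.GaussianBlur(mask, (15, 15), 0) / 255.0  # feather the mask edge
    alpha = alpha[..., None]                             # broadcast over the color channels
    out = (merged * alpha + frame * (1.0 - alpha)).astype("uint8")
    cv2.imwrite("recomposited.png", out)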

 

8. Final Video Synthesis

  1. Generate MP4 Video
    • Double-click the batch file: merged to mp4.bat.
    • Bitrate setting: Input 3 (recommended default). The script automatically inherits the source video’s metadata, including frame rate and audio tracks. (A standalone ffmpeg equivalent is sketched after this list.)
  2. Output Files
    • After processing, two files appear in the workspace folder:
      • result.mp4: Final deepfake video.
      • result_mask.mp4: Mask video for post-production adjustments (e.g., fine-tuning blending in editing software).
  3. Review Results
    1. Play result.mp4 to verify facial alignment, lighting consistency, and overall realism.
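
For reference, merged to mp4.bat wraps an ffmpeg invocation along these lines. A minimal sketch (ffmpeg on your PATH; the frame rate, filenames, and numbering pattern are assumptions to adapt):

    # frames_to_mp4.py - rough standalone equivalent of "merged to mp4.bat"
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-framerate", "30",                          # match your source video's frame rate (30 assumed)
        "-i", "workspace/data_dst/merged/%05d.png",  # merged frames (numbering pattern assumed)
        "-i", "workspace/data_dst.mp4",              # original target video, used here for its audio (name assumed)
        "-map", "0:v", "-map", "1:a?",               # video from the frames, audio (if present) from the original
        "-c:v", "libx264", "-b:v", "3M",             # bitrate roughly matching the tutorial's setting of 3
        "-pix_fmt", "yuv420p",                       # broad player compatibility
        "workspace/result.mp4",
    ], check=True)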

 

9. Conclusion

  1. Complex but Manageable
    • While the workflow appears daunting due to its many steps, this guide breaks down every critical detail. Follow the instructions meticulously, and you’ll successfully complete your first deepfake project.
  2. Not a One-Click Solution
    • DeepFaceLab is not a “magic button” tool; it requires patience, experimentation, and iterative refinement to achieve professional results.
  3. Mastery Demands Investment
    • High-quality outputs hinge on understanding nuances: curating datasets, tuning model parameters, and mastering post-processing.
  4. Stay Tuned
    • Future tutorials will cover advanced features:
      • SAEHD/AMP models for higher resolution.
      • GAN training to enhance texture realism.
      • Frame interpolation for smoother motion.
