DeepFaceLab Tutorial Series Part 3 – Step-by-Step Workflow and Detailed Explanation

Silicon Gamer

10/06/2024

updated 21/05/2025


1. DeepFaceLab Software Installation

The official build from the original author ships as an .exe file, which is essentially a self-extracting 7z archive. Choose an appropriate path and extract it there.

  1. Antivirus False Positives
    • A few users have reported antivirus alerts and file deletions, which leave project files incomplete and cause errors.
      • DeepFaceLab is a large open-source project on GitHub, verified safe by countless users.
      • While unlikely, third-party tampering (“poisoning”) is possible. This guide uses the official original version.
  2. Choosing an Installation Path
    • Avoid non-English characters in the path.
    • Avoid overly deep directory hierarchies to simplify navigation and operations.
  3. Key Notes
    • For AMD GPUs: Use the DirectX12 version, not the RTX 3000 edition.
    • Update your graphics drivers.

2. Convert Source Video to Images

  1. Double-click the batch file: extract images from video data_src.bat.
  2. A command window will open:
    • FPS: Enter 10 (frames per second) and press Enter. A 30 FPS video, for example, will then yield 10 frames for every second of footage. Extracting fewer frames saves processing time and resources.
    • Format: Enter jpg (recommended). JPG balances quality and file size; use PNG only for lossless requirements. (A standalone ffmpeg equivalent of this step is sketched after this list.)
  3. After processing, images will appear in workspace/data_src.
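
Under the hood, this step drives ffmpeg. For readers who want to reproduce it outside DeepFaceLab, here is a minimal sketch, assuming ffmpeg is on your PATH and a source video named workspace/data_src.mp4 (the filename is an assumption; DeepFaceLab accepts any data_src.* video):

    # extract_frames.py - rough standalone equivalent of "extract images from video data_src.bat"
    import subprocess
    from pathlib import Path

    out_dir = Path("workspace/data_src")
    out_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run([
        "ffmpeg",
        "-i", "workspace/data_src.mp4",  # assumed filename; point this at your actual source video
        "-vf", "fps=10",                 # sample 10 frames per second, as chosen above
        "-q:v", "2",                     # high JPEG quality (ffmpeg scale: lower is better)
        str(out_dir / "%05d.jpg"),       # numbered output frames
    ], check=True)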

3. Convert Target Video to Images

  1. Double-click the batch file: extract images from video data_dst FULL FPS.bat.
  2. No FPS adjustment here; every frame is extracted. Choose jpg format.
  3. Output images will populate workspace/data_dst.

 

4. Extract Source Faces

  1. Run the Batch File: Double-click the script data_src faceset extract.bat. (Note: this step uses the data_src script; the data_dst script is for the next section.)
  2. A command window will open and prompt for the options described below.
  3. Face Type (Options: h/mf/f/wf/head | ?:help for details):
    • What is Face Type? It determines the size of the facial area extracted from the original image.
    • Definitions:
      • h (half-face): Covers half the face, used in early Deepfake models (3+ years ago) due to limited VRAM.
      • mf (mid-half-face): Slightly larger than h.
      • f (full-face): Captures the full face.
      • wf (whole-face): Includes the entire face and some surrounding area. At 256px resolution, the effective facial area is about half, but modern GPUs can handle higher resolutions.
      • head: Extracts the entire head, which may waste VRAM unnecessarily.
    • Recommendation: Use wf for the best balance: it captures the full face while remaining compatible with f-type datasets. Avoid h/mf (too restrictive) and head (overkill).
  4. Max Faces per Image (Default: 0 | ?:help for details):
    • If your images contain multiple faces, set this to limit extraction (e.g., 3 speeds up processing).
    • 0 = No limit (extract all detected faces).
  5. Image Size (Range: 256–2048 | Default: 512 | ?:help for details):
    • Use 512px unless your source images are exceptionally high quality and require no enhancement.
    • Higher resolutions improve facial details but demand more resources.
  6. Image Quality (JPEG Quality: 1–100):
    • Higher values (e.g., 90–100) produce larger files but retain more detail.
  7. Post-Processing:
    • After extraction, check the output folder workspace\data_src\aligned.
    • Cleanup Tips (for complex datasets; see the sharpness-check sketch after this list):
      • Delete blurry or low-resolution faces.
      • Remove non-target individuals.
      • Discard incomplete or partially obscured faces.
      • Eliminate images with extreme lighting variations or heavy obstructions (e.g., hair, hands).
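
Culling blurry faces by hand gets tedious on large datasets. As a hypothetical helper (not part of DeepFaceLab), a variance-of-Laplacian sharpness check with OpenCV can flag candidates for deletion:

    # flag_blurry.py - hypothetical triage helper for the cleanup step above
    import cv2
    from pathlib import Path

    ALIGNED = Path("workspace/data_src/aligned")
    THRESHOLD = 100.0  # tune per dataset; lower Laplacian variance = blurrier image

    for img_path in sorted(ALIGNED.glob("*.jpg")):
        gray = cv2.imread(str(img_path), cv2.IMREAD_GRAYSCALE)
        if gray is None:
            continue  # skip unreadable files
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        if sharpness < THRESHOLD:
            print(f"{img_path.name}: sharpness={sharpness:.1f} (candidate for deletion)")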

 

5. Extract Target Faces

  1. Run the Batch File: Double-click the script data_dst faceset extract.bat to begin extraction.
  2. Process Details: The steps here mirror those for extracting source faces (see previous section).
  3. Output & Filtering:
    • Extracted faces are saved in the data_dst/aligned folder. After extraction, curate the dataset:
      • Delete files ending with _1: these are extra faces detected in the same frame, often duplicates or false positives. (A scripted version of this cleanup is sketched at the end of this section.)
      • Core Rule: Keep only the faces you intend to replace (e.g., the target person). Remove all unrelated faces.
  4. Debug Folder (data_dst/aligned_debug)​:
    • Open any image in this folder to visualize the face detection results. Three colored overlays are displayed:
      • Red: The region cropped for the face image.
      • Blue: The algorithm’s detected facial area.
      • Green: Facial landmarks (contours and key points like eyes, nose, mouth).
    • Use these debug images to verify that faces are accurately identified and aligned. Misaligned or missed detections may require manual cleanup or adjusted extraction settings.

This workflow ensures precise targeting of faces for replacement while minimizing noise in the dataset.
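
For reference, the _1 cleanup above is easy to script. A minimal sketch (dry-run by default; the path assumes the standard workspace layout):

    # prune_secondary_faces.py - hypothetical helper for the "_1" cleanup rule above
    from pathlib import Path

    ALIGNED = Path("workspace/data_dst/aligned")
    DRY_RUN = True  # set to False to actually delete files

    for img_path in sorted(ALIGNED.glob("*_1.jpg")):  # suffix _1 = an extra face found in a frame
        print(("would delete " if DRY_RUN else "deleting ") + img_path.name)
        if not DRY_RUN:
            img_path.unlink()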

 

6. Training the Model

This is the most time-consuming phase of the process. The latest DeepFaceLab version offers three model types: Quick96, SAEHD, and AMP. For this tutorial, we’ll use Quick96, a lightweight model optimized for lower VRAM requirements and faster training times. The trade-offs are limited customization options, lower resolution, and slightly reduced output quality compared to the advanced models.

6.1 Step-by-Step Guide

  1. Launch Training
    • Double-click the batch file: train Quick96.bat.
  2. Initial Setup
    1. First run: If no existing model is found, you’ll be prompted to name your new model. Press Enter to confirm.
    2. Existing models: Choose between resuming training on a saved model or creating a new one.
  3. Select Hardware
    1. Input 0 to select your GPU (default choice) and press Enter.
  4. Monitor Training Progress
    1. After initialization, the command window displays real-time metrics:
    2. [Timestamp] [Iteration] [Time/Iter] [Src Loss] [Dst Loss]
      [16:25:30] [#000002] [0059ms] [4.2341] [3.7194]
    3. Key metric: Focus on Dst Loss (destination loss). Lower values indicate a better match; values in the 0.1x range (roughly 0.10–0.19) are a practical target.
  5. Preview Window
    1. Auto-launch: The preview window opens automatically.
    2. Refresh previews: Hover your mouse over the window and press P (some systems may require an additional Enter press).
    3. Shortcuts:
      1. Enter: Stop training and save progress.
      2. Space: Toggle views (helpful for debugging).
      3. S: Manually save without stopping.
      4. P: Refresh previews.
    4. Preview layout:
      1. Column 1: Source faces.
      2. Column 2: Model-generated approximations of the source faces.
      3. Column 3: Target faces.
      4. Column 4: Model-generated approximations of the target faces.
      5. Column 5: Blending result (expression alignment).
  6. Iteration Benchmarks
    1. As iterations accumulate, the loss values gradually decrease, but 20,000 iterations are far from sufficient; when training a model from scratch, decent results typically require over 1 million iterations.
      • 20k iterations: Barely scratches the surface.
      • 100k+ iterations: Minimum for basic usability.
      • 1M+ iterations: Ideal for high-quality outputs.
  7. When to Stop Training?
    1. Loss Value Check
      • Aim for a Dst Loss in the 0.1x range (e.g., 0.12–0.15); pushing below 0.1 yields diminishing returns. (A toy plateau check is sketched after this list.)
    2. Visual Inspection
      • Column 2 should closely resemble Column 1 (source face replication).
      • Column 4 should align with Column 3 (target face integration).
      • Column 5 must show natural expression transfer and sharp details.
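
As a toy illustration of these stopping heuristics (not a DeepFaceLab feature): if you capture the console output to a file, e.g. train Quick96.bat > training.log, you can parse out the Dst Loss column and check for a plateau:

    # plateau_check.py - toy illustration of the stopping heuristic above (not a DFL feature)
    import re

    # Matches console lines like: [16:25:30] [#000002] [0059ms] [4.2341] [3.7194]
    LINE = re.compile(r"\[#\d+\]\s+\[\d+ms\]\s+\[[\d.]+\]\s+\[([\d.]+)\]")

    def dst_losses(log_path):
        """Pull the Dst Loss column out of captured console output."""
        with open(log_path, encoding="utf-8", errors="ignore") as f:
            return [float(m.group(1)) for m in map(LINE.search, f) if m]

    def has_plateaued(losses, window=1000, min_improvement=0.005):
        """True when the last window's mean loss barely improves on the previous window's."""
        if len(losses) < 2 * window:
            return False
        prev = sum(losses[-2 * window:-window]) / window
        last = sum(losses[-window:]) / window
        return prev - last < min_improvement

    # Example with made-up readings:
    readings = [0.30 - 0.0001 * i for i in range(1500)] + [0.155] * 2500
    print(has_plateaued(readings))  # True: the loss has flattened near 0.15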

 

 

7. Apply the Model

  1. Launch the Merge Process
    • Double-click the batch file: merge Quick96.bat.
      • Answer the input prompts that appear in the command window.
  2. Shortcut Reference Interface
    • A window will pop up listing the available keyboard shortcuts.
      • Note: This screen has no functional purpose; it simply lists shortcuts. For detailed parameter explanations, refer to future articles.
      • Critical step: Ensure your input method is switched to English, then press Tab to enter the preview/editing interface.
  3. Post-Processing Workflow
    • This phase resembles Photoshop-style retouching to enhance facial blending realism. Key tools include:
      • Feathering (soften edges)
      • Brightness/Contrast adjustments
      • Sharpening/Denoising (refine details)
    • Key controls:
      • W/S: Adjust face opacity/blending intensity.
      • E/D: Tweak sharpening strength.
  4. Preview Comparison
    1. Left panel: Raw output (faces appear “pasted on” with visible seams).
    2. Right panel: Processed result (natural integration achieved through adjustments).
  5. Apply Settings to All Frames
    • Press Shift+? to apply the current adjustments to all subsequent frames.
    • Manual frame navigation: Use the < and > keys to scrub through frames.
  6. Start Automated Merging
    • Press Shift+> to begin full-video processing.
    • Completion: Close the CMD window manually once the progress bar hits 100%.
  7. Output Results
    • Two new folders will appear:
      • merged: Final blended frames.
      • merged_mask: Grayscale masks for advanced editing (e.g., selective adjustments in compositing software). A compositing sketch follows this list.
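
Those grayscale masks make manual recompositing possible outside DeepFaceLab. A minimal OpenCV sketch (the frame filenames and extensions are assumptions; match them to your own workspace):

    # composite_with_mask.py - hypothetical use of the merged_mask output
    import cv2

    frame = cv2.imread("workspace/data_dst/00001.jpg")          # original target frame (name assumed)
    merged = cv2.imread("workspace/data_dst/merged/00001.png")  # DFL's blended frame (name assumed)
    mask = cv2.imread("workspace/data_dst/merged_mask/00001.png", cv2.IMREAD_GRAYSCALE)

    alpha = cv2.GaussianBlur(mask, (15, 15), 0) / 255.0  # feather the mask edge
    alpha = alpha[..., None]                             # broadcast over the color channels
    out = (merged * alpha + frame * (1.0 - alpha)).astype("uint8")
    cv2.imwrite("recomposited.png", out)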

 

8. Final Video Synthesis

  1. Generate MP4 Video
    • Double-click the batch file: merged to mp4.bat.
    • Bitrate setting: Input 3 (recommended default). The script automatically inherits the source video’s metadata, including frame rate and audio tracks. (A standalone ffmpeg equivalent is sketched after this list.)
  2. Output Files
    • After processing, two files appear in the workspace folder:
      • result.mp4: Final deepfake video.
      • result_mask.mp4: Mask video for post-production adjustments (e.g., fine-tuning blending in editing software).
  3. Review Results
    1. Play result.mp4 to verify facial alignment, lighting consistency, and overall realism.
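
For reference, merged to mp4.bat wraps an ffmpeg invocation along these lines. A minimal sketch (ffmpeg on your PATH; the frame rate, filenames, and numbering pattern are assumptions to adapt):

    # frames_to_mp4.py - rough standalone equivalent of "merged to mp4.bat"
    import subprocess

    subprocess.run([
        "ffmpeg",
        "-framerate", "30",                          # match your source video's frame rate (30 assumed)
        "-i", "workspace/data_dst/merged/%05d.png",  # merged frames (numbering pattern assumed)
        "-i", "workspace/data_dst.mp4",              # original target video, used here for its audio (name assumed)
        "-map", "0:v", "-map", "1:a?",               # video from the frames, audio (if present) from the original
        "-c:v", "libx264", "-b:v", "3M",             # bitrate roughly matching the tutorial's setting of 3
        "-pix_fmt", "yuv420p",                       # broad player compatibility
        "workspace/result.mp4",
    ], check=True)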

 

9. Conclusion

  1. Complex but Manageable
    • While the workflow appears daunting due to its many steps, this guide breaks down every critical detail. Follow the instructions meticulously, and you’ll successfully complete your first deepfake project.
  2. Not a One-Click Solution
    • DeepFaceLab is not a “magic button” tool; it requires patience, experimentation, and iterative refinement to achieve professional results.
  3. Mastery Demands Investment
    • High-quality outputs hinge on understanding nuances: curating datasets, tuning model parameters, and mastering post-processing.
  4. Stay Tuned
    • Future tutorials will cover advanced features:
      • SAEHD/AMP models for higher resolution.
      • GAN training to enhance texture realism.
      • Frame interpolation for smoother motion.
