Real or AI? How to Build a Simple “Deepfake Detector” for the Viral Musk-Kamath Clip

The internet is currently losing its mind over a 39-second black-and-white clip of Elon Musk and Nikhil Kamath laughing in complete silence. Is it a real podcast teaser? Or is it a high-end AI generation from Grok 3 or Sora?

While Twitter argues, developers can solve this with code.

One of the most common failures in AI-generated video is physiological inconsistency. AI models are great at rendering textures, but they often forget “biological rules”, blinking in particular. Humans blink spontaneously every 2–10 seconds. AI avatars often stare unblinkingly for unnaturally long periods or blink with irregular, “morphing” eyelids.

In this tutorial, I’ll show you how to build a Python script that analyzes that viral video frame by frame to count blinks. If Elon doesn’t blink for 39 seconds, we have our answer.


The Logic: The Eye Aspect Ratio (EAR)

We don’t need to train a massive neural network. We can use a simple geometric metric called the Eye Aspect Ratio (EAR).

We map six facial landmarks around each eye. The EAR is the ratio of the vertical distances between the eyelid landmarks to the horizontal distance between the eye corners.

  • When the eye is open: the vertical distances are large, so the EAR is high (approximately 0.30).
  • When the eye closes: the vertical distances drop to near zero, so the EAR plummets.

If the graph of EAR over time stays flat, you are likely looking at a Deepfake.
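Before wiring this into a video pipeline, you can sanity-check the formula on hand-made coordinates. The landmark points below are invented purely for illustration (corners at indices 0 and 3, upper lid at 1 and 2, lower lid at 4 and 5):

```python
from math import dist

def eye_aspect_ratio(eye):
    # eye is a list of 6 (x, y) landmarks
    A = dist(eye[1], eye[5])   # vertical distance, left pair of lid points
    B = dist(eye[2], eye[4])   # vertical distance, right pair of lid points
    C = dist(eye[0], eye[3])   # horizontal corner-to-corner distance
    return (A + B) / (2.0 * C)

open_eye   = [(0, 3), (2, 4), (4, 4), (6, 3), (4, 2), (2, 2)]
closed_eye = [(0, 3), (2, 3.1), (4, 3.1), (6, 3), (4, 2.9), (2, 2.9)]

print(round(eye_aspect_ratio(open_eye), 2))    # 0.33 — eye open
print(round(eye_aspect_ratio(closed_eye), 2))  # 0.03 — eye closed
```

The exact numbers depend on the made-up coordinates; what matters is the sharp drop when the lid points converge.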


Step 1: The Setup

You will need a few standard computer vision libraries. Open your terminal (or Google Colab) and run:

Bash

pip install opencv-python dlib imutils scipy

Note: dlib can be tricky to install on Windows. If you get errors, you may need to install CMake first, or just run this in a Google Colab notebook where it works out of the box.

You will also need the pre-trained facial landmark file. Download shape_predictor_68_face_landmarks.dat (it’s widely available on GitHub/Hugging Face).
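Since the script depends on two local files (the landmark model and the downloaded clip), a small pre-flight check saves a confusing traceback later. This helper is my own addition, not part of the main script; the filenames match the ones used below:

```python
from pathlib import Path

def check_assets(model="shape_predictor_68_face_landmarks.dat",
                 video="musk_kamath_clip.mp4"):
    """Return a list of the required files that are missing from disk."""
    return [name for name in (model, video) if not Path(name).is_file()]

missing = check_assets()
if missing:
    print("Missing files:", ", ".join(missing))
```

Run it once before the detector; an empty list means you are good to go.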


Step 2: The Deepfake Detector Script

Create a file named detector.py and paste the following code. It is set up to run on a local video file, such as the downloaded Twitter clip.

Python

import cv2
import dlib
import numpy as np
from scipy.spatial import distance as dist
from imutils import face_utils

# --- CONFIGURATION ---
# EAR threshold: Below this, we count it as a "closed eye"
EYE_AR_THRESH = 0.25
# Consecutive frames: How long the eye must be closed to count as a blink
EYE_AR_CONSEC_FRAMES = 3

def eye_aspect_ratio(eye):
    # Calculate vertical distances
    A = dist.euclidean(eye[1], eye[5])
    B = dist.euclidean(eye[2], eye[4])
    # Calculate horizontal distance
    C = dist.euclidean(eye[0], eye[3])
    # Compute ratio
    ear = (A + B) / (2.0 * C)
    return ear

# Load Face Detectors
print("[INFO] Loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Get array indexes for left and right eyes
(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]

# Load the Viral Video
cap = cv2.VideoCapture("musk_kamath_clip.mp4")
if not cap.isOpened():
    raise SystemExit("[ERROR] Could not open musk_kamath_clip.mp4")

blink_count = 0
counter = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Resize for faster processing
    frame = cv2.resize(frame, (800, 600))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces
    rects = detector(gray, 0)

    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)

        # Extract eye coordinates
        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]

        # Calculate EAR for both eyes
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        
        # Average the EAR together
        ear = (leftEAR + rightEAR) / 2.0

        # VISUALIZATION: Draw contours around eyes
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)

        # LOGIC: Check for blink
        if ear < EYE_AR_THRESH:
            counter += 1
        else:
            if counter >= EYE_AR_CONSEC_FRAMES:
                blink_count += 1
                # Visual Alert
                cv2.putText(frame, "BLINK DETECTED", (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
            counter = 0

        # Display Stats
        cv2.putText(frame, f"Blinks: {blink_count}", (10, 450),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        cv2.putText(frame, f"EAR: {ear:.2f}", (300, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

    cv2.imshow("Deepfake Detector", frame)
    
    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Count a blink still in progress when the video ends
if counter >= EYE_AR_CONSEC_FRAMES:
    blink_count += 1

print(f"[RESULT] Total blinks detected: {blink_count}")
cap.release()
cv2.destroyAllWindows()
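One caveat: cv2.imshow opens a GUI window, which fails in Google Colab and other headless environments. There, remove the imshow/waitKey calls and just print blink_count at the end. The blink-counting logic itself does not need OpenCV at all, so you can factor it out and test it on a plain list of EAR values. A sketch of that refactor (my own helper, not part of the script above):

```python
def count_blinks(ear_values, thresh=0.25, consec=3):
    """Count blinks from a per-frame EAR series using the same
    threshold / consecutive-frame logic as the main script."""
    blinks = 0
    run = 0  # consecutive frames below the threshold
    for ear in ear_values:
        if ear < thresh:
            run += 1
        else:
            if run >= consec:
                blinks += 1
            run = 0
    return blinks

# Three consecutive frames below threshold = one blink; a single dip is noise.
series = [0.31, 0.30, 0.12, 0.10, 0.11, 0.29, 0.30, 0.18, 0.31]
print(count_blinks(series))  # 1
```

Collecting the per-frame EAR values into a list also lets you plot the EAR-over-time graph mentioned earlier.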

Step 3: Running the Analysis

  1. Download the Clip: Save the viral video from X/Twitter as musk_kamath_clip.mp4 in the same folder as your script.
  2. Run the Script: Execute python detector.py.
  3. Watch the Overlay: You will see green lines drawn around Elon and Nikhil’s eyes.

How to Interpret the Results

  • The “Human” Result: A normal person blinks roughly 15–20 times per minute. In a 40-second clip, you should see at least 4 to 8 blinks. The EAR value should fluctuate constantly.
  • The “AI” Result: If the Blinks counter stays at 0 or 1 for the entire duration, or if the EAR value stays “stuck” around 0.30 without dipping, it is highly probable that the video is AI-generated (likely by an image-to-video model like Luma or Runway Gen-3, which animate faces but often forget blink physics).
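To compare your raw count against the human baseline above, normalize it to blinks per minute using the clip duration (you can compute this from cv2.CAP_PROP_FRAME_COUNT divided by cv2.CAP_PROP_FPS). The helper and cutoff values below are my own illustrative choices, not calibrated thresholds:

```python
def blinks_per_minute(blink_count, duration_seconds):
    """Normalize a raw blink count to a per-minute rate."""
    return blink_count * 60.0 / duration_seconds

def verdict(rate, human_low=15.0, suspicious_below=2.0):
    # Cutoffs are illustrative assumptions, not calibrated values.
    if rate < suspicious_below:
        return "likely AI-generated"
    if rate < human_low:
        return "inconclusive"
    return "consistent with a real human"

rate = blinks_per_minute(5, 39)   # e.g. 5 blinks in the 39-second clip
print(round(rate, 1), verdict(rate))  # 7.7 inconclusive
```

A mid-range result like this is exactly why the EAR graph matters: a fluctuating curve with clean dips points to a human, even when the raw rate is on the low side.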

Why This Matters

We are entering an era where we can no longer trust our eyes. By building tools like this, we move from passive consumers of content to active analysts. This simple script is your first line of defense against the misinformation age.