computer vision · biometric auth
FRS VERSION 3.0

Face Recognition System

A production-grade face authentication module combining MTCNN detection, FaceNet embedding, anti-spoofing CNN, and gaze tracking into a single composable Python class.

512 embedding dims · 3 guard layers · 0.75 cosine threshold · <1 s auth latency
liveness: PASS ✓ · gaze: FORWARD ✓ · similarity: 0.847

authentication flow

Recognition Pipeline

face_recognize_from_image(frame, threshold=0.75)

00 · input: 🖼️ BGR Frame (cv2 ndarray)
01 · detect: 🔲 MTCNN (P→R→O net)
02 · gaze: 👁️ Gaze Check (iris x ∈ [0.25, 0.75])
03 · liveness: 🛡️ Anti-Spoof (CNN ensemble)
04 · encode: 🧠 FaceNet (512-d embed)
05 · match: 📐 Cosine Sim (top-K score)
06 · out: ✅ name | None (str or None)
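The stage order above can be sketched as a chain of early exits. This is only an illustration: the stage callables (`detect`, `is_looking_forward`, `is_live`, `embed`, `best_match`) and the `recognize` function are hypothetical stand-ins passed in as parameters, not the module's actual methods, and the real code may order or combine the checks differently.

```python
from typing import Callable, Optional, Sequence, Tuple

Frame = object  # stand-in for a cv2 BGR ndarray (assumption for this sketch)


def recognize(
    frame: Frame,
    detect: Callable[[Frame], Optional[object]],
    is_looking_forward: Callable[[Frame], bool],
    is_live: Callable[[Frame], bool],
    embed: Callable[[object], Sequence[float]],
    best_match: Callable[[Sequence[float]], Tuple[str, float]],
    threshold: float = 0.75,
) -> Optional[str]:
    """Run the stages in pipeline order and return None at the first failure."""
    face = detect(frame)                 # 01 · detect
    if face is None:
        return None
    if not is_looking_forward(frame):    # 02 · gaze
        return None
    if not is_live(frame):               # 03 · liveness
        return None
    embedding = embed(face)              # 04 · encode
    name, score = best_match(embedding)  # 05 · match
    # 06 · out: only a score strictly above the threshold yields a name
    return name if score > threshold else None
```

Because every guard returns early, the embedding step never runs for a frame that fails the gaze or liveness check.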

security architecture

Three Guard Layers

01 · 👁️ Gaze Detection
MediaPipe Iris Tracking
Class: GazeDetector

Uses FaceMesh with refine_landmarks=True to extract 478 facial landmarks including iris keypoints.

Left iris: landmarks 474–477 · Right iris: 469–472

Computes the average iris center X per eye. If either value falls outside the normalized range [0.25, 0.75], the user is treated as not looking forward and authentication is denied immediately. This is meant to block angled-photo attacks.
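A minimal sketch of the range check described above, written in plain Python so it runs without MediaPipe. It assumes the caller has already flattened the 478 landmarks into a list of normalized x coordinates. The function names are hypothetical. The index ranges and bounds are the ones quoted in the text.

```python
from typing import Sequence

LEFT_IRIS = range(474, 478)   # landmarks 474-477, as listed above
RIGHT_IRIS = range(469, 473)  # landmarks 469-472, as listed above


def iris_center_x(xs: Sequence[float], indices: range) -> float:
    """Mean normalized x of the given landmark indices."""
    return sum(xs[i] for i in indices) / len(indices)


def iris_looks_forward(xs: Sequence[float], lo: float = 0.25, hi: float = 0.75) -> bool:
    """True only if both iris centers lie inside [lo, hi]."""
    centers = (iris_center_x(xs, LEFT_IRIS), iris_center_x(xs, RIGHT_IRIS))
    return all(lo <= c <= hi for c in centers)
```

With the legacy MediaPipe FaceMesh solution, `xs` could plausibly be built as `[lm.x for lm in results.multi_face_landmarks[0].landmark]`. How `GazeDetector` actually reads the landmarks is not shown in the source.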
02 · 🛡️ Liveness Detection
Silent-Face CNN Ensemble
Class: predict_liveness

Loads all models from resources/anti_spoof_models/. For each, parses h_input, w_input, scale from filename via parse_model_name().

Crops face at each scale, runs inference, accumulates scores. Final label = argmax of summed predictions.

Label 1 = Real ✓ · Label 0 = Spoof ✗. Guards against photo, screen replay, and 3D masks.
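The filename parsing and score accumulation described above might look roughly like this. The `<scale>_<H>x<W>_<arch>.pth` filename layout is an assumption made for this sketch, and `parse_model_name_sketch`, `ensemble_label`, and `is_real` are hypothetical names, not the library's own functions. Per-model predictions are represented as plain lists of class scores.

```python
from typing import Sequence, Tuple


def parse_model_name_sketch(model_name: str) -> Tuple[int, int, float]:
    """Extract (h_input, w_input, scale) from a name like '2.7_80x80_MiniFASNetV2.pth'.

    Assumes the '<scale>_<H>x<W>_<arch>.pth' layout; other layouts are not handled.
    """
    parts = model_name.split("_")[:-1]       # drop the '<arch>.pth' suffix
    h_input, w_input = parts[-1].split("x")  # '80x80' -> ('80', '80')
    scale = float(parts[0])                  # leading field is the crop scale
    return int(h_input), int(w_input), scale


def ensemble_label(predictions: Sequence[Sequence[float]]) -> int:
    """Sum per-model class scores and return the index of the largest total."""
    if not predictions:
        raise ValueError("at least one model prediction is required")
    totals = [sum(column) for column in zip(*predictions)]
    return max(range(len(totals)), key=totals.__getitem__)


def is_real(predictions: Sequence[Sequence[float]]) -> bool:
    """Label 1 = real, any other label = spoof, following the text above."""
    return ensemble_label(predictions) == 1
```

Summing before taking the argmax means one confident model can outvote a hesitant one, which differs from majority voting over per-model labels.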
03 · 🧠 Face Embedding
FaceNet / InceptionResnetV1
Class: FaceRecognition

MTCNN(keep_all=False) detects the primary face and returns an aligned tensor.

InceptionResnetV1(pretrained='vggface2') encodes it into a 512-dimensional embedding. Runs under torch.no_grad().

The embedding is compared against every enrolled .npy file via cosine_similarity. The best match above 0.75 is returned. All scores are logged.
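The matching rule can be illustrated with a pure-Python stand-in for the PyTorch `cosine_similarity` call. This sketch assumes embeddings are plain float sequences and that the gallery is a dict keyed by user name. `best_match` is a hypothetical helper, and the real module works on torch tensors loaded from disk.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> Tuple[Optional[str], float]:
    """Score the probe against every enrolled embedding and keep the best.

    Returns (name, score) if the best score is strictly above the threshold,
    otherwise (None, score).
    """
    if not gallery:
        return None, 0.0
    scores = {name: cosine_similarity(probe, emb) for name, emb in gallery.items()}
    name = max(scores, key=scores.get)
    score = scores[name]
    return (name if score > threshold else None), score
```

Returning the score even when no name qualifies keeps rejected attempts available for logging.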

code structure

Class Overview

# Face Recognition System Version 3.0: three composable classes

class GazeDetector:  # MediaPipe FaceMesh · iris landmarks 469-477
    is_looking_forward(frame, threshold=0.05) → bool
        checks iris center X ∈ [0.25, 0.75]

class predict_liveness:  # Silent-Face Anti-Spoofing · model ensemble
    test(image) → bool
        loops models/ · crops at scale · sums predictions
        label 1 = real · label 0 = spoof

class FaceRecognition:  # MTCNN detector + InceptionResnetV1 encoder
    set_face_embedding(name) → saves {name}.npy
    get_face_image(image) → aligned face tensor
    get_embedding(face_img) → 512-d torch.Tensor
    compare_embeddings(emb1, emb2, thr) → bool
    face_recognize(video_index=0) → name | None  (CLI mode)
    face_recognize_from_image(frame, thr) → name | None  (API mode)

technical specs

Model Details

512 FaceNet dims · 478 FaceMesh landmarks · 3 MTCNN stages · N+ anti-spoof ensemble
| Component | Model / Library | Role | Output |
| --- | --- | --- | --- |
| Face Detector | MTCNN (facenet_pytorch) | P-Net → R-Net → O-Net cascade. Detects, aligns, and crops face region from raw BGR frame | aligned tensor |
| Face Encoder | InceptionResnetV1 (VGGFace2) | Deep CNN pretrained on VGGFace2 dataset. Extracts 512-dim discriminative face representation | 512-d vector |
| Gaze Tracker | FaceMesh (MediaPipe) | Refine landmark mode. Tracks iris keypoints 469–477. Checks forward-facing constraint | bool |
| Liveness | AntiSpoofPredict (Silent-Face) | Multi-scale face crop + CNN ensemble. Accumulates prediction scores across all loaded models | bool |
| Similarity | cosine_similarity (PyTorch) | Scores all enrolled users simultaneously. Best match above threshold 0.75 wins | float + name |

usage modes

CLI vs API Mode

🖥️ face_recognize() · CLI

Input: webcam (video_index)
Capture: 5 s countdown
Gaze check: ✓ yes
Liveness: ✓ yes
Threshold: cosine 0.6
Use case: terminal / local

🌐 face_recognize_from_image() · API

Input: cv2 frame from server
Capture: browser WebRTC
Gaze check: ✓ yes
Liveness: ✓ yes
Threshold: cosine 0.75 (stricter)
Use case: FastAPI / SonDrive
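The practical effect of the two thresholds can be shown with a small sketch. `AuthMode`, `CLI_MODE`, `API_MODE`, and `accepts` are hypothetical names introduced here for illustration. Only the threshold values and input descriptions come from the comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthMode:
    """Settings that differ between the two entry points (illustrative only)."""
    name: str
    source: str
    threshold: float


CLI_MODE = AuthMode("cli", "webcam (video_index)", 0.6)
API_MODE = AuthMode("api", "cv2 frame from server", 0.75)


def accepts(mode: AuthMode, similarity: float) -> bool:
    """True if the cosine similarity is strictly above the mode's threshold."""
    return similarity > mode.threshold
```

A borderline similarity of 0.70 would pass in CLI mode but be rejected in API mode, which is what "stricter" means here.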

enrollment

Face Registration

📷 Capture via Webcam
A 5-second countdown lets the user position their face. capture_face_5s(video_index) reads frames from an OpenCV VideoCapture.
cap = cv2.VideoCapture(video_index)

🔲 MTCNN Face Detection
MTCNN detects the face in the frame and returns an aligned, normalized face tensor ready for encoding.
face = self.mtcnn(image)  # → aligned tensor

🧠 FaceNet Encoding
InceptionResnetV1 encodes the face into a 512-d embedding vector. Runs under torch.no_grad().
emb = model(face.unsqueeze(0))  # → [1, 512]

💾 Save Embedding
Saved as {name}.npy in embedded_face/. Loaded at auth time via NumPy for comparison.
np.save(f"embedded_face/{name}.npy", emb.numpy())  # f-string so {name} is substituted
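The `{name}.npy` convention means the set of enrolled users can be recovered from the directory listing alone. The sketch below shows that mapping with the standard library only. `embedding_path` and `enrolled_names` are hypothetical helpers, and reading the arrays themselves (with NumPy in the real module) is left out.

```python
from pathlib import Path
from typing import List


def embedding_path(name: str, root: str = "embedded_face") -> Path:
    """Build '<root>/<name>.npy' for a given user name."""
    return Path(root) / f"{name}.npy"


def enrolled_names(root: str = "embedded_face") -> List[str]:
    """List enrolled users by taking the stem of every .npy file in root."""
    return sorted(p.stem for p in Path(root).glob("*.npy"))
```

Since the user name becomes part of a file path, a real deployment would likely want to validate it before saving.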