| Technical Writer: Poornachandra Sarang | Technical Review: Pooja Gramopadhye and ABCOM Team | Copy Editor: Anushka Devasthale | Level: Intermediate | Banner Image Source : Internet |



Would you like to get a WhatsApp notification alerting you that your son or daughter has returned home from school, tuition classes, or play? Then this tutorial will teach you how to develop an ML application that does exactly that. The application is built on OpenCV, so you will need some familiarity with it and its face detection API. If you do not know how face detection works in OpenCV, read our earlier tutorial on Identity masking with OpenCV. Before I explain the application's code, I will show you how to set up the application on your machine for testing; this will help you better understand the concept.

Project Setup

Before you set up the project on your machine, you will need to install a few packages.

Installing Packages

You will need to install the following software and packages on your machine.

pip install twilio
pip install numpy
pip install -U scikit-learn
pip install opencv-python

Setting Up Folders

After installing the packages, download the project source to your local drive in a folder of your choice. You will see the following folder structure.


You will need to create the person1_name and person2_name folders inside my_dataset, as well as the motions_caught folder, where the application stores captured frames. Because these folders are initially empty, they are not included in the download.

We use the openface_nn4.small2.v1.t7 Torch model to extract facial embeddings. The face_detection_model folder contains a pre-trained Caffe-based deep learning face detector (res10_300x300_ssd_iter_140000.caffemodel). The four .py files are the application programs, which must be run in a specific sequence. The output folder is initially empty. When you run the application, it is filled with three .pickle files, as shown in the screenshot: the face embeddings, the trained recognizer, and the label encoder. A .pickle file stores a Python object serialized with Python's pickle module so that it can be de-serialized (loaded back) later.
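If you prefer to create the person1_name, person2_name, and motions_caught folders from code rather than by hand, the following Python sketch does it with os.makedirs. The helper name create_project_folders is hypothetical, and it assumes the person folders live inside my_dataset, as described in the next section; you can rename them to your children's names afterwards.

```python
import os


def create_project_folders(root=".", person_folders=("person1_name", "person2_name")):
    """Create the empty folders that the download does not include.

    Hypothetical helper: person folders go under my_dataset/, and
    motions_caught/ receives the frames saved by the application.
    """
    relative = [os.path.join("my_dataset", name) for name in person_folders]
    relative.append("motions_caught")
    created = []
    for rel in relative:
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling create_project_folders() from the project's root folder creates all three folders; running it again is harmless because of exist_ok=True.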

Adding Images

In the my_dataset folder, you store the images of the people you want the system to recognize. It contains three folders: person1_name, person2_name, and unknown. Each person's folder holds images of that person only. Collect at least 35-40 photos of your kid, add them to the person1_name folder, and rename the folder to your kid’s name. Likewise, add photos of your other child to the person2_name folder and rename it. You may add more folders if you want the system to recognize other people too. Finally, add 35-40 photos of random people (celebrities, friends) who should not be identified to the unknown folder. I have already filled this folder with a few pictures for your use, so you need to add only the images of your children.
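Before training, it is worth checking that every folder really holds enough photos. Here is a minimal sketch, assuming the my_dataset layout described above and the same .jpg, .jpeg, and .png extensions that the project's list_files utility accepts; the function name images_per_folder and the threshold of 35 are illustrative choices. Unlike list_files, it counts only the files directly inside each person folder, not files in nested subfolders.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def images_per_folder(dataset_dir="my_dataset", minimum=35):
    """Return ({folder: image_count}, [folders with fewer than `minimum` images])."""
    counts = {}
    with os.scandir(dataset_dir) as entries:
        for entry in entries:
            if entry.is_dir():
                # Count files whose extension (case-insensitive) marks an image
                counts[entry.name] = sum(
                    1 for filename in os.listdir(entry.path)
                    if filename.lower().endswith(IMAGE_EXTENSIONS))
    too_few = sorted(name for name, count in counts.items() if count < minimum)
    return counts, too_few
```

For example, `counts, too_few = images_per_folder()` followed by printing too_few tells you which folders still need photos.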

After adding the images, one last thing you need to do is modify the file. This file contains a JSON-style Python dictionary holding the known persons’ details. Its structure is shown here:

        self.details_json = {
            "person1_name": {
                "timestamp": 0,
                "captures": 0,
                "total_capture": 0,
                "image_sent": False
            },
            "person2_name": {
                "timestamp": 0,
                "captures": 0,
                "total_capture": 0,
                "image_sent": False
            }
        }

Replace person1_name with the folder name you chose for the first person. The key is case-sensitive and must match the folder name exactly. If you added a second person, rename person2_name in the same way. If more people are to be recognized, add more entries to the dictionary.
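Because a mismatch between these keys and the folder names is easy to introduce, you could instead derive the dictionary from the folders themselves. This is a hypothetical alternative, not what the project's code does: build_details_json scans my_dataset, skips the unknown folder, and creates one entry with the same four fields for every other folder.

```python
import os


def build_details_json(dataset_dir="my_dataset", unknown_label="unknown"):
    """Create one tracking entry per known-person folder (hypothetical helper)."""
    with os.scandir(dataset_dir) as entries:
        names = sorted(e.name for e in entries
                       if e.is_dir() and e.name != unknown_label)
    details = {}
    for name in names:
        # Same fields and initial values as the hand-written dictionary
        details[name] = {
            "timestamp": 0,
            "captures": 0,
            "total_capture": 0,
            "image_sent": False,
        }
    return details
```

With this helper, the assignment in the class would become `self.details_json = build_details_json()`, and the keys always match the folder names exactly, including their case.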

You are now all set for model training. Because the application sends WhatsApp notifications, you will also need a WhatsApp messaging API, which Twilio provides.

Setting up Twilio Account

Create a Twilio account to use for WhatsApp messaging. Twilio offers a free trial account.

Twilio account

After creating your account, log in and note the ACCOUNT SID and AUTH TOKEN shown on your Twilio Console. You will need them to set environment variables on your machine. Then follow the steps on the WhatsApp page of the Twilio Console and activate the sandbox to set up a WhatsApp number.

Twilio Console

You will be redirected to a page like the one shown above, which explains how to connect to your sandbox by sending a WhatsApp message from your phone.


Finally, modify the two variables FROM_NUMBER and TO_NUMBER in the file.

"FROM_NUMBER": "whatsapp:+14155238886",
"TO_NUMBER": "whatsapp:+<Your Mobile Number>"

Here, <Your Mobile Number> is the number you used while configuring the Twilio account, including the country code. The FROM_NUMBER shown in the above screenshot is the sandbox number assigned to me during my setup; yours may be different.

You are now ready to run your application.

Running the Project

To run the project, you first need to set two environment variables holding the credentials you obtained during the Twilio setup.

Setting up Environment Variables

If you are using macOS or Linux, use the following two commands to set the variables:

export TWILIO_ACCOUNT_SID='<YOUR ACCOUNT SID>'
export TWILIO_AUTH_TOKEN='<YOUR AUTH TOKEN>'

You may add these commands to your shell's login script (for example, ~/.bashrc or ~/.zshrc) so that you do not have to set the variables again after every restart. Instructions for setting environment variables on Windows are available on the Twilio site.
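The Twilio Python helper library's Client(), when created without arguments as in this project, reads the credentials from the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables. The small check below, with a hypothetical function name, reports which of them are missing so that you get a clear message before the webcam loop starts.

```python
import os

REQUIRED_TWILIO_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def missing_twilio_vars(env=None):
    """Return the required Twilio variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TWILIO_VARS if not env.get(name)]
```

For example, `missing = missing_twilio_vars()` at the top of the script, followed by `raise SystemExit("Set these variables first: " + ", ".join(missing))` when the list is not empty, stops the program with an explicit message.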

Executing Code

To run the application, execute the following four Python files in the given sequence:


When the four files run successfully, your machine's webcam starts capturing video, which is displayed in a pop-up window. The application detects the faces of everyone who appears in front of the webcam. Now ask person1 or person2, for whom you trained the model, to appear in front of the webcam. The program recognizes this person as a known person (let us assume the model has been trained for your daughter) and sends a WhatsApp notification to your registered mobile number. The following screenshot shows a video frame with the known face marked.

The screenshot of the WhatsApp messages is as follows:

Now that you have successfully run the application and observed how it works, I will explain the code behind it.

How Does It Work?

The application uses four Python files. The code in each file is described below according to its functionality:


The file contains a single utility function that yields the path of every image file found in the specified folder and its subfolders.

The function definition is shown in the listing below. The code itself is trivial and does not require any further explanation.

import os


def list_files(rootPath, validExtensions=(".jpg", ".jpeg", ".png")):
    # loop over the directory structure
    for (rootDir, dirNames, filenames) in os.walk(rootPath):
        # loop over every file in the current directory
        for filename in filenames:

            # determine the file extension of the current file
            ext = filename[filename.rfind("."):].lower()

            # check to see if the file is an image and should be processed
            if validExtensions is None or ext.endswith(validExtensions):
                # construct the path to the image and yield it
                image_Path = os.path.join(rootDir, filename)
                yield image_Path
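The extraction step later takes each image's label from the name of the folder it sits in (imagePath.split(os.path.sep)[-2]). For comparison, here is a sketch of the same idea written with pathlib, pairing every image under a root folder with its parent folder's name. The name labelled_image_paths is hypothetical, and the sketch is an alternative to, not a copy of, the project's list_files.

```python
from pathlib import Path


def labelled_image_paths(root, extensions=(".jpg", ".jpeg", ".png")):
    """Yield (label, path) pairs; the label is the image's parent folder name."""
    for path in sorted(Path(root).rglob("*")):
        # rglob("*") walks all subfolders, like os.walk in list_files
        if path.is_file() and path.suffix.lower() in extensions:
            yield path.parent.name, str(path)
```

Iterating over `labelled_image_paths("my_dataset")` yields pairs such as the folder name of your kid together with the path of each of their photos.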

Extracting Embeddings

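The file contains the code for extracting face embeddings from the dataset images. We first load the face detector model by calling the readNetFromCaffe function, which takes the path of the network's .prototxt definition file followed by the path of the .caffemodel weights file: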

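# Load OpenCV’s Caffe-based deep learning face detector model
print("[EXEC] Loading face detector model....")
detector = cv2.dnn.readNetFromCaffe(
    "face_detection_model/deploy.prototxt",  # assumed file name
    "face_detection_model/res10_300x300_ssd_iter_140000.caffemodel")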

Next, we load the OpenFace Torch embedding model by calling the readNetFromTorch method. For each face, this model outputs a 128-dimensional (128-d) embedding vector.

# Load the embedder model to extract a 128-D facial embedding vector
# It contains the OpenCV deep learning Torch embedding model.
print("[EXEC] Loading face recognizer model....")
embedder = cv2.dnn.readNetFromTorch("openface_nn4.small2.v1.t7")

We now get the list of images stored in the dataset folder (my_dataset) set up in our configuration file.

print("[EXEC] Reading Image Paths.....")
# Collect every image path into a list
imagePaths = list(list_files(rootPath="my_dataset"))

We declare two lists for storing the embeddings and the corresponding names, and a counter for the number of faces processed:

knownEmbeddings = []
knownNames = []
total = 0

We iterate through all the images:

# Iterate over every single image
for (i, imagePath) in enumerate(imagePaths):
    print("Processing image {} of {}".format(i + 1, len(imagePaths)))

We load each image, take the person's name from the image's parent folder, resize the image, and preprocess it:

    # The parent folder name is the person's label
    name = imagePath.split(os.path.sep)[-2]
    image = cv2.imread(imagePath)
    if image is None:
        # Skip files that OpenCV cannot read
        continue
    image = cv2.resize(image, dsize=(750, 600))
    # Height and Width
    (h, w) = image.shape[:2]

    # Pre-process image by Mean subtraction, Resize and scaling by some
    # factor
    imageBlob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                      (300, 300), (104.0, 177.0, 123.0),
                                      swapRB=False, crop=False)
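To make the blobFromImage call less opaque: with a scale factor of 1.0, swapRB=False, crop=False, and an input that is already 300x300, the function essentially subtracts the per-channel mean (104, 177, 123) from every pixel, multiplies by the scale factor, and reorders the array from height x width x channels into a 1 x 3 x 300 x 300 batch. The NumPy sketch below reproduces only that arithmetic, for illustration; it is not OpenCV's implementation, and it leaves out resizing, cropping, and channel swapping.

```python
import numpy as np


def blob_like(image, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Illustrative NumPy version of blobFromImage's arithmetic for a BGR image
    that is already at the network's input size (no resize, swap, or crop)."""
    pixels = image.astype(np.float32)
    # Subtract the per-channel (B, G, R) mean, then apply the scale factor
    pixels = (pixels - np.array(mean, dtype=np.float32)) * scale
    # Height x Width x Channels -> Channels x Height x Width, plus a batch axis
    return pixels.transpose(2, 0, 1)[np.newaxis, ...]
```

The face blob created later follows the same pattern with a zero mean and a scale factor of 1/255, which maps pixel values into the 0-1 range (that call also sets swapRB=True, which this sketch does not model).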

We pass the processed image to the detector.

    # Run the face detector model on the blob
    detector.setInput(imageBlob)
    detections = detector.forward()

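If at least one face is found, we select the detection with the highest confidence, because each training image is expected to contain a single face of the labeled person: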

    # Proceed if at least one face was detected
    # (detections has shape (1, 1, N, 7), so check N rather than len())
    if detections.shape[2] > 0:
        # Index of the detection with the highest face confidence
        i = np.argmax(detections[0, 0, :, 2])
        confidence = detections[0, 0, i, 2]

If the confidence level is above our threshold, we determine the bounding box for the detected face:

        # Proceed if detected face confidence is above min_confidence
        if confidence > MIN_CONFIDENCE:
            # Bounding box of face detected
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

If the detected face region is smaller than 20 pixels in width or height, we skip it because it is too small to produce a reliable embedding. In the live application, the same check means waiting for the person to come closer to the webcam.

            # Extract the ROI (region of interest)
            face = image[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]

            # Make sure ROI is sufficiently large; skip it otherwise
            if fW < 20 or fH < 20:
                continue
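The detector's output is a 4-D array of shape (1, 1, N, 7): each of the N rows holds [image_id, class_id, confidence, x1, y1, x2, y2], with the box corners normalized to the 0-1 range, which is why the code multiplies them by [w, h, w, h]. The sketch below packages the selection steps above (most confident detection, confidence threshold, 20-pixel minimum size) into one hypothetical function, best_face_box, so you can try the arithmetic on a synthetic array without loading a model. Unlike the listing, it checks the box size directly rather than the size of the sliced ROI, which can differ when a box extends past the image border.

```python
import numpy as np


def best_face_box(detections, w, h, min_confidence=0.55, min_size=20):
    """Return (startX, startY, endX, endY) in pixels for the most confident
    detection, or None if there is none, it is not confident enough, or it
    is smaller than min_size pixels in either dimension."""
    if detections.shape[2] == 0:
        return None
    i = int(np.argmax(detections[0, 0, :, 2]))
    if detections[0, 0, i, 2] <= min_confidence:
        return None
    # Scale the normalized corners back to pixel coordinates
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    startX, startY, endX, endY = box.astype("int")
    if endX - startX < min_size or endY - startY < min_size:
        return None
    return int(startX), int(startY), int(endX), int(endY)
```

The default min_confidence of 0.55 mirrors the min_confidence value used later in the Activities class; the training script's MIN_CONFIDENCE may differ.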

If the face region is large enough, we convert it to a 96x96 blob, extract its embedding, and append the embedding and the person's name to the lists declared earlier:

            # Now we pre-process the ROI, i.e., the detected face
            faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255, (96, 96),
                                             (0, 0, 0), swapRB=True,
                                             crop=False)

            # Use embedder model to extract 128-d face embeddings
            embedder.setInput(faceBlob)
            vec = embedder.forward()

            # Append the name and the embedding vector
            knownNames.append(name)
            knownEmbeddings.append(vec.flatten())
            total += 1

We save the collected embeddings and names to the embeddings.pickle file in the output folder.

print("[EXEC] Collecting {} encodings vectors...".format(total))
# Each encoding/embedding is 128-D vector
data = {"embeddings": knownEmbeddings, "names": knownNames}

# Save the embeddings to a file; the with block closes it automatically
with open("output/embeddings.pickle", "wb") as f:
    f.write(pickle.dumps(data))
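The same save-and-load pattern appears three times in this project (embeddings, recognizer, and label encoder). A pair of small helpers, hypothetical and not part of the downloaded code, keeps it in one place; pickle.dump(obj, f) is equivalent to f.write(pickle.dumps(obj)). Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import pickle

import numpy as np


def save_pickle(obj, path):
    """Serialize any Python object (dict, fitted model, encoder) to `path`."""
    with open(path, "wb") as f:  # the with block closes the file even on errors
        pickle.dump(obj, f)


def load_pickle(path):
    """Load an object previously written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With these helpers, saving the embeddings becomes `save_pickle(data, "output/embeddings.pickle")`, and the training script can read them back with `load_pickle("output/embeddings.pickle")`.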

After creating the embeddings, we proceed to train our model.

Model Training

The model training code is stored in the file. We first load the embeddings created in the previous step and encode the person names as integer labels:

import pickle
from sklearn.preprocessing import LabelEncoder

print("[EXEC] Loading face embeddings....")
data = pickle.loads(open("output/embeddings.pickle", "rb").read())

# Encode the labels
print("[EXEC] Encoding labels...")
le = LabelEncoder()
labels = le.fit_transform(data["names"])

We then train a Support Vector Machine (SVM) classifier on the embeddings, using a degree-7 polynomial kernel and enabling probability estimates:

from sklearn.svm import SVC
print("[EXEC] Training model...")
recognizer = SVC(C=10.0, kernel="poly", degree=7, probability=True)
recognizer.fit(data["embeddings"], labels)
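To see how the label encoder, the SVC settings above, and the predict_proba and argmax steps used later in the live loop fit together, the sketch below trains on synthetic data. The 128-dimensional unit vectors clustered around three random centers are a stand-in for real OpenFace embeddings (an assumption made only so the example runs without images), and the names alice, bob, and unknown are made up. The random_state argument is added for repeatability; it is not in the original listing.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def fake_embeddings(center, n, noise=0.02):
    """Synthetic stand-in for OpenFace output: n unit-length vectors near center."""
    vecs = center + noise * rng.normal(size=(n, center.size))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


# One random 128-d "identity" per person, 20 samples each
people = ["alice", "bob", "unknown"]
centers = rng.normal(size=(len(people), 128))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
embeddings = np.vstack([fake_embeddings(c, 20) for c in centers])
names = [person for person in people for _ in range(20)]

# Encode names as integers; classes_ is sorted, so alice -> 0, bob -> 1, unknown -> 2
le = LabelEncoder()
labels = le.fit_transform(names)

# Same SVC settings as the training listing, plus a fixed random_state
recognizer = SVC(C=10.0, kernel="poly", degree=7, probability=True, random_state=0)
recognizer.fit(embeddings, labels)

# Classify a new vector near Bob's center, as the live loop does for each face
query = fake_embeddings(centers[1], 1)
preds = recognizer.predict_proba(query)[0]
j = int(np.argmax(preds))
name, proba = le.classes_[j], preds[j]
print("{}: {:.2f}".format(name, proba))
```

Because predict_proba returns one probability per class in the order of le.classes_, the index of the largest probability can be mapped straight back to a person's name.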

Finally, we save the trained recognizer and the label encoder to .pickle files using the following code:

# Save the Face Recognizer Model
with open("output/recognizer.pickle", "wb") as f:
    f.write(pickle.dumps(recognizer))

# Save the Label encoder model
with open("output/le.pickle", "wb") as f:
    f.write(pickle.dumps(le))

We are now ready to start the webcam and perform face recognition in real time.

Person Detection

The face recognition and WhatsApp notification code is in the file. In this file, we first define a class called Activities, a utility class that loads the models and configuration, stores captured frames in files, and sends the notifications.

import os
import pickle
import threading
import time
from datetime import datetime

import cv2
import numpy as np
from twilio.rest import Client


class Activities:

In the class initialization, we set up the Twilio client and the JSON-style configuration, and load the detector, embedder, and recognizer models along with the label encoder. The __init__ method is shown in the following listing:

    def __init__(self):
        # Initialize the Twilio client (it reads TWILIO_ACCOUNT_SID and
        # TWILIO_AUTH_TOKEN from the environment) to send WhatsApp messages
        self.client = Client()
        self.details_json = {
            "person1_name": {
                "timestamp": 0,
                "captures": 0,
                "total_capture": 0,
                "image_sent": False
            },
            "person2_name": {
                "timestamp": 0,
                "captures": 0,
                "total_capture": 0,
                "image_sent": False
            }
        }

        self.FROM_NUMBER = "whatsapp:+14155238886"
        self.TO_NUMBER = "whatsapp:+<YOUR MOBILE NUMBER>"
        self.min_confidence = 0.55
        self.max_captures = 1
        self.min_time_gap = 480  # seconds between notifications (8 minutes)
        self.show_video = True

        # Load OpenCV’s Caffe-based deep learning face detector model
        print("[EXEC] Loading face detector...")
        self.detector = cv2.dnn.readNetFromCaffe(
            "face_detection_model/deploy.prototxt",  # assumed file name
            "face_detection_model/res10_300x300_ssd_iter_140000.caffemodel")

        # Load the OpenFace Torch model that computes 128-d face embeddings
        print("[EXEC] loading face embeddings...")
        self.embedder = cv2.dnn.readNetFromTorch(
            "openface_nn4.small2.v1.t7")

        # Load the recogniser model
        print("[EXEC] loading face recognizer...")
        self.recognizer = pickle.loads(open("output/recognizer.pickle",
                                            "rb").read())

        # Load the Label encoder
        self.le = pickle.loads(open("output/le.pickle", "rb").read())

We write a method called store_frame for storing the captured image to a file.

    # save the captured frame as motions_caught/<name>.png
    def store_frame(self, get_name, op_frame):
        p = os.path.sep.join(["motions_caught", "{}.png".format(get_name)])
        cv2.imwrite(p, op_frame)

We send the WhatsApp message using the following method:

    # send whatsapp message to specific number via TWILIO API
    def send_message(self, get_name, timestamp):
        self.client.messages.create(
            # assumed message wording
            body='Last seen: {} at {}'.format(
                get_name, timestamp.strftime("%d %B %Y %I:%M%p")),
            from_=self.FROM_NUMBER,
            to=self.TO_NUMBER)
        print("WhatsApp message sent")
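The sketch below shows the notification step in a form you can try without sending anything. It formats the timestamp with the listing's strftime pattern "%d %B %Y %I:%M%p" (in an English locale, for example, 05 May 2021 04:30PM) and calls messages.create(body=..., from_=..., to=...), which is how the Twilio Python library sends a message. The message wording, the function name notify, and the FakeClient test double are assumptions, not part of the project.

```python
from datetime import datetime


def notify(client, from_number, to_number, name, timestamp):
    """Send a 'last seen' WhatsApp message through a Twilio-style client."""
    # Assumed message wording; adapt it to your needs
    body = "Last seen: {} at {}".format(
        name, timestamp.strftime("%d %B %Y %I:%M%p"))
    return client.messages.create(body=body, from_=from_number, to=to_number)


class FakeMessages:
    """Records messages instead of sending them (test double, not Twilio code)."""

    def __init__(self):
        self.sent = []

    def create(self, **kwargs):
        self.sent.append(kwargs)
        return kwargs


class FakeClient:
    def __init__(self):
        self.messages = FakeMessages()


client = FakeClient()
notify(client, "whatsapp:+14155238886", "whatsapp:+15550001111",
       "alice", datetime(2021, 5, 5, 16, 30))
print(client.messages.sent[0]["body"])
```

To send a real message, pass a twilio.rest.Client() instance instead of FakeClient().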

After the Activities class definition, we instantiate it and start video capture from the built-in webcam. To use another camera in your home surveillance system, pass the appropriate device index (or a video stream URL) to cv2.VideoCapture instead of 0.

# initialize Activities
activity = Activities()

# Load videostream
print("[EXEC] starting video stream...")
cap = cv2.VideoCapture(0)
# Let the camera sensor warm up (the 2-second delay is an assumed value)
time.sleep(2.0)

frame_avg = None
last_seen = datetime.now()  # assumed initial value

while True:
    # Read frames
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read from the camera
        break
    # Current timestamp
    curr_timestamp = datetime.now()

    # Frame resize
    frame = cv2.resize(frame, dsize=(750, 600))
    # Height and width of the frame
    (h, w) = frame.shape[:2]

We convert each frame to a blob and run face detection on it with the previously loaded detector.

    # Pre-process image by Mean subtraction, Resize and
    # scaling by some factor
    imageBlob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                      1.0, (300, 300),
                                      (104.0, 177.0, 123.0),
                                      swapRB=False, crop=False)
    # Run face detection on the blob with the detector model
    activity.detector.setInput(imageBlob)
    detections = activity.detector.forward()

For each detection, we check its confidence and, if the detected face is at least 20x20 pixels, build a face blob from it. This is done in the following code:

    # Now loop over each detection
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        # Proceed if detected face confidence is above min_confidence
        if confidence > activity.min_confidence:
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # Extract the ROI (region of interest)
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            # Skip faces that are too small to recognize reliably
            if fW < 20 or fH < 20:
                continue
            # Pre-process the ROI, i.e., the detected face
            faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
                                             (96, 96), (0, 0, 0),
                                             swapRB=True, crop=False)

We now extract the embedding from this face blob.

            # Use embedder model to extract 128-d face embeddings
            activity.embedder.setInput(faceBlob)
            vec = activity.embedder.forward()

Next, we use the recognizer to identify the face.

            # Now make predictions based on our recognizer model
            preds = activity.recognizer.predict_proba(vec)[0]
            # store the index prediction maximum probability
            j = np.argmax(preds)
            # store the prediction maximum probability
            proba = preds[j]
            # extract the name of the prediction
            name = activity.le.classes_[j]

If the recognized person is known, we send a WhatsApp message, store the frame, and reset the timer. Note that a notification for the same person is sent at most once every 8 minutes (480 seconds). You can configure this period by setting the min_time_gap variable in the Activities class.

            if name == "unknown":
                # Faces of unknown people trigger no notification
                pass
            # Notify on the first recognition, or when the person reappears
            # after min_time_gap seconds (8 minutes by default)
            elif activity.details_json[name]["captures"] == 0 or \
                    (curr_timestamp -
                     activity.details_json[name]["timestamp"]).total_seconds() >= \
                    activity.min_time_gap:
                # Keep track of the number of captures
                activity.details_json[name]["captures"] += 1
                # Send the message and store the frame in separate threads
                # so that the video loop is not blocked
                threading.Thread(target=activity.send_message, kwargs=dict(
                    get_name=name, timestamp=curr_timestamp)).start()
                threading.Thread(target=activity.store_frame, kwargs=dict(
                    op_frame=frame, get_name=name)).start()

                # Remember when the frame was captured
                activity.details_json[name]["timestamp"] = curr_timestamp
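The throttling rule is easiest to check in isolation. In the hypothetical helper below, should_notify returns True for the first sighting of a person, or when at least min_time_gap seconds (480, that is, 8 minutes, by default) have elapsed since the stored timestamp; the elapsed time is the current time minus that timestamp. It takes one entry of the details_json dictionary.

```python
from datetime import datetime, timedelta


def should_notify(entry, now, min_time_gap=480):
    """Decide whether to send a notification for one details_json entry."""
    if entry["captures"] == 0:
        return True  # first time this person is recognized
    elapsed = (now - entry["timestamp"]).total_seconds()
    return elapsed >= min_time_gap


start = datetime(2021, 5, 5, 16, 0)
entry = {"timestamp": 0, "captures": 0, "total_capture": 0, "image_sent": False}
print(should_notify(entry, start))                          # True
entry.update(captures=1, timestamp=start)
print(should_notify(entry, start + timedelta(minutes=5)))   # False
print(should_notify(entry, start + timedelta(minutes=8)))   # True
```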

If you want to see the live video while face detection and recognition run, set show_video to True; the following code displays each frame and stops the application when you press q:

    # show the current videostream or not
    if activity.show_video:
        # display the security feed
        cv2.imshow("Output", frame)
        key = cv2.waitKey(1) & 0xFF
        # if the `q` key is pressed, break from the loop
        if key == ord("q"):
            break

Finally, once the loop ends, release the video capture and close all windows:

cap.release()
cv2.destroyAllWindows()

In this tutorial, you learned how to use OpenCV with a pre-trained Caffe face detector and a Torch embedding model to capture a live video stream, detect faces in it, and recognize each detected face with an SVM classifier trained on the embeddings of known persons. The application has many possible uses; the example in this tutorial notifies you when your kid returns home from school or play. To ensure that you are not flooded with messages, the application sends at most one notification per person within a configurable interval (8 minutes by default). Adding a known person is easy: capture 35-40 images of the person, store them in the appropriate folder, and re-run the embedding extraction and training steps. The face detection and recognition accuracy is very good, thanks largely to the pre-trained models that we used.

Source: Download the project source from our Repository.