Ruchi

| Technical Writer: ABCOM Team | Technical Review: Aaditya Damle | Level: Intermediate | Banner Image Source: Internet |

Overview

"The advance of technology is based on making it fit in so that you don't really even notice it, so it's part of everyday life." - Bill Gates

Voice and gestures are widely used ways of communicating with machines. We have voice-controlled gadgets such as Alexa and Siri-enabled devices to control our homes with voice commands. We have gesture-controlled TVs, and even the dashboards of top-of-the-segment luxury cars respond to gestures. Many video games are gesture-controlled too. In this tutorial, I will show you how to use gestures to play a game on your computer.

Before you start learning the technique, have a quick look at the following video, which gives you an idea of what you are going to achieve by the end of this tutorial.

Demo

In this video, you can see me playing my all-time favourite childhood game, the Snake Game. You control the snake’s position on the screen with gestures. Your job is to ensure that the snake does not hit the wall boundaries; if it does, the game is over. I used a blue-colored object as a pointer for guiding the snake. The video capture detects the pointer’s position in the virtual space of the captured window. Depending on which quadrant the pointer is in, the application fires one of the four arrow keys: LEFT, RIGHT, UP, or DOWN.

I will now show you how to write an application that detects these gestures and uses them to control another application. Before we go into the coding, I suggest you install the application on your computer and get familiar with its operation. That way, you will feel more confident while learning the code behind it.

Playing it On Your Machine

Running the gesture-controlled app on your machine is straightforward; just follow these steps.

Before you run the application and the snake game on your computer, you will need to install a few packages.

Installations

You need the following installations for creating and running the project in this tutorial.

It goes without saying that you need Python installed on your machine. You will need to install OpenCV, an open-source computer vision library; you will use several of its functions for image processing and gesture recognition. You will use pyautogui to transform the gestures into game actions, mainly by firing key-pressed events in the game, Pygame to run the Snake game, and imutils to capture the video. After these packages are installed, continue with the steps listed below.
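If you are starting from scratch, the following pip command should pull in everything; these are the usual PyPI package names (opencv-python in particular), but verify them against your environment:

pip install opencv-python numpy imutils pyautogui pygame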

Downloading Project Code

Download the project code from our repository. The entire project consists of the following four Python files.

  • gameControl.py
  • directKeys.py
  • snakeGame.py
  • settingsSnakeFun.py

First, you will run the snake game and then the game control application in another window.

Running Snake Game

Open the Python command prompt and run the following two Python files:

python settingsSnakeFun.py
python snakeGame.py

You will see the snake game window on your screen as shown here:

image01

Running Game Control

Start another command prompt. Run the Application by starting the following two Python programs.

python directKeys.py
python gameControl.py

You will now have the game control app, with the webcam started, running on your machine. Your desktop at this stage looks like the following:

image02

Hold a blue-colored object in your palm and start controlling the game by moving it. After you familiarize yourself with the application’s functionality, proceed to dig into the application source.

I will now explain to you the entire code.

Understanding Code

As you noticed earlier, the entire application consists of four Python files:

  • gameControl.py
  • directKeys.py
  • snakeGame.py
  • settingsSnakeFun.py

The main program file is gameControl.py. I will now explain the code in this file.

Importing Modules

First of all, we import all the required modules.

from imutils.video import VideoStream
import numpy as np

import cv2
import imutils
import pyautogui
from directkeys import up, left, down, right
from directkeys import PressKey, ReleaseKey

You use imutils for capturing real-time video, numpy for converting and processing image data as arrays, and cv2 for image processing; you will use cv2-provided functions for image blurring and more. pyautogui is used for pressing keys virtually, and directkeys defines the key controls (PressKey, ReleaseKey, and the arrow-key codes).

Set Color Range

In the demo, I asked you to use a blue-colored object for controlling the game. This “blue” color is defined in the following two statements.

blueLower = np.array([50,50,50])
blueUpper = np.array([180,180,155])

Note that the color values are given in HSV (hue, saturation, value) format, not the typical RGB values. We convert every captured frame to HSV before thresholding (see below), so the bounds must be specified in HSV as well. If you decide to use some other color for controlling the game, you will need to modify the values specified in these statements.
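If you want to derive a range for a different color, a quick trick is to convert a single pixel of that color from BGR to HSV and widen the bounds around the result. Here is a minimal sketch; the BGR triple and the widening margins are assumptions you should tune:

import cv2
import numpy as np

# Pure blue in OpenCV's BGR channel order; substitute your own color here.
bgr_pixel = np.uint8([[[255, 0, 0]]])
hsv_pixel = cv2.cvtColor(bgr_pixel, cv2.COLOR_BGR2HSV)
print(hsv_pixel)  # e.g. [[[120 255 255]]]; widen hue by ~10, relax S and V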

Declare Variables

We will declare a few variables here:

The Python set declared in the statement below keeps track of the keys that are currently pressed, so that we can release them later.

current_key = set()

We define the radius for the bounding circle of the pointer object as follows:

radius_of_circle = 15

The window size for the grabbed frame is defined as follows:

window_size = 160

Start Video

Start the video capture by creating a VideoStream object from imutils and starting it. The source specified in the statement below is 0, which selects the built-in webcam.

video = VideoStream(src=0).start()
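If your machine has more than one camera, the one you want may not be at index 0; device indices vary by machine, so treat the value below as an assumption to experiment with:

# try other device indices for external or secondary webcams
# video = VideoStream(src=1).start()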

Start Capturing

Create an infinite loop for continuous video capture. The video capture stops when the user presses the “q” key. At the top of every iteration, reset the keyPressed flag to False; it is set to True only if a gesture fires a key in that iteration.

while True:
    keyPressed = False

Grab a video frame from the continuous stream.

    grabbed_frame = video.read()
    height,width = grabbed_frame.shape[:2]

Convert to HSV

HSV is a color scale, just like the BGR scale: H represents hue, S saturation, and V value. In OpenCV, their ranges are 0-179, 0-255, and 0-255 respectively. Hue represents the color itself, saturation the amount of that color mixed with white, and value the amount mixed with black. RGB, in contrast, is defined as a combination of primary colors, so it is tied to intensity and color information cannot be separated from it; HSV lets us separate color information from luminance (intensity). The reference at the end of this tutorial gives more information on HSV and how it differs from RGB.

We now resize and blur the captured image to get a smoother picture. It is easier to detect a blob of blue color (our pointer object in the demo) in a blurred image than in the original high-detail image. We blur the image using the Gaussian blur function of cv2 and then convert it to the HSV scale.

    grabbed_frame = imutils.resize(grabbed_frame, width=600)
    blur_frame = cv2.GaussianBlur(grabbed_frame, (11, 11), 0)
    hsv_value = cv2.cvtColor(blur_frame, cv2.COLOR_BGR2HSV)

Creating Cover for Pointer Object

We will create a cover (a sort of mask) for the pointer object. It is defined as follows:

    cover = cv2.inRange(hsv_value, blueLower, blueUpper)

Note that the cover keeps only the pixels whose values fall between the two blue bounds we defined earlier.

For better object detection, we apply the morphological operators called erosion and dilation to the image. Erosion removes noise: it shrinks the white regions of the mask, so small speckles disappear along with the outer layer of larger features. Dilation is the opposite; it grows the remaining regions and is used to increase the area of the object. We always use dilation after erosion, because erosion shrinks the object along with the noise, and dilation restores the object’s area once the noise is gone. The net effect is a cleaner mask that makes the object easier to detect. You can find more information about eroding and dilating in the reference section. The two images below show the effect of erosion and dilation.

image03

Image source: Eroding and Dilating

    cover = cv2.erode(cover, None, iterations=2)
    cover = cv2.dilate(cover, None, iterations=2)
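To see this noise-removal effect in isolation, here is a minimal, self-contained sketch on synthetic data; the image size, square, and speckle positions are made up purely for illustration:

import cv2
import numpy as np

# A white square (the "object") plus two single-pixel specks of noise.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[30:70, 30:70] = 255
mask[5, 5] = mask[90, 10] = 255

eroded = cv2.erode(mask, None, iterations=2)      # specks vanish, square shrinks
cleaned = cv2.dilate(eroded, None, iterations=2)  # square regains its size

print(cleaned[5, 5], cleaned[50, 50])  # 0 255 -> noise removed, object kept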

Now, divide the frame into two halves: the left half will drive the LEFT and RIGHT arrow keys, and the right half the UP and DOWN keys.

    left_cover = cover[:, 0:width//2]
    right_cover = cover[:, width//2:]

Contours for Object Detection

We will use contours to detect our pointer object in the image. A contour is a curve joining all the continuous points along a boundary that have the same color or intensity. Contours are widely used for object detection and recognition. We will compute contours twice, once for the left side of the screen and once for the right. The following screenshot gives you an idea of what a contour looks like.

image01

As you can see in the image, the circular contour is marked in yellow, with its centroid marked in red. Look up the reference section for more information on contours.

We use the findContours method of OpenCV to detect contours on both the left and right sides of the image.

    # for the left side
    contour_l = cv2.findContours(left_cover.copy(),
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
    contour_l = imutils.grab_contours(contour_l)
    left_centre = None

    # for the right side
    contour_r = cv2.findContours(right_cover.copy(),
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
    contour_r = imutils.grab_contours(contour_r)
    right_centre = None

If a contour is detected, we take the largest one by area (contourArea) and find its centroid using image moments: the centroid is (m10/m00, m01/m00). A tiny constant is added to m00 to guard against division by zero:

    if len(contour_l) > 0:
        # pick the largest contour and its minimum enclosing circle
        c = max(contour_l, key=cv2.contourArea)
        ((x, y), r) = cv2.minEnclosingCircle(c)
        M = cv2.moments(c)

        # centroid = (m10/m00, m01/m00); +0.000001 avoids division by zero
        left_centre = (int(M["m10"] / (M["m00"]+0.000001)), int(M["m01"] / (M["m00"]+0.000001)))

If the radius of the detected enclosing circle is more than our set value (radius_of_circle), we draw the circle and its centroid on the frame.

        if r > radius_of_circle:
            # draw the circle and centroid on the frame
            cv2.circle(grabbed_frame, (int(x), int(y)), int(r),
                (0, 255, 255), 2)
            cv2.circle(grabbed_frame, left_centre, 5, (0, 0, 255), -1)

Setting Virtual Control for Keys

Now that we have detected the pointer object and its centroid in the left half of the frame, we fire the LEFT or RIGHT arrow key using the pyautogui press method: pressing LEFT when the centroid lies above the central band and RIGHT when it lies below it.


            if left_centre[1] < (height/2 - window_size // 2):
                cv2.putText(grabbed_frame, 'LEFT', (20, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                pyautogui.press('left')
                current_key.add(left)
                keyPressed = True
                keyPressed_lr = True

            elif left_centre[1] > (height/2 + window_size // 2):
                cv2.putText(grabbed_frame, 'RIGHT', (20, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                pyautogui.press('right')
                current_key.add(right)
                keyPressed = True
                keyPressed_lr = True

We also draw the key name at the top of the window (via cv2.putText) so the user can see which key is pressed.

image01

Likewise, we use the contour detected in the right half of the frame: when its centroid is in the upper portion we fire UP, and in the lower portion, DOWN. The above screenshot illustrates this for a DOWN keypress.
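The right-half code is not reproduced above; a sketch of it, mirroring the left-half logic already shown (the text position and the exact details in the project’s gameControl.py may differ), would look like this:

    if len(contour_r) > 0:
        c = max(contour_r, key=cv2.contourArea)
        ((x, y), r) = cv2.minEnclosingCircle(c)
        M = cv2.moments(c)
        # right_cover coordinates are relative to the right half,
        # so shift x by width//2 when mapping back to the full frame
        right_centre = (int(M["m10"] / (M["m00"]+0.000001)) + width//2,
                        int(M["m01"] / (M["m00"]+0.000001)))

        if r > radius_of_circle:
            if right_centre[1] < (height/2 - window_size // 2):
                cv2.putText(grabbed_frame, 'UP', (width - 120, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                pyautogui.press('up')
                current_key.add(up)
                keyPressed = True
            elif right_centre[1] > (height/2 + window_size // 2):
                cv2.putText(grabbed_frame, 'DOWN', (width - 120, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                pyautogui.press('down')
                current_key.add(down)
                keyPressed = True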

Note that when we generate the key-pressed event using pyautogui, the event goes to the active application on the screen. So you have to make sure that the snake game window is the active one while moving the blue pointer in the image capture window.

Displaying the Captured Frame

We now display the grabbed frame by making a copy of it and drawing a blue rectangle whose top and bottom edges mark the central neutral band. The pointer must move outside this band for a key press to fire.

    grabbed_frame_copy = grabbed_frame.copy()
    grabbed_frame_copy = cv2.rectangle(grabbed_frame_copy,
        (0, height//2 - window_size // 2),
        (width, height//2 + window_size // 2), (255, 0, 0), 2)
    cv2.imshow("grabbed_frame", grabbed_frame_copy)

The following screenshot shows the neutral band between the two blue lines; moving the pointer across them triggers the key presses.

image01

Releasing Keys

To avoid stuck keys during our infinite loop, whenever no key was pressed in an iteration we release everything in our buffer, current_key, and empty it.

    if not keyPressed and current_key:
        for key in current_key:
            ReleaseKey(key)
        current_key = set()

The functions PressKey and ReleaseKey are defined in the directKeys.py file.
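This tutorial does not walk through directKeys.py itself. For orientation, here is a minimal sketch of the classic Windows-only pattern such files typically use: ctypes calls to the user32 SendInput API with DirectInput scan codes. The project’s actual file may differ; the scan-code constants below are the usual DirectInput values for the arrow keys, and everything here should be treated as an assumption rather than the project’s exact code.

import ctypes

# Windows-only: SendInput from user32
SendInput = ctypes.windll.user32.SendInput

# DirectInput scan codes for the arrow keys (extended-key codes)
up = 0xC8
down = 0xD0
left = 0xCB
right = 0xCD

KEYEVENTF_SCANCODE = 0x0008
KEYEVENTF_KEYUP = 0x0002

PUL = ctypes.POINTER(ctypes.c_ulong)

class KeyBdInput(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_ushort),
                ("wScan", ctypes.c_ushort),
                ("dwFlags", ctypes.c_ulong),
                ("time", ctypes.c_ulong),
                ("dwExtraInfo", PUL)]

class HardwareInput(ctypes.Structure):
    _fields_ = [("uMsg", ctypes.c_ulong),
                ("wParamL", ctypes.c_short),
                ("wParamH", ctypes.c_ushort)]

class MouseInput(ctypes.Structure):
    _fields_ = [("dx", ctypes.c_long),
                ("dy", ctypes.c_long),
                ("mouseData", ctypes.c_ulong),
                ("dwFlags", ctypes.c_ulong),
                ("time", ctypes.c_ulong),
                ("dwExtraInfo", PUL)]

class Input_I(ctypes.Union):
    _fields_ = [("ki", KeyBdInput),
                ("mi", MouseInput),
                ("hi", HardwareInput)]

class Input(ctypes.Structure):
    _fields_ = [("type", ctypes.c_ulong),
                ("ii", Input_I)]

def PressKey(hexKeyCode):
    # send a key-down event using the hardware scan code
    extra = ctypes.c_ulong(0)
    ii_ = Input_I()
    ii_.ki = KeyBdInput(0, hexKeyCode, KEYEVENTF_SCANCODE, 0, ctypes.pointer(extra))
    x = Input(ctypes.c_ulong(1), ii_)
    SendInput(1, ctypes.pointer(x), ctypes.sizeof(x))

def ReleaseKey(hexKeyCode):
    # send the matching key-up event
    extra = ctypes.c_ulong(0)
    ii_ = Input_I()
    ii_.ki = KeyBdInput(0, hexKeyCode, KEYEVENTF_SCANCODE | KEYEVENTF_KEYUP, 0, ctypes.pointer(extra))
    x = Input(ctypes.c_ulong(1), ii_)
    SendInput(1, ctypes.pointer(x), ctypes.sizeof(x))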

Terminating Loop

We terminate the loop when the user presses the “q” key. On termination, we stop the video capture and destroy all the windows.

    k = cv2.waitKey(1) & 0xFF

    # stop the loop when the user presses "q"
    if k == ord("q"):
        break

video.stop()
# close all windows after the video capture stops
cv2.destroyAllWindows()

Snake Game

The source code for the snake game is provided so that you can test the above application. The game’s settings live in the settingsSnakeFun.py file. Discussing that code here is beyond the scope of this tutorial; you do not need to understand it to test the application.

Summary

In this tutorial, you learned how to use OpenCV to detect a pointer object of a specific color in a real-time video capture. The detected pointer’s on-screen position was converted into the LEFT, RIGHT, UP, and DOWN arrow keys. You used pyautogui to turn this object detection into action keys, which are delivered to the active window on your desktop, in our case the snake game. You may apply this technique to control any other game of your choice.

References
