Arka

| Technical Review: ABCOM Team | Level: Intermediate | Banner Image Source: Internet |

Introduction

In the first part of this tutorial, you learned to use a super-fast, real-time object detection model called YOLO. You used it to detect objects in an image, a video, and even a live video stream. In this part of the series, you will learn image segmentation techniques. The picture below gives an overview of what we mean by segmentation: we mask each child in the picture with a certain color, and the masking transparency is controllable.

image01

What is Image Segmentation?

Image segmentation is a process in computer vision that divides an image into multiple segments. It locates the different objects in an image along with their boundaries and draws a mask around each object. Thus, image segmentation provides a pixel-wise mask of each recognized object and helps us understand and analyze the image better. Segmentation is of two types:

  • Semantic segmentation
  • Instance segmentation

Semantic segmentation classifies each pixel as belonging to a particular label/class: all objects of the same class receive the same labeled mask, and distinct classes receive different masks. Instance segmentation, in contrast, gives a different labeled mask to each detected object, irrespective of the class it belongs to. The following picture clarifies the difference between the two types of segmentation.

image03

How to Do Segmentation?

In this tutorial, I am going to show you two different ways of performing segmentation.

  • Using Pixellib - ready to use
  • Using mrcnn for customized segmentation

The Pixellib library provides a ready-to-use solution for image segmentation. Using mrcnn, you gain more control over the segmentation process by writing some additional code of your own.

Installing Software

Install the Pixellib library using the following command:

pip install pixellib

Next, install the Mask R-CNN library. You will be installing the Mask RCNN module originally provided by Matterport. The original version supports TensorFlow 1.x only and, as of this writing, has not been upgraded to TensorFlow 2.x; you can download it from Matterport’s GitHub repository. Fortunately, an updated version of this library is available in the akTwelve GitHub repository.
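If you prefer the command line to downloading the ZIP, an equivalent way to fetch and install the akTwelve fork (a sketch, assuming git is available on your machine) is:

git clone https://github.com/akTwelve/Mask_RCNN.git
cd Mask_RCNN
python setup.py install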

You need to install both of these packages on your machine. If you downloaded the ZIP, follow the instructions below:

  • Unzip the downloaded file (Mask_RCNN-master.zip) and locate the setup.py file; it will be under the path Mask_RCNN-master/setup.py
  • Run the installer:
python setup.py install

You also need the pre-trained model weights and the COCO class-names file. Download these from the following links:

Finally, download the source program for this tutorial from our Github repository.

After the installation, your folder structure should look like this. If required, rearrange the files in the respective folders.

image03

Creating Project

Create a project using your favorite IDE, such as Spyder from the Anaconda distribution. Alternatively, you may use a simple Python editor to try out this project. The project comprises just two Python files: one (Image_Segmentation_pixellib_PART2.py) does segmentation using the Pixellib library; the other (Image_Segmentation_mrcnn_PART2.py) uses mrcnn to perform the customized segmentation.

As described earlier, image segmentation can be either instance segmentation or semantic segmentation. I will show you both types, using two different techniques:

  • Segmentation using Pixellib library
  • Doing the segmentation on your own from scratch using Mask RCNN

Segmentation Using Pixellib

PixelLib is a library created for performing image and video segmentation using a few lines of code. It is a flexible library created to allow easy integration of image and video segmentation into software solutions.

Importing Libraries

Use the following imports to add the libraries to your project.

import numpy as np
import cv2
import warnings
warnings.filterwarnings('ignore')
import pixellib
from pixellib.instance import instance_segmentation
from pixellib.semantic import semantic_segmentation

Model Initialization

Load and initialize the two models for instance and semantic segmentation using the following code:

#Semantic Segmentation
semantic_segment = semantic_segmentation()

#we are going to use deeplabv3_xception model for ade20k dataset here
semantic_segment.load_ade20k_model("files/deeplabv3_xception65_ade20k.h5")

#Instance Segmentation
inst_segment = instance_segmentation()

#we are going to use the mask_rcnn for coco dataset here
inst_segment.load_model("files/mask_rcnn_coco.h5")

User Interface

Now that we have initialized our models, let us start with the image segmentations. I have created a trivial user interface that allows the user to first select the type of segmentation (instance or semantic) and then the input source: an image, a video, or a webcam.

inp = int(input('Choose the type of Image Segmentation : \n 1.Instance Segmentation \n 2.Semantic Segmentation \n'))

For each type of segmentation, we will process an image, a video, or a live webcam stream. The input selection is done using the following piece of code:

if inp == 1: #for instance
   inp1 = int(input('Choose the format for detecting objects : \n 1.Image \n 2.Video \n 3.Webcam \n'))

First, I will show you instance segmentation.

Image Segmentation

To detect objects and segment an image, simply call the segmentImage method, passing the path of the desired image as its argument. The method returns a pair of values; the second one is the image with the segmentation drawn on it.

   if inp1 == 1: #for image
       img_ins_seg = inst_segment.segmentImage("data/image00.jpg")

Show the original image to the user using the following code:

       #Uncomment next two lines if you want to see the original image
       #cv2.imshow("Image",cv2.imread('data/image00.jpg'))
       #cv2.waitKey(0)

This is the original image:

Original

Show the image after the segmentation is performed:

       #Showing the image with segmentations
       cv2.imshow('Image',img_ins_seg[1])
       cv2.waitKey(0)

The following screenshot shows the result:

image05
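As a side note, segmentImage can also draw bounding boxes and write the annotated image straight to disk through its show_bboxes and output_image_name parameters. A minimal sketch, reusing the same input path (the output file name is arbitrary):

       #optional: draw boxes and save the annotated image to a file
       inst_segment.segmentImage("data/image00.jpg", show_bboxes = True, output_image_name = "image00_seg.jpg")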

Video Segmentation

To process the frames of a pre-recorded video, use the process_video method of the instance-segmentation object. The first parameter to this method is the path of the video file; the output_video_name parameter specifies the name of the output file.

   elif inp1 == 2: #for video
       inst_segment.process_video("data/video00.mp4", show_bboxes = True, frames_per_second = 50, output_video_name = "video_seg.mp4")

To play the video after its frames have been segmented, use the following code:

       vid = cv2.VideoCapture('video_seg.mp4')
 
       while True:
           success, img = vid.read()
           if not success: #stop when the video ends
               break
  
           cv2.imshow('Segmented Video',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

The output is shown in the video below.

Webcam Segmentation

For processing a live video captured through a webcam, call the process_camera method. The processed output is stored in the specified file, webcam_inst.mp4.

    elif inp1 == 3: #for webcam
        capture = cv2.VideoCapture(0)

        inst_segment.process_camera(capture, show_bboxes = True, frames_per_second = 15, output_video_name = "webcam_inst.mp4", show_frames = True, frame_name = "frame")

Here is a sample video from my webcam.

Now, let us look at Semantic segmentation.

Semantic Segmentation

To perform semantic segmentation, use the semantic_segment instance created earlier.

Image Segmentation

To perform semantic segmentation on an image, pass the image path as the first parameter to the segmentAsAde20k method.

   if inp1 == 1: #for image
       img_sem_seg = semantic_segment.segmentAsAde20k("data/image00.jpg", overlay = True)

Display the image after segmentation:

       cv2.imshow("Image",img_sem_seg[1])
       cv2.waitKey(0)

Here is the result:

image08
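segmentAsAde20k can likewise write the overlaid result directly to disk via its output_image_name parameter. A minimal sketch, again with an arbitrary output name:

       #optional: also save the overlaid result to a file
       semantic_segment.segmentAsAde20k("data/image00.jpg", overlay = True, output_image_name = "image00_sem.jpg")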

Video Segmentation

To segment the frames of a video, call the process_video_ade20k method, passing the path of the recorded video as the first parameter. The output_video_name parameter, as in the earlier case, specifies the name of the file under which the processed video is saved.

   elif inp1 == 2: #for video
       semantic_segment.process_video_ade20k("data/video00.mp4", frames_per_second= 50, output_video_name="video_seg.mp4")

Play the video file with the detected segments using the following code:

       vid = cv2.VideoCapture('video_seg.mp4')
 
       while True:
           success, img = vid.read()
           if not success: #stop when the video ends
               break
  
           cv2.imshow('Segmented Video',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

Here is the output:

Webcam Segmentation

To process the live video stream from a webcam, call the process_camera_ade20k method. The processed output is stored under the file name you specify.

   elif inp1 == 3: #for webcam
       capture = cv2.VideoCapture(0)
 
       semantic_segment.process_camera_ade20k(capture, overlay=True, frames_per_second= 15, output_video_name="webcam_seg.mp4", show_frames= True, frame_name= "frame", check_fps = True)

Here is the sample output:

Customized Segmentation

To perform image segmentation the way Pixellib does, we will use Mask R-CNN directly. By doing it on your own, you will not only learn how the segmentation is done, but you will also gain control over the output.

Import the following packages:

import numpy as np
import cv2
import warnings
warnings.filterwarnings('ignore')
import random
import colorsys

Import the required mrcnn libraries in your Python code.

from mrcnn.model import MaskRCNN
from mrcnn.config import Config
import matplotlib.pyplot as plt
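Both mrcnn and Pixellib sit on top of TensorFlow, and the Matterport-style code is sensitive to the TensorFlow version, so a quick check at the top of your script can save debugging time later (a small sketch, assuming TensorFlow is already installed):

import tensorflow as tf
print(tf.__version__) #confirm this matches the version your Mask RCNN fork supports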

Getting Classnames

First, let us get the class names so that we can label the detected objects. The Mask R-CNN model we use was trained on Microsoft COCO[2] (Common Objects in Context), a large-scale object detection, segmentation, and captioning dataset. It identifies 80 different types of objects, such as truck, boat, bench, bird, dog, and horse. These names are provided in a file called coco.names. Load them into your project using the following code:

classnames = []
 
with open('files/coco.names') as f:
   classnames = f.read().rstrip('\n').split('\n')
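As a quick sanity check, the list should now hold the 80 COCO labels, with person, bicycle, and car at the top:

print(len(classnames)) #expect 80
print(classnames[:3]) #['person', 'bicycle', 'car']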

Set up the configuration for the network. NUM_CLASSES is 1 + 80: one background class plus the 80 COCO object classes.
 
class Conf(Config):
   NAME = "test"
   GPU_COUNT = 1
   IMAGES_PER_GPU = 1
   NUM_CLASSES = 1 + 80 #background + the 80 COCO classes
 
config = Conf()
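If you want to see every setting the Config base class fills in for you (image resizing, anchor scales, learning rate, and so on), it provides a display helper:

#print the full resolved configuration
config.display()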

Creating Model

Create the network model using the following command, passing the configuration set above. The mode='inference' argument tells Mask R-CNN that we only want predictions, and model_dir is the directory where the model writes its logs.

mrcnn = MaskRCNN(mode='inference',model_dir='/logs', config=config)

Initialize the network with the pre-trained COCO weights. Passing by_name=True loads the weights into layers by matching layer names.

mrcnn.load_weights('files/mask_rcnn_coco.h5', by_name=True)

Let us first create some helper functions for the customized segmentation.

Functions for Instance Segmentation

First, we will generate a random color to mask each segment. We generate the colors in HSV (hue, saturation, value) space, because evenly spaced hues produce colors that are visually distinct to the human eye. We then convert the colors to RGB for display and shuffle the color list. Here, each object gets its own color (instance segmentation).

#generating a random color for each segment
 
def random_colors_ins(N):
  
   #Generate random colors.
   #To get visually distinct colors, generating them in HSV space then
   #converting to RGB.
 
   #hsv = (hue, saturation, value/brightness)
   hsv = [(i / N, 1, 1) for i in range(N)]
 
   #converting hsv to rgb
   colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
 
   #randomly shuffling the colors
   random.shuffle(colors)
   return colors
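To see what the function produces, try a small N. With N = 3, the hues sit a third of the color wheel apart, so you get pure red, green, and blue in a shuffled order:

print(random_colors_ins(3))
#e.g. [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]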

Next, we create a function that applies a mask over an object, using a color from the colors list created above.

#apply a mask over each segment
 
#alpha = opacity
def apply_mask_ins(image, mask, color, alpha=0.5):
   #Apply the given mask to the image.
  
   for c in range(3):
       image[:, :, c] = np.where(mask == 1,
                                 image[:, :, c] *
                                 (1 - alpha) + alpha * color[c] * 255,
                                 image[:, :, c])
   return image
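To make the blend concrete, here is the same formula worked by hand for a single masked pixel channel with the default alpha of 0.5:

#one channel value 100, full-intensity color channel 1.0, alpha 0.5
pixel, color_c, alpha = 100, 1.0, 0.5
print(pixel * (1 - alpha) + alpha * color_c * 255) #177.5, halfway toward the full color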

Finally, let us write the function that displays the image with the object masks. First, we need to know the number of objects detected in the image, which we get from the number of bounding boxes the model returns [refer block 1#]. Then we generate a random color for each object [refer block 2#]. After that, for each object we apply a mask with the apply_mask_ins function, converting the color tuple to the 0-255 RGB range by multiplying each value by 255 [refer block 3#], and we draw the object’s class and confidence value above it [refer block 4#].

#displaying the segmented image
 
def display_ins(image, boxes, masks, class_ids, class_names,
                     scores=None,show_mask=True):
   # 1# Number of instances
   N = boxes.shape[0]
   if not N:
       print("\n*** No instances to display *** \n")
   else:
       assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
 
   # 2# Generate random colors
   colors = random_colors_ins(N)
 
   masked_image = image.copy()
   # 3# Apply a colored mask to each detected object
   for i in range(N):
       color = colors[i]
 
       #Converting color to the 0-255 RGB range
       color_rgb = [255 * c for c in color]
      
       y1, x1, y2, x2 = boxes[i]
 
       score = scores[i] if scores is not None else None
      
       #Applying Mask
       mask = masks[:, :, i]
       if show_mask:
           masked_image = apply_mask_ins(masked_image, mask, color)
          
       # 4# Label
       cv2.putText(masked_image, f"{classnames[class_ids[i]].title()} : {int(score*100)}%" ,(x1,y1-5),cv2.FONT_HERSHEY_PLAIN,1.2,tuple(color_rgb),2)
      
   return masked_image

Now, let us write a few utility functions for semantic segmentation.

Functions for Semantic Segmentation

First, we generate random colors for the distinct classes, using HSV space as before. Here, all objects of a given class share the same color, unlike in instance segmentation.

#generating random colors for different classes
def random_colors_seg(classnames, bright=True):
  
   #Generate random colors.
   #To get visually distinct colors, generate them in HSV space and
   #then convert them to RGB.
  
   brightness = 1.0 if bright else 0.7
   N = len(classnames)
   hsv = [(i / N, 1, brightness) for i in range(N)]
   colors = list(map(lambda c: colorsys.hsv_to_rgb(*c), hsv))
   random.shuffle(colors)
   return colors

Next, we create a dictionary with the distinct class names as keys and a distinct color for each class as the value.

#creating a dictionary that maps each class name to its own color,
#using random_colors_seg to generate the colors
colors = {i:j for i,j in zip(classnames,random_colors_seg(classnames, bright=True))}
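Every lookup for a given class now returns the same color, which is exactly what semantic segmentation needs:

#all detected persons will share this one color
print(colors['person'])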

Next is the function that applies a colored mask over each object.

#alpha = opacity
def apply_mask_seg(image, mask, color, alpha=0.5):
   #Apply the given mask to the image.
  
   for c in range(3):
       image[:, :, c] = np.where(mask == 1,
                                 image[:, :, c] *
                                 (1 - alpha) + alpha * color[c] * 255,
                                 image[:, :, c])
   return image

Next is the function to display the segmented image, similar to the instance-segmentation function display_ins. Compare the line where color_rgb is defined: here the color is looked up by class name, so all objects of the same class share one color.

#displaying the segmented image
 
def display_seg(image, boxes, masks, class_ids, class_names,
                     scores=None,show_mask=True):
# Number of instances
   N = boxes.shape[0]
   if not N:
       print("\n*** No instances to display *** \n")
   else:
       assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
 
 
   masked_image = image.copy()
   for i in range(N):
 
       color_rgb = [255 * c for c in colors[classnames[class_ids[i]]]]
      
       y1, x1, y2, x2 = boxes[i]
 
       score = scores[i] if scores is not None else None
      
   # Label
       cv2.putText(masked_image, f"{classnames[class_ids[i]].title()} : {int(score*100)}%" ,(x1,y1-5),cv2.FONT_HERSHEY_PLAIN,1.2,tuple(color_rgb),2)
 
   # Mask
       mask = masks[:, :, i]
       if show_mask:
           masked_image = apply_mask_seg(masked_image, mask, colors[classnames[class_ids[i]]])
 
   return masked_image

User Interface

We build a user interface similar to our earlier application:

inp = int(input('Choose the type of Image Segmentation : \n 1.Instance Segmentation \n 2.Semantic Segmentation \n'))

Instance Segmentation

We select between an image, a pre-recorded video, and a webcam using the following code:

if inp == 1: #for instance
   inp1 = int(input('Choose the format for detecting objects : \n 1.Image \n 2.Video \n 3.Webcam \n'))

Image Segmentation

Using OpenCV, load the image in which we will detect and segment objects.

   if inp1 == 1: #for image
       img = cv2.imread('data/image00.jpg')

Next, we generate the detections for the image:

       results_list = mrcnn.detect([img], verbose=1)

The detect method returns a list containing one dictionary per input image; the dictionary holds the information about the objects detected in that image.

Since we passed a single image, let us extract the dictionary from the list and store it in a variable named results:

       results = results_list[0]
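The results dictionary holds four arrays: 'rois' (bounding boxes in y1, x1, y2, x2 order), 'class_ids', 'scores', and 'masks'. You can inspect their shapes like this:

       #inspect what the detector returned
       for key in ('rois', 'class_ids', 'scores', 'masks'):
           print(key, results[key].shape)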

Next, we call our earlier defined function, passing the image and the relevant arrays from the results dictionary, to perform instance segmentation on the image. Note that we pass results['class_ids']-1 because Mask R-CNN reserves class id 0 for the background, while coco.names lists the 80 object classes starting at index 0.

       image = display_ins(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])

Display the results:

       cv2.imshow('Image',image)
       cv2.waitKey(0)

Here is the output:

image11

Video Segmentation

We capture the video by calling the VideoCapture method of OpenCV.

   elif inp1 == 2: #for video
       cap = cv2.VideoCapture('data/video00.mp4')

For video segmentation, we read the video frame by frame, perform the segmentation on each captured frame, and display the output.

       while True:
           success, img = cap.read()
           if not success: #stop when the video ends
               break
  
           results = mrcnn.detect([img], verbose=0)[0]
  
           img = display_ins(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])
  
           cv2.imshow('Vid',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

Here is a screenshot from my sample run:

image12

Webcam Segmentation

For capturing webcam video, set the device index to zero.

   elif inp1 == 3: #for webcam
       cap = cv2.VideoCapture(0)

Segment each frame of the live video stream, just the way you did for the recorded video, and display the output to the user.

       while True:
           success, img = cap.read()
           if not success: #bail out if the camera read fails
               break
  
           results = mrcnn.detect([img], verbose=0)[0]
  
           img = display_ins(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])
  
           cv2.imshow('Vid',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

Here is the sample output:

Semantic Segmentation

We build a user interface, as in the earlier case, to accept an image, a video, or a webcam as the input.

   inp1 = int(input('Choose the format for detecting objects : \n 1.Image \n 2.Video \n 3.Webcam \n'))

Image Segmentation

We call our earlier defined function, passing the image and the relevant arrays from the results dictionary, to perform semantic segmentation on the image, just as we did for instance segmentation.

   if inp1 == 1: #for image
       img = cv2.imread('data/image00.jpg')
 
    #Information about the objects detected in the image
       results_list = mrcnn.detect([img], verbose=1)
       results = results_list[0]
 
       image = display_seg(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])
 
       cv2.imshow('Image',image)
       cv2.waitKey(0)

Here is the sample output:

image14

Video Segmentation

Next, let us perform semantic segmentation on a pre-recorded video.

   elif inp1 == 2: #for video
       cap = cv2.VideoCapture('data/video00.mp4')

Read, segment, and display the video frames:

       while True:
           success, img = cap.read()
           if not success: #stop when the video ends
               break
  
           results = mrcnn.detect([img], verbose=0)[0]
  
           img = display_seg(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])
  
           cv2.imshow('Vid',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

Here is a screenshot from my sample run:

image15

Webcam Segmentation

   elif inp1 == 3: #for webcam
       cap = cv2.VideoCapture(0)

Read, segment, and display the webcam frames:

       while True:
           success, img = cap.read()
           if not success: #bail out if the camera read fails
               break
  
           results = mrcnn.detect([img], verbose=0)[0]
  
           img = display_seg(img, results['rois'], results['masks'], results['class_ids']-1, classnames, results['scores'])
  
           cv2.imshow('Vid',img)
  
           if cv2.waitKey(1) & 0xFF == ord("q"):
               cv2.destroyAllWindows()
               break

Here is the sample output:

Summary

In this second part of the series, you learned about image segmentation and implemented its two types: instance segmentation and semantic segmentation. First, you used the ready-made segmentation functionality provided by the Pixellib library. Then, you used the mrcnn library to understand the internals of segmentation and to gain additional control over the segmentation process. We applied both techniques to an image, a pre-recorded video, and a live webcam stream.

In the next part of this tutorial, I will explain how to keep track of objects across video frames.

Source: Download the project from our Repository
