Volume 1, Chapter 23. Case Study: Smile Detection
Download link: https://pan.baidu.com/s/1hqtBQf6jRJINx4AgB8S2Tw
Extraction code: zxkt
Reference: Smile detection with OpenCV, Keras, and TensorFlow - PyImageSearch: https://www.pyimagesearch.com/2021/07/14/smile-detection-with-opencv-keras-and-tensorflow/
In this chapter, we will build a complete end-to-end application that can detect smiles in a video stream in real time, using both deep learning and traditional computer vision techniques.
To accomplish this task, we will train the LeNet architecture on a dataset of images containing faces that are smiling and not smiling. Once our network is trained, we will create a separate Python script: this script will detect faces in an image via OpenCV's built-in Haar cascade face detector, extract the face region of interest (ROI) from the image, and then pass the ROI through LeNet for smile detection.
1. The SMILES Dataset
The SMILES dataset consists of images of faces that are either smiling or not smiling. There are 13,165 grayscale images in the dataset in total, each 64×64 pixels in size.
There are two issues to keep in mind:
1. Because our input images will contain not only a face but also the background of the image, we first need to localize the face in the image and extract the face ROI before we can pass it through our network for detection. Fortunately, this is straightforward to accomplish with traditional computer vision methods such as Haar cascades.
2. The second issue is class imbalance. While there are 13,165 images in the dataset, 9,475 of these examples are not smiling, while only 3,690 belong to the smiling class. Given that there are over 2.5× more "not smiling" images than "smiling" examples, we need to be careful when designing our training procedure.
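One common remedy for this imbalance, and the one the training script below uses, is to weight each class inversely to its frequency. The arithmetic for this dataset works out as follows:

```python
# class counts from the SMILES dataset
n_not_smiling = 9475
n_smiling = 3690

# weight each class by max_count / class_count, so the minority
# ("smiling") class contributes proportionally more to the loss
largest = max(n_not_smiling, n_smiling)
class_weight = {
    0: largest / n_not_smiling,  # not_smiling -> 1.0
    1: largest / n_smiling,      # smiling -> ~2.57
}
```

This dictionary is exactly what gets passed to Keras via the class_weight argument of model.fit later in the chapter.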
Dataset download
Link: https://pan.baidu.com/s/1drcFH4Lo7xaoVNXFfj2XGw
Extraction code: 89cj
2. Training the Smile CNN
Create a new file named train_model.py:
The LeNet class behind "from pyimagesearch.nn.conv import LeNet" appeared in the code of earlier chapters; you can extract it from there yourself.
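If you do not have the pyimagesearch module at hand, here is a sketch of the LeNet class consistent with the classic variant used in earlier chapters (the filter counts of 20 and 50 are assumptions; match them to your own earlier implementation):

```python
# a minimal LeNet sketch: CONV => RELU => POOL, twice, then FC => softmax
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from tensorflow.keras import backend as K

class LeNet:
    @staticmethod
    def build(width, height, depth, classes):
        # order the input shape according to the backend's data format
        inputShape = (height, width, depth)
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)

        model = Sequential()
        # first CONV => RELU => POOL block
        model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # second CONV => RELU => POOL block
        model.add(Conv2D(50, (5, 5), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        # fully connected head
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation("relu"))
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        return model
```

Save it so that the import path in the script below resolves, or simply replace the import with this class definition.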
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.utils import to_categorical
from pyimagesearch.nn.conv import LeNet
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import imutils
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
    help="path to input dataset of faces")
ap.add_argument("-m", "--model", required=True,
    help="path to output model")
args = vars(ap.parse_args())

# initialize the list of data and labels
data = []
labels = []

# loop over the input images
for imagePath in sorted(list(paths.list_images(args["dataset"]))):
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = imutils.resize(image, width=28)
    image = img_to_array(image)
    data.append(image)

    # extract the class label from the image path and update the
    # labels list
    label = imagePath.split(os.path.sep)[-3]
    label = "smiling" if label == "positives" else "not_smiling"
    labels.append(label)

# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)

# convert the labels from integers to vectors
le = LabelEncoder().fit(labels)
labels = to_categorical(le.transform(labels), 2)

# calculate the total number of training images in each class and
# initialize a dictionary to store the class weights
classTotals = labels.sum(axis=0)
classWeight = dict()

# loop over all classes and calculate the class weight
for i in range(0, len(classTotals)):
    classWeight[i] = classTotals.max() / classTotals[i]

# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
    labels, test_size=0.20, stratify=labels, random_state=42)

# initialize the model
print("[INFO] compiling model...")
model = LeNet.build(width=28, height=28, depth=1, classes=2)
model.compile(loss="binary_crossentropy", optimizer="adam",
    metrics=["accuracy"])

# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
    class_weight=classWeight, batch_size=64, epochs=15, verbose=1)

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
    predictions.argmax(axis=1), target_names=le.classes_))

# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"])

# plot the training + testing loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 15), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 15), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 15), H.history["accuracy"], label="acc")
plt.plot(np.arange(0, 15), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.show()
Train the model:
$ python train_model.py --dataset ../datasets/SMILEsmileD \
    --model output/lenet.hdf5
[INFO] compiling model...
[INFO] training network...
Train on 10532 samples, validate on 2633 samples
Epoch 1/15
8s - loss: 0.3970 - acc: 0.8161 - val_loss: 0.2771 - val_acc: 0.8872
Epoch 2/15
8s - loss: 0.2572 - acc: 0.8919 - val_loss: 0.2620 - val_acc: 0.8899
Epoch 3/15
7s - loss: 0.2322 - acc: 0.9079 - val_loss: 0.2433 - val_acc: 0.9062
...
Epoch 15/15
8s - loss: 0.0791 - acc: 0.9716 - val_loss: 0.2148 - val_acc: 0.9351
[INFO] evaluating network...
              precision    recall  f1-score   support

 not_smiling       0.95      0.97      0.96      1890
     smiling       0.91      0.86      0.88       743

 avg / total       0.93      0.94      0.93      2633
[INFO] serializing network...
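As a quick sanity check on the report above, overall accuracy is the support-weighted average of the per-class recalls (these numbers are taken from the run shown here; your own run will differ slightly):

```python
# per-class recall and support from the classification report above
recall_not_smiling, support_not_smiling = 0.97, 1890
recall_smiling, support_smiling = 0.86, 743

# overall accuracy = correctly classified examples / total examples
correct = (recall_not_smiling * support_not_smiling
           + recall_smiling * support_smiling)
total = support_not_smiling + support_smiling
accuracy = correct / total  # roughly 0.94
```

Note how the minority "smiling" class drags the overall accuracy down despite the strong "not_smiling" recall, which is exactly why we track per-class metrics rather than accuracy alone.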
Past epoch six, our validation loss starts to stagnate; further training past epoch 15 would result in overfitting. If needed, we could improve the accuracy of our smile detector by using more training data, either by:
1. Gathering additional training data.
2. Applying data augmentation to randomly translate, rotate, and shift our existing training set.
Data augmentation is covered in detail in the Practitioner Bundle.
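As a dependency-free illustration of the second option, random translation of a face crop can be sketched with NumPy alone. In practice you would reach for a library tool such as tf.keras's ImageDataGenerator; this toy version only shifts the image and pads the vacated area with zeros:

```python
import numpy as np

def random_translate(image, max_shift=4, rng=None):
    """Randomly shift a 2-D image by up to max_shift pixels on each
    axis, filling the vacated border with zeros."""
    if rng is None:
        rng = np.random.default_rng()
    dy = int(rng.integers(-max_shift, max_shift + 1))
    dx = int(rng.integers(-max_shift, max_shift + 1))

    out = np.zeros_like(image)
    h, w = image.shape
    # compute the overlapping source/destination windows
    ys, yd = max(0, -dy), max(0, dy)
    xs, xd = max(0, -dx), max(0, dx)
    out[yd:yd + h - abs(dy), xd:xd + w - abs(dx)] = \
        image[ys:ys + h - abs(dy), xs:xs + w - abs(dx)]
    return out
```

Applying a transform like this to each batch effectively multiplies the size of the training set without collecting new images.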
3. Running the Smile CNN in Real Time
Create a new file named detect_smile.py with the following code:
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=True,
    help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=True,
    help="path to pre-trained smile detector CNN")
ap.add_argument("-v", "--video",
    help="path to the (optional) video file")
args = vars(ap.parse_args())

# load the face detector cascade and smile detector CNN
detector = cv2.CascadeClassifier(args["cascade"])
model = load_model(args["model"])

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)

# otherwise, load the video
else:
    camera = cv2.VideoCapture(args["video"])

# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()

    # if we are viewing a video and we did not grab a frame, then we
    # have reached the end of the video
    if args.get("video") and not grabbed:
        break

    # resize the frame, convert it to grayscale, and then clone the
    # original frame so we can draw on it later in the program
    frame = imutils.resize(frame, width=300)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frameClone = frame.copy()

    # detect faces in the input frame, then clone the frame so that
    # we can draw on it
    rects = detector.detectMultiScale(gray, scaleFactor=1.1,
        minNeighbors=5, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)

    # loop over the face bounding boxes
    for (fX, fY, fW, fH) in rects:
        # extract the ROI of the face from the grayscale image,
        # resize it to a fixed 28x28 pixels, and then prepare the
        # ROI for classification via the CNN
        roi = gray[fY:fY + fH, fX:fX + fW]
        roi = cv2.resize(roi, (28, 28))
        roi = roi.astype("float") / 255.0
        roi = img_to_array(roi)
        roi = np.expand_dims(roi, axis=0)

        # determine the probabilities of both "smiling" and "not
        # smiling", then set the label accordingly
        (notSmiling, smiling) = model.predict(roi)[0]
        label = "Smiling" if smiling > notSmiling else "Not Smiling"

        # display the label and bounding box rectangle on the output
        # frame
        cv2.putText(frameClone, label, (fX, fY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
        cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH),
            (0, 0, 255), 2)

    # show our detected faces along with smiling/not smiling labels
    cv2.imshow("Face", frameClone)

    # if the 'q' key is pressed, stop the loop
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
Run the script with the following commands:
haarcascade_frontalface_default.xml can be found in your OpenCV installation or in the OpenCV source tree.
# real-time detection from the webcam
$ python detect_smile.py --cascade haarcascade_frontalface_default.xml \
    --model output/lenet.hdf5

# detection on a video file
$ python detect_smile.py --cascade haarcascade_frontalface_default.xml \
    --model output/lenet.hdf5 --video path/to/your/video.mov

4. Summary
In this chapter, we learned how to build an end-to-end computer vision and deep learning application to perform smile detection. To do so, we first trained the LeNet architecture on the SMILES dataset.
Due to the class imbalance in the SMILES dataset, we saw how to compute class weights to help mitigate the problem.
Once trained, we evaluated LeNet on our testing set and found that the network obtained a respectable 93% classification accuracy. Higher classification accuracy can be obtained by gathering more training data or applying data augmentation to the existing training data.
We then created a Python script to read frames from a webcam or video file, detect faces, and apply our pre-trained network. To detect faces, we used OpenCV's Haar cascades. Once a face was detected, it was extracted from the frame and passed through LeNet to determine whether the person was smiling or not smiling. As a whole, our smile detection system can easily run in real time on a CPU using modern hardware.



