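Hi everyone, I'm eriktse, and I've recently been learning computer vision. Anyone with a little CV background knows that cat-vs-dog classification is a beginner project. Even so, writing a neural network for it yourself is not that simple.
In this article I'll walk you through four steps (data processing, model construction, training, and evaluation) to build, by hand, a convolutional neural network that tells cats from dogs with more than 70% accuracy.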
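Environment
The deep learning framework is Baidu's PaddlePaddle, and the machine is provided by the AI Studio platform.
On the AI Studio website, click Projects (项目) in the top menu to open the project page, click Create project (创立项目), and then set up an environment with the settings shown in the figure below.
Choose Notebook as the type.
Choose BML Codelab as the version.
Note that the dataset setting requires you to upload the Kaggle cats-vs-dogs dataset yourself. I have of course already uploaded it for you: search for "Kaggle猫狗辨认" and you will find my dataset.
AI Studio supports importing datasets from Baidu Netdisk, which is very nice!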
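Data processing
Our data is in the folder shown below. Unzip train.zip and test.zip.
The images in train are named cat.xxx.jpg and dog.xxx.jpg. The approach is: first use the os module to collect the paths of all images and label them, then load each image with PIL.Image and turn it into a tensor, Resize it to a fixed size, normalize it, and finally group several img and lbl pairs into a batch.
Building suitable batches speeds up training. A batch feeds many inputs into the network at once and produces all the results in a single pass, which is much faster than processing the images one at a time.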
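To make the idea of batching concrete, here is a minimal sketch in plain Python. `make_batches` is a hypothetical helper of mine, not part of PaddlePaddle, and it works on any list. Unlike the getBatch function used later in this article, which drops the leftover samples so that every batch has the same size, this version keeps the shorter final chunk.

```python
def make_batches(samples, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Ten dummy samples split into batches of four: the last batch is shorter
batches = make_batches(list(range(10)), batch_size=4)
sizes = [len(b) for b in batches]
print(sizes)
```

Whether to keep or drop the short last batch is a design choice: dropping it lets all batches be stacked into one tensor, while keeping it uses every sample.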
os.listdir() returns the names of all files and subfolders in a directory as a list.
os.path.join(path1, path2) joins two paths into one.
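As a small illustration of these two calls together with the file-name labeling rule, here is a sketch that runs on a temporary folder of empty placeholder files. `list_labeled_images` is a hypothetical helper name, and the file names are made up to follow the cat.xxx.jpg / dog.xxx.jpg pattern.

```python
import os
import tempfile


def list_labeled_images(data_dir):
    """Return (path, label) pairs sorted by file name; 0 = dog, 1 = cat."""
    samples = []
    for name in sorted(os.listdir(data_dir)):
        prefix = name.split(".")[0]          # "cat.12.jpg" -> "cat"
        if prefix not in ("dog", "cat"):
            continue                         # skip files that are not training images
        label = 0 if prefix == "dog" else 1
        samples.append((os.path.join(data_dir, name), label))
    return samples


# Demo on a throwaway directory containing empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["cat.0.jpg", "dog.0.jpg", "dog.1.jpg", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    demo = list_labeled_images(tmp)
    demo_names = [os.path.basename(path) for path, _ in demo]
    demo_labels = [label for _, label in demo]

print(list(zip(demo_names, demo_labels)))
```

Sorting the names is my own addition to make the result reproducible; os.listdir itself does not guarantee any particular order.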
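We write a loadImagetoTensor() function to make image handling easier.
This part assumes some image-processing background, such as how to use the Image module and numpy.
Our images have shape [3, 150, 150]: 3 channels, each of size [150, 150].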
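An image loaded through PIL and numpy has its channels last, so getting to [3, 150, 150] takes one transpose. The sketch below shows that step and the division by 255 on a random array that stands in for a real photo; it uses only numpy, so the array is an assumption, not data from the dataset.

```python
import numpy as np

# A fake 150x150 RGB image in the height-width-channel layout that PIL/numpy produce
hwc = np.random.randint(0, 256, size=(150, 150, 3)).astype("float32")

# Move the channel axis to the front and scale pixel values to [0, 1]
chw = hwc.transpose(2, 0, 1) / 255.0

shape = chw.shape
print(shape)
```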
import os

import numpy as np
import paddle
from PIL import Image
from paddle.vision.transforms import Resize

print(paddle.get_device())
paddle.set_device('gpu:0')

data_dir = "/home/aistudio/data/data195536/train"
siz = 150  # every image is resized to (siz, siz)
batch_size = 16
data_paths = os.listdir(data_dir)

def loadImagetoTensor(path: str):
    # Read the image at `path`, resize and normalize it, and return a tensor
    img = np.array(Image.open(path).convert('RGB'), dtype='float32')  # force 3 RGB channels
    img = Resize(size=(siz, siz))(img)  # scale the image to (siz, siz)
    img /= 255.0  # normalize pixel values to [0, 1]
    return paddle.to_tensor(img)  # tensor of shape (siz, siz, 3)

# Label convention: 0 = dog, 1 = cat
train_data = []  # list of tuples (image tensor, int label)
test_data = []

# To speed up the experiment, use only the first 5000 images of the training set
cnt = 0
for path in data_paths:
    img_path = os.path.join(data_dir, path)
    img = loadImagetoTensor(img_path)
    img = paddle.transpose(img, [2, 0, 1])  # HWC -> CHW
    # Derive the label from the file name, e.g. "dog.12.jpg" -> "dog"
    lbl = path.split('.')[0]
    lbl = 0 if lbl == 'dog' else 1
    train_data.append((img, lbl))
    cnt += 1
    if cnt == 5000:
        break

# Hold out the first 500 of those images as the test set
test_data = train_data[:500]
train_data = train_data[500:]

def getBatch(data: list):
    # Pack `data` into batches of exactly batch_size samples.
    # The trailing group after the last flush is never appended, so it is dropped.
    imgs = []
    lbls = []
    img, lbl = [], []
    for idx, val in enumerate(data):
        if idx > 0 and idx % batch_size == 0:
            imgs.append(paddle.stack(img))
            lbls.append(lbl)
            img, lbl = [], []
        img.append(val[0])
        lbl.append(val[1])
    return paddle.stack(imgs), paddle.to_tensor(lbls)

(train_imgs, train_lbls) = getBatch(train_data)
print(train_imgs.shape)
print("Data loaded")
The output is shown in the figure:
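One thing to keep in mind: os.listdir makes no promise about ordering, so "the first 5000 files" is not necessarily a random or balanced sample. The code in this article does not shuffle. As a minimal sketch of one possible alternative, the hypothetical helper `shuffled_split` below shuffles a copy of the (path, label) list with a fixed seed before splitting off the test set; the file names are made up for the demo.

```python
import random


def shuffled_split(samples, n_test, seed=0):
    """Shuffle a copy of samples and split off the first n_test items as the test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed keeps the split reproducible
    return items[n_test:], items[:n_test]  # (train, test)


# Twelve made-up (file name, label) pairs: 1 = cat, 0 = dog
paths = [("cat.%d.jpg" % i, 1) for i in range(6)] + [("dog.%d.jpg" % i, 0) for i in range(6)]
train_part, test_part = shuffled_split(paths, n_test=4)
print(len(train_part), len(test_part))
```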
Building the model
The model consists of 3 convolutional layers, 3 pooling layers, and 3 linear layers.
import matplotlib.pyplot as plt

class Model(paddle.nn.Layer):
    def __init__(self):
        super(Model, self).__init__()
        self.conv0 = paddle.nn.Conv2D(in_channels=3, out_channels=20, kernel_size=12, padding=0)
        self.pool0 = paddle.nn.MaxPool2D(kernel_size=4, stride=4)
        self.conv1 = paddle.nn.Conv2D(in_channels=20, out_channels=50, kernel_size=5, padding=0)
        self.pool1 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = paddle.nn.Conv2D(in_channels=50, out_channels=50, kernel_size=5, padding=0)
        self.pool2 = paddle.nn.MaxPool2D(kernel_size=2, stride=2)
        self.fc1 = paddle.nn.Linear(in_features=1250, out_features=512)
        self.fc2 = paddle.nn.Linear(in_features=512, out_features=64)
        self.fc3 = paddle.nn.Linear(in_features=64, out_features=2)

    def forward(self, input):
        input = paddle.reshape(input, shape=[-1, 3, 150, 150])
        x = self.conv0(input)
        x = paddle.nn.functional.relu(x)
        x = self.pool0(x)
        x = paddle.nn.functional.relu(x)
        x = self.conv1(x)
        x = paddle.nn.functional.relu(x)
        x = self.pool1(x)
        x = paddle.nn.functional.relu(x)
        x = self.conv2(x)
        x = paddle.nn.functional.relu(x)
        x = self.pool2(x)
        x = paddle.reshape(x, [x.shape[0], -1])  # flatten to [N, 1250]
        x = self.fc1(x)
        x = paddle.nn.functional.relu(x)
        x = self.fc2(x)
        x = paddle.nn.functional.relu(x)
        x = self.fc3(x)  # raw logits for the 2 classes
        return x

model = Model()
paddle.Model(model).summary(input_size=(1, 3, 150, 150))  # print the model structure
losser = paddle.nn.CrossEntropyLoss()
# Keep the learning rate small to avoid an oscillating or non-converging loss
opt = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())
When a batch of image tensors is passed to model, the forward function is called automatically.
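The in_features=1250 of fc1 is not arbitrary: it is the number of values left after the last pooling layer. As a quick check, the sketch below traces the side length of the feature map through the layers with the usual output-size formula for unpadded convolution and pooling (rounding down). `conv_out` is a hypothetical helper written only for this calculation.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (rounded down)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 150
side = conv_out(side, kernel=12)            # conv0
side = conv_out(side, kernel=4, stride=4)   # pool0
side = conv_out(side, kernel=5)             # conv1
side = conv_out(side, kernel=2, stride=2)   # pool1
side = conv_out(side, kernel=5)             # conv2
side = conv_out(side, kernel=2, stride=2)   # pool2

flat_features = 50 * side * side            # conv2 has 50 output channels
print(side, flat_features)
```

If you change the input size or any kernel size, rerun this kind of calculation and update in_features of fc1 accordingly.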
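Training
Set the number of epochs, epoches. Before an img goes into the model, its shape has to match what the conv layers accept, [N, 3, 150, 150]; forward reshapes its input to this shape.
The rest is boilerplate you can write the same way every time.
About 10 epochs is enough to get the accuracy above 70%. I use more epochs here because the learning rate is low, which I chose to keep the loss from failing to converge.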
# Saving and loading model parameters
# model.set_state_dict(paddle.load("linear_net.pdparams"))
# opt.set_state_dict(paddle.load("adam.pdopt"))
# paddle.save(model.state_dict(), "linear_net.pdparams")
# paddle.save(opt.state_dict(), "adam.pdopt")

loss_pic = []
acc_cnt = 0   # correct predictions, accumulated over all epochs
test_cnt = 0  # samples seen, accumulated over all epochs
epoches = 30  # number of epochs

for epoch in range(epoches):
    epoch_loss = 0.0
    epoch_batches = 0
    losssum = 0.0  # loss summed over the current reporting window
    window = 0     # number of batches in the current reporting window
    for idx, (img, lbl) in enumerate(zip(train_imgs, train_lbls)):
        pred = model(img)
        loss = losser(pred, lbl)
        loss.backward()
        opt.step()
        opt.clear_grad()

        loss_val = float(loss)  # works for both 0-D and 1-element loss tensors
        test_cnt += batch_size
        epoch_batches += 1
        window += 1
        losssum += loss_val
        epoch_loss += loss_val
        # Count how many predictions in this batch match the labels
        acc_cnt += int((np.argmax(pred.numpy(), axis=1) == lbl.numpy()).sum())

        if idx > 0 and idx % 50 == 0:
            mean_loss = losssum / window  # average over the batches actually accumulated
            print("epoch:[{}/{}], batch:[{}/{}] acc: {:.3f} mean_loss: {:.5f}, epoch_loss:{:.5f}".format(
                epoch+1, epoches, idx, len(train_imgs), acc_cnt / test_cnt,
                mean_loss, epoch_loss / epoch_batches))
            loss_pic.append(mean_loss)
            losssum = 0.0
            window = 0

# Plot the loss curve
plt.figure()
plt.plot(range(0, len(loss_pic)), loss_pic, 'r')
plt.show()
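Part of the training log, from a run with epoches = 10. In this log, acc is the running training accuracy over all batches seen so far, and epoch_loss was divided by the number of samples rather than the number of batches, so it is about 1/16 of the per-batch loss. The first mean_loss of each later epoch (1.06030, 0.98787, ...) is inflated because the last 30 batches of the previous epoch were still in the sum when it was divided by 50: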
epoch:[1/10], batch:[50/281] acc: 0.540 mean_loss: 0.70554, epoch_loss:0.04323
epoch:[1/10], batch:[100/281] acc: 0.530 mean_loss: 0.70643, epoch_loss:0.04369
epoch:[1/10], batch:[150/281] acc: 0.553 mean_loss: 0.66813, epoch_loss:0.04305
epoch:[1/10], batch:[200/281] acc: 0.563 mean_loss: 0.67019, epoch_loss:0.04276
epoch:[1/10], batch:[250/281] acc: 0.577 mean_loss: 0.65668, epoch_loss:0.04242
epoch:[2/10], batch:[50/281] acc: 0.585 mean_loss: 1.06030, epoch_loss:0.04058
epoch:[2/10], batch:[100/281] acc: 0.584 mean_loss: 0.66430, epoch_loss:0.04104
epoch:[2/10], batch:[150/281] acc: 0.592 mean_loss: 0.62010, epoch_loss:0.04029
epoch:[2/10], batch:[200/281] acc: 0.601 mean_loss: 0.60328, epoch_loss:0.03964
epoch:[2/10], batch:[250/281] acc: 0.607 mean_loss: 0.60943, epoch_loss:0.03933
epoch:[3/10], batch:[50/281] acc: 0.615 mean_loss: 0.98787, epoch_loss:0.03771
epoch:[3/10], batch:[100/281] acc: 0.619 mean_loss: 0.60885, epoch_loss:0.03788
epoch:[3/10], batch:[150/281] acc: 0.624 mean_loss: 0.57734, epoch_loss:0.03728
epoch:[3/10], batch:[200/281] acc: 0.631 mean_loss: 0.54101, epoch_loss:0.03642
epoch:[3/10], batch:[250/281] acc: 0.637 mean_loss: 0.54732, epoch_loss:0.03598
epoch:[4/10], batch:[50/281] acc: 0.644 mean_loss: 0.89757, epoch_loss:0.03418
epoch:[4/10], batch:[100/281] acc: 0.647 mean_loss: 0.55520, epoch_loss:0.03444
epoch:[4/10], batch:[150/281] acc: 0.652 mean_loss: 0.52098, epoch_loss:0.03382
epoch:[4/10], batch:[200/281] acc: 0.657 mean_loss: 0.49120, epoch_loss:0.03304
epoch:[4/10], batch:[250/281] acc: 0.663 mean_loss: 0.48655, epoch_loss:0.03252
epoch:[5/10], batch:[50/281] acc: 0.670 mean_loss: 0.80389, epoch_loss:0.03140
epoch:[5/10], batch:[100/281] acc: 0.673 mean_loss: 0.50279, epoch_loss:0.03141
epoch:[5/10], batch:[150/281] acc: 0.677 mean_loss: 0.46048, epoch_loss:0.03054
epoch:[5/10], batch:[200/281] acc: 0.682 mean_loss: 0.43338, epoch_loss:0.02968
epoch:[5/10], batch:[250/281] acc: 0.686 mean_loss: 0.43409, epoch_loss:0.02917
epoch:[6/10], batch:[50/281] acc: 0.692 mean_loss: 0.72375, epoch_loss:0.02844
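A note for reading the loss values in the log above: CrossEntropyLoss applies softmax to the raw outputs and averages the negative log-probability of the correct class over the batch. For two classes, a model that is purely guessing gets ln 2 ≈ 0.693, which is roughly where mean_loss starts. The sketch below reproduces that number in plain Python; `cross_entropy` is my own illustrative function for a single sample, not the PaddlePaddle implementation.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


chance_loss = cross_entropy([0.0, 0.0], label=1)      # equal logits: a pure guess
confident_loss = cross_entropy([3.0, -3.0], label=0)  # confident and correct
print(chance_loss, confident_loss)
```

So a mean_loss clearly below 0.693 means the network has started to do better than coin flipping.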
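The loss curve is shown below:
Model evaluation
During evaluation we can feed the images in one at a time and then compute accuracy, loss, and other metrics.
Evaluation code: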
acc_cnt = 0
test_cnt = 0
epoches = 1
losssum = 0.0
model.eval()
for epoch in range(epoches):
    for idx, (img, lbl) in enumerate(test_data):
        img = paddle.unsqueeze(img, 0)  # add a batch dimension: [3, 150, 150] -> [1, 3, 150, 150]
        with paddle.no_grad():          # no gradients are needed for evaluation
            pred = model(img)
            loss = losser(pred, paddle.to_tensor([lbl]))
        test_cnt += 1
        losssum += float(loss)
        if np.argmax(pred.numpy()) == lbl:
            acc_cnt += 1
        if idx > 0 and idx % 100 == 0:
            # Show every 100th image together with its prediction
            plt.figure()
            img = paddle.squeeze(img, 0)
            print(img.shape)
            img = img.numpy() * 255
            img = img.astype('uint8')
            plt.imshow(Image.fromarray(img.transpose(1, 2, 0)))  # CHW -> HWC for display
            plt.title("(0 dog, 1 cat)pred:{},lbl:{}".format(np.argmax(pred.numpy()), lbl))
            plt.show()
            print("epoch:[{}/{}], batch:[{}/{}] acc: {} mean_loss: {}".format(
                epoch+1, epoches, idx, len(test_data), acc_cnt / test_cnt, losssum / test_cnt))

# Accuracy and mean loss over the whole test set
print("final acc: {:.4f} mean_loss: {:.5f}".format(acc_cnt / test_cnt, losssum / test_cnt))
The final measured accuracy was 71.82%.
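A single accuracy number can hide a model that is good at one class and poor at the other. As a minimal sketch of a slightly richer report, the hypothetical function `accuracy_report` below computes overall and per-class accuracy from two plain lists. The predictions and labels in the demo are made up purely to exercise the function; they are not results from my model.

```python
def accuracy_report(preds, labels):
    """Overall accuracy plus per-class accuracy for integer class labels."""
    correct = sum(p == y for p, y in zip(preds, labels))
    per_class = {}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        per_class[cls] = sum(preds[i] == cls for i in idx) / len(idx)
    return correct / len(labels), per_class


# Made-up predictions and labels (0 = dog, 1 = cat)
preds = [0, 0, 1, 1, 1, 0, 1, 0]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
overall, by_class = accuracy_report(preds, labels)
print(overall, by_class)
```

In the evaluation loop you could collect np.argmax(pred.numpy()) and lbl into two lists and pass them to a function like this.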
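Once the dataset and the number of training epochs are large enough, the main factor limiting accuracy is the model architecture, so reaching higher accuracy takes a more powerful network.
I'm a second-year university student who has only just started with machine learning, so there is a lot I don't understand well enough and some things I haven't explained clearly. If you spot anything that is badly written, corrections are welcome!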