TensorFlow 2 Series 04: Building an Image Classification Model from Scratch (Flower Classification)
This article belongs to a series of course notes; it is the third set of review notes in the series. The series covers:
(1)Build and train neural network models using TensorFlow 2.x
(2)Image classification
(3)Natural language processing(NLP)
(4)Time series, sequences and predictions
This image classification walkthrough builds a production-ready model from scratch and mainly covers the following steps:
- Data processing
- Building the model
- Evaluating the model
- Tuning for overfitting
Data processing
Downloading the data
import tensorflow as tf
import numpy as np
import os
import PIL
import matplotlib.pyplot as plt
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
3670
roses = list(data_dir.glob('roses/*'))
img=plt.imread(roses[0])
img.shape
plt.imshow(img)
PIL.Image.open(str(roses[0]))
PIL.Image.open(str(roses[1]))
tulips = list(data_dir.glob('tulips/*'))
PIL.Image.open(str(tulips[0]))
PIL.Image.open(str(tulips[1]))
Creating the dataset
batch_size = 32
img_height = 180
img_width = 180
# tf.keras.preprocessing.image.ImageDataGenerator
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
Visualizing the data
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
AUTOTUNE = tf.data.AUTOTUNE  # let tf.data choose buffer sizes automatically
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in [0, 1].
print(np.min(first_image), np.max(first_image))
0.0 1.0
Defining the model
num_classes = 5
model = tf.keras.Sequential([
tf.keras.layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes,activation="softmax")
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_1 (Rescaling) (None, 180, 180, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 180, 180, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 45, 45, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 30976) 0
_________________________________________________________________
dense (Dense) (None, 128) 3965056
_________________________________________________________________
dense_1 (Dense) (None, 5) 645
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
epochs=5
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/5
92/92 [==============================] - 51s 550ms/step - loss: 1.3677 - accuracy: 0.4166 - val_loss: 1.0836 - val_accuracy: 0.5681
Epoch 2/5
92/92 [==============================] - 50s 542ms/step - loss: 0.9736 - accuracy: 0.6100 - val_loss: 0.9855 - val_accuracy: 0.6063
Epoch 3/5
92/92 [==============================] - 50s 544ms/step - loss: 0.7864 - accuracy: 0.6904 - val_loss: 0.8481 - val_accuracy: 0.6608
Epoch 4/5
92/92 [==============================] - 50s 546ms/step - loss: 0.5951 - accuracy: 0.7766 - val_loss: 0.8688 - val_accuracy: 0.6703
Epoch 5/5
92/92 [==============================] - 50s 545ms/step - loss: 0.4205 - accuracy: 0.8498 - val_loss: 0.9487 - val_accuracy: 0.6894
Visualizing training results
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
Tuning for overfitting
Data augmentation
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal", input_shape=(img_height, img_width, 3)),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.1),
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.1),
])
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
Dropout
model = tf.keras.Sequential([
data_augmentation,
tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes,activation="softmax")
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=['accuracy'])
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_1 (Sequential) (None, 180, 180, 3) 0
_________________________________________________________________
rescaling_2 (Rescaling) (None, 180, 180, 3) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 180, 180, 16) 448
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 90, 90, 16) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 90, 90, 32) 4640
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 45, 45, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 45, 45, 64) 18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 22, 22, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 30976) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 3965056
_________________________________________________________________
dense_3 (Dense) (None, 5) 645
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
epochs = 5
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/5
92/92 [==============================] - 57s 616ms/step - loss: 1.2607 - accuracy: 0.4813 - val_loss: 1.1557 - val_accuracy: 0.5572
Epoch 2/5
92/92 [==============================] - 56s 613ms/step - loss: 1.0268 - accuracy: 0.5926 - val_loss: 1.0223 - val_accuracy: 0.5777
Epoch 3/5
92/92 [==============================] - 56s 612ms/step - loss: 0.9231 - accuracy: 0.6349 - val_loss: 0.9180 - val_accuracy: 0.6431
Epoch 4/5
92/92 [==============================] - 63s 680ms/step - loss: 0.8650 - accuracy: 0.6649 - val_loss: 0.8474 - val_accuracy: 0.6744
Epoch 5/5
92/92 [==============================] - 59s 646ms/step - loss: 0.8098 - accuracy: 0.6907 - val_loss: 0.9028 - val_accuracy: 0.6417
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)
img = tf.keras.preprocessing.image.load_img(
sunflower_path, target_size=(img_height, img_width)
)
plt.imshow(img)
img_array = tf.keras.preprocessing.image.img_to_array(img)
print(img_array.shape)
img_array = tf.expand_dims(img_array, 0) # Create a batch
print(img_array.shape)
predictions = model.predict(img_array)
print(predictions.shape)
print(class_names[np.argmax(predictions)])
score = tf.nn.softmax(predictions[0])
print(score)
print(np.max(score))
print(
"This image most likely belongs to {} with a {:.2f} percent confidence."
.format(class_names[np.argmax(score)], 100 * np.max(score))
)
(180, 180, 3)
(1, 180, 180, 3)
(1, 5)
sunflowers
tf.Tensor([0.1501881 0.15026027 0.15038382 0.39373598 0.15543188], shape=(5,), dtype=float32)
0.39373598
This image most likely belongs to sunflowers with a 39.37 percent confidence.
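Note that the predicted class is right but the reported confidence is only about 39%. The last Dense layer already uses a softmax activation, so predictions[0] is already a probability distribution; applying tf.nn.softmax to it a second time flattens the scores. A minimal alternative (my own suggestion, not part of the original code) is to read the probabilities directly:
probs = predictions[0]  # already sums to 1 because the model ends with softmax
print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(probs)], 100 * np.max(probs))
)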
Summary
Data-related utilities (a combined sketch follows this list)
- tf.keras.utils.get_file(fname, origin, untar): downloads an archive from the network, optionally extracts it, and returns the local path
- pathlib.Path(): wraps that path so it can be globbed
- Image display: PIL.Image.open(), plt.imshow()
- tf.keras.preprocessing.image_dataset_from_directory: builds a dataset of (image, label) batches from a directory structure. Main parameters: data_dir (the data directory), validation_split=0.2 (fraction held out for validation), subset="training" or "validation" (which split to return), seed=123 (random seed for the split), image_size=(img_height, img_width) (target image size), batch_size=batch_size (batch size). It returns a dataset object whose attributes include class_names. The cache() method caches the dataset so the disk is not read repeatedly, shuffle() sets the shuffle buffer size, and prefetch() prefetches elements to speed up training; the three can be chained: cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
- Image visualization: open an image with PIL.Image.open(), or display it with plt.imshow()
- Data visualization: first create a figure with plt.figure(figsize=(10, 10)); plt.subplot(rows, cols, index) defines the grid and selects which cell to draw in; plt.plot(x, arr, label=...) plots the values in arr against x (here the epoch range), with label used for the legend
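A compact recap sketch that ties these utilities together, using the same dataset, URL, and parameters as the article above (nothing new is introduced):
import pathlib
import tensorflow as tf

# download and extract the flower photos archive, then wrap the returned path
data_dir = pathlib.Path(tf.keras.utils.get_file(
    'flower_photos',
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
    untar=True))

# build the training split as batched (image, label) pairs
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="training", seed=123,
    image_size=(180, 180), batch_size=32)
print(train_ds.class_names)  # ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

# make the input pipeline efficient: cache, shuffle, prefetch
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)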
Feature processing
There are several ways to handle feature processing. One is to process the dataset directly; the other is end to end, where preprocessing is built into the model, the model receives raw data, and its first few layers handle the features (a minimal map-based sketch of the first approach follows). Personally I think it depends on the situation: preprocessing ahead of time (the first approach) saves time during model training, while the end-to-end approach is convenient and works out of the box.
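A minimal sketch of the first approach, transforming the tf.data pipeline itself with map so the model receives already-rescaled tensors; train_ds is the dataset built earlier, and num_parallel_calls is my own addition:
# first approach: preprocess the dataset directly instead of inside the model
normalization_layer = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)
normalized_ds = train_ds.map(
    lambda x, y: (normalization_layer(x), y),
    num_parallel_calls=tf.data.AUTOTUNE)  # parallelize the map step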
For production I recommend the first approach; for experiments, or when the machine is powerful enough, the second approach works too. The code for the second approach is as follows:
data_augmentation = tf.keras.Sequential(
    [
        tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal",
                                                              input_shape=(img_height,
                                                                           img_width,
                                                                           3)),  # random horizontal flip
        tf.keras.layers.experimental.preprocessing.RandomRotation(0.1),  # random rotation
        tf.keras.layers.experimental.preprocessing.RandomZoom(0.1),  # random zoom in/out
        tf.keras.layers.experimental.preprocessing.Rescaling(1./255)  # rescale pixels to [0, 1]
    ]
)
This data_augmentation object can be called like a function, or used as the first layer of a model, for example:
model = tf.keras.Sequential([
data_augmentation,  # the augmentation block above already rescales to [0, 1], so a separate Rescaling layer is not repeated here
tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes,activation="softmax")
])
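When used like a function instead of a layer, data_augmentation can be applied directly to a batch of images, the same pattern the visualization loop above relies on:
for images, _ in train_ds.take(1):
    augmented_images = data_augmentation(images)  # an augmented batch with the same shape as images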