Create together, grow together! This is day 31 of my participation in the "日新计划" August writing challenge.
I. [Super Simple] Gear Defect Detection Training for the 2022 "兴智杯" with PaddleX
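1. Competition background
In recent years, AI-based industrial defect detection has become an important application scenario in industrial intelligence: it can further improve the efficiency and accuracy of industrial inspection and reduce labor costs. This competition task uses anomaly detection on gear parts as its AI industrial defect detection scenario and encourages participants to improve the speed and accuracy of gear anomaly detection with machine vision.
Anomaly detection on gear parts is a pain point in industrial defect detection. Gear transmissions are a fundamental component of mechanical equipment. Compared with belt, chain, friction, hydraulic and other transmission types, they offer a wide power range, high transmission efficiency, smooth motion, an accurate transmission ratio, long service life and a compact structure. Together with their safety, reliability and good cost-effectiveness, these advantages make them irreplaceable in general-purpose machinery. As a typical power transmission component, a gear's quality directly affects the performance of the mechanical product.
At present the machine vision industry is still dominated by a few international leaders. Cognex (US) and Keyence (Japan) together dominate the global visual inspection market with a share of over 50%, and both offer solutions built on their own core components and technologies (operating systems, sensors, and so on). Domestic machine vision inspection solutions in China have made considerable progress but still lag well behind the global giants. Gear anomaly detection therefore matters for improving the efficiency of industrial quality inspection in China and for ensuring product quality.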
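2. Competition task
Competition link
The task is set in a manufacturing scenario: anomaly detection on gear parts. It provides a dataset from a real production environment and requires participants to build their models with Baidu's PaddlePaddle, a domestically developed framework. Entries compete on metrics such as precision and recall, with the aim of strengthening the framework's use in industrial intelligence and solving real production pain points for enterprises. Participating teams build a model that automatically detects three defect classes in the test dataset: black skin on the tooth surface (齿面黑皮), black skin on the tooth root (齿底黑皮), and bumps (磕碰).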
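3. Dataset
The dataset consists of real data for gear parts in production, provided by 一汽红旗 (FAW Hongqi). All images were captured on the production line.
To protect data privacy, the competition rules state that the data may only be downloaded from the competition platform and used only for this competition; redistribution through other channels is prohibited. Readers can register and log in on the competition page, then click [数据下载] (Data download) at the bottom of the page to get the dataset.
All images in the dataset are flattened (unrolled) views of real defective gears, annotated by professionals. The sample images clearly mark the defects and the types of identification lines contained in each image.
From left to right: a schematic of the gear, the original image, and an annotated example:
Below are enlarged views of typical defects: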
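Training data file structure:
Image data and labels are provided for training, in the following folder structure:
|– Images/train # training image data, JPEG-encoded image files
|– Annotations # attribute label annotation data
The annotation file uses the fairly standard COCO format.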
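As a quick orientation to the COCO layout, the sketch below counts images and boxes per category. It assumes only the three standard top-level keys (`images`, `categories`, `annotations`); the helper names and the tiny hand-made example (including the category names `defect_a` and `defect_b`) are hypothetical and not taken from the competition data.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Return (number of images, {category name: number of boxes})."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return len(coco["images"]), dict(counts)


def summarize_coco_file(path):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_coco(json.load(f))


# Tiny hand-made example in COCO layout (hypothetical values).
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 100, "height": 80}],
    "categories": [{"id": 1, "name": "defect_a"}, {"id": 2, "name": "defect_b"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [40, 30, 5, 5]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [60, 20, 8, 12]},
    ],
}
print(summarize_coco(example))
```

Once the annotation file has been moved into place in the next section, `summarize_coco_file('data/annotations.json')` should give the same kind of summary for the real data.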
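II. Data processing
1. Organizing the annotation files
The file names in the dataset are in Chinese, so the encoding must be specified when unzipping (a tip picked up from 坑总; a good one to remember).
# Unzip the dataset: upload it to AI Studio, then unzip it using the actual path in your project
# Note: the file names are in Chinese, so specify the encoding when unzipping (alternatively, rename the files locally before uploading)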
!unzip -qoa -O GBK data/data163113/齿轮检测数据集.zip -d ./data/
# Tidy up the dataset layout
!mv data/齿轮检测数据集/train/train_coco.json data/齿轮检测数据集/
!rm data/齿轮检测数据集/train/Thumbs.db
!mkdir data/JPEGImages
!mv data/齿轮检测数据集/train/*.jpg data/JPEGImages/
!mv data/齿轮检测数据集/train_coco.json data/annotations.json
!rm data/齿轮检测数据集 -rf
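For readers who unzip with Python instead of `unzip -O GBK`, here is a minimal sketch of the same idea. It relies on the fact that Python's `zipfile` decodes entry names that lack the UTF-8 flag as CP437, so a GBK name can usually be recovered by re-encoding to CP437 and decoding as GBK. The helper name `fix_zip_name` is hypothetical, and the sketch assumes the archive really was written with GBK names.

```python
def fix_zip_name(name: str) -> str:
    """Undo the CP437 decoding that zipfile applies to non-UTF-8 entry names."""
    try:
        return name.encode("cp437").decode("gbk")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name was already decoded correctly, or it is not GBK at all.
        return name


# Simulate what zipfile would report for a GBK-encoded entry name.
garbled = "齿轮检测数据集".encode("gbk").decode("cp437")
print(fix_zip_name(garbled))
```

When extracting a real archive, the same function could be applied to each `ZipInfo.filename` before writing the file to disk.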
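# Count the files
import glob
import os
# Directory that holds the training images
img_dir = 'data/JPEGImages/'
# Collect the paths of all training images
train_imgs = glob.glob(os.path.join(img_dir, '*.jpg'))
print('Number of images in the dataset: {}'.format(len(train_imgs)))
Number of images in the dataset: 2000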
!dir data -l
总用量 7176
-rw-r--r-- 1 aistudio aistudio 7177387 6月 23 10:35 annotations.json
drwxrwxrwx 6 aistudio aistudio 4096 8月 28 09:39 data163113
drwxr-xr-x 2 aistudio aistudio 159744 8月 28 09:53 JPEGImages
- A check confirms there are 2000 images
- The data has been arranged into the COCO format commonly used with PaddleX
2. Installing PaddleX
!python -m pip install --upgrade -q pip --user
!pip install -q -U paddlex
3. Splitting the dataset
# Split the dataset by ratio
!paddlex --split_dataset --format COCO --dataset_dir data --val_value 0.2
- Train samples: 1600
- Eval samples: 400
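To make the 1600/400 numbers concrete, the following is a minimal sketch of a ratio-based split over image ids. It is not PaddleX's actual implementation; the function name `split_image_ids` and the fixed seed are assumptions made for illustration.

```python
import random


def split_image_ids(image_ids, val_ratio=0.2, seed=0):
    """Shuffle the ids and cut off the first val_ratio share for validation."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_ratio)
    return ids[n_val:], ids[:n_val]


# 2000 images with a validation share of 0.2, as in the command above.
train_ids, val_ids = split_image_ids(range(2000), val_ratio=0.2)
print(len(train_ids), len(val_ids))  # 1600 400
```

Splitting by image id rather than by annotation keeps all boxes of one image in the same subset.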
III. Model training
1. Defining the transforms
# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
import paddlex as pdx
from paddlex import transforms as T
train_transforms = T.Compose([
    # T.MixupImage(mixup_epoch=-1),
    T.RandomDistort(),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.BatchRandomResize(
        target_sizes=[320, 352, 384, 416, 448, 480, 512, 544, 576, 608],
        interp='RANDOM'),
    # T.Resize(target_size=224, interp='LINEAR'),
    T.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
eval_transforms = T.Compose([
    T.Resize(224, interp='CUBIC'),
    T.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
[08-28 10:02:45 MainThread @utils.py:79] WRN paddlepaddle version: 2.3.1. The dynamic graph version of PARL is under development, not fully tested and supported
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/parl/remote/communication.py:38: DeprecationWarning: 'pyarrow.default_serialization_context' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
context = pyarrow.default_serialization_context()
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Sized
2022-08-28 10:02:46,648-WARNING: type object 'QuantizationTransformPass' has no attribute '_supported_quantizable_op_type'
2022-08-28 10:02:46,650-WARNING: If you want to use training-aware and post-training quantization, please use Paddle >= 1.8.4 or develop version
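The `T.Normalize` step in the transforms can be pictured with a small pure-Python sketch. It assumes the usual convention of first scaling 8-bit pixel values to [0, 1] and then applying per-channel mean and standard deviation; whether PaddleX does exactly this internally is an assumption here, and `normalize_pixel` is a hypothetical helper.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel mean used in the transforms above
STD = [0.229, 0.224, 0.225]   # per-channel standard deviation used above


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [((v / 255.0) - m) / s for v, m, s in zip(rgb, mean, std)]


print(normalize_pixel([128, 128, 128]))
```

A pixel whose scaled value equals the channel mean maps to 0, and darker or brighter pixels map to negative or positive values respectively.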
2. Defining the datasets
# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.CocoDetection(
    data_dir='data/JPEGImages',
    ann_file='data/train.json',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.CocoDetection(
    data_dir='data/JPEGImages',
    ann_file='data/val.json',
    transforms=eval_transforms)
loading annotations into memory...
Done (t=0.30s)
creating index...
index created!
2022-08-28 10:06:35 [INFO] Starting to read file list from dataset...
2022-08-28 10:06:35 [INFO] 1121 samples in file data/train.json, including 1121 positive samples and 0 negative samples.
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
2022-08-28 10:06:35 [INFO] Starting to read file list from dataset...
2022-08-28 10:06:35 [INFO] 277 samples in file data/val.json, including 277 positive samples and 0 negative samples.
3. Defining the model
# Generate preset anchors for the YOLO detection model
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/release/2.0.0/paddlex/tools/anchor_clustering/yolo_cluster.py
import numpy as np
anchors = train_dataset.cluster_yolo_anchor(num_anchors=9, image_size=480)
anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
# Initialize the model for training
# Training metrics can be viewed with VisualDL, see https://github.com/PaddlePaddle/PaddleX/tree/release/2.0.0/tutorials/train#visualdl可视化练习目标
num_classes = len(train_dataset.labels)
model = pdx.det.YOLOv3(
    num_classes=num_classes,
    backbone='DarkNet53',
    anchors=anchors.tolist() if isinstance(anchors, np.ndarray) else anchors,
    anchor_masks=anchor_masks,
    label_smooth=True,
    ignore_threshold=0.6)
100%|██████████| 1121/1121 [00:00<00:00, 20013.00it/s]
2022-08-28 10:07:25 [WARNING] Extremely small objects found. 32 of 22857 labels are < 3 pixels in width or height
2022-08-28 10:07:25 [INFO] Running kmeans for 9 anchors on 22857 points...
Evolving anchors with Genetic Algorithm: fitness = 0.7917: 100%|██████████| 1000/1000 [00:05<00:00, 171.65it/s]
W0828 10:07:38.093415 1037 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0828 10:07:38.097923 1037 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
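The role of `anchor_masks` can be shown with a short sketch. Each inner list holds the indices of the anchors assigned to one detection head. In the usual YOLOv3 convention, the first head works on the coarsest feature map and therefore gets the largest anchors. The anchor values below are the widely used YOLOv3 COCO defaults, used here only for illustration; they are not the anchors clustered from the gear dataset, and `anchors_per_head` is a hypothetical helper.

```python
def anchors_per_head(anchors, anchor_masks):
    """Group [width, height] anchors by detection head using index masks."""
    return [[anchors[i] for i in mask] for mask in anchor_masks]


# Commonly used YOLOv3 COCO anchors, sorted from small to large (illustrative only).
coco_anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
                [59, 119], [116, 90], [156, 198], [373, 326]]
masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

for head, group in enumerate(anchors_per_head(coco_anchors, masks)):
    print(head, group)
```

With nine clustered anchors sorted by size, the same masks would give each head three anchors of similar scale.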
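4. Model training
The main decision is the batch size: adjust it proportionally so that GPU memory is used as fully as possible.
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/release/2.0.0/paddlex/cv/models/detector.py
# Parameter descriptions and tuning notes: https://paddlex.readthedocs.io/zh_CN/develop/appendix/parameters.html
model.train(
    num_epochs=300,  # number of training epochs
    train_dataset=train_dataset,  # training data
    eval_dataset=eval_dataset,  # validation data
    train_batch_size=20,  # batch size
    pretrain_weights='COCO',  # pretrained weights
    learning_rate=0.005 / 12,  # learning rate
    warmup_steps=500,  # number of warmup steps
    warmup_start_lr=0.0,  # learning rate at the start of warmup
    save_interval_epochs=5,  # save every 5 epochs; evaluates automatically when validation data is given
    lr_decay_epochs=[85, 135],  # epochs at which the step decay is applied
    save_dir='output/yolov3_darknet53',  # output directory
    use_vdl=True)  # use VisualDL to log training for visualization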
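The learning-rate settings above combine a linear warmup with step decay. The sketch below shows one plausible reading of that schedule. The decay factor `gamma=0.1` and the steps-per-epoch value (1121 training samples from the dataset log divided by the batch size of 20) are assumptions, and the exact boundary handling inside PaddleX may differ.

```python
BASE_LR = 0.005 / 12          # learning_rate passed to model.train
STEPS_PER_EPOCH = 1121 // 20  # assumed: training samples // train_batch_size


def lr_at_step(step, steps_per_epoch=STEPS_PER_EPOCH, base_lr=BASE_LR,
               warmup_steps=500, warmup_start_lr=0.0,
               decay_epochs=(85, 135), gamma=0.1):
    """Linear warmup to base_lr, then multiply by gamma at each decay epoch."""
    if step < warmup_steps:
        return warmup_start_lr + (base_lr - warmup_start_lr) * step / warmup_steps
    epoch = step // steps_per_epoch
    n_decays = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * gamma ** n_decays


for epoch in (0, 10, 85, 135):
    print(epoch, lr_at_step(epoch * STEPS_PER_EPOCH))
```

Under these assumptions the warmup covers roughly the first nine epochs, after which the rate stays at its base value until epoch 85.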