fastai (3): Using Regular Expressions to Process Labels

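Introduction:

This post is part three of my fastai study notes: building a pet classifier.

Highlights: 1. How regular expressions are used to extract labels from filenames. 2. Using the learning-rate finder (lr_find) to look for a good learning rate.

Main text:

Setup:

Language: Python 3.8.5

Environment: Jupyter Notebook

Libraries used: numpy, fastai, pandas, os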

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import fastbook
fastbook.setup_book()
from fastbook import *
from fastai.vision.all import *
from fastai.callback.fp16 import *

Dataset used:

This post uses a dataset that ships with fastai's dataset URLs (URLs.PETS). Data source: Visual Geometry Group - University of Oxford

path=untar_data(URLs.PETS)
Path.BASE_PATH = path  # display paths relative to the dataset root
(path/'images').ls()

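Inspecting the data shows that all images sit in a single folder, and each filename mixes the label with a number (the label, an underscore, a number, then .jpg).

Setting up the DataBlock: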

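DataBlock arguments explained
Argument	Value	Meaning
blocks	ImageBlock, CategoryBlock	Inputs are images; targets are categories.
get_items	get_image_files	Collects the image files from a folder (built-in function).
splitter	RandomSplitter(seed=42)	Random train/validation split, similar in spirit to sklearn's train_test_split; the seed makes it reproducible.
get_y	using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name')	Extracts the label from the filename; see below.
item_tfms	Resize(460)	Resizes each image to 460 pixels before it is batched.
batch_tfms	aug_transforms(size=224, min_scale=0.75)	Data augmentation applied per batch on the GPU, with a final size of 224.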

Regular expression:

using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name')

RegexLabeller:

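The source shows that RegexLabeller relies on Python's re module: __init__ compiles the pattern pat, and __call__ matches it against the string form of the argument o and returns the first capture group: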

class RegexLabeller():
    "Label `item` with regex `pat`."
    def __init__(self, pat, match=False):
        self.pat = re.compile(pat)
        self.matcher = self.pat.match if match else self.pat.search

    def __call__(self, o):
        o = str(o).replace(os.sep, posixpath.sep)
        res = self.matcher(o)
        assert res,f'Failed to find "{self.pat}" in "{o}"'
        return res.group(1)
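
As a quick check of what the pattern does, here is a minimal sketch that uses only Python's re module. The helper name label_from_name and the sample filenames are illustrative assumptions, not part of fastai. Note the backslash in \d+: without it, the pattern would look for the literal letter d instead of digits, and the match would fail. The dot before jpg is escaped here so that it matches only a literal period.

```python
import re

# Same idea as RegexLabeller: compile once, then search each filename.
PAT = re.compile(r'(.+)_\d+\.jpg$')

def label_from_name(name):
    """Return the text before the trailing _<number>.jpg (hypothetical helper)."""
    res = PAT.search(name)
    assert res, f'Failed to find "{PAT.pattern}" in "{name}"'
    return res.group(1)

# Illustrative filenames in the <label>_<number>.jpg form.
labels = [label_from_name(n) for n in ("great_pyrenees_173.jpg", "Abyssinian_1.jpg")]
print(labels)
```

Because (.+) is greedy, a label that itself contains underscores is kept whole; only the final _<number>.jpg part is stripped.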

using_attr:

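using_attr uses partial to bind f and attr and leaves the object x as the one remaining argument; when the result is called with x, it applies f to the chosen attribute of x (here, name):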

def _using_attr(f, attr, x): return f(getattr(x,attr))

# Cell
def using_attr(f, attr):
    "Construct a function which applies `f` to the argument's attribute `attr`"
    return partial(_using_attr, f, attr)
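
To see the effect in isolation, the sketch below re-declares the two functions with only the standard library and applies them to a path object. The function str.upper and the sample path are stand-ins chosen for illustration; in the post, f is the RegexLabeller instance.

```python
from functools import partial
from pathlib import PurePosixPath

def _using_attr(f, attr, x):
    # Look up the attribute on x, then hand it to f.
    return f(getattr(x, attr))

def using_attr(f, attr):
    # Bind f and attr now; x is supplied later, one item at a time.
    return partial(_using_attr, f, attr)

# Illustrative stand-in for the labeller: upper-case the file name.
upper_name = using_attr(str.upper, "name")
result = upper_name(PurePosixPath("images/beagle_12.jpg"))
print(result)
```

The labeller therefore only ever sees the file name, not the full path, which keeps the regular expression independent of where the dataset was unpacked.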

Building the DataLoaders:

pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")

Checking the data with summary shows the following split sizes:

train_set	valid_set
5912	1478
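
These two numbers are consistent with a 20% validation share of 5912 + 1478 = 7390 images. The sketch below reproduces the arithmetic with a plain seeded shuffle. It assumes a validation fraction of 0.2 and uses Python's random module, so the sizes match but the actual indices will differ from those chosen by RandomSplitter; random_split is a hypothetical helper.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and cut off the first valid_pct share."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]   # (train indices, validation indices)

train_idx, valid_idx = random_split(5912 + 1478)
print(len(train_idx), len(valid_idx))
```

Fixing the seed means the same images land in the validation set on every run, so accuracy figures stay comparable between experiments.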

Building the learner:

learn=cnn_learner(dls,resnet34,metrics=error_rate).to_fp16()
learn.lr_find()
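
The idea behind the learning-rate finder is to train for a short run while the learning rate grows exponentially, record the loss at each step, and then pick a rate from the region where the loss is still falling steeply. The sketch below shows only the exponential schedule, in plain Python. The start value, end value, and step count are assumptions chosen for illustration, and lr_range_schedule is a hypothetical helper, not a fastai function.

```python
def lr_range_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Learning rates that grow by a constant factor from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

lrs = lr_range_schedule()
print(lrs[0], lrs[len(lrs) // 2], lrs[-1])
```

Because each step multiplies the rate by the same factor, the resulting loss curve is usually read on a logarithmic learning-rate axis.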

Starting from the pretrained model, train with:

learn.fit_one_cycle(3,lr_max=slice(1e-6,1e-3))
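
Passing a slice as lr_max asks for discriminative learning rates: the earliest layers get the smallest rate and the final layers the largest. To my understanding the intermediate values are spaced by a constant multiplier. The sketch below illustrates that spacing in plain Python; the count of three parameter groups is an assumption, and geometric_lrs is a hypothetical helper rather than fastai's own code.

```python
def geometric_lrs(start, stop, n_groups):
    """Spread n_groups learning rates from start to stop with a constant ratio."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

# slice(1e-6, 1e-3) spread over an assumed three parameter groups
group_lrs = geometric_lrs(1e-6, 1e-3, 3)
print(group_lrs)
```

With these assumptions the middle group trains at roughly 3.2e-5, between the two ends of the slice on a logarithmic scale.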

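The final accuracy is 94.2%. Since the metric reported during training is error_rate, this corresponds to an error rate of about 0.058.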

interp=ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12))
interp.most_confused(min_val=5)

This plots the confusion matrix and lists the class pairs that are confused most often (at least 5 times).
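
To make the most-confused listing concrete, here is a minimal sketch of how such a list can be derived from a confusion matrix: take every off-diagonal cell whose count reaches min_val and sort by count. The matrix values and class names below are made up for illustration and are not results from this model; most_confused_pairs is a hypothetical helper.

```python
def most_confused_pairs(cm, vocab, min_val=1):
    """Return (actual, predicted, count) for off-diagonal cells, largest first."""
    pairs = [
        (vocab[i], vocab[j], cm[i][j])
        for i in range(len(vocab))
        for j in range(len(vocab))
        if i != j and cm[i][j] >= min_val
    ]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

# Toy data: rows are actual classes, columns are predicted classes.
vocab = ["A", "B", "C"]
cm = [[50, 6, 1],
      [2, 48, 7],
      [0, 5, 52]]
confused = most_confused_pairs(cm, vocab, min_val=5)
print(confused)
```

Raising min_val trims the list to the pairs worth inspecting by hand, which is useful when many classes look alike.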

learn.show_results(max_n=3)

Finally: training with fit_one_cycle for 3 epochs, as in the call above, brought accuracy to 94.2%.
