
Reading the TSN Source Code (Work in Progress)



Contents
  • I. Project Structure
    • 1. What each .py file does
    • 2. Function composition and call relationships
    • 3. IPO diagram
  • II. opts.py Walkthrough
  • III. main.py Walkthrough
    • 1. Overall architecture
    • 2. Detailed code reading
  • IV. models.py Walkthrough
    • 1. Overall architecture
    • 2. Detailed code reading
  • V. dataset.py Walkthrough
    • 1. Overall architecture
    • 2. Detailed code reading
  • Closing ramblings

I. Project Structure 1. What each .py file does

main.py: training script
test_models.py: testing script
opts.py: argument configuration script
dataset.py: data loading script
models.py: network construction script
transforms.py: data preprocessing script
tf_model_zoo: folder of scripts for importing model architectures

2. Function composition and call relationships

3. IPO diagram

The figure below shows how TSN classifies the frames extracted from the UCF-101 dataset, with the size of every Tensor annotated.

II. opts.py Walkthrough
import argparse
parser = argparse.ArgumentParser(description="PyTorch implementation of Temporal Segment Networks")
parser.add_argument('dataset', type=str, choices=['ucf101', 'hmdb51', 'kinetics'])   # the three supported datasets
parser.add_argument('modality', type=str, choices=['RGB', 'Flow', 'RGBDiff'])         # the three input modalities
parser.add_argument('train_list', type=str)
parser.add_argument('val_list', type=str)

# ========================= Model Configs ==========================                   model parameters
parser.add_argument('--arch', type=str, default="resnet101")                         # backbone, default resnet101
parser.add_argument('--num_segments', type=int, default=3)                           # number of segments, default 3
parser.add_argument('--consensus_type', type=str, default='avg',                     # aggregation function, default avg; choices: avg, max, topk, identity, rnn, cnn
                    choices=['avg', 'max', 'topk', 'identity', 'rnn', 'cnn'])
parser.add_argument('--k', type=int, default=3)                                      # k for the topk consensus, default 3

parser.add_argument('--dropout', '--do', default=0.5, type=float,                    # dropout ratio, default 0.5
                    metavar='DO', help='dropout ratio (default: 0.5)')               # metavar is the placeholder name shown in help messages
parser.add_argument('--loss_type', type=str, default="nll",
                    choices=['nll'])

# ========================= Learning Configs ==========================
parser.add_argument('--epochs', default=45, type=int, metavar='N',                   # number of epochs, default 45
                    help='number of total epochs to run')
parser.add_argument('-b', '--batch-size', default=256, type=int,                     # batch size, default 256
                    metavar='N', help='mini-batch size (default: 256)')
parser.add_argument('--lr', '--learning-rate', default=0.001, type=float,            # learning rate, default 0.001
                    metavar='LR', help='initial learning rate')
parser.add_argument('--lr_steps', default=[20, 40], type=float, nargs="+",           # epochs at which the learning rate is divided by 10
                    metavar='LRSteps', help='epochs to decay learning rate by 10')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',              # momentum, default 0.9
                    help='momentum')
parser.add_argument('--weight-decay', '--wd', default=5e-4, type=float,              # weight decay, default 5e-4
                    metavar='W', help='weight decay (default: 5e-4)')
parser.add_argument('--clip-gradient', '--gd', default=None, type=float,             # gradient clipping threshold, default None (disabled)
                    metavar='W', help='gradient norm clipping (default: disabled)')
parser.add_argument('--no_partialbn', '--npb', default=False, action="store_true")   # disables the partial BN described in the paper; off by default

# ========================= Monitor Configs ==========================
parser.add_argument('--print-freq', '-p', default=20, type=int,                      # log every 20 iterations by default
                    metavar='N', help='print frequency (default: 20)')
parser.add_argument('--eval-freq', '-ef', default=5, type=int,                       # evaluate every 5 epochs by default
                    metavar='N', help='evaluation frequency (default: 5)')


# ========================= Runtime Configs ==========================                # parameters set when actually launching a run
parser.add_argument('-j', '--workers', default=4, type=int, metavar='N',
                    help='number of data loading workers (default: 4)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
                    help='path to latest checkpoint (default: none)')
parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
                    help='evaluate model on validation set')
parser.add_argument('--snapshot_pref', type=str, default="")
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
                    help='manual epoch number (useful on restarts)')
parser.add_argument('--gpus', nargs='+', type=int, default=None)
parser.add_argument('--flow_prefix', default="", type=str)
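To see how these definitions resolve, here is a self-contained excerpt of the parser (only a few of the arguments are reproduced) parsed against a hypothetical command line; the file names are placeholders:

```python
import argparse

# A minimal slice of the opts.py parser, rebuilt here so the example runs on its own.
parser = argparse.ArgumentParser(description="TSN options (excerpt)")
parser.add_argument('dataset', type=str, choices=['ucf101', 'hmdb51', 'kinetics'])
parser.add_argument('modality', type=str, choices=['RGB', 'Flow', 'RGBDiff'])
parser.add_argument('train_list', type=str)
parser.add_argument('val_list', type=str)
parser.add_argument('--arch', type=str, default="resnet101")
parser.add_argument('--lr_steps', default=[20, 40], type=float, nargs="+")

# Positional arguments bind in order; unspecified options keep their defaults,
# and nargs="+" collects every following value into a list of floats.
args = parser.parse_args(['ucf101', 'RGB', 'train.txt', 'val.txt',
                          '--lr_steps', '20', '40'])
print(args.dataset, args.arch, args.lr_steps)
```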
III. main.py Walkthrough 1. Overall architecture

1. from opts import parser: parse the command-line arguments
2. Initialize the TSN model via models.py
3. Load the data via dataset.py
4. Train and save the model

2. Detailed code reading
import argparse
import os
import time
import shutil
import torch
import torchvision
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim
from torch.nn.utils import clip_grad_norm
from dataset import TSNDataSet
from models import TSN
from transforms import *
from opts import parser

"Import the required packages; the most important ones are "
"the model: from models import TSN."
"the parsed options: from opts import parser."

# best top-1 accuracy seen so far
best_prec1 = 0


def main():
    # global variables
    global args, best_prec1
    # parse the command-line arguments defined in opts.py
    args = parser.parse_args()

    if args.dataset == 'ucf101':
        num_class = 101
    elif args.dataset == 'hmdb51':
        num_class = 51
    elif args.dataset == 'kinetics':
        num_class = 400
    else:
        raise ValueError('Unknown dataset '+args.dataset)

    model = TSN(num_class, args.num_segments, args.modality,
                base_model=args.arch,
                consensus_type=args.consensus_type,
                dropout=args.dropout,
                partial_bn=not args.no_partialbn)
    "TSN is defined in models.py"
    "num_class: number of classes"
    "args.num_segments: how many segments a video is split into; K in the paper, default 3"
    "args.modality: input type, e.g. RGB for regular frames or Flow for optical flow"
    "args.arch: backbone, e.g. ResNet101 or BNInception"
    "args.consensus_type: how snippets are fused, e.g. avg"
    "args.dropout: dropout ratio"


    crop_size = model.crop_size
    scale_size = model.scale_size
    input_mean = model.input_mean
    input_std = model.input_std
    policies = model.get_optim_policies()
    train_augmentation = model.get_augmentation()

    # use torch.nn.DataParallel for multi-GPU training
    model = torch.nn.DataParallel(model, device_ids=args.gpus).cuda()

    # args.resume controls whether to resume training from a checkpoint: if a run
    # stopped halfway, training can continue from the last saved epoch, so
    # args.resume is either the default empty string or the path to a saved
    # model file (.pth).

    # checkpoint = torch.load(args.resume) loads the trained model, and
    # model.load_state_dict initializes the network's parameters from it;
    # load_state_dict is one of the key methods of torch.nn.Module.
    if args.resume:
        if os.path.isfile(args.resume):
            print(("=> loading checkpoint '{}'".format(args.resume)))
            checkpoint = torch.load(args.resume)
            args.start_epoch = checkpoint['epoch']
            best_prec1 = checkpoint['best_prec1']
            model.load_state_dict(checkpoint['state_dict'])
            print(("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch'])))
        else:
            print(("=> no checkpoint found at '{}'".format(args.resume)))

    cudnn.benchmark = True

    # Data loading code
    if args.modality != 'RGBDiff':
        normalize = GroupNormalize(input_mean, input_std)
    else:
        normalize = IdentityTransform()

    if args.modality == 'RGB':
        data_length = 1
    elif args.modality in ['Flow', 'RGBDiff']:
        data_length = 5

    train_loader = torch.utils.data.DataLoader(
        TSNDataSet("", args.train_list, num_segments=args.num_segments,
                   new_length=data_length,
                   modality=args.modality,
                   image_tmpl="img_{:05d}.jpg" if args.modality in ["RGB", "RGBDiff"] else args.flow_prefix+"{}_{:05d}.jpg",
                   transform=torchvision.transforms.Compose([
                       train_augmentation,
                       Stack(roll=args.arch == 'BNInception'),
                       ToTorchFormatTensor(div=args.arch != 'BNInception'),
                       normalize,
                   ])),
        batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True)

    val_loader = torch.utils.data.DataLoader(
        TSNDataSet("", args.val_list, num_segments=args.num_segments,
                   new_length=data_length,
                   modality=args.modality,
                   image_tmpl="img_{:05d}.jpg" if args.modality in ["RGB", "RGBDiff"] else args.flow_prefix+"{}_{:05d}.jpg",
                   random_shift=False,
                   transform=torchvision.transforms.Compose([
                       GroupScale(int(scale_size)),
                       GroupCenterCrop(crop_size),
                       Stack(roll=args.arch == 'BNInception'),
                       ToTorchFormatTensor(div=args.arch != 'BNInception'),
                       normalize,
                   ])),
        batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True)

    # define loss function (criterion) and optimizer:
    # cross-entropy loss, optimized with SGD
    if args.loss_type == 'nll':
        criterion = torch.nn.CrossEntropyLoss().cuda()
    else:
        raise ValueError("Unknown loss type")

    for group in policies:
        print(('group: {} has {} params, lr_mult: {}, decay_mult: {}'.format(
            group['name'], len(group['params']), group['lr_mult'], group['decay_mult'])))

    optimizer = torch.optim.SGD(policies,
                                args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

    # args.evaluate decides whether we are training or only evaluating
    if args.evaluate:
        # evaluation only: call validate directly and return
        validate(val_loader, model, criterion, 0)
        return
    # otherwise, train
    for epoch in range(args.start_epoch, args.epochs):
        # adjust the learning rate for this epoch
        adjust_learning_rate(optimizer, epoch, args.lr_steps)

        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch)

        # evaluate on the validation set and save the model every
        # args.eval_freq epochs, and on the final epoch
        if (epoch + 1) % args.eval_freq == 0 or epoch == args.epochs - 1:
            prec1 = validate(val_loader, model, criterion, (epoch + 1) * len(train_loader))

            # remember best prec@1 and save checkpoint
            is_best = prec1 > best_prec1
            best_prec1 = max(prec1, best_prec1)
            # save the parameters and training metadata via save_checkpoint
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'best_prec1': best_prec1,
            }, is_best)

# train() is the entry point of the training part

def train(train_loader, model, criterion, optimizer, epoch):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    if args.no_partialbn:
        model.module.partialBN(False)
    else:
        model.module.partialBN(True)

    # switch to train mode
    model.train()

    end = time.time()
    for i, (input, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        target = target.cuda()
        input_var = torch.autograd.Variable(input)
        target_var = torch.autograd.Variable(target)

        # compute output
        output = model(input_var)
        loss = criterion(output, target_var)

        # measure accuracy and record loss
        prec1, prec5 = accuracy(output.data, target, topk=(1,5))
        losses.update(loss.data[0], input.size(0))
        top1.update(prec1[0], input.size(0))
        top5.update(prec5[0], input.size(0))


        # compute gradient and do SGD step
        optimizer.zero_grad()

        loss.backward()

        if args.clip_gradient is not None:
            total_norm = clip_grad_norm(model.parameters(), args.clip_gradient)
            if total_norm > args.clip_gradient:
                print("clipping gradient: {} with coef {}".format(total_norm, args.clip_gradient / total_norm))

        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print(('Epoch: [{0}][{1}/{2}], lr: {lr:.5f}\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1, top5=top5, lr=optimizer.param_groups[-1]['lr'])))


def validate(val_loader, model, criterion, iter, logger=None):
    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    # switch to evaluate mode
    model.eval()

    end = time.time()
    for i, (input, target) in enumerate(val_loader):
        target = target.cuda()
        input_var = torch.autograd.Variable(input, volatile=True)
        target_var = torch.autograd.Variable(target, volatile=True)

        # compute output
        output = model(input_var)
        loss = criterion(output, target_var)

        # measure accuracy and record loss
        prec1, prec5 = accuracy(output.data, target, topk=(1,5))

        losses.update(loss.data[0], input.size(0))
        top1.update(prec1[0], input.size(0))
        top5.update(prec5[0], input.size(0))

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print(('Test: [{0}/{1}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                   i, len(val_loader), batch_time=batch_time, loss=losses,
                   top1=top1, top5=top5)))

    print(('Testing Results: Prec@1 {top1.avg:.3f} Prec@5 {top5.avg:.3f} Loss {loss.avg:.5f}'
          .format(top1=top1, top5=top5, loss=losses)))

    return top1.avg


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    filename = '_'.join((args.snapshot_pref, args.modality.lower(), filename))
    torch.save(state, filename)
    if is_best:
        best_name = '_'.join((args.snapshot_pref, args.modality.lower(), 'model_best.pth.tar'))
        shutil.copyfile(filename, best_name)

# AverageMeter manages running statistics such as the loss and top-1 accuracy:
# reset() is called on construction,
# update() updates the tracked values,
# and any tracked value can be read as an attribute,
# e.g. top1.val in the train function gives the current top-1 accuracy
class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
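To see the meter in action, here is a self-contained sketch (the class is repeated verbatim so the snippet runs on its own): after two batches of different sizes, avg is the sample-weighted mean while val is just the last batch's value.

```python
class AverageMeter(object):
    """Computes and stores the average and current value (as defined in main.py)."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

top1 = AverageMeter()
top1.update(50.0, n=4)  # batch of 4 samples at 50% top-1 accuracy
top1.update(80.0, n=6)  # batch of 6 samples at 80% top-1 accuracy
print(top1.val)  # 80.0: the most recent batch value
print(top1.avg)  # 68.0: (50*4 + 80*6) / 10, weighted by batch size
```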


def adjust_learning_rate(optimizer, epoch, lr_steps):
    """Sets the learning rate to the initial LR decayed by 10 at every milestone in lr_steps"""
    decay = 0.1 ** (sum(epoch >= np.array(lr_steps)))
    lr = args.lr * decay
    decay = args.weight_decay
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr * param_group['lr_mult']
        param_group['weight_decay'] = decay * param_group['decay_mult']


def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
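The same top-k logic can be re-implemented with NumPy as a self-contained illustration (this is not the project's code, just the computation it performs):

```python
import numpy as np

def accuracy_np(output, target, topk=(1,)):
    # mirror of accuracy(): the fraction of samples whose true label appears
    # among the k highest-scoring predictions, as a percentage
    maxk = max(topk)
    pred = np.argsort(output, axis=1)[:, ::-1][:, :maxk]  # top-k class indices per row
    correct = pred == target[:, None]
    return [correct[:, :k].any(axis=1).mean() * 100.0 for k in topk]

scores = np.array([[0.1, 0.7, 0.2],   # predicted class 1
                   [0.5, 0.2, 0.3],   # predicted class 0, runner-up 2
                   [0.2, 0.3, 0.5]])  # predicted class 2
labels = np.array([1, 2, 2])
prec1, prec2 = accuracy_np(scores, labels, topk=(1, 2))
print(prec1, prec2)  # the middle sample is only right at k=2, so prec1 < prec2
```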


if __name__ == '__main__':
    main()

IV. models.py Walkthrough 1. Overall architecture

1. __init__: initialize the model and set parameters
2. _prepare_base_model: choose the backbone and configure its preprocessing
3. _prepare_tsn: after _prepare_base_model, adapt the final fully connected layer
4. train: freeze all BN layers except the first
5. get_optim_policies: collect each layer's parameters for the optimizer
6. forward: forward pass
7. get_augmentation: pick data-augmentation operations based on the input modality
8. plus two helpers that define the optical-flow and RGB Diff inputs

2. Detailed code reading
from torch import nn
from ops.basic_ops import ConsensusModule, Identity
from transforms import *
from torch.nn.init import normal, constant

# Model initialization
# models.py prepares the model that will later be trained.
# It takes a classic model such as resnet101 or BNInception as the base and,
# depending on the input modality, modifies the final fully connected layer
# to produce the TSN network we need.

class TSN(nn.Module):

    # __init__ initializes the TSN model: it sets parameters and their defaults,
    # and calls several helper functions to adapt the model

    # num_class: number of output classes
    # num_segments: how many segments a video is split into
    # modality: input modality (RGB, optical flow, RGB diff)
    # base_model: backbone the TSN model is built on; default resnet101
    # new_length: number of consecutive frames per snippet; with the default
    #             None this becomes 1 for RGB and 5 for Flow/RGBDiff
    # consensus_type: aggregation function; default avg (average pooling)
    # before_softmax: whether to fuse before softmax; default True
    # dropout: dropout ratio; default 0.8
    # crop_num: number of spatial crops (default 1)
    # partial_bn: whether to use partial BN; default True

    def __init__(self, num_class, num_segments, modality,
                 base_model='resnet101', new_length=None,
                 consensus_type='avg', before_softmax=True,
                 dropout=0.8,
                 crop_num=1, partial_bn=True):
        super(TSN, self).__init__()
        self.modality = modality
        self.num_segments = num_segments
        self.reshape = True
        self.before_softmax = before_softmax
        self.dropout = dropout
        self.crop_num = crop_num
        self.consensus_type = consensus_type
        if not before_softmax and consensus_type != 'avg':
            raise ValueError("only avg consensus can be used after Softmax")

        # when new_length is None, RGB gets new_length = 1 while Flow and RGB Diff get 5
        if new_length is None:
            self.new_length = 1 if modality == "RGB" else 5
        else:
            self.new_length = new_length
        "new_length depends on the input modality"

        print(("""
Initializing TSN with base model: {}.
TSN Configurations:
    input_modality:     {}
    num_segments:       {}
    new_length:         {}
    consensus_module:   {}
    dropout_ratio:      {}
        """.format(base_model, self.modality, self.num_segments, self.new_length, consensus_type, self.dropout)))

        # import the backbone via the TSN class's _prepare_base_model method
        self._prepare_base_model(base_model)

        # adapt the final layer via the TSN class's _prepare_tsn method
        feature_dim = self._prepare_tsn(num_class)

        # Flow and RGBDiff inputs differ mainly in the first convolution layer,
        # whose input channel count depends on the input modality
        if self.modality == 'Flow':           # optical-flow input: rebuild the first conv via _construct_flow_model
            print("Converting the ImageNet model to a flow init model")
            self.base_model = self._construct_flow_model(self.base_model)
            print("Done. Flow model ready...")
        elif self.modality == 'RGBDiff':      # RGB-difference input: rebuild the first conv via _construct_diff_model
            print("Converting the ImageNet model to RGB+Diff init model")
            self.base_model = self._construct_diff_model(self.base_model)
            print("Done. RGBDiff model ready.")

        self.consensus = ConsensusModule(consensus_type)

        # if before_softmax is True, nn.Softmax() is not used later in the network;
        # if False, it is applied in forward (see below)
        if not self.before_softmax:
            self.softmax = nn.Softmax()

        self._enable_pbn = partial_bn
        if partial_bn:
            self.partialBN(True)

    # _prepare_tsn modifies the final fully connected layer of the chosen
    # base_model so its output matches the dataset at hand

    def _prepare_tsn(self, num_class):
        # Get the input channel count of the network's last layer and store it
        # in feature_dim:
        # getattr(base_model, base_model.last_layer_name)
        # returns Linear(in_features=2048, out_features=1000, bias=True), so
        # getattr(base_model, base_model.last_layer_name).in_features returns 2048
        feature_dim = getattr(self.base_model, self.base_model.last_layer_name).in_features
        # if dropout is enabled, insert a dropout layer followed by a new fully
        # connected layer; otherwise attach the fully connected layer directly
        if self.dropout == 0:
            # after this setattr, the layer named base_model.last_layer_name is nn.Linear(feature_dim, num_class)
            setattr(self.base_model, self.base_model.last_layer_name, nn.Linear(feature_dim, num_class))
            self.new_fc = None
        else:
            # after this setattr, the layer named base_model.last_layer_name is nn.Dropout(p=self.dropout)
            setattr(self.base_model, self.base_model.last_layer_name, nn.Dropout(p=self.dropout))
            # create a new fc layer for the forward pass; this layer is not part of base_model
            self.new_fc = nn.Linear(feature_dim, num_class)

        # initialize the fc weights with zero mean and std=0.001, and the bias with 0
        std = 0.001
        if self.new_fc is None:
            normal(getattr(self.base_model, self.base_model.last_layer_name).weight, 0, std)
            constant(getattr(self.base_model, self.base_model.last_layer_name).bias, 0)
        else:
            normal(self.new_fc.weight, 0, std)
            constant(self.new_fc.bias, 0)
        return feature_dim
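The getattr/setattr replacement above can be shown without torchvision; the stand-in classes below are hypothetical, but the attribute mechanics are exactly what _prepare_tsn relies on:

```python
# Hypothetical stand-ins for nn.Linear and a torchvision ResNet, so this
# sketch runs without torch; only the attribute mechanics matter here.
class FakeLinear:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

class FakeResNet:
    def __init__(self):
        self.fc = FakeLinear(2048, 1000)  # ImageNet head: 2048 -> 1000
        self.last_layer_name = 'fc'

model = FakeResNet()
# read the input width of the existing head (what _prepare_tsn stores in feature_dim)
feature_dim = getattr(model, model.last_layer_name).in_features
# swap the 1000-way ImageNet classifier for a num_class-way head, e.g. 101 for UCF-101
setattr(model, model.last_layer_name, FakeLinear(feature_dim, 101))
print(feature_dim, model.fc.out_features)
```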

    # Configure preprocessing for each backbone family: resnet/vgg, BNInception, inception

    def _prepare_base_model(self, base_model):

        # Mainly uses getattr: getattr(torchvision.models, base_model)()
        # imports a different network depending on the value of base_model.
        # Each backbone gets its own input size, mean and std, used later
        # during data preprocessing; Flow and RGBDiff inputs need extra settings.

        if 'resnet' in base_model or 'vgg' in base_model:  # resnet or vgg backbones
            self.base_model = getattr(torchvision.models, base_model)(True)
            self.base_model.last_layer_name = 'fc'   # e.g. resnet101's final layer is named 'fc'
            self.input_size = 224           # resnet/vgg networks take 224x224 input

            # normalization: each input channel has the matching input_mean entry
            # subtracted and is then divided by the matching input_std entry

            self.input_mean = [0.485, 0.456, 0.406]
            self.input_std = [0.229, 0.224, 0.225]
            if self.modality == 'Flow':
                self.input_mean = [0.5]
                self.input_std = [np.mean(self.input_std)]
            elif self.modality == 'RGBDiff':
                self.input_mean = [0.485, 0.456, 0.406] + [0] * 3 * self.new_length
                self.input_std = self.input_std + [np.mean(self.input_std) * 2] * 3 * self.new_length

        elif base_model == 'BNInception':
            import tf_model_zoo
            self.base_model = getattr(tf_model_zoo, base_model)()
            self.base_model.last_layer_name = 'fc'
            self.input_size = 224
            self.input_mean = [104, 117, 128]
            self.input_std = [1]

            if self.modality == 'Flow':
                self.input_mean = [128]
            elif self.modality == 'RGBDiff':
                self.input_mean = self.input_mean * (1 + self.new_length)

        elif 'inception' in base_model:
            import tf_model_zoo
            self.base_model = getattr(tf_model_zoo, base_model)()
            self.base_model.last_layer_name = 'classif'
            self.input_size = 299
            self.input_mean = [0.5]
            self.input_std = [0.5]
        else:
            raise ValueError('Unknown base model: {}'.format(base_model))

    # Override train() to freeze every BN layer except the first

    def train(self, mode=True):
        """
        Override the default train() to freeze the BN parameters
        :return:
        """
        super(TSN, self).train(mode)
        count = 0
        if self._enable_pbn:
            print("Freezing BatchNorm2D except the first one.")
            for m in self.base_model.modules():
                if isinstance(m, nn.BatchNorm2d):
                    count += 1
                    if count >= (2 if self._enable_pbn else 1):
                        m.eval()

                        # shutdown update in frozen mode
                        m.weight.requires_grad = False
                        m.bias.requires_grad = False

    def partialBN(self, enable):
        self._enable_pbn = enable

    # Walk every layer of the model and collect its parameters for the optimizer

    def get_optim_policies(self):
        first_conv_weight = []
        first_conv_bias = []
        normal_weight = []
        normal_bias = []
        bn = []

        conv_cnt = 0
        bn_cnt = 0
        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Conv1d):
                ps = list(m.parameters())
                conv_cnt += 1
                if conv_cnt == 1:
                    first_conv_weight.append(ps[0])
                    if len(ps) == 2:
                        first_conv_bias.append(ps[1])
                else:
                    normal_weight.append(ps[0])
                    if len(ps) == 2:
                        normal_bias.append(ps[1])
            elif isinstance(m, torch.nn.Linear):
                ps = list(m.parameters())
                normal_weight.append(ps[0])
                if len(ps) == 2:
                    normal_bias.append(ps[1])
                  
            elif isinstance(m, torch.nn.BatchNorm1d):
                bn.extend(list(m.parameters()))
            elif isinstance(m, torch.nn.BatchNorm2d):
                bn_cnt += 1
                # later BN's are frozen
                if not self._enable_pbn or bn_cnt == 1:
                    bn.extend(list(m.parameters()))
            elif len(m._modules) == 0:
                if len(list(m.parameters())) > 0:
                    raise ValueError("New atomic module type: {}. Need to give it a learning policy".format(type(m)))

        return [
            {'params': first_conv_weight, 'lr_mult': 5 if self.modality == 'Flow' else 1, 'decay_mult': 1,
             'name': "first_conv_weight"},
            {'params': first_conv_bias, 'lr_mult': 10 if self.modality == 'Flow' else 2, 'decay_mult': 0,
             'name': "first_conv_bias"},
            {'params': normal_weight, 'lr_mult': 1, 'decay_mult': 1,
             'name': "normal_weight"},
            {'params': normal_bias, 'lr_mult': 2, 'decay_mult': 0,
             'name': "normal_bias"},
            {'params': bn, 'lr_mult': 1, 'decay_mult': 0,
             'name': "BN scale/shift"},
        ]

    # Forward pass

    def forward(self, input):
        sample_len = (3 if self.modality == "RGB" else 2) * self.new_length

        if self.modality == 'RGBDiff':
            sample_len = 3 * self.new_length
            input = self._get_diff(input)

        base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))

        if self.dropout > 0:
            base_out = self.new_fc(base_out)

        if not self.before_softmax:
            base_out = self.softmax(base_out)
        if self.reshape:
            base_out = base_out.view((-1, self.num_segments) + base_out.size()[1:])

        output = self.consensus(base_out)
        return output.squeeze(1)
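The tensor shapes through forward() can be traced with plain NumPy; the walkthrough below assumes RGB input (so sample_len = 3 * new_length = 3), a batch of 4 videos, and the default 3 segments, with zeros standing in for the real network outputs:

```python
import numpy as np

batch, num_segments, sample_len = 4, 3, 3   # RGB: sample_len = 3 * new_length
H = W = 224
num_class = 101

# the dataloader stacks all segment frames along the channel axis
x = np.zeros((batch, num_segments * sample_len, H, W))

# input.view((-1, sample_len) + input.size()[-2:]): fold segments into the batch
x = x.reshape((-1, sample_len, H, W))
assert x.shape == (batch * num_segments, sample_len, H, W)

# stand-in for base_model + new_fc: one class-score vector per snippet
base_out = np.zeros((batch * num_segments, num_class))

# base_out.view((-1, num_segments) + base_out.size()[1:]): unfold the segments
base_out = base_out.reshape((-1, num_segments) + base_out.shape[1:])

# avg consensus collapses the segment axis; squeeze(1) leaves one score per video
output = base_out.mean(axis=1, keepdims=True).squeeze(1)
print(output.shape)
```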


    def _get_diff(self, input, keep_rgb=False):
        input_c = 3 if self.modality in ["RGB", "RGBDiff"] else 2
        input_view = input.view((-1, self.num_segments, self.new_length + 1, input_c,) + input.size()[2:])
        if keep_rgb:
            new_data = input_view.clone()
        else:
            new_data = input_view[:, :, 1:, :, :, :].clone()

        for x in reversed(list(range(1, self.new_length + 1))):
            if keep_rgb:
                new_data[:, :, x, :, :, :] = input_view[:, :, x, :, :, :] - input_view[:, :, x - 1, :, :, :]
            else:
                new_data[:, :, x - 1, :, :, :] = input_view[:, :, x, :, :, :] - input_view[:, :, x - 1, :, :, :]

        return new_data
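What _get_diff computes can be shown vectorized on a toy tensor: each frame in a snippet is replaced by its difference from the previous frame (with keep_rgb=False, frame 0 itself is dropped):

```python
import numpy as np

batch, segments, new_length, C, H, W = 2, 3, 5, 3, 8, 8
# shape matches input_view in _get_diff: new_length + 1 frames per segment
frames = np.random.rand(batch, segments, new_length + 1, C, H, W)

# consecutive differences along the frame axis; equivalent to the reversed loop above
diffs = frames[:, :, 1:] - frames[:, :, :-1]
print(diffs.shape)
```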


    def _construct_flow_model(self, base_model):
        # modify the convolution layers
        # Torch models are usually defined in a hierarchical way.
        # nn.modules.children() return all sub modules in a DFS manner
        modules = list(self.base_model.modules())
        first_conv_idx = list(filter(lambda x: isinstance(modules[x], nn.Conv2d), list(range(len(modules)))))[0]
        conv_layer = modules[first_conv_idx]
        container = modules[first_conv_idx - 1]

        # modify parameters, assume the first blob contains the convolution kernels
        params = [x.clone() for x in conv_layer.parameters()]
        kernel_size = params[0].size()
        new_kernel_size = kernel_size[:1] + (2 * self.new_length, ) + kernel_size[2:]
        new_kernels = params[0].data.mean(dim=1, keepdim=True).expand(new_kernel_size).contiguous()

        new_conv = nn.Conv2d(2 * self.new_length, conv_layer.out_channels,
                             conv_layer.kernel_size, conv_layer.stride, conv_layer.padding,
                             bias=True if len(params) == 2 else False)
        new_conv.weight.data = new_kernels
        if len(params) == 2:
            new_conv.bias.data = params[1].data # add bias if neccessary
        layer_name = list(container.state_dict().keys())[0][:-7] # remove .weight suffix to get the layer name

        # replace the first convlution layer
        setattr(container, layer_name, new_conv)
        return base_model

    def _construct_diff_model(self, base_model, keep_rgb=False):
        # modify the convolution layers
        # Torch models are usually defined in a hierarchical way.
        # nn.modules.children() return all sub modules in a DFS manner
        modules = list(self.base_model.modules())
        first_conv_idx = list(filter(lambda x: isinstance(modules[x], nn.Conv2d), list(range(len(modules)))))[0]
        conv_layer = modules[first_conv_idx]
        container = modules[first_conv_idx - 1]

        # modify parameters, assume the first blob contains the convolution kernels
        params = [x.clone() for x in conv_layer.parameters()]
        kernel_size = params[0].size()
        if not keep_rgb:
            new_kernel_size = kernel_size[:1] + (3 * self.new_length,) + kernel_size[2:]
            new_kernels = params[0].data.mean(dim=1, keepdim=True).expand(new_kernel_size).contiguous()
        else:
            new_kernel_size = kernel_size[:1] + (3 * self.new_length,) + kernel_size[2:]
            new_kernels = torch.cat((params[0].data, params[0].data.mean(dim=1, keepdim=True).expand(new_kernel_size).contiguous()),
                                    1)
            new_kernel_size = kernel_size[:1] + (3 + 3 * self.new_length,) + kernel_size[2:]

        new_conv = nn.Conv2d(new_kernel_size[1], conv_layer.out_channels,
                             conv_layer.kernel_size, conv_layer.stride, conv_layer.padding,
                             bias=True if len(params) == 2 else False)
        new_conv.weight.data = new_kernels
        if len(params) == 2:
            new_conv.bias.data = params[1].data  # add bias if neccessary
        layer_name = list(container.state_dict().keys())[0][:-7]  # remove .weight suffix to get the layer name

        # replace the first convolution layer
        setattr(container, layer_name, new_conv)
        return base_model

    @property
    def crop_size(self):
        return self.input_size

    @property
    def scale_size(self):
        return self.input_size * 256 // 224

    # Pick the data-augmentation pipeline based on the input modality
    def get_augmentation(self):
        if self.modality == 'RGB':
            return torchvision.transforms.Compose([GroupMultiScaleCrop(self.input_size, [1, .875, .75, .66]),
                                                   GroupRandomHorizontalFlip(is_flow=False)])
        elif self.modality == 'Flow':
            return torchvision.transforms.Compose([GroupMultiScaleCrop(self.input_size, [1, .875, .75]),
                                                   GroupRandomHorizontalFlip(is_flow=True)])
        elif self.modality == 'RGBDiff':
            return torchvision.transforms.Compose([GroupMultiScaleCrop(self.input_size, [1, .875, .75]),
                                                   GroupRandomHorizontalFlip(is_flow=False)])

五、dataset.py Walkthrough
1. Overall structure

1. class VideoRecord:
Wraps one video's metadata and exposes it (frame directory path, number of frames, class label).
2. class TSNDataSet:
(1) __init__: initialization; stores the configuration parameters
(2) _load_image: loads an image given its path
(3) _parse_list: reads list_file, wraps each video's name, frame count, and label into a VideoRecord, and stores them in video_list
(4) _sample_indices: TSN's sparse sampling; returns the list of sampled frame indices
(5) _get_val_indices: returns the frame indices sampled for validation
(6) _get_test_indices: returns the frame indices sampled for testing
(7) __getitem__: calls the sparse sampler _sample_indices, then calls get() to build the item TSNDataSet returns
(8) get: loads the sampled frames and applies the transforms (corner cropping, center cropping, etc.)
(9) __len__: returns the length of the dataset

2. Detailed code walkthrough
import torch.utils.data as data
from PIL import Image
import os
import os.path
import numpy as np
from numpy.random import randint

# dataset.py reads the dataset, performs sparse sampling on it, and returns the sampled data.

# Wraps one video's metadata and exposes it (frame directory path, number of frames, class label)
class VideoRecord(object):
    def __init__(self, row):
        self._data = row

    @property
    def path(self):
        return self._data[0]

    @property
    def num_frames(self):
        return int(self._data[1])

    @property
    def label(self):
        return int(self._data[2])

# First we define the TSNDataSet class to handle the raw data.
# TSNDataSet inherits from PyTorch's native Dataset class, so it is of type torch.utils.data.Dataset.
# Note: custom data-loading classes in PyTorch generally subclass torch.utils.data.Dataset
# and override the __init__ and __getitem__ methods.

class TSNDataSet(data.Dataset):
    # Initialization: store the configuration parameters
    def __init__(self, root_path, list_file,
                 num_segments=3, new_length=1, modality='RGB',
                 image_tmpl='img_{:05d}.jpg', transform=None,
                 force_grayscale=False, random_shift=True, test_mode=False):

        # root_path: root directory of the project
        # list_file: path to the train/test list file (.txt)
        # num_segments: number of segments the video is split into
        # new_length: number of consecutive frames read at each sampled position (1 for RGB; larger for Flow)
        # modality: input modality (RGB, optical flow, or RGB diff)
        # image_tmpl: filename template for the frame images
        # transform: data transforms to apply
        # random_shift: whether to add a random offset during sparse sampling
        # test_mode: whether the dataset is in test mode

        self.root_path = root_path
        self.list_file = list_file
        self.num_segments = num_segments
        self.new_length = new_length
        self.modality = modality
        self.image_tmpl = image_tmpl
        self.transform = transform
        self.random_shift = random_shift
        self.test_mode = test_mode

        if self.modality == 'RGBDiff':
            self.new_length += 1 # Diff needs one more image to calculate diff

        self._parse_list()

    # Load the image(s) for a given directory and frame index
    def _load_image(self, directory, idx):
        if self.modality == 'RGB' or self.modality == 'RGBDiff':
            return [Image.open(os.path.join(directory, self.image_tmpl.format(idx))).convert('RGB')]
        elif self.modality == 'Flow':
            x_img = Image.open(os.path.join(directory, self.image_tmpl.format('x', idx))).convert('L')
            y_img = Image.open(os.path.join(directory, self.image_tmpl.format('y', idx))).convert('L')

            return [x_img, y_img]

    # Read list_file, wrap each video's name, frame count, and label into a VideoRecord, and store them in video_list.
    # self.list_file is the train/test list file (.txt). Each line holds three space-separated columns:
    # the video name, the number of frames in the video, and the video's label.
    # These three fields are wrapped into a VideoRecord object and stored in video_list.
    def _parse_list(self):
        self.video_list = [VideoRecord(x.strip().split(' ')) for x in open(self.list_file)]
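To make the list-file format concrete, here is how one line would parse (the video name, frame count, and label below are made up for illustration):

```python
# one hypothetical line of the list file: "<frame dir> <num frames> <label>"
line = "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01 150 0"

row = line.strip().split(' ')              # exactly what _parse_list passes to VideoRecord
path, num_frames, label = row[0], int(row[1]), int(row[2])
print(path, num_frames, label)
```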

    # TSN sparse sampling; returns the list of sampled frame indices
    def _sample_indices(self, record):
        """
        :param record: VideoRecord
        :return: list
        """
        # Suppose a video has 150 frames, num_segments=3, and the input modality is RGB. Sparse sampling then works as follows:
        # Split the video into num_segments=3 segments. With record.num_frames=150 and self.new_length=1,
        # the average number of frames per segment is average_duration = 50.
        # Build a list-like variable offsets.
        # For the first segment, suppose randint(average_duration, size=self.num_segments) draws 10;
        # since range(self.num_segments) contributes 0 for the first segment, the frame index chosen there is 10.
        # The other segments work the same way: if the second draw is 12 and the third is 15,
        # the frame indices chosen in the second and third segments are 62 and 115.
        # So offsets = [10, 62, 115]; the method returns offsets + 1, i.e. the frames actually used are [11, 63, 116].
        average_duration = (record.num_frames - self.new_length + 1) // self.num_segments
        if average_duration > 0:
            offsets = np.multiply(list(range(self.num_segments)), average_duration) + randint(average_duration, size=self.num_segments)
        elif record.num_frames > self.num_segments:
            offsets = np.sort(randint(record.num_frames - self.new_length + 1, size=self.num_segments))
        else:
            offsets = np.zeros((self.num_segments,))
        return offsets + 1
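The main branch of _sample_indices can be replayed standalone for the 150-frame example. The random draws differ on each run, but each offset is guaranteed to land inside its own segment:

```python
import numpy as np
from numpy.random import randint

num_frames, num_segments, new_length = 150, 3, 1

# same arithmetic as the average_duration > 0 branch above
average_duration = (num_frames - new_length + 1) // num_segments     # 50
offsets = np.multiply(list(range(num_segments)), average_duration) \
          + randint(average_duration, size=num_segments)
indices = offsets + 1
print(average_duration, indices)
```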

    # Return the frame indices sampled for validation; called when the model runs its internal val pass
    def _get_val_indices(self, record):
        if record.num_frames > self.num_segments + self.new_length - 1:
            tick = (record.num_frames - self.new_length + 1) / float(self.num_segments)
            offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)])
        else:
            offsets = np.zeros((self.num_segments,))
        return offsets + 1

    # Return the frame indices sampled for testing; called from outside the model at test time.
    # The input video is split into self.num_segments parts at equal frame intervals.
    # The returned offsets is a numpy array of length self.num_segments,
    # giving which frames of the input video are fed to the model.
    def _get_test_indices(self, record):

        tick = (record.num_frames - self.new_length + 1) / float(self.num_segments)

        offsets = np.array([int(tick / 2.0 + tick * x) for x in range(self.num_segments)])

        return offsets + 1
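Unlike the training sampler, validation and test sampling are deterministic: each segment contributes its center frame. For the same 150-frame, 3-segment example:

```python
import numpy as np

num_frames, num_segments, new_length = 150, 3, 1

# same center-of-segment arithmetic as _get_val_indices / _get_test_indices
tick = (num_frames - new_length + 1) / float(num_segments)   # 50.0
offsets = np.array([int(tick / 2.0 + tick * x) for x in range(num_segments)])
indices = offsets + 1
print(indices.tolist())   # [26, 76, 126]
```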

    # Call the sparse sampler _sample_indices, then call get() to build the item TSNDataSet returns.
    # This method runs after TSNDataSet is initialized: it performs the sparse sampling and hands the result to get().
    # record is the index-th entry of video_list, holding the video's frame directory, frame count, and class label.
    # During training self.test_mode is False, so the if branch runs; random_shift defaults to True, so _sample_indices(record) is used.
    # During testing self.test_mode is True, so _get_test_indices runs instead.
    # The sampled frame indices are stored in segment_indices and passed to get().
    def __getitem__(self, index):
        record = self.video_list[index]

        if not self.test_mode:
            segment_indices = self._sample_indices(record) if self.random_shift else self._get_val_indices(record)
        else:
            segment_indices = self._get_test_indices(record)

        return self.get(record, segment_indices)

    # Load the sampled frames and apply the transforms (corner cropping, center cropping, etc.).
    # Iterate over the sampled frame indices, load the image for each frame, and append it to the images list.
    # Then apply the transforms to images and return the transformed data together with the class label.
    def get(self, record, indices):

        images = list()
        for seg_ind in indices:
            p = int(seg_ind)
            for i in range(self.new_length):
                seg_imgs = self._load_image(record.path, p)
                images.extend(seg_imgs)
                if p < record.num_frames:
                    p += 1

        process_data = self.transform(images)
        return process_data, record.label

    # Return the length of the dataset
    def __len__(self):
        return len(self.video_list)
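Finally, get() reads new_length consecutive frames starting at each sampled index (and __init__ bumps new_length by one for RGBDiff). A sketch of which frame numbers get() would load, assuming hypothetical sampled indices [26, 76, 126] and new_length=2:

```python
num_frames = 150
new_length = 2                     # e.g. RGBDiff: the configured 1 plus the extra diff frame
segment_indices = [26, 76, 126]    # hypothetical output of a deterministic sampler

loaded = []
for seg_ind in segment_indices:    # same loop structure as get() above
    p = int(seg_ind)
    for _ in range(new_length):
        loaded.append(p)           # where get() calls self._load_image(record.path, p)
        if p < num_frames:         # never step past the last frame
            p += 1
print(loaded)   # [26, 27, 76, 77, 126, 127]
```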

Some rambling at the end

This is my first time really reading code. After almost a week I'm still in a fog: I didn't know where to start reading, I don't know how to debug, and the Python I learned never seems to come up, while everything that does come up is stuff I never learned. I'll take it slowly; the goal for next week is to finish reading the code and get the UCF-101 dataset running on the workstation.
Last Sunday I watched Professor Randy Pausch's Last Lecture. What struck me most was what he said about one's attitude when hitting obstacles. I used to skim past that kind of thing as feel-good chicken soup, but now that I've actually run into this kind of difficulty, where I have the drive but don't know where to apply it, I find it genuinely uplifting. Sharing it here for encouragement.
That was a bit of a setback.
But remember, the brick walls are there for a reason.
The brick walls are not there to keep us out.
The brick walls are there to give us a chance to show how badly we want something.
Because the brick walls are there to stop the people who don’t want it badly enough.
They’re there to stop the other people.
Remember brick walls let us show our dedication.
They are there to separate us from the people who don’t really want to achieve their childhood dreams.
One more realization: for a lot of things you don't know where to start or how to do them, but don't just hesitate and wander. Start doing, even if what you do at first is clumsy; it will slowly get better (probably). (Breaking the work into small tasks really does help with the fear of a hard problem.)
