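Quick notes; I'll tidy these up when I have time.
Problem 1: after setting up the environment, the CUDA build of PyTorch does not match the GPU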
detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
requirements: opencv-python>=4.1.2 not found and is required by YOLOv5, attempting auto-update...
requirements: 'pip install opencv-python>=4.1.2' skipped (offline)
/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
YOLOv5 2021-11-11 torch 1.10.0+cu102 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
Downloading https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.0M/14.0M [00:23<00:00, 613kB/s]
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/detect.py", line 244, in <module>
main(opt)
File "/home/milk/yolo/yolov5/detect.py", line 239, in main
run(**vars(opt))
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/milk/yolo/yolov5/detect.py", line 79, in run
model = DetectMultiBackend(weights, device=device, dnn=dnn)
File "/home/milk/yolo/yolov5/models/common.py", line 305, in __init__
model = torch.jit.load(w) if 'torchscript' in w else attempt_load(weights, map_location=device)
File "/home/milk/yolo/yolov5/models/experimental.py", line 98, in attempt_load
model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval()) # FP32 model
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in float
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
File "/home/milk/yolo/yolov5/models/yolo.py", line 240, in _apply
self = super()._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/home/milk/anaconda3/envs/milk/lib/python3.8/site-packages/torch/nn/modules/module.py", line 735, in <lambda>
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Process finished with exit code 1
Solution 1:
Reinstall torch in this environment, using the install command generated on the official PyTorch site (Start Locally | PyTorch).
That fixed it!
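The warning in the log lists the architectures the installed build was compiled for (sm_37 sm_50 sm_60 sm_70), and none of them covers the RTX 3060's sm_86. A minimal sketch of that comparison follows, so the check can be repeated after a reinstall. The helper name `arch_supported` is hypothetical, and the same-major-version rule is a simplifying assumption of this sketch; `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are the real calls that supply the inputs.

```python
from typing import Iterable, Tuple


def arch_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if any compiled sm_XY arch shares the GPU's major version.

    Simplified heuristic (an assumption of this sketch): binaries built for
    sm_8x are treated as usable on any capability 8.y device.
    """
    major, _minor = capability
    compiled = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)   # e.g. (8, 6) for sm_86
        archs = torch.cuda.get_arch_list()          # e.g. ['sm_37', 'sm_50', ...]
        print(f"torch {torch.__version__}, GPU capability sm_{cap[0]}{cap[1]}")
        print("compiled for:", " ".join(archs))
        print("supported:", arch_supported(cap, archs))
    else:
        print("torch with CUDA is not available in this environment")
```

For the values in the warning, `arch_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` is False, which matches the "no kernel image is available" error. The training log further down reports torch 1.10.0+cu113 on the same GPU.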
YOLOv5: log of the first training runs
/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
train: weights=yolov5s.pt, cfg=, data=data/Underwater.yaml, hyp=data/hyps/hyp.scratch.yaml, epochs=100, batch_size=16, imgsz=1280, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=0, save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (not a git repository), for updates see https://github.com/ultralytics/yolov5
YOLOv5 2021-11-11 torch 1.10.0+cu113 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 5938MiB)
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 runs (RECOMMENDED)
Overriding model.yaml nc=80 with nc=4
             from  n    params  module                                arguments
  0            -1  1      3520  models.common.Conv                    [3, 32, 6, 2, 2]
  1            -1  1     18560  models.common.Conv                    [32, 64, 3, 2]
  2            -1  1     18816  models.common.C3                      [64, 64, 1]
  3            -1  1     73984  models.common.Conv                    [64, 128, 3, 2]
  4            -1  2    115712  models.common.C3                      [128, 128, 2]
  5            -1  1    295424  models.common.Conv                    [128, 256, 3, 2]
  6            -1  3    625152  models.common.C3                      [256, 256, 3]
  7            -1  1   1180672  models.common.Conv                    [256, 512, 3, 2]
  8            -1  1   1182720  models.common.C3                      [512, 512, 1]
  9            -1  1    656896  models.common.SPPF                    [512, 512, 5]
 10            -1  1    131584  models.common.Conv                    [512, 256, 1, 1]
 11            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 12       [-1, 6]  1         0  models.common.Concat                  [1]
 13            -1  1    361984  models.common.C3                      [512, 256, 1, False]
 14            -1  1     33024  models.common.Conv                    [256, 128, 1, 1]
 15            -1  1         0  torch.nn.modules.upsampling.Upsample  [None, 2, 'nearest']
 16       [-1, 4]  1         0  models.common.Concat                  [1]
 17            -1  1     90880  models.common.C3                      [256, 128, 1, False]
 18            -1  1    147712  models.common.Conv                    [128, 128, 3, 2]
 19      [-1, 14]  1         0  models.common.Concat                  [1]
 20            -1  1    296448  models.common.C3                      [256, 256, 1, False]
 21            -1  1    590336  models.common.Conv                    [256, 256, 3, 2]
 22      [-1, 10]  1         0  models.common.Concat                  [1]
 23            -1  1   1182720  models.common.C3                      [512, 512, 1, False]
 24  [17, 20, 23]  1     24273  models.yolo.Detect                    [4, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 270 layers, 7030417 parameters, 7030417 gradients
Transferred 343/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight, 60 weight (no decay), 60 bias
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/utils/datasets.py", line 406, in __init__
raise Exception(f'{prefix}{p} does not exist')
Exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/milk/yolo/yolov5/train.py", line 625, in <module>
main(opt)
File "/home/milk/yolo/yolov5/train.py", line 522, in main
train(opt.hyp, opt, device, callbacks)
File "/home/milk/yolo/yolov5/train.py", line 212, in train
train_loader, dataset = create_dataloader(train_path, imgsz, batch_size // WORLD_SIZE, gs, single_cls,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 98, in create_dataloader
dataset = LoadImagesAndLabels(path, imgsz, batch_size,
File "/home/milk/yolo/yolov5/utils/datasets.py", line 411, in __init__
raise Exception(f'{prefix}Error loading data from {path}: {e}\nSee {HELP_URL}')
Exception: train: Error loading data from ['/home/milk/yolo/yolov5/data/tain.txt']: train: /home/milk/yolo/yolov5/data/tain.txt does not exist
See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
Process finished with exit code 1
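The exception is about the dataset list, not the model: the `train` entry resolves to /home/milk/yolo/yolov5/data/tain.txt, which does not exist. The file name may simply be a typo for train.txt in data/Underwater.yaml, but that is an assumption, since the YAML itself is not shown here. A minimal sketch for checking such paths before launching a run follows; `find_missing` and the example dict are hypothetical stand-ins for the `train`/`val` entries of the data YAML.

```python
from pathlib import Path
from typing import Dict


def find_missing(splits: Dict[str, str], root: Path) -> Dict[str, Path]:
    """Map each split name to its resolved path if that path does not exist."""
    missing = {}
    for name, entry in splits.items():
        path = Path(entry)
        if not path.is_absolute():
            path = root / path  # relative entries are resolved against root
        if not path.exists():
            missing[name] = path
    return missing


if __name__ == "__main__":
    # Hypothetical values: the train path mirrors the one in the exception,
    # the val entry is invented for illustration.
    root = Path("/home/milk/yolo/yolov5")
    splits = {"train": "data/tain.txt", "val": "data/val.txt"}
    for name, path in find_missing(splits, root).items():
        print(f"{name}: {path} does not exist")
```

Running a check like this on the entries of the data YAML points at the offending path without building the model first.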
Changed nc to 4 in yolov5s.yaml, then ran training again:
/home/milk/anaconda3/envs/milk/bin/python3 /home/milk/yolo/yolov5/train.py
(Output identical to the first run above, including "Overriding model.yaml nc=80 with nc=4" and the same exception: train: /home/milk/yolo/yolov5/data/tain.txt does not exist. Process finished with exit code 1.)
Training still failed, so I changed the default of --cfg in train.py to:
parser.add_argument('--cfg', type=str, default=ROOT / 'models/yolov5s.yaml', help='model.yaml path')
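A small sketch of what this default does, assuming `ROOT` points at the repository directory (the current working directory stands in for it here) and using a hypothetical `parse_opt` wrapper: with the default set, `cfg` is no longer empty when `--cfg` is omitted, and an explicit `--cfg` on the command line still takes precedence.

```python
import argparse
from pathlib import Path

# Assumption: ROOT is the repository directory; the current working
# directory is used as a stand-in in this sketch.
ROOT = Path.cwd()


def parse_opt(argv=None):
    """Hypothetical wrapper that only defines the --cfg option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--cfg', type=str, default=ROOT / 'models/yolov5s.yaml', help='model.yaml path')
    return parser.parse_args(argv)


if __name__ == "__main__":
    # No --cfg given: the default path under ROOT is used.
    print(parse_opt([]).cfg)
    # Explicit --cfg (hypothetical file name) overrides the default.
    print(parse_opt(['--cfg', 'models/custom.yaml']).cfg)
```

Passing `--cfg models/yolov5s.yaml` when launching train.py has the same effect without editing the script. Note that this option selects the model definition only; it does not change the dataset paths read from data/Underwater.yaml.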



