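Author: XuLiu Time: 20211123 Function: dataset splitting. The difference from the usual approach is that the images in a folder are split into train and test, and train and test do not overlap.
Input:
Output: the data split into train and test according to the configured ratio
The code is as follows: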
import os
import random
import shutil
from shutil import copy2
# os.listdir collects the names of the entries in a folder into a list and returns it
def getDir(filepath):
    pathlist = os.listdir(filepath)
    return pathlist
# Create the folder if it does not exist yet
def makesDir(filepath):
    if not os.path.exists(filepath):
        os.makedirs(filepath)
# source_path: path where the original images are stored
# train_path: path where the training set is saved
# test_path: path where the test set is saved
def copy_dir(src_path, target_path):
    # Recursively copy the contents of src_path into target_path.
    # Returns False if either path is not an existing directory.
    if os.path.isdir(src_path) and os.path.isdir(target_path):
        filelist_src = os.listdir(src_path)
        for file in filelist_src:
            path = os.path.join(os.path.abspath(src_path), file)
            if os.path.isdir(path):
                # Subfolder: create its counterpart, then recurse into it
                path1 = os.path.join(os.path.abspath(target_path), file)
                if not os.path.exists(path1):
                    os.mkdir(path1)
                copy_dir(path, path1)
            else:
                # Regular file: copy it byte for byte
                with open(path, 'rb') as read_stream:
                    contents = read_stream.read()
                path1 = os.path.join(target_path, file)
                with open(path1, 'wb') as write_stream:
                    write_stream.write(contents)
        return True
    else:
        return False
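def mkdataset(path, name):
    # Build the output folder from the path argument (not the global data_path)
    # and avoid shadowing the built-in str
    train_path = os.path.join(path, name)
    makesDir(train_path)
    return train_path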
def divideTrainValidationTest(source_path, train_path, test_path):
    """
    Split the original 4169 folders into train 1619 and test 2550, 0.63:1; 1627 2542
    """
    source_image_dir = os.listdir(source_path)
    random.shuffle(source_image_dir)
    split_index = int(0.39 * len(source_image_dir))
    train_image_list = source_image_dir[:split_index]
    test_image_list = source_image_dir[split_index:]
    # Find the original location of every entry in each list and copy it to the
    # target path, so that every entry is randomly assigned to one destination.
    # The data is copied rather than moved. shutil.move could be used instead,
    # but it would move the images out of the original folder; if the split
    # turned out wrong and you wanted to redo it, you would need a backup
    # first, which is a lot of trouble.
    for train_image in train_image_list:
        origins_train_image_path = os.path.join(source_path, train_image)
        # copy_dir copies the contents of this folder into train_path
        copy_dir(origins_train_image_path, train_path)
        # shutil.copytree(origins_train_image_path, train_path + '1')
    # To copy a given folder into another folder, besides the copy_dir function
    # there is also the built-in shutil.copytree; its usage is described after
    # the script.
    for test_image in test_image_list:
        origins_test_image_path = os.path.join(source_path, test_image)
        copy_dir(origins_test_image_path, test_path)
if __name__ == '__main__':
    data_path = '/home/jy/xl/workstation/Datasets/Car/Nighttime_Vehicle_ReID'
    source_path = '/home/jy/xl/workstation/Datasets/Car/id_Car_dataset_Cut'
    train_path = mkdataset(data_path, 'train')
    test_path = mkdataset(data_path, 'test')
    divideTrainValidationTest(source_path, train_path, test_path)
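The shuffle-and-slice step in the script can be tried on its own, without touching the file system. The following is a minimal sketch under stated assumptions, not the author's code: the helper `split_names`, its `train_ratio` and `seed` parameters, and the `id_0000`-style folder names are all hypothetical. It uses a seeded `random.Random` so the same split can be reproduced, and checks that the two lists do not overlap.

```python
import random


def split_names(names, train_ratio=0.39, seed=0):
    """Shuffle a copy of names and cut it into two disjoint lists."""
    # Sort first so the result does not depend on the order os.listdir returns
    shuffled = sorted(names)
    # A private, seeded generator makes the split repeatable (seed is an assumption)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical folder names standing in for os.listdir(source_path)
folders = [f"id_{i:04d}" for i in range(100)]
train, test = split_names(folders)

print(len(train), len(test))
print(set(train).isdisjoint(test))
```

Because both lists are slices of the same shuffled list on either side of one cut point, every name lands in exactly one of them, which is why train and test cannot overlap.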
shutil.copytree(oldpath, newpath)
oldpath is the path of the folder to be copied, and newpath is the path of the new folder created by the copy.
Example:
import shutil
oldpath = r'G:\test\test1'
newpath = r'G:\test\test2\test3'
shutil.copytree(oldpath, newpath)
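Explanation:
(1) If the test3 folder does not exist, a folder named test3 is created whose contents and state are the same as those of test1.
(2) If the test3 folder already exists, running shutil.copytree() raises an error.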
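Both cases can be reproduced in a temporary directory. The sketch below is illustrative only: the file name `a.txt` and the use of `tempfile` are assumptions made so the example does not touch real data. It also shows the `dirs_exist_ok=True` argument, available in Python 3.8 and later, which lets `shutil.copytree` copy into a folder that already exists.

```python
import os
import shutil
import tempfile

# Work in a throwaway directory so the example does not touch real data
base = tempfile.mkdtemp()
oldpath = os.path.join(base, "test1")
newpath = os.path.join(base, "test2", "test3")

os.makedirs(oldpath)
with open(os.path.join(oldpath, "a.txt"), "w") as f:
    f.write("hello")

# Case (1): test3 does not exist, so copytree creates it (and the missing
# parent test2) and copies the contents of test1 into it
shutil.copytree(oldpath, newpath)

# Case (2): test3 now exists, so a second plain call raises FileExistsError
try:
    shutil.copytree(oldpath, newpath)
    raised = False
except FileExistsError:
    raised = True

# Python 3.8+: dirs_exist_ok=True copies into the existing folder instead
shutil.copytree(oldpath, newpath, dirs_exist_ok=True)

copied = sorted(os.listdir(newpath))
shutil.rmtree(base)  # clean up the temporary directory
```

On Python versions before 3.8, where `dirs_exist_ok` is not available, a hand-written function such as `copy_dir` in the script is one way to copy into an existing folder.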



