Splitting a dataset into training and validation sets with five-fold cross-validation

Post author: xfxia
Post published: August 29, 2023
Post category: Other

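Five-fold cross-validation is commonly used when training a model on a relatively small dataset. Given the path to the training data, the script in this post splits the data into five folds. It creates five folders (folder1, folder2, folder3, folder4, folder5), and in each folder it writes two file lists: one with the names of the training images and one with the names of the validation images.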
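
To make the splitting concrete, here is a minimal sketch on toy data. The file names and the `random_state` value are hypothetical choices made only for illustration. It shows that with five folds each item lands in a validation set exactly once, and that every fold trains on the remaining four fifths of the data.

```python
from sklearn.model_selection import KFold

# Hypothetical toy data: ten file names standing in for a real training set.
toy_files = [f"sample_{i}.npy" for i in range(10)]

# random_state=0 is an arbitrary seed chosen so the split is repeatable.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []          # (number of training items, number of validation items) per fold
seen_in_validation = []  # every file name that appeared in some validation set

for train_idx, val_idx in kf.split(toy_files):
    fold_sizes.append((len(train_idx), len(val_idx)))
    seen_in_validation.extend(toy_files[i] for i in val_idx)

print(fold_sizes)
print(sorted(seen_in_validation) == sorted(toy_files))
```

With ten items and five folds, each fold holds out two items for validation and trains on the other eight.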

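Before running the full script below, you only need to change the data path and the file suffix. The output folders are created in the current working directory, which is the directory of the Python file if you run the script from there.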

The code is as follows:

import os
import numpy as np
from sklearn.model_selection import KFold

# Set the number of folds
n_folds = 5

# Set the path to the image directory
data_dir = "C:/Users/sxxyc/Desktop/cf_net/CZNet-main/CZNet-main/data/npy/image"

# Get a list of all the npy files in the directory
file_list = [f for f in os.listdir(data_dir) if f.endswith(".npy")]  # change the suffix to match your own data; the data here are .npy files

# Create a list of indices for the number of files
indices = np.arange(len(file_list))

# Shuffle the indices
np.random.shuffle(indices)

# Split the indices into n_folds sets
kf = KFold(n_splits=n_folds, shuffle=False)
folds = kf.split(indices)

# Create the output directories and files
for i, (train_idx, val_idx) in enumerate(folds):
    # Create the output directories
    folder_name = f"folder{i+1}"
    os.makedirs(folder_name, exist_ok=True)

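    # Get the filenames for the train and validation sets.
    # KFold yields positions into `indices`, so map them back through the
    # shuffled `indices`; indexing file_list with the positions directly
    # would ignore the shuffle above.
    train_files = [file_list[indices[idx]] for idx in train_idx]
    val_files = [file_list[indices[idx]] for idx in val_idx]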

    # Save the train and validation filenames to files
    with open(os.path.join(folder_name, f"{folder_name}_train.list"), "w") as f:
        for file in train_files:
            f.write(file + "\n")
    with open(os.path.join(folder_name, f"{folder_name}_validation.list"), "w") as f:
        for file in val_files:
            f.write(file + "\n")
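
Once the lists exist, a training script needs to read them back. The following is a minimal sketch of one way to do that, assuming the folder and file naming used by the script above. The helper name `load_fold` and its `root` parameter are hypothetical and not part of the original post.

```python
import os


def load_fold(fold_index, data_dir, root="."):
    """Return (train_paths, val_paths) for one fold.

    Assumes the layout written by the script above:
    <root>/folder<k>/folder<k>_train.list and folder<k>_validation.list,
    each holding one file name per line.
    """
    folder_name = f"folder{fold_index}"
    paths = {}
    for split in ("train", "validation"):
        list_path = os.path.join(root, folder_name, f"{folder_name}_{split}.list")
        with open(list_path) as f:
            # One file name per line; skip any blank lines.
            names = [line.strip() for line in f if line.strip()]
        # Join each name with the data directory to get a usable path.
        paths[split] = [os.path.join(data_dir, name) for name in names]
    return paths["train"], paths["validation"]
```

For example, `train_paths, val_paths = load_fold(1, data_dir)` would return the paths for the first fold when called from the directory that contains folder1.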

Original link: https://blog.csdn.net/qq_43916860/article/details/130567315
