The CSV dataset I was reading has rows of different lengths, so pandas raised an error when loading it:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 55 fields in line 3, saw 7
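Solution approach:
1. Iterate over every row of the CSV and find the maximum row length;
2. Use that maximum length as the column count for the whole CSV.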
import pandas as pd

csv_file = "../datasets/dataset.csv"
largest_column_count = 0

# Scan the file once to find the row with the most fields.
with open(csv_file, 'r') as temp_f:
    for line in temp_f:
        # Naive split: assumes no quoted field contains a comma.
        column_count = len(line.split(','))
        largest_column_count = max(largest_column_count, column_count)

# column_names expands to 0 .. largest_column_count - 1
column_names = list(range(largest_column_count))
# Rows shorter than the widest row are padded with NaN.
data = pd.read_csv(csv_file, header=None, delimiter=',', names=column_names)
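
Splitting on ',' miscounts rows where a quoted field itself contains a comma. As a rough sketch of one way around that, the column count could be taken with the standard-library csv module instead. The helper name max_field_count is hypothetical (it is not part of pandas or the original code), and the sketch assumes the file uses standard double-quote CSV quoting:

```python
import csv


def max_field_count(path, delimiter=","):
    """Return the largest number of fields found in any row of the file.

    Hypothetical helper for illustration; assumes standard CSV quoting.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        # csv.reader honors quoting, so a quoted "x,y" counts as one field.
        # default=0 covers an empty file.
        return max((len(row) for row in reader), default=0)
```

The returned value can stand in for largest_column_count when building column_names, for example column_names = list(range(max_field_count(csv_file))).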



