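I would collect all of these CSVs into a dictionary of DataFrames with the following structure:
dfs['20140803'] is a DataFrame containing the concatenated data from all
df_trip_20140803_*.csv files.
Solution: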
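    import os
    import re
    import glob
    import pandas as pd

    # File name template: first {} is the date (YYYYMMDD), second {} is the file number
    fpattern = r'D:\temp\.data\41444939\df_trip_{}_{}.csv'

    # All matching CSV files, for any date and any number
    files = glob.glob(fpattern.format('*', '*'))

    # Unique dates, taken from the first capture group of the file name
    dates = sorted(set([re.split(r'_(\d{8})_(\d+)\.(\w+)', f)[1] for f in files]))

    # One concatenated DataFrame per date
    dfs = {}
    for d in dates:
        dfs[d] = pd.concat((pd.read_csv(f) for f in glob.glob(fpattern.format(d, '*'))),
                           ignore_index=True)

Test: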
    In [95]: dfs.keys()
    Out[95]: dict_keys(['20140804', '20140805', '20140803', '20140806'])

    In [96]: dfs['20140803']
    Out[96]:
        a  b  c
    0   0  0  7
    1   3  7  1
    2   9  7  3
    3   7  4  7
    4   5  2  4
    5   0  0  4
    6   7  2  2
    7   8  4  1
    8   0  8  3
    9   3  9  0
    10  7  3  9
    11  1  9  8
    12  6  7  2
    13  3  8  1
    14  3  4  5
    15  0  9  2
    16  5  8  7
    17  8  5  4
    18  2  0  2
    19  9  6  6
    20  6  6  6
    21  2  6  9
    22  1  0  8
    23  3  1  1
    24  7  4  2
    25  7  4  2
    26  8  3  7
    27  7  3  2
    28  1  7  7
    29  3  6  5
Setup:
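    import os
    import numpy as np
    import pandas as pd

    # a.txt lists the CSV file names to generate, one per line
    fn = r'D:\temp\.data\41444939\a.txt'
    base_dir = r'D:\temp\.data\41444939'

    files = open(fn).read().splitlines()

    # Write a 5x3 DataFrame of random integers (0-9) to each listed file
    for f in files:
        pd.DataFrame(np.random.randint(0, 10, (5, 3)), columns=list('abc')) \
          .to_csv(os.path.join(base_dir, f), index=False)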
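If you don't have that D: drive layout, here is a minimal self-contained sketch of the same idea that runs anywhere. It is a variation, not the exact code above: it writes hypothetical demo files into a temporary directory and groups paths by date in a single pass instead of calling glob once per date. The helper name `load_trips_by_date`, the demo directory, and the two demo dates are assumptions made for illustration.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

import numpy as np
import pandas as pd


def load_trips_by_date(base_dir):
    """Return {date: DataFrame} built from df_trip_<date>_<n>.csv files in base_dir."""
    pattern = re.compile(r'df_trip_(\d{8})_(\d+)\.csv$')
    groups = defaultdict(list)
    for path in sorted(Path(base_dir).glob('df_trip_*_*.csv')):
        m = pattern.search(path.name)
        if m:  # skip names that do not match the expected layout
            groups[m.group(1)].append(path)
    # Concatenate all files of one date into a single DataFrame
    return {
        date: pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
        for date, paths in sorted(groups.items())
    }


# Hypothetical demo data: 2 dates x 3 files, each 5 rows x 3 columns
demo_dir = Path(tempfile.mkdtemp())
rng = np.random.default_rng(0)
for date in ('20140803', '20140804'):
    for i in range(3):
        frame = pd.DataFrame(rng.integers(0, 10, (5, 3)), columns=list('abc'))
        frame.to_csv(demo_dir / f'df_trip_{date}_{i}.csv', index=False)

dfs = load_trips_by_date(demo_dir)
print({date: df.shape for date, df in dfs.items()})
```

Each date should end up with 15 rows (3 files of 5 rows) and a fresh 0..14 index, because `ignore_index=True` discards the per-file indexes.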


