How to make the pandas HDFStore 'put' operation faster

How can it be made faster?

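Use `io.sql.read_frame` to load the data from the SQL database into a DataFrame, because `read_frame` handles columns of type 'decimal' by converting them to float.
Fill in the missing data for each column.
Call `DataFrame.convert_objects` before the put operation.
If the DataFrame has string-typed columns, use 'table' instead of 'storer':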

store.put('key', df, table=True)
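The preparation steps above can be sketched as follows. This is a minimal illustration rather than the original poster's code: the helper names `prepare_for_put` and `put_as_table` and the fill values are hypothetical, and it assumes a recent pandas, where `infer_objects()` stands in for the older `convert_objects()` and `format="table"` is the newer spelling of `table=True`.

```python
import numpy as np
import pandas as pd


def prepare_for_put(df, numeric_fill=0.0, string_fill=""):
    """Return a copy of df with missing values filled per column.

    Hypothetical helper: the fill values are assumptions, pick ones
    that make sense for your data.
    """
    # infer_objects() gives object columns a concrete dtype where possible
    # (assumed stand-in for the older DataFrame.convert_objects()).
    out = df.copy().infer_objects()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_numeric_dtype(col):
            out[name] = col.fillna(numeric_fill)
        elif col.dtype == object or isinstance(col.dtype, pd.StringDtype):
            # String-like column: fill gaps, then make every value a str.
            out[name] = col.fillna(string_fill).astype(str)
        # Other dtypes (e.g. datetimes) are left untouched.
    return out


def put_as_table(path, key, df):
    """Write df to an HDF5 file in table format (requires PyTables)."""
    with pd.HDFStore(path) as store:
        store.put(key, prepare_for_put(df), format="table")


df = pd.DataFrame({"price": [1.5, np.nan, 3.0], "name": ["a", None, "c"]})
clean = prepare_for_put(df)
print(clean)
```

Calling `put_as_table("data.h5", "key", df)` would then write the cleaned frame; the file name and key here are placeholders.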

After making these changes, the put operation on the same dataset became much faster:

CPU times: user 42.07 s, sys: 28.17 s, total: 70.24 s
Wall time: 98.97 s

Profile log of the second test:

         95984 function calls (95958 primitive calls) in 68.688 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      445   16.757    0.038   16.757    0.038 {numpy.core.multiarray.array}
       19   16.250    0.855   16.250    0.855 {method '_append_records' of 'tables.tableExtension.Table' objects}
       16    7.958    0.497    7.958    0.497 {method 'astype' of 'numpy.ndarray' objects}
       19    6.533    0.344    6.533    0.344 {pandas.lib.create_hdf_rows_2d}
        4    6.284    1.571    6.388    1.597 {method '_fillCol' of 'tables.tableExtension.Row' objects}
       20    2.640    0.132    2.641    0.132 {pandas.lib.maybe_convert_objects}
        1    1.785    1.785    1.785    1.785 {pandas.lib.isnullobj}
        7    1.619    0.231    1.619    0.231 {method 'flatten' of 'numpy.ndarray' objects}
       11    1.059    0.096    1.059    0.096 {pandas.lib.infer_dtype}
        1    0.997    0.997   41.952   41.952 pytables.py:2468(write_data)
       19    0.985    0.052   40.590    2.136 pytables.py:2504(write_data_chunk)
        1    0.827    0.827   60.617   60.617 pytables.py:2433(write)
     1504    0.592    0.000    0.592    0.000 {method '_g_readSlice' of 'tables.hdf5Extension.Array' objects}
        4    0.534    0.133   13.676    3.419 pytables.py:1038(set_atom)
        1    0.528    0.528    0.528    0.528 {pandas.lib.max_len_string_array}
        4    0.441    0.110    0.571    0.143 internals.py:1409(_stack_arrays)
       35    0.358    0.010    0.358    0.010 {method 'copy' of 'numpy.ndarray' objects}
        1    0.276    0.276    3.135    3.135 internals.py:208(fillna)
        5    0.263    0.053    2.054    0.411 common.py:128(_isnull_ndarraylike)
       48    0.253    0.005    0.253    0.005 {method '_append' of 'tables.hdf5Extension.Array' objects}
        4    0.240    0.060    1.500    0.375 internals.py:1400(_simple_blockify)
        1    0.234    0.234   12.145   12.145 pytables.py:1066(set_atom_string)
       28    0.225    0.008    0.225    0.008 {method '_createCArray' of 'tables.hdf5Extension.Array' objects}
       36    0.218    0.006    0.218    0.006 {method '_g_writeSlice' of 'tables.hdf5Extension.Array' objects}
     6110    0.155    0.000    0.155    0.000 {numpy.core.multiarray.empty}
        4    0.097    0.024    0.097    0.024 {method 'all' of 'numpy.ndarray' objects}
        6    0.084    0.014    0.084    0.014 {tables.indexesExtension.keysort}
       18    0.084    0.005    0.084    0.005 {method '_g_close' of 'tables.hdf5Extension.Leaf' objects}
    11816    0.064    0.000    0.108    0.000 file.py:1036(_getNode)
       19    0.053    0.003    0.053    0.003 {method '_g_flush' of 'tables.hdf5Extension.Leaf' objects}
     1528    0.045    0.000    0.098    0.000 array.py:342(_interpret_indexing)
    11709    0.040    0.000    0.042    0.000 file.py:248(__getitem__)
        2    0.027    0.013    0.383    0.192 index.py:1099(get_neworder)
        1    0.018    0.018    0.018    0.018 {numpy.core.multiarray.putmask}
        4    0.013    0.003    0.017    0.004 index.py:607(final_idx32)
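
A profile in this format, sorted by internal time, can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below is an assumption about how such a log might be produced, not the original poster's script: `profile_call` and `busy_work` are hypothetical names, and `busy_work` only stands in for the real `store.put(...)` call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by internal time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the time spent inside each function itself,
    # which pstats reports as "internal time".
    stats.sort_stats("tottime").print_stats(20)
    return result, buffer.getvalue()


def busy_work(n):
    # Hypothetical stand-in for the call being measured, e.g. store.put(...).
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10000)
print(report)
```

Replacing `busy_work` with a small function that performs the put shows which calls dominate `tottime`, as the array conversion and `_append_records` do in the log above.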
