What is the difference between the two ways of accessing an HDF5 group in the SVHN dataset?

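First, there is a subtle difference in the output of the two methods.
Method 1: returns the full array (the encoded filename)
Method 2: returns only the first element of the array (a single character)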

Let's deconstruct your code to understand what you have. The first part deals with h5py data objects.

f['digitStruct']

-> returns an h5py group object

f['digitStruct']['name']

-> returns an h5py dataset object

f['digitStruct']['name'].name

-> returns the name (path) of the dataset object

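Note: the /digitStruct/name dataset contains "object references". Each array entry is a pointer to another h5py object (in this case, another dataset). For example (the spaces delineate the 2 object references):

f[ f['digitStruct']['name'][0][0] ]

-> returns the object referenced at [0][0].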
So the outer f[ obj_ref ] lookup works just as it does for any other object reference: it returns the object the reference points to.

In the case of f['digitStruct']['name'][0][0], the reference points to the dataset

/#refs#/b

In other words, f['digitStruct']['name'][0][0] references the same object as

f['#refs#']['b']

or

f['/#refs#/b']

So much for h5py object references.
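To make the reference mechanics concrete, here is a minimal sketch that builds a tiny in-memory HDF5 file with the same layout and then dereferences it. The file name `demo_refs.h5` and the helper `build_demo_file` are made up for illustration; only the paths `/digitStruct/name` and `/#refs#/b` mirror the SVHN file, and the stored character codes are an assumption based on the output shown later in this answer.

```python
import h5py
import numpy as np


def build_demo_file():
    """Hypothetical stand-in for digitStruct.mat, kept entirely in memory."""
    f = h5py.File('demo_refs.h5', 'w', driver='core', backing_store=False)

    # Target dataset: character codes for "1.png", one code per row.
    codes = np.array([[49], [46], [112], [110], [103]], dtype=np.uint16)
    target = f.create_dataset('#refs#/b', data=codes)

    # Dataset of object references; entry [0, 0] points at /#refs#/b.
    ref_dtype = h5py.special_dtype(ref=h5py.Reference)
    name_ds = f.create_dataset('digitStruct/name', shape=(1, 1), dtype=ref_dtype)
    name_ds[0, 0] = target.ref
    return f


def first_name_target(f):
    """Return the object that the [0][0] reference points to."""
    obj_ref = f['digitStruct']['name'][0][0]   # inner lookup: the reference
    return f[obj_ref]                          # outer lookup: dereference it


if __name__ == '__main__':
    demo = build_demo_file()
    print(first_name_target(demo).name)
    demo.close()
```

The inner lookup only fetches the pointer; nothing is read from `/#refs#/b` until the outer `f[...]` resolves it.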
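Let's move on to getting the data from this object reference with Method 1.

f[f['digitStruct']['name'][0][0]].value

-> returns the entire /#refs#/b dataset as a NumPy array.

However, dataset.value is deprecated (and was removed in h5py 3.0), so NumPy indexing is preferred. For example, this gets the entire array:

f[f['digitStruct']['name'][0][0]][:]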

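Note: both of these return the entire array of encoded characters. From this point on, getting the name is plain Python and NumPy functionality. Use this to return the filename as a string:

f[f['digitStruct']['name'][0][0]][:].tobytes().decode('ascii')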
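The decoding step can be tried without any HDF5 file. The sketch below uses a hand-built array holding the character codes for "1.png"; the helper name `codes_to_name` is hypothetical. One caveat, stated as an assumption about the file rather than a confirmed fact: if the codes are stored as 16-bit integers (which MATLAB v7.3 files commonly do), the raw bytes contain a padding byte per character, so converting each code with `chr` is a safer route than decoding the raw buffer.

```python
import numpy as np


def codes_to_name(codes):
    """Turn an (N, 1) array of character codes into a Python string."""
    # flatten() gives a 1-D view of the codes; tolist() yields plain ints.
    return ''.join(chr(c) for c in codes.flatten().tolist())


# Character codes for "1.png", shaped like the dataset in this answer.
codes8 = np.array([[49], [46], [112], [110], [103]], dtype=np.uint8)
codes16 = codes8.astype(np.uint16)

# With 8-bit codes the raw buffer decodes directly.
name_from_bytes = codes8.tobytes().decode('ascii')

# With 16-bit codes, per-character conversion avoids the padding bytes.
name_from_codes = codes_to_name(codes16)

print(name_from_bytes, name_from_codes)
```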

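Now, let's deconstruct the object reference used for Method 2.

f['digitStruct']['name'].value

-> returns the entire /digitStruct/name dataset as a NumPy array. It has 13,068 rows of object references.

f['digitStruct']['name'].value[0]

-> is the first row

f['digitStruct']['name'].value[0].item()

-> copies that array element to a Python scalar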
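The row-then-item step is ordinary NumPy behaviour and can be seen on any object array. In this sketch the strings are placeholders standing in for h5py references, and the array shape (3, 1) is just a small stand-in for the 13,068-row dataset.

```python
import numpy as np

# Placeholder "references": plain strings instead of h5py.Reference objects.
refs = np.empty((3, 1), dtype=object)
refs[:, 0] = ['ref-0', 'ref-1', 'ref-2']

first_row = refs[0]          # 1-D array holding a single element
scalar = first_row.item()    # the element itself, as a Python object

# Indexing twice reaches the same element as .item() on the row.
same_element = refs[0][0]

print(first_row.shape, scalar, same_element)
```

This is why `name[0][0]` in Method 1 and `.value[0].item()` in Method 2 end up with the same reference.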

So all of these point to the same object:
Method 1:

f['digitStruct']['name'][0][0]

Method 2:

f['digitStruct']['name'].value[0].item()

and, for this example, both are the same as

f['#refs#']['b']

or

f['/#refs#/b']

As with Method 1, getting the string is Python and NumPy functionality:

f[f['digitStruct']['name'].value[0].item()][:].tobytes().decode('ascii')

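Yes, object references are complicated....
My recommendation: use NumPy indexing to extract NumPy arrays from the objects instead of .value (as shown in the modified Method 1 above).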
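As a rough sketch of that recommendation, a small helper can wrap the whole lookup using indexing only, so it does not depend on `.value`. The names `read_name` and `make_demo_file` are hypothetical, and the in-memory demo file is a stand-in for the real digitStruct.mat, assuming the same `/digitStruct/name` layout of object references.

```python
import h5py
import numpy as np


def read_name(f, index):
    """Return the filename stored behind row `index` of /digitStruct/name."""
    obj_ref = f['digitStruct']['name'][index][0]   # the object reference
    codes = f[obj_ref][:]                          # NumPy indexing, no .value
    return ''.join(chr(c) for c in codes.flatten().tolist())


def make_demo_file(names):
    """Hypothetical in-memory file mimicking the SVHN name layout."""
    f = h5py.File('demo_names.h5', 'w', driver='core', backing_store=False)
    ref_dtype = h5py.special_dtype(ref=h5py.Reference)
    name_ds = f.create_dataset('digitStruct/name',
                               shape=(len(names), 1), dtype=ref_dtype)
    for i, name in enumerate(names):
        codes = np.array([[ord(ch)] for ch in name], dtype=np.uint16)
        target = f.create_dataset('#refs#/n%d' % i, data=codes)
        name_ds[i, 0] = target.ref
    return f


if __name__ == '__main__':
    demo = make_demo_file(['1.png', '2.png'])
    print([read_name(demo, i) for i in range(2)])
    demo.close()
```

With the real file, the same `read_name(f, i)` call would be used after opening it with `h5py.File(path, 'r')`.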

Example code for completeness. The intermediate print statements show what is happening at each step.

import h5py

# Both of these methods read the name of the 1st
# image in the SVHN dataset.
# Note: .value needs h5py < 3.0; on newer versions use [()] instead.
f = h5py.File('test_digitStruct.mat', 'r')
print(f['digitStruct'])
print(f['digitStruct']['name'])
print(f['digitStruct']['name'].name)

# method 1
print('\ntest method 1')
print(f[f['digitStruct']['name'][0][0]])
print(f[f['digitStruct']['name'][0][0]].name)
# both of these get the entire array / filename:
print(f[f['digitStruct']['name'][0][0]].value)
print(f[f['digitStruct']['name'][0][0]][:])  # same as .value above
print(f[f['digitStruct']['name'][0][0]][:].tobytes().decode('ascii'))

# method 2
print('\ntest method 2')
print(f[f['digitStruct']['name'].value[0].item()])
print(f[f['digitStruct']['name'].value[0].item()].name)
# this only gets the first array member / character:
print(f[f['digitStruct']['name'].value[0].item()].value[0][0])
print(f[f['digitStruct']['name'].value[0].item()].value[0][0].tobytes().decode('ascii'))
# this gets the entire array / filename:
print(f[f['digitStruct']['name'].value[0].item()][:])
print(f[f['digitStruct']['name'].value[0].item()][:].tobytes().decode('ascii'))

The output from the last 2 print statements of each method is the same:

[[ 49]
 [ 46]
 [112]
 [110]
 [103]]
1.png
