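In the pandas example below, what do the square brackets mean? Is there some logic to the `[]` that I should follow? […]

```python
result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
```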
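Each string or list of strings in the `['state', 'shortname', ['info', 'governor']]` value is a path to an element to include in addition to the selected rows. The second argument to `json_normalize()` (`record_path`, set to `'counties'` in the documentation example) tells the function how to select the elements of the input data structure that make up the rows of the output, and the `meta` paths add further metadata that will be included with each of those rows. If it helps, think of these as table joins in a database.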
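To make the "path" idea concrete, here is a minimal pure-Python sketch (an illustration, not pandas' own code) of how a single `meta` entry can be resolved: a bare string is a one-key path, a list is a sequence of keys walked in order, and the column label is the keys joined with `'.'`, which is pandas' default `sep`. The helpers `as_keys`, `get_path` and `column_name` are hypothetical names.

```python
from functools import reduce
from typing import Any, List, Union

Path = Union[str, List[str]]


def as_keys(path: Path) -> List[str]:
    """A bare string is treated as a one-element path."""
    return [path] if isinstance(path, str) else list(path)


def get_path(obj: dict, path: Path) -> Any:
    """Walk the keys in order: ['info', 'governor'] -> obj['info']['governor']."""
    return reduce(lambda node, key: node[key], as_keys(path), obj)


def column_name(path: Path, sep: str = ".") -> str:
    """Join the keys into a column label: ['info', 'governor'] -> 'info.governor'."""
    return sep.join(as_keys(path))


florida = {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"}}
meta = ["state", "shortname", ["info", "governor"]]

for path in meta:
    print(column_name(path), "->", get_path(florida, path))
# state -> Florida
# shortname -> FL
# info.governor -> Rick Scott
```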
The input for the US states documentation example has two dictionaries in a list, and both of these dictionaries have a `counties` key that references another list of dicts:

```python
>>> from pprint import pprint
>>> from pandas import json_normalize  # top-level name in pandas >= 1.0
>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {'governor': 'Rick Scott'},
...          'counties': [{'name': 'Dade', 'population': 12345},
...                       {'name': 'Broward', 'population': 40000},
...                       {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {'governor': 'John Kasich'},
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
 {'name': 'Broward', 'population': 40000},
 {'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
 {'name': 'Cuyahoga', 'population': 1337}]
```

Between them there are 5 rows of data to use in the output:
```python
>>> json_normalize(data, 'counties')
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337
```
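In plain Python terms, `record_path='counties'` amounts to concatenating every `counties` list into one list of rows. A sketch of just that step, using the documentation's `data` trimmed to the fields that matter here:

```python
data = [
    {"state": "Florida", "counties": [
        {"name": "Dade", "population": 12345},
        {"name": "Broward", "population": 40000},
        {"name": "Palm Beach", "population": 60000},
    ]},
    {"state": "Ohio", "counties": [
        {"name": "Summit", "population": 1234},
        {"name": "Cuyahoga", "population": 1337},
    ]},
]

# Every element of every 'counties' list becomes one output row.
rows = [county for state in data for county in state["counties"]]

for i, row in enumerate(rows):
    print(i, row["name"], row["population"])
# 0 Dade 12345
# 1 Broward 40000
# 2 Palm Beach 60000
# 3 Summit 1234
# 4 Cuyahoga 1337
```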
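The `meta` argument then names some elements that live next to those `counties` lists, and those are merged in separately. The values for those `meta` elements in the first dictionary, `data[0]`, are `('Florida', 'FL', 'Rick Scott')`; in `data[1]` they are `('Ohio', 'OH', 'John Kasich')`. Each set of values is attached to the `counties` rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:

```python
>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
```

So, if you pass in a list for the `meta` argument, each element in the list is a separate path, and each of those paths identifies data to add to the rows in the output. That is what the brackets mean: `'state'` is a one-step path, while the inner list `['info', 'governor']` is a two-step path that reads `obj['info']['governor']`.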
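The "join" behaviour can be sketched in a few lines of plain Python. The function `normalize_one_level` below is hypothetical and only handles a single-key record path; unlike pandas, it does not flatten nested dicts inside the records. It resolves each `meta` path once per top-level object and copies the values onto every record taken from that object:

```python
from functools import reduce


def get_path(obj, path):
    """Resolve one meta path: a string is one key, a list is walked key by key."""
    keys = [path] if isinstance(path, str) else path
    return reduce(lambda node, key: node[key], keys, obj)


def normalize_one_level(data, record_key, meta, sep="."):
    """Hypothetical sketch of json_normalize(data, record_key, meta)."""
    rows = []
    for parent in data:
        # Resolve every meta path once per top-level object ...
        extras = {
            (path if isinstance(path, str) else sep.join(path)): get_path(parent, path)
            for path in meta
        }
        # ... then copy those values onto each record from that object,
        # which is why they repeat (3x for Florida, 2x for Ohio).
        for record in parent[record_key]:
            rows.append({**record, **extras})
    return rows


data = [
    {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]

rows = normalize_one_level(data, "counties", ["state", "shortname", ["info", "governor"]])
for row in rows:
    print(row)
```

Each printed dict corresponds to one row of the `json_normalize` table above, with the same column labels.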
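In *your* example JSON, there are only a few nested lists that can be lifted out with the second argument (`record_path`), the way `'counties'` was in the documentation example. The only such list in that data structure is under the nested `'authors'` key; you'd have to extract each `['_source', 'authors']` path, after which you can add other keys from the parent object to augment those rows.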
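A two-step record path such as `['_source', 'authors']` can be pictured as "walk into `_source`, then take the `authors` list". The sketch below, with the hypothetical generator `iter_records`, follows such a path and treats a `None` list as "no records", matching how the first hit is skipped in this data. The `hits` list is a trimmed, hypothetical stand-in for `d['hits']['hits']`; `"HIT-0"` is a placeholder id.

```python
from typing import Any, Iterator, List


def iter_records(obj: Any, record_path: List[str]) -> Iterator[dict]:
    """Follow record_path down to the record lists and yield each record."""
    if isinstance(obj, list):  # a list of objects: follow the path in each one
        for item in obj:
            yield from iter_records(item, record_path)
        return
    value = obj[record_path[0]]
    if len(record_path) > 1:  # more keys to walk
        yield from iter_records(value, record_path[1:])
    elif value is not None:  # a None record list contributes no rows
        yield from value


hits = [
    {"_id": "HIT-0", "_source": {"authors": None}},
    {"_id": "7AF8EBC3", "_source": {"authors": [
        {"author_id": "780E3459", "author_name": "munish puri"},
        {"author_id": "48D92C79", "author_name": "rajesh dhaliwal"},
    ]}},
    {"_id": "7521A721", "_source": {"authors": [
        {"author_id": "7FF872BC", "author_name": "barbara eileen ryan"},
    ]}},
]

names = [a["author_name"] for a in iter_records(hits, ["_source", "authors"])]
print(names)  # ['munish puri', 'rajesh dhaliwal', 'barbara eileen ryan']
```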
The third argument, `meta`, then pulls in the `_id` key from the outermost objects, followed by the nested `['_source', 'title']` and `['_source', 'journal']` paths.
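The point of this step is that every `meta` path is resolved starting from the *outer* hit object: `'_id'` is a top-level key, while `['_source', 'title']` walks into the nested dict. A minimal sketch under that assumption, with the record path hard-coded to `['_source', 'authors']`; `authors_with_meta` is a hypothetical helper, and the titles and journals are placeholder strings, not values from the real data.

```python
from functools import reduce


def get_path(obj, path):
    """A string is one key; a list is walked key by key."""
    keys = [path] if isinstance(path, str) else path
    return reduce(lambda node, key: node[key], keys, obj)


def authors_with_meta(hits, meta, sep="."):
    """Hypothetical sketch of json_normalize(hits, ['_source', 'authors'], meta)."""
    rows = []
    for hit in hits:
        authors = hit["_source"]["authors"]
        if authors is None:  # no records for this hit, so no rows
            continue
        # Meta paths are looked up on the outer hit, not on the author dicts.
        extras = {
            (p if isinstance(p, str) else sep.join(p)): get_path(hit, p) for p in meta
        }
        rows.extend({**author, **extras} for author in authors)
    return rows


hits = [
    {"_id": "HIT-0", "_source": {"title": "title 0", "journal": "journal 0",
                                 "authors": None}},
    {"_id": "7AF8EBC3", "_source": {"title": "title 1", "journal": "journal 1",
                                    "authors": [{"author_name": "munish puri"},
                                                {"author_name": "rajesh dhaliwal"}]}},
    {"_id": "7521A721", "_source": {"title": "title 2", "journal": "journal 2",
                                    "authors": [{"author_name": "barbara eileen ryan"}]}},
]

rows = authors_with_meta(hits, ["_id", ["_source", "journal"], ["_source", "title"]])
for row in rows:
    print(row)
```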
The `record_path` argument takes the `authors` lists as its starting point; these look like:

```python
>>> d['hits']['hits'][0]['_source']['authors']   # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
  'author_id': '780E3459',
  'author_name': 'munish puri'},
 {'affiliations': ['Punjabi University'],
  'author_id': '48D92C79',
  'author_name': 'rajesh dhaliwal'},
 {'affiliations': ['Punjabi University'],
  'author_id': '7D9BD37C',
  'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC', 'author_name': 'barbara eileen ryan'}]
>>> # etc.
```

and so give you the following rows:
```python
>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
           affiliations author_id          author_name
0  [Punjabi University]  780E3459          munish puri
1  [Punjabi University]  48D92C79      rajesh dhaliwal
2  [Punjabi University]  7D9BD37C            r s singh
3                   NaN  7FF872BC  barbara eileen ryan
4                   NaN  0299B8E9     fraser j harbutt
5                   NaN  7DAB7B72   richard m freeland
```
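The `NaN` entries in the `affiliations` column appear because some author dicts simply lack that key. When pandas builds a DataFrame from a list of dicts, the columns are the union of the keys and any missing value is filled in as `NaN`. The plain-Python sketch below imitates that idea, with `None` standing in for `NaN`; it is an illustration, not pandas' implementation.

```python
authors = [
    {"affiliations": ["Punjabi University"], "author_id": "780E3459",
     "author_name": "munish puri"},
    {"author_id": "7FF872BC", "author_name": "barbara eileen ryan"},
]

# Collect column names in order of first appearance across all records.
columns = []
for record in authors:
    for key in record:
        if key not in columns:
            columns.append(key)

# One row per record; a key the record lacks becomes None (pandas shows NaN).
table = [[record.get(col) for col in columns] for record in authors]

print(columns)   # ['affiliations', 'author_id', 'author_name']
print(table[1])  # [None, '7FF872BC', 'barbara eileen ryan']
```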
We can then use the third `meta` argument to add more columns, such as `_id`, `_source.title` and `_source.journal`, using `['_id', ['_source', 'journal'], ['_source', 'title']]`:
```python
>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
           affiliations author_id          author_name       _id  \
0  [Punjabi University]  780E3459          munish puri  7AF8EBC3
1  [Punjabi University]  48D92C79      rajesh dhaliwal  7AF8EBC3
2  [Punjabi University]  7D9BD37C            r s singh  7AF8EBC3
3                   NaN  7FF872BC  barbara eileen ryan  7521A721
4                   NaN  0299B8E9     fraser j harbutt  7DAEB9A4
5                   NaN  7DAB7B72   richard m freeland  7B3236C5

                                     _source.journal  \
0  Journal of Industrial Microbiology & Biotechno...
1  Journal of Industrial Microbiology & Biotechno...
2  Journal of Industrial Microbiology & Biotechno...
3                     The American Historical Review
4                     The American Historical Review
5                     The American Historical Review

                                       _source.title
0  Development of a stable continuous flow immobi...
1  Development of a stable continuous flow immobi...
2  Development of a stable continuous flow immobi...
3  Feminism and the women's movement : dynamics o...
4  The iron curtain : Churchill, America, and the...
5  The Truman Doctrine and the origins of McCarth...
```



