Thanks to this blog post for the inspiration, I was able to put together a solution. Here it is:
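- Create a lookup table to effectively "iterate" over the elements of each array. The number of rows in this table must be equal to or greater than the largest number of elements in any array. Let's assume this is 4 (it can be calculated with
    SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table;
):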
    CREATE VIEW seq_0_to_3 AS (
        SELECT 0 AS i UNION ALL
        SELECT 1 UNION ALL
        SELECT 2 UNION ALL
        SELECT 3
    );
- From this, we can create one row per JSON element:
    WITH exploded_array AS (
        SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
        FROM input_table, seq_0_to_3 AS seq
        -- keep only the indexes that exist in this row's array
        WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
    )
    SELECT * FROM exploded_array;
This produces:
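     id | json
    ----+-------------------------
      1 | {"pet":"dog"}
      1 | {"country":"uk"}
      2 | {"pet":"cat"}
      4 | {"country":"germany"}
      4 | {"education":"masters"}
      4 | {"country":"belgium"}
- However, I also need to extract the field names and values. Since I can't see any way to extract JSON field names with Redshift's limited set of JSON functions, I'll use a regular expression instead: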
    WITH exploded_array AS (
        SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
        FROM input_table, seq_0_to_3 AS seq
        WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
    )
    SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field)
    FROM (
        -- the regex picks out the key name from each single-key element
        SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
        FROM exploded_array
    ) AS fields;
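To sanity-check the logic of the steps above outside Redshift, here is a small Python emulation. It is a sketch under stated assumptions, not Redshift itself: the `input_table` rows are hypothetical, reconstructed from the exploded output shown above; `json.loads` stands in for Redshift's JSON functions; Python's `re` stands in for `REGEXP_SUBSTR`; and the helper names `max_array_length`, `explode` and `extract_fields` are made up for illustration.

```python
import json
import re

# Hypothetical sample data, reconstructed from the exploded output shown above.
INPUT_TABLE = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]

# Stand-in for the seq_0_to_3 view: the integers 0..3.
SEQ_0_TO_3 = range(4)

# Same regex as in the REGEXP_SUBSTR call, written as a Python raw string.
FIELD_PATTERN = re.compile(r'[^{"]\w+[^"]')


def max_array_length(rows):
    """Like SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table."""
    return max(len(json.loads(metadata)) for _, metadata in rows)


def explode(rows, seq):
    """Cross join every row with seq, keeping only indexes i < array length,
    as the exploded_array CTE does. Returns (id, element_json) pairs."""
    exploded = []
    for row_id, metadata in rows:
        elements = json.loads(metadata)
        for i in seq:
            if i < len(elements):
                # Compact separators reproduce the {"pet":"dog"} text form.
                text = json.dumps(elements[i], separators=(",", ":"))
                exploded.append((row_id, text))
    return exploded


def extract_fields(exploded):
    """Take the first regex match as the field name and look up its value,
    mirroring REGEXP_SUBSTR followed by JSON_EXTRACT_PATH_TEXT."""
    result = []
    for row_id, element in exploded:
        match = FIELD_PATTERN.search(element)
        field = match.group(0) if match else None
        value = json.loads(element).get(field) if field else None
        result.append((row_id, field, value))
    return result


if __name__ == "__main__":
    # The lookup table must have at least as many rows as the longest array.
    assert max_array_length(INPUT_TABLE) <= len(SEQ_0_TO_3)
    for row in extract_fields(explode(INPUT_TABLE, SEQ_0_TO_3)):
        print(row)
```

Running it prints one `(id, field, value)` tuple per array element, starting with `(1, 'pet', 'dog')`. Index 3 of the sequence is simply filtered out for every row here, which is why the lookup table only needs to be at least as long as the longest array.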



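One caveat about the field-name regex `[^{"]\w+[^"]`: every match is at least three characters long (one character that is not `{` or `"`, one or more word characters, then one character that is not `"`), so it only lands on the key when the key is at least three characters long and made only of word characters. The Python sketch below checks this, assuming Python's `re` behaves like Redshift's regex engine for this simple pattern; `first_match` and `quoted_key` are hypothetical helpers. The second pattern, which takes the quoted text right before the first colon, is only an illustration of a sturdier choice. Redshift's `REGEXP_SUBSTR` documents an `'e'` parameter for returning a subexpression, which would be the natural way to port it, but that port is untested here.

```python
import re

# The field-name pattern used in the answer.
FIELD_PATTERN = re.compile(r'[^{"]\w+[^"]')

# Hypothetical alternative: the quoted text immediately before the first colon.
QUOTED_KEY_PATTERN = re.compile(r'"([^"]+)"\s*:')


def first_match(element):
    """Return what FIELD_PATTERN picks out of a JSON element, or None."""
    match = FIELD_PATTERN.search(element)
    return match.group(0) if match else None


def quoted_key(element):
    """Return the first quoted key that is followed by a colon, or None."""
    match = QUOTED_KEY_PATTERN.search(element)
    return match.group(1) if match else None


if __name__ == "__main__":
    for element in ['{"pet":"dog"}', '{"id":"42"}', '{"id":"abc"}']:
        # A 2-character key like "id" is never matched by FIELD_PATTERN;
        # with a 3-character value like "abc" it returns the value instead.
        print(element, first_match(element), quoted_key(element))
```

For these three inputs, `first_match` returns `pet`, `None` and `abc`, while `quoted_key` returns `pet`, `id` and `id`. The keys in the sample data (`pet`, `country`, `education`) all satisfy the three-character condition, so the original pattern works for them.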