I created a sample JSON dataset to match that schema:

    {"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}

Casting the Filters column directly to a string only prints the internal array object reference:

    s.select(s.col("ClientNum"), s.col("Filters").cast(StringType)).show(false)

    +---------+------------------------------------------------------------------+
    |ClientNum|Filters                                                           |
    +---------+------------------------------------------------------------------+
    |abc123   |org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@60fca57e|
    +---------+------------------------------------------------------------------+

Your problem is best solved with the explode() function, which flattens the array, followed by star expansion notation:

    s.selectExpr("explode(Filters) AS structCol").selectExpr("structCol.*").show()

    +---+----+---+
    | Op|Type|Val|
    +---+----+---+
    |foo| bar|baz|
    +---+----+---+

To turn the result into a single column holding a comma-separated string:

    s.selectExpr("explode(Filters) AS structCol").select(F.expr("concat_ws(',', structCol.*)").alias("single_col")).show()

    +-----------+
    | single_col|
    +-----------+
    |foo,bar,baz|
    +-----------+
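
For anyone who wants to run these steps end to end, here is a minimal sketch. It assumes Spark 2.2 or later and that `F` is an alias for `org.apache.spark.sql.functions`. The object name `ExplodeFiltersExample` and the local session setup are illustrative assumptions, not part of the original answer. Unlike the snippets above, it also keeps ClientNum next to each exploded row.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

// Hypothetical wrapper object, used only to make the sketch runnable.
object ExplodeFiltersExample {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-filters")
      .getOrCreate()
    import spark.implicits._

    // The sample record, read as a one-row JSON dataset.
    val json =
      """{"ClientNum":"abc123","Filters":[{"Op":"foo","Type":"bar","Val":"baz"}]}"""
    val s = spark.read.json(Seq(json).toDS())

    // explode() emits one row per array element; ClientNum repeats on each row.
    val exploded = s.select(
      F.col("ClientNum"),
      F.explode(F.col("Filters")).alias("structCol")
    )

    // Star expansion passes every struct field to concat_ws,
    // producing one comma-separated string per row.
    exploded
      .select(
        F.col("ClientNum"),
        F.expr("concat_ws(',', structCol.*)").alias("single_col")
      )
      .show(false)

    spark.stop()
  }
}
```

If Filters holds several structs for one client, this yields one row per struct. Whether that or a single aggregated row is preferable depends on how the strings are consumed downstream.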


