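If you first convert the DataFrame into an RDD of strings, turning a DataFrame of JSON strings into a structured DataFrame is actually quite simple: `read.json` accepts an RDD of JSON strings and infers the nested schema for you (see: http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets).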
For example:
```
>>> new_df = sql_context.read.json(df.rdd.map(lambda r: r.json))
>>> new_df.printSchema()
root
 |-- body: struct (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- sub_json: struct (nullable = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- sub_sub_json: struct (nullable = true)
 |    |    |    |-- col1: long (nullable = true)
 |    |    |    |-- col2: string (nullable = true)
 |-- header: struct (nullable = true)
 |    |-- foo: string (nullable = true)
 |    |-- id: long (nullable = true)
```



