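There is no such thing as TupleType in Spark. Product types are represented as structs with fields of specific types. For example, if you want to return an array of pairs (string, integer), you can use a schema like this: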
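    from pyspark.sql.types import (
        ArrayType, StructType, StructField, StringType, IntegerType
    )

    # An array of structs; each struct is one (char, count) pair.
    schema = ArrayType(StructType([
        StructField("char", StringType(), False),
        StructField("count", IntegerType(), False)
    ]))

Example usage: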
    from pyspark.sql.functions import udf
    from collections import Counter

    # most_common() returns (char, count) tuples, which map onto the struct fields.
    char_count_udf = udf(
        lambda s: Counter(s).most_common(),
        schema
    )

    # sc is an existing SparkContext.
    df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["id", "value"])

    df.select("*", char_count_udf(df["value"])).show(2, False)

    ## +---+-----+-------------------------+
    ## |id |value|PythonUDF#<lambda>(value)|
    ## +---+-----+-------------------------+
    ## |1  |foo  |[[o,2], [f,1]]           |
    ## |2  |bar  |[[r,1], [a,1], [b,1]]    |
    ## +---+-----+-------------------------+
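To see why the schema is an array of (string, integer) structs, it can help to look at what the lambda returns in plain Python, without Spark. The sketch below is only an illustration: `char_counts` is a hypothetical helper name, not part of PySpark, and it simply wraps the same `Counter(s).most_common()` call used in the UDF above.

```python
from collections import Counter


def char_counts(s):
    """Hypothetical helper: same body as the lambda passed to udf()."""
    # Returns a list of (char, count) tuples, highest count first.
    return Counter(s).most_common()


pairs = char_counts("foo")
print(pairs)  # [('o', 2), ('f', 1)]

# Each element is a (str, int) tuple, which is what the
# StructType with "char" and "count" fields describes.
shape_ok = all(
    isinstance(char, str) and isinstance(count, int)
    for char, count in pairs
)
print(shape_ok)  # True
```

Each tuple lines up positionally with the struct fields, so the first element becomes `char` and the second becomes `count`. The order of characters with equal counts can differ between Python versions, so the row for "bar" may not match the output shown above exactly.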


