To pass multiple columns, or an entire row, to a UDF, wrap them in a struct:
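
    from pyspark.sql.functions import udf, struct
    from pyspark.sql.types import IntegerType

    # Sample data: two nullable columns, "a" and "b"
    df = sqlContext.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))

    # The UDF receives the struct as a single Row and counts its None fields
    count_empty_columns = udf(lambda row: len([x for x in row if x is None]), IntegerType())

    # Pack every column into one struct and pass it to the UDF
    new_df = df.withColumn("null_count", count_empty_columns(struct([df[x] for x in df.columns])))
    new_df.show()

This returns: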

    +----+----+----------+
    |   a|   b|null_count|
    +----+----+----------+
    |null|null|         2|
    |   1|null|         1|
    |null|   2|         1|
    +----+----+----------+
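
To see what the UDF does with each struct, the counting logic can be tried without a Spark session. The sketch below is only an illustration: it assumes the value passed to the UDF behaves like a tuple (PySpark's `Row` is a tuple subclass), and the `Row` namedtuple here is a hypothetical stand-in, not the real PySpark class.

```python
from collections import namedtuple

# Hypothetical stand-in for the Row that Spark hands to the UDF;
# like pyspark.sql.Row, it can be iterated field by field.
Row = namedtuple("Row", ["a", "b"])


def count_empty_columns(row):
    """Count the fields of a row whose value is None."""
    return sum(1 for x in row if x is None)


# Same sample data as the DataFrame in the answer
rows = [Row(None, None), Row(1, None), Row(None, 2)]
null_counts = [count_empty_columns(r) for r in rows]
print(null_counts)
```

The per-row counts should match the `null_count` column shown in the output. Using `is None` rather than `== None` is the usual Python idiom and avoids surprises with values that override equality.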



