One way to achieve this is to use the withColumn method:
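
    # Build a small two-column DataFrame to work with
    old_df = sqlContext.createDataFrame(
        sc.parallelize([(0, 1), (1, 3), (2, 5)]), ('col_1', 'col_2'))
    # Add col_n as the difference of the two existing columns
    new_df = old_df.withColumn('col_n', old_df.col_1 - old_df.col_2)

Alternatively, you can use SQL on a registered table: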

    # Register the DataFrame under a name that SQL queries can reference
    old_df.registerTempTable('old_df')
    new_df = sqlContext.sql('SELECT *, col_1 - col_2 AS col_n FROM old_df')


