1. Downloads
Spark 2.4.5 download page:
https://archive.apache.org/dist/spark/spark-2.4.5/
Direct download link:
https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
hadoop-2.7.1 download page:
https://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/
Direct download link:
https://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
winutils download link:
https://github.com/duanjz/winutils
Download the project from GitHub, then copy the two files winutils.exe and winutils.pdb from its hadoop-2.7.3/bin directory into the bin folder of your hadoop-2.7.1 installation:
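After copying the files, it may help to confirm that winutils.exe is actually in place before going further. A minimal sketch (the D:\hadoop-2.7.1 fallback path is the install location assumed by this guide):

```python
import os

def winutils_present(hadoop_home):
    """Check whether winutils.exe exists under <hadoop_home>\\bin."""
    return os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe"))

# Fall back to the install path used in this guide if HADOOP_HOME is unset
print(winutils_present(os.environ.get("HADOOP_HOME", r"D:\hadoop-2.7.1")))
```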
2. Configure environment variables
Configure the Java environment variables first, then:
Spark: set SPARK_HOME to D:\spark-2.4.5-bin-hadoop2.7
Hadoop: set HADOOP_HOME to D:\hadoop-2.7.1
Add the following entries to Path:
%HADOOP_HOME%\bin
%SPARK_HOME%\sbin
%SPARK_HOME%\bin
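To verify the variables are visible to Python (note that a terminal or PyCharm started before the change will not see new values until restarted), a quick check:

```python
import os

# Print each required variable, or a marker if it is missing
for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
    print(var, "=", os.environ.get(var, "(not set)"))
```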
3. Install pyspark in PyCharm
Open File -> Settings -> Project: <your project name> -> Python Interpreter and add the pyspark package.
Check "Specify version" and choose the version that matches your Spark installation (2.4.5 here).
After a successful install it appears as shown in the figure:
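Before running any Spark code, you can confirm that pyspark is importable from the interpreter PyCharm configured; a small sketch:

```python
import importlib.util

# find_spec returns None when the package is not installed
spec = importlib.util.find_spec("pyspark")
print("pyspark installed:", spec is not None)
```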
4. Python WordCount (spark.py)
# -*- coding: gbk -*-
from pyspark import SparkContext

# Run Spark locally, with "test" as the application name
sc = SparkContext('local', 'test')
# Read the input file from the current directory
textFile = sc.textFile("./word.txt")
# Split every line into words and flatten into a single RDD of words
wordCount = textFile.flatMap(lambda line: line.split(" "))
# Pair each word with 1, then sum the counts per word
wordCount = wordCount.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Print each (word, count) pair
wordCount.foreach(print)
word.txt must be in the same directory as the Python script, as shown in the figure:
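To see what the flatMap/map/reduceByKey pipeline computes without starting Spark at all, the same logic can be sketched in plain Python (the sample lines below are made up for illustration):

```python
from collections import defaultdict

lines = ["hello world", "hello spark"]  # stand-in for the contents of word.txt

# flatMap: split every line and flatten into one list of words
words = [w for line in lines for w in line.split(" ")]
# map + reduceByKey: pair each word with 1, then sum per word
counts = defaultdict(int)
for w in words:
    counts[w] += 1
print(dict(counts))  # -> {'hello': 2, 'world': 1, 'spark': 1}
```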
5. Run results


