- Install Anaconda
- Configure Jupyter Notebook
- Using Jupyter Notebook with PySpark
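Reference: the official website for Lin Ziyu's《Spark编程基础》(Spark Programming Fundamentals).
My computer is pretty bad and I will probably replace it soon, so I am collecting the basic commands here to make the next install faster.

Install Anaconda
Download Anaconda from the Tsinghua University mirror: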
Anaconda3-2020.02-Linux-x86_64.sh
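    $ cd ~/下载    # the Downloads folder on a Chinese-locale system
    $ bash Anaconda3-2020.02-Linux-x86_64.sh

Read through the license and answer yes.
Press Enter to accept the default install path.
When asked whether to run conda init, answer yes. Do not press Enter while the installer is still working, or the prompt will default to no.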
    $ conda -V
    $ anaconda -V
    $ conda config --set auto_activate_base false  # do not auto-activate base (removes the "(base)" prompt prefix)
    $ vim ~/.bashrc
    # add this line to ~/.bashrc:
    export PATH=$PATH:/home/hadoop/anaconda3/bin
    $ source ~/.bashrc
    $ anaconda -V
    anaconda Command line client (version 1.7.2)

Configure Jupyter Notebook
    $ conda install jupyter notebook
    $ jupyter notebook --generate-config
    $ cd /home/hadoop/anaconda3/bin
    $ ./python  # start the Python interpreter

```python
>>> from notebook.auth import passwd
>>> passwd()
'sha1:4b2678fa7669:037692fc089b07c56f10b5b50e11e00e5a87c4b3'
```
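The string returned by `passwd()` has three colon-separated parts: algorithm, salt, and hex digest. Below is a rough sketch of how such a value could be built and checked by hand. It assumes the digest is computed over the UTF-8 password followed by the ASCII salt, which is my understanding of the classic `notebook.auth` behavior and not something confirmed here. `make_hash` and `check_hash` are hypothetical helper names, not part of Jupyter.

```python
import hashlib
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:digest' string (assumed layout)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_hash(hashed, passphrase):
    """Return True if passphrase reproduces the digest stored in hashed."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in 'algorithm:salt:digest' form
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return secrets.compare_digest(h.hexdigest(), digest)
```

This is only useful for understanding the format. For the real config value, keep using the output of `passwd()`.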
    $ vim ~/.jupyter/jupyter_notebook_config.py
    # add these lines to jupyter_notebook_config.py:
    c.NotebookApp.ip = '*'                # allow access from any IP
    c.NotebookApp.password = 'sha1:4b2678fa7669:037692fc089b07c56f10b5b50e11e00e5a87c4b3'  # the sha hash copied above
    c.NotebookApp.open_browser = False    # do not open a browser automatically
    c.NotebookApp.port = 8888             # port
    c.NotebookApp.notebook_dir = '/home/hadoop/jupyternotebook'  # directory the Notebook starts in
    $ cd /home/hadoop
    $ mkdir jupyternotebook
    $ jupyter notebook

Open localhost:8888 and enter the password.
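If the page at localhost:8888 does not load, it can help to check whether anything is listening on that port at all. A minimal sketch using only the standard library follows. `is_port_open` is a hypothetical helper name, and the defaults assume the host and port from the config above.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unresolved host, etc.
        return False
```

Calling `is_port_open()` while `jupyter notebook` is running should return `True`. A `False` result suggests the server is not up or is using a different port.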
Using Jupyter Notebook with PySpark

    $ vim ~/.bashrc
    # delete this line:
    #   export PYSPARK_PYTHON=python3
    # and add these two:
    export PYSPARK_PYTHON=/home/hadoop/anaconda3/bin/python
    export PYSPARK_DRIVER_PYTHON=/home/hadoop/anaconda3/bin/python
    $ source ~/.bashrc
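To confirm that the notebook kernel and the PySpark variables point at the same interpreter, something like the following can be run in a notebook cell. It is a small sketch, and `pyspark_python_report` is a hypothetical helper name. It only reads environment variables and does not touch Spark itself.

```python
import os
import sys


def pyspark_python_report(environ=None):
    """Compare the PYSPARK_* variables with the interpreter running this code."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        value = environ.get(name)
        report[name] = {
            "value": value,
            # True only when the variable is set and resolves to this interpreter
            "matches_current": value is not None
            and os.path.realpath(value) == os.path.realpath(sys.executable),
        }
    return report
```

If `value` is `None` for either variable, the notebook was probably started from a shell that had not yet run `source ~/.bashrc`.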
Test
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
logFile = "file:///usr/local/spark/README.md"
logData = sc.textFile(logFile, 2).cache()
numAs = logData.filter(lambda line: 'a' in line).count()
numBs = logData.filter(lambda line: 'b' in line).count()
print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
# Lines with a: 62, Lines with b: 31
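# Make sure the file path is correct
It is a bit slow, so be patient.
Ah, the run failed because HDFS was not started.
In my setup, whenever a path uses file or hdfs, HDFS has to be started first.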
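Since a wrong path is an easy mistake in the test above, a quick local check before handing the URI to Spark can save a slow failed run. This is a sketch using only the standard library. `local_path_from_uri` and `check_input` are hypothetical helper names, and the check only covers `file://` URIs because `hdfs://` paths cannot be verified from the local filesystem.

```python
import os
from urllib.parse import unquote, urlparse


def local_path_from_uri(uri):
    """Return the filesystem path for a file:// URI, or None for other schemes."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return None
    return unquote(parsed.path)


def check_input(uri):
    """Return (ok, message) describing whether a file:// input looks usable."""
    path = local_path_from_uri(uri)
    if path is None:
        return False, "not a file:// URI, cannot check locally: " + uri
    if not os.path.isfile(path):
        return False, "file not found: " + path
    return True, "ok: " + path
```

For example, `check_input("file:///usr/local/spark/README.md")` reports whether the README used in the test exists at that location.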



