
Spark word count over a socket stream


filestream.py code

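from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="test streaming")
sc.setLogLevel("ERROR")  # hide INFO/WARN logs so the batch output stays readable

# Process the stream in 2-second batches
ssc = StreamingContext(sc, 2)

# Watch the directory for new text files; each line becomes one record
lines = ssc.textFileStream("file:///root/recruit/data")
lines.pprint()  # print the first records of every batch

ssc.start()
ssc.awaitTermination()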
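To feed filestream.py, new files have to appear in the watched directory after the stream has started; a file that is still being written in place may be picked up half-finished. The sketch below is one way to avoid that, under the assumption that a staging file can be created next to the watched directory on the same filesystem. The helper name `drop_file` is hypothetical and not part of the original post.

```python
import os
import tempfile


def drop_file(watch_dir, name, text):
    """Write text to a staging file, then rename it into watch_dir in one step."""
    os.makedirs(watch_dir, exist_ok=True)
    # Stage the file beside (not inside) the watched directory so the stream
    # never sees a partially written file. Staying on the same filesystem
    # keeps the rename atomic.
    staging_dir = os.path.dirname(os.path.abspath(watch_dir))
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    final_path = os.path.join(watch_dir, name)
    os.replace(tmp_path, final_path)  # atomic move into the watched directory
    return final_path
```

For the script above, a call such as `drop_file("/root/recruit/data", "batch-001.txt", "hello spark\n")` would be run inside the container while filestream.py is running; the file name is only an example.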

socket_wordcount.py code

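from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    sc.setLogLevel("ERROR")
    ssc = StreamingContext(sc, 1)  # 1-second batches
    # Connect to the nc listener started in the master container (see the steps below);
    # the host and port must match where nc is listening
    lines = ssc.socketTextStream("spark-master", 9005)
    # Split each line into words, pair every word with 1, then sum the ones per word
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()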
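What the flatMap, map, reduceByKey chain computes for a single batch can be sketched in plain Python, without Spark. This is only an illustration of the logic; the function name `count_words` is hypothetical, and the real job runs the same steps on each batch of the stream.

```python
from collections import Counter


def count_words(lines):
    """Count words in one batch of lines, mirroring flatMap -> map -> reduceByKey."""
    # flatMap: split every line on single spaces and flatten into one word list
    words = [word for line in lines for word in line.split(" ")]
    # map + reduceByKey: treat each word as (word, 1) and sum the ones per word
    counts = Counter()
    for word in words:
        counts[word] += 1
    return dict(counts)


print(count_words(["a b a", "b c"]))  # prints {'a': 2, 'b': 2, 'c': 1}
```

Note that `line.split(" ")` yields empty strings when words are separated by more than one space, so the empty string is counted as a word; `line.split()` with no argument would avoid that.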

docker-compose.yml file

version: "2"
services:
  master:
    image: zylctgu/spark2.4
    command: /start-master
    hostname: spark-master
    container_name: spark-master
    volumes:
      - /d/documents/docker-files/spark/share_files:/root/spark
    ports:
      - "4040:4040"
      - "8080:8080"
  worker1:
    image: zylctgu/spark2.4
    command: /start-worker
    hostname: worker1
    container_name: spark-worker1
    volumes:
      - /d/documents/docker-files/spark/share_files:/root/spark
    ports:
      - "4041:4040"
      - "8081:8081"
    links:
      - master  
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g
  worker2:
    image: zylctgu/spark2.4
    command: /start-worker
    hostname: worker2
    container_name: spark-worker2
    volumes:
      - /d/documents/docker-files/spark/share_files:/root/spark
    ports:
      - "4042:4040"
      - "8082:8081"
    links:
      - master
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 2g

Steps to run

1. Open a shell in the master container

docker exec -it spark-master bash        # spark-master is the container_name in docker-compose.yml

cd /root/spark

nc -lk -p 9005        # listen on port 9005; socket_wordcount.py must connect to this host and port

 

2. Open a second window for worker1

docker exec -it spark-worker1 bash        # the new worker1 window

cd /root/spark

spark-submit socket_wordcount.py        # the script to run

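3. In the master window, type some words separated by spaces

The word counts then appear in the worker1 window.

The result is as shown in the figure:

[Figure: word count output in the worker1 window]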
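If nc is not available in the container, the listener from step 1 can be approximated with a few lines of Python. This is a minimal sketch, not part of the original setup: the names `open_server` and `serve_lines` are hypothetical, port 9005 mirrors the nc command above, and unlike `nc -lk` it serves a single client and then closes.

```python
import socket


def open_server(host="0.0.0.0", port=9005):
    """Return a listening TCP socket, roughly what `nc -l -p 9005` sets up."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def serve_lines(server, lines):
    """Wait for one client, send it each line newline-terminated, then close."""
    conn, _addr = server.accept()
    with conn:
        for line in lines:
            # The socket text stream splits records on newlines
            conn.sendall((line + "\n").encode("utf-8"))
    server.close()
```

Running `serve_lines(open_server(), ["hello spark hello", "spark streaming"])` in the master container before submitting the job would send two lines to the streaming job once it connects. After the connection closes, the job no longer receives data from this listener.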

When reposting, please credit the source: article reposted from www.mshxw.com
Article URL: https://www.mshxw.com/it/854534.html