
Spark tutorial notes

Installing and deploying Spark on Linux:

https://www.cnblogs.com/tijun/p/7561718.html

https://blog.csdn.net/heartsdance/article/details/119751588

https://blog.csdn.net/weixin_43854358/article/details/90666193 

(This one is good; it includes a worked example.)

How to troubleshoot startup failures:

https://blog.csdn.net/C_time/article/details/100023332

Smoke test with the SparkPi example:

bin/spark-submit --master spark://10.153.110.18:8077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.7.jar 100 (the trailing 100 is the number of slices and can be set as needed)

pyspark SparkConf explained:

https://blog.csdn.net/weixin_40161254/article/details/87916880
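As a quick reference alongside the link above, a minimal sketch of the properties typically set through SparkConf. All of the values below (app name, master URL, memory sizes) are placeholders, not values from this document; the property keys are standard Spark configuration keys.

```python
# Sketch: common SparkConf properties (all values are placeholders).
conf_pairs = {
    "spark.app.name": "my-job",            # name shown in the web UI
    "spark.master": "spark://host:7077",   # or "yarn", or "local[*]"
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
    "spark.driver.memory": "2g",
}

# With pyspark installed, these would be applied as:
# from pyspark import SparkConf, SparkContext
# conf = SparkConf().setAll(conf_pairs.items())
# sc = SparkContext(conf=conf)

# The same settings can also be passed on the command line:
for key, value in sorted(conf_pairs.items()):
    print(f"--conf {key}={value}")
```

Anything set here can equally be given via `--conf` flags to spark-submit; explicit code-level settings take precedence over spark-defaults.conf.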

Spark install location on b9b:

/home/disk1/software/spark-2.4.7-bin-hadoop2.7/

Spark log location on b9b:

/home/disk1/software/spark-2.4.7-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-yq01-aip-aip07a73b9b.yq01.baidu.com.out

Spark web UI on b9b (job logs etc. can be viewed here):

http://yq01-aip-aip07a73b9b.yq01.baidu.com:8078/

Troubleshooting Spark jobs submitted locally:

https://www.freesion.com/article/7551171582/

Spark statistics:

RDD statistics:

https://blog.csdn.net/liangzelei/article/details/80573015

DataFrame statistics:

https://cloud.tencent.com/developer/article/1031061

https://blog.csdn.net/suzyu12345/article/details/79673557

https://zhuanlan.zhihu.com/p/237637848
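To make the linked posts concrete: `df.describe()` reports count, mean, stddev (the sample standard deviation), min, and max per numeric column. A local sketch of the same summary, computed without Spark; the `values` column here is made up for illustration:

```python
import statistics

# Hypothetical numeric column; with pyspark the equivalent call would be
# df.describe("value").show()
values = [3.0, 5.0, 7.0]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "stddev": statistics.stdev(values),  # sample stddev, matching describe()
    "min": min(values),
    "max": max(values),
}
print(summary)
```

For per-group statistics, the DataFrame route is `df.groupBy(...).agg(...)`, covered in the posts above.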

Fix for running out of cache memory: WARN MemoryStore: Not enough space to cache rdd

https://www.playpi.org/2020012201.html

On whether cached data gets evicted:

https://www.jianshu.com/p/761fa2ee868e

Submitting pyspark jobs: spark-submit

www.jianshu.com/p/df0a189ff28b

https://www.cnblogs.com/piperck/p/10121097.html

https://zhuanlan.zhihu.com/p/101740397

pyspark with YARN mode:

https://www.cnblogs.com/yanshw/p/12083488.html

On setting the master inside the Python script (to avoid the spark-submit script: this cannot be done for yarn-cluster mode):

https://stackoverflow.com/questions/31327275/pyspark-on-yarn-cluster-mode

pyspark with yarn-cluster support:

Meaning of the spark-submit parameters:

https://www.malaoshi.top/show_1IXnhwPEDg0.html

https://xujiyou.work/%E5%A4%A7%E6%95%B0%E6%8D%AE/Spark/spark-submit%E8%AF%A6%E8%A7%A3.html
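As a cheat sheet next to those two posts, a sketch assembling a typical spark-submit invocation from Python. The master, resource sizes, and `job.py` are placeholders; the flags themselves are standard spark-submit options.

```python
# Sketch: a typical spark-submit command line (all values are placeholders).
submit_args = [
    "spark-submit",
    "--master", "yarn",          # or spark://host:7077, or local[*]
    "--deploy-mode", "cluster",  # driver runs on the cluster; "client" keeps it local
    "--num-executors", "4",
    "--executor-memory", "4g",
    "--executor-cores", "2",
    "job.py",                    # hypothetical pyspark application script
]
print(" ".join(submit_args))
```

In cluster mode the driver's stdout ends up in the YARN container logs rather than the submitting terminal, which is why the log paths below matter for debugging.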

Logs on b9b for debugging errors in Spark Python jobs (check stdout):

/home/work/hadoop-2.10.0/logs/userlogs/application_1629377733083_3679/

or

/home/work/hadoop-2.10.0/logs/userlogs

Custom partitioning in Spark:

https://blog.csdn.net/weixin_45102492/article/details/104726795
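The pattern from the linked post, sketched: `rdd.partitionBy(numPartitions, partitionFunc)` accepts a plain Python function mapping each key to a partition index. The month-based routing below is a made-up example, not from this document:

```python
def month_partitioner(key):
    """Map a 'YYYY-MM-DD' key to one of 12 partitions, one per month."""
    month = int(key.split("-")[1])
    return month - 1  # partition indices 0..11

# With a SparkContext `sc` available (assumption), usage would be:
# pairs = sc.parallelize([("2021-07-15", 1), ("2021-12-01", 2)])
# partitioned = pairs.partitionBy(12, month_partitioner)

print(month_partitioner("2021-07-15"))
```

The function must be deterministic, since identical keys have to land in the same partition on every executor.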

Spark actions, transformations, etc.:

https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
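The key point from that guide: transformations (map, filter) are lazy and only describe a computation, while actions (reduce, collect, count) actually run it. A local analogy using Python generators; the pyspark lines are comments, assuming a SparkContext `sc`:

```python
# With pyspark (assumption: sc exists):
#   doubled = rdd.map(lambda x: x * 2)            # transformation: nothing runs yet
#   total   = doubled.reduce(lambda a, b: a + b)  # action: the job executes now

# The same lazy/eager split, illustrated with a generator:
lazy = (x * 2 for x in [1, 2, 3])  # lazy, like a transformation
total = sum(lazy)                  # forces evaluation, like an action
print(total)
```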

Running BMR on the in-house b9b machine:

./bin/run-example --master yarn --deploy-mode cluster --files /home/aicu-tob/software/baidu_spark_emr/output_afs_agent/conf/yarn-site.xml SparkPi

./bin/run-example --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.3.2-baidu.jar

yarn-site.xml configuration:

https://ifeve.com/spark-yarn-run-spark/

https://blog.csdn.net/Jerry_991/article/details/85042305 

Ticket link:

https://console.cloud.baidu-int.com/ticket/new/?productId=217

Assorted problems with the in-house Spark setup:

http://wiki.baidu.com/pages/viewpage.action?pageId=324590584#id-6.%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98-9.%E7%94%A8hadoopfs-ls%E6%9F%A5%E7%9C%8Bafs%EF%BC%8C%E6%8A%A5ls:NoFileSystemforscheme:afs%E7%9A%84%E9%94%99%E8%AF%AF

Spark_env tutorial:

https://blog.csdn.net/u010199356/article/details/89056304

Spark join tutorial:

https://www.sohu.com/a/427258627_315839

https://zhuanlan.zhihu.com/p/317226768
