
Notes on PySpark errors and their fixes


1. Starting Spark fails with "JAVA_HOME not set"

(1) Download jdk-8u291-linux-x64.tar.gz.

(2) Extract it into /usr/local/java.

(3) Append the following to ~/.bashrc:

export JAVA_HOME="/usr/local/java/jdk1.8.0_291"
export PATH=$JAVA_HOME/bin:$PATH

(4) Reload the shell profile: source ~/.bashrc

(5) Verify the installation:

(py3_spark) [root@100-020-gpuserver controller]# java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
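The steps above amount to making sure JAVA_HOME points at a real JDK before Spark launches. A minimal sketch of that check (the path shown in the comment matches the install location used above; adapt it to yours):

```python
import os

def check_java_home(env):
    """Return the JDK path if JAVA_HOME is set and contains bin/java,
    otherwise None (the condition behind "JAVA_HOME not set")."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None
    java_bin = os.path.join(java_home, "bin", "java")
    return java_home if os.path.isfile(java_bin) else None

# On a box configured as above this prints /usr/local/java/jdk1.8.0_291;
# None means the export in ~/.bashrc did not take effect in this shell.
print(check_java_home(os.environ))
```

Note that `source ~/.bashrc` only affects the current shell; already-running sessions keep the old environment.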

2. Exception: Java gateway process exited before sending the driver its port number

(py3_spark) [root@100-020-gpuserver controller]# python conn_spark.py
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.spark.SparkConf$.<init>(SparkConf.scala:668)
        at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
        at org.apache.spark.SparkConf.set(SparkConf.scala:94)
        at org.apache.spark.SparkConf.set(SparkConf.scala:83)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:367)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:170)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: 100-020-gpuserver: 100-020-gpuserver: Name or service not known
        at java.net.InetAddress.getLocalHost(InetAddress.java:1506)
        at org.apache.spark.util.Utils$.findLocalInetAddress(Utils.scala:911)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress$lzycompute(Utils.scala:904)
        at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$localIpAddress(Utils.scala:904)
        at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:961)
        at org.apache.spark.util.Utils$$anonfun$localCanonicalHostName$1.apply(Utils.scala:961)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.localCanonicalHostName(Utils.scala:961)
        at org.apache.spark.internal.config.package$.<init>(package.scala:282)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        ... 15 more
Caused by: java.net.UnknownHostException: 100-020-gpuserver: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1501)
        ... 24 more
Traceback (most recent call last):
  File "conn_spark.py", line 26, in <module>
    spark = SparkSession.builder.appName('Managereval').master('local').getOrCreate()
  File "/root/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/root/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 331, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/root/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/root/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 280, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/root/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 95, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

Analysis: the root cause is java.net.UnknownHostException: 100-020-gpuserver: Name or service not known — the machine's own hostname does not resolve, so the JVM dies before it can hand the Py4J gateway port back to Python.

Fix: in /etc/hosts, add the hostname 100-020-gpuserver to the 127.0.0.1 entry:

127.0.0.1   100-020-gpuserver localhost.localdomain localhost4 localhost4.localdomain4
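The failure can be checked without Spark at all: Java's InetAddress.getLocalHost() fails exactly when the local hostname does not resolve, which Python's socket module can test directly. A small sketch:

```python
import socket

def hostname_resolves(name):
    """True if `name` resolves to an IP address. Spark's driver needs
    the machine's own hostname to resolve, or it fails with
    UnknownHostException before opening the Py4J gateway port."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False

# After fixing /etc/hosts this should print True for the local hostname.
print(hostname_resolves(socket.gethostname()))
```

If mapping the hostname to 127.0.0.1 is undesirable (as the later SPARK_LOCAL_IP warning hints), an alternative is to map it to the machine's real interface address instead.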

3. py4j.protocol.Py4JJavaError: An error occurred while calling o31.jdbc.

(py3_spark) [root@100-020-gpuserver controller]# python conn_spark.py
2021-10-09 15:47:10 WARN  Utils:66 - Your hostname, 100-020-gpuserver resolves to a loopback address: 127.0.0.1; using 172.19.100.20 instead (on interface eth0)
2021-10-09 15:47:10 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2021-10-09 15:47:10 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "conn_spark.py", line 30, in <module>
    piggery_data = spark.read.jdbc(url=url, table=sow_piggery_informations_tb, properties=prop)
  File "/data/.virtualenvs/py3_spark/lib/python3.7/site-packages/pyspark/sql/readwriter.py", line 525, in jdbc
    return self._df(self._jreader.jdbc(url, table, jprop))
  File "/data/.virtualenvs/py3_spark/lib/python3.7/site-packages/py4j/java_gateway.py", line 1160, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/data/.virtualenvs/py3_spark/lib/python3.7/site-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/data/.virtualenvs/py3_spark/lib/python3.7/site-packages/py4j/protocol.py", line 320, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o31.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:45)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:79)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$6.apply(JDBCOptions.scala:79)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:79)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:34)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:254)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)

Analysis: java.lang.ClassNotFoundException: org.postgresql.Driver — the PostgreSQL JDBC driver jar is missing from Spark's classpath.

Fix: (1) Download the matching driver jar from the official site, e.g. postgresql-9.4.1212.jar.

(2) Copy it into the jars directory of the pyspark package, e.g. /data/.virtualenvs/py3_spark/lib/python3.7/site-packages/pyspark/jars/.
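Step (2) can be scripted. The helper below is a sketch with hypothetical paths: it copies a downloaded driver jar into pyspark's bundled jars directory, which is on the driver's classpath at startup. An alternative that avoids modifying the install is passing the jar through the spark.jars config (or the --jars flag of spark-submit).

```python
import os
import shutil

def install_jdbc_driver(jar_path, pyspark_home):
    """Copy a JDBC driver jar into <pyspark_home>/jars so Spark's
    DriverRegistry can load org.postgresql.Driver at startup."""
    jars_dir = os.path.join(pyspark_home, "jars")
    os.makedirs(jars_dir, exist_ok=True)
    dest = os.path.join(jars_dir, os.path.basename(jar_path))
    shutil.copy2(jar_path, dest)
    return dest

# Hypothetical invocation, matching the example locations above:
# install_jdbc_driver(
#     "/tmp/postgresql-9.4.1212.jar",
#     "/data/.virtualenvs/py3_spark/lib/python3.7/site-packages/pyspark")
```

The jar must be in place before the SparkSession is created; a session started without it has to be stopped and recreated.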

Source: https://www.mshxw.com/it/307385.html