Error message
22/03/14 10:58:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/03/14 10:58:26 INFO SparkContext: Running Spark version 3.0.0
22/03/14 10:58:26 INFO ResourceUtils: ==============================================================
22/03/14 10:58:26 INFO ResourceUtils: Resources for spark.driver:
22/03/14 10:58:26 INFO ResourceUtils: ==============================================================
22/03/14 10:58:26 INFO SparkContext: Submitted application: Spark01_FindWord.py
22/03/14 10:58:26 INFO SecurityManager: Changing view acls to: root
22/03/14 10:58:26 INFO SecurityManager: Changing modify acls to: root
22/03/14 10:58:26 INFO SecurityManager: Changing view acls groups to:
22/03/14 10:58:26 INFO SecurityManager: Changing modify acls groups to:
22/03/14 10:58:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
22/03/14 10:58:27 INFO Utils: Successfully started service 'sparkDriver' on port 44307.
22/03/14 10:58:27 INFO SparkEnv: Registering MapOutputTracker
22/03/14 10:58:27 INFO SparkEnv: Registering BlockManagerMaster
22/03/14 10:58:27 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/03/14 10:58:27 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/03/14 10:58:27 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/03/14 10:58:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a4554f81-46f5-4499-b6fd-2d5a9005552b
22/03/14 10:58:27 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
22/03/14 10:58:27 INFO SparkEnv: Registering OutputCommitCoordinator
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
22/03/14 10:58:27 INFO Utils: Successfully started service 'SparkUI' on port 4045.
22/03/14 10:58:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://node01:4045
22/03/14 10:58:27 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: ' '
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
22/03/14 10:58:27 INFO SparkUI: Stopped Spark web UI at http://node01:4045
22/03/14 10:58:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/14 10:58:27 INFO MemoryStore: MemoryStore cleared
22/03/14 10:58:27 INFO BlockManager: BlockManager stopped
22/03/14 10:58:28 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/14 10:58:28 WARN MetricsSystem: Stopping a MetricsSystem that is not running
22/03/14 10:58:28 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/14 10:58:28 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/ext/servers/spark3.0/MyCode/Spark01_FindWord.py", line 4, in <module>
    sc = SparkContext(conf)
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 131, in __init__
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 193, in _do_init
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 310, in _initialize_context
  File "/ext/servers/spark3.0/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1569, in __call__
  File "/ext/servers/spark3.0/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Could not parse Master URL: ' '
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
22/03/14 10:58:28 INFO ShutdownHookManager: Shutdown hook called
22/03/14 10:58:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-32987338-486b-448a-b9f4-3cf4d68aa85d
22/03/14 10:58:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-2160f024-2430-4ab1-b5ed-5c3afc402351
(Spark_python) root@node01:/ext/servers/spark3.0/MyCode# spark-submit Spark01_FindWord.py
The fix

Broken code
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("MyApp")
sc = SparkContext(conf)  # bug: conf is passed positionally, so it binds to the first parameter, master
hdfs_file = "hdfs://node01:/user/hadoop/test06.txt"
hdfs_rdd = sc.textFile(hdfs_file).count()
print(hdfs_rdd)
Fixed code
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("MyApp")
sc = SparkContext(conf=conf)  # pass conf by keyword so it binds to the conf parameter
hdfs_file = "hdfs://node01:/user/hadoop/test06.txt"
hdfs_rdd = sc.textFile(hdfs_file).count()
print(hdfs_rdd)
Summary
The root cause is how the argument is passed to the SparkContext constructor. Its first positional parameter is master, not conf, so writing SparkContext(conf) binds the SparkConf object to master; Spark then tries to parse that object as a master URL and fails with "Could not parse Master URL". Passing it by keyword, SparkContext(conf=conf), binds it to the correct parameter and the error goes away.
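The pitfall can be reproduced without Spark at all. Below is a minimal pure-Python sketch in which make_context is a made-up stand-in whose parameter order merely mimics the constructor signature (master first, conf later); it is not the real SparkContext code:

```python
# Stand-in with the same parameter ordering pattern as SparkContext.__init__:
# the first positional slot is `master`, while `conf` comes later.
def make_context(master=None, appName=None, conf=None):
    return {"master": master, "appName": appName, "conf": conf}

conf = {"spark.master": "local"}

wrong = make_context(conf)        # conf lands in `master` - the bug
right = make_context(conf=conf)   # keyword puts it in `conf` - the fix

print(wrong["master"])  # the conf object, which cannot be parsed as a master URL
print(right["conf"])    # the conf object, bound where it belongs
```

The same rule applies to any function with several optional leading parameters: a bare positional argument always fills the first slot, so configuration objects are safest passed by keyword.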