The workers in my Spark cluster are always unable to connect to the master (the master uses a ZooKeeper cluster for high availability). Why?

1. spark0-2: these three hosts form the ZooKeeper cluster.

2. spark0-4: these five hosts form the Spark cluster.

3. spark0-1: these two hosts provide master high availability (active and standby master).

Running start-all.sh on spark0 starts the Spark cluster: spark0 first launches a master locally, and then the spark2, spark3 and spark4 hosts listed in the slaves file are launched as workers. A sketch of the corresponding HA configuration follows.
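For reference, a minimal sketch of the standalone-HA settings such a layout typically uses (hostnames are the ones from the list above; the port, ZooKeeper directory and file location are the Spark/ZooKeeper defaults and may differ in your installation):

# conf/spark-env.sh on every Spark node
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=spark0:2181,spark1:2181,spark2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"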

(screenshot: clipboard.png)

To walk through the startup sequence:

spark0 runs start-all.sh: it first starts Spark locally on spark0 as the master, using the slaves file and the master configuration.

spark0 then connects over ssh to each host listed in slaves and starts Spark there as a worker.

When spark0 ssh-es into spark2, the worker started on spark2 has to register with the master, but instead of asking the ZooKeeper cluster which host is the active master, it tries to reach the master at localhost, and the registration fails.
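One way to see what master address the worker was actually given (a sketch assuming a default Spark layout; the log directory and the user name embedded in the file name may differ):

# on spark2: the standalone daemon log
tail -n 100 $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out

# the worker's command line shows the master URL(s) it was started with
ps -ef | grep org.apache.spark.deploy.worker.Worker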

What I have checked so far:

1. The ZooKeeper connection parameters configured in spark-env.sh on the Spark nodes (screenshot below).

(screenshot: clipboard.png)
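An additional check that can be run from any host is to confirm that the masters actually registered in ZooKeeper (a sketch assuming the default spark.deploy.zookeeper.dir of /spark; the child znodes the master creates there vary slightly by version):

# connect to one of the ZooKeeper nodes
zkCli.sh -server spark0:2181

# inside the zkCli shell: the standalone masters should have created entries here
ls /spark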

To restate the question:

(1) spark0 acts as the master and spark1 as the standby master.

(2) spark2-4 are configured as worker nodes in the slaves file.

(3) Running start-all.sh on spark0 first starts Spark locally on spark0 as the master, and then starts spark2, spark3 and spark4 from slaves as workers over ssh.

(4) After Spark starts on spark2, the worker has to register with the master. Checking the log on spark2 shows that the worker cannot find the master: the log lists localhost as the master, as the red arrow in the figure above indicates.

(5) spark0 never receives confirmation that the spark2-spark4 worker nodes started successfully, so the worker startup fails.

Question: why can't the three nodes spark2, spark3 and spark4 find the master, and why do they use localhost as the master? Normally, after spark2, spark3 and spark4 start, they should ask the ZooKeeper cluster which host is the active master, because the ZooKeeper connection parameters are configured in the spark-env.sh file on spark2, spark3 and spark4.
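As a sanity check, the ssh/slaves machinery can be taken out of the picture by starting a worker by hand with the full master list (a sketch assuming Spark 2.x standalone scripts and the default master port 7077; in Spark 3.1+ the script is start-worker.sh):

# run on spark2 / spark3 / spark4
$SPARK_HOME/sbin/start-slave.sh spark://spark0:7077,spark1:7077

If the worker registers correctly this way, the problem is in how the master address is resolved on the worker side during start-all.sh, not in the ZooKeeper setup itself.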

Mar. 10, 2021

Isn't there a slaves file? It should contain the worker addresses: whichever machines act as slaves have their hostnames listed in that file.
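For reference, the slaves file (conf/slaves in the Spark installation on the host that runs start-all.sh; renamed conf/workers in newer Spark releases) is just one worker hostname per line. For the layout in the question it would presumably be:

# conf/slaves
spark2
spark3
spark4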


Have you solved the problem yet, friend? I signed up for an account specifically to reply.
I ran into the same problem.
At first I thought the hosts file was misconfigured, then I looked into the ZooKeeper configuration, but got nowhere.
It took me half a day,
and then I found a solution on Stack Overflow:
Configure this in spark-env.sh:
export SPARK_MASTER_HOST=<your master ip>
export SPARK_LOCAL_IP=<this node's own ip>

The reason given is that since Spark 2.0 the parameter SPARK_MASTER_IP is gone and has been replaced by SPARK_MASTER_HOST.
The SPARK_MASTER_IP setting I had been using before worked fine on other servers, but caused this problem in the production environment.
I still have some doubts, but at least the problem is solved.
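Applied to the hostnames in the question, the fix above would look roughly like this (a sketch following the answer; the IPs are placeholders for whatever those hosts actually resolve to, and the ZooKeeper HA options shown earlier still belong in the same file):

# conf/spark-env.sh on a worker node, e.g. spark2
export SPARK_MASTER_HOST=spark0            # address of the master
export SPARK_LOCAL_IP=<spark2's own ip>    # this node's own, routable address

This also lines up with the localhost seen in the worker log: once SPARK_MASTER_IP stopped being read (Spark 2.0+), the start scripts presumably fell back to the local hostname, which with a misconfigured /etc/hosts can resolve to localhost.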
