The workers in my Spark cluster are always unable to connect to the master (the master uses a ZooKeeper cluster for high availability). Why?

1. spark0-2: these three hosts form the ZooKeeper cluster.

2. spark0-4: these five hosts form the Spark cluster.

3. spark0-1: these two hosts provide master high availability (active and standby master).

Running start-all.sh on spark0 starts the Spark cluster: spark0 first launches a master locally, and then the spark2, spark3 and spark4 hosts listed in the slaves file are launched as workers. A sketch of the corresponding HA configuration follows.
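For reference, a minimal sketch of the standalone-HA settings such a layout typically uses (hostnames are the ones from the list above; the port, ZooKeeper directory and file location are the Spark/ZooKeeper defaults and may differ in your installation):

# conf/spark-env.sh on every Spark node
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=spark0:2181,spark1:2181,spark2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"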

(screenshot: clipboard.png)

To walk through the startup sequence:

spark0 runs start-all.sh: it first starts Spark locally on spark0 as the master, using the slaves file and the master configuration.

spark0 then connects over ssh to each host listed in slaves and starts Spark there as a worker.

When spark0 ssh-es into spark2, the worker started on spark2 has to register with the master, but instead of asking the ZooKeeper cluster which host is the active master, it tries to reach the master at localhost, and the registration fails.
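One way to see what master address the worker was actually given (a sketch assuming a default Spark layout; the log directory and the user name embedded in the file name may differ):

# on spark2: the standalone daemon log
tail -n 100 $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out

# the worker's command line shows the master URL(s) it was started with
ps -ef | grep org.apache.spark.deploy.worker.Worker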

What I have checked so far:

1. The ZooKeeper connection parameters configured in spark-env.sh on the Spark nodes (screenshot below).

(screenshot: clipboard.png)
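An additional check that can be run from any host is to confirm that the masters actually registered in ZooKeeper (a sketch assuming the default spark.deploy.zookeeper.dir of /spark; the child znodes the master creates there vary slightly by version):

# connect to one of the ZooKeeper nodes
zkCli.sh -server spark0:2181

# inside the zkCli shell: the standalone masters should have created entries here
ls /spark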

To restate the question:

(1) spark0 acts as the master and spark1 as the standby master.

(2) spark2-4 are configured as worker nodes in the slaves file.

(3) Running start-all.sh on spark0 first starts Spark locally on spark0 as the master, and then starts spark2, spark3 and spark4 from slaves as workers over ssh.

(4) After Spark starts on spark2, the worker has to register with the master. Checking the log on spark2 shows that the worker cannot find the master: the log lists localhost as the master, as the red arrow in the figure above indicates.

(5) spark0 never receives confirmation that the spark2-spark4 worker nodes started successfully, so the worker startup fails.

Question: why can't the three nodes spark2, spark3 and spark4 find the master, and why do they use localhost as the master? Normally, after spark2, spark3 and spark4 start, they should ask the ZooKeeper cluster which host is the active master, because the ZooKeeper connection parameters are configured in the spark-env.sh file on spark2, spark3 and spark4.
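As a sanity check, the ssh/slaves machinery can be taken out of the picture by starting a worker by hand with the full master list (a sketch assuming Spark 2.x standalone scripts and the default master port 7077; in Spark 3.1+ the script is start-worker.sh):

# run on spark2 / spark3 / spark4
$SPARK_HOME/sbin/start-slave.sh spark://spark0:7077,spark1:7077

If the worker registers correctly this way, the problem is in how the master address is resolved on the worker side during start-all.sh, not in the ZooKeeper setup itself.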

Mar. 10, 2021

Isn't there a slaves file? It should contain the worker addresses: whichever machines act as slaves have their hostnames listed in that file.
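For reference, the slaves file (conf/slaves in the Spark installation on the host that runs start-all.sh; renamed conf/workers in newer Spark releases) is just one worker hostname per line. For the layout in the question it would presumably be:

# conf/slaves
spark2
spark3
spark4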


Have you solved the problem yet, friend? I signed up for an account specifically to reply.
I ran into the same problem.
At first I thought the hosts file was misconfigured, then I looked into the ZooKeeper configuration, but got nowhere.
It took me half a day,
and then I found a solution on Stack Overflow:
Configure this in spark-env.sh:
export SPARK_MASTER_HOST=<your master ip>
export SPARK_LOCAL_IP=<this node's own ip>

The reason given is that since Spark 2.0 the parameter SPARK_MASTER_IP is gone and has been replaced by SPARK_MASTER_HOST.
The SPARK_MASTER_IP setting I had been using before worked fine on other servers, but caused this problem in the production environment.
I still have some doubts, but at least the problem is solved.
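Applied to the hostnames in the question, the fix above would look roughly like this (a sketch following the answer; the IPs are placeholders for whatever those hosts actually resolve to, and the ZooKeeper HA options shown earlier still belong in the same file):

# conf/spark-env.sh on a worker node, e.g. spark2
export SPARK_MASTER_HOST=spark0            # address of the master
export SPARK_LOCAL_IP=<spark2's own ip>    # this node's own, routable address

This also lines up with the localhost seen in the worker log: once SPARK_MASTER_IP stopped being read (Spark 2.0+), the start scripts presumably fell back to the local hostname, which with a misconfigured /etc/hosts can resolve to localhost.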
