I got the following error (in the Hadoop user logs) while trying to run a Mahout MapReduce job on Hadoop (fully distributed mode):
2015-03-25 08:31:52,858 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave01.net/127.0.1.1 to slave01.net:60926 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
at com.sun.proxy.$Proxy7.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
I could solve this issue by replacing the 127.0.1.1 host name mapping with the node's permanent IP, as given below:
33.33.33.10 master
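To illustrate why this mapping matters, here is a small sketch (the file name and hostnames are illustrative, taken from the error above): a node whose hostname maps to 127.0.1.1 advertises an unreachable loopback address to the rest of the cluster, which is what produces the "Connection refused" call from slave01.net to slave01.net:60926.

```shell
# Reproduce the problematic mapping in a scratch hosts file.
cat > demo_hosts <<'EOF'
127.0.0.1 localhost
127.0.1.1 slave01.net slave01
EOF

# Detect the loopback mapping for the node's hostname; the fix is to
# replace it with the node's permanent IP, e.g.:
#   33.33.33.11 slave01.net slave01
if grep -qE '^127\.0\.1\.1[[:space:]]' demo_hosts; then
  echo "loopback mapping found for the node hostname; replace it with the real IP"
fi
```

The same `grep` against the real `/etc/hosts` on each node is a quick way to confirm whether this is the cause.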
Thanks for the blog. At least someone has faced this issue and solved it!!
I am facing the same problem. Sometimes the whole job executes successfully, and sometimes only the maps running on the AM complete successfully while the rest of the maps fail with the above errors.
I have tried many things and concluded that the failed maps try to connect to the AM on the local machine, using localhost as the machine name.
I have some confusion about the solution you have given; can you explain it in detail?
I have the following hosts file on each slave. Each slave has a hostname as shown in the mapping:
127.0.0.1 localhost
192.168.xxx.xx1 slave1
192.168.xxx.xx2 slave2
...
192.168.xxx.xxN slaveN
Is the above mapping correct, or is anything else needed?
Thanks,
Jagdish
Hi Jagdish,
I encountered this issue some time back. As I remember, I set up the Hadoop cluster in a Vagrant environment. There, in the Vagrantfile configuration, I gave the host name using the statement below.
master.vm.hostname = "master.net"
Then, during Vagrant startup, it adds the following entry to the /etc/hosts file:
127.0.1.1 master.net master
I replaced that with the following entry:
33.33.33.10 master.net master
Also, in the master node's hosts file, I added the following entry:
33.33.33.11 slave01.net
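For reference, the relevant part of the Vagrantfile would look roughly like this (a sketch only; the IPs match the entries above, but the machine names and network settings are assumptions about my setup, so adjust them to yours):

```ruby
Vagrant.configure("2") do |config|
  # Master node: hostname as in the original post, with a fixed private IP
  # so /etc/hosts entries can point at a stable, reachable address.
  config.vm.define "master" do |master|
    master.vm.hostname = "master.net"
    master.vm.network "private_network", ip: "33.33.33.10"
  end

  # Slave node with its own fixed private IP.
  config.vm.define "slave01" do |slave|
    slave.vm.hostname = "slave01.net"
    slave.vm.network "private_network", ip: "33.33.33.11"
  end
end
```

With fixed private IPs like these, each node's hostname can be mapped to a routable address in /etc/hosts instead of 127.0.1.1.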
This seems to be a communication failure between nodes in the Hadoop cluster. You can check the logs for each failed application to see which tasks are failing and on which nodes.
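To check those logs from the command line, something like the following should work on a YARN cluster (assuming log aggregation is enabled; `<application_id>` is a placeholder you take from the list output or the ResourceManager UI):

```shell
# List failed applications to find the ID of the one to inspect.
yarn application -list -appStates FAILED

# Fetch the aggregated container logs for that application.
yarn logs -applicationId <application_id>
```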
Did you add the host entry for the master node on the slaves?