Monday, March 30, 2015

org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Connection refused

I got the following error (in the Hadoop user logs) while trying to run a Mahout MapReduce job on Hadoop (fully distributed mode):

2015-03-25 08:31:52,858 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave01.net/127.0.1.1 to slave01.net:60926 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
    at com.sun.proxy.$Proxy7.getTask(Unknown Source)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)

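The "Connection refused" at the bottom of the trace is the key symptom: the YARN child resolved slave01.net to the loopback address 127.0.1.1, where no service was listening on port 60926. The same failure mode can be reproduced in miniature, independent of Hadoop, with a small Python sketch: connecting to a loopback port that has no listener is refused immediately.

```python
import socket

# Find a port that is currently free on loopback by binding
# without listening, then releasing it.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))
free_port = probe.getsockname()[1]
probe.close()

# Connecting to loopback where nothing listens fails the same way
# the YARN child does when the hostname resolves to 127.0.1.1.
try:
    socket.create_connection(("127.0.0.1", free_port), timeout=2)
    outcome = "connected"
except ConnectionRefusedError:
    outcome = "connection refused"

print(outcome)  # connection refused
```

In the cluster, the task process on one node was effectively doing the same thing: because of the hosts-file entry, it dialed its own loopback interface instead of the address the application master was actually bound to.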
I solved this issue by replacing the 127.0.1.1 loopback mapping for the host name with its permanent IP address in /etc/hosts, as given below:
33.33.33.10      master
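To spot this kind of bad mapping quickly, one option is a small script (a hypothetical sketch, not part of Hadoop) that scans /etc/hosts text for non-localhost hostnames mapped to a loopback address; those are the entries to replace with the node's permanent IP:

```python
def find_loopback_mappings(hosts_text):
    """Return (ip, hostname) pairs where a name other than
    'localhost' is mapped to a loopback address (127.x.x.x)."""
    bad = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split()
        ip, names = parts[0], parts[1:]
        for name in names:
            if ip.startswith("127.") and name != "localhost":
                bad.append((ip, name))
    return bad

# The problematic entry Vagrant generates, plus a correct one:
hosts = """\
127.0.0.1 localhost
127.0.1.1 master.net master
33.33.33.10 slave01.net
"""
print(find_loopback_mappings(hosts))
# [('127.0.1.1', 'master.net'), ('127.0.1.1', 'master')]
```

Running this on each node before starting the cluster would flag the 127.0.1.1 entry that causes the connection failures above.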

2 comments:

  1. Thanks for the blog. At last, someone has faced this issue and solved it!
    I am facing the same problem. Sometimes the whole job executes successfully, and sometimes only the maps running on the AM complete successfully, while the rest of the maps fail with the above errors.
    I have tried many things and concluded that the failed maps try to connect to the AM on the local machine, using localhost as the machine name.

    I have some confusion about the solution you have given; can you explain it in detail?
    I have the following hosts file on each slave, and each slave's hostname appears in the mapping:
    127.0.0.1 localhost
    192.168.xxx.xx1 slave1
    192.168.xxx.xx2 slave2
    ...
    192.168.xxx.xxN slaveN

    Is the above mapping correct, or is anything else needed?

    Thanks,
    Jagdish

  2. Hi Jagdish,

    I encountered this issue some time back. As I remember, I set up the Hadoop cluster in a Vagrant environment. There, in the Vagrantfile configuration, I set the host name using the statement below.

    master.vm.hostname = "master.net"

    Then, during Vagrant startup, it adds the following entry to the /etc/hosts file:
    127.0.1.1 master.net master

    I replaced that with the following entry:
    33.33.33.10 master.net master

    Also, in the master node's hosts file, I added the following entry as well:
    33.33.33.11 slave01.net

    This appears to be a communication failure between nodes in the Hadoop cluster. You can check the logs for each failed application to see which tasks are failing and on which nodes.

    Did you add the host entry for the master node in the slaves' hosts files?
