Monday, March 30, 2015

org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Connection refused

I got the following error (in the Hadoop user logs) while trying to run a Mahout MapReduce job on Hadoop (fully distributed mode):

2015-03-25 08:31:52,858 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave01.net/127.0.1.1 to slave01.net:60926 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
    at com.sun.proxy.$Proxy7.getTask(Unknown Source)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:132)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)

I solved this issue by replacing the 127.0.1.1 hostname mapping in /etc/hosts with the node's static IP, as given below:
33.33.33.10      master
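For example, an /etc/hosts with the 127.0.1.1 loopback mapping removed and static IPs for every node might look like this (hostnames and IPs follow the examples in this post; adjust to your own cluster):

```
127.0.0.1       localhost
33.33.33.10     master
33.33.33.11     slave01
33.33.33.12     slave02
```

The 127.0.1.1 entry that Debian/Ubuntu installers add for the local hostname is what makes the task bind to a loopback address other nodes cannot reach.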

Sunday, March 15, 2015

java.io.IOException: Incompatible clusterIDs in /home/user/hadoop/data

I encountered this issue when I later added a new data node to an already created Hadoop cluster.

Problem:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to master/33.33.33.10:9000. Exiting.
java.io.IOException: Incompatible clusterIDs in /home/huser/hadoop/data: namenode clusterID = CID-8019e6e9-73d7-409c-a241-b57e9534e6fe; datanode clusterID = CID-bcc9c537-54dc-4329-bf63-448037976f75
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:646)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:320)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:403)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:422)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1311)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1276)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:828)


Solution:
The issue seems to be due to a clusterID/metadata mismatch between the name node and the newly added data node. I followed the steps given below to solve the issue:
  1. Delete the directories listed under the dfs.datanode.data.dir and dfs.namenode.name.dir configurations in hdfs-site.xml
  2. Delete the tmp/hadoop-hduser directory
  3. Re-format the name node using the following command (note: this wipes all existing HDFS data):
./hdfs namenode -format
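Before deleting anything, you can confirm the mismatch by comparing the clusterID lines in the two VERSION files. The sketch below uses throw-away sample files under /tmp with the IDs from the log above; on a real cluster you would grep the VERSION files under dfs.namenode.name.dir/current and dfs.datanode.data.dir/current instead:

```shell
# Create sample VERSION files reproducing the mismatch from the log above
# (on a real cluster, use <name dir>/current/VERSION and <data dir>/current/VERSION).
mkdir -p /tmp/nn_demo/current /tmp/dn_demo/current
echo 'clusterID=CID-8019e6e9-73d7-409c-a241-b57e9534e6fe' > /tmp/nn_demo/current/VERSION
echo 'clusterID=CID-bcc9c537-54dc-4329-bf63-448037976f75' > /tmp/dn_demo/current/VERSION

# Extract and compare the two clusterIDs
nn_id=$(grep '^clusterID=' /tmp/nn_demo/current/VERSION | cut -d= -f2)
dn_id=$(grep '^clusterID=' /tmp/dn_demo/current/VERSION | cut -d= -f2)
if [ "$nn_id" != "$dn_id" ]; then
  echo "clusterID mismatch: namenode=$nn_id datanode=$dn_id"
fi
```

A lighter-weight alternative sometimes used is to edit the data node's VERSION file so its clusterID matches the name node's, which avoids a full re-format; the steps above are simply what worked here.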




Issue when starting Hadoop cluster


Problem:
have: ssh: Could not resolve hostname have: Name or service not known

warning:: ssh: Could not resolve hostname warning:: Name or service not known

guard: ssh: Could not resolve hostname guard: Name or service not known

VM: ssh: Could not resolve hostname VM: Name or service not known

HotSpot(TM): ssh: Could not resolve hostname HotSpot(TM): Name or service not known

stack: ssh: Could not resolve hostname stack: Name or service not known

(... and dozens of similar lines, one per word ...)

These "hostnames" (have, warning:, guard, VM, HotSpot(TM), stack, and so on) are the individual words of the Java HotSpot native-library warning ("You might have loaded library ... which might have disabled stack guard ..."). The Hadoop start scripts mis-parse that warning on stderr and try to ssh to each word as if it were a slave hostname, so the underlying problem is the native library warning, not name resolution.

Solution:

Add the following to your .bashrc file (HADOOP_INSTALL should point to your Hadoop installation directory):

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
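For reference, here is what those exports expand to, assuming an example install path of /usr/local/hadoop (substitute whatever HADOOP_INSTALL is on your machine):

```shell
# Example expansion of the two exports; /usr/local/hadoop is an assumed path.
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
echo "$HADOOP_COMMON_LIB_NATIVE_DIR"
echo "$HADOOP_OPTS"
# After re-sourcing .bashrc, `hadoop checknative -a` reports whether the
# native library (libhadoop.so) is actually being picked up.
```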




java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.

Issue: One data node in the Hadoop cluster dies after starting

Problem:
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
    at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:866)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1074)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:415)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2268)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2155)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2202)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2378)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2402)
2015-03-16 05:34:17,953 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-03-16 05:34:17,959 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 

Solution:
Applied the following configuration in core-site.xml of the data node:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

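A quick way to sanity-check the property is to pull the value back out of the file; the sketch below does so on an inline copy of the snippet (the /tmp path is just for the demo). On a live node, `hdfs getconf -confKey fs.defaultFS` prints the resolved value the data node will actually use.

```shell
# Write a demo copy of the core-site.xml property and extract its value
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# The value must match the name node's fs.defaultFS (host and port)
value=$(sed -n 's:.*<value>\(.*\)</value>.*:\1:p' /tmp/core-site-demo.xml)
echo "$value"
```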
Thursday, March 12, 2015

How to set up a private network in Vagrant?

Follow the steps given below to set up a private network in Vagrant.
  • Specify the node names (node01, node02, etc.) and their static IPs (any preferred IPs) in the Vagrant configuration file (Vagrantfile) as given below:

Vagrant.configure("2") do |config|
  config.vm.provision "shell", inline: "echo Hello"

  config.vm.define "master" do |master|
    master.vm.box = "hashicorp/precise32"
    master.vm.network :private_network, ip: "33.33.33.10"
  end

  config.vm.define "node01" do |node01|
    node01.vm.box = "hashicorp/precise32"
    node01.vm.network :private_network, ip: "33.33.33.11"
  end

  config.vm.define "node02" do |node02|
    node02.vm.box = "hashicorp/precise32"
    node02.vm.network :private_network, ip: "33.33.33.12"
  end
end

  • Start the Vagrant instances (vagrant init is only needed once, to create the initial Vagrantfile):

vagrant up

  • You can ssh to each instance by its name
Example:
vagrant ssh node01

Tuesday, March 10, 2015

Issues with Vi editor

If the backspace key does not work as expected in vi:
  1. Create a .vimrc file in your home directory
  2. Insert the following content and save:
set nocompatible
set backspace=2

(backspace=2 is the old-style shorthand for backspace=indent,eol,start, which allows backspacing over autoindent, line breaks, and the start of insert.)

If you use
sudo vi filename
the options above won't take effect, because vi then runs as root and reads root's .vimrc instead of yours.

Instead, use
sudoedit filename

How to scp with Vagrant?

The vagrant scp command is provided by the vagrant-scp plugin (install it with: vagrant plugin install vagrant-scp).

From VM to local:

vagrant scp (name of the Vagrant environment):(path in the box) (local path, relative to the directory from which Vagrant is run)

vagrant scp default:/home/vagrant/jayani jayani

From local to VM:
vagrant scp (local path, relative to the directory from which Vagrant is run) (name of the Vagrant environment):(path in the box)

vagrant scp jayani /home/vagrant

If you have multiple Vagrant environments, prefix the remote path with the environment name:
vagrant scp jayani master:/home/vagrant

How to find the name of the Vagrant environment?

execute the following command:
vagrant global-status

id       name    provider   state   directory                                   
---------------------------------------------------------------------------------
894ce03  default virtualbox running /Users/jwithanawasam/
cbb0745  master  virtualbox running /Users/jwithanawasam/
c296e12  slave01 virtualbox running /Users/jwithanawasam/
3c6e261  slave02 virtualbox running /Users/jwithanawasam/
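The environment name is the second column of that listing, so a small script can pull out the machine id for a given name (the sample output from above is inlined here so the snippet is self-contained; in practice you would pipe the real command's output):

```shell
# Sample `vagrant global-status` output (from above); replace this variable
# with `vagrant global-status` itself on a real machine.
status='894ce03  default virtualbox running /Users/jwithanawasam/
cbb0745  master  virtualbox running /Users/jwithanawasam/
c296e12  slave01 virtualbox running /Users/jwithanawasam/
3c6e261  slave02 virtualbox running /Users/jwithanawasam/'

# Print the id whose name column matches "master"
machine_id=$(printf '%s\n' "$status" | awk '$2 == "master" {print $1}')
echo "$machine_id"
```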

VBoxManage: error: Failed to create the host-only adapter (VBoxNetAdpCtl: Error while adding new interface)

After configuring a private network setting as given below and then running vagrant up (or vagrant reload), I got the following error. I have mentioned how I resolved it below.

master.vm.network :private_network, ip: "33.33.33.10"

Error:

There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["hostonlyif", "create"]

Stderr: 0%...
Progress state: NS_ERROR_FAILURE
VBoxManage: error: Failed to create the host-only adapter
VBoxManage: error: VBoxNetAdpCtl: Error while adding new interface: failed to open /dev/vboxnetctl: No such file or directory

VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component HostNetworkInterface, interface IHostNetworkInterface
VBoxManage: error: Context: "int handleCreate(HandlerArg*, int, int*)" at line 68 of file VBoxManageHostonly.cpp

Solution:

  1. First, power off all the VMs running in VirtualBox
  2. Then run the following command (for Mac):
sudo /Library/StartupItems/VirtualBox/VirtualBox restart
  3. Then start the required VMs