This repository has been archived by the owner on May 21, 2018. It is now read-only.

Framework startup issues #254

Open
hansman opened this issue Apr 13, 2016 · 0 comments

We have been trying to bring up the Mesos HDFS framework on an 8-machine Mesos cluster (4 CPUs and 8 GB RAM per machine). We hit several startup problems that are hard to debug.

Our config
config/hdfs-site.xml and config/mesos-site.xml are left at the repository defaults. We override some of the values through the following environment variables:

export JAVA_HOME=/usr/lib/jvm/{ourJDK}/jre
export MESOS_HDFS_STATE_ZK=app-zk1-groot.service.local:2182,app-zk2-groot.service.local:2182,app-zk3-groot.service.local:2182
export MESOS_MASTER_URI=zookeeper.service.local:2181/mesos
export MESOS_HDFS_ZKFC_HA_ZOOKEEPER_QUORUM=app-zk1-groot.service.local:2182,app-zk2-groot.service.local:2182,app-zk3-groot.service.local:2182
export MESOS_HDFS_JVM_OVERHEAD=0.4
export MESOS_HDFS_NAMENODE_HEAP_SIZE=512
export MESOS_HDFS_EXECUTOR_CPUS=0.7
export MESOS_HDFS_NAMENODE_CPUS=0.7
export MESOS_HDFS_JOURNALNODE_CPUS=0.7
export MESOS_HDFS_DATANODE_CPUS=0.7
export MESOS_HDFS_JVM_OVERHEAD=0.4
export MESOS_HDFS_HADOOP_HEAP_SIZE=256
export MESOS_HDFS_EXECUTOR_HEAP_SIZE=256
export MESOS_HDFS_DATANODE_HEAP_SIZE=256

Mesos-DNS is not enabled.

How we launch

Not as a Marathon task, and not dockerized. We simply run:
sh bin/hdfs-mesos

Blocking problems

  1. Could not download hdfs-mesos-executor-0.1.6.tgz
    Journalnodes were being launched, but they pointed to localhost as the 'config server'.
    We had to set mesos.hdfs.framework.hostaddress explicitly to the address of the framework scheduler.

  2. No datanode or zkfc tasks launched successfully

When the framework tries to launch a datanode or zkfc task, the following error appears:

FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. Parent znode does not exist.
Run with -formatZK flag to initialize ZooKeeper
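
The error text itself names a manual workaround: initializing the failover controller's parent znode. A minimal sketch of that step, assuming the stock Hadoop `hdfs` CLI is on the PATH of a namenode host and reads the same hdfs-site.xml (including the ZooKeeper quorum) that the framework uses; neither assumption has been verified for this setup. `zkfc_format_cmd` and `DRY_RUN` are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Build the command that initializes the ZKFC parent znode.
# "hdfs zkfc -formatZK" is the stock Hadoop command the error message refers to.
zkfc_format_cmd() {
    echo "hdfs zkfc -formatZK"
}

# DRY_RUN is a hypothetical switch; it defaults to 1 so the command is only printed.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $(zkfc_format_cmd)"
else
    # Assumption: run on a namenode host with the framework's Hadoop config in place.
    $(zkfc_format_cmd)
fi
```

Whether the framework is supposed to perform this formatting itself before launching zkfc is exactly the open question here, so this is only a way to check if the tasks start once the znode exists.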

  3. In a separate environment (same parameters), the framework does not start up because of a socket error:

016-04-13 01:46:43,426:8395(0x7f60c6ff5700):ZOO_ERROR@handle_socket_error_msg@1721: Socket [192.168.0.104:2182] zk retcode=-4, errno=112(Host is down): failed while receiving a server response
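
Since the socket error reports the ZooKeeper host as down, one way to narrow it down is to probe every quorum member from the machine running the scheduler. A rough sketch, assuming `nc` (netcat) is installed and that the ZooKeeper servers answer the `ruok` four-letter command (newer ZooKeeper releases may require it to be whitelisted). `split_quorum`, `check_zk` and `check_all` are hypothetical helper names, not part of the framework:

```shell
#!/bin/sh
# Split a comma-separated ZooKeeper quorum string into "host port" lines.
split_quorum() {
    echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        if [ -n "$host" ]; then
            echo "$host $port"
        fi
    done
}

# Send ZooKeeper's "ruok" command; a healthy server answers "imok".
# Assumes nc is available; -w 2 sets a 2-second timeout.
check_zk() {
    reply=$(echo ruok | nc -w 2 "$1" "$2" 2>/dev/null)
    if [ "$reply" = "imok" ]; then
        echo "$1:$2 ok"
    else
        echo "$1:$2 unreachable or unhealthy"
    fi
}

# Probe every member of the quorum string passed as $1.
check_all() {
    split_quorum "$1" | while read -r host port; do
        check_zk "$host" "$port"
    done
}

# Only probe when the quorum variable from the config above is set.
if [ -n "${MESOS_HDFS_STATE_ZK:-}" ]; then
    check_all "$MESOS_HDFS_STATE_ZK"
fi
```

Running it with the same `MESOS_HDFS_STATE_ZK` export as in the config section prints one status line per ZooKeeper server, which should show whether 192.168.0.104:2182 is reachable from that host at all.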
