
Do I need to install hdfs to run my Spark jobs #250

yogeshnath opened this issue Feb 5, 2016 · 5 comments

Comments

@yogeshnath

I installed Mesosphere in HA mode and installed Spark using dcos. I ran my job but got the following error. Do I need to install HDFS as well?

"Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos"

Mesos-DNS would not resolve this hostname, though I could resolve master.mesos and slave.mesos.
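
As a rough check of the DNS side (just a sketch, assuming the stock Mesos-DNS setup, where the DC/OS HDFS package registers the namenode1.hdfs.mesos record once its namenodes are running), something like this from a cluster node shows which names resolve:

# master.mesos resolves via Mesos-DNS even without HDFS installed;
# namenode1.hdfs.mesos should only start resolving after the HDFS framework
# is installed (e.g. dcos package install hdfs) and its namenode tasks are up.
nslookup master.mesos
nslookup namenode1.hdfs.mesos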


spark.driver.extraJavaOptions=-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.driver.memory=1024M
spark.executor.extraJavaOptions=-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.jars=file:/mnt/mesos/sandbox/tmo.d3.automation-6.jar
spark.logConf=true
spark.master=mesos://zk://master.mesos:2181/mesos
spark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.submit.deployMode=client
16/02/05 23:40:13 INFO SecurityManager: Changing view acls to: root
16/02/05 23:40:13 INFO SecurityManager: Changing modify acls to: root
16/02/05 23:40:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/02/05 23:40:13 INFO Utils: Successfully started service 'sparkDriver' on port 39550.
16/02/05 23:40:13 INFO Slf4jLogger: Slf4jLogger started
16/02/05 23:40:13 INFO Remoting: Starting remoting
16/02/05 23:40:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.1:44087]
16/02/05 23:40:14 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44087.
16/02/05 23:40:14 INFO SparkEnv: Registering MapOutputTracker
16/02/05 23:40:14 INFO SparkEnv: Registering BlockManagerMaster
16/02/05 23:40:14 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4ca3392b-4069-4f9a-872b-eeb1b115dd9a
16/02/05 23:40:14 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/02/05 23:40:14 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/05 23:40:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/05 23:40:14 INFO SparkUI: Started SparkUI at http://10.0.2.1:4040
16/02/05 23:40:14 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97/httpd-cd2b0d83-fa6f-4680-9160-1c318bb2f68a
16/02/05 23:40:14 INFO HttpServer: Starting HTTP Server
16/02/05 23:40:14 INFO Utils: Successfully started service 'HTTP file server' on port 39342.
16/02/05 23:40:14 INFO SparkContext: Added JAR file:/mnt/mesos/sandbox/tmo.d3.automation-6.jar at http://10.0.2.1:39342/jars/tmo.d3.automation-6.jar with timestamp 1454715614551
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-0-2-1.us-west-1.compute.internal
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@724: Client environment:os.arch=4.2.2-coreos-r1
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Tue Dec 1 01:59:59 UTC 2015
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@753: Client environment:user.dir=/opt/spark/dist
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=10000 watcher=0x7f569b4ad600 sessionId=0 sessionPasswd= context=0x7f56c80012c0 flags=0
I0205 23:40:14.720655 95 sched.cpp:164] Version: 0.25.0
2016-02-05 23:40:14,728:6(0x7f56922b9700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.4.122:2181]
2016-02-05 23:40:14,734:6(0x7f56922b9700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.4.122:2181], sessionId=0x152b2fcc05e0007, negotiated timeout=10000
I0205 23:40:14.735280 89 group.cpp:331] Group process (group(1)@10.0.2.1:43793) connected to ZooKeeper
I0205 23:40:14.735371 89 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0205 23:40:14.735466 89 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0205 23:40:14.744324 89 detector.cpp:156] Detected a new leader: (id='1')
I0205 23:40:14.744611 89 group.cpp:674] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I0205 23:40:14.748836 89 detector.cpp:481] A new leading master ([email protected]:5050) is detected
I0205 23:40:14.749011 89 sched.cpp:262] New master detected at [email protected]:5050
I0205 23:40:14.749320 89 sched.cpp:272] No credentials provided. Attempting to register without authentication
I0205 23:40:14.756449 89 sched.cpp:641] Framework registered with 20385f5d-b460-4335-913f-aa02816b7963-0003
16/02/05 23:40:14 INFO CoarseMesosSchedulerBackend: Registered as framework ID 20385f5d-b460-4335-913f-aa02816b7963-0003
16/02/05 23:40:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44576.
16/02/05 23:40:14 INFO NettyBlockTransferService: Server created on 44576
16/02/05 23:40:14 INFO BlockManagerMaster: Trying to register BlockManager
16/02/05 23:40:14 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.1:44576 with 511.1 MB RAM, BlockManagerId(driver, 10.0.2.1, 44576)
16/02/05 23:40:14 INFO BlockManagerMaster: Registered BlockManager
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_RUNNING
16/02/05 23:40:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 117.2 KB, free 117.2 KB)
16/02/05 23:40:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 129.8 KB)
16/02/05 23:40:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.1:44576 (size: 12.6 KB, free: 511.1 MB)
16/02/05 23:40:17 INFO SparkContext: Created broadcast 0 from textFile at FenixRowCountstest.scala:51
16/02/05 23:40:17 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:17 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:18 INFO SparkContext: Starting job: foreach at FenixRowCountstest.scala:126
16/02/05 23:40:18 INFO DAGScheduler: Got job 0 (foreach at FenixRowCountstest.scala:126) with 2 output partitions
16/02/05 23:40:18 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at FenixRowCountstest.scala:126)
16/02/05 23:40:18 INFO DAGScheduler: Parents of final stage: List()
16/02/05 23:40:18 INFO DAGScheduler: Missing parents: List()
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-2-79.us-west-1.compute.internal:39924) with ID 20385f5d-b460-4335-913f-aa02816b7963-S1
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-3-40.us-west-1.compute.internal:60136) with ID 20385f5d-b460-4335-913f-aa02816b7963-S3
16/02/05 23:40:18 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[2] at parallelize at FenixRowCountstest.scala:125), which has no missing parents
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-2-79.us-west-1.compute.internal:44431 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S1, ip-10-0-2-79.us-west-1.compute.internal, 44431)
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-2-163.us-west-1.compute.internal:51562) with ID 20385f5d-b460-4335-913f-aa02816b7963-S5
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-3-40.us-west-1.compute.internal:45844 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S3, ip-10-0-3-40.us-west-1.compute.internal, 45844)
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-0-164.us-west-1.compute.internal:55604) with ID 20385f5d-b460-4335-913f-aa02816b7963-S4
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-2-163.us-west-1.compute.internal:45068 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S5, ip-10-0-2-163.us-west-1.compute.internal, 45068)
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-164.us-west-1.compute.internal:42653 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S4, ip-10-0-0-164.us-west-1.compute.internal, 42653)
16/02/05 23:40:18 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1160.0 B, free 130.9 KB)
16/02/05 23:40:18 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 824.0 B, free 131.7 KB)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.2.1:44576 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:18 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/02/05 23:40:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[2] at parallelize at FenixRowCountstest.scala:125)
16/02/05 23:40:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/02/05 23:40:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-0-0-164.us-west-1.compute.internal, partition 0,PROCESS_LOCAL, 2283 bytes)
16/02/05 23:40:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-10-0-2-79.us-west-1.compute.internal, partition 1,PROCESS_LOCAL, 2389 bytes)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-0-164.us-west-1.compute.internal:42653 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-2-79.us-west-1.compute.internal:44431 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 826 ms on ip-10-0-0-164.us-west-1.compute.internal (1/2)
16/02/05 23:40:19 INFO DAGScheduler: ResultStage 0 (foreach at FenixRowCountstest.scala:126) finished in 0.843 s
16/02/05 23:40:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 765 ms on ip-10-0-2-79.us-west-1.compute.internal (2/2)
16/02/05 23:40:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/05 23:40:19 INFO DAGScheduler: Job 0 finished: foreach at FenixRowCountstest.scala:126, took 1.166327 s
16/02/05 23:40:19 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:19 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:124)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:74)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:65)
at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:152)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
at org.apache.hadoop.mapred.FileOutputFormat.setOutputPath(FileOutputFormat.java:146)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1058)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1443)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1422)
at FenixRowCountstest$.writeLogger(FenixRowCountstest.scala:127)
at FenixRowCountstest$.main(FenixRowCountstest.scala:72)
at FenixRowCountstest.main(FenixRowCountstest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: namenode1.hdfs.mesos
... 51 more
16/02/05 23:40:19 INFO SparkContext: Invoking stop() from shutdown hook
16/02/05 23:40:19 INFO SparkUI: Stopped Spark web UI at http://10.0.2.1:4040
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: Shutting down all executors
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
I0205 23:40:19.559413 108 sched.cpp:1771] Asked to stop the driver
I0205 23:40:19.559767 93 sched.cpp:1040] Stopping framework '20385f5d-b460-4335-913f-aa02816b7963-0003'
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
16/02/05 23:40:19 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/02/05 23:40:19 INFO MemoryStore: MemoryStore cleared
16/02/05 23:40:19 INFO BlockManager: BlockManager stopped
16/02/05 23:40:19 INFO BlockManagerMaster: BlockManagerMaster stopped
16/02/05 23:40:19 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/02/05 23:40:19 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/02/05 23:40:19 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/02/05 23:40:19 INFO SparkContext: Successfully stopped SparkContext
16/02/05 23:40:19 INFO ShutdownHookManager: Shutdown hook called
16/02/05 23:40:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97/httpd-cd2b0d83-fa6f-4680-9160-1c318bb2f68a
16/02/05 23:40:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97

@radek1st

I have a similar issue on a fresh install of single-master DCOS and Spark:

...
16/05/16 17:52:37 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/05/16 17:52:37 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:124)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:74)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:65)
at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
...

@jstokes

jstokes commented May 25, 2016

Just wanted to add: I get something similar when submitting tasks through either Zeppelin or the dcos spark run ... CLI.

java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:427)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdfs
    ... 44 more

When running through Zeppelin, this causes the first task submitted to fail, but subsequent tasks complete successfully (although the error is still logged). Replicated on a clean cluster built from AWS CloudFormation templates.

@thbeh

thbeh commented Jul 21, 2016

@jstokes did you manage to solve your issue with the dcos spark run CLI?

@ktaube

ktaube commented Aug 2, 2016

I managed to solve this by adding the spark.mesos.uris property to --submit-args, with links to my own hdfs-site.xml and core-site.xml.

At least the mesosphere/spark:1.0.1-1.6.1-2 Docker image contains a spark-env.sh that copies those config files from MESOS_SANDBOX to HADOOP_CONF_DIR:

./conf/spark-env.sh:11:[ -f "${MESOS_SANDBOX}/hdfs-site.xml" ] && cp "${MESOS_SANDBOX}/hdfs-site.xml" "${HADOOP_CONF_DIR}"
./conf/spark-env.sh:12:[ -f "${MESOS_SANDBOX}/core-site.xml" ] && cp "${MESOS_SANDBOX}/core-site.xml" "${HADOOP_CONF_DIR}"
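
For reference, a rough sketch of what that submission could look like (the config URLs, class name, and jar URL here are placeholders, not the actual ones from this thread):

# hedged example: host the two site files somewhere reachable from the cluster,
# then pass them via spark.mesos.uris so Mesos fetches them into the sandbox
dcos spark run --submit-args="--conf spark.mesos.uris=http://example.com/conf/hdfs-site.xml,http://example.com/conf/core-site.xml --class MySparkJob http://example.com/jars/my-spark-job.jar"

With the files fetched into the sandbox, spark-env.sh copies them into HADOOP_CONF_DIR, so the Hadoop client picks up the namenode configuration from your own site files.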

@jstokes

jstokes commented Aug 8, 2016

@thbeh No, I was away from DCOS for a little while. I stood up a new cluster but was immediately hit with the same exceptions as in d2iq-archive/universe#613.

Is there a workaround for this?
