Skip to content
This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Isuue of CUDA_ERROR_OUT_OF_MEMORY while running the toy example #270

Open
Sandhya2207 opened this issue Jul 5, 2017 · 4 comments
Open

Comments

@Sandhya2207
Copy link

Hi,
while testing the seq2seq model on toy data set, I am getting the following error--

E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/development/sandhya/installs/tf-seq2seq-google/seq2seq-master/bin/train.py", line 277, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/development/sandhya/installs/tf-seq2seq-google/seq2seq-master/bin/train.py", line 272, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "seq2seq/contrib/experiment.py", line 104, in continuous_train_and_eval
monitors=self._train_monitors)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 981, in _train_model
config=self.config.tf_config) as mon_sess:
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 315, in MonitoredTrainingSession
return MonitoredSession(session_creator=session_creator, hooks=all_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 601, in init
session_creator, hooks, should_recover=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 434, in init
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 767, in init
_WrappedSession.init(self, self._create_session())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 772, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 494, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 375, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 256, in prepare_session
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 161, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1176, in init
super(Session, self).init(target, graph, config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 552, in init
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in exit
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Have tried the earlier suggested fix of resetting the flag " gpu_allow_growth" to True. Kindly suggest.

Thanks

@yanghoonkim
Copy link

Which gpu are you using?
You just run the same code provided by tutorial?

@Sandhya2207
Copy link
Author

The gpu is GTX-Titan .

@chiphuyen
Copy link

I have similar problem when I have multiple processes running and they hog up memory. Killing other processes fixes the problem.

@joshisagar92
Copy link

Thanks @chiphuyen for intuition.
I encountered same error with tf estimator and fixed it by closing other jupyter notebook instances running on same device.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants