
What's Wrong? #103

Closed
1 task
quasiben opened this issue Aug 5, 2019 · 16 comments


quasiben commented Aug 5, 2019

https://tutorial.dask.org/ is now live.

Let's use this issue to list what needs to be fixed/changed before announcing.

cc @mrocklin @TomAugspurger @martindurant


mrocklin commented Aug 5, 2019

The Binder link in the individual notebook pages points to dask-examples.

See, for example, the Binder button at the top of this page: https://tutorial.dask.org/07_dataframe_storage.html


mrocklin commented Aug 5, 2019

> https://tutorial.dask.org/ is now live

Also, hooray! This is great to see.


quasiben commented Aug 5, 2019

Fixed in #104

> The binder link in the individual notebook pages points to dask-examples.
>
> See for example the Binder button at the top of this page. https://tutorial.dask.org/07_dataframe_storage.html

@jrbourbeau

xref #105


mrocklin commented Aug 5, 2019 via email


mrocklin commented Aug 5, 2019

@martindurant

import dask.bag as db

b = db.read_text('s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv')
b.take(1)

Traceback

---------------------------------------------------------------------------
NoCredentialsError                        Traceback (most recent call last)
<ipython-input-6-f250b1423dc0> in <module>
      2 # each partition is a remote CSV text file
      3 b = db.read_text('s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv')
----> 4 b.take(1)

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/bag/core.py in take(self, k, npartitions, compute, warn)
   1222 
   1223         if compute:
-> 1224             return tuple(b.compute())
   1225         else:
   1226             return b

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/base.py in compute(self, **kwargs)
    154         dask.base.compute
    155         """
--> 156         (result,) = compute(self, traverse=False, **kwargs)
    157         return result
    158 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/base.py in compute(*args, **kwargs)
    396     keys = [x.__dask_keys__() for x in collections]
    397     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 398     results = schedule(dsk, keys, **kwargs)
    399     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    400 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/multiprocessing.py in get(dsk, keys, num_workers, func_loads, func_dumps, optimize_graph, pool, **kwargs)
    190                            get_id=_process_get_id, dumps=dumps, loads=loads,
    191                            pack_exception=pack_exception,
--> 192                            raise_exception=reraise, **kwargs)
    193     finally:
    194         if cleanup:

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    460                         _execute_task(task, data)  # Re-execute locally
    461                     else:
--> 462                         raise_exception(exc, tb)
    463                 res, worker_id = loads(res_info)
    464                 state['cache'][key] = res

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/compatibility.py in reraise(exc, tb)
    109     def reraise(exc, tb=None):
    110         if exc.__traceback__ is not tb:
--> 111             raise exc.with_traceback(tb)
    112         raise exc
    113 

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/local.py in execute_task()
    228     try:
    229         task, data = loads(task_info)
--> 230         result = _execute_task(task, data)
    231         id = get_id()
    232         result = dumps((result, id))

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/core.py in _execute_task()
    117         func, args = arg[0], arg[1:]
    118         args2 = [_execute_task(a, cache) for a in args]
--> 119         return func(*args2)
    120     elif not ishashable(arg):
    121         return arg

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/bag/core.py in safe_take()
   2159 
   2160 def safe_take(n, b, warn=True):
-> 2161     r = list(take(n, b))
   2162     if len(r) != n and warn:
   2163         warnings.warn("Insufficient elements for `take`. {0} elements "

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/bag/text.py in file_to_blocks()
    102 
    103 def file_to_blocks(lazy_file):
--> 104     with lazy_file as f:
    105         for line in f:
    106             yield line

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask/bytes/core.py in __enter__()
    181         mode = self.mode.replace('t', '').replace('b', '') + 'b'
    182 
--> 183         f = SeekableFile(self.fs.open(self.path, mode=mode))
    184 
    185         fobjects = [f]

/srv/conda/envs/notebook/lib/python3.7/site-packages/s3fs/core.py in open()
    348         fdesc = S3File(self, path, mode2, block_size=block_size, acl=acl,
    349                        version_id=version_id, fill_cache=fill_cache,
--> 350                        s3_additional_kwargs=kw)
    351         if 'b' in mode:
    352             return fdesc

/srv/conda/envs/notebook/lib/python3.7/site-packages/s3fs/core.py in __init__()
   1204         else:
   1205             try:
-> 1206                 info = self.info()
   1207                 self.size = info['Size']
   1208                 if self.s3.version_aware:

/srv/conda/envs/notebook/lib/python3.7/site-packages/s3fs/core.py in info()
   1222         refresh = self.s3.version_aware
   1223         return self.s3.info(self.path, version_id=self.version_id,
-> 1224                             refresh=refresh, **kwargs)
   1225 
   1226     def metadata(self, refresh=False, **kwargs):

/srv/conda/envs/notebook/lib/python3.7/site-packages/s3fs/core.py in info()
    522             bucket, key = split_path(path)
    523             out = self._call_s3(self.s3.head_object, kwargs, Bucket=bucket,
--> 524                                 Key=key, **self.req_kw)
    525             return {
    526                 'ETag': out['ETag'],

/srv/conda/envs/notebook/lib/python3.7/site-packages/s3fs/core.py in _call_s3()
    193         additional_kwargs = self._get_s3_method_kwargs(method, *akwarglist,
    194                                                        **kwargs)
--> 195         return method(**additional_kwargs)
    196 
    197     def _get_s3_method_kwargs(self, method, *akwarglist, **kwargs):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/client.py in _api_call()
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/client.py in _make_api_call()
    646         else:
    647             http, parsed_response = self._make_request(
--> 648                 operation_model, request_dict, request_context)
    649 
    650         self.meta.events.emit(

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/client.py in _make_request()
    665     def _make_request(self, operation_model, request_dict, request_context):
    666         try:
--> 667             return self._endpoint.make_request(operation_model, request_dict)
    668         except Exception as e:
    669             self.meta.events.emit(

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/endpoint.py in make_request()
    100         logger.debug("Making request for %s with params: %s",
    101                      operation_model, request_dict)
--> 102         return self._send_request(request_dict, operation_model)
    103 
    104     def create_request(self, params, operation_model=None):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/endpoint.py in _send_request()
    130     def _send_request(self, request_dict, operation_model):
    131         attempts = 1
--> 132         request = self.create_request(request_dict, operation_model)
    133         context = request_dict['context']
    134         success_response, exception = self._get_response(

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/endpoint.py in create_request()
    114                 op_name=operation_model.name)
    115             self._event_emitter.emit(event_name, request=request,
--> 116                                      operation_name=operation_model.name)
    117         prepared_request = self.prepare_request(request)
    118         return prepared_request

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/hooks.py in emit()
    354     def emit(self, event_name, **kwargs):
    355         aliased_event_name = self._alias_event_name(event_name)
--> 356         return self._emitter.emit(aliased_event_name, **kwargs)
    357 
    358     def emit_until_response(self, event_name, **kwargs):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/hooks.py in emit()
    226                  handlers.
    227         """
--> 228         return self._emit(event_name, kwargs)
    229 
    230     def emit_until_response(self, event_name, **kwargs):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/hooks.py in _emit()
    209         for handler in handlers_to_call:
    210             logger.debug('Event %s: calling handler %s', event_name, handler)
--> 211             response = handler(**kwargs)
    212             responses.append((handler, response))
    213             if stop_on_response and response is not None:

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/signers.py in handler()
     88         # this method is invoked to sign the request.
     89         # Don't call this method directly.
---> 90         return self.sign(operation_name, request)
     91 
     92     def sign(self, operation_name, request, region_name=None,

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/signers.py in sign()
    155                     raise e
    156 
--> 157             auth.add_auth(request)
    158 
    159     def _choose_signer(self, operation_name, signing_type, context):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/auth.py in add_auth()
    423         self._region_name = signing_context.get(
    424             'region', self._default_region_name)
--> 425         super(S3SigV4Auth, self).add_auth(request)
    426 
    427     def _modify_request_before_signing(self, request):

/srv/conda/envs/notebook/lib/python3.7/site-packages/botocore/auth.py in add_auth()
    355     def add_auth(self, request):
    356         if self.credentials is None:
--> 357             raise NoCredentialsError
    358         datetime_now = datetime.datetime.utcnow()
    359         request.context['timestamp'] = datetime_now.strftime(SIGV4_TIMESTAMP)

NoCredentialsError: Unable to locate credentials

@jrbourbeau

From the dask bag tests, it looks like we may need to specify `storage_options={'anon': True}`:

In [1]: import dask.bag as db

In [2]: b = db.read_text('s3://dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv',
   ...:                  storage_options={'anon': True})
   ...: b.take(1)
Out[2]: ('VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,pickup_longitude,pickup_latitude,RateCodeID,store_and_fwd_flag,dropoff_longitude,dropoff_latitude,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount\n',)


quasiben commented Aug 5, 2019

@mrocklin we do call prep in the docker setup:

RUN cd dask-tutorial && conda env update -f binder/environment.yml && python prep.py && cd ..


mrocklin commented Aug 5, 2019 via email


quasiben commented Aug 5, 2019

Looking now as well

@martindurant

> storage_options={'anon': True}

Agreed. I don't think we ever had the ability to guess that we should be anonymous, but maybe this was run at some point on EC2, where you have some default ID.


quasiben commented Aug 5, 2019

@mrocklin I tried three times running python prep.py and twice the sessions were killed:

jovyan@jupyter-dask-2ddask-2dtutorial-2db28bbbkv:~$ python prep.py
Create random data for array exercise
Exploding weather data
Killed

Not sure why; looking into it. I suspect this is what happened when mybinder tried to build the image.


quasiben commented Aug 5, 2019

Also, if folks are planning on running tutorials with mybinder directly, they should note that the default maximum is 100 concurrent sessions:
https://mybinder.readthedocs.io/en/latest/user-guidelines.html#maximum-concurrent-users-for-a-repository

I believe we can ask for more.


quasiben commented Aug 6, 2019

@mrocklin, when you have a moment, can you try again? The prep data download now occurs just after the session starts.

@TomAugspurger

Let's open new issues as needed.


jcrist commented May 13, 2020

I just went through the deployed version to prep for a tutorial tomorrow (haven't looked at our tutorial in > 1 year). Everything works great, this is really well put together, and the binder setup works slick. Thanks y'all!
