Append along time axis with write_nc #33

MBlaschek · 2014-10-23T08:35:31Z

Hi, great module!

I'm trying to write out a GeoArray in a loop, increasing the time coordinate at every step.
Dimensions are [1 x 3 x 360 x 720] = time x lev x lat x lon

I write this array to netCDF and then try to append to it.
I had a look at the indexing, but I'm not sure whether I can increase the time dimension like this to write out the next time step.
It would be great if you could share some thoughts on how to do that.

Thanks

MBlaschek · 2014-10-23T10:25:31Z

Hi.

I wrote a small extension to nc.py.
It is probably not the best solution, but it works for appending along the time axis of my array. I did not test it for other shapes.

The netCDF time dimension is now unlimited as well. There is no check for whether a time value already exists, so plenty of things can still go wrong.
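As a rough illustration of the kind of check that is missing here, the sketch below rejects time values that are already in the file before computing the write range. The helper name `check_new_times` is hypothetical and not part of dimarray; the returned bounds play the role of `ix` and `nx` in the patch.

```python
def check_new_times(existing, new):
    """Return (ix, nx) slice bounds for appending `new` after `existing`.

    Hypothetical helper, not part of dimarray. Raises ValueError if any
    value in `new` is already present in `existing`.
    """
    duplicates = sorted(set(existing) & set(new))
    if duplicates:
        raise ValueError("time values already in file: {}".format(duplicates))
    ix = len(existing)    # first free position along the time axis
    nx = ix + len(new)    # one past the last position to be written
    return ix, nx


# Three time steps on disk, one new step to append.
ix, nx = check_new_times([0.0, 1.0, 2.0], [3.0])
```

With these bounds, the write in the patch would target `v[ix:nx, ...]` only after the duplicate check has passed.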

Maybe you can use it as a starting point to incorporate this feature.
ciao

diff --git a/dimarray/io/nc.py b/dimarray/io/nc.py
index fcef873..5494480 100644
--- a/dimarray/io/nc.py
+++ b/dimarray/io/nc.py
@@ -560,7 +560,7 @@ def _write_dataset(f, obj, mode='w-', indices=None, axis=0, format=FORMAT, verbo


 @format_doc(netCDF4=_doc_write_nc, indexing=_doc_indexing_write, write_modes=_doc_write_modes)
-def _write_variable(f, obj=None, name=None, mode='a+', format=FORMAT, indices=None, axis=0, verbose=False, share_grid_mapping=False, **kwargs):
+def _write_variable(f, obj=None, name=None, mode='a+', format=FORMAT, indices=None, axis=0, verbose=False, share_grid_mapping=False, append=False, **kwargs):
     """ Write DimArray instance to file

     Parameters
@@ -578,6 +578,8 @@ def _write_variable(f, obj=None, name=None, mode='a+', format=FORMAT, indices=No
         separate variable in the dataset, accordingly to CF-conventions
         in order to share that information across several variables.
         Default is False.
+    append : bool, optional
+        if True, try to append along time axis (unlimited dimension)

     See Also
     --------
@@ -594,6 +596,7 @@ def _write_variable(f, obj=None, name=None, mode='a+', format=FORMAT, indices=No
     if name not in f.variables:
         assert isinstance(obj, DimArray), "expected a DimArray instance, got {}".format(type(obj))
         v = _createVariable(f, name, obj.axes, dtype=obj.dtype, **kwargs)
+        append = False

     else:
         v = f.variables[name]
@@ -615,7 +618,20 @@ def _write_variable(f, obj=None, name=None, mode='a+', format=FORMAT, indices=No
             raise IndexError(msg)

     # Write Variable
-    v[ix] = np.asarray(obj)
+    if append:
+        # Read dimensions
+        axes = read_dimensions(f, name)
+        # Check if time is in there
+        if 'time' in [ax.name for ax in axes]:
+            ix = len(axes['time'].values)
+            nx = ix+len(obj.axes['time'])
+            # append
+            v[ix:nx,::] = np.asarray(obj)
+            f.variables['time'][ix:nx] = obj.axes['time'].values
+        
+    else:
+        # Normal 
+        v[ix] = np.asarray(obj)

     # add metadata if any
     if not isinstance(obj, DimArray):
@@ -794,7 +810,11 @@ def _check_dimensions(f, axes, **verb):
     for ax in axes:
         dim = ax.name
         if not dim in f.dimensions:
-            f.createDimension(dim, ax.size)
+            # time dimension is unlimited > append
+            if dim == 'time':
+                f.createDimension(dim,size=None)
+            else:
+                f.createDimension(dim, ax.size)

             # strings are given "object" type in Axis object
             # ==> assume all objects are actually strings

perrette · 2014-10-29T15:01:15Z

Hi MBlaschek,

Sorry for the late answer, busy time. Good that you found a solution in your case. You are not the first person to raise this issue, and I agree this would be a nice feature, but dimarray should not prefer any particular dimension, unless absolutely needed (time could arguably be one of these special cases).

A few ideas in that direction:

- When appending to a file, it would be easy to let "append" take a string value indicating the dimension along which to append (e.g. "time", with True possibly defaulting to "time").
- At creation, if a dimension is missing, the "append" parameter would also be checked, and the size set to None for the relevant dimension.

I believe this would only be a minor change to what you have already implemented.
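A minimal sketch of that idea, assuming hypothetical helper names (`resolve_append_dim`, `dimension_size`) that are not part of dimarray: the first function normalizes the "append" argument to a dimension name, and the second picks the size to use when the dimension is created.

```python
def resolve_append_dim(append, default="time"):
    """Map the `append` argument to a dimension name, or None for no appending.

    Hypothetical helper: True falls back to `default`, a string names the
    dimension explicitly, and False/None disables appending.
    """
    if append is True:
        return default
    if append is False or append is None:
        return None
    if isinstance(append, str):
        return append
    raise TypeError(
        "append must be a bool, None, or a dimension name, got {!r}".format(append)
    )


def dimension_size(name, size, append):
    """Size to use at dimension creation: None (unlimited) for the append dimension."""
    return None if name == resolve_append_dim(append) else size


# Only the dimension selected by `append` becomes unlimited.
sizes = {dim: dimension_size(dim, n, append=True)
         for dim, n in [("time", 1), ("lev", 3), ("lat", 360), ("lon", 720)]}
```

Keeping the normalization in one place means the write path and the dimension-creation path cannot disagree about which dimension is the unlimited one.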

Even better would be one test case in tests/test_nc.py
(https://github.com/perrette/dimarray/blob/master/tests/test_nc.py)

You are welcome to give it a go, otherwise I will soon enough.

perrette · 2014-12-05T18:40:41Z

@MBlaschek, @vnoel Just wanted to let you know that I have changed the way write_nc/read_nc work quite a bit. They are now simply wrappers around:

ds_disk = open_nc(...)   # returns DatasetOnDisk
ds_disk.read() or ds_disk.write()
ds_disk.close()

Here DatasetOnDisk behaves similarly to a Dataset. It has an axes attribute, as well as dims, keys and so on.

To create a new unlimited dimension, you can just do:

ds_disk = open_nc('test.nc',mode='w')   
ds_disk.axes.append('time')  # size=None by default
print(ds_disk.nc.dimensions['time'])  # check that dimension is unlimited via underlying ds_disk.nc netCDF4 object
ds_disk['v1'] = da.DimArray([1,3,4.],axes=[[11.,22.,33.]], dims=['time'])  # write new variable
ds_disk['v1'].ix[3] = da.DimArray([44],axes=[[44.]], dims=['time'])   # add 4th time slice
ds_disk['v1'].ix[4] = 999.   # update value for 5th time slice
ds_disk['v1'].time[4] = 555.  # update time dimension
ds_disk.close()

The ix indicates that an integer position index is used when inserting a new slice.
You can use ds_disk.nc any time to access lower level netCDF4 features.

Since the feature is still new, there might be a few bugs or incompatibilities with previous versions. Please report anything unexpected.

perrette · 2014-12-05T18:51:30Z

If most of your work involves netCDF datasets, you might also check out this cool package, which I learned about recently: http://xray.readthedocs.org/. It is similar to dimarray in many respects. The major conceptual difference is that xray bases its data structure on netCDF: the basic object is a Dataset, and their DataArray (the equivalent of DimArray) is actually a pointer (the variable name) to a Dataset. In dimarray, DimArray is the basic object, and a Dataset is more of a convenience for working with netCDF that speeds up a number of methods when several arrays share some dimensions (e.g. reindex_axis, interp_axis). xray's underlying code looks very professional to me; the only drawbacks are that relying on the netCDF structure probably imposes a few more constraints than just assuming an array + axes, and that it relies on pandas. The rest is a matter of preference. Anyway, it is worth a look. I have already adopted some of its API (e.g. an attrs attribute to store metadata, instead of having it in the object's __dict__).
