Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected time slice behavior? #2182

Open
pochedls opened this issue May 30, 2017 · 8 comments
Open

Expected time slice behavior? #2182

pochedls opened this issue May 30, 2017 · 8 comments
Assignees

Comments

@pochedls
Copy link

I wanted to see if something I am experiencing is expected behavior. I am using UVCDAT v2.10.

I am trying to load monthly surface temperature data over the same time range (01-1979 through 12-2016). Using the same command I get different length time series, though both have data over this entire record. One ends on 2016-12-1 and the other ends on 2016-11-15 (prematurely - at least compared to the behavior I expected).

I am using this syntax: temp = f('air', time=('1979-01', '2016-12'))

The behavior I expected was that it would load data through the end of the month (they both start in 01-1979). This would make it very convenient for loading different datasets (e.g., various surface temperature datasets) with the same time axes.

If this is the expected behavior, is there an easy way to achieve this (or do I need to write something that carefully compares the time units and subsets the data appropriately)?

Example script loading NOAA GlobalTemp (lines 1 - 8) and GISTEMP data (lines 9 - 15). Notice that the record lengths and end months are different (despite using time=('1979-01', '2016-12') in both).

In [1]: import cdms2 

In [2]: f = cdms2.open('/export_backup2/pochedley1/data/surface/air.mon.anom.nc')

In [3]: temp = f('air', time=('1979-01','2016-12'))

In [4]: temp.shape
Out[4]: (456, 36, 72)  

In [5]: f.close()

In [6]: time=temp.getTime()

In [7]: timeComponent = time.asComponentTime()

In [8]: timeComponent[-1]
Out[8]: 2016-12-1 0:0:0.0

In [9]: f = cdms2.open('/export_backup2/pochedley1/data/surface/gistemp1200_ERSSTv4.nc')

In [10]: temp = f('tempanomaly', time=('1979-01','2016-12'))
/export_backup2/pochedley1/bin/anaconda2/envs/uvcdat2/lib/python2.7/site-packages/cdms2/axis.py:1624: UserWarning:
Your first bounds[0,0] -180.000000000000000 will be corrected to -180.000000000000000
Your bounds bounds[-1,1] 180.000000000000000 will be corrected to 180.000000000000000
  warnings.warn(msg, UserWarning)

In [11]: temp.shape
Out[11]: (455, 90, 180)

In [12]: f.close()

In [13]: time=temp.getTime()

In [14]: timeComponent = time.asComponentTime()

In [15]: timeComponent[-1]
Out[15]: 2016-11-15 0:0:0.0
@dnadeau4
Copy link
Contributor

@pochedls Can you check that your time are exactly 1979-01-01T00:00:00 and 2016-12-01T00:00:00 on both files.

You can use ncdump -t -v time gistemp1200_ERSSTv4.nc to translate the time into a human readable format.

I am a bit puzzled by the bounds warning as well, I wrote this for longitudes that are not quite at -180.0,180.0. but you are...

@dnadeau4
Copy link
Contributor

@pochedls your file gistemp1200_ERSSTv4.nc seems to be at the center time 2016-12-16 this is why you are missing 1 month.

@pochedls
Copy link
Author

@dnadeau4 - thanks for looking at this. I see - so this is the expected behavior and opening monthly files in this way probably converts the '2016-12' to something like '2016-12-15'.

I see now that I can get the full time series by specifying the day as well, i.e., temp = f('tempanomaly', time=('1979-01-01','2016-12-31')) [I hadn't noticed this in the documentation, though this is obvious in hindsight].

It would be helpful to update the documentation (1.11 here). You could simply add something like:

or
>>> x = ua.subRegion(time=('1990-1-31','1991-1-31'))

under "or string representations could be used."

As for the longitude warning - I think I get this message pretty frequently.

@dnadeau4
Copy link
Contributor

I am actually refreshing the documentation using sphynx I will add your code to it.
I should be on readthedocs site someday...
https://github.com/UV-CDAT/cdms/blob/cdmsdocs/docs/source/manual/cdms_1.rst

@taylor13
Copy link

taylor13 commented Jun 1, 2017

@dnadeau4 I'll provide input on the time issue shortly, but first the longitude warning message. The code that produces this is:

    if(self.isLongitude() and hasattr(self, 'units') and
            (self.units.find('degree') != -1) and len(retbnds.shape) == 2):
        # Make sure we have close to 360 degree interval
        if(abs(abs(retbnds[-1, 1] - retbnds[0, 0]) - 360) <
              (numpy.minimum(0.01, abs(retbnds[0, 1] - retbnds[0, 0]) * 0.1))):
            # Now check wether either bound is near an interger value;
            # if yes round both integer
            if((abs(retbnds[0, 0] - numpy.floor(retbnds[0, 0] + 0.5)) <
                abs(retbnds[0, 1] - retbnds[0, 0]) * 0.01) or
                (abs(retbnds[-1, 1] - numpy.floor(retbnds[-1, 1] + 0.5)) <
                 abs(retbnds[-1, 1] - retbnds[-1, 0]) * 0.01)):
                # only for -180, 180 not needed if values are all positive
                # (0-360)
                if((retbnds[0, 0] * retbnds[-1, 1]) < 0):
                    msg = "\nYour first bounds[0,0] %3.15lf will be corrected to %3.15lf\n"\
                          "Your bounds bounds[-1,1] %3.15lf will be corrected to %3.15lf" \
                        % (retbnds[0, 0], numpy.floor(retbnds[0, 0] + 0.5), retbnds[-1, 1],
                           numpy.floor(retbnds[-1, 1] + 0.5))

                    warnings.warn(msg, UserWarning)
                    retbnds[0, 0] = numpy.floor(retbnds[0, 0] + 0.5)
                    retbnds[-1, 1] = numpy.floor(retbnds[-1, 1] + 0.5)
            else:
                if(retbnds[-1, 1] > retbnds[0, 0]):
                    retbnds[-1, 1] = retbnds[0, 0] + 360.
                else:
                    retbnds[0, 0] = retbnds[-1, 1] + 360.

The main problem is that the message will always be produced if the domain spans about 360. and includes both negative and positive longitude values. The message should only be displayed if the values of the either of the bounds has been changed.

The other point to be made is that I think the test (and possible corrections) should be made even when all longitude values are positive and all are negative, so the comment (about "... -180, 180 not needed if ...") is incorrect . One way to produce the correct behavior would be to replace

                if((retbnds[0, 0] * retbnds[-1, 1]) < 0):
                    msg = "\nYour first bounds[0,0] %3.15lf will be corrected to %3.15lf\n"\
                          "Your bounds bounds[-1,1] %3.15lf will be corrected to %3.15lf" \
                        % (retbnds[0, 0], numpy.floor(retbnds[0, 0] + 0.5), retbnds[-1, 1],
                           numpy.floor(retbnds[-1, 1] + 0.5))

                    warnings.warn(msg, UserWarning)
                    retbnds[0, 0] = numpy.floor(retbnds[0, 0] + 0.5)
                    retbnds[-1, 1] = numpy.floor(retbnds[-1, 1] + 0.5)

with

                 origbnd1 = retbnds[0,0]
                 origbnd2 = rebnds[-1,1]
                 retbnds[0, 0] = numpy.floor(retbnds[0, 0] + 0.5)
                 retbnds[-1, 1] = numpy.floor(retbnds[-1, 1] + 0.5)

                 if (((retbnds[0,0]-origbnd1) /= 0.) or (retbnds[-1,1]-origbnd2) /= 0.)):
                     msg="\n Your longitude domain approximately spans 360 degrees, so it has been adjusted to exactly span 360 degrees."\
                     if ((retbnds[0,0]-origbnd1) /= 0.)):
                         msg = msg + "The first bound of the longitude domain was adjusted from %e.15lf to %e.15lf" \ % (origbnd1, retbnds[0,0])
                      if ((retbnds[-1,1]-origbnd2) /= 0.)):
                         msg = msg + "The last bound of the longitude domain was adjusted from %e.15lf to %e.15lf" \ % (origbnd2, retbnds[-1,1])

                       warnings.warn(msg, UserWarning)

@taylor13
Copy link

taylor13 commented Jun 1, 2017

@doutriaux1 I think the CDMS assume the *beginning" of the month is meant if you leave off day (and beginning of the day is meant if you leave off hours-minutes-second), so 2016-12 is equivalent to 2016-12-01 00:00:00.0

You can further control which time-slices are selected using an "indicator" (defined in the mapIntervalExt specifications). Here is an example:

     data = ta.subRegion(time=('1980-3','1980-4','co'))

which you can find in the CDMS documentation at https://uvcdat.llnl.gov/documentation/cdms/cdms_2.html#2.3 .

Here is what the CDMS documentation says about the "indicator" ('co'), but it isn't very clear so I will follow this quoted section with a reworded version:

indicator is a two or three-character string, where the first character is

        'c' if the interval is closed on the left, 
        'o' if open, 

and the second character has the same meaning for the right-hand point.

If present, the third character specifies how the interval should be intersected with the axis:

  	'n' - select node values which are contained in the interval
  	'b' -select axis elements for which the corresponding cell boundary intersects the interval
  	'e' - same as n, but include an extra node on either side
  	's' - select axis elements for which the cell boundary is a subset of the interval

The default indicator is ‘ccn’, that is, the interval is closed, and nodes in the interval are selected.

In the above "closed" means the interval includes the specified end point, whereas "open" means the end point is not included.

I think the documentation of these characters will not be clear to most folks. I would reword part of this section as follows:

If present, the third character specifies what criteria should be used in determining which axis elements will be included:

    'n' - select all axis elements that are contained in the specified interval
    'e' - same as 'n', but when available, include one additional axis element on each end of the interval (i.e. pad the interval with an extra value on either end)
    'b' - select all axis elements of cells (defined by coordinate bounds) that are completely contained by the specified interval (i.e., lying entirely in the interval)
    's' - select all axis elements of cells (defined by coordinate bounds) that are at least partially contained in the specified interval

I am not absolutely sure I have the behavior correctly captured in the above, so we should check the relevant code (and look at some test examples).

@doutriaux1
Copy link
Contributor

@dnadeau4 can we close this?

@taylor13
Copy link

taylor13 commented Sep 5, 2017

I think, before closing someone should confirm that my definitions of n, e, b, and s are correct, and assuming they are, replace the section of the original documentation with the above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants