CFTimeIndex #1252
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
import re | ||
from datetime import timedelta | ||
|
||
import numpy as np | ||
import pandas as pd | ||
|
||
from pandas.lib import isscalar | ||
Review comment: Very minor, but there is an xarray version of this.
Reply: And the pandas version isn't public API :) |
||
|
||
|
||
def named(name, pattern): | ||
return '(?P<' + name + '>' + pattern + ')' | ||
Review comment: I think [...]
Reply: This should only be called once, probably at module import time, so it should not matter for performance. I would just go with whatever is most readable. |
||
|
||
|
||
def optional(x): | ||
return '(?:' + x + ')?' | ||
|
||
|
||
def trailing_optional(xs): | ||
if not xs: | ||
return '' | ||
return xs[0] + optional(trailing_optional(xs[1:])) | ||
|
||
|
||
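To make the recursion in `trailing_optional` concrete, here is a small sketch that copies the two helpers from the diff and runs them on a toy list. The inputs `'a'`, `'b'`, `'c'` are placeholders chosen for illustration only; they are not part of the PR.

```python
import re


def optional(x):
    # Wrap x in a non-capturing group that may be absent.
    return '(?:' + x + ')?'


def trailing_optional(xs):
    # Each element is required only if everything before it is present.
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))


pattern = '^' + trailing_optional(['a', 'b', 'c']) + '$'
print(pattern)  # ^a(?:b(?:c(?:)?)?)?$

# Only prefixes of the full sequence match; skipping a middle piece does not.
accepted = [s for s in ['a', 'ab', 'abc', 'ac', 'b'] if re.match(pattern, s)]
print(accepted)  # ['a', 'ab', 'abc']
```

This nesting is what lets a string such as `2000-01` match while one that supplies a later field without the earlier ones is rejected.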
def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'): | ||
pieces = [(None, 'year', '\d{4}'), | ||
Review comment: Do you need negative or five-digit years?
Reply: Personally, I don't see myself needing it in the near future, but I'm not necessarily opposed to adding that support if others would find it useful. It would make writing simple positive four-digit year dates more complicated though, right? Would you always need the leading zero and the sign?
Reply: Then let's not bother until someone asks. Per Wikipedia's ISO 8601 you can optionally use an expanded year representation with [...]
Reply: FYI NCAR's TraCE simulation project is a 21k yr paleoclimate simulation. Not sure how they handle calendars/times. I know somebody who has analyzed data from this simulation; will ask what it looks like. |
||
(date_sep, 'month', '\d{2}'), | ||
(date_sep, 'day', '\d{2}'), | ||
(datetime_sep, 'hour', '\d{2}'), | ||
(time_sep, 'minute', '\d{2}'), | ||
(time_sep, 'second', '\d{2}' + optional('\.\d+'))] | ||
pattern_list = [] | ||
for sep, name, sub_pattern in pieces: | ||
pattern_list.append((sep if sep else '') + named(name, sub_pattern)) | ||
# TODO: allow timezone offsets? | ||
return '^' + trailing_optional(pattern_list) + '$' | ||
|
||
|
||
def parse_iso8601(datetime_string): | ||
basic_pattern = build_pattern(date_sep='', time_sep='') | ||
extended_pattern = build_pattern() | ||
patterns = [basic_pattern, extended_pattern] | ||
Review comment: Save this as a global variable. |
||
for pattern in patterns: | ||
match = re.match(pattern, datetime_string) | ||
if match: | ||
return match.groupdict() | ||
raise ValueError('no ISO-8601 match for string: %s' % datetime_string) | ||
|
||
|
||
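One possible reading of the suggestion to save the patterns in a global variable is sketched below: the patterns are built and compiled once at import time, and `parse_iso8601` only loops over them. The names `_BASIC_PATTERN`, `_EXTENDED_PATTERN` and `_PATTERNS` are hypothetical, and the switch to raw strings is an assumption made here to avoid invalid-escape warnings; neither is taken from the PR.

```python
import re


def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'


def optional(x):
    return '(?:' + x + ')?'


def trailing_optional(xs):
    if not xs:
        return ''
    return xs[0] + optional(trailing_optional(xs[1:]))


def build_pattern(date_sep=r'\-', datetime_sep='T', time_sep=r'\:'):
    pieces = [(None, 'year', r'\d{4}'),
              (date_sep, 'month', r'\d{2}'),
              (date_sep, 'day', r'\d{2}'),
              (datetime_sep, 'hour', r'\d{2}'),
              (time_sep, 'minute', r'\d{2}'),
              (time_sep, 'second', r'\d{2}' + optional(r'\.\d+'))]
    pattern_list = [(sep if sep else '') + named(name, sub_pattern)
                    for sep, name, sub_pattern in pieces]
    return '^' + trailing_optional(pattern_list) + '$'


# Hypothetical module-level constants: built once, at import time.
_BASIC_PATTERN = re.compile(build_pattern(date_sep='', time_sep=''))
_EXTENDED_PATTERN = re.compile(build_pattern())
_PATTERNS = (_BASIC_PATTERN, _EXTENDED_PATTERN)


def parse_iso8601(datetime_string):
    for pattern in _PATTERNS:
        match = pattern.match(datetime_string)
        if match:
            return match.groupdict()
    raise ValueError('no ISO-8601 match for string: %s' % datetime_string)
```

With this layout, `parse_iso8601('2000-01-15')` and `parse_iso8601('20000115')` should both return a dict whose `year`, `month` and `day` entries are strings and whose remaining entries are `None`.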
def _parse_iso8601_with_reso(date_type, timestr): | ||
default = date_type(1, 1, 1) | ||
result = parse_iso8601(timestr) | ||
replace = {} | ||
|
||
for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']: | ||
value = result.get(attr, None) | ||
if value is not None: | ||
replace[attr] = int(value) | ||
Review comment: Note that seconds can be fractional. |
||
resolution = attr | ||
|
||
return default.replace(**replace), resolution | ||
|
||
|
||
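The note about fractional seconds matters because the pattern accepts strings such as `07.25` for the `second` group, and `int('07.25')` raises `ValueError`. A minimal sketch of one way to handle this is shown below. The helper name `split_fractional_second` is hypothetical, and truncating (rather than rounding) to microsecond precision is an assumption.

```python
def split_fractional_second(value):
    """Split a seconds string such as '07.25' into (second, microsecond)."""
    whole, _, frac = value.partition('.')
    # Pad or truncate the fractional part to exactly six digits.
    microsecond = int((frac + '000000')[:6]) if frac else 0
    return int(whole), microsecond


print(split_fractional_second('07.25'))  # (7, 250000)
print(split_fractional_second('59'))     # (59, 0)
```

The two integers could then be passed as separate `second` and `microsecond` entries of the `replace` dict, assuming the date type accepts a `microsecond` argument.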
def _parsed_string_to_bounds(date_type, resolution, parsed): | ||
Review comment: Note this is based on a pandas function. |
||
if resolution == 'year': | ||
return (date_type(parsed.year, 1, 1), | ||
date_type(parsed.year + 1, 1, 1) - timedelta(microseconds=1)) | ||
if resolution == 'month': | ||
if parsed.month == 12: | ||
end = date_type(parsed.year + 1, 1, 1) - timedelta(microseconds=1) | ||
else: | ||
end = (date_type(parsed.year, parsed.month + 1, 1) - | ||
timedelta(microseconds=1)) | ||
return date_type(parsed.year, parsed.month, 1), end | ||
if resolution == 'day': | ||
start = date_type(parsed.year, parsed.month, parsed.day) | ||
return start, start + timedelta(days=1, microseconds=-1) | ||
if resolution == 'hour': | ||
start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour) | ||
return start, start + timedelta(hours=1, microseconds=-1) | ||
if resolution == 'minute': | ||
start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour, | ||
parsed.minute) | ||
return start, start + timedelta(minutes=1, microseconds=-1) | ||
if resolution == 'second': | ||
start = date_type(parsed.year, parsed.month, parsed.day, parsed.hour, | ||
parsed.minute, parsed.second) | ||
return start, start + timedelta(seconds=1, microseconds=-1) | ||
else: | ||
raise KeyError | ||
|
||
|
||
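As an illustration of the bounds logic above, the following sketch isolates the month branch and runs it with `datetime.datetime` standing in for a netcdftime date type. That stand-in is an assumption made so the example is self-contained, and `month_bounds` is a hypothetical name.

```python
from datetime import datetime, timedelta


def month_bounds(date_type, year, month):
    """Return the first and last representable instants of a month."""
    start = date_type(year, month, 1)
    if month == 12:
        next_start = date_type(year + 1, 1, 1)  # roll over into January
    else:
        next_start = date_type(year, month + 1, 1)
    return start, next_start - timedelta(microseconds=1)


start, end = month_bounds(datetime, 2000, 12)
print(start)  # 2000-12-01 00:00:00
print(end)    # 2000-12-31 23:59:59.999999
```

Computing the end as "start of the next period minus one microsecond" avoids hard-coding month lengths, which is what would let the same code work for non-standard calendars.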
def get_date_field(datetimes, field): | ||
return [getattr(date, field) for date in datetimes] | ||
|
||
|
||
def _field_accessor(name, docstring=None): | ||
def f(self): | ||
return get_date_field(self._data, name) | ||
|
||
f.__name__ = name | ||
f.__doc__ = docstring | ||
return property(f) | ||
|
||
|
||
def get_date_type(self): | ||
return type(self._data[0]) | ||
|
||
|
||
class NetCDFTimeIndex(pd.Index): | ||
def __new__(cls, data): | ||
result = object.__new__(cls) | ||
result._data = np.array(data) | ||
return result | ||
Review comment: Consider validating that every array element has the correct type. |
||
|
||
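A validation step of the kind suggested here might look like the sketch below. The function name `validate_date_types` and its `valid_types` parameter are hypothetical, and `datetime.datetime` is used only as a stand-in for the netcdftime date classes.

```python
from datetime import datetime

import numpy as np


def validate_date_types(data, valid_types):
    """Check that all elements share one type drawn from valid_types."""
    data = np.asarray(data, dtype=object)
    if data.size == 0:
        return data
    first_type = type(data.flat[0])
    if not issubclass(first_type, valid_types):
        raise TypeError('unsupported element type: %r' % first_type)
    for element in data.flat:
        # Mixed calendars cannot be compared, so require an exact type match.
        if type(element) is not first_type:
            raise TypeError('mixed element types: %r and %r'
                            % (first_type, type(element)))
    return data


dates = validate_date_types([datetime(2000, 1, 1), datetime(2000, 1, 2)],
                            (datetime,))
```

A check like this could be called from `__new__` before the array is stored, so that `date_type` can safely rely on the first element.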
year = _field_accessor('year', 'The year of the datetime') | ||
month = _field_accessor('month', 'The month of the datetime') | ||
day = _field_accessor('day', 'The days of the datetime') | ||
hour = _field_accessor('hour', 'The hours of the datetime') | ||
minute = _field_accessor('minute', 'The minutes of the datetime') | ||
second = _field_accessor('second', 'The seconds of the datetime') | ||
microsecond = _field_accessor('microsecond', | ||
'The microseconds of the datetime') | ||
date_type = property(get_date_type) | ||
|
||
def _partial_date_slice(self, resolution, parsed, | ||
use_lhs=True, use_rhs=True): | ||
start, end = _parsed_string_to_bounds(self.date_type, resolution, | ||
parsed) | ||
lhs_mask = (self._data >= start) if use_lhs else True | ||
rhs_mask = (self._data <= end) if use_rhs else True | ||
return (lhs_mask & rhs_mask).nonzero()[0] | ||
|
||
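The masking step in `_partial_date_slice` can be sketched on its own as follows. This version starts from an all-`True` boolean array instead of the Python literal `True`, an adjustment made here so that the case where both `use_lhs` and `use_rhs` are `False` still has a `.nonzero()` method. The function name `positions_within` is hypothetical, and `datetime.datetime` again stands in for the real date type.

```python
from datetime import datetime

import numpy as np


def positions_within(data, start, end, use_lhs=True, use_rhs=True):
    """Return integer positions of elements inside [start, end]."""
    data = np.asarray(data, dtype=object)
    mask = np.ones(data.shape, dtype=bool)
    if use_lhs:
        mask &= (data >= start)  # element-wise comparison on object dtype
    if use_rhs:
        mask &= (data <= end)
    return mask.nonzero()[0]


dates = [datetime(2000, 1, 1), datetime(2000, 2, 1), datetime(2000, 3, 1)]
inside = positions_within(dates, datetime(2000, 1, 15), datetime(2000, 2, 15))
```

Here `inside` holds the single position of the February date, since it is the only element between the two bounds.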
def _get_string_slice(self, key, use_lhs=True, use_rhs=True): | ||
Review comment: Do we actually need the [...] |
||
parsed, resolution = _parse_iso8601_with_reso(self.date_type, key) | ||
loc = self._partial_date_slice(resolution, parsed, use_lhs, use_rhs) | ||
return loc | ||
|
||
def get_loc(self, key, method=None, tolerance=None): | ||
if isinstance(key, pd.compat.string_types): | ||
Review comment: Use xarray's pycompat module instead of pandas's. |
||
result = self._get_string_slice(key) | ||
# Prevents problem with __contains__ if key corresponds to only | ||
Review comment: I would solve this problem by checking for boolean dtype instead.
Reply: I lifted the [...]
Is it worth discussing with them to see what their recommended fix is?
Reply: As a side note, this issue, and the behavior I described below, #1252 (comment), seem to be influenced by the fact that for simplicity, I omitted logic in [...] For example, if I override [...] |
||
# the first element in index (if we leave things as a list, | ||
# np.any([0]) is False). | ||
# Also coerces things to scalar coords in xarray if possible, | ||
# which is consistent with the behavior with a DatetimeIndex. | ||
if len(result) == 1: | ||
return result[0] | ||
Review comment: Does DatetimeIndex actually do exactly this? It's pretty messy behavior.
Reply: You're correct; DatetimeIndex doesn't do exactly this: [...]
I'll try and simplify the logic here using your suggestion above. |
||
else: | ||
return result | ||
else: | ||
return pd.Index.get_loc(self, key, method=method, | ||
tolerance=tolerance) | ||
|
||
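The pitfall mentioned in the code comment, and the suggestion to check for boolean dtype, can be illustrated with the sketch below. An integer position array `[0]` is falsy under `np.any`, even though it means "found at position 0". The helper `contains_from_loc` is hypothetical and is only one way such a check could be written.

```python
import numpy as np


def contains_from_loc(result):
    """Interpret a get_loc-style result as a membership answer."""
    if isinstance(result, slice) or np.isscalar(result):
        return True
    result = np.asarray(result)
    if result.dtype == bool:
        # A boolean mask means "present" only if some entry is True.
        return bool(result.any())
    # An integer position array means "present" if it is non-empty.
    return result.size > 0


print(bool(np.any(np.array([0]))))        # False
print(contains_from_loc(np.array([0])))   # True
```

Distinguishing masks from position arrays by dtype removes the need to unwrap single-element results inside `get_loc` just to keep `__contains__` working.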
def _maybe_cast_slice_bound(self, label, side, kind): | ||
if isinstance(label, pd.compat.string_types): | ||
parsed, resolution = _parse_iso8601_with_reso(self.date_type, | ||
label) | ||
start, end = _parsed_string_to_bounds(self.date_type, resolution, | ||
parsed) | ||
if self.is_monotonic_decreasing and len(self): | ||
return end if side == 'left' else start | ||
return start if side == 'left' else end | ||
else: | ||
return label | ||
|
||
# TODO: Add ability to use integer range outside of iloc? | ||
Review comment: This is messy indeed! |
||
# e.g. series[1:5]. | ||
def get_value(self, series, key): | ||
if not isinstance(key, slice): | ||
return series.iloc[self.get_loc(key)] | ||
else: | ||
return series.iloc[self.slice_indexer( | ||
key.start, key.stop, key.step)] | ||
|
||
def __contains__(self, key): | ||
try: | ||
result = self.get_loc(key) | ||
return isscalar(result) or type(result) == slice or np.any(result) | ||
except (KeyError, TypeError, ValueError): | ||
return False |
Review comment: This shouldn't go in core, since there's nothing tying it to core xarray internals. Instead, it should probably go in a new top-level module, maybe a new directory alongside the contents of the existing conventions module (rename it to xarray.conventions.coding?).