Description
In my project I'm examining air temperature and solar data, so my following comments apply to those values. They do not apply to other data returned by the model.
The solar position algorithm appears to ignore the timezone of timezone-aware timestamps when calculating irradiance data. Specifically, .get_processed_data() returns data with mixed handling of timezones. The returned dataframe has timestamps in the format "UTC+tz". The ambient air temperature data matches this timestamp, with the maximum temperature at around 17:00 local time and the minimum at around 05:00. The solar data does not match it: irradiance begins at around 07:00 UTC and ends at around 19:00 UTC, which corresponds to irradiance > 0 from midnight to noon in local time.
I'm not sure if this is a bug or if it's me not understanding how timesteps are handled in pvlib-python, but the forecast I'm getting from .get_processed_data() does not appear to be correct.
To Reproduce
You can replicate this by running the code in the docs/tutorials/forecast.ipynb notebook. Steps:
- Run the first two setup cells to import as needed and input the location and time information.
- Run the first cell in the HRRR section to initialize the HRRR model (fm = HRRR()).
- Run the fourth cell in the HRRR section to get processed data (data = fm.get_processed_data(latitude, longitude, start, end)).
- Run the ninth cell in the HRRR section to print the processed, sorted data (data[sorted(data.columns)]).
The output will appear as follows:
The data shows nonzero solar irradiance from 07:00-07:00 to 19:00-07:00 (clock time followed by the -07:00 UTC offset). This makes sense if the clock time is local time, and does not if it is UTC. The air temperature data shows a minimum at 12:00-07:00 and a maximum at 22:00-07:00. This makes sense if the clock time is UTC, and does not if it is local time.
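The hour arithmetic behind this reading can be checked with the standard library alone. This is a minimal sketch, assuming the America/Phoenix zone from the reproduction code (UTC-07:00, no daylight saving); the helper name utc_hour_to_local is made up for illustration and is not part of pvlib.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the site uses America/Phoenix, which stays at UTC-07:00 all year.
LOCAL = ZoneInfo("America/Phoenix")


def utc_hour_to_local(hour):
    """Return the local wall-clock hour for a UTC hour on 2021-05-19."""
    stamp = datetime(2021, 5, 19, hour, tzinfo=timezone.utc)
    return stamp.astimezone(LOCAL).hour


# Reading the printed clock times as UTC and converting them to local time.
for label, hour in [
    ("temperature minimum", 12),
    ("temperature maximum", 22),
    ("irradiance start", 7),
    ("irradiance end", 19),
]:
    print(f"{label}: {hour:02d}:00 UTC is {utc_hour_to_local(hour):02d}:00 local")
```

Read as UTC, the temperature extremes fall at 05:00 and 15:00 local time, which is plausible, while the irradiance window would run from midnight to noon local time, which is not.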
The following description, code, and solution were provided by Cliff Hansen @ Sandia:
I think there is a bug in this line: pvlib-python/pvlib/forecast.py, line 418 in 7eae1fc.
This code produces the output Peter showed, with the air temperature and GHI patterns offset.
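import matplotlib.pyplot as plt
import pandas as pd
from pvlib.forecast import HRRR

# Location and timezone-aware forecast window
latitude = 32.2
longitude = -110.9
tz = 'America/Phoenix'
start = pd.Timestamp('2021-05-19 00:00:00', tz=tz)
end = start + pd.Timedelta(days=7) # 7 days

# Fetch and process the HRRR forecast
fm = HRRR()
data = fm.get_processed_data(latitude, longitude, start, end)
data[sorted(data.columns)]

# Plot air temperature (multiplied by 30) and GHI against the same time index
plt.plot(data['temp_air']*30)
plt.plot(data['ghi'])
plt.show()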
Replacing that line
self.time = pd.DatetimeIndex(pd.Series(times), tz=self.location.tz)
with
self.time = pd.DatetimeIndex(pd.Series(times)).tz_localize('UTC').tz_convert(self.location.tz)
produces time traces that make sense.
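The difference between the two expressions can be seen without downloading any forecast data. The sketch below assumes, as the proposed fix implies, that the model times arrive as timezone-naive values expressed in UTC; the sample timestamps are made up for illustration.

```python
import pandas as pd

tz = "America/Phoenix"  # UTC-07:00 year-round

# Assumption: naive timestamps whose clock values are in UTC.
times = pd.Series(pd.to_datetime(["2021-05-19 12:00", "2021-05-19 19:00"]))

# Passing tz= to the constructor stamps the naive clock values with the
# local zone, so 12:00 UTC becomes 12:00 local (a different instant).
stamped = pd.DatetimeIndex(times, tz=tz)

# Localizing to UTC first and then converting keeps the instant and only
# changes how it is displayed: 12:00 UTC becomes 05:00 local.
converted = pd.DatetimeIndex(times).tz_localize("UTC").tz_convert(tz)

print(stamped)
print(converted)
print(stamped - converted)  # constant shift between the two indexes
```

With these inputs the two indexes differ by a constant seven hours, the size of the UTC offset for this timezone.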