The 20th Century Reanalysis: IRData.twcr

This package retrieves and loads data from the Twentieth Century Reanalysis (20CR).

It retrieves the data from the 20CR portal at NERSC.

At the moment, only version ‘2c’ of 20CR is supported for public use. There is limited support for the pre-release of version 3 - versions ‘4.5.*’ - see below.

Only hourly data is supported (no daily or monthly averages) for 5 surface variables:

  • Mean-sea-level pressure: ‘prmsl’
  • 2m air temperature: ‘air.2m’
  • Precipitation rate: ‘prate’
  • 10m zonal wind: ‘uwnd.10m’
  • 10m meridional wind: ‘vwnd.10m’

Data retrieved is stored in directory $SCRATCH/20CR - the ‘SCRATCH’ environment variable must be set.
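Because every fetch and load depends on that environment variable, it can help to check it before starting a long retrieval. The following is a minimal sketch, not part of IRData: the helper name twcr_data_dir is hypothetical, and it only reproduces the $SCRATCH/20CR path described above.

```python
import os


def twcr_data_dir(environ=os.environ):
    """Return the local 20CR directory, $SCRATCH/20CR.

    Hypothetical helper (not an IRData function): it only mirrors
    the directory layout described in the text.
    """
    scratch = environ.get("SCRATCH")
    if not scratch:
        # Fail early with a clear message instead of part-way through a fetch
        raise RuntimeError("The SCRATCH environment variable must be set")
    return os.path.join(scratch, "20CR")
```

Calling twcr_data_dir() with no arguments reads the real environment; passing a dictionary makes the check easy to try out without changing your shell settings.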

For example:

import datetime
import IRData.twcr as twcr
twcr.fetch('prate',
           datetime.datetime(1987,3,12),
           version='2c')

This retrieves precipitation rate data for the selected date. 20CR2c data is fetched in one-calendar-year blocks, so it retrieves data for the whole of 1987. Retrieval is slow, because the data has to be fetched from NERSC, but it only runs if necessary: if that year’s data has already been fetched and is on local disc, the fetch command detects this and returns immediately.

Once the data has been fetched,

pr=twcr.load('prate',
             datetime.datetime(1987,3,12,15,15),
             version='2c')

will then load the precipitation rates at quarter past 3pm on March 12 1987 from the retrieved dataset as an iris.cube.Cube. Because 20CR only provides data at 6-hourly or 3-hourly intervals, the value for 3:15pm is interpolated between the neighbouring outputs (to get uninterpolated data, only call load for times when 20CR has output). Also, as 20CR2c is an ensemble dataset, the result includes all 56 ensemble members.
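To make the interpolation concrete, here is a rough sketch of how linear-in-time weights for the two neighbouring outputs could be computed. It is an illustration only, not the IRData implementation: the function name interpolation_weights and the interval_hours parameter are assumptions, and it assumes outputs are aligned to midnight with an interval that divides 24 hours.

```python
import datetime


def interpolation_weights(dtime, interval_hours=3):
    """Return [(output_time, weight), ...] for linear interpolation to dtime.

    Hypothetical helper: interval_hours is the assumed spacing of the
    reanalysis output (3 or 6 hours), aligned to midnight.
    """
    day_start = dtime.replace(hour=0, minute=0, second=0, microsecond=0)
    interval = datetime.timedelta(hours=interval_hours)
    # Number of whole output intervals between midnight and dtime
    steps = (dtime - day_start) // interval
    before = day_start + steps * interval
    if before == dtime:
        # dtime coincides with an output time: no interpolation needed
        return [(before, 1.0)]
    after = before + interval
    # Fraction of the interval elapsed since the earlier output
    w_after = (dtime - before) / interval
    return [(before, 1.0 - w_after), (after, w_after)]
```

With 3-hourly output, 15:15 falls 15 minutes into the 15:00 to 18:00 interval, so the 18:00 field gets a weight of 1/12 and the 15:00 field the remainder. A time that matches an output returns a single entry with weight 1.0, which corresponds to the uninterpolated case.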

Observations files are also available. They can be fetched with:

import datetime
twcr.fetch_observations(datetime.datetime(1987,3,12,15,15),
                        version='2c')

Observations are also fetched in one-calendar-year blocks, so this will retrieve observations (from NERSC) for the whole of 1987. Again, the retrieval only runs if necessary: if that year’s data has already been fetched and is on local disc, the fetch command detects this and returns immediately.

Once the observations have been fetched, load all the observations valid between two times with:

import datetime
o=twcr.load_observations(datetime.datetime(1987,3,12,6),
                         datetime.datetime(1987,3,12,18),
                         version='2c')

It’s also possible to load all the observations associated with a particular reanalysis field. 20CR assimilates observations every 6 hours, so there is one observations file for each 6-hourly assimilation run. Load all the observations available to the assimilation run for 12 noon on March 12 1987 (as a pandas.DataFrame) with:

o=twcr.load_observations_1file(datetime.datetime(1987,3,12,12),
                               version='2c')

That’s only possible for times that match an assimilation time (hour=0,6,12,18). For in-between times (interpolated fields), load all the observations contributing to the field with:

o=twcr.load_observations_fortime(datetime.datetime(1987,3,12,12),
                                 version='2c')

This gets all the observations from each field used in the interpolation, and assigns a weight to each one - the same as the weight used in interpolating the fields.
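The choice between the two loaders depends only on whether the requested time matches an assimilation time. The sketch below shows one way to make that check; the helper names are hypothetical and not part of IRData, and the assimilation hours are taken from the text above.

```python
import datetime

# Assimilation hours as stated in the text (hour=0,6,12,18)
ASSIMILATION_HOURS = (0, 6, 12, 18)


def is_assimilation_time(dtime):
    """True if dtime falls exactly on a 6-hourly assimilation time."""
    return (dtime.hour in ASSIMILATION_HOURS
            and dtime.minute == 0
            and dtime.second == 0
            and dtime.microsecond == 0)


def observations_loader_name(dtime):
    """Name of the loader suited to dtime (hypothetical helper)."""
    if is_assimilation_time(dtime):
        return "load_observations_1file"
    return "load_observations_fortime"
```

For example, observations_loader_name(datetime.datetime(1987, 3, 12, 12)) selects load_observations_1file, while a time such as 15:15 on the same day selects load_observations_fortime.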

Pre-release version 3

Version numbers beginning ‘4.5.’ (mostly ‘4.5.1’ and ‘4.5.2’) are the pre-release data for 20CRv3. All the functions described above work in the same way, with the major caveat that the data are not yet released, so you can’t just ‘fetch’ them. To get proto-v3 data, first create the data files at NERSC and then fetch them as with 2c, except that the data is downloaded by ssh. This means you need to set up your NERSC account to support passwordless ssh access from your local machine, and add your NERSC account name to the fetch command:

import datetime
import IRData.twcr as twcr
twcr.fetch('prate',
           datetime.datetime(1987,3,12),
           version='4.5.1',
           user='pbrohan')

Note that proto-v3 data is fetched in 1-month blocks (rather than 1-year as for 2c). All the ‘load’ functions then work exactly as for 2c.
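The difference in block size can be summarised in a few lines. This is a sketch under stated assumptions rather than IRData code: fetch_block is a hypothetical helper, and it raises ValueError for unknown versions purely for illustration (the library itself is documented as raising StandardError).

```python
import datetime


def fetch_block(dtime, version):
    """Return the (year, month) block that a fetch for dtime would cover.

    Hypothetical helper: month is None when a whole calendar year is
    fetched (version '2c'); pre-release v3 versions ('4.5.*') are
    fetched one month at a time.
    """
    if version == "2c":
        return (dtime.year, None)
    if version.startswith("4.5."):
        return (dtime.year, dtime.month)
    # Illustrative error type only; not the exception IRData raises
    raise ValueError("Unsupported version: %s" % version)
```

So a request for March 12 1987 covers all of 1987 with version ‘2c’, but only March 1987 with version ‘4.5.1’.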

Note: NERSC is soon to enforce multi-factor authentication which will mess this up. Some changes will be required.


IRData.twcr.fetch(variable, dtime, version='none', user='pbrohan')

Get data for one variable, from the 20CR archive at NERSC.

Data will be stored locally in directory $SCRATCH/20CR, to be retrieved by load(). If the local file that would be produced already exists, this function does nothing.

Parameters:
  • variable (str) – Variable to fetch (e.g. ‘prmsl’).
  • dtime (datetime.datetime) – Date and time to get data for.
  • version (str) – 20CR version to retrieve data for.
  • user (str) – NERSC userid to use in retrieval. Only needed for v3-preliminary data. Defaults to ‘pbrohan’. This should be your NERSC username.
Raises:

StandardError – If version is not a supported value.


IRData.twcr.fetch_observations(dtime, version='none', user='pbrohan')

Get observations from the 20CR archive at NERSC.

Data will be stored locally in directory $SCRATCH/20CR, to be retrieved by load_observations(). If the local files that would be produced already exist, this function does nothing.

For 20CR version 2c, the data is retrieved in calendar-year blocks, so the month and day of dtime are ignored.

Parameters:
  • dtime (datetime.datetime) – Date and time to get observations for.
  • version (str) – 20CR version to retrieve data for.
  • user (str) – NERSC userid to use in retrieval. Only needed for v3-preliminary data. Defaults to ‘pbrohan’. This should be your NERSC username.

Will retrieve the data for the year of the given date-time.

Raises:

StandardError – If version is not a supported value.

IRData.twcr.load(variable, dtime, version=None)

Load requested data from disc, interpolating if necessary.

Data must be available in directory $SCRATCH/20CR, previously retrieved by fetch().

Parameters:
  • variable (str) – Variable to load (e.g. ‘prmsl’).
  • dtime (datetime.datetime) – Date and time to load data for.
  • version (str) – 20CR version to load data from.
Returns:

Global field of variable at time.

Return type:

iris.cube.Cube

Note that 20CR data is only output every 6 hours (prmsl) or 3 hours, so if hour%3!=0, the result may be linearly interpolated in time.

Raises:

StandardError – Version number not supported, or data not on disc - see fetch()

IRData.twcr.load_observations(start, end, version='none', user='pbrohan')

Load observations from disc, for the selected period.

Data must be available in directory $SCRATCH/20CR, previously retrieved by fetch_observations().

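Parameters:
  • start (datetime.datetime) – Start of the period to load observations for.
  • end (datetime.datetime) – End of the period to load observations for.
  • version (str) – 20CR version to load data from.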
Returns:

Dataframe of observations.

Return type:

pandas.DataFrame

Raises:

StandardError – Version number not supported, or data not on disc - see fetch_observations()


IRData.twcr.load_observations_1file(dtime, version='none')

Load observations from disc, that were used in the assimilation run at the time specified.

Data must be available in directory $SCRATCH/20CR, previously retrieved by fetch_observations().

Parameters:
  • dtime (datetime.datetime) – Date and time of assimilation run.
  • version (str) – 20CR version to load data from.
  • user (str) – NERSC userid to use in retrieval. Only needed for v3-preliminary data. Defaults to ‘pbrohan’. This should be your NERSC username.
Returns:

Dataframe of observations.

Return type:

pandas.DataFrame

Raises:

StandardError – Version number not supported, or data not on disc - see fetch_observations()


IRData.twcr.load_observations_fortime(v_time, version='none')

Load observations from disc, that contribute to fields at a given time.

Data must be available in directory $SCRATCH/20CR, previously retrieved by fetch_observations().

At the times when assimilation takes place, all the observations used at that time are provided by load_observations_1file(). This function serves the same purpose, but for intermediate times, where fields are obtained by interpolation. It gets all the observations from each field used in the interpolation, and assigns a weight to each one - the same as the weight used in interpolating the fields.

Parameters:
  • v_time (datetime.datetime) – Get observations associated with this time.
  • version (str) – 20CR version to load data from.
Returns:

Same as from load_observations(), except with an added column ‘weight’ giving the weight of each observation at the given time.

Return type:

pandas.DataFrame

Raises:

StandardError – Version number not supported, or data not on disc - see fetch_observations()