Plotting HYCOM/RTOFS SST data in Python

Based on this notebook, which highlights some basics on reading in RTOFS/netCDF output into Python, manipulating that data, and plotting it. For additional examples, Filipe Fernandes has a great example of similar operations on his blog.

    import netCDF4 as nc
    import numpy as np

    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns

Download some RTOFS global output. It's a rather large file, so be sure to comment out this command once it's already downloaded!

#!curl -O

For starters, let's read in the data using the Unidate netCDF4 module. This module gives very low-level access to the netCDF file - basically just its raw contents and metadata. We load the file directly into memory using the Dataset type, and print it to give an overview of its contents.

    fn = ""
    data = nc.Dataset(fn)
    print data
<type 'netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format UNDEFINED):
    Conventions: CF-1.0
    title: HYCOM ATLb2.00
    institution: National Centers for Environmental Prediction
    source: HYCOM archive file
    experiment: 90.9
    history: archv2ncdf2d
    dimensions(sizes): MT(1), Y(3298), X(4500), Layer(1)
    variables(dimensions): float64 MT(MT), float64 Date(MT), int32 Layer(Layer), int32 Y(Y), int32 X(X), float32 Latitude(Y,X), float32 Longitude(Y,X), float32 u_velocity(MT,Layer,Y,X), float32 v_velocity(MT,Layer,Y,X), float32 sst(MT,Y,X), float32 sss(MT,Y,X), float32 layer_density(MT,Layer,Y,X)

Each Dataset object contains a dict-like interface for accessing variables, which can be selected by their var_name from the original netCDF file. Let's select the SST data and inspect it in more detail.

    sst = data.variables['sst']
    print sst
<type 'netCDF4.Variable'>
float32 sst(MT, Y, X)
    coordinates: Longitude Latitude Date
    standard_name: sea_surface_temperature
    units: degC
    _FillValue: 1.26765e+30
    valid_range: [ -3.58215308  34.7584877 ]
    long_name:  sea surf. temp.   [90.9H]
unlimited dimensions: MT
current shape = (1, 3298, 4500)
filling off

This gives us a quick overview of the sst data. It's a 3-dimensional dataset (time, latitude, and longitude). Furthermore, from the inspection of the full dataset, we can see that the latitude and longitude coordinates are actually aliases for a complex, 2D latitude-longitude system underpinning the model coordinate systems.

Note that we can use standard netCDF terminal commands to also inspect the contents of the dataset:

    !ncdump -h {fn}
netcdf rtofs_glo_2ds_f001_1hrly_prog {
    MT = UNLIMITED ; // (1 currently)
    Y = 3298 ;
    X = 4500 ;
    Layer = 1 ;
    double MT(MT) ;
        MT:long_name = "time" ;
        MT:units = "days since 1900-12-31 00:00:00" ;
        MT:calendar = "standard" ;
        MT:axis = "T" ;
    double Date(MT) ;
        Date:long_name = "date" ;
        Date:units = "day as %Y%m%d.%f" ;
        Date:C_format = "%13.4f" ;
        Date:FORTRAN_format = "(f13.4)" ;
    int Layer(Layer) ;
        Layer:units = "layer" ;
        Layer:positive = "down" ;
        Layer:axis = "Z" ;
    int Y(Y) ;
        Y:point_spacing = "even" ;
        Y:axis = "Y" ;
    int X(X) ;
        X:point_spacing = "even" ;
        X:axis = "X" ;
    float Latitude(Y, X) ;
        Latitude:standard_name = "latitude" ;
        Latitude:units = "degrees_north" ;
    float Longitude(Y, X) ;
        Longitude:standard_name = "longitude" ;
        Longitude:units = "degrees_east" ;
        Longitude:modulo = "360 degrees" ;
    float u_velocity(MT, Layer, Y, X) ;
        u_velocity:coordinates = "Longitude Latitude Date" ;
        u_velocity:standard_name = "eastward_sea_water_velocity" ;
        u_velocity:units = "m/s" ;
        u_velocity:_FillValue = 1.267651e+30f ;
        u_velocity:valid_range = -2.127099f, 2.834078f ;
        u_velocity:long_name = " u-veloc. [90.9H]" ;
    float v_velocity(MT, Layer, Y, X) ;
        v_velocity:coordinates = "Longitude Latitude Date" ;
        v_velocity:standard_name = "northward_sea_water_velocity" ;
        v_velocity:units = "m/s" ;
        v_velocity:_FillValue = 1.267651e+30f ;
        v_velocity:valid_range = -2.466692f, 2.390137f ;
        v_velocity:long_name = " v-veloc. [90.9H]" ;
    float sst(MT, Y, X) ;
        sst:coordinates = "Longitude Latitude Date" ;
        sst:standard_name = "sea_surface_temperature" ;
        sst:units = "degC" ;
        sst:_FillValue = 1.267651e+30f ;
        sst:valid_range = -3.582153f, 34.75849f ;
        sst:long_name = " sea surf. temp.   [90.9H]" ;
    float sss(MT, Y, X) ;
        sss:coordinates = "Longitude Latitude Date" ;
        sss:standard_name = "sea_surface_salinity" ;
        sss:units = "psu" ;
        sss:_FillValue = 1.267651e+30f ;
        sss:valid_range = 0.2699257f, 40.10468f ;
        sss:long_name = "sea surf. salnity  [90.9H]" ;
    float layer_density(MT, Layer, Y, X) ;
        layer_density:coordinates = "Longitude Latitude Date" ;
        layer_density:standard_name = "sea_water_potential_density" ;
        layer_density:units = "sigma" ;
        layer_density:_FillValue = 1.267651e+30f ;
        layer_density:valid_range = 0.f, 0.f ;
        layer_density:long_name = " density [90.9H]" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :title = "HYCOM ATLb2.00" ;
        :institution = "National Centers for Environmental Prediction" ;
        :source = "HYCOM archive file" ;
        :experiment = "90.9" ;
        :history = "archv2ncdf2d" ;

Another useful package for reading in netCDF or other structured datasets is xray. xray automatically labels coordinate axes and gives a pandas-like interface to manipulating data. For instance, here is an example of selecting the sst data from the netCDF file using xray, and plotting the distribution of temperature values:

    import xray
    ds = xray.open_dataset(fn, decode_times=True)
    sst = ds.sst.values.ravel()
    sst_masked = sst[~np.isnan(sst)]

    ## Masking a numpy array with multiple logical criteria:
    # sst_between_-10_5 = sst[(sst > -10) & (sst < 5)]

    print ds.sst[0]
<xray.DataArray 'sst' (Y: 3298, X: 4500)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])
    MT         datetime64[ns] 2015-07-01T01:00:00.028800
  * Y          (Y) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * X          (X) int32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
    Latitude   (Y, X) float32 -78.64 -78.64 -78.64 -78.64 -78.64 -78.64 ...
    Longitude  (Y, X) float32 74.16 74.24 74.3199 74.4 74.48 74.5601 74.64 ...
    Date       float64 2.015e+07
    standard_name: sea_surface_temperature
    units: degC
    valid_range: [ -3.58215308  34.7584877 ]
    long_name:  sea surf. temp.   [90.9H]


We can also very quickly render the raw data in a cartesian coordinate system using imshow from matplotlib. Note that we invert the second axis (latitude) since imshow defines its coordinate system from top-to-bottom.



A more sophisticated mapping system is provided by the iris/cartopy ecosystem. iris allows us to easily open structured climate data sets (of which this model output is an example), and automatically handles processing of metadata, such as timestamps and coordinate systems. It also has easy-to-use machinery for reprojecting, slicing through, or other basic manipulations of the data.

    import as ccrs

    import iris
    import iris.plot as iplt
    import iris.quickplot as qplt

It's easy enough to load a dataset using iris; just open the raw netCDF file, and we'll have a list of Cubes for each variable in it, each containing information on the structure of the coordinate system.

    cubes = iris.load(fn)
    print cubes
0: eastward_sea_water_velocity / (m/s) (time: 1; Layer: 1; latitude: 3298; longitude: 4500)
1: northward_sea_water_velocity / (m/s) (time: 1; Layer: 1; latitude: 3298; longitude: 4500)
2: sea_surface_salinity / (unknown)    (time: 1; latitude: 3298; longitude: 4500)
3: sea_surface_temperature / (degC)    (time: 1; latitude: 3298; longitude: 4500)
4: sea_water_potential_density / (unknown) (time: 1; Layer: 1; latitude: 3298; longitude: 4500)

We can extract a single variable (Cube) from the cube list using indexing notation, or the extract function. Printing the cube tells us all the metadata we need to know about it.

    #sst_c = cubes[3]
    sst_c = cubes.extract("sea_surface_temperature", strict=True)
    print sst_c
sea_surface_temperature / (degC)    (time: 1; latitude: 3298; longitude: 4500)
     Dimension coordinates:
          time                           x            -                -
          latitude                       -            x                -
          longitude                      -            -                x
     Auxiliary coordinates:
          date                           x            -                -
          latitude                       -            x                x
          longitude                      -            x                x
             modulo='360 degrees'
          Conventions: CF-1.0
          experiment: 90.9
          history: archv2ncdf2d
          institution: National Centers for Environmental Prediction
          source: HYCOM archive file
          title: HYCOM ATLb2.00

Generally speaking, the plotting machinery of matplotlib or cartopy can't handle 2D coordinate systems very well. So we'll try to re-project the data into a standard system. There are caveats here:

  1. The project function used below is not fancy; it applies a nearest-neighbor algorithm, so it does not conserve anything about the data. Ideally, you'd want to use something like a bilinear interpolation method to migrate data to the new coordinate system.

  2. This particular dataset is weird; the "raw" x-y coordinate system do not have names in the original netCDF file, but iris tags them as redundant latitude-longitude coordinates. It still retains the true 2D grid as Auxiliary coords, so we can just delete the mis-named coordinates without harming ourselves too much.

  3. The projection method is kind of slow, so we slice through a subset of the data as an example.

  4. We use a Plate Carree projection, but any one could do.

    import iris.analysis.cartography as iac
    sst_d = sst_c[0, ::10, ::10]


    new_sst, extent = iac.project(sst_d, ccrs.PlateCarree())

    print new_sst
sea_surface_temperature / (degC)    (projection_y_coordinate: 330; projection_x_coordinate: 450)
     Dimension coordinates:
          projection_y_coordinate                           x                             -
          projection_x_coordinate                           -                             x
     Auxiliary coordinates:
          latitude                                          x                             x
          longitude                                         x                             x
     Scalar coordinates:
          date: 20150701.0417
          time: 2015-07-01 01:00:00
          Conventions: CF-1.0
          experiment: 90.9
          history: archv2ncdf2d
          institution: National Centers for Environmental Prediction
          source: HYCOM archive file
          title: HYCOM ATLb2.00

We can now easily plot the dataset using iris' interface to matplotlib, as a proof of concept:



As a slightly fancier example, let's just plot SSTs in the vicinity of the Gulf Stream.

First, we should extract a 'rectangular' box of SST data around the US Atlantic coast form the original dataset. iris has problems with 2D coordinate systems (most tools will!), so Filipe Fernandes has built a package of workarounds that we could use to help. For simplicity, we'll just write a function for extracting the correct points from the 2D data (stolen form Filipe!)

    sst_d = sst_c[:]


    # add 360 to lon coord since it's not symmetric about the prime meridian
    #bbox = [-78.+360., 34., -75.+360., 42.]
    bbox = [-82.0+360., 32., -66.0+360., 45.]

    minmax = lambda x: (np.min(x), np.max(x))

    def bbox_extract_2Dcoords(cube, bbox):
        Extract a sub-set of a cube inside a lon, lat bounding box
        bbox=[lon_min lon_max lat_min lat_max].
        NOTE: This is a work around too subset an iris cube that has
        2D lon, lat coords.

        lons = cube.coord('longitude').points
        lats = cube.coord('latitude').points

        lons_inregion = np.logical_and(lons > bbox[0], lons < bbox[2])
        lats_inregion = np.logical_and(lats > bbox[1], lats < bbox[3])
        inregion = np.logical_and(lons_inregion, lats_inregion)

        region_inds = np.where(inregion)
        imin, imax = minmax(region_inds[0])
        jmin, jmax = minmax(region_inds[1])
        return cube[..., imin:imax+1, jmin:jmax+1]

    sst_sub = bbox_extract_2Dcoords(sst_d, bbox)

    print sst_sub
sea_surface_temperature / (degC)    (time: 1; -- : 209; -- : 199)
     Dimension coordinates:
          time                           x       -         -
     Auxiliary coordinates:
          date                           x       -         -
          latitude                       -       x         x
          longitude                      -       x         x
          Conventions: CF-1.0
          experiment: 90.9
          history: archv2ncdf2d
          institution: National Centers for Environmental Prediction
          source: HYCOM archive file
          title: HYCOM ATLb2.00

As a reminder, we can collapse a dimension using a mathematical operation rather than just index it using numpy array slicing/indexing notation. Thus, we can quickly whip up a view of the sub-sliced SST data:

    sst_proc = sst_sub.collapsed('time', iris.analysis.MEAN)
    #print sst_proc.shape



What's especially nice now is that we have a much simpler X-Y/lat-lon coordinate system that doesn't deal with seams across the prime meridian or equator. So we don't even have to re-project our data! We can just plot it using the normal matplotlib routines. Below is an example of leveraging cartopy/matplotlib to build production-quality cartographic images.

    # Extra imports for annotating our map/plot
    from cartopy.feature import NaturalEarthFeature, COLORS
    from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

    # Control feature/boundary resolution
    FEATURE_RES = '50m' # '10m' or '110m'

    # Extract the data and coordinate system, masking any invalid
    # data points for convenience =
    lon = sst_proc.coord('longitude').points
    lat = sst_proc.coord('latitude').points

    # Set up the plotting canvas
    fig = plt.figure()
    ax = fig.add_subplot(111, projection=ccrs.PlateCarree())

    # Set map extent
    ax.set_extent([bbox[0], bbox[2], bbox[1], bbox[3]])

    # Add land political boundaries to map
    land_data = NaturalEarthFeature('physical', 'land', FEATURE_RES,
                                    edgecolor='face', # edgecolor == facecolor
                                    facecolor=COLORS['land']) # a nice default
    states = NaturalEarthFeature('cultural',
                                 edgecolor='grey', facecolor='none')
    ax.coastlines(resolution=FEATURE_RES, lw=1.15)

    # Add some gridlines and use cartopy's convenience functions
    # for formatting them
    # Note - disabled for now, just change 'color' to enable
    grid_style = dict(color='none', linestyle='-', lw=1.5)
    gl = ax.gridlines(draw_labels=True, **grid_style)
    gl.xlabels_top = gl.ylabels_right = False
    gl.xformatter = LONGITUDE_FORMATTER
    gl.yformatter = LATITUDE_FORMATTER
    label_style = dict(size=11, color='black', weight='regular',
    gl.xlabel_style = gl.ylabel_style = label_style

    # Plot the actual data, using the projection system as a coordinate
    # transform (I think this is right?)
    pc = ax.pcolormesh(lon, lat,, transform=ccrs.PlateCarree(),
    cbar = fig.colorbar(pc, ax=ax, shrink=0.9), labelpad=15., rotation=-90.,
                       fontdict=dict(size=12, weight='bold'))

    ax.set_title("SST data from HYCOM/RTOFS", loc='left',
                 fontdict=dict(size=12, weight='bold'))