Run Configuration

See here for real life configuration examples

For an example config with all the available options turned on see the Example Config

Run configuration is broken into 6 major sections:

  • global options used by all components

  • img_hosting options used for hosting diagnostic output

  • simulations options related to each case being run, and how case-vs-case comparisons should be configured

  • Post processing all options related to post processing jobs

  • Diags all options related to diagnostic runs

  • Data types definitions of which data types are required, and how to find data files

  • Example Config an example config file

global

The global section has the following keys:

  • project_path (string): This is the base of the processflow project directory tree. All input and output will be stored here under project_path/input/ and project_path/output/. Any required directories will be generated.

  • email (string): This is the email address to send notifications to.

example

[global]
    project_path = /p/user_pub/e3sm/baldwin32/deck/bcrc_spinup  # base path for the project
    email = baldwin32@llnl.gov                                  # my email address so I can get notifications

img_hosting

This is an optional section, only needed if the user would like to turn on web hosting for diagnostic plot output. To turn off output hosting, simply remove this section from the configuration.

  • img_host_server (string): The base url of the webserver, used for constructing the notification email links.

  • host_directory (string): The base directory for where to put output for web hosting, the user must have permission to write here. Directories will be created for each simulation case, with jobs for the case stored below it.

  • url_prefix (string): Notification urls are constructed as: https://{img_host_server}/{url_prefix}/{case}/{diagnostic} the url_prefix is used if the hosting service uses a specific string for your host directory.

example

[img_hosting]
    img_host_server = acme-viewer.llnl.gov                  # this hypothetical run is happen on acme1.llnl.gov, see :ref:`machine_sp`
    host_directory = /var/www/acme/acme-diags/baldwin32
    url_prefix = baldwin32

simulations

This section is used for configuring each case. As many cases can be placed here as the user would like (one or more). The cases can be very different from each other, use different naming conventions (see data_types), and have their data stored in different file structures. The one thing they must all share in common is the start_year and end_year attributes.

  • start_year (int): the first year of data to be used.

  • end_year (int): the last year of data.

A new section is created for each case, allowing very different configs for the different cases.

  • [[CASEID]] (string): this should be the full case name e.g. [[20180129.DECKv1b_piControl.ne30_oEC.edison]]

  • short_name (string): a nice short name for the case, this can be any string identifier for this case

  • native_grid_name (string): the name of the native grid used in the land and atmospheric components

  • native_mpas_grid_name (string): the name of the mpas grid

  • data_types (list): which data types should be copied for this case, this must include all data_types needed for jobs this case will be running in a space seperated list. Can be set to ‘all’ to mean all data types described in the data_types section.

  • job_types (list): which of the job types should be run on this case. Use the keyword ‘all’ to run all defined jobs on this case.

If running diagnostic jobs, the comparisons key must be included

This is the list of comparisons between for each case. Each case running diagnostics should have an entry here, followed by which other cases it should be compared to. This can include the keywords ‘all’ for all possible comparisons, or ‘obs’ for model-vs-obs comparisons. The ‘all’ keyword will add comparisons with each other case as well as model-vs-obs.

comparisons = obs, case_2
    or
comparisons = case_3
    or
comparisons = all

In this example the following comparison diagnostics jobs will be generated:

  • case_1-vs-obs, case_1-vs-case_2

  • case_2-vs-case_3

  • case_3-vs-case_1, case_3-vs-case_2, case_3-vs-obs

Note how case_2-vs-case_3 and case_3-vs-case_2 were both created, to avoid this case_3 could have been set to: obs, case_1.

example

[simulations]
    start_year = 1
    end_year = 2
    [[20180129.DECKv1b_piControl.ne30_oEC.edison]]
        short_name = piControl
        native_grid_name = ne30
        native_mpas_grid_name = oEC60to30v3
        data_types = all
        job_types = all
        comparisons = obs
    [[20180215.DECKv1b_1pctCO2.ne30_oEC.edison]]
        short_name = 1pctCO2
        native_grid_name = ne30
        native_mpas_grid_name = oEC60to30v3
        data_types = all
        job_types = all
        comparisons = 20180129.DECKv1b_piControl.ne30_oEC.edison
    [[20180215.DECKv1b_abrupt4xCO2.ne30_oEC.edison]]
        short_name = abrupt4xCO2
        native_grid_name = ne30
        native_mpas_grid_name = oEC60to30v3
        data_types = atm, lnd
        job_types = e3sm_diags, amwg, climo
        comparisons = all

Post processing

This section of the config is used to configure all post processing jobs. Supported job types are:

Climo

Produces regridded climatologies using ncclimo. Requires the ‘atm’ data type. Uses the following config options:

  • run_frequency (list): a space sepperated list of integers. This list will be used to generate the job start/end years. For example if you have 50 years of data you could set the run_frequency = 10, 25, 50 and you would get sets from years 1-10, 11-20, 21-30, 31-40, 41-50, then 1-25, 26-50, and finally 1-50.

  • destination_grid_name (string): the name of the output grid. This can be any string identifier, its just used to group the output.

  • regrid_map_path (string): the path on the local file system to a regrid map suitable for your data and desired output map.

example

[post-processing]
    [[climo]]
        run_frequency = 2
        destination_grid_name = fv129x256
        regrid_map_path = /p/cscratch/acme/data/map_ne30np4_to_fv129x256_aave.20150901.nc
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

Timeseries

Produces single-variable-per-file timeseries files from monthly model output files. Optionally regrids the timeseries output files.

  • run_frequency (int list): a space sepperated list of integers. This list will be used to generate the job start/end years. For example if you have 50 years of data you could set the run_frequency = 10, 25, 50 and you would get sets from years 1-10, 11-20, 21-30, 31-40, 41-50, then 1-25, 26-50, and finally 1-50.

  • destination_grid_name (string): the name of the output grid. This can be any string identifier, its just used to group the output.

  • regrid_map_path (string): the path on the local file system to a regrid map suitable for your data and desired output map.

  • atm (string list): include this key followed by variable names for each atmospheric variable you would like extracted (remote key to turn off atm timeseries generation)

  • lnd (string list): include this key followed by variable names for each land variable you would like extracted (remote key to turn off lnd timeseries generation)

  • ocn (string list): include this key followed by variable names for each ocean variable you would like extracted (remote key to turn off ocn timeseries generation)

example

[post-processing]
    [[timeseries]]
        run_frequency = 2
        destination_grid_name = fv129x256
        regrid_map_path = /p/cscratch/acme/data/map_ne30np4_to_fv129x256_aave.20150901.nc
        lnd = SOILICE, SOILLIQ, SOILWATER_10CM, QINTR, QOVER, QRUNOFF, QSOIL, QVEGT, TSOI
        atm = FSNTOA, FLUT, FSNT, FLNT, FSNS, FLNS, SHFLX, QFLX, PRECC, PRECL, PRECSC, PRECSL, TS, TREFHT
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

Regrid

Translates model output files from one grid into another. Regridding is supported for atm, lnd, and ocn data types. Each regrid type requires its own config section, see example below. To turn off a data type, remove it from the config.

example

[post-processing]
    [[regrid]]
        [[[lnd]]]
            source_grid_path = /export/zender1/data/grids/ne30np4_pentagons.091226.nc
            destination_grid_path = /export/zender1/data/grids/129x256_SCRIP.20150901.nc
            destination_grid_name = fv129x256
        [[[atm]]]
            regrid_map_path = /p/cscratch/acme/data/map_ne30np4_to_fv129x256_aave.20150901.nc
            destination_grid_name = fv129x256
        [[[ocn]]]
            regrid_map_path = ~/grids/map_oEC60to30v3_to_0.5x0.5degree_bilinear.nc
            destination_grid_name = 0.5x0.5degree_bilinear
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

Diags

This section contains all config options for diagnostic jobs. Currently supported diagnostics are:

AMWG

The AMWG diagnostic suite needs the ‘atm’ data type, and is dependent on the ‘climo’ job type.

  • run_frequency (list): a comma sepperated list of integers. This list will be used to generate the job start/end years. For example if you have 50 years of data you could set the run_frequency = 10, 25, 50 and you would get sets from years 1-10, 11-20, 21-30, 31-40, 41-50, then 1-25, 26-50, and finally 1-50.

  • diag_home (string): the path to where on the local file system the amwg code is located. All amwg jobs will be executed from this directory.

  • sets (list): the list of AMWG sets to run, or set to ‘all’ to run all sets

example

[diags]
    [[amwg]]
        run_frequency = 2
        diag_home = /p/cscratch/acme/amwg/amwg_diag
        sets = 2, 3, 4, 4a, 5, 6, 15
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

e3sm_diags

The e3sm_diags suite needs the ‘atm’ data type, and is dependent on the ‘climo’ job type.

  • run_frequency (list): a comma sepperated list of integers. This list will be used to generate the job start/end years. For example if you have 50 years of data you could set the run_frequency = 10, 25, 50 and you would get sets from years 1-10, 11-20, 21-30, 31-40, 41-50, then 1-25, 26-50, and finally 1-50.

  • backend (string): which graphing backend to use for generating the plots. Supported options are ‘vcs’ and ‘mpl’.

  • reference_data_path (string): path to local copy of reference observational data.

example

[diags]
    [[e3sm_diags]]
        run_frequency = 2
        backend = mpl
        reference_data_path = /p/cscratch/acme/data/obs_for_acme_diags
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

Aprime

The aprime diagnostic suite requires the following data types, and is not dependent on any other job types:

  • atm

  • cice

  • cice_restart

  • cice_streams

  • cice_in

  • ocn

  • ocn_restart

  • ocn_streams

  • ocn_in

  • meridionalHeatTransport

To run aprime, your system must have the latest version of the aprime code available. If this is not the case, simply clone the aprime repo.

example

[diags]
    [[aprime]]
        run_frequency = 2
        aprime_code_path = /p/cscratch/acme/data/a-prime
        [[[custom_args]]] # OPTIONAL SLURM ARGUMENTS
            --partition = regular
            --account = e3sm

MPAS-Analysis

For the complete mpas-analysis documentation see the MPAS_Documentation

The MPAS-Analysis diagnostic suite requires the following data types:

  • cice

  • cice_restart

  • cice_streams

  • cice_in

  • ocn

  • ocn_restart

  • ocn_streams

  • ocn_in

  • meridionalHeatTransport

The mpas-analysis job has the following config keys:

  • mapping_directory: this is the path to the directory containing map files.

  • generate_plots: a list of plots to generate

  • start_year_offset, optional: the time series start offset

  • ocn_obs_data_path, optional: if the ocean observations are stored in a custom location

  • seaice_obs_data_path, optional: if the seaice observations are stored in a custom location

  • region_mask_path, optional: if the region masks are stored in a custom location

  • ocean_namelist_name: the filename of the ocean namelist file, typically mpas-o_in or mpaso_in

  • seaice_namelist_name: the filename of the seaice namelist file, typically mpas-cice_in or mpassi_in

example

[[mpas_analysis]]
    mapping_directory = /space2/diagnostics/mpas_analysis/maps
    generate_plots = 'all', 'no_landIceCavities', 'no_eke', 'no_BGC', 'no_icebergs', 'no_min', 'no_max'
    start_year_offset = True
    ocn_obs_data_path = /space2/diagnostics/observations/Ocean/
    seaice_obs_data_path = /space2/diagnostics/observations/SeaIce/
    region_mask_path = /space2/diagnostics/mpas_analysis/region_masks
    ocean_namelist_name = mpaso_in
    seaice_namelist_name = mpassi_in

Data types

The data_types section is the most complex and configurable part of the configuration process. The basic structure is that each sub-section defines a type of data, and then gives information about where to find the data, where to store the data, and what the file names are going to be. The values for each option are templates, which use substitutions to fill out the information at run time. Each substitution is made with values specific to the case the data is being included as part of. The following strings are used for replacement:

  • CASEID: the full name for the case.

  • YEAR: the year of the data

  • MONTH: the month for the data

  • LOCAL_PATH: if defined for the case, the local_path specified in the case definition (config.simulation.case)

  • REMOTE_PATH: if defined for the case, the remote_path from the case definition

  • START_YR: the global start_year

  • END_YR: the global end_year

  • REST_YR: the first year that restart data is available, start_year + 1

  • PROJECT_PATH: the global project_path

These are simply the defaults available for all cases, you can define your own substituions on a case-by-case basis by including the keyword and value in the case definition.

The values for each data type are by default the same for every case, but case specific definitions can be added by creating a new section inside the data type section with the case name. In this example, the my.case.1 remote_path option over rides the default value, and includes a custom substitution keyword. Note that the keyword when defined must be lower case, but when used in the data_type value must be upper case.

[simulations]
    start_year = 1
    end_year = 2
    [[my.case.1]]
        my_custom_keyword = 'isnt-this-nice'
        remote_path = /export/my_user/model_output/my_case
    [[my.case.2]]
        remote_path = /export/my_user/model_output/my_second-case

[data_types]
    [[some_data_type]]
        remote_path = 'REMOTE_PATH/archive/custom_component/hist'
        file_format = 'CASEID.custom.value.YEAR-MONTH.nc'
        local_path  = '/my/local/path/'
        monthly = True
        [[[my.case.1]]]
            remote_path = 'REMOTE_PATH/MY_CUSTOM_KEYWORD/CASEID'

In the below example, all data types are defined for a case that uses short-term-archiving (note the /archive/atm/hist). The atm and lnd types have been defined for the 20180215.DECKv1b_abrupt4xCO2.ne30_oEC.edison case to NOT use short term archiving. For these two data types, the case is expected to use the standard everything-in-the-run-directory method. Note the local_path = ‘LOCAL_PATH/atm’

example

[data_types]
    [[atm]]
        file_format = CASEID.cam.h0.YEAR-MONTH.nc
        local_path = PROJECT_PATH/input/CASEID/atm
        monthly = True
        [[[20180215.DECKv1b_abrupt4xCO2.ne30_oEC.edison]]]
            local_path = LOCAL_PATH/atm
    [[lnd]]
        file_format = CASEID.clm2.h0.YEAR-MONTH.nc
        local_path = PROJECT_PATH/input/CASEID/lnd
        monthly = True
        [[[20180215.DECKv1b_abrupt4xCO2.ne30_oEC.edison]]]
            local_path = LOCAL_PATH/lnd
    [[cice]]
        file_format = mpascice.hist.am.timeSeriesStatsMonthly.YEAR-MONTH-01.nc
        local_path = PROJECT_PATH/input/CASEID/ice
        monthly = True
    [[ocn]]
        file_format = mpaso.hist.am.timeSeriesStatsMonthly.YEAR-MONTH-01.nc
        local_path = PROJECT_PATH/input/CASEID/ocn
        monthly = True
    [[ocn_restart]]
        file_format = mpaso.rst.REST_YR-01-01_00000.nc
        local_path = PROJECT_PATH/input/CASEID/rest
        monthly = False
    [[cice_restart]]
        file_format = mpascice.rst.REST_YR-01-01_00000.nc
        local_path = PROJECT_PATH/input/CASEID/rest
        monthly = False
    [[ocn_streams]]
        file_format = streams.ocean
        local_path = PROJECT_PATH/input/CASEID/mpas
        monthly = False
    [[cice_streams]]
        file_format = streams.cice
        local_path = PROJECT_PATH/input/CASEID/mpas
        monthly = False
    [[ocn_in]]
        file_format = mpas-o_in
        local_path = PROJECT_PATH/input/CASEID/mpas
        monthly = False
    [[cice_in]]
        file_format = mpas-cice_in
        local_path = PROJECT_PATH/input/CASEID/mpas
        monthly = False
    [[meridionalHeatTransport]]
        file_format = mpaso.hist.am.meridionalHeatTransport.START_YR-02-01.nc
        local_path = PROJECT_PATH/input/CASEID/mpas
        monthly = False