Input validation

The acud input data validation runs two types of checks:

  • yaml (files) validation
  • csv (files) validation

yaml validation covers both run and model configuration files and runs checks such as:

  • Can the run and model configuration yaml files be found?
  • Do the configuration yaml files include all required fields?
  • Are the provided field values of the right type and within the allowed values?
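To make the second and third checks concrete, here is a minimal plain-Python sketch of a required/type/allowed check on an already parsed configuration mapping. It is an illustration only and not acud's implementation: the rule layout, the check_config helper and the "mode" field are assumptions made for this example. Only the startperiod and endperiod field names come from this document.

```python
# Illustrative only: a hand-rolled check of a parsed configuration mapping.
# The rule layout and the "mode" field are assumptions, not acud's schema.

RULES = {
    "startperiod": {"required": True, "type": str},
    "endperiod": {"required": True, "type": str},
    "mode": {"required": False, "type": str, "allowed": ["fast", "full"]},
}


def check_config(config, rules):
    """Return a list of human-readable problems found in config."""
    problems = []
    for name, rule in rules.items():
        if name not in config:
            # A missing field is only a problem when the rule requires it.
            if rule.get("required"):
                problems.append(f"{name}: required field is missing")
            continue
        value = config[name]
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: value {value!r} is not allowed")
    return problems


good = {"startperiod": "2018-01-02 23:00", "endperiod": "2018-01-03 23:00"}
bad = {"startperiod": 20180102, "mode": "magic"}

print(check_config(good, RULES))
for problem in check_config(bad, RULES):
    print(problem)
```

The second mapping triggers one problem of each kind: a wrong type, a missing required field and a value outside the allowed set.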

csv validation performs the following checks on input csv files:

  • Structural checks - ensure that there are no empty rows, no blank headers, etc.
  • Content checks - ensure that the values have the correct types (“string”, “number”, etc.), that their values are allowed (“datetime format must be either dtname or ymdp”), and that they respect the constraints (“forced outage must be a number greater than 0”).
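The two kinds of csv checks can be sketched with Python's standard csv module. This is a simplified illustration and not how acud performs the checks. The sample table, its column names (name, forced_outage) and the check_csv helper are assumptions made for this example. Only the constraint "forced outage must be a number greater than 0" is taken from the text above.

```python
import csv
import io

# Hypothetical generator table; the column names are assumptions.
SAMPLE = """name,forced_outage
gen_a,0.05
,
gen_b,-1
"""


def check_csv(text):
    """Return a list of structural and content problems in csv text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]

    # Structural checks: blank headers and empty rows.
    for position, column in enumerate(header, start=1):
        if not column.strip():
            problems.append(f"column {position}: blank header")
    for number, row in enumerate(body, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number}: empty row")
            continue
        # Content check: forced_outage must be a number greater than 0.
        record = dict(zip(header, row))
        try:
            value = float(record.get("forced_outage", ""))
        except ValueError:
            problems.append(f"row {number}: forced_outage is not a number")
            continue
        if value <= 0:
            problems.append(f"row {number}: forced_outage must be > 0")
    return problems


for problem in check_csv(SAMPLE):
    print(problem)
```

Row numbers count the header as row 1, so the sample reports an empty row at row 3 and a constraint violation at row 4.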

Basic usage

From the command line, change the current directory to the case folder you want to inspect:

cd my_case_folder

Validate yaml configuration files only:

ad validate RUN

where RUN stands for the name of the run configuration yaml file (including the file extension, e.g. run_base_case.yml).

Run the yaml validation and display the names of the configuration files that were inspected:

ad validate RUN --verbose

Run both yaml and csv validations:

ad validate RUN --all

or just:

ad validate RUN -a

Again, display the names of the inspected input data files:

ad validate RUN --all --verbose
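If you want to launch the validation from a Python script rather than typing the commands, a small wrapper along these lines may help. It is a sketch under assumptions: the ad executable must be on your PATH, and both helper names are hypothetical. This document does not state how acud signals validation failures, so the sketch simply returns the completed process for you to inspect.

```python
import subprocess


def build_validate_command(run_file, all_checks=False, verbose=False):
    """Assemble the argument list for `ad validate` from the flags shown above."""
    command = ["ad", "validate", run_file]
    if all_checks:
        command.append("--all")
    if verbose:
        command.append("--verbose")
    return command


def run_validation(run_file, case_folder=".", **flags):
    """Run the command inside case_folder and return the completed process."""
    command = build_validate_command(run_file, **flags)
    return subprocess.run(command, cwd=case_folder, capture_output=True, text=True)


print(build_validate_command("run_base_case.yml", all_checks=True, verbose=True))
```

Passing case_folder replaces the cd step: the command runs with that folder as its working directory.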

Built-in schema files contain the input data validation rules applied to both yaml and csv files, e.g. whether a field is required or not, a field's name, type, allowed values, etc.

List the built-in schema files:

ad schema -l ACME

ACME above is a placeholder: any string will do, but you do need to provide one.

Display a built-in schema specification, e.g. for generators:

ad schema generator

If you find the schema difficult to read on screen, you can write it to a file for easier inspection:

ad schema generator > check_schema.txt

This will write a text file named check_schema.txt into your current directory.

List all available acud commands and a brief description of each one of them:

ad --help

Get help about the new validation commands:

ad validate --help

or:

ad schema --help

Important usage notes

Datetime formats

Run configuration yaml files include the fields startperiod and endperiod, which expect datetime values. Only one datetime format is allowed for these fields:

%Y-%m-%d %H:%M

This is a slightly modified version of the ISO 8601 standard datetime format: we drop the seconds and the time zone, and replace the ‘T’ character separating date and time with a blank space. This means that in the run configuration yaml file, January 2, 2018 at 11 pm looks like this:

2018-01-02 23:00
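You can try a value against this format with Python's standard datetime module before putting it in the run configuration file. The sketch below only shows how the format string behaves under datetime.strptime. The parse_run_datetime helper is hypothetical, and acud's own parsing may differ in detail.

```python
from datetime import datetime

RUN_DATETIME_FORMAT = "%Y-%m-%d %H:%M"


def parse_run_datetime(text):
    """Parse a startperiod/endperiod value; raises ValueError on a format mismatch."""
    return datetime.strptime(text, RUN_DATETIME_FORMAT)


moment = parse_run_datetime("2018-01-02 23:00")
print(moment.isoformat())  # 2018-01-02T23:00:00

# Values with a 'T' separator or with seconds do not match the format.
for candidate in ("2018-01-02T23:00", "2018-01-02 23:00:00"):
    try:
        parse_run_datetime(candidate)
    except ValueError:
        print(f"rejected: {candidate}")
```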

On the other hand, multiple datetime formats are allowed in the timeseries csv files. These formats need to be specified by the user in the corresponding timeseries datetimeformat field included in the model configuration yaml file.

Datetime column position

Timeseries csv files with dtname format MUST include the datetime field in the leftmost column.

Implementation

Most of the heavy lifting of the acud input data validation is performed by two mature and well-maintained third-party Python packages, namely Cerberus and goodtables-py. Cerberus is a lightweight, extensible data validation library for Python. goodtables-py is a Python framework to validate tabular data.

More information about these packages can be found at http://docs.python-cerberus.org/en/stable/ and https://github.com/frictionlessdata/goodtables-py, respectively. For additional information on table schemas also check https://frictionlessdata.io/specs/table-schema.