
Configuration of the dataset


Loading configuration functions


def load_yaml_configuration(filename: Union[str, Pathy]) -> Configuration

Load a YAML file that contains a configuration.


  • filename - the name of the file to load. Loads from local storage, AWS, or GCP depending on the protocol prefix (e.g. 's3://bucket/config.yaml').

Returns: a pydantic Configuration object

Save functions for the configuration model


def save_yaml_configuration(configuration: Configuration, filename: Optional[Union[str, Pathy]] = None)

Save a configuration to a YAML file.

If filename is None then saves to configuration.output_data.filepath / configuration.yaml.

Saves to GCP, AWS, or local storage, depending on the protocol prefix of the file path.
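The fallback described above can be sketched as follows. This is a minimal illustration, not the library's code: resolve_save_path is a hypothetical helper, and plain strings stand in for the Pathy objects the real function accepts.

```python
from typing import Optional


def resolve_save_path(output_filepath: str, filename: Optional[str] = None) -> str:
    """Pick the file to write to (hypothetical helper, strings instead of Pathy)."""
    if filename is not None:
        # An explicit filename always wins.
        return filename
    # Otherwise fall back to <output_data.filepath>/configuration.yaml.
    # String joining keeps a cloud prefix such as 'gs://' intact.
    return output_filepath.rstrip("/") + "/configuration.yaml"
```

For example, resolve_save_path("gs://bucket/out") gives 'gs://bucket/out/configuration.yaml'.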


Configuration model for the dataset.

All paths must include the protocol prefix. For local files, it is sufficient to start the path with '/'. For AWS, start with 's3://'; for GCP, start with 'gs://'.
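As a rough illustration of the prefix rule, the sketch below classifies a path by how it starts. The helper name storage_backend and its return labels are assumptions made for this example; the library itself may delegate this decision to its file-system layer.

```python
def storage_backend(path: str) -> str:
    """Classify a path by its protocol prefix (hypothetical helper)."""
    if path.startswith("s3://"):
        return "aws"
    if path.startswith("gs://"):
        return "gcp"
    if path.startswith("/"):
        # A leading '/' is enough to mark a local path.
        return "local"
    raise ValueError(f"Path has no recognised protocol prefix: {path!r}")
```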

This file is mostly about configuring the DataSources.

Separate Pydantic models in nowcasting_dataset/data_sources/<data_source_name>/<data_source_name> are used to validate the values of the data itself.

General Objects

class General(BaseModel)

General pydantic model

Git Objects

class Git(BaseModel)

Git model

DataSourceMixin Objects

class DataSourceMixin(BaseModel)

Mixin class, to add forecast and history minutes


def seq_length_30_minutes()

How many steps are there in 30 minute datasets


def seq_length_5_minutes()

How many steps are there in 5 minute datasets


def seq_length_60_minutes()

How many steps are there in 60 minute datasets


def history_seq_length_5_minutes()

How many historical steps are there in 5 minute datasets


def history_seq_length_30_minutes()

How many historical steps are there in 30 minute datasets


def history_seq_length_60_minutes()

How many historical steps are there in 60 minute datasets
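The docstrings above do not state how the step counts are derived. One plausible reading, sketched below, divides the window length by the sampling period and adds one step for the current time. WindowSketch and its two methods are hypothetical stand-ins, and the "+ 1" and the rounding up are assumptions rather than confirmed behaviour.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowSketch:
    """Hypothetical stand-in for the history/forecast fields of DataSourceMixin."""

    history_minutes: int
    forecast_minutes: int

    def seq_length(self, period_minutes: int) -> int:
        # Assumption: history steps + forecast steps + one step for "now".
        total = self.history_minutes + self.forecast_minutes
        return math.ceil(total / period_minutes) + 1

    def history_seq_length(self, period_minutes: int) -> int:
        # Assumption: historical steps only, excluding the "now" step.
        return math.ceil(self.history_minutes / period_minutes)
```

Under these assumptions, 30 minutes of history and 60 minutes of forecast give 19 steps at 5-minute resolution and 4 steps at 30-minute resolution.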

StartEndDatetimeMixin Objects

class StartEndDatetimeMixin(BaseModel)

Mixin class to add start and end date


def check_start_and_end_datetime(cls, values)

Make sure start datetime is before end datetime
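The check can be sketched without pydantic as a dataclass that validates itself on construction. The field names start_datetime and end_datetime are assumptions, and the real class performs this check in a pydantic validator that receives (cls, values).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DateRangeSketch:
    """Hypothetical stand-in for StartEndDatetimeMixin."""

    start_datetime: datetime
    end_datetime: datetime

    def __post_init__(self) -> None:
        # Reject ranges whose start is not strictly before the end.
        if self.start_datetime >= self.end_datetime:
            raise ValueError(
                f"start_datetime ({self.start_datetime}) must be before "
                f"end_datetime ({self.end_datetime})"
            )
```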

PV Objects

class PV(DataSourceMixin, StartEndDatetimeMixin)

PV configuration model

Satellite Objects

class Satellite(DataSourceMixin)

Satellite configuration model

HRVSatellite Objects

class HRVSatellite(DataSourceMixin)

Satellite configuration model for HRV data

OpticalFlow Objects

class OpticalFlow(DataSourceMixin)

Optical Flow configuration model

NWP Objects

class NWP(DataSourceMixin)

NWP configuration model

GSP Objects

class GSP(DataSourceMixin, StartEndDatetimeMixin)

GSP configuration model


def history_minutes_divide_by_30(cls, v)

Validate 'history_minutes'


def forecast_minutes_divide_by_30(cls, v)

Validate 'forecast_minutes'
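The validator names suggest that both values must be divisible by 30. A minimal sketch of such a check, written as a plain function rather than a pydantic validator, might look like this; require_multiple_of_30 is a hypothetical name.

```python
def require_multiple_of_30(field_name: str, minutes: int) -> int:
    """Return minutes unchanged if divisible by 30, else raise (hypothetical helper)."""
    if minutes % 30 != 0:
        raise ValueError(f"{field_name} must be divisible by 30, got {minutes}")
    return minutes
```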

Topographic Objects

class Topographic(DataSourceMixin)

Topographic configuration model

Sun Objects

class Sun(DataSourceMixin)

Sun configuration model

InputData Objects

class InputData(BaseModel)

Input data model.


def default_seq_length_5_minutes()

How many steps are there in 5 minute datasets


def set_forecast_and_history_minutes(cls, values)

Set default history and forecast values, if needed.

Iterates over the data sources and, wherever the forecast or history minutes are not set, sets them to the default values.
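The default-filling step can be sketched with plain dictionaries. The helper fill_default_minutes and the dictionary layout are assumptions for illustration; the real validator works on the pydantic models of each data source.

```python
from typing import Dict, Optional


def fill_default_minutes(
    sources: Dict[str, Dict[str, Optional[int]]],
    default_history_minutes: int,
    default_forecast_minutes: int,
) -> Dict[str, Dict[str, Optional[int]]]:
    """Fill unset history/forecast minutes with defaults (hypothetical helper)."""
    filled = {}
    for name, settings in sources.items():
        updated = dict(settings)  # copy, so the input is left untouched
        if updated.get("history_minutes") is None:
            updated["history_minutes"] = default_history_minutes
        if updated.get("forecast_minutes") is None:
            updated["forecast_minutes"] = default_forecast_minutes
        filled[name] = updated
    return filled
```

Values that are already set, including an explicit 0, are kept as they are.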


def set_all_to_defaults(cls)

Returns an InputData instance with all fields set to their default values.

Used for unit tests.

OutputData Objects

class OutputData(BaseModel)

Output data model


def filepath_pathy(cls, v)

Make sure filepath is a Pathy object

Process Objects

class Process(BaseModel)

Pydantic model of how the data is processed


def local_temp_path_to_path_object_expanduser(cls, v)

Convert the local path to Path

Convert the path in string format to a pathlib.PosixPath object and call expanduser on the latter.
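The conversion maps onto the standard library directly. In the sketch below, to_expanded_path is a hypothetical name; note that pathlib.Path yields a PosixPath only on POSIX systems.

```python
from pathlib import Path


def to_expanded_path(value: str) -> Path:
    """Convert a string to a Path and expand a leading '~' (hypothetical helper)."""
    return Path(value).expanduser()
```

For example, to_expanded_path("~/temp") resolves to the temp directory under the current user's home directory, while an absolute path is returned unchanged.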

Configuration Objects

class Configuration(BaseModel)

Configuration model for the dataset


def set_base_path(base_path: str)

Append base_path to all paths. Mostly used for testing.


def set_git_commit(configuration: Configuration)

Set the git information in the configuration file


  • configuration - configuration object

  • Returns - configuration object with git information