data_sources.gsp.gsp_data_source

GSP Data Source. GSP - Grid Supply Points

Read more: https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points

GSPDataSource Objects

@dataclass
class GSPDataSource(ImageDataSource)

Data source for GSP (Grid Supply Point) PV Data.

Half-hourly (30-minute) data is taken from 'PV Live' (https://www.solar.sheffield.ac.uk/pvlive/); metadata is taken from ESO. PV Live estimates the total PV power generation for each Grid Supply Point region.

Even though GSP data isn't image data, GSPDataSource inherits from ImageDataSource so it can select Grid Supply Point regions within the geospatial region of interest. The region of interest is defined by image_size_pixels and meters_per_pixel.
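As a rough illustration of how a region of interest could follow from image_size_pixels and meters_per_pixel, the sketch below computes a bounding box around a centre point and keeps the GSP centroids inside it. The function names, the centroid dictionary and the coordinates are hypothetical and are not part of GSPDataSource.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def bounding_box(
    x_center_osgb: float,
    y_center_osgb: float,
    image_size_pixels_height: int,
    image_size_pixels_width: int,
    meters_per_pixel: int,
) -> Box:
    """Return (x_min, y_min, x_max, y_max) in meters around the centre."""
    half_width_m = image_size_pixels_width * meters_per_pixel / 2
    half_height_m = image_size_pixels_height * meters_per_pixel / 2
    return (
        x_center_osgb - half_width_m,
        y_center_osgb - half_height_m,
        x_center_osgb + half_width_m,
        y_center_osgb + half_height_m,
    )


def gsps_in_region(centroids: Dict[int, Tuple[float, float]], box: Box) -> List[int]:
    """Return the ids of GSPs whose centroid lies inside the box."""
    x_min, y_min, x_max, y_max = box
    return [
        gsp_id
        for gsp_id, (x, y) in centroids.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Made-up centroids (OSGB meters), for illustration only.
centroids = {
    1: (500_000.0, 200_000.0),
    2: (510_000.0, 205_000.0),
    3: (700_000.0, 900_000.0),
}
box = bounding_box(500_000, 200_000, 64, 64, 2000)
selected = gsps_in_region(centroids, box)
```

With 64 pixels at 2000 meters per pixel the box is 128 km on each side, so only the two nearby centroids are selected.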

__post_init__

def __post_init__(image_size_pixels_height: int, image_size_pixels_width: int,
                  meters_per_pixel: int)

Set random seed and load data

check_input_paths_exist

def check_input_paths_exist() -> None

Check input paths exist. If not, raise a FileNotFoundError.

sample_period_minutes

@property
def sample_period_minutes() -> int

Override the default sample period, in minutes.

get_data_model_for_batch

@staticmethod
def get_data_model_for_batch()

Get the model that is used in the batch

load

def load()

Load the meta data and load the GSP power data

datetime_index

def datetime_index()

Return the datetimes that are available

get_number_locations

def get_number_locations()

Get the number of GSPs

get_all_locations

def get_all_locations(
        t0_datetimes_utc: pd.DatetimeIndex) -> List[SpaceTimeLocation]

Make locations for all GSPs

For the given datetimes, return locations for every combination of datetime and GSP, so that a national forecast can then be made.

Arguments:

  • t0_datetimes_utc - list of available t0 datetimes.

Returns:

List of space-time locations, each of which includes a datetime, an x location, a y location and a GSP id.
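The "all datetimes and all GSPs" behaviour can be pictured as a Cartesian product. The sketch below is only an illustration: ExampleLocation is a hypothetical stand-in for SpaceTimeLocation with assumed field names, and the centroids are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import product
from typing import Dict, List, Sequence, Tuple


@dataclass
class ExampleLocation:
    """Hypothetical stand-in for SpaceTimeLocation (field names assumed)."""

    t0_datetime_utc: datetime
    x_location_osgb: float
    y_location_osgb: float
    gsp_id: int


def all_locations_sketch(
    t0_datetimes_utc: Sequence[datetime],
    gsp_centroids: Dict[int, Tuple[float, float]],
) -> List[ExampleLocation]:
    """Pair every t0 datetime with every GSP."""
    return [
        ExampleLocation(t0, x, y, gsp_id)
        for t0, (gsp_id, (x, y)) in product(t0_datetimes_utc, gsp_centroids.items())
    ]


t0_datetimes = [datetime(2021, 6, 1, 12, 0), datetime(2021, 6, 1, 12, 30)]
gsp_centroids = {
    1: (500_000.0, 200_000.0),
    2: (510_000.0, 205_000.0),
    3: (300_000.0, 400_000.0),
}
locations = all_locations_sketch(t0_datetimes, gsp_centroids)
```

Two datetimes and three GSPs give six locations, ordered by datetime first and GSP second.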

get_locations

def get_locations(
        t0_datetimes_utc: pd.DatetimeIndex) -> List[SpaceTimeLocation]

Get x and y locations. Assume that all data is available for all GSPs.

GSPs are chosen at random and their locations are returned. This is useful because other data sources need to know which x, y locations to get.

Arguments:

  • t0_datetimes_utc - list of available t0 datetimes.

Returns:

list of location objects
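A minimal sketch of picking one random GSP per t0 datetime is shown below. It assumes a plain dictionary of GSP centroids and returns tuples rather than SpaceTimeLocation objects; the function name and the seed parameter are hypothetical.

```python
import random
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple


def random_locations_sketch(
    t0_datetimes_utc: Sequence[datetime],
    gsp_centroids: Dict[int, Tuple[float, float]],
    seed: Optional[int] = None,
) -> List[Tuple[datetime, float, float, int]]:
    """For each t0 datetime, pick one GSP at random and return its centroid."""
    rng = random.Random(seed)
    gsp_ids = list(gsp_centroids)
    locations = []
    for t0 in t0_datetimes_utc:
        gsp_id = rng.choice(gsp_ids)
        x, y = gsp_centroids[gsp_id]
        locations.append((t0, x, y, gsp_id))
    return locations


t0_datetimes = [datetime(2021, 6, 1, 12, 0), datetime(2021, 6, 1, 12, 30)]
gsp_centroids = {1: (500_000.0, 200_000.0), 2: (510_000.0, 205_000.0)}
random_locations = random_locations_sketch(t0_datetimes, gsp_centroids, seed=0)
```

Other data sources could then use the x, y values of each tuple to select their own data around the same point.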

get_example

def get_example(location: SpaceTimeLocation) -> xr.Dataset

Get data example from one time point (t0_dt) and for x and y coords.

Get data at the location of x,y and get surrounding GSP power data also.

Arguments:

  • location - A location object of the example, which contains:
      • a timestamp of the example (t0_datetime_utc),
      • the x center location of the example (x_location_osgb),
      • the y center location of the example (y_location_osgb).

Returns:

xr.Dataset with GSP data in it.

drop_gsp_by_threshold

def drop_gsp_by_threshold(
        gsp_power: pd.DataFrame,
        meta_data: pd.DataFrame,
        threshold_mw: int = 20) -> tuple[pd.DataFrame, pd.DataFrame]

Drop GSP where the max power is below a certain threshold

Arguments:

  • gsp_power - GSP power data
  • meta_data - the GSP meta data
  • threshold_mw - only GSPs whose maximum power is greater than or equal to this threshold are kept.

Returns:

power data and metadata
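The thresholding can be sketched with pandas as follows. This is not the library's implementation: it assumes that gsp_power has one column per GSP id and that meta_data has a gsp_id column, and the function name and data are hypothetical.

```python
from typing import Tuple

import pandas as pd


def drop_by_threshold_sketch(
    gsp_power: pd.DataFrame,
    meta_data: pd.DataFrame,
    threshold_mw: int = 20,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Keep only GSPs whose maximum power is >= threshold_mw."""
    max_power = gsp_power.max()  # maximum over time, one value per GSP column
    keep_ids = max_power.index[max_power >= threshold_mw]
    filtered_power = gsp_power[keep_ids]
    # Assumption: the metadata identifies each GSP in a "gsp_id" column.
    filtered_meta = meta_data[meta_data["gsp_id"].isin(keep_ids)]
    return filtered_power, filtered_meta


index = pd.date_range("2021-06-01 12:00", periods=3, freq="30min")
gsp_power = pd.DataFrame(
    {1: [5.0, 25.0, 10.0], 2: [1.0, 2.0, 3.0], 3: [20.0, 0.0, 0.0]},
    index=index,
)
meta_data = pd.DataFrame({"gsp_id": [1, 2, 3], "region_name": ["A", "B", "C"]})
power, meta = drop_by_threshold_sketch(gsp_power, meta_data, threshold_mw=20)
```

GSP 2 never reaches 20 MW and is dropped; GSP 3 is kept because its maximum equals the threshold.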

drop_gsp_north_of_boundary

def drop_gsp_north_of_boundary(
        gsp_power: pd.DataFrame, meta_data: pd.DataFrame,
        northern_boundary_osgb: int) -> tuple[pd.DataFrame, pd.DataFrame]

Drop GSPs north of northern_boundary_osgb.

Arguments:

  • gsp_power - GSP power data
  • meta_data - the GSP meta data
  • northern_boundary_osgb - The geospatial boundary.

Returns:

power data and metadata
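A possible shape for this filter is sketched below. It assumes that meta_data holds each GSP's northing in a centroid_y column and its id in a gsp_id column, and that a GSP exactly on the boundary is kept; these details, and the function name, are assumptions rather than documented behaviour.

```python
from typing import Tuple

import pandas as pd


def drop_north_of_boundary_sketch(
    gsp_power: pd.DataFrame,
    meta_data: pd.DataFrame,
    northern_boundary_osgb: int,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Keep only GSPs whose centroid is not north of the boundary."""
    # Assumption: "centroid_y" is the OSGB northing of the GSP centroid.
    keep_meta = meta_data[meta_data["centroid_y"] <= northern_boundary_osgb]
    keep_power = gsp_power[keep_meta["gsp_id"].tolist()]
    return keep_power, keep_meta


gsp_power = pd.DataFrame({1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]})
meta_data = pd.DataFrame(
    {"gsp_id": [1, 2, 3], "centroid_y": [200_000, 1_100_000, 650_000]}
)
power, meta = drop_north_of_boundary_sketch(
    gsp_power, meta_data, northern_boundary_osgb=700_000
)
```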

load_solar_gsp_data

def load_solar_gsp_data(
        zarr_path: Union[str, Path],
        start_dt: Optional[datetime] = None,
        end_dt: Optional[datetime] = None) -> (pd.DataFrame, pd.DataFrame)

Load solar PV GSP data

Arguments:

  • zarr_path - path of the zarr file to be loaded; 'gs://' paths can also be used
  • start_dt - the start datetime to trim the data to
  • end_dt - the end datetime to trim the data to

Returns:

dataframe of gsp data
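The trimming step on its own can be sketched with pandas label slicing. The example assumes the power data is already in a DataFrame with a sorted datetime index; how the zarr file is actually opened is not shown, and trim_to_period is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

import pandas as pd


def trim_to_period(
    gsp_power: pd.DataFrame,
    start_dt: Optional[datetime] = None,
    end_dt: Optional[datetime] = None,
) -> pd.DataFrame:
    """Trim rows to [start_dt, end_dt]; None leaves that side open."""
    # Label slicing on a sorted DatetimeIndex includes both end points.
    return gsp_power.loc[start_dt:end_dt]


index = pd.date_range("2021-06-01 00:00", periods=4, freq="30min")
gsp_power = pd.DataFrame({1: [0.0, 1.0, 2.0, 3.0]}, index=index)
trimmed = trim_to_period(
    gsp_power, datetime(2021, 6, 1, 0, 30), datetime(2021, 6, 1, 1, 0)
)
untrimmed = trim_to_period(gsp_power)
```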

data_sources.gsp.live

Function to get data from the live database

get_gsp_power_from_database

def get_gsp_power_from_database(
        history_duration: timedelta, interpolate_minutes: int,
        load_extra_minutes: int) -> (pd.DataFrame, pd.DataFrame)

Get gsp power from database

Arguments:

  • history_duration - a timedelta of how far into the past to load data
  • interpolate_minutes - how many minutes the data should be interpolated forward
  • load_extra_minutes - the extra minutes we should load, in order to load more data. This is because some data from a site lags significantly behind 'now'

Returns:

pandas data frame with PV system indexes as columns. The index is the datetime.

data_sources.gsp.pvlive

Functions used to query the PV Live API

load_pv_gsp_raw_data_from_pvlive

def load_pv_gsp_raw_data_from_pvlive(
        start: datetime,
        end: datetime,
        number_of_gsp: int = None,
        normalize_data: bool = True) -> pd.DataFrame

Load raw pv gsp data from pvlive.

Note that each GSP is loaded separately, and the data is loaded in 30-day chunks.

Arguments:

  • start - the start date for gsp data to load
  • end - the end date for gsp data to load
  • number_of_gsp - The number of gsp to load. Note that on 2021-09-01 there were 338 to load.
  • normalize_data - Option to normalize the generation according to installed capacity

Returns:

Data frame of time series of gsp data. Shows PV data for each GSP from {start} to {end}
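Splitting a date range into 30-day chunks could look like the sketch below. The generator name is hypothetical, and the exact chunk boundaries used by the real function (for example, whether neighbouring chunks share an end point) are an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def iter_chunks(
    start: datetime,
    end: datetime,
    chunk: timedelta = timedelta(days=30),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) pairs covering start..end."""
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + chunk, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


chunks = list(iter_chunks(datetime(2021, 1, 1), datetime(2021, 3, 15)))
```

Each pair would then be requested separately for every GSP, and the resulting frames concatenated.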

get_installed_capacity

def get_installed_capacity(
        start: Optional[datetime] = datetime(2021, 1, 1, tzinfo=pytz.utc),
        maximum_number_of_gsp: Optional[int] = None) -> pd.Series

Get the installed capacity of each gsp

This can take ~30 seconds for getting the full list

Arguments:

  • start - optional datetime when the installed capacity is collected
  • maximum_number_of_gsp - Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.

Returns:

pd.Series of installed capacity indexed by gsp_id
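A capacity series indexed by gsp_id lends itself to normalizing generation, as the normalize_data option of load_pv_gsp_raw_data_from_pvlive describes. The sketch below assumes the power frame has one column per gsp_id; the function name, variable names and values are hypothetical.

```python
import pandas as pd


def normalize_by_capacity(
    gsp_power_mw: pd.DataFrame,
    installed_capacity_mwp: pd.Series,
) -> pd.DataFrame:
    """Divide each GSP column by that GSP's installed capacity."""
    # axis="columns" aligns the series index (gsp_id) with the column labels.
    return gsp_power_mw.div(installed_capacity_mwp, axis="columns")


index = pd.date_range("2021-06-01 12:00", periods=2, freq="30min")
gsp_power_mw = pd.DataFrame({1: [50.0, 100.0], 2: [10.0, 20.0]}, index=index)
installed_capacity_mwp = pd.Series({1: 200.0, 2: 40.0})
normalized = normalize_by_capacity(gsp_power_mw, installed_capacity_mwp)
```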

data_sources.gsp

GSP data sources and functions

data_sources.gsp.gsp_model

Model for output of GSP data

GSP Objects

class GSP(DataSourceOutput)

Class to store GSP data as a xr.Dataset with some validation

model_validation

@classmethod
def model_validation(cls, v)

Check that no values are NaN
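The kind of check described here can be sketched with NumPy. This is only an illustration on a plain array; the real validator works on the model's xr.Dataset, and the function name and error message are assumptions.

```python
import numpy as np


def check_no_nans(values: np.ndarray, name: str) -> None:
    """Raise ValueError if any element of values is NaN."""
    if np.isnan(values).any():
        raise ValueError(f"Some {name} values are NaN")


# Passes silently because no value is NaN.
check_no_nans(np.array([0.0, 1.5, 3.0]), "power_mw")
```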

power_normalized

@property
def power_normalized()

Normalized power

data_sources.gsp.eso

This file has a few functions that are used to get GSP (Grid Supply Point) information

The info comes from National Grid ESO.

ESO - Electricity System Operator. General information can be found here - https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points

  • get_gsp_metadata_from_eso - gets the gsp metadata
  • get_gsp_shape_from_eso - gets the shape of the gsp regions
  • get_list_of_gsp_ids - gets a list of gsp_ids, by using 'get_gsp_metadata_from_eso'

Peter Dudfield 2021-09-13

get_gsp_metadata_from_eso

def get_gsp_metadata_from_eso(calculate_centroid: bool = True,
                              load_local_file: bool = True,
                              save_local_file: bool = False) -> pd.DataFrame

Get the metadata for the gsp, from ESO.

Arguments:

  • calculate_centroid - Load the shape file also, and calculate the Centroid
  • load_local_file - Load from a local file, not from ESO
  • save_local_file - Save to a local file; this is only needed if the data is updated.

Returns:

Dataframe of ESO Metadata

get_gsp_shape_from_eso

def get_gsp_shape_from_eso(join_duplicates: bool = True,
                           load_local_file: bool = True,
                           save_local_file: bool = False) -> gpd.GeoDataFrame

Get the GSP shape file from ESO (or a local file)

Arguments:

  • join_duplicates - If True, any RegionIDs which have multiple entries will be joined together to give one entry.
  • load_local_file - Load from a local file, not from ESO
  • save_local_file - Save to a local file; this is only needed if the data is updated.

Returns:

Geo Pandas dataframe of GSP shape data

get_list_of_gsp_ids

def get_list_of_gsp_ids(
        maximum_number_of_gsp: Optional[int] = None) -> List[int]

Get list of gsp ids from ESO metadata

Arguments:

  • maximum_number_of_gsp - Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.

Returns:

list of gsp ids
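The truncation rule, including the "None disables truncation" case, can be sketched as below. The helper name is hypothetical and the ids are made up; the real function reads its ids from the ESO metadata.

```python
from typing import List, Optional, Sequence


def truncate_gsp_ids(
    gsp_ids: Sequence[int],
    maximum_number_of_gsp: Optional[int] = None,
) -> List[int]:
    """Return at most maximum_number_of_gsp ids; None keeps them all."""
    if maximum_number_of_gsp is None:
        return list(gsp_ids)
    return list(gsp_ids)[:maximum_number_of_gsp]


all_ids = truncate_gsp_ids([1, 2, 3, 4, 5])
first_two = truncate_gsp_ids([1, 2, 3, 4, 5], maximum_number_of_gsp=2)
```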