data_sources.gsp

GSP data sources and functions

data_sources.gsp.eso

This module provides functions for getting GSP (Grid Supply Point) information from National Grid ESO.

ESO - Electricity System Operator. General information can be found here - https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points

  • get_gsp_metadata_from_eso: gets the GSP metadata
  • get_gsp_shape_from_eso: gets the shape of the GSP regions
  • get_list_of_gsp_ids: gets a list of gsp_ids, by using 'get_gsp_metadata_from_eso'

Peter Dudfield 2021-09-13

get_gsp_metadata_from_eso

def get_gsp_metadata_from_eso(calculate_centroid: bool = True) -> pd.DataFrame

Get the metadata for the gsp, from ESO.

Arguments:

  • calculate_centroid - If True, also load the shape file and calculate the centroid of each GSP region

  • Returns - Dataframe of ESO Metadata

get_gsp_shape_from_eso

def get_gsp_shape_from_eso(join_duplicates: bool = True, load_local_file: bool = True, save_local_file: bool = False) -> gpd.GeoDataFrame

Get the GSP shape file from ESO (or from a local file)

Arguments:

  • join_duplicates - If True, any RegionIDs that have multiple entries will be joined together to give one entry
  • load_local_file - Load from a local file, not from ESO
  • save_local_file - Save to a local file. This is only needed if the data has been updated.

  • Returns - Geo Pandas dataframe of GSP shape data
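
To illustrate what join_duplicates describes, here is a minimal sketch in plain Python. The helper name join_duplicate_regions, the tuple layout and the coordinates are all hypothetical; the real function works on a GeoDataFrame. In GeoPandas, GeoDataFrame.dissolve(by=...) performs a comparable merge of geometries, but whether this function uses it is not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Polygon = List[Tuple[float, float]]  # a ring of (x, y) vertices


def join_duplicate_regions(rows: List[Tuple[int, Polygon]]) -> Dict[int, List[Polygon]]:
    """Hypothetical helper: collect every polygon that shares a RegionID."""
    joined: Dict[int, List[Polygon]] = defaultdict(list)
    for region_id, polygon in rows:
        joined[region_id].append(polygon)
    return dict(joined)


# Made-up rows: RegionID 1 appears twice and ends up as a single entry.
rows = [
    (1, [(0, 0), (1, 0), (1, 1)]),
    (2, [(5, 5), (6, 5), (6, 6)]),
    (1, [(2, 2), (3, 2), (3, 3)]),
]
joined = join_duplicate_regions(rows)
```

After the call, each RegionID maps to one entry that holds all of its polygon parts.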

get_list_of_gsp_ids

def get_list_of_gsp_ids(maximum_number_of_gsp: Optional[int] = None) -> List[int]

Get list of gsp ids from ESO metadata

Arguments:

  • maximum_number_of_gsp - Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.

  • Returns - list of gsp ids
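
The truncation behaviour of maximum_number_of_gsp can be pictured with a short sketch. The helper name truncate_gsp_ids is hypothetical and the ids are made up; the real function reads them from the ESO metadata.

```python
from typing import List, Optional


def truncate_gsp_ids(
    gsp_ids: List[int], maximum_number_of_gsp: Optional[int] = None
) -> List[int]:
    """Hypothetical helper: keep at most `maximum_number_of_gsp` ids."""
    if maximum_number_of_gsp is None:
        # None disables truncation, so the full list is returned.
        return list(gsp_ids)
    return list(gsp_ids[:maximum_number_of_gsp])


example_ids = [1, 2, 3, 4, 5]  # made-up ids, not real ESO data
first_three = truncate_gsp_ids(example_ids, maximum_number_of_gsp=3)
everything = truncate_gsp_ids(example_ids)
```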

data_sources.gsp.gsp_data_source

GSP Data Source. GSP - Grid Supply Points

Read more https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points

GSPDataSource Objects

@dataclass
class GSPDataSource(ImageDataSource)

Data source for GSP (Grid Supply Point) PV Data.

The 30-minute data is taken from 'PV Live' (https://www.solar.sheffield.ac.uk/pvlive/); the metadata is taken from ESO. PV Live estimates the total PV power generation for each Grid Supply Point region.

Even though GSP data isn't image data, GSPDataSource inherits from ImageDataSource so it can select Grid Supply Point regions within the geospatial region of interest. The region of interest is defined by image_size_pixels and meters_per_pixel.
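
A rough sketch of how image_size_pixels and meters_per_pixel could define a region of interest is shown below. It assumes a square region centred on a given x, y point and selects GSPs by their centroid; both function names and the centroid values are hypothetical, and the actual selection logic in ImageDataSource may differ.

```python
from typing import Dict, List, Tuple

Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def region_of_interest(
    x_center: float, y_center: float, image_size_pixels: int, meters_per_pixel: int
) -> Bounds:
    """Assumed geometry: a square whose side is pixels * meters per pixel."""
    half_side = image_size_pixels * meters_per_pixel / 2
    return (
        x_center - half_side,
        y_center - half_side,
        x_center + half_side,
        y_center + half_side,
    )


def gsp_ids_in_region(centroids: Dict[int, Tuple[float, float]], bounds: Bounds) -> List[int]:
    """Return the ids whose centroid falls inside the bounding box."""
    x_min, y_min, x_max, y_max = bounds
    return [
        gsp_id
        for gsp_id, (x, y) in centroids.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Made-up centroids in meters, keyed by gsp_id.
centroids = {1: (0, 0), 2: (30_000, 10_000), 3: (100_000, 0)}
bounds = region_of_interest(0, 0, image_size_pixels=64, meters_per_pixel=2000)
selected = gsp_ids_in_region(centroids, bounds)
```

With 64 pixels at 2000 meters per pixel the square is 128 km wide, so the third GSP, 100 km from the centre, falls outside it.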

__post_init__

def __post_init__(image_size_pixels: int, meters_per_pixel: int)

Set random seed and load data

sample_period_minutes

@property
def sample_period_minutes() -> int

Override the default sample minutes

load

def load()

Load the meta data and load the GSP power data

datetime_index

def datetime_index()

Return the datetimes that are available

get_locations

def get_locations(t0_datetimes: pd.DatetimeIndex) -> Tuple[List[Number], List[Number]]

Get x and y locations. Assume that all data is available for all GSP.

GSPs are chosen at random and their locations are returned. This is useful because other data sources need to know which x, y locations to load.

Arguments:

  • t0_datetimes - list of available t0 datetimes.

  • Returns - list of x and y locations
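
The random selection described for get_locations might look something like the following sketch. It assumes one GSP is drawn per t0 datetime and that centroids are available as a plain dictionary; the helper name pick_random_locations, the seed argument and the coordinates are hypothetical.

```python
import random
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple


def pick_random_locations(
    t0_datetimes: Sequence[datetime],
    centroids: Dict[int, Tuple[float, float]],
    seed: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: draw one random GSP location per t0 datetime."""
    rng = random.Random(seed)
    gsp_ids = list(centroids)
    x_locations: List[float] = []
    y_locations: List[float] = []
    for _ in t0_datetimes:
        x, y = centroids[rng.choice(gsp_ids)]
        x_locations.append(x)
        y_locations.append(y)
    return x_locations, y_locations


# Made-up centroids and datetimes.
centroids = {1: (100.0, 200.0), 2: (300.0, 400.0), 3: (500.0, 600.0)}
t0_datetimes = [datetime(2021, 6, 1, 12, 0), datetime(2021, 6, 1, 12, 30)]
x_locations, y_locations = pick_random_locations(t0_datetimes, centroids, seed=0)
```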

get_example

def get_example(t0_dt: pd.Timestamp, x_meters_center: Number, y_meters_center: Number) -> GSP

Get data example from one time point (t0_dt) and for x and y coords.

Get data at the location of x,y and get surrounding GSP power data also.

Arguments:

  • t0_dt - datetime of "now". History and forecast are also returned
  • x_meters_center - x location of center GSP.
  • y_meters_center - y location of center GSP.

  • Returns - GSP object containing the GSP data.

drop_gsp_by_threshold

def drop_gsp_by_threshold(gsp_power: pd.DataFrame, meta_data: pd.DataFrame, threshold_mw: int = 20)

Drop GSP where the max power is below a certain threshold

Arguments:

  • gsp_power - GSP power data
  • meta_data - the GSP meta data
  • threshold_mw - the threshold in MW; only GSPs whose maximum power is above this value are kept.

  • Returns - power data and metadata
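
As an illustration of the thresholding, here is a minimal pandas sketch. It assumes gsp_power has one column per gsp_id and that meta_data has a gsp_id column; those layouts, the helper name drop_below_threshold and the numbers are assumptions, not taken from the implementation.

```python
import pandas as pd


def drop_below_threshold(gsp_power: pd.DataFrame, meta_data: pd.DataFrame, threshold_mw: int = 20):
    """Hypothetical helper: keep GSPs whose maximum power exceeds the threshold."""
    max_power = gsp_power.max(axis=0)  # maximum per GSP column
    keep_ids = max_power[max_power > threshold_mw].index
    filtered_power = gsp_power[keep_ids]
    # Assumes the metadata identifies each GSP in a "gsp_id" column.
    filtered_meta = meta_data[meta_data["gsp_id"].isin(keep_ids)]
    return filtered_power, filtered_meta


# Made-up power values in MW, one column per gsp_id.
power = pd.DataFrame({1: [0, 50, 30], 2: [0, 5, 10], 3: [0, 25, 15]})
meta = pd.DataFrame({"gsp_id": [1, 2, 3], "name": ["a", "b", "c"]})
filtered_power, filtered_meta = drop_below_threshold(power, meta, threshold_mw=20)
```

Here GSP 2 peaks at 10 MW, so it is dropped from both the power data and the metadata.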

load_solar_gsp_data

def load_solar_gsp_data(filename: Union[str, Path], start_dt: Optional[datetime] = None, end_dt: Optional[datetime] = None) -> pd.DataFrame

Load solar PV GSP data

Arguments:

  • filename - name of the file to load; 'gs://' paths are also accepted
  • start_dt - the start datetime to trim the data to
  • end_dt - the end datetime to trim the data to

  • Returns - dataframe of pv data
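
The trimming by start_dt and end_dt could be sketched as below. This assumes the data is indexed by datetime and that both bounds are inclusive; the helper name trim_to_datetimes and the sample values are hypothetical.

```python
from datetime import datetime
from typing import Optional

import pandas as pd


def trim_to_datetimes(
    data: pd.DataFrame,
    start_dt: Optional[datetime] = None,
    end_dt: Optional[datetime] = None,
) -> pd.DataFrame:
    """Hypothetical helper: keep rows between start_dt and end_dt (inclusive)."""
    if start_dt is not None:
        data = data[data.index >= start_dt]
    if end_dt is not None:
        data = data[data.index <= end_dt]
    return data


# Six made-up half-hourly readings for a single GSP.
index = pd.date_range("2021-01-01", periods=6, freq="30min")
data = pd.DataFrame({1: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]}, index=index)
trimmed = trim_to_datetimes(
    data, start_dt=datetime(2021, 1, 1, 0, 30), end_dt=datetime(2021, 1, 1, 1, 30)
)
```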

data_sources.gsp.gsp_model

Model for output of GSP data

GSP Objects

class GSP(DataSourceOutput)

Class to store GSP data as a xr.Dataset with some validation

model_validation

@classmethod
def model_validation(cls, v)

Check that no values are NaN
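
The kind of check this validator performs can be pictured with a plain-Python sketch. The real class validates an xr.Dataset; the helper name check_no_nans and the choice of ValueError are assumptions made for illustration.

```python
import math
from typing import Iterable


def check_no_nans(values: Iterable[float]) -> None:
    """Hypothetical check: raise if any value is NaN."""
    for position, value in enumerate(values):
        if math.isnan(value):
            raise ValueError(f"NaN found at position {position}")


check_no_nans([0.0, 1.5, 3.2])  # passes silently

try:
    check_no_nans([0.0, float("nan")])
    nan_detected = False
except ValueError:
    nan_detected = True
```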

data_sources.gsp.pvlive

Functions used to query the PVlive api

load_pv_gsp_raw_data_from_pvlive

def load_pv_gsp_raw_data_from_pvlive(start: datetime, end: datetime, number_of_gsp: int = None, normalize_data: bool = True) -> pd.DataFrame

Load raw PV GSP data from PV Live. Each GSP is loaded separately, and the data is loaded in 30-day chunks.

Arguments:

  • start - the start date for gsp data to load
  • end - the end date for gsp data to load
  • number_of_gsp - The number of gsp to load. Note that on 2021-09-01 there were 338 to load.
  • normalize_data - Option to normalize the generation according to installed capacity

  • Returns - Data frame of time series of gsp data. Shows PV data for each GSP from {start} to {end}
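
Splitting a date range into 30-day chunks, as mentioned in the description, can be sketched as follows. The generator name iter_chunks is hypothetical, and how the real function handles the final partial chunk is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def iter_chunks(
    start: datetime, end: datetime, chunk: timedelta = timedelta(days=30)
) -> Iterator[Tuple[datetime, datetime]]:
    """Hypothetical helper: yield consecutive (chunk_start, chunk_end) pairs."""
    chunk_start = start
    while chunk_start < end:
        # The last chunk is shortened so that it never runs past `end`.
        chunk_end = min(chunk_start + chunk, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


start = datetime(2021, 1, 1)
end = datetime(2021, 3, 15)
chunks = list(iter_chunks(start, end))
```

Each pair could then be used for one request per GSP, keeping every request to at most 30 days of data.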

get_installed_capacity

def get_installed_capacity(start: Optional[datetime] = datetime(2021, 1, 1, tzinfo=pytz.utc), maximum_number_of_gsp: Optional[int] = None) -> pd.Series

Get the installed capacity of each gsp

Getting the full list can take ~30 seconds.

Arguments:

  • start - optional datetime at which the installed capacity is collected
  • maximum_number_of_gsp - Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.

  • Returns - pd.Series of installed capacity indexed by gsp_id
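
A series of installed capacity indexed by gsp_id lends itself to normalising generation data. The sketch below assumes that normalising means dividing each GSP's generation by its installed capacity and that generation has one column per gsp_id; the helper name normalize_by_capacity and all values are made up.

```python
import pandas as pd


def normalize_by_capacity(generation: pd.DataFrame, installed_capacity: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: divide each GSP column by its installed capacity."""
    # The Series index (gsp_id) is aligned with the DataFrame columns.
    return generation.div(installed_capacity, axis="columns")


# Made-up generation and capacity in MW, keyed by gsp_id.
generation = pd.DataFrame({1: [0.0, 5.0, 10.0], 2: [0.0, 20.0, 40.0]})
installed_capacity = pd.Series({1: 10.0, 2: 80.0})
normalized = normalize_by_capacity(generation, installed_capacity)
```

The result expresses each reading as a fraction of installed capacity, which makes GSPs of different sizes comparable.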