data_sources.gsp.gsp_data_source
GSP Data Source. GSP - Grid Supply Points
Read more https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points
GSPDataSource Objects
@dataclass
class GSPDataSource(ImageDataSource)
Data source for GSP (Grid Supply Point) PV Data.
Half-hourly (30-minute) data is taken from PV Live (https://www.solar.sheffield.ac.uk/pvlive/); metadata is taken from ESO. PV Live estimates the total PV power generation for each Grid Supply Point region.
Even though GSP data isn't image data, GSPDataSource inherits from ImageDataSource so that it can select the Grid Supply Point regions within the geospatial region of interest. The region of interest is defined by image_size_pixels and meters_per_pixel.
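The selection described above can be sketched as follows. This is a minimal illustration, not the GSPDataSource implementation: it assumes the region of interest is an axis-aligned box centred on the example location, `image_size_pixels_width * meters_per_pixel` metres wide and `image_size_pixels_height * meters_per_pixel` metres tall, and that each GSP is represented by a centroid in OSGB metres. `BoundingBox`, `region_of_interest` and `gsp_ids_in_region` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Axis-aligned box in OSGB metres (hypothetical helper)."""

    left: float
    right: float
    bottom: float
    top: float


def region_of_interest(
    x_center_osgb: float,
    y_center_osgb: float,
    image_size_pixels_height: int,
    image_size_pixels_width: int,
    meters_per_pixel: int,
) -> BoundingBox:
    """Box of (pixels * meters_per_pixel) metres, centred on the given point.

    Assumption: the region is a simple box around the example centre.
    """
    half_width = image_size_pixels_width * meters_per_pixel / 2
    half_height = image_size_pixels_height * meters_per_pixel / 2
    return BoundingBox(
        left=x_center_osgb - half_width,
        right=x_center_osgb + half_width,
        bottom=y_center_osgb - half_height,
        top=y_center_osgb + half_height,
    )


def gsp_ids_in_region(
    centroids: dict[int, tuple[float, float]], box: BoundingBox
) -> list[int]:
    """Return the ids of GSPs whose centroid lies inside the box (edges inclusive)."""
    return sorted(
        gsp_id
        for gsp_id, (x, y) in centroids.items()
        if box.left <= x <= box.right and box.bottom <= y <= box.top
    )
```

For example, a 64 x 64 pixel region at 2000 m per pixel spans 128 km in each direction, so a GSP whose centroid lies 70 km east of the centre falls outside it.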
__post_init__
def __post_init__(image_size_pixels_height: int, image_size_pixels_width: int,
meters_per_pixel: int)
Set the random seed and load the data.
check_input_paths_exist
def check_input_paths_exist() -> None
Check input paths exist. If not, raise a FileNotFoundError.
sample_period_minutes
@property
def sample_period_minutes() -> int
Override the default sample period, in minutes.
get_data_model_for_batch
@staticmethod
def get_data_model_for_batch()
Get the model that is used in the batch
load
def load()
Load the metadata and the GSP power data.
datetime_index
def datetime_index()
Return the datetimes that are available
get_number_locations
def get_number_locations()
Get the number of GSPs.
get_all_locations
def get_all_locations(
t0_datetimes_utc: pd.DatetimeIndex) -> List[SpaceTimeLocation]
Make locations for all GSPs.
For the given datetimes, return a location for every combination of datetime and GSP, so that a national forecast can be made.
Arguments:
t0_datetimes_utc
- list of available t0 datetimes.
Returns:
List of space-time locations, each of which includes:
1. the datetime
2. the x location
3. the y location
4. the GSP id
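A minimal sketch of building one location per (datetime, GSP) pair, assuming GSP centroids are available as a `{gsp_id: (x_osgb, y_osgb)}` mapping. `Location` is a hypothetical stand-in for `SpaceTimeLocation`, and its field names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import product


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for SpaceTimeLocation (field names assumed)."""

    t0_datetime_utc: datetime
    x_center_osgb: float
    y_center_osgb: float
    gsp_id: int


def all_locations(
    t0_datetimes_utc: list[datetime],
    gsp_centroids: dict[int, tuple[float, float]],
) -> list[Location]:
    """Return one Location for every combination of t0 datetime and GSP."""
    gsps = sorted(gsp_centroids.items())  # deterministic GSP order
    return [
        Location(t0, x, y, gsp_id)
        for t0, (gsp_id, (x, y)) in product(t0_datetimes_utc, gsps)
    ]
```

With 2 datetimes and 338 GSPs this yields 676 locations, one per GSP for each datetime.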
get_locations
def get_locations(
t0_datetimes_utc: pd.DatetimeIndex) -> List[SpaceTimeLocation]
Get x and y locations. This assumes that data is available for every GSP.
GSPs are chosen at random and their locations are returned. This is useful because other data sources need to know which x, y locations to load.
Arguments:
t0_datetimes_utc
- list of available t0 datetimes.
Returns:
- list of location objects
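A sketch of the random selection, assuming one GSP is drawn uniformly at random (with replacement) for each t0 datetime, and that a seeded `random.Random` stands in for the data source's random seed. `random_locations` is a hypothetical name, and the returned tuples simplify the real location objects.

```python
import random
from datetime import datetime
from typing import Optional


def random_locations(
    t0_datetimes_utc: list[datetime],
    gsp_centroids: dict[int, tuple[float, float]],
    seed: Optional[int] = None,
) -> list[tuple[datetime, float, float, int]]:
    """Pick a random GSP for each t0 and return (t0, x, y, gsp_id) tuples."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    gsp_ids = sorted(gsp_centroids)
    locations = []
    for t0 in t0_datetimes_utc:
        gsp_id = rng.choice(gsp_ids)
        x, y = gsp_centroids[gsp_id]
        locations.append((t0, x, y, gsp_id))
    return locations
```

Other data sources can then load data centred on the returned x, y coordinates.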
get_example
def get_example(location: SpaceTimeLocation) -> xr.Dataset
Get a data example for one time point (t0_dt) at the given x and y coordinates.
Get the data at the (x, y) location, and also the power data of the surrounding GSPs.
Arguments:
location
- A location object for the example, which contains:
  - the timestamp of the example (t0_datetime_utc)
  - the x center location of the example (x_location_osgb)
  - the y center location of the example (y_location_osgb)
Returns:
- xr.Dataset with the GSP data in it.
drop_gsp_by_threshold
def drop_gsp_by_threshold(
gsp_power: pd.DataFrame,
meta_data: pd.DataFrame,
threshold_mw: int = 20) -> tuple[pd.DataFrame, pd.DataFrame]
Drop GSPs whose maximum power is below a given threshold.
Arguments:
gsp_power
- GSP power data
meta_data
- the GSP metadata
threshold_mw
- the threshold in MW; only GSPs whose maximum power is greater than or equal to this value are kept.
Returns:
- power data and metadata
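The filtering rule can be sketched with plain dictionaries. This assumes power is held as `{gsp_id: [power_mw, ...]}` and metadata as `{gsp_id: {...}}`; the real function works on pandas DataFrames, and `drop_gsp_by_threshold_sketch` is a hypothetical name.

```python
def drop_gsp_by_threshold_sketch(
    gsp_power: dict[int, list[float]],
    meta_data: dict[int, dict],
    threshold_mw: float = 20,
) -> tuple[dict[int, list[float]], dict[int, dict]]:
    """Keep only GSPs whose maximum power is >= threshold_mw."""
    keep = {gsp_id for gsp_id, values in gsp_power.items() if max(values) >= threshold_mw}
    # Filter power and metadata with the same set so the two stay aligned.
    power = {gsp_id: values for gsp_id, values in gsp_power.items() if gsp_id in keep}
    meta = {gsp_id: row for gsp_id, row in meta_data.items() if gsp_id in keep}
    return power, meta
```

If the power data were a DataFrame with one column per GSP (an assumed layout), the equivalent boolean mask would be `gsp_power.max() >= threshold_mw`.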
drop_gsp_north_of_boundary
def drop_gsp_north_of_boundary(
gsp_power: pd.DataFrame, meta_data: pd.DataFrame,
northern_boundary_osgb: int) -> tuple[pd.DataFrame, pd.DataFrame]
Drop GSPs north of northern_boundary_osgb.
Arguments:
gsp_power
- GSP power data
meta_data
- the GSP metadata
northern_boundary_osgb
- the geospatial boundary, in OSGB coordinates
Returns:
- power data and metadata
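A comparable sketch for the northern boundary, assuming each metadata entry stores its centroid northing under a hypothetical `centroid_y` key (OSGB metres) and that a GSP lying exactly on the boundary is kept. The real function's column names and boundary handling may differ, and `drop_gsp_north_of_boundary_sketch` is a hypothetical name.

```python
def drop_gsp_north_of_boundary_sketch(
    gsp_power: dict[int, list[float]],
    meta_data: dict[int, dict],
    northern_boundary_osgb: float,
) -> tuple[dict[int, list[float]], dict[int, dict]]:
    """Drop GSPs whose centroid northing is greater than the boundary."""
    # In OSGB, y (northing) increases towards the north.
    keep = {
        gsp_id
        for gsp_id, row in meta_data.items()
        if row["centroid_y"] <= northern_boundary_osgb  # assumed key name
    }
    power = {gsp_id: values for gsp_id, values in gsp_power.items() if gsp_id in keep}
    meta = {gsp_id: row for gsp_id, row in meta_data.items() if gsp_id in keep}
    return power, meta
```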
load_solar_gsp_data
def load_solar_gsp_data(
zarr_path: Union[str, Path],
start_dt: Optional[datetime] = None,
end_dt: Optional[datetime] = None) -> (pd.DataFrame, pd.DataFrame)
Load solar PV GSP data
Arguments:
zarr_path
- path of the Zarr file to be loaded; 'gs://' paths can also be used
start_dt
- the start datetime to trim the data to
end_dt
- the end datetime to trim the data to
Returns:
- dataframe of GSP data
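Trimming to `start_dt` and `end_dt` can be sketched as below, assuming both bounds are inclusive and `None` means no bound on that side. `trim_to_period` is a hypothetical helper; on a sorted pandas DatetimeIndex, label slicing such as `df.loc[start_dt:end_dt]` likewise includes both endpoints.

```python
from datetime import datetime
from typing import Optional


def trim_to_period(
    timestamps: list[datetime],
    start_dt: Optional[datetime] = None,
    end_dt: Optional[datetime] = None,
) -> list[datetime]:
    """Keep timestamps within [start_dt, end_dt]; None leaves that side open."""
    return [
        t
        for t in timestamps
        if (start_dt is None or t >= start_dt) and (end_dt is None or t <= end_dt)
    ]
```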
data_sources.gsp.live
Function to get data from the live database.
get_gsp_power_from_database
def get_gsp_power_from_database(
history_duration: timedelta, interpolate_minutes: int,
load_extra_minutes: int) -> (pd.DataFrame, pd.DataFrame)
Get GSP power from the database.
Arguments:
history_duration
- a timedelta giving how far into the past to load data
interpolate_minutes
- how many minutes to interpolate the data forward for
load_extra_minutes
- the extra minutes to load, so that more data is available. This is needed because data from some sites lags significantly behind 'now'.
Returns:
- pandas DataFrame whose columns are the PV system indexes and whose index is the datetime
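The role of `load_extra_minutes` can be illustrated by computing the time window to query. This is a sketch under the assumption that the extra minutes are simply added to the start of the history window; `query_window` is a hypothetical helper, not part of the module.

```python
from datetime import datetime, timedelta


def query_window(
    now: datetime,
    history_duration: timedelta,
    load_extra_minutes: int,
) -> tuple[datetime, datetime]:
    """Return (start, end) for the database query.

    The start is moved back by load_extra_minutes beyond history_duration,
    so data from sites that lag behind 'now' still falls inside the window.
    """
    start = now - history_duration - timedelta(minutes=load_extra_minutes)
    return start, now
```

For example, with `now` at 12:00, `history_duration=timedelta(hours=1)` and `load_extra_minutes=30`, the query would start at 10:30.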
data_sources.gsp.pvlive
Functions used to query the PV Live API.
load_pv_gsp_raw_data_from_pvlive
def load_pv_gsp_raw_data_from_pvlive(
start: datetime,
end: datetime,
number_of_gsp: int = None,
normalize_data: bool = True) -> pd.DataFrame
Load raw PV GSP data from PV Live.
Note that each GSP is loaded separately, and the data is loaded in 30-day chunks.
Arguments:
start
- the start date for the GSP data to load
end
- the end date for the GSP data to load
number_of_gsp
- the number of GSPs to load. Note that on 2021-09-01 there were 338 GSPs to load.
normalize_data
- option to normalize the generation by installed capacity
Returns:
- data frame of GSP time series, showing PV data for each GSP from {start} to {end}
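Splitting a request into 30-day chunks can be sketched as below. The chunk boundaries (half-open, with the last chunk shortened to end exactly at `end`) are an assumption; `thirty_day_chunks` is a hypothetical name and does not call the PV Live API.

```python
from datetime import datetime, timedelta


def thirty_day_chunks(
    start: datetime,
    end: datetime,
    chunk_length: timedelta = timedelta(days=30),
) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into consecutive chunks no longer than chunk_length."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + chunk_length, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks
```

For 2021-01-01 to 2021-03-01 this gives two chunks: 2021-01-01 to 2021-01-31, then 2021-01-31 to 2021-03-01. Because each GSP is loaded separately, loading many GSPs over a long period multiplies the number of requests.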
get_installed_capacity
def get_installed_capacity(
start: Optional[datetime] = datetime(2021, 1, 1, tzinfo=pytz.utc),
maximum_number_of_gsp: Optional[int] = None) -> pd.Series
Get the installed capacity of each GSP.
Getting the full list can take about 30 seconds.
Arguments:
start
- optional datetime at which the installed capacity is collected
maximum_number_of_gsp
- truncate the list of GSPs to no more than this number. Set to None to disable truncation.
Returns:
- pd.Series of installed capacity, indexed by gsp_id
data_sources.gsp
GSP data sources and functions
data_sources.gsp.gsp_model
Model for output of GSP data
GSP Objects
class GSP(DataSourceOutput)
Class to store GSP data as an xr.Dataset, with some validation
model_validation
@classmethod
def model_validation(cls, v)
Check that no values are NaN.
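The NaN check can be sketched in plain Python as below. The real method is a class-level validator on the `GSP` model operating on an xr.Dataset; `check_no_nans` is a hypothetical stand-in that works on a flat sequence of floats.

```python
import math
from collections.abc import Sequence


def check_no_nans(values: Sequence[float]) -> Sequence[float]:
    """Raise ValueError if any value is NaN, otherwise return the values."""
    nan_count = sum(1 for v in values if math.isnan(v))
    if nan_count:
        raise ValueError(f"{nan_count} NaN value(s) found in GSP data")
    return values
```

Returning the input unchanged follows the usual validator pattern, where a validator either passes the value through or raises.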
power_normalized
@property
def power_normalized()
Normalized power
data_sources.gsp.eso
This file has a few functions that are used to get GSP (Grid Supply Point) information
The info comes from National Grid ESO.
ESO - Electricity System Operator. General information can be found here - https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points
- get_gsp_metadata_from_eso: gets the GSP metadata
- get_gsp_shape_from_eso: gets the shapes of the GSP regions
- get_list_of_gsp_ids: gets a list of gsp_ids, by using 'get_gsp_metadata_from_eso'
Peter Dudfield 2021-09-13
get_gsp_metadata_from_eso
def get_gsp_metadata_from_eso(calculate_centroid: bool = True,
load_local_file: bool = True,
save_local_file: bool = False) -> pd.DataFrame
Get the metadata for the GSPs from ESO.
Arguments:
calculate_centroid
- also load the shape file and calculate the centroid
load_local_file
- load from a local file, not from ESO
save_local_file
- save to a local file; this is only needed if the data is updated
Returns:
- dataframe of ESO metadata
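When `calculate_centroid` is set, each GSP region's centroid is derived from its shape. A library such as GeoPandas exposes this as the `centroid` property of a GeoSeries; the self-contained sketch below shows the underlying shoelace-based formula for one simple polygon. `polygon_centroid` is a hypothetical name, and the module may compute centroids differently (for example, in another coordinate reference system).

```python
def polygon_centroid(vertices: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    Uses C_x = sum((x_i + x_{i+1}) * cross_i) / (6A), with
    cross_i = x_i * y_{i+1} - x_{i+1} * y_i and A = sum(cross_i) / 2.
    Works for clockwise and anticlockwise vertex order.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # 6A == 3 * area2
    return cx / (3 * area2), cy / (3 * area2)
```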
get_gsp_shape_from_eso
def get_gsp_shape_from_eso(join_duplicates: bool = True,
load_local_file: bool = True,
save_local_file: bool = False) -> gpd.GeoDataFrame
Get the GSP shape file from ESO (or from a local file).
Arguments:
join_duplicates
- if True, any RegionIDs that have multiple entries are joined together to give one entry
load_local_file
- load from a local file, not from ESO
save_local_file
- save to a local file; this is only needed if the data is updated
Returns:
- GeoPandas dataframe of GSP shape data
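Joining duplicates implies grouping the rows that share a RegionID. The sketch below shows only that grouping, on hypothetical `(region_id, shape)` pairs; merging each group's shapes into one geometry needs a GIS library, and GeoPandas offers `GeoDataFrame.dissolve(by=...)` for that kind of merge. Whether the module uses `dissolve` is not stated here, and `group_by_region_id` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any


def group_by_region_id(rows: list[tuple[int, Any]]) -> dict[int, list[Any]]:
    """Collect all shapes that share a RegionID, preserving input order."""
    grouped: dict[int, list[Any]] = defaultdict(list)
    for region_id, shape in rows:
        grouped[region_id].append(shape)
    return dict(grouped)
```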
get_list_of_gsp_ids
def get_list_of_gsp_ids(
maximum_number_of_gsp: Optional[int] = None) -> List[int]
Get a list of GSP ids from the ESO metadata.
Arguments:
maximum_number_of_gsp
- truncate the list of GSPs to no more than this number. Set to None to disable truncation.
Returns:
- list of GSP ids
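The `maximum_number_of_gsp` behaviour (the same parameter appears on `get_installed_capacity`) can be sketched as a simple truncation. Keeping the first N ids in their existing order is an assumption, and `truncate_gsp_ids` is a hypothetical helper.

```python
from typing import Optional


def truncate_gsp_ids(
    gsp_ids: list[int], maximum_number_of_gsp: Optional[int] = None
) -> list[int]:
    """Return at most maximum_number_of_gsp ids; None disables truncation."""
    if maximum_number_of_gsp is None:
        return list(gsp_ids)
    return list(gsp_ids[:maximum_number_of_gsp])
```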