data_sources.gsp
GSP data sources and functions
data_sources.gsp.eso
This file contains a few functions for getting GSP (Grid Supply Point) information.
The information comes from National Grid ESO (Electricity System Operator).
General information can be found at https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points
- get_gsp_metadata_from_eso: gets the gsp metadata
- get_gsp_shape_from_eso: gets the shape of the gsp regions
- get_list_of_gsp_ids: gets a list of gsp_ids, by using 'get_gsp_metadata_from_eso'
Peter Dudfield 2021-09-13
get_gsp_metadata_from_eso
def get_gsp_metadata_from_eso(calculate_centroid: bool = True) -> pd.DataFrame
Get the metadata for the gsp, from ESO.
Arguments:
- calculate_centroid: Load the shape file also, and calculate the centroid.
Returns:
- Dataframe of ESO metadata
get_gsp_shape_from_eso
def get_gsp_shape_from_eso(join_duplicates: bool = True, load_local_file: bool = True, save_local_file: bool = False) -> gpd.GeoDataFrame
Get the gsp shape file from ESO (or from a local file).
Arguments:
- join_duplicates: If True, any RegionIDs which have multiple entries will be joined together to give one entry.
- load_local_file: Load from a local file, not from ESO.
- save_local_file: Save to a local file. This is only needed if the data has been updated.
Returns:
- GeoPandas dataframe of GSP shape data
get_list_of_gsp_ids
def get_list_of_gsp_ids(maximum_number_of_gsp: Optional[int] = None) -> List[int]
Get list of gsp ids from ESO metadata
Arguments:
- maximum_number_of_gsp: Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.
Returns:
- list of gsp ids
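The truncation behaviour of maximum_number_of_gsp can be illustrated with a minimal sketch. The helper name truncate_gsp_ids and the example ids are hypothetical; the real function reads its ids from the ESO metadata, which is not reproduced here.

```python
from typing import List, Optional


def truncate_gsp_ids(
    gsp_ids: List[int], maximum_number_of_gsp: Optional[int] = None
) -> List[int]:
    """Return at most `maximum_number_of_gsp` ids; None disables truncation."""
    if maximum_number_of_gsp is None:
        return list(gsp_ids)
    return list(gsp_ids[:maximum_number_of_gsp])


# Made-up ids standing in for the values read from the ESO metadata.
example_ids = [1, 2, 3, 4, 5]
first_three = truncate_gsp_ids(example_ids, maximum_number_of_gsp=3)
all_ids = truncate_gsp_ids(example_ids)
```

Asking for more ids than exist simply returns the full list, since slicing past the end of a list is not an error.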
data_sources.gsp.gsp_data_source
GSP Data Source. GSP - Grid Supply Points
Read more https://data.nationalgrideso.com/system/gis-boundaries-for-gb-grid-supply-points
GSPDataSource Objects
@dataclass
class GSPDataSource(ImageDataSource)
Data source for GSP (Grid Supply Point) PV Data.
The 30-minute data is taken from 'PV Live' (https://www.solar.sheffield.ac.uk/pvlive/), and the metadata is taken from ESO. PV Live estimates the total PV power generation for each Grid Supply Point region.
Even though GSP data isn't image data, GSPDataSource inherits from ImageDataSource so it can select Grid Supply Point regions within the geospatial region of interest.
The region of interest is defined by image_size_pixels and meters_per_pixel.
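As a rough illustration of how image_size_pixels and meters_per_pixel could define a region of interest, the sketch below assumes a square region centred on a point, with a side length of image_size_pixels * meters_per_pixel. That geometry, the helper names, and the coordinates are assumptions made for the example, not a description of the actual implementation.

```python
from typing import Dict, List, Tuple

Bounds = Tuple[float, float, float, float]


def region_of_interest(
    x_center: float, y_center: float, image_size_pixels: int, meters_per_pixel: int
) -> Bounds:
    """Return (x_min, x_max, y_min, y_max) of a square centred on the point."""
    half_side = image_size_pixels * meters_per_pixel / 2
    return (
        x_center - half_side,
        x_center + half_side,
        y_center - half_side,
        y_center + half_side,
    )


def gsps_in_region(gsp_xy: Dict[int, Tuple[float, float]], bounds: Bounds) -> List[int]:
    """Return the ids of the GSPs whose (x, y) location lies inside the bounds."""
    x_min, x_max, y_min, y_max = bounds
    return [
        gsp_id
        for gsp_id, (x, y) in sorted(gsp_xy.items())
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Made-up GSP locations in metres.
locations = {1: (510_000, 310_000), 2: (700_000, 300_000), 3: (440_000, 240_000)}
bounds = region_of_interest(500_000, 300_000, image_size_pixels=64, meters_per_pixel=2000)
selected = gsps_in_region(locations, bounds)
```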
__post_init__
def __post_init__(image_size_pixels: int, meters_per_pixel: int)
Set random seed and load data
check_input_paths_exist
def check_input_paths_exist() -> None
Check input paths exist. If not, raise a FileNotFoundError.
sample_period_minutes
@property
def sample_period_minutes() -> int
Override the default sample minutes
load
def load()
Load the meta data and load the GSP power data
datetime_index
def datetime_index()
Return the datetimes that are available
get_locations
def get_locations(t0_datetimes: pd.DatetimeIndex) -> Tuple[List[Number], List[Number]]
Get x and y locations. Assumes that all data is available for all GSPs.
GSPs are chosen at random and their locations are returned. This is useful because other data sources need to know which x, y locations to get.
Arguments:
- t0_datetimes: list of available t0 datetimes.
Returns:
- list of x and y locations
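The random selection described above can be sketched as follows. This is only an illustration: pick_random_locations, the seed argument, and the location dictionary are hypothetical, and the sketch picks one GSP per requested example without looking at the datetimes themselves.

```python
import random
from typing import Dict, List, Tuple


def pick_random_locations(
    n_examples: int, gsp_xy: Dict[int, Tuple[float, float]], seed: int
) -> Tuple[List[float], List[float]]:
    """Pick one random GSP per example and return parallel lists of x and y."""
    rng = random.Random(seed)  # a local generator keeps the choice reproducible
    gsp_ids = sorted(gsp_xy)
    xs: List[float] = []
    ys: List[float] = []
    for _ in range(n_examples):
        x, y = gsp_xy[rng.choice(gsp_ids)]
        xs.append(x)
        ys.append(y)
    return xs, ys


# Made-up GSP locations in metres.
locations = {1: (510_000, 310_000), 2: (700_000, 300_000), 3: (440_000, 240_000)}
xs, ys = pick_random_locations(4, locations, seed=0)
```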
get_example
def get_example(t0_dt: pd.Timestamp, x_meters_center: Number, y_meters_center: Number) -> GSP
Get data example from one time point (t0_dt) and for x and y coords.
Get data at the location of x,y and get surrounding GSP power data also.
Arguments:
- t0_dt: datetime of "now". History and forecast are also returned.
- x_meters_center: x location of center GSP.
- y_meters_center: y location of center GSP.
Returns:
- Dictionary with GSP data in it.
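To make "history and forecast are also returned" concrete, the sketch below builds the timestamps an example might cover around t0_dt. The parameter names history_minutes and forecast_minutes are assumptions not documented in this section; the 30-minute step matches the PV Live sample period mentioned above.

```python
from datetime import datetime, timedelta
from typing import List


def example_timestamps(
    t0_dt: datetime,
    history_minutes: int,
    forecast_minutes: int,
    sample_period_minutes: int = 30,
) -> List[datetime]:
    """Return evenly spaced timestamps from t0 - history to t0 + forecast, inclusive."""
    step = timedelta(minutes=sample_period_minutes)
    current = t0_dt - timedelta(minutes=history_minutes)
    end = t0_dt + timedelta(minutes=forecast_minutes)
    stamps: List[datetime] = []
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


t0 = datetime(2021, 9, 13, 12, 0)
stamps = example_timestamps(t0, history_minutes=60, forecast_minutes=120)
```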
drop_gsp_by_threshold
def drop_gsp_by_threshold(gsp_power: pd.DataFrame, meta_data: pd.DataFrame, threshold_mw: int = 20)
Drop GSP where the max power is below a certain threshold
Arguments:
- gsp_power: GSP power data
- meta_data: the GSP meta data
- threshold_mw: the threshold in MW. Only GSPs with a maximum power above this value are kept.
Returns:
- power data and metadata
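The thresholding idea can be shown with a pandas-free sketch that uses plain dictionaries in place of the two DataFrames. The helper name and data are hypothetical, and treating "above" as a strict comparison is an assumption.

```python
from typing import Dict, List, Tuple


def drop_below_threshold(
    gsp_power: Dict[int, List[float]],
    meta_data: Dict[int, dict],
    threshold_mw: float = 20,
) -> Tuple[Dict[int, List[float]], Dict[int, dict]]:
    """Keep only GSPs whose maximum power is strictly above the threshold."""
    keep = [
        gsp_id
        for gsp_id, series in gsp_power.items()
        if max(series, default=0.0) > threshold_mw
    ]
    filtered_power = {gsp_id: gsp_power[gsp_id] for gsp_id in keep}
    # Filter the metadata with the same ids so the two outputs stay aligned.
    filtered_meta = {gsp_id: meta_data[gsp_id] for gsp_id in keep if gsp_id in meta_data}
    return filtered_power, filtered_meta


# Made-up power series in MW, keyed by gsp_id.
power = {1: [0.0, 35.0, 12.0], 2: [0.0, 5.0, 19.9], 3: [0.0, 20.0, 20.0]}
meta = {1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "C"}}
kept_power, kept_meta = drop_below_threshold(power, meta, threshold_mw=20)
```

Returning both structures filtered by the same ids mirrors the documented return value of power data and metadata.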
load_solar_gsp_data
def load_solar_gsp_data(zarr_path: Union[str, Path], start_dt: Optional[datetime] = None, end_dt: Optional[datetime] = None) -> pd.DataFrame
Load solar PV GSP data
Arguments:
- zarr_path: path of the zarr file to be loaded. 'gs://' paths can be used here too.
- start_dt: the start datetime to trim the data to
- end_dt: the end datetime to trim the data to
Returns:
- dataframe of pv data
data_sources.gsp.gsp_model
Model for output of GSP data
GSP Objects
class GSP(DataSourceOutput)
Class to store GSP data as an xr.Dataset with some validation
model_validation
@classmethod
def model_validation(cls, v)
Check that all values are non NaNs
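A NaN check of this kind can be sketched in plain Python. The real validator operates on the class's xr.Dataset; the function below is a hypothetical stand-in that works on a flat sequence of floats, and the error type and message are assumptions.

```python
import math
from typing import Iterable, List


def check_no_nans(values: Iterable[float]) -> List[float]:
    """Return the values unchanged, or raise ValueError if any of them is NaN."""
    checked = list(values)
    if any(math.isnan(value) for value in checked):
        raise ValueError("GSP data contains NaN values")
    return checked


valid = check_no_nans([0.0, 1.5, 3.2])
```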
data_sources.gsp.pvlive
Functions used to query the PVlive api
load_pv_gsp_raw_data_from_pvlive
def load_pv_gsp_raw_data_from_pvlive(start: datetime, end: datetime, number_of_gsp: int = None, normalize_data: bool = True) -> pd.DataFrame
Load raw pv gsp data from pvlive.
Note that each gsp is loaded separately. Also the data is loaded in 30 day chunks.
Arguments:
- start: the start date for gsp data to load
- end: the end date for gsp data to load
- number_of_gsp: The number of gsp to load. Note that on 2021-09-01 there were 338 to load.
- normalize_data: Option to normalize the generation according to installed capacity
Returns:
- Data frame of time series of gsp data. Shows PV data for each GSP from {start} to {end}
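Loading in 30-day chunks amounts to splitting the requested range into consecutive windows. The sketch below shows one way to do that split; the helper name is hypothetical, and how the real function handles chunk boundaries is not stated here.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def thirty_day_chunks(
    start: datetime, end: datetime, chunk: timedelta = timedelta(days=30)
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end] into consecutive windows of at most `chunk` length."""
    chunks: List[Tuple[datetime, datetime]] = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + chunk, end)  # the last window may be shorter
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


chunks = thirty_day_chunks(datetime(2021, 1, 1), datetime(2021, 3, 15))
```

Each window would then be requested once per GSP, since each gsp is loaded separately.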
get_installed_capacity
def get_installed_capacity(start: Optional[datetime] = datetime(2021, 1, 1, tzinfo=pytz.utc), maximum_number_of_gsp: Optional[int] = None) -> pd.Series
Get the installed capacity of each gsp
This can take ~30 seconds for getting the full list
Arguments:
- start: optional datetime at which the installed capacity is collected
- maximum_number_of_gsp: Truncate list of GSPs to be no larger than this number of GSPs. Set to None to disable truncation.
Returns:
- pd.Series of installed capacity indexed by gsp_id