brightwind.transform.transform.average_data_by_period

brightwind.transform.transform.average_data_by_period(data, period, aggregation_method='mean', coverage_threshold=None, return_coverage=False)

Averages the data by the time period specified by period.

Aggregates data using the specified aggregation_method; by default, the data is averaged over each period. Can be used to find hourly, daily, weekly, etc. averages or sums. Can also return the coverage of each period and filter the returned data by coverage.
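As a rough illustration of what aggregating by period means, the sketch below applies pandas `Series.resample` directly to a hypothetical 10-minute wind speed series. It is not brightwind's implementation. The series, its name and its values are made up for the example. It uses the lowercase alias `1h`, which recent pandas versions prefer over `1H`.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute wind speed series covering two days (288 records)
index = pd.date_range("2024-01-01", periods=288, freq="10min")
speed = pd.Series(np.linspace(4.0, 8.0, len(index)), index=index, name="Spd80mN")

# Mean of each hour, comparable in spirit to period='1H', aggregation_method='mean'
hourly_mean = speed.resample("1h").mean()

# Other aggregations are selected by name in the same way
daily_max = speed.resample("1D").agg("max")
daily_std = speed.resample("1D").agg("std")
```

Each hourly value is the mean of the six 10-minute records that fall in that hour.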

Parameters
  • data (pandas.Series) – Data to average or otherwise aggregate

  • period (str or pandas.DateOffset) –

    Groups data by the period specified here. The following formats are supported:

    • Set period to 10min for a 10-minute average, 20min for a 20-minute average, and so on for 4min, 15min, etc.

    • Set period to 1H for an hourly average, 3H for a three-hourly average, and so on for 5H, 6H, etc.

    • Set period to 1D for a daily average, 3D for a three-day average, and similarly 5D, 7D, 15D, etc.

    • Set period to 1W for a weekly average, 3W for a three-week average, and similarly 2W, 4W, etc.

    • Set period to 1M for a monthly average

    • Set period to 1AS for an annual average

    • A pandas DateOffset object can also be used

  • aggregation_method (str) – Defaults to mean, which returns the mean of the data for each period. Can also be median, prod, sum, std, var, max or min, which are shorthands for median, product, summation, standard deviation, variance, maximum and minimum respectively.

  • coverage_threshold (float) – Coverage is defined as the ratio of the number of data points present in a period to the maximum number of data points that period could have. For example, for 10-minute data and a period of 1 hour, the maximum number of data points in one period is 6; if only 3 data points are available for that hour, the coverage is 3/6 = 0.5. The threshold should be greater than 0 and less than or equal to 1. It is set to None by default. If it is None or 0, the data is not filtered; otherwise, periods whose coverage is less than coverage_threshold are removed.

  • return_coverage (bool) – If True, appends an additional column to the returned DataFrame containing the coverage calculated for each period. The coverage columns are named <column name>_Coverage.
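The coverage definition above can be sketched in pandas by dividing the number of records present in each period by the maximum number the period could hold. This is a minimal sketch, not brightwind's code. It assumes the data resolution (10 minutes) is known in advance and that every period has the same length. The series and its values are hypothetical. The coverage column name follows the <column name>_Coverage convention described above.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series spanning two hours (12 records)
index = pd.date_range("2024-01-01", periods=12, freq="10min")
speed = pd.Series(np.arange(12, dtype=float), index=index, name="Spd80mN")
# Drop three of the six records in the second hour to create a gap
speed = speed.drop(index[6:9])

period = "1h"
resolution = pd.Timedelta("10min")  # assumed, known data resolution
# Maximum number of records one period can hold: 1 h / 10 min = 6
max_points = pd.Timedelta(period) / resolution

grouped = speed.resample(period)
result = grouped.mean().to_frame()
# Coverage = records present / maximum possible records
result[speed.name + "_Coverage"] = grouped.count() / max_points

# Keep only periods whose coverage is at least the threshold
coverage_threshold = 0.75
filtered = result[result[speed.name + "_Coverage"] >= coverage_threshold]
```

The first hour has full coverage (6/6 = 1.0) and the second hour has 3/6 = 0.5, so only the first hour survives the 0.75 threshold. For monthly periods the maximum number of records varies from month to month, so it would need to be computed per period rather than from a fixed `Timedelta`.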

Returns

A DataFrame with the data aggregated using the specified aggregation_method (mean by default). Depending on the parameters, it may also be filtered by coverage and include a coverage column.

Return type

DataFrame

Example usage

import brightwind as bw
data = bw.load_campbell_scientific(bw.datasets.demo_campbell_scientific_site_data)

# To find hourly averages
data_hourly = bw.average_data_by_period(data.Spd80mN, period='1H')

# To find monthly averages
data_monthly = bw.average_data_by_period(data.Spd80mN, period='1M')

# To remove months where more than half of the data is missing
data_monthly_filtered = bw.average_data_by_period(data.Spd80mN, period='1M', coverage_threshold=0.5)

# To check the coverage for all months
data_monthly_coverage = bw.average_data_by_period(data.Spd80mN, period='1M', return_coverage=True)
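The period parameter may also be a DateOffset object. In pandas, a string alias and the matching offset object describe the same period, as the sketch below shows with `pd.offsets.Hour`. It uses plain pandas on a hypothetical series. That brightwind treats the two forms identically is an assumption here, not something this page confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series spanning six hours (36 records)
index = pd.date_range("2024-01-01", periods=36, freq="10min")
speed = pd.Series(np.arange(36, dtype=float), index=index, name="Spd80mN")

# Three-hourly means with the period given as an alias string
three_hourly_str = speed.resample("3h").mean()

# The same period given as a pandas DateOffset subclass
three_hourly_offset = speed.resample(pd.offsets.Hour(3)).mean()
```

Both calls produce two three-hourly means with identical timestamps.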