brightwind.transform.transform.average_data_by_period
brightwind.transform.transform.average_data_by_period(data, period, aggregation_method='mean', coverage_threshold=None, return_coverage=False)
Averages the data by the time period specified by period.
Aggregates data using the specified aggregation_method; by default, the data is averaged over the specified period. Can be used to find hourly, daily, weekly, etc. averages or sums. Can also return the coverage and filter the returned data by coverage.
- Parameters
data (pandas.Series) – Data to average or aggregate
period (str or pandas.DateOffset) –
Groups data by the period specified here. The following formats are supported:
Set period to 10min for a 10 minute average, 20min for a 20 minute average, and so on for 4min, 15min, etc.
Set period to 1H for an hourly average, 3H for a three hourly average, and so on for 5H, 6H, etc.
Set period to 1D for a daily average, 3D for a three day average, similarly 5D, 7D, 15D, etc.
Set period to 1W for a weekly average, 3W for a three week average, similarly 2W, 4W, etc.
Set period to 1M for a monthly average.
Set period to 1AS for an annual average.
Can also be a DateOffset object.
aggregation_method (str) – Default mean, which returns the mean of the data for the specified period. Can also be median, prod, sum, std, var, max or min, which are shorthand for median, product, summation, standard deviation, variance, maximum and minimum respectively.
coverage_threshold (float) – Coverage is defined as the ratio of the number of data points present in the period to the maximum number of data points that the period should have. For example, for a 10 minute data resolution and a period of 1 hour, the maximum number of data points in one period is 6. If only 3 data points are available for that hour, the coverage is 3/6 = 0.5. The threshold should be greater than 0 and less than or equal to 1. It is None by default. If it is None or 0, the data is not filtered. Otherwise, periods where coverage is less than the coverage_threshold are removed.
return_coverage (bool) – If True, appends an additional column to the returned DataFrame with the coverage calculated for each period. The coverage columns are named <column name>_Coverage.
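The coverage definition above can be made concrete with a small standard-library sketch. This is not brightwind's implementation: the helper names hourly_coverage and filter_by_coverage are hypothetical, and the sketch assumes a fixed 10 minute resolution and a 1 hour period, as in the worked example for coverage_threshold.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_coverage(timestamps, resolution=timedelta(minutes=10)):
    """Return {hour_start: coverage} for a list of datetimes (hypothetical helper)."""
    # Maximum number of records one hour can hold: 6 for 10 minute data.
    expected = timedelta(hours=1) // resolution
    counts = defaultdict(int)
    for ts in timestamps:
        # Floor each timestamp to the start of its hour and count it.
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return {hour: n / expected for hour, n in counts.items()}


def filter_by_coverage(coverage, threshold=None):
    """Drop periods whose coverage is below threshold; None or 0 disables filtering."""
    if not threshold:
        return dict(coverage)
    return {period: c for period, c in coverage.items() if c >= threshold}


start = datetime(2020, 1, 1)
# Hour 00:00 has all six 10 minute records; hour 01:00 has only three.
timestamps = [start + timedelta(minutes=10 * i) for i in range(6)]
timestamps += [start + timedelta(hours=1, minutes=10 * i) for i in range(3)]

coverage = hourly_coverage(timestamps)          # 1.0 for 00:00, 0.5 for 01:00
kept = filter_by_coverage(coverage, threshold=0.75)  # only the 00:00 hour remains
```

With a threshold of 0.75, the second hour (coverage 3/6 = 0.5) is removed, while a threshold of None leaves both hours in place.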
- Returns
A DataFrame with the data aggregated using the specified aggregation_method (mean by default). Depending on the parameters, it may also be filtered by coverage and include a coverage column.
- Return type
DataFrame
Example usage
import brightwind as bw
data = bw.load_campbell_scientific(bw.datasets.demo_campbell_scientific_site_data)

# To find hourly averages
data_hourly = bw.average_data_by_period(data.Spd80mN, period='1H')

# To find monthly averages
data_monthly = bw.average_data_by_period(data.Spd80mN, period='1M')

# To filter out months where more than half of the data is missing
data_monthly_filtered = bw.average_data_by_period(data.Spd80mN, period='1M', coverage_threshold=0.5)

# To check the coverage for all months
data_monthly_coverage = bw.average_data_by_period(data.Spd80mN, period='1M', return_coverage=True)
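For readers who want to see what grouping by a period and applying an aggregation_method amounts to, the following is a minimal standard-library sketch. It does not use brightwind and is not how the library is implemented: aggregate_by_period and the AGGREGATORS mapping are hypothetical names, and the sketch only handles fixed-length periods (minutes, hours, days), not calendar months or years.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical mapping from method names to plain-Python reducers.
AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "prod": math.prod,
    "max": max,
    "min": min,
}


def aggregate_by_period(records, period, method="mean"):
    """Group (timestamp, value) pairs into fixed-length periods and reduce each group."""
    epoch = datetime(1970, 1, 1)
    groups = defaultdict(list)
    for ts, value in records:
        # Floor each timestamp to the start of the period that contains it.
        period_start = ts - (ts - epoch) % period
        groups[period_start].append(value)
    reducer = AGGREGATORS[method]
    return {start: reducer(values) for start, values in sorted(groups.items())}


# Two hours of made-up 10 minute records with values 1..12.
first = datetime(2020, 1, 1)
records = [(first + timedelta(minutes=10 * i), i + 1) for i in range(12)]

hourly_mean = aggregate_by_period(records, timedelta(hours=1))
hourly_sum = aggregate_by_period(records, timedelta(hours=1), method="sum")
```

Here the first hour holds the values 1 to 6 and the second hour holds 7 to 12, so the hourly means are 3.5 and 9.5 and the hourly sums are 21 and 57.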