How to get some useful statistics using the brightwind library


[1]:
import datetime
print('Last updated: {}'.format(datetime.date.today().strftime('%d %B, %Y')))
Last updated: 26 June, 2019

Outline:

This guide will demonstrate how to get some useful statistics from a sample dataset using the following steps:

  • Import the brightwind library and some sample data

  • Find time continuity gaps within the sample data

  • Get some basic statistics on each of the columns from the sample dataset

  • Find the monthly coverage of the dataset or the coverage of any time period

  • Return the mean of monthly means of an anemometer or of a range of anemometers


[2]:
import brightwind as bw
[3]:
# specify location of existing sample dataset
filepath = r'C:\...\brightwind\datasets\demo\demo_data.csv'
# load data as dataframe
data = bw.load_csv(filepath)
# show first few rows of dataframe
data.head(5)
[3]:
Spd80mN Spd80mS Spd60mN Spd60mS Spd40mN Spd40mS Spd80mNStd Spd80mSStd Spd60mNStd Spd60mSStd ... Dir78mSStd Dir58mS Dir58mSStd Dir38mS Dir38mSStd T2m RH2m P2m PrcpTot BattMin
Timestamp
2016-01-09 15:30:00 8.370 7.911 8.160 7.849 7.857 7.626 1.240 1.075 1.060 0.947 ... 6.100 110.1 6.009 112.2 5.724 0.711 100.0 935.0 0.0 12.94
2016-01-09 15:40:00 8.250 7.961 8.100 7.884 7.952 7.840 0.897 0.875 0.900 0.855 ... 5.114 110.9 4.702 109.8 5.628 0.630 100.0 935.0 0.0 12.95
2016-01-09 17:00:00 7.652 7.545 7.671 7.551 7.531 7.457 0.756 0.703 0.797 0.749 ... 4.172 113.1 3.447 111.8 4.016 1.126 100.0 934.0 0.0 12.75
2016-01-09 17:10:00 7.382 7.325 6.818 6.689 6.252 6.174 0.844 0.810 0.897 0.875 ... 4.680 118.8 5.107 115.6 5.189 0.954 100.0 934.0 0.0 12.71
2016-01-09 17:20:00 7.977 7.791 8.110 7.915 8.140 7.974 0.556 0.528 0.562 0.524 ... 3.123 115.9 2.960 113.6 3.540 0.863 100.0 934.0 0.0 12.69

5 rows × 29 columns

Time Continuity

First we want to see if there are any gaps in the data. We can use the time_continuity_gaps function to identify periods where the gap between timestamps is not consistent with the typical interval between timestamps in the file(s). The function returns a pandas DataFrame showing the timestamp at the start of the missing period and the timestamp at the end of the missing period. An additional column shows how many days were lost in each missing period.

[4]:
bw.time_continuity_gaps(data)
[4]:
Date From Date To Days Lost
1 2016-01-09 15:40:00 2016-01-09 17:00:00 0.055556
17750 2016-05-11 23:00:00 2016-05-31 15:20:00 19.680556
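
Since time_continuity_gaps returns an ordinary pandas DataFrame, we can work with the result directly. As a minimal sketch, assuming the column is labelled 'Days Lost' as in the output above, the total downtime can be summed as follows:

gaps = bw.time_continuity_gaps(data)
# sum the 'Days Lost' column to get the total downtime in days
total_days_lost = gaps['Days Lost'].sum()
print('Total days lost to gaps: {:.2f}'.format(total_days_lost))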

Basic Statistics

Next we may want to get some basic statistics for each of the columns in the wind data file. The basic_stats function returns the count, mean, standard deviation, minimum and maximum of each column. This can be useful for a variety of checks; one example is confirming that calibrations have been applied to the anemometers by checking whether the minimum value for each anemometer matches the corresponding calibration offset.

[5]:
bw.basic_stats(data)
[5]:
count mean std min max
Spd80mN 95629.0 7.498665 3.998231 0.215 29.000
Spd80mS 95629.0 6.474298 4.457503 0.000 29.270
Spd60mN 95629.0 7.033594 3.809893 0.214 28.220
Spd60mS 95629.0 7.113664 3.905644 0.080 29.030
Spd40mN 95629.0 6.742682 3.738940 0.228 27.380
Spd40mS 95629.0 6.800116 3.816079 0.092 28.450
Spd80mNStd 95629.0 1.005663 0.540208 0.000 5.056
Spd80mSStd 95629.0 0.820888 0.596739 0.000 5.151
Spd60mNStd 95629.0 1.015741 0.536483 0.000 5.043
Spd60mSStd 95629.0 0.942060 0.535222 0.000 5.185
Spd40mNStd 95629.0 1.002585 0.515037 0.000 4.919
Spd40mSStd 95629.0 0.936986 0.522567 0.000 5.143
Spd80mNMax 95629.0 9.845375 5.137878 0.215 38.620
Spd80mSMax 95629.0 8.473476 5.754762 0.000 39.450
Spd60mNMax 95629.0 9.467539 5.007623 0.214 39.060
Spd60mSMax 95629.0 9.440672 5.066036 0.080 39.830
Spd40mNMax 95629.0 9.170213 4.936084 0.228 38.440
Spd40mSMax 95629.0 9.147638 4.996349 0.092 38.770
Dir78mS 95629.0 198.259766 78.632518 0.003 360.000
Dir78mSStd 95629.0 6.603149 5.931689 0.000 78.910
Dir58mS 95629.0 232.994314 76.145192 0.014 360.000
Dir58mSStd 95629.0 4.259346 6.249012 0.000 78.490
Dir38mS 95629.0 197.835100 84.050190 0.031 360.000
Dir38mSStd 95629.0 8.923607 6.420406 0.000 80.100
T2m 95629.0 7.116077 4.908406 -6.663 25.420
RH2m 95629.0 93.857024 9.649367 25.730 100.000
P2m 95629.0 952.968077 23.537472 592.200 1002.000
PrcpTot 95629.0 0.014461 0.085502 0.000 5.200
BattMin 95629.0 13.416010 0.565756 12.240 15.180
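
To make the calibration-offset check mentioned above easier to scan, we can restrict basic_stats to the anemometer columns and look only at the 'min' column of the result. This is a minimal sketch that simply slices the DataFrame before passing it in; the column names and the 'min' heading are taken from the table above.

# average 10-min wind speed columns from the table above
anemometer_cols = ['Spd80mN', 'Spd80mS', 'Spd60mN', 'Spd60mS', 'Spd40mN', 'Spd40mS']
# 'min' is one of the columns returned by basic_stats, as shown above
bw.basic_stats(data[anemometer_cols])['min']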

Data Coverage

Next we can check the coverage of each column in the dataset. By default, the coverage function returns monthly coverage.

[6]:
bw.coverage(data)
[6]:
Spd80mN_Coverage Spd80mS_Coverage Spd60mN_Coverage Spd60mS_Coverage Spd40mN_Coverage Spd40mS_Coverage Spd80mNStd_Coverage Spd80mSStd_Coverage Spd60mNStd_Coverage Spd60mSStd_Coverage ... Dir78mSStd_Coverage Dir58mS_Coverage Dir58mSStd_Coverage Dir38mS_Coverage Dir38mSStd_Coverage T2m_Coverage RH2m_Coverage P2m_Coverage PrcpTot_Coverage BattMin_Coverage
Timestamp
2016-01-01 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 ... 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534
2016-02-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-03-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-04-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-05-01 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 ... 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367
2016-06-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-07-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-08-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-09-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-10-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-11-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-12-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-01-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-02-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-03-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-04-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-05-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-06-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-07-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-08-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-09-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-10-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-11-01 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 ... 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611

23 rows × 29 columns

Returning the coverage of all of the columns is more information than we need in this case! Instead, we can assign the anemometers to a list, by specifying the column headings from the table that correspond to their average 10-min values, and then pass only those columns to the coverage function.

[7]:
anemometers = ['Spd80mN', 'Spd80mS', 'Spd60mN', 'Spd60mS', 'Spd40mN', 'Spd40mS']
bw.coverage(data[anemometers])
[7]:
Spd80mN_Coverage Spd80mS_Coverage Spd60mN_Coverage Spd60mS_Coverage Spd40mN_Coverage Spd40mS_Coverage
Timestamp
2016-01-01 0.719534 0.719534 0.719534 0.719534 0.719534 0.719534
2016-02-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-03-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-04-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-05-01 0.365367 0.365367 0.365367 0.365367 0.365367 0.365367
2016-06-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-07-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-08-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-09-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-10-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-11-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2016-12-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-01-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-02-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-03-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-04-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-05-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-06-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-07-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-08-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-09-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-10-01 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2017-11-01 0.748611 0.748611 0.748611 0.748611 0.748611 0.748611
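
Because the coverage result is itself a pandas DataFrame, we can filter it directly, for example to flag months that fall below a coverage threshold. The sketch below assumes the '_Coverage' suffix on the column names shown above and uses an illustrative 90% threshold.

cov = bw.coverage(data[anemometers])
# keep only the months where the Spd80mN coverage drops below 90%
cov[cov['Spd80mN_Coverage'] < 0.9]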

But what if we don't want monthly coverage? We can then use the period argument to return whatever time period we want, whether that is 10-min (period='10min'), hourly (period='1H'), daily (period='1D'), weekly (period='1W') or yearly (period='1AS'). Here we have opted to return the yearly coverage.

[8]:
bw.coverage(data[anemometers], period='1AS')
[8]:
Spd80mN_Coverage Spd80mS_Coverage Spd60mN_Coverage Spd60mS_Coverage Spd40mN_Coverage Spd40mS_Coverage
Timestamp
2016-01-01 0.922492 0.922492 0.922492 0.922492 0.922492 0.922492
2017-01-01 0.894406 0.894406 0.894406 0.894406 0.894406 0.894406
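
The same call pattern works for the other period strings listed above. For example, a short sketch of weekly coverage (output not shown here):

# weekly coverage of the anemometer columns
bw.coverage(data[anemometers], period='1W')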

Mean of monthly means

The mean of monthly means is a method of adjusting the average to take account of seasonal bias. For example, it removes the upward bias of a 1.5-year dataset that covers two windier winter periods but only one calm summer period. We can call the function in two ways: either by passing a specific column from the dataset, which returns a single value, or by passing a list of column names (in this case the anemometers), which returns the mean of monthly means for each column as a DataFrame.

[9]:
bw.momm(data.Spd80mN)
[9]:
7.556588194559553
[10]:
bw.momm(data[anemometers])
[10]:
MOMM
Spd80mN 7.556588
Spd80mS 6.587765
Spd60mN 7.081094
Spd60mS 7.163933
Spd40mN 6.785035
Spd40mS 6.844676
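
For practical work the bw.momm calls above are what we would use, but as a rough conceptual sketch the statistic can be illustrated with plain pandas: resample a speed series to calendar-month means and then average those monthly means, so that each month carries equal weight. This simplified version may differ in detail from brightwind's implementation, for example in how incomplete months are treated.

# monthly mean wind speed for one anemometer ('MS' = calendar month start)
monthly_means = data['Spd80mN'].resample('MS').mean()
# averaging the monthly means gives each month equal weight in the result
print(monthly_means.mean())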