[1]:
import datetime
print('Last updated: {}'.format(datetime.date.today().strftime('%d %B, %Y')))
Last updated: 26 June, 2019
This guide will demonstrate how to get some useful statistics from a sample dataset using the following steps:
Import the brightwind library and some sample data
Find time continuity gaps within the sample data
Get some basic statistics on each of the columns from the sample dataset
Find the monthly coverage of the dataset, or the coverage over any other time period
Return the mean of monthly means of an anemometer or of a range of anemometers
[2]:
import brightwind as bw
[3]:
# specify location of existing sample dataset
filepath = r'C:\...\brightwind\datasets\demo\demo_data.csv'
# load data as dataframe
data = bw.load_csv(filepath)
# show first few rows of dataframe
data.head(5)
[3]:
Timestamp | Spd80mN | Spd80mS | Spd60mN | Spd60mS | Spd40mN | Spd40mS | Spd80mNStd | Spd80mSStd | Spd60mNStd | Spd60mSStd | ... | Dir78mSStd | Dir58mS | Dir58mSStd | Dir38mS | Dir38mSStd | T2m | RH2m | P2m | PrcpTot | BattMin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-01-09 15:30:00 | 8.370 | 7.911 | 8.160 | 7.849 | 7.857 | 7.626 | 1.240 | 1.075 | 1.060 | 0.947 | ... | 6.100 | 110.1 | 6.009 | 112.2 | 5.724 | 0.711 | 100.0 | 935.0 | 0.0 | 12.94 |
2016-01-09 15:40:00 | 8.250 | 7.961 | 8.100 | 7.884 | 7.952 | 7.840 | 0.897 | 0.875 | 0.900 | 0.855 | ... | 5.114 | 110.9 | 4.702 | 109.8 | 5.628 | 0.630 | 100.0 | 935.0 | 0.0 | 12.95 |
2016-01-09 17:00:00 | 7.652 | 7.545 | 7.671 | 7.551 | 7.531 | 7.457 | 0.756 | 0.703 | 0.797 | 0.749 | ... | 4.172 | 113.1 | 3.447 | 111.8 | 4.016 | 1.126 | 100.0 | 934.0 | 0.0 | 12.75 |
2016-01-09 17:10:00 | 7.382 | 7.325 | 6.818 | 6.689 | 6.252 | 6.174 | 0.844 | 0.810 | 0.897 | 0.875 | ... | 4.680 | 118.8 | 5.107 | 115.6 | 5.189 | 0.954 | 100.0 | 934.0 | 0.0 | 12.71 |
2016-01-09 17:20:00 | 7.977 | 7.791 | 8.110 | 7.915 | 8.140 | 7.974 | 0.556 | 0.528 | 0.562 | 0.524 | ... | 3.123 | 115.9 | 2.960 | 113.6 | 3.540 | 0.863 | 100.0 | 934.0 | 0.0 | 12.69 |
5 rows × 29 columns
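For readers who want to see the shape of the loaded data without the brightwind file, the following is a minimal pandas-only sketch. It assumes (this is not confirmed by the text above) that the loader behaves roughly like `pandas.read_csv` with the first column parsed as a datetime index. The three rows are copied from the table above; the variable names are illustrative.

```python
import io
import pandas as pd

# A tiny in-memory CSV using three rows and two columns from the table above.
csv_text = """Timestamp,Spd80mN,Spd80mS
2016-01-09 15:30:00,8.370,7.911
2016-01-09 15:40:00,8.250,7.961
2016-01-09 17:00:00,7.652,7.545
"""

# Assumption: the timestamp column becomes a DatetimeIndex, as in the output above.
sample = pd.read_csv(io.StringIO(csv_text), index_col="Timestamp", parse_dates=True)

print(sample.index.dtype)  # a datetime64 dtype, so time-based functions can use it
print(sample.head())
```

A datetime index is what lets the gap, coverage and monthly-mean calculations later in this guide work on time periods rather than on row numbers.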
First we want to see if there are any gaps in the data. We can use the time_continuity_gaps function to identify periods where the gap between consecutive timestamps is larger than the typical gap seen between timestamps in the file(s). The function returns a pandas dataframe showing the timestamp at the start of each missing period and the timestamp at the end of it. An additional column shows how many days were lost in the missing period.
[4]:
bw.time_continuity_gaps(data)
[4]:
| | Date From | Date To | Days Lost |
---|---|---|---|
1 | 2016-01-09 15:40:00 | 2016-01-09 17:00:00 | 0.055556 |
17750 | 2016-05-11 23:00:00 | 2016-05-31 15:20:00 | 19.680556 |
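The idea behind this check can be sketched in plain pandas. This is an illustrative sketch, not brightwind's implementation: it assumes the "typical" gap is the most common step between consecutive timestamps, and it uses a small synthetic index that mirrors the first gap in the table above.

```python
import pandas as pd

# Synthetic 10-minute timestamps with 15:50 to 16:50 removed (illustrative only).
index = pd.date_range("2016-01-09 15:30", periods=12, freq="10min")
index = index.delete([2, 3, 4, 5, 6, 7, 8])

# Step between each timestamp and the one before it.
previous, current = index[:-1], index[1:]
steps = current - previous

# Assumption: the most common step is the logger's normal averaging interval.
typical_step = pd.Series(steps).mode().iloc[0]

# A gap is any step longer than the typical one.
is_gap = steps > typical_step
gaps = pd.DataFrame({"Date From": previous[is_gap], "Date To": current[is_gap]})
gaps["Days Lost"] = (gaps["Date To"] - gaps["Date From"]) / pd.Timedelta(days=1)

print(gaps)
```

The 80-minute jump from 15:40 to 17:00 is reported as 80 / 1440 ≈ 0.055556 days, which matches the first row of the output above.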
Next we may want to get some basic statistics for each of the columns found in the wind data file. The basic_stats function returns the count, mean, standard deviation, minimum and maximum of each column. This can be useful for a variety of checks. For example, we can confirm that calibrations have been applied to the anemometers by checking whether the minimum value for each anemometer matches the corresponding calibration offset.
[5]:
bw.basic_stats(data)
[5]:
| | count | mean | std | min | max |
---|---|---|---|---|---|
Spd80mN | 95629.0 | 7.498665 | 3.998231 | 0.215 | 29.000 |
Spd80mS | 95629.0 | 6.474298 | 4.457503 | 0.000 | 29.270 |
Spd60mN | 95629.0 | 7.033594 | 3.809893 | 0.214 | 28.220 |
Spd60mS | 95629.0 | 7.113664 | 3.905644 | 0.080 | 29.030 |
Spd40mN | 95629.0 | 6.742682 | 3.738940 | 0.228 | 27.380 |
Spd40mS | 95629.0 | 6.800116 | 3.816079 | 0.092 | 28.450 |
Spd80mNStd | 95629.0 | 1.005663 | 0.540208 | 0.000 | 5.056 |
Spd80mSStd | 95629.0 | 0.820888 | 0.596739 | 0.000 | 5.151 |
Spd60mNStd | 95629.0 | 1.015741 | 0.536483 | 0.000 | 5.043 |
Spd60mSStd | 95629.0 | 0.942060 | 0.535222 | 0.000 | 5.185 |
Spd40mNStd | 95629.0 | 1.002585 | 0.515037 | 0.000 | 4.919 |
Spd40mSStd | 95629.0 | 0.936986 | 0.522567 | 0.000 | 5.143 |
Spd80mNMax | 95629.0 | 9.845375 | 5.137878 | 0.215 | 38.620 |
Spd80mSMax | 95629.0 | 8.473476 | 5.754762 | 0.000 | 39.450 |
Spd60mNMax | 95629.0 | 9.467539 | 5.007623 | 0.214 | 39.060 |
Spd60mSMax | 95629.0 | 9.440672 | 5.066036 | 0.080 | 39.830 |
Spd40mNMax | 95629.0 | 9.170213 | 4.936084 | 0.228 | 38.440 |
Spd40mSMax | 95629.0 | 9.147638 | 4.996349 | 0.092 | 38.770 |
Dir78mS | 95629.0 | 198.259766 | 78.632518 | 0.003 | 360.000 |
Dir78mSStd | 95629.0 | 6.603149 | 5.931689 | 0.000 | 78.910 |
Dir58mS | 95629.0 | 232.994314 | 76.145192 | 0.014 | 360.000 |
Dir58mSStd | 95629.0 | 4.259346 | 6.249012 | 0.000 | 78.490 |
Dir38mS | 95629.0 | 197.835100 | 84.050190 | 0.031 | 360.000 |
Dir38mSStd | 95629.0 | 8.923607 | 6.420406 | 0.000 | 80.100 |
T2m | 95629.0 | 7.116077 | 4.908406 | -6.663 | 25.420 |
RH2m | 95629.0 | 93.857024 | 9.649367 | 25.730 | 100.000 |
P2m | 95629.0 | 952.968077 | 23.537472 | 592.200 | 1002.000 |
PrcpTot | 95629.0 | 0.014461 | 0.085502 | 0.000 | 5.200 |
BattMin | 95629.0 | 13.416010 | 0.565756 | 12.240 | 15.180 |
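The calibration check described above can be sketched with pandas alone. Everything here is illustrative: the speed values and the `offsets` series are invented for the example, and in practice the offsets would come from the calibration certificates.

```python
import pandas as pd

# Invented wind speed values for two anemometer columns.
speeds = pd.DataFrame({
    "Spd80mN": [0.215, 5.2, 7.9, 12.4],
    "Spd60mN": [0.214, 4.8, 7.1, 11.0],
})

# Hypothetical calibration offsets, one per anemometer.
offsets = pd.Series({"Spd80mN": 0.215, "Spd60mN": 0.250})

# Same five statistics as the table above, one row per column.
stats = speeds.agg(["count", "mean", "std", "min", "max"]).T

# Flag anemometers whose minimum reading equals the calibration offset.
stats["offset"] = offsets
stats["min_matches_offset"] = (stats["min"] - stats["offset"]).abs() < 1e-6

print(stats)
```

A column whose minimum does not match its offset, like the hypothetical Spd60mN here, is worth a closer look before the data is used.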
Next we can check the coverage of each column in the dataset. By default, the coverage function returns the monthly coverage.
[6]:
bw.coverage(data)
[6]:
Timestamp | Spd80mN_Coverage | Spd80mS_Coverage | Spd60mN_Coverage | Spd60mS_Coverage | Spd40mN_Coverage | Spd40mS_Coverage | Spd80mNStd_Coverage | Spd80mSStd_Coverage | Spd60mNStd_Coverage | Spd60mSStd_Coverage | ... | Dir78mSStd_Coverage | Dir58mS_Coverage | Dir58mSStd_Coverage | Dir38mS_Coverage | Dir38mSStd_Coverage | T2m_Coverage | RH2m_Coverage | P2m_Coverage | PrcpTot_Coverage | BattMin_Coverage |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-01-01 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | ... | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 |
2016-02-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-03-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-04-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-05-01 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | ... | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 |
2016-06-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-07-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-08-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-09-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-10-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-11-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-12-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-02-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-03-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-04-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-05-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-06-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-07-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-08-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-09-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-10-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-11-01 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | ... | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 |
23 rows × 29 columns
Returning the coverage of all of the columns is more information than we need in this case! Instead, we can assign the anemometers to a list, by specifying the column headings from the table that correspond to the average 10-min values from the anemometers, and then pass only those columns to the coverage function.
[7]:
anemometers = ['Spd80mN', 'Spd80mS', 'Spd60mN', 'Spd60mS', 'Spd40mN', 'Spd40mS']
bw.coverage(data[anemometers])
[7]:
Timestamp | Spd80mN_Coverage | Spd80mS_Coverage | Spd60mN_Coverage | Spd60mS_Coverage | Spd40mN_Coverage | Spd40mS_Coverage |
---|---|---|---|---|---|---|
2016-01-01 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 | 0.719534 |
2016-02-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-03-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-04-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-05-01 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 | 0.365367 |
2016-06-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-07-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-08-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-09-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-10-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-11-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2016-12-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-02-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-03-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-04-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-05-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-06-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-07-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-08-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-09-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-10-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
2017-11-01 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 | 0.748611 |
But what if we don't want monthly coverage? We can use the period argument to return the coverage for whatever time period we want, whether that is 10-min (period='10min'), hourly (period='1H'), daily (period='1D'), weekly (period='1W') or yearly (period='1AS'). Here we have opted to return the yearly coverage.
[8]:
bw.coverage(data[anemometers], period='1AS')
[8]:
Timestamp | Spd80mN_Coverage | Spd80mS_Coverage | Spd60mN_Coverage | Spd60mS_Coverage | Spd40mN_Coverage | Spd40mS_Coverage |
---|---|---|---|---|---|---|
2016-01-01 | 0.922492 | 0.922492 | 0.922492 | 0.922492 | 0.922492 | 0.922492 |
2017-01-01 | 0.894406 | 0.894406 | 0.894406 | 0.894406 | 0.894406 | 0.894406 |
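To make the coverage numbers concrete, here is a minimal pandas sketch of one way to compute them: count the records present in each period and divide by the number of records the period would hold at the measurement interval. The function `coverage_sketch` and its `resolution` parameter are hypothetical names for this example, not brightwind's API, and the series is synthetic (a constant value on timestamps that mirror the start of the sample dataset).

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def coverage_sketch(series, period="MS", resolution="10min"):
    """Fraction of expected records present in each period (illustrative only)."""
    # count() ignores NaN, so missing values reduce coverage as well as missing rows.
    counts = series.resample(period).count()
    # Expected records = length of each period / measurement interval.
    period_starts = counts.index
    period_ends = period_starts + to_offset(period)
    expected = ((period_ends - period_starts) / pd.Timedelta(resolution)).to_numpy()
    return counts / expected

# Synthetic 10-minute series starting part-way through January 2016,
# with the timestamps between 15:40 and 17:00 on the first day removed.
index = pd.date_range("2016-01-09 15:30", "2016-02-29 23:50", freq="10min")
keep = (index <= pd.Timestamp("2016-01-09 15:40")) | (index >= pd.Timestamp("2016-01-09 17:00"))
speed = pd.Series(5.0, index=index[keep], name="Spd80mN")

monthly = coverage_sketch(speed)              # monthly, period="MS"
daily = coverage_sketch(speed, period="D")    # any other pandas frequency alias
print(monthly.round(6))
```

January holds 3212 of a possible 4464 records, giving 0.719534, the same value as the first row of the monthly table above. February is complete, so its coverage is 1.0.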
The mean of monthly means is a method of adjusting the average to take account of seasonal bias. For example, it removes the upward bias that a 1.5 year dataset covering two windier winter periods and one calm summer period would otherwise introduce. We can call the function in two ways: either by passing a single column from the dataset, which returns a single value, or by passing several columns (in this case the anemometers), which returns the mean of monthly means for each column as a dataframe.
[9]:
bw.momm(data.Spd80mN)
[9]:
7.556588194559553
[10]:
bw.momm(data[anemometers])
[10]:
| | MOMM |
---|---|
Spd80mN | 7.556588 |
Spd80mS | 6.587765 |
Spd60mN | 7.081094 |
Spd60mS | 7.163933 |
Spd40mN | 6.785035 |
Spd40mS | 6.844676 |
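The seasonal-bias argument can be illustrated with a small pandas sketch. It assumes one common definition of the mean of monthly means: average each individual month, then average months that share a calendar month, then take the unweighted mean of the resulting twelve values. brightwind's own calculation may differ in detail (for example, by weighting months by their number of days), so `momm_sketch` is a hypothetical helper, and the data is synthetic.

```python
import pandas as pd

def momm_sketch(series):
    """Mean of monthly means under the assumptions stated above (illustrative only)."""
    # Step 1: mean of each individual month in the record.
    monthly = series.resample("MS").mean()
    # Step 2: average all Januaries together, all Februaries together, and so on.
    by_calendar_month = monthly.groupby(monthly.index.month).mean()
    # Step 3: unweighted mean of the calendar-month values.
    return by_calendar_month.mean()

# Synthetic 1.5 year daily record: two windy winters (9 m/s), one calm summer (5 m/s).
WINTER_MONTHS = {10, 11, 12, 1, 2, 3}
index = pd.date_range("2016-10-01", "2018-03-31", freq="D")
values = [9.0 if stamp.month in WINTER_MONTHS else 5.0 for stamp in index]
speed = pd.Series(values, index=index)

simple_mean = speed.mean()
momm = momm_sketch(speed)
print(round(simple_mean, 3), round(momm, 3))
```

The simple mean is pulled up to about 7.66 m/s because winter days are counted twice, while the mean of monthly means gives each calendar month equal weight and returns 7.0 m/s.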