[1]:
import datetime
print('Last updated: {}'.format(datetime.date.today().strftime('%d %B, %Y')))
Last updated: 26 June, 2019
This tutorial demonstrates how to use the following functions to transform a sample dataset in various ways. The tutorial covers the following steps:
Importing the brightwind library and loading some sample data
Using the scale_wind_speed()
function
Using the average_data_by_period()
function
Using the adjust_slope_offset()
function
Using the offset_wind_direction()
function
Using the offset_timestamps()
function
Using the selective_avg()
function
[2]:
import brightwind as bw
The following commands will load the exisiting dataset and show the first few timestamps.
[3]:
# load existing sample dataset
data = bw.load_csv(bw.datasets.demo_data)
# show first few rows of dataframe
data.head(5)
[3]:
Spd80mN | Spd80mS | Spd60mN | Spd60mS | Spd40mN | Spd40mS | Spd80mNStd | Spd80mSStd | Spd60mNStd | Spd60mSStd | ... | Dir78mSStd | Dir58mS | Dir58mSStd | Dir38mS | Dir38mSStd | T2m | RH2m | P2m | PrcpTot | BattMin | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Timestamp | |||||||||||||||||||||
2016-01-09 15:30:00 | 8.370 | 7.911 | 8.160 | 7.849 | 7.857 | 7.626 | 1.240 | 1.075 | 1.060 | 0.947 | ... | 6.100 | 110.1 | 6.009 | 112.2 | 5.724 | 0.711 | 100.0 | 935.0 | 0.0 | 12.94 |
2016-01-09 15:40:00 | 8.250 | 7.961 | 8.100 | 7.884 | 7.952 | 7.840 | 0.897 | 0.875 | 0.900 | 0.855 | ... | 5.114 | 110.9 | 4.702 | 109.8 | 5.628 | 0.630 | 100.0 | 935.0 | 0.0 | 12.95 |
2016-01-09 17:00:00 | 7.652 | 7.545 | 7.671 | 7.551 | 7.531 | 7.457 | 0.756 | 0.703 | 0.797 | 0.749 | ... | 4.172 | 113.1 | 3.447 | 111.8 | 4.016 | 1.126 | 100.0 | 934.0 | 0.0 | 12.75 |
2016-01-09 17:10:00 | 7.382 | 7.325 | 6.818 | 6.689 | 6.252 | 6.174 | 0.844 | 0.810 | 0.897 | 0.875 | ... | 4.680 | 118.8 | 5.107 | 115.6 | 5.189 | 0.954 | 100.0 | 934.0 | 0.0 | 12.71 |
2016-01-09 17:20:00 | 7.977 | 7.791 | 8.110 | 7.915 | 8.140 | 7.974 | 0.556 | 0.528 | 0.562 | 0.524 | ... | 3.123 | 115.9 | 2.960 | 113.6 | 3.540 | 0.863 | 100.0 | 934.0 | 0.0 | 12.69 |
5 rows × 29 columns
scale_wind_speed()
function¶The following commands specify an arbitrary scale factor and apply this to one of the speed variables.
[4]:
# specify scale factor
scale_factor = 1.03
# apply scale factor to Spd80mN variable
Spd80mN_scaled = bw.scale_wind_speed(data.Spd80mN, scale_factor)
# print first 5 rows of result
Spd80mN_scaled.head(5)
[4]:
Timestamp
2016-01-09 15:30:00 8.62110
2016-01-09 15:40:00 8.49750
2016-01-09 17:00:00 7.88156
2016-01-09 17:10:00 7.60346
2016-01-09 17:20:00 8.21631
Name: Spd80mN, dtype: float64
Now we’ll repeat the same step, this time saving the resulting data in the existing dataframe. The new variable appears as the last column on the right hand side of the returned dataframe.
[5]:
# apply scale factor to Spd80mN variable, creating a new variable in the existing dataframe
data['Spd80mN_scaled'] = bw.scale_wind_speed(data.Spd80mN, scale_factor)
# show first few rows of updated dataframe, scaled wind speed is included in the last column
data.head(5)
[5]:
Spd80mN | Spd80mS | Spd60mN | Spd60mS | Spd40mN | Spd40mS | Spd80mNStd | Spd80mSStd | Spd60mNStd | Spd60mSStd | ... | Dir58mS | Dir58mSStd | Dir38mS | Dir38mSStd | T2m | RH2m | P2m | PrcpTot | BattMin | Spd80mN_scaled | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Timestamp | |||||||||||||||||||||
2016-01-09 15:30:00 | 8.370 | 7.911 | 8.160 | 7.849 | 7.857 | 7.626 | 1.240 | 1.075 | 1.060 | 0.947 | ... | 110.1 | 6.009 | 112.2 | 5.724 | 0.711 | 100.0 | 935.0 | 0.0 | 12.94 | 8.62110 |
2016-01-09 15:40:00 | 8.250 | 7.961 | 8.100 | 7.884 | 7.952 | 7.840 | 0.897 | 0.875 | 0.900 | 0.855 | ... | 110.9 | 4.702 | 109.8 | 5.628 | 0.630 | 100.0 | 935.0 | 0.0 | 12.95 | 8.49750 |
2016-01-09 17:00:00 | 7.652 | 7.545 | 7.671 | 7.551 | 7.531 | 7.457 | 0.756 | 0.703 | 0.797 | 0.749 | ... | 113.1 | 3.447 | 111.8 | 4.016 | 1.126 | 100.0 | 934.0 | 0.0 | 12.75 | 7.88156 |
2016-01-09 17:10:00 | 7.382 | 7.325 | 6.818 | 6.689 | 6.252 | 6.174 | 0.844 | 0.810 | 0.897 | 0.875 | ... | 118.8 | 5.107 | 115.6 | 5.189 | 0.954 | 100.0 | 934.0 | 0.0 | 12.71 | 7.60346 |
2016-01-09 17:20:00 | 7.977 | 7.791 | 8.110 | 7.915 | 8.140 | 7.974 | 0.556 | 0.528 | 0.562 | 0.524 | ... | 115.9 | 2.960 | 113.6 | 3.540 | 0.863 | 100.0 | 934.0 | 0.0 | 12.69 | 8.21631 |
5 rows × 30 columns
Finally, we can compare the mean values of the scaled and unscaled speed time series. The ratio of these mean values is shown to equal the intended scale factor.
[6]:
# print unscaled mean speed
print('The unscaled mean speed is: \t {} m/s'.format(round(data.Spd80mN.mean(),2)))
# print scaled mean speed
print('The scaled mean speed is: \t {} m/s'.format(round(data.Spd80mN_scaled.mean(),2)))
# calculate ratio of mean values
ratio = data.Spd80mN_scaled.mean()/data.Spd80mN.mean()
# print ratio of mean values
print('The ratio of these values is: \t {}'.format(round(ratio,2)))
The unscaled mean speed is: 7.5 m/s
The scaled mean speed is: 7.72 m/s
The ratio of these values is: 1.03
average_data_by_period()
function¶This function can be used to apply a window average to data, using a window which can be specified as any number of minutes (‘min’), hours (‘H’), days (‘D’), weeks (‘W’). ‘1M’ and ‘1AS’ can also used for monthly and annual averages respectively. The output is created within a new series.
[7]:
# then derive monthly averages
data_monthly = bw.average_data_by_period(data.Spd80mN, period='1M')
# show monthly averaged data for first 5 months
data_monthly.head(5)
[7]:
Timestamp
2016-01-01 9.252377
2016-02-01 8.904382
2016-03-01 6.395166
2016-04-01 6.598875
2016-05-01 8.729657
Freq: MS, Name: Spd80mN, dtype: float64
adjust_slope_offset()
function¶Now, let’s imagine that data from the northern anemometer at 80 m was logged with standard slope and offset values of 0.765 and 0.35, and we need to rescale this data considering slope and offset values from a calibration certificate which is specific to this anemometer. The following command can be used to do this, once again adding a new variable to the dataframe.
[8]:
# Apply slope and offset adjustments
data['Spd80mN_adj'] = bw.adjust_slope_offset(data.Spd80mN,0.765, 0.35, 0.7642, 0.352)
Now let’s view the raw and adjusted speed values side-by-side.
[9]:
# concatenate raw and adjusted variables
summary = data[['Spd80mN', 'Spd80mN_adj']]
# show first few values
summary.head(5)
[9]:
Spd80mN | Spd80mN_adj | |
---|---|---|
Timestamp | ||
2016-01-09 15:30:00 | 8.370 | 8.363613 |
2016-01-09 15:40:00 | 8.250 | 8.243739 |
2016-01-09 17:00:00 | 7.652 | 7.646364 |
2016-01-09 17:10:00 | 7.382 | 7.376646 |
2016-01-09 17:20:00 | 7.977 | 7.971024 |
offset_wind_direction()
function¶Similarly, the following function can be used to apply a scalar offset to direction data from a given wind vane.
[10]:
# apply directional offset
data['Dir38mS_adj'] = bw.offset_wind_direction(data.Dir38mS, -10)
# concatenate raw and adjusted variables
summary = data[['Dir38mS', 'Dir38mS_adj']]
# show first few values
summary.head(5)
[10]:
Dir38mS | Dir38mS_adj | |
---|---|---|
Timestamp | ||
2016-01-09 15:30:00 | 112.2 | 102.2 |
2016-01-09 15:40:00 | 109.8 | 99.8 |
2016-01-09 17:00:00 | 111.8 | 101.8 |
2016-01-09 17:10:00 | 115.6 | 105.6 |
2016-01-09 17:20:00 | 113.6 | 103.6 |
offset_timestamps()
function¶This function can be used to apply an offset to the timestamps within a dataset. A few examples are provided below, first, let’s print the first timestamp of the raw data for comparison.
[11]:
# print first timestamp
print(data.index[0])
2016-01-09 15:30:00
Now we’ll add 1 hour and 30 minutes to the timestamps of the data and print the first timestamp once again.
[12]:
# add 90 minutes to timestamps
data = bw.offset_timestamps(data, '1.5H')
# print first timestamp
print(data.index[0])
2016-01-09 17:00:00
Negative offsets can also be applied.
[13]:
# subtract 2 hours from timestamps
data = bw.offset_timestamps(data, '-2H')
# print first timestamp
print(data.index[0])
2016-01-09 15:00:00
Offsets can also be applied in minutes (positive or negative).
[14]:
# add 7 minutes to timestamps
data = bw.offset_timestamps(data, '7min', overwrite=True)
# print first timestamp
print(data.index[0])
2016-01-09 15:07:00
Time offsets can also be restricted to a certain time period during the record using the following extra arguments. In this case we can see that the final adjustment has just been applied from the 2nd to the 4th timestamps.
[15]:
# subtract 7 minutes from timestamps between the datetime range below
data = bw.offset_timestamps(data, '-7min', date_from='2016-01-09 15:17:00', date_to='2016-01-09 16:47:00')
# show all data for first 5 timestamps
data.head(5)
[15]:
Spd80mN | Spd80mS | Spd60mN | Spd60mS | Spd40mN | Spd40mS | Spd80mNStd | Spd80mSStd | Spd60mNStd | Spd60mSStd | ... | Dir38mS | Dir38mSStd | T2m | RH2m | P2m | PrcpTot | BattMin | Spd80mN_scaled | Spd80mN_adj | Dir38mS_adj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2016-01-09 15:07:00 | 8.370 | 7.911 | 8.160 | 7.849 | 7.857 | 7.626 | 1.240 | 1.075 | 1.060 | 0.947 | ... | 112.2 | 5.724 | 0.711 | 100.0 | 935.0 | 0.0 | 12.94 | 8.62110 | 8.363613 | 102.2 |
2016-01-09 15:10:00 | 8.250 | 7.961 | 8.100 | 7.884 | 7.952 | 7.840 | 0.897 | 0.875 | 0.900 | 0.855 | ... | 109.8 | 5.628 | 0.630 | 100.0 | 935.0 | 0.0 | 12.95 | 8.49750 | 8.243739 | 99.8 |
2016-01-09 16:30:00 | 7.652 | 7.545 | 7.671 | 7.551 | 7.531 | 7.457 | 0.756 | 0.703 | 0.797 | 0.749 | ... | 111.8 | 4.016 | 1.126 | 100.0 | 934.0 | 0.0 | 12.75 | 7.88156 | 7.646364 | 101.8 |
2016-01-09 16:40:00 | 7.382 | 7.325 | 6.818 | 6.689 | 6.252 | 6.174 | 0.844 | 0.810 | 0.897 | 0.875 | ... | 115.6 | 5.189 | 0.954 | 100.0 | 934.0 | 0.0 | 12.71 | 7.60346 | 7.376646 | 105.6 |
2016-01-09 16:57:00 | 7.977 | 7.791 | 8.110 | 7.915 | 8.140 | 7.974 | 0.556 | 0.528 | 0.562 | 0.524 | ... | 113.6 | 3.540 | 0.863 | 100.0 | 934.0 | 0.0 | 12.69 | 8.21631 | 7.971024 | 103.6 |
5 rows × 32 columns
If the timestamp is adjustment is large enough to overlap the unadjusted data before/after the ‘date_from’/’date_to’, by default the function will not overwrite the unadjusted data. The argument ‘overwrite=True’ can be specified to change this default and overwrite the neighbouring timestamps.
Finally, note that the adjusted timestamps can be allocated to a completely new dataframe, leaving the input dataset unchanged.
[16]:
# create new dataset with 6H added to the original timestamps
data_adj = bw.offset_timestamps(data, '6H')
selective_avg()
function¶This function can be used to create a time series of wind speed using data from two anemometers mounted at the same height, along with one wind vane. For each timestamp in the dataset, the function either averages the two wind speed values or only includes the upstream wind speed value if the other is in the wake of the mast. The wake is defined by winds approaching the mast from the opposite site of the boom within a directional sector of a given size (in this case 60 degrees).
[17]:
# derive selective average variable based on two 80 m anemeters and 78 m wind vane
data['sel_avg_80m'] = bw.selective_avg(data.Spd80mN, data.Spd80mS, wdir=data.Dir78mS, boom_dir_1=0, boom_dir_2=180, sector_width=60)
[18]:
# concatenate inputs and new output variable
summary = data[['Spd80mN', 'Spd80mS', 'Dir78mS','sel_avg_80m']]
# show the first 5 entries, none of which include waked anemometers
summary.head(5)
[18]:
Spd80mN | Spd80mS | Dir78mS | sel_avg_80m | |
---|---|---|---|---|
2016-01-09 15:07:00 | 8.370 | 7.911 | 114.2 | 8.1405 |
2016-01-09 15:10:00 | 8.250 | 7.961 | 114.4 | 8.1055 |
2016-01-09 16:30:00 | 7.652 | 7.545 | 117.8 | 7.5985 |
2016-01-09 16:40:00 | 7.382 | 7.325 | 124.5 | 7.3535 |
2016-01-09 16:57:00 | 7.977 | 7.791 | 120.9 | 7.8840 |