How to transform your data — brightwind documentation

Outline:¶

This tutorial demonstrates how to use the following functions to transform a sample dataset in various ways. The tutorial covers the following steps:

Importing the brightwind library and loading some sample data
Using the scale_wind_speed() function
Using the average_data_by_period() function
Using the adjust_slope_offset() function
Using the offset_wind_direction() function
Using the offset_timestamps() function
Using the selective_avg() function

1. Importing the brightwind library and loading some sample data¶

[2]:

import brightwind as bw

The following commands will load the exisiting dataset and show the first few timestamps.

[3]:

# load existing sample dataset
data = bw.load_csv(bw.datasets.demo_data)

# show first few rows of dataframe
data.head(5)

[3]:

	Spd80mN	Spd80mS	Spd60mN	Spd60mS	Spd40mN	Spd40mS	Spd80mNStd	Spd80mSStd	Spd60mNStd	Spd60mSStd	...	Dir78mSStd	Dir58mS	Dir58mSStd	Dir38mS	Dir38mSStd	T2m	RH2m	P2m	PrcpTot	BattMin
Timestamp
2016-01-09 15:30:00	8.370	7.911	8.160	7.849	7.857	7.626	1.240	1.075	1.060	0.947	...	6.100	110.1	6.009	112.2	5.724	0.711	100.0	935.0	0.0	12.94
2016-01-09 15:40:00	8.250	7.961	8.100	7.884	7.952	7.840	0.897	0.875	0.900	0.855	...	5.114	110.9	4.702	109.8	5.628	0.630	100.0	935.0	0.0	12.95
2016-01-09 17:00:00	7.652	7.545	7.671	7.551	7.531	7.457	0.756	0.703	0.797	0.749	...	4.172	113.1	3.447	111.8	4.016	1.126	100.0	934.0	0.0	12.75
2016-01-09 17:10:00	7.382	7.325	6.818	6.689	6.252	6.174	0.844	0.810	0.897	0.875	...	4.680	118.8	5.107	115.6	5.189	0.954	100.0	934.0	0.0	12.71
2016-01-09 17:20:00	7.977	7.791	8.110	7.915	8.140	7.974	0.556	0.528	0.562	0.524	...	3.123	115.9	2.960	113.6	3.540	0.863	100.0	934.0	0.0	12.69

5 rows × 29 columns

2. Using the `scale_wind_speed()` function¶

The following commands specify an arbitrary scale factor and apply this to one of the speed variables.

[4]:

# specify scale factor
scale_factor = 1.03

# apply scale factor to Spd80mN variable
Spd80mN_scaled = bw.scale_wind_speed(data.Spd80mN, scale_factor)

# print first 5 rows of result
Spd80mN_scaled.head(5)

[4]:

Timestamp
2016-01-09 15:30:00    8.62110
2016-01-09 15:40:00    8.49750
2016-01-09 17:00:00    7.88156
2016-01-09 17:10:00    7.60346
2016-01-09 17:20:00    8.21631
Name: Spd80mN, dtype: float64

Now we’ll repeat the same step, this time saving the resulting data in the existing dataframe. The new variable appears as the last column on the right hand side of the returned dataframe.

[5]:

# apply scale factor to Spd80mN variable, creating a new variable in the existing dataframe
data['Spd80mN_scaled'] = bw.scale_wind_speed(data.Spd80mN, scale_factor)

# show first few rows of updated dataframe, scaled wind speed is included in the last column
data.head(5)

[5]:

	Spd80mN	Spd80mS	Spd60mN	Spd60mS	Spd40mN	Spd40mS	Spd80mNStd	Spd80mSStd	Spd60mNStd	Spd60mSStd	...	Dir58mS	Dir58mSStd	Dir38mS	Dir38mSStd	T2m	RH2m	P2m	PrcpTot	BattMin	Spd80mN_scaled
Timestamp
2016-01-09 15:30:00	8.370	7.911	8.160	7.849	7.857	7.626	1.240	1.075	1.060	0.947	...	110.1	6.009	112.2	5.724	0.711	100.0	935.0	0.0	12.94	8.62110
2016-01-09 15:40:00	8.250	7.961	8.100	7.884	7.952	7.840	0.897	0.875	0.900	0.855	...	110.9	4.702	109.8	5.628	0.630	100.0	935.0	0.0	12.95	8.49750
2016-01-09 17:00:00	7.652	7.545	7.671	7.551	7.531	7.457	0.756	0.703	0.797	0.749	...	113.1	3.447	111.8	4.016	1.126	100.0	934.0	0.0	12.75	7.88156
2016-01-09 17:10:00	7.382	7.325	6.818	6.689	6.252	6.174	0.844	0.810	0.897	0.875	...	118.8	5.107	115.6	5.189	0.954	100.0	934.0	0.0	12.71	7.60346
2016-01-09 17:20:00	7.977	7.791	8.110	7.915	8.140	7.974	0.556	0.528	0.562	0.524	...	115.9	2.960	113.6	3.540	0.863	100.0	934.0	0.0	12.69	8.21631

5 rows × 30 columns

Finally, we can compare the mean values of the scaled and unscaled speed time series. The ratio of these mean values is shown to equal the intended scale factor.

[6]:

# print unscaled mean speed
print('The unscaled mean speed is: \t {} m/s'.format(round(data.Spd80mN.mean(),2)))

# print scaled mean speed
print('The scaled mean speed is: \t {} m/s'.format(round(data.Spd80mN_scaled.mean(),2)))

# calculate ratio of mean values
ratio = data.Spd80mN_scaled.mean()/data.Spd80mN.mean()

# print ratio of mean values
print('The ratio of these values is: \t {}'.format(round(ratio,2)))

The unscaled mean speed is:      7.5 m/s
The scaled mean speed is:        7.72 m/s
The ratio of these values is:    1.03

3. Using the `average_data_by_period()` function¶

This function can be used to apply a window average to data, using a window which can be specified as any number of minutes (‘min’), hours (‘H’), days (‘D’), weeks (‘W’). ‘1M’ and ‘1AS’ can also used for monthly and annual averages respectively. The output is created within a new series.

[7]:

# then derive monthly averages
data_monthly = bw.average_data_by_period(data.Spd80mN, period='1M')

# show monthly averaged data for first 5 months
data_monthly.head(5)

[7]:

Timestamp
2016-01-01    9.252377
2016-02-01    8.904382
2016-03-01    6.395166
2016-04-01    6.598875
2016-05-01    8.729657
Freq: MS, Name: Spd80mN, dtype: float64

4. Using the `adjust_slope_offset()` function¶

Now, let’s imagine that data from the northern anemometer at 80 m was logged with standard slope and offset values of 0.765 and 0.35, and we need to rescale this data considering slope and offset values from a calibration certificate which is specific to this anemometer. The following command can be used to do this, once again adding a new variable to the dataframe.

[8]:

# Apply slope and offset adjustments
data['Spd80mN_adj'] = bw.adjust_slope_offset(data.Spd80mN,0.765, 0.35, 0.7642, 0.352)

Now let’s view the raw and adjusted speed values side-by-side.

[9]:

# concatenate raw and adjusted variables
summary = data[['Spd80mN', 'Spd80mN_adj']]

# show first few values
summary.head(5)

[9]:

	Spd80mN	Spd80mN_adj
Timestamp
2016-01-09 15:30:00	8.370	8.363613
2016-01-09 15:40:00	8.250	8.243739
2016-01-09 17:00:00	7.652	7.646364
2016-01-09 17:10:00	7.382	7.376646
2016-01-09 17:20:00	7.977	7.971024

5. Using the `offset_wind_direction()` function¶

Similarly, the following function can be used to apply a scalar offset to direction data from a given wind vane.

[10]:

# apply directional offset
data['Dir38mS_adj'] = bw.offset_wind_direction(data.Dir38mS, -10)

# concatenate raw and adjusted variables
summary = data[['Dir38mS', 'Dir38mS_adj']]

# show first few values
summary.head(5)

[10]:

	Dir38mS	Dir38mS_adj
Timestamp
2016-01-09 15:30:00	112.2	102.2
2016-01-09 15:40:00	109.8	99.8
2016-01-09 17:00:00	111.8	101.8
2016-01-09 17:10:00	115.6	105.6
2016-01-09 17:20:00	113.6	103.6

6. Using the `offset_timestamps()` function¶

This function can be used to apply an offset to the timestamps within a dataset. A few examples are provided below, first, let’s print the first timestamp of the raw data for comparison.

[11]:

# print first timestamp
print(data.index[0])

2016-01-09 15:30:00

Now we’ll add 1 hour and 30 minutes to the timestamps of the data and print the first timestamp once again.

[12]:

# add 90 minutes to timestamps
data = bw.offset_timestamps(data, '1.5H')

# print first timestamp
print(data.index[0])

2016-01-09 17:00:00

Negative offsets can also be applied.

[13]:

# subtract 2 hours from timestamps
data = bw.offset_timestamps(data, '-2H')

# print first timestamp
print(data.index[0])

2016-01-09 15:00:00

Offsets can also be applied in minutes (positive or negative).

[14]:

# add 7 minutes to timestamps
data = bw.offset_timestamps(data, '7min', overwrite=True)

# print first timestamp
print(data.index[0])

2016-01-09 15:07:00

Time offsets can also be restricted to a certain time period during the record using the following extra arguments. In this case we can see that the final adjustment has just been applied from the 2nd to the 4th timestamps.

[15]:

# subtract 7 minutes from timestamps between the datetime range below
data = bw.offset_timestamps(data, '-7min', date_from='2016-01-09 15:17:00', date_to='2016-01-09 16:47:00')

# show all data for first 5 timestamps
data.head(5)

[15]:

	Spd80mN	Spd80mS	Spd60mN	Spd60mS	Spd40mN	Spd40mS	Spd80mNStd	Spd80mSStd	Spd60mNStd	Spd60mSStd	...	Dir38mS	Dir38mSStd	T2m	RH2m	P2m	BattMin	Spd80mN_scaled	Spd80mN_adj	Dir38mS_adj
2016-01-09 15:07:00	8.370	7.911	8.160	7.849	7.857	7.626	1.240	1.075	1.060	0.947	...	112.2	5.724	0.711	100.0	935.0	12.94	8.62110	8.363613	102.2
2016-01-09 15:10:00	8.250	7.961	8.100	7.884	7.952	7.840	0.897	0.875	0.900	0.855	...	109.8	5.628	0.630	100.0	935.0	12.95	8.49750	8.243739	99.8
2016-01-09 16:30:00	7.652	7.545	7.671	7.551	7.531	7.457	0.756	0.703	0.797	0.749	...	111.8	4.016	1.126	100.0	934.0	12.75	7.88156	7.646364	101.8
2016-01-09 16:40:00	7.382	7.325	6.818	6.689	6.252	6.174	0.844	0.810	0.897	0.875	...	115.6	5.189	0.954	100.0	934.0	12.71	7.60346	7.376646	105.6
2016-01-09 16:57:00	7.977	7.791	8.110	7.915	8.140	7.974	0.556	0.528	0.562	0.524	...	113.6	3.540	0.863	100.0	934.0	12.69	8.21631	7.971024	103.6

5 rows × 32 columns

If the timestamp is adjustment is large enough to overlap the unadjusted data before/after the ‘date_from’/’date_to’, by default the function will not overwrite the unadjusted data. The argument ‘overwrite=True’ can be specified to change this default and overwrite the neighbouring timestamps.

Finally, note that the adjusted timestamps can be allocated to a completely new dataframe, leaving the input dataset unchanged.

[16]:

# create new dataset with 6H added to the original timestamps
data_adj = bw.offset_timestamps(data, '6H')

7. Using the `selective_avg()` function¶

This function can be used to create a time series of wind speed using data from two anemometers mounted at the same height, along with one wind vane. For each timestamp in the dataset, the function either averages the two wind speed values or only includes the upstream wind speed value if the other is in the wake of the mast. The wake is defined by winds approaching the mast from the opposite site of the boom within a directional sector of a given size (in this case 60 degrees).

[17]:

# derive selective average variable based on two 80 m anemeters and 78 m wind vane
data['sel_avg_80m'] = bw.selective_avg(data.Spd80mN, data.Spd80mS, wdir=data.Dir78mS, boom_dir_1=0, boom_dir_2=180, sector_width=60)

[18]:

# concatenate inputs and new output variable
summary = data[['Spd80mN', 'Spd80mS', 'Dir78mS','sel_avg_80m']]

# show the first 5 entries, none of which include waked anemometers
summary.head(5)

[18]:

	Spd80mN	Spd80mS	Dir78mS	sel_avg_80m
2016-01-09 15:07:00	8.370	7.911	114.2	8.1405
2016-01-09 15:10:00	8.250	7.961	114.4	8.1055
2016-01-09 16:30:00	7.652	7.545	117.8	7.5985
2016-01-09 16:40:00	7.382	7.325	124.5	7.3535
2016-01-09 16:57:00	7.977	7.791	120.9	7.8840

How to transform your data¶

Outline:¶

1. Importing the brightwind library and loading some sample data¶

2. Using the scale_wind_speed() function¶

3. Using the average_data_by_period() function¶

4. Using the adjust_slope_offset() function¶

5. Using the offset_wind_direction() function¶