brightwind.analyse.analyse.dist_matrix

brightwind.analyse.analyse.dist_matrix(var_series, x_series, y_series, num_bins_x=None, num_bins_y=None, x_bins=None, y_bins=None, x_bin_labels=None, y_bin_labels=None, var_label=None, x_label=None, y_label=None, aggregation_method='%frequency', return_data=False)

Calculates the distribution of a variable against two other variables on an X-Y plane and returns it as a heat map. By default, the X and Y variables are grouped into bins of width 1. This can be changed with num_bins_x and num_bins_y, or with x_bins and y_bins.
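The idea can be illustrated with a minimal plain-Python sketch. This is not brightwind's implementation: the helper name dist_matrix_sketch and the sample values are made up for illustration, and the right-closed (k, k + 1] intervals are an assumption based on the (bin-start, bin-end] label format described under Parameters.

```python
import math
from collections import defaultdict
from statistics import mean


def dist_matrix_sketch(var, x, y, agg=mean):
    """Group var by the unit-width (x, y) bin of each record, then aggregate.

    Hypothetical helper for illustration only, not part of brightwind.
    """
    cells = defaultdict(list)
    for v, xv, yv in zip(var, x, y):
        # Bin index k stands for the interval (k, k + 1]: open left, closed right.
        x_bin = math.ceil(xv) - 1
        y_bin = math.ceil(yv) - 1
        # Keys are (row, column): y forms the rows, x forms the columns.
        cells[(y_bin, x_bin)].append(v)
    return {cell: agg(values) for cell, values in cells.items()}


# Synthetic records, made up for illustration only.
turbulence = [0.4, 0.6, 0.9, 1.1]        # variable being distributed
temperature = [10.2, 10.8, 11.5, 10.4]   # x variable
wind_speed = [5.1, 5.9, 5.5, 6.3]        # y variable

matrix = dist_matrix_sketch(turbulence, temperature, wind_speed)
```

Here the first two records fall in the same cell (wind speed in (5, 6], temperature in (10, 11]), so their values are averaged, while the other two records each occupy a cell of their own.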

Parameters
  • var_series (pandas.Series) – Time-series of the variable whose distribution we need to find.

  • x_series (pandas.Series) – Time-series of the X variable to bin against. Forms the columns of the distribution.

  • y_series (pandas.Series) – Time-series of the Y variable to bin against. Forms the rows of the distribution.

  • num_bins_x (int) – Number of evenly spaced bins to use for x_series. If this and x_bins are not specified, bins of width 1 are used.

  • num_bins_y (int) – Number of evenly spaced bins to use for y_series. If this and y_bins are not specified, bins of width 1 are used.

  • x_bins (list, array, None) – (optional) Array of numbers in which each pair of adjacent elements forms a bin. Overrides num_bins_x. If None, the bins are derived from the min and max of x_series, using the number of evenly spaced bins given by num_bins_x.

  • y_bins (list, array, None) – (optional) Array of numbers in which each pair of adjacent elements forms a bin. Overrides num_bins_y. If None, the bins are derived from the min and max of y_series, using the number of evenly spaced bins given by num_bins_y.

  • x_bin_labels (list) – (optional) Labels of bins to be used for x_series, uses (bin-start, bin-end] format by default.

  • y_bin_labels (list) – (optional) Labels of bins to be used for y_series, uses (bin-start, bin-end] format by default.

  • var_label (str) – (optional) Label to use for the distributed variable. By default the name of var_series is used.

  • x_label (str) – (optional) Label to use for the x axis of the heat map. By default the name of x_series is used.

  • y_label (str) – (optional) Label to use for the y axis of the heat map. By default the name of y_series is used.

  • aggregation_method (str or function) – Statistical method used to find distribution. It can be mean, max, min, std, count, %frequency or a custom function. Computes frequency in percentages by default.

  • return_data (bool) – If True, the distribution matrix is returned along with the plot. Defaults to False.
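How custom bin edges, bin labels and the '%frequency' aggregation fit together can be sketched in plain Python. This is an illustration under stated assumptions, not brightwind's code: assign_bin and percent_frequency are hypothetical helpers, the sample values are made up, and '%frequency' is assumed here to mean each cell's share of the records that fall inside the bins, expressed as a percentage.

```python
from bisect import bisect_left
from collections import Counter


def assign_bin(value, edges, labels=None):
    """Return the label of the (edges[i], edges[i + 1]] bin holding value.

    Returns None when value lies outside every bin. Hypothetical helper.
    """
    if not edges[0] < value <= edges[-1]:
        return None
    i = bisect_left(edges, value) - 1
    if labels is not None:
        return labels[i]
    # Default label in the (bin-start, bin-end] format.
    return f"({edges[i]}, {edges[i + 1]}]"


def percent_frequency(x, y, x_edges, y_edges, y_labels=None):
    """Percentage of binned records per (row, column) cell. Hypothetical helper."""
    counts = Counter()
    for xv, yv in zip(x, y):
        cell = (assign_bin(yv, y_edges, y_labels), assign_bin(xv, x_edges))
        if None not in cell:  # skip records outside the bins
            counts[cell] += 1
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Synthetic records, made up for illustration only.
temperature = [2.0, 7.5, 3.0, 9.0]    # x variable
wind_speed = [4.0, 6.0, 13.0, 20.0]   # y variable

freq = percent_frequency(
    temperature, wind_speed,
    x_edges=[0, 5, 10],
    y_edges=[0, 6, 12, 15, 41],
    y_labels=['low wind', 'medium wind', 'gale', 'storm'],
)
```

Five bin edges define four bins, which is why four labels are supplied. A wind speed of exactly 6.0 lands in 'low wind' because each bin is closed on its right edge.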

Returns

A heat map and a distribution matrix if return_data is True, otherwise just a heat map.

Example usage

import brightwind as bw
df = bw.load_csv(r'C:\Users\Stephen\Documents\Analysis\demo_data.csv')

# For distribution of mean wind speed standard deviation against wind speed and temperature
bw.dist_matrix(df.Spd40mNStd, x_series=df.T2m, y_series=df.Spd40mN, aggregation_method='mean')

# To change the number of bins
bw.dist_matrix(df.Spd40mNStd, x_series=df.T2m, y_series=df.Spd40mN, num_bins_x=4, num_bins_y=10)

# To specify custom bins
bw.dist_matrix(df.Spd40mNStd, x_series=df.T2m, y_series=df.Spd40mN,
               y_bins=[0, 6, 12, 15, 41], y_bin_labels=['low wind', 'medium wind', 'gale', 'storm'],
               aggregation_method='min', return_data=True)

# For custom aggregation function
def custom_agg(x):
    return x.mean()+(2*x.std())
data = bw.dist_matrix(df.Spd40mNStd, x_series=df.T2m, y_series=df.Spd40mN,
                      aggregation_method=custom_agg, return_data=True)
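
To see what a custom aggregation like the one above produces for a single cell, the same formula can be tried on a plain list. This sketch assumes the function is applied to the values that fall in each cell. custom_agg_plain and the sample values are hypothetical and not part of brightwind.

```python
from statistics import mean, stdev


def custom_agg_plain(values):
    # Same formula as custom_agg, written for a plain list of numbers.
    # stdev is the sample standard deviation (n - 1 denominator), which
    # matches the default behaviour of pandas' Series.std().
    return mean(values) + 2 * stdev(values)


# Made-up values standing in for the records of one cell.
cell_values = [1.0, 2.0, 3.0]
upper = custom_agg_plain(cell_values)  # mean 2.0 plus twice the sample std 1.0
```

A cell containing a single record has no sample standard deviation, so a function of this kind needs at least two values per cell to return a number.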