brightwind.load.load.apply_cleaning

brightwind.load.load.apply_cleaning(data, cleaning_file_or_df, inplace=False, sensor_col_name='Sensor', date_from_col_name='Start', date_to_col_name='Stop', all_sensors_descriptor='All', replacement_text='NaN')

Apply cleaning to a DataFrame using predetermined flagged periods for each sensor listed in a cleaning file. The flagged data is replaced with NaN values, which then do not appear in any plots or affect calculations.

The cleaning file is a simple comma-separated file with the sensor name along with the start and end timestamps for the flagged period. There may be other columns in the file; however, these will be ignored. E.g.:

| Sensor | Start               | Stop                |
|--------|---------------------|---------------------|
| Spd80m | 2018-10-23 12:30:00 | 2018-10-25 14:20:00 |
| Dir78m | 2018-12-23 02:40:00 |                     |
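As a rough illustration of the layout described above, the following sketch reads an in-memory cleaning file with pandas. The extra `Reason` column and its values are hypothetical, added only to show that a file may carry columns beyond the three that matter; the second row leaves `Stop` blank, as in the example table.

```python
import io

import pandas as pd

# In-memory stand-in for a cleaning file on disk.
# The "Reason" column is hypothetical and only illustrates an extra column.
cleaning_csv = io.StringIO(
    "Sensor,Start,Stop,Reason\n"
    "Spd80m,2018-10-23 12:30:00,2018-10-25 14:20:00,icing\n"
    "Dir78m,2018-12-23 02:40:00,,\n"
)

cleaning_df = pd.read_csv(cleaning_csv)

# Only these three columns describe the flagged periods.
print(cleaning_df[["Sensor", "Start", "Stop"]])
```

pandas reads the blank `Stop` cell as a missing value (NaN). How apply_cleaning interprets a missing end timestamp is not stated here, so it is worth checking before relying on it.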

Parameters
  • data (pandas.DataFrame) – Data to be cleaned.

  • cleaning_file_or_df (str or pandas.DataFrame) – File path of the csv file or a pandas DataFrame which contains the list of sensor names along with the start and end timestamps of the periods that are flagged.

  • inplace (bool, default False) – If ‘inplace’ is True, the original data, ‘data’, will be modified and replaced with the cleaned data. If ‘inplace’ is False, the original data will not be touched and a new object containing the cleaned data is created instead. To keep this cleaned data, assign the return value to a new variable.

  • sensor_col_name (str, default 'Sensor') – The column name which contains the list of sensor names that have flagged periods.

  • date_from_col_name (str, default 'Start') – The column name of the date_from or the start date of the period to be cleaned.

  • date_to_col_name (str, default 'Stop') – The column name of the date_to or the end date of the period to be cleaned.

  • all_sensors_descriptor (str, default 'All') – A text descriptor that represents ALL sensors in the DataFrame.

  • replacement_text (str, default 'NaN') – Text used to replace the flagged data.

Returns

DataFrame with the flagged data replaced by replacement_text (NaN by default).

Return type

pandas.DataFrame
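To make the behaviour described above more concrete, here is a minimal pandas sketch of the general idea: blank out each sensor's flagged periods and either copy or modify the input depending on `inplace`. This is not brightwind's implementation. The helper name `mask_flagged_periods` is hypothetical, and the sketch makes two assumptions: both period boundaries are inclusive, and a missing start or stop leaves the period open-ended on that side.

```python
import numpy as np
import pandas as pd


def mask_flagged_periods(data, cleaning_df, inplace=False,
                         sensor_col_name="Sensor",
                         date_from_col_name="Start",
                         date_to_col_name="Stop",
                         all_sensors_descriptor="All"):
    """Illustrative only: set flagged periods to NaN (not brightwind's code)."""
    # Work on the original object only when inplace=True.
    target = data if inplace else data.copy()
    for _, row in cleaning_df.iterrows():
        start = pd.to_datetime(row[date_from_col_name])
        stop = pd.to_datetime(row[date_to_col_name])
        # Assumption: inclusive bounds; a missing bound is open-ended.
        mask = pd.Series(True, index=target.index)
        if pd.notna(start):
            mask &= target.index >= start
        if pd.notna(stop):
            mask &= target.index <= stop
        sensor = row[sensor_col_name]
        # The descriptor (default 'All') selects every column.
        if sensor == all_sensors_descriptor:
            columns = list(target.columns)
        else:
            columns = [sensor]
        target.loc[mask, columns] = np.nan
    return target


# Small synthetic dataset: six 10-minute records for two sensors.
index = pd.date_range("2018-10-23 12:00:00", periods=6, freq="10min")
data = pd.DataFrame(
    {"Spd80m": [5.0, 5.1, 5.2, 5.3, 5.4, 5.5],
     "Dir78m": [180.0, 181.0, 182.0, 183.0, 184.0, 185.0]},
    index=index,
)
cleaning_df = pd.DataFrame({
    "Sensor": ["Spd80m", "Dir78m"],
    "Start": ["2018-10-23 12:10:00", "2018-10-23 12:40:00"],
    "Stop": ["2018-10-23 12:20:00", None],
})

cleaned = mask_flagged_periods(data, cleaning_df)
print(cleaned)
```

Because `inplace` defaults to False here, `data` is left untouched and the cleaned copy has to be captured in a new variable, which mirrors the note on the `inplace` parameter.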

Example usage

import brightwind as bw

Load data:

data = bw.load_csv(r'C:\Users\Stephen\Documents\Analysis\demo_data')
cleaning_file = r'C:\Users\Stephen\Documents\Analysis\demo_cleaning_file.csv'

To apply cleaning to ‘data’ and store the cleaned data in ‘data_cleaned’:

data_cleaned = bw.apply_cleaning(data, cleaning_file)
print(data_cleaned)

To modify ‘data’ and replace it with the cleaned data:

bw.apply_cleaning(data, cleaning_file, inplace=True)
print(data)

To apply cleaning where the cleaning file has column names other than the defaults:

cleaning_file = r'C:\some\folder\cleaning_file.csv'
data = bw.apply_cleaning(data, cleaning_file, sensor_col_name='Data column',
                         date_from_col_name='Start Time', date_to_col_name='Stop Time')