Transform Documentation

ecopipeline.transform.add_local_time(df: DataFrame, site_name: str, config: ConfigManager) -> DataFrame

Adds a column to the dataframe containing the local time.

Parameters:
df : pd.DataFrame

Dataframe

site_name : str

site name

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame
ecopipeline.transform.add_relative_humidity(df: DataFrame, temp_col: str = 'airTemp_F', dew_point_col: str = 'dewPoint_F', degree_f: bool = True)

Add a 'relative_humidity' column to the dataframe.

Computes relative humidity from air temperature and dew-point temperature using the August-Roche-Magnus approximation. Clips the result to [0, 100].

Parameters:
df : pd.DataFrame

Dataframe containing air temperature and dew-point temperature columns.

temp_col : str, optional

Column name for air temperature. Defaults to 'airTemp_F'.

dew_point_col : str, optional

Column name for dew-point temperature. Defaults to 'dewPoint_F'.

degree_f : bool, optional

If True, temperature columns are assumed to be in °F and are internally converted to °C for the calculation. If False, columns are assumed to already be in °C. Defaults to True.

Returns:
pd.DataFrame

Dataframe with an added 'relative_humidity' column (percent, 0–100).
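The calculation can be sketched with the standard August-Roche-Magnus coefficients; the exact constants the library uses are an assumption here, and the column names are illustrative:

```python
import numpy as np
import pandas as pd

def relative_humidity_magnus(temp_c: pd.Series, dew_point_c: pd.Series) -> pd.Series:
    """Approximate RH (%) from air and dew-point temperature in deg C
    using the August-Roche-Magnus formula (coefficients are an assumption)."""
    a, b = 17.625, 243.04
    e_dew = np.exp(a * dew_point_c / (b + dew_point_c))  # actual vapor pressure term
    e_sat = np.exp(a * temp_c / (b + temp_c))            # saturation vapor pressure term
    return (100.0 * e_dew / e_sat).clip(0, 100)

df = pd.DataFrame({"airTemp_C": [20.0, 25.0], "dewPoint_C": [20.0, 10.0]})
df["relative_humidity"] = relative_humidity_magnus(df["airTemp_C"], df["dewPoint_C"])
```

When air temperature equals the dew point the formula returns exactly 100%, which is why the clip to [0, 100] matters mainly for noisy sensor data.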

ecopipeline.transform.aggregate_df(df: DataFrame, ls_filename: str = '', complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, remove_partial: bool = True) -> Tuple[DataFrame, DataFrame]

Aggregate minute-level data into hourly and daily dataframes.

Energy columns (matching .*Energy.* but not EnergyRate or BTU suffixes) are summed; all other numeric columns are averaged. Optionally appends load-shift schedule data and removes partial hours/days.

Parameters:
df : pd.DataFrame

Pandas dataframe of minute-by-minute sensor data.

ls_filename : str, optional

Path to the load-shift schedule CSV file (e.g. "full/path/to/pipeline/input/loadshift_matrix.csv"). The CSV must have at least four columns: date, startTime, endTime, and event. Defaults to "".

complete_hour_threshold : float, optional

Fraction of minutes in an hour required to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8. Only applicable when remove_partial is True.

complete_day_threshold : float, optional

Fraction of hours in a day required to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0. Only applicable when remove_partial is True.

remove_partial : bool, optional

If True, removes partial hours and days from the aggregated dataframes. Defaults to True.

Returns:
hourly_df : pd.DataFrame

Aggregated hourly dataframe, including a 'system_state' column if a valid load-shift file was provided.

daily_df : pd.DataFrame

Aggregated daily dataframe, including a 'load_shift_day' column if a valid load-shift file was provided.
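The sum-vs-mean split can be sketched with plain pandas resampling. The column names are hypothetical, and the library's actual pattern also excludes EnergyRate and BTU columns from the sum group:

```python
import pandas as pd

# Two hours of synthetic minute-level data.
idx = pd.date_range("2024-01-01", periods=120, freq="min")
df = pd.DataFrame({
    "PowerIn_Total": 5.0,     # non-energy column: averaged (kW)
    "HPWH_Energy_kWh": 1.0,   # energy column: summed (kWh accrued per minute)
}, index=idx)

energy_cols = df.filter(regex="Energy").columns
hourly = pd.concat([
    df[energy_cols].resample("h").sum(),                # sum energy columns
    df.drop(columns=energy_cols).resample("h").mean(),  # average everything else
], axis=1)
```

Summing energy while averaging power keeps both physically meaningful: 60 one-kWh minutes become 60 kWh for the hour, while a constant 5 kW draw stays 5 kW.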

ecopipeline.transform.aggregate_values(df: DataFrame, thermo_slice: str) -> DataFrame

Gets the daily average of data for all relevant variables.

Parameters:
df : pd.DataFrame

Pandas DataFrame of minute-by-minute data

thermo_slice : str

Indicates the time at which slicing begins. If None, no slicing is performed. The format of the thermo_slice string is "HH:MM AM/PM".

Returns:
pd.DataFrame:

Pandas DataFrame which contains the aggregated daily data.

ecopipeline.transform.apply_equipment_cop_derate(df: DataFrame, equip_cop_col: str, r_val: int = 16) -> DataFrame

Derate equipment-method system COP based on building R-value.

Derate percentages applied:

  • R12–R16: 12%

  • R16–R20: 10%

  • R20–R24: 8%

  • R24–R28: 6%

  • R28–R32: 4%

  • > R32: 2%

Parameters:
df : pd.DataFrame

Dataframe containing the equipment COP column to derate.

equip_cop_col : str

Name of the COP column to derate.

r_val : int, optional

Building R-value used to determine the derate factor. Defaults to 16.

Returns:
pd.DataFrame

Dataframe with equip_cop_col multiplied by the appropriate derate factor.

Raises:
Exception

If r_val is less than 12.
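The band lookup can be sketched as follows; how the library assigns the overlapping band boundaries (e.g. exactly R16) is an assumption here:

```python
import pandas as pd

def derate_factor(r_val: int) -> float:
    """Map a building R-value to a multiplicative derate factor,
    mirroring the documented bands (boundary handling is an assumption)."""
    if r_val < 12:
        raise Exception("R-value must be at least 12")
    for upper, derate in [(16, 0.12), (20, 0.10), (24, 0.08), (28, 0.06), (32, 0.04)]:
        if r_val <= upper:
            return 1.0 - derate
    return 1.0 - 0.02  # above R32

df = pd.DataFrame({"EquipCOP": [3.0]})
df["EquipCOP"] = df["EquipCOP"] * derate_factor(16)  # R16 falls in the 12% band here
```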

ecopipeline.transform.aqsuite_filter_new(last_date: str, filenames: List[str], site: str, config: ConfigManager) -> List[str]

Filters the filenames list to only those newer than last_date.

Parameters:
last_date : str

latest date loaded prior to current runtime

filenames : List[str]

List of filenames to be filtered

site : str

site name

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
List[str]:

Filtered list of filenames

ecopipeline.transform.aqsuite_prep_time(df: DataFrame) -> DataFrame

Converts the time column of an Aqsuite dataframe into datetime type and sorts the entire dataframe by time.

Prerequisite: the input dataframe MUST be an Aqsuite dataframe whose columns have not yet been renamed.

Parameters:
df : pd.DataFrame

Aqsuite DataFrame

Returns:
pd.DataFrame:

Aqsuite dataframe sorted by its datetime-typed time column

ecopipeline.transform.avg_duplicate_times(df: DataFrame, timezone: str) -> DataFrame

Collapse duplicate timestamps by averaging numeric values and taking the first non-numeric value.

Looks for duplicate timestamps (typically caused by daylight-saving time transitions or timestamp rounding) and reduces each group of duplicates to a single row, averaging numeric columns and keeping the first value for non-numeric columns.

Parameters:
df : pd.DataFrame

Pandas dataframe to be altered.

timezone : str

Timezone string to apply to the output index. Must be a string recognised by pandas.Series.tz_localize. See https://pandas.pydata.org/docs/reference/api/pandas.Series.tz_localize.html.

Returns:
pd.DataFrame

Dataframe with all duplicate timestamps collapsed into one row, averaging numeric data values.
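The collapsing step can be sketched with a groupby over the index; the column names here are illustrative:

```python
import pandas as pd

# Two rows share the 01:00 timestamp (e.g. a DST fall-back artifact).
idx = pd.to_datetime(["2024-11-03 01:00", "2024-11-03 01:00", "2024-11-03 01:01"])
df = pd.DataFrame({"temp": [50.0, 52.0, 51.0], "label": ["a", "b", "c"]}, index=idx)

# Average numeric columns; keep the first value of non-numeric columns.
agg = {c: ("mean" if pd.api.types.is_numeric_dtype(df[c]) else "first")
       for c in df.columns}
deduped = df.groupby(level=0).agg(agg)
```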

ecopipeline.transform.calculate_cop_values(df: DataFrame, heatLoss_fixed: int, thermo_slice: str) -> DataFrame

Performs COP calculations using the daily aggregated data.

Parameters:
df : pd.DataFrame

Pandas DataFrame to add COP columns to

heatLoss_fixed : float

fixed heat loss value

thermo_slice : str

The time at which slicing begins, if thermo slicing is desired.

Returns:
pd.DataFrame:

Pandas DataFrame with the added COP columns.

ecopipeline.transform.central_transform_function(config: ConfigManager, df: DataFrame, weather_df: DataFrame = None, tz_convert_from: str = 'America/Los_Angeles', tz_convert_to: str = 'America/Los_Angeles', oat_column_name: str = 'Temp_OAT', complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, remove_partial: bool = True, pre_aggregation_func=None, post_aggregation_func=None) -> Tuple[DataFrame, DataFrame, DataFrame]

Run the full central transform pipeline on raw minute-level site data.

Renames sensors, rounds timestamps, forward-fills missing values, optionally converts timezones, averages duplicate timestamps, aggregates to hourly and daily dataframes, and optionally merges weather data. Supports optional pre- and post-aggregation hooks for custom processing.

Parameters:
config : ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

df : pd.DataFrame

Dataframe with raw time-indexed (ideally minute-interval) site data. Important column names should be represented in the variable_alias column in the Variable_Names.csv file.

weather_df : pd.DataFrame, optional

Dataframe with time-indexed (preferably hourly) weather data. Will be merged with the hourly dataframe.

tz_convert_from : str, optional

String value of the timezone the data is currently in.

tz_convert_to : str, optional

String value of the timezone the data should be converted to.

oat_column_name : str, optional

Name that the Outdoor Air Temperature column should have. Defaults to 'Temp_OAT'.

complete_hour_threshold : float, optional

Fraction of minutes in an hour needed to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8. Only applicable if remove_partial is True.

complete_day_threshold : float, optional

Fraction of hours in a day needed to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0. Only applicable if remove_partial is True.

remove_partial : bool, optional

If True, removes partial days and hours from aggregated dataframes. Defaults to True.

pre_aggregation_func : callable, optional

A custom function called after minute-level processing and before aggregation. Signature: pre_aggregation_func(df: pd.DataFrame) -> pd.DataFrame.

post_aggregation_func : callable, optional

A custom function called after weather merging and before returning. Signature: post_aggregation_func(df, hourly_df, daily_df) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame].

Returns:
tuple of pd.DataFrame

A three-element tuple (df, hourly_df, daily_df) containing the processed minute-level, hourly, and daily dataframes respectively.

Raises:
TypeError

If pre_aggregation_func or post_aggregation_func is not callable, does not accept the expected parameters, or does not return the expected type.
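A minimal pre-aggregation hook might look like this; the derived column and its name are purely illustrative:

```python
import pandas as pd

def add_total_watts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre_aggregation_func: derive an extra column
    from the minute-level data before hourly/daily aggregation."""
    df = df.copy()
    df["PowerIn_Total_W"] = df["PowerIn_Total"] * 1000.0  # kW -> W
    return df

minute_df = pd.DataFrame(
    {"PowerIn_Total": [1.5, 2.0]},
    index=pd.date_range("2024-01-01", periods=2, freq="min"))
out = add_total_watts(minute_df)
# Would be passed as:
# central_transform_function(config, df, pre_aggregation_func=add_total_watts)
```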

ecopipeline.transform.change_ID_to_HVAC(df: DataFrame, site_info: Series) -> DataFrame

Takes in a site dataframe and the site's info series and assigns a unique event_ID value whenever the system changes state.

Parameters:
df : pd.DataFrame

Pandas Dataframe

site_info : pd.Series

site_info.csv as a pd.Series

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.column_name_change(df: DataFrame, dt: Timestamp, new_column: str, old_column: str, remove_old_column: bool = True) -> DataFrame

Back-fill new_column with values from old_column for rows before a name-change timestamp.

Overwrites values in new_column with values from old_column for all rows with an index earlier than dt, provided dt is within the index range. Optionally removes old_column afterwards.

Parameters:
df : pd.DataFrame

Pandas dataframe with minute-to-minute data.

dt : pd.Timestamp

Timestamp of the variable name change.

new_column : str

Name of the column to be overwritten for rows before dt.

old_column : str

Name of the column to copy values from.

remove_old_column : bool, optional

If True, drops old_column from the dataframe after the copy. Defaults to True.

Returns:
pd.DataFrame

Dataframe with new_column updated for pre-change rows.

ecopipeline.transform.concat_last_row(df: DataFrame, last_row: DataFrame) -> DataFrame

Concatenate the last database row onto a new-data dataframe to enable forward filling.

Takes a dataframe with new data and a second dataframe representing the last row of the destination database, concatenates them so that subsequent forward filling can use information from the last row.

Parameters:
df : pd.DataFrame

Dataframe with new data that needs to be forward filled from data in the last row of a database.

last_row : pd.DataFrame

Last row of the database to forward fill from.

Returns:
pd.DataFrame

Dataframe with the last row concatenated and sorted by index.

ecopipeline.transform.condensate_calculations(df: DataFrame, site: str, site_info: Series) -> DataFrame

Calculates condensate values for the given dataframe.

Parameters:
df : pd.DataFrame

dataframe to be modified

site : str

name of site

site_info : pd.Series

Series of site info

Returns:
pd.DataFrame:

modified dataframe

ecopipeline.transform.convert_c_to_f(df: DataFrame, column_names: list) -> DataFrame

Convert specified columns from degrees Celsius to Fahrenheit.

Parameters:
df : pd.DataFrame

Pandas dataframe of sensor data.

column_names : list

List of column names whose values are currently in Celsius and need to be converted to Fahrenheit.

Returns:
pd.DataFrame

Dataframe with the specified columns converted from Celsius to Fahrenheit.
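The conversion itself is the standard formula applied column-wise; the column names here are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Temp_Supply": [0.0, 100.0], "Temp_Return": [20.0, 37.0]})

# Apply C -> F to each listed column in place.
for col in ["Temp_Supply", "Temp_Return"]:
    df[col] = df[col] * 9.0 / 5.0 + 32.0
```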

ecopipeline.transform.convert_l_to_g(df: DataFrame, column_names: list) -> DataFrame

Convert specified columns from liters to gallons.

Parameters:
df : pd.DataFrame

Pandas dataframe of sensor data.

column_names : list

List of column names whose values are currently in liters and need to be converted to gallons.

Returns:
pd.DataFrame

Dataframe with the specified columns converted from liters to gallons.

ecopipeline.transform.convert_on_off_col_to_bool(df: DataFrame, column_names: list) -> DataFrame

Convert "ON"/"OFF" string values to boolean True/False in specified columns.

Parameters:
df : pd.DataFrame

Pandas dataframe of sensor data.

column_names : list

List of column names containing "ON"/"OFF" (or "On"/"Off") strings to be converted to boolean values.

Returns:
pd.DataFrame

Dataframe with the specified columns converted to boolean values.
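A case-insensitive mapping is one way to sketch this; how the library handles unexpected strings is an assumption:

```python
import pandas as pd

df = pd.DataFrame({"pump_status": ["ON", "OFF", "On", "Off"]})

# Normalize case, then map to booleans; unmapped strings would become NaN.
df["pump_status"] = df["pump_status"].str.upper().map({"ON": True, "OFF": False})
```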

ecopipeline.transform.convert_temp_resistance_type(df: DataFrame, column_name: str, sensor_model='veris') -> DataFrame

Convert temperature resistance readings using a 10k Type 2 thermistor model.

Applies a two-stage pickle-model conversion (temperature-to-resistance, then resistance-to-temperature) to correct sensor readings in the specified column.

Parameters:
df : pd.DataFrame

Timestamp-indexed Pandas dataframe of minute-by-minute values.

column_name : str

Name of the column containing resistance conversion Type 2 data.

sensor_model : str, optional

Sensor model to use. Supported values: 'veris', 'tasseron'. Defaults to 'veris'.

Returns:
pd.DataFrame

Dataframe with the specified column corrected via the thermistor model.

Raises:
Exception

If sensor_model is not a supported value.

ecopipeline.transform.convert_time_zone(df: DataFrame, tz_convert_from: str = 'UTC', tz_convert_to: str = 'America/Los_Angeles') -> DataFrame

Convert a dataframe’s DatetimeIndex from one timezone to another.

Parameters:
df : pd.DataFrame

Pandas dataframe of sensor data whose index should be timezone-converted.

tz_convert_from : str, optional

Timezone string the index is currently in. Defaults to 'UTC'.

tz_convert_to : str, optional

Timezone string the index should be converted to. Defaults to 'America/Los_Angeles'.

Returns:
pd.DataFrame

Dataframe with its index converted to the target timezone (stored without timezone info as a naive datetime index).

ecopipeline.transform.cop_method_1(df: DataFrame, recircLosses, heatout_primary_column: str = 'HeatOut_Primary', total_input_power_column: str = 'PowerIn_Total') -> DataFrame

Perform COP calculation method 1 (original AWS method).

Computes COP_DHWSys_1 = (HeatOut_Primary + recircLosses) / PowerIn_Total and adds the result as a new column to the dataframe.

Parameters:
df : pd.DataFrame

Pandas dataframe of daily averaged values. Must already contain heatout_primary_column and total_input_power_column.

recircLosses : float or pd.Series

Recirculation losses in kW. Pass a float for a fixed spot-measured value, or a pd.Series (aligned with df) if measurements are available in the datastream.

heatout_primary_column : str, optional

Name of the column containing primary system output power in kW. Defaults to 'HeatOut_Primary'.

total_input_power_column : str, optional

Name of the column containing total system input power in kW. Defaults to 'PowerIn_Total'.

Returns:
pd.DataFrame

Dataframe with an added 'COP_DHWSys_1' column.
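The formula is direct to sketch on a daily dataframe; the values here are made up:

```python
import pandas as pd

daily_df = pd.DataFrame({"HeatOut_Primary": [10.0, 12.0],  # kW
                         "PowerIn_Total": [4.0, 5.0]})     # kW
recirc_losses_kw = 2.0  # fixed spot-measured recirculation loss

# COP_DHWSys_1 = (HeatOut_Primary + recircLosses) / PowerIn_Total
daily_df["COP_DHWSys_1"] = (
    (daily_df["HeatOut_Primary"] + recirc_losses_kw) / daily_df["PowerIn_Total"])
```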

ecopipeline.transform.cop_method_2(df: DataFrame, cop_tm, cop_primary_column_name) -> DataFrame

Perform COP calculation method 2.

Formula: COP = COP_primary * (ELEC_primary / ELEC_total) + COP_tm * (ELEC_tm / ELEC_total)

Parameters:
df : pd.DataFrame

Pandas dataframe to add the COP column to. Must contain:

  • cop_primary_column_name: primary system COP values.

  • 'PowerIn_Total': total system power.

  • Columns prefixed with 'PowerIn_HPWH' or equal to 'PowerIn_SecLoopPump' (primary system power).

  • Columns prefixed with 'PowerIn_SwingTank' or 'PowerIn_ERTank' (temperature-maintenance system power).

cop_tm : float

Fixed COP value for the temperature-maintenance system.

cop_primary_column_name : str

Name of the column containing primary-system COP values.

Returns:
pd.DataFrame

Dataframe with an added 'COP_DHWSys_2' column.
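The weighted blend can be sketched as follows; column names follow the conventions above and the values are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "COP_Primary": [3.0],
    "PowerIn_Total": [10.0],       # kW
    "PowerIn_HPWH1": [6.0],        # primary-system power
    "PowerIn_SecLoopPump": [1.0],  # primary-system power
    "PowerIn_SwingTank": [3.0],    # temperature-maintenance power
})
cop_tm = 1.0  # fixed temperature-maintenance COP

# COP = COP_primary * (ELEC_primary / ELEC_total) + COP_tm * (ELEC_tm / ELEC_total)
elec_primary = df["PowerIn_HPWH1"] + df["PowerIn_SecLoopPump"]
elec_tm = df["PowerIn_SwingTank"]
df["COP_DHWSys_2"] = (df["COP_Primary"] * elec_primary / df["PowerIn_Total"]
                      + cop_tm * elec_tm / df["PowerIn_Total"])
```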

ecopipeline.transform.create_data_statistics_df(df: DataFrame) -> DataFrame

Compute per-column data-gap statistics aggregated by day.

Must be called on the raw minute-level dataframe after rename_sensors() and before ffill_missing(). Each original column is expanded into three derived columns:

  • <col>_missing_mins: number of minutes in the day with no reported value.

  • <col>_avg_gap: average consecutive gap length (in minutes) for that day.

  • <col>_max_gap: maximum consecutive gap length (in minutes) for that day.

Parameters:
df : pd.DataFrame

Minute-level dataframe after rename_sensors() and before ffill_missing() has been called.

Returns:
pd.DataFrame

Day-indexed dataframe containing the three gap-statistic columns for each original column.

ecopipeline.transform.create_fan_curves(cfm_info: DataFrame, site_info: Series) -> DataFrame

Create fan curves for each site.

Parameters:
cfm_info : pd.DataFrame

DataFrame of fan curve information.

site_info : pd.Series

Series containing the site information.

Returns:
pd.DataFrame:

Dataframe containing the fan curves for each site.

ecopipeline.transform.create_summary_tables(df: DataFrame)

Create hourly and daily summary tables from minute-by-minute data.

Parameters:
df : pd.DataFrame

Pandas dataframe of minute-by-minute sensor data.

Returns:
hourly_df : pd.DataFrame

Hourly mean aggregation of the input data, with partial hours removed.

daily_df : pd.DataFrame

Daily mean aggregation of the input data, with partial days removed.

ecopipeline.transform.delete_erroneous_from_time_pt(df: DataFrame, time_point: Timestamp, column_names: list, new_value=None) -> DataFrame

Replace erroneous values at a specific timestamp with a given replacement value.

Parameters:
df : pd.DataFrame

Timestamp-indexed Pandas dataframe that contains the erroneous value.

time_point : pd.Timestamp

The index timestamp at which the erroneous values occur.

column_names : list

List of column name strings that contain erroneous values at this timestamp.

new_value : any, optional

Replacement value to write into the erroneous cells. If None, the cells are replaced with NaN. Defaults to None.

Returns:
pd.DataFrame

Dataframe with the erroneous values replaced by new_value.

ecopipeline.transform.elev_correction(site_name: str, config: ConfigManager) -> DataFrame

Creates a dataframe for a given site that contains the site name, elevation, and corrected elevation.

Parameters:
site_name : str

site's name

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

new Pandas dataframe

ecopipeline.transform.estimate_power(df: DataFrame, new_power_column: str, current_a_column: str, current_b_column: str, current_c_column: str, assumed_voltage: float = 208, power_factor: float = 1) -> DataFrame

Estimate three-phase power from per-phase current readings.

Calculates power as the average phase current multiplied by the assumed voltage, power factor, and sqrt(3), then converts from watts to kilowatts.

Parameters:
df : pd.DataFrame

Pandas dataframe with minute-to-minute data.

new_power_column : str

Column name to store the estimated power. Units will be kW.

current_a_column : str

Column name of the Phase A current. Units should be amps.

current_b_column : str

Column name of the Phase B current. Units should be amps.

current_c_column : str

Column name of the Phase C current. Units should be amps.

assumed_voltage : float, optional

Assumed line voltage in volts. Defaults to 208.

power_factor : float, optional

Power factor to apply. Defaults to 1.

Returns:
pd.DataFrame

Dataframe with a new estimated power column of the specified name.
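The described calculation can be sketched as follows; the column names are illustrative:

```python
import math
import pandas as pd

df = pd.DataFrame({"amps_a": [10.0], "amps_b": [12.0], "amps_c": [14.0]})
assumed_voltage, power_factor = 208.0, 1.0

# Average phase current * voltage * power factor * sqrt(3), converted W -> kW.
avg_current = df[["amps_a", "amps_b", "amps_c"]].mean(axis=1)
df["PowerIn_Est"] = avg_current * assumed_voltage * power_factor * math.sqrt(3) / 1000.0
```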

ecopipeline.transform.ffill_missing(original_df: DataFrame, config: ConfigManager, previous_fill: DataFrame = None) -> DataFrame

Forward-fill selected columns of a dataframe according to rules in Variable_Names.csv.

Parameters:
original_df : pd.DataFrame

Pandas dataframe that needs to be forward-filled.

config : ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline’s input folder. The CSV must have at least three columns:

  • variable_name: name of each variable to forward-fill.

  • changepoint: 1 to forward-fill unconditionally until the next change point, 0 to forward-fill up to ffill_length rows, or null to skip forward-filling for that variable.

  • ffill_length: number of rows to forward-fill when changepoint is 0.

previous_fill : pd.DataFrame, optional

Dataframe with the same index type and at least some of the same columns as original_df (typically the last row from the destination database). Its values are used to seed forward-filling into the new data.

Returns:
pd.DataFrame

Dataframe that has been forward-filled per the specifications in the Variable_Names.csv file.

ecopipeline.transform.flag_dhw_outage(df: DataFrame, daily_df: DataFrame, dhw_outlet_column: str, supply_temp: int = 110, consecutive_minutes: int = 15) -> DataFrame

Detect DHW outage events and return an alarm event dataframe.

Identifies periods where DHW outlet temperature falls below supply_temp for at least consecutive_minutes consecutive minutes, then records an ALARM event for each affected day.

Parameters:
df : pd.DataFrame

Pandas dataframe of sensor data on minute intervals.

daily_df : pd.DataFrame

Pandas dataframe of sensor data on daily intervals.

dhw_outlet_column : str

Name of the column in df that contains the DHW temperature supplied to building occupants.

supply_temp : int, optional

Minimum acceptable DHW supply temperature in °F. Defaults to 110.

consecutive_minutes : int, optional

Number of consecutive minutes below supply_temp required to qualify as a DHW outage. Defaults to 15.

Returns:
pd.DataFrame

Dataframe indexed by start_time_pt containing 'ALARM' events for each day on which a DHW outage occurred.
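The run-length condition can be sketched with a rolling sum over a boolean series; the column name and a 3-minute threshold are illustrative:

```python
import pandas as pd

idx = pd.date_range("2024-01-01 00:00", periods=6, freq="min")
temps = pd.Series([112, 105, 104, 103, 112, 104], index=idx, name="Temp_DHWSupply")

supply_temp, consecutive_minutes = 110, 3
below = temps < supply_temp
# A window summing to its own length marks a run of consecutive sub-threshold minutes.
outage = below.rolling(consecutive_minutes).sum() == consecutive_minutes
```

Here only the 00:03 window (minutes 00:01-00:03, all below 110 °F) qualifies; the isolated dip at 00:05 does not.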

ecopipeline.transform.gas_valve_diff(df: DataFrame, site: str, config: ConfigManager) -> DataFrame

Takes in the site dataframe and the site name. If the site has gas heating, takes the lagged difference to get per-minute values.

Parameters:
df : pd.DataFrame

Dataframe for site

site : str

site name as string

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.gather_outdoor_conditions(df: DataFrame, site: str) -> DataFrame

Takes in a site dataframe and site name as a string. Returns a new dataframe that contains time_utc, <site>_ODT, and <site>_ODRH for the site.

Parameters:
df : pd.DataFrame

Pandas Dataframe

site : str

site name as string

Returns:
pd.DataFrame:

new Pandas Dataframe

ecopipeline.transform.generate_event_log_df(config: ConfigManager)

Create an event log dataframe from a user-submitted Event_log.csv file.

Parameters:
config : ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to the Event_log.csv file via config.get_event_log_path().

Returns:
pd.DataFrame

Dataframe indexed by start_time_pt and formatted from the events in Event_log.csv. Returns an empty dataframe with the expected columns if the file cannot be read.

ecopipeline.transform.get_cfm_values(df, site_cfm, site_info, site)
ecopipeline.transform.get_cop_values(df: DataFrame, site_info: DataFrame)
ecopipeline.transform.get_energy_by_min(df: DataFrame) -> DataFrame

Energy is recorded cumulatively. Takes the lagged differences to get a per-minute value for each of the energy variables.

Parameters:
df : pd.DataFrame

Pandas dataframe

Returns:
pd.DataFrame:

Pandas dataframe
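The lagged-difference step can be sketched as follows; the meter column name is hypothetical:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="min")
df = pd.DataFrame({"HPWH_Energy_kWh": [100.0, 100.5, 101.5, 101.5]}, index=idx)

# Cumulative meter readings -> per-minute consumption (first row becomes NaN).
df["HPWH_Energy_kWh"] = df["HPWH_Energy_kWh"].diff()
```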

ecopipeline.transform.get_hvac_state(df: DataFrame, site_info: Series) -> DataFrame
ecopipeline.transform.get_refrig_charge(df: DataFrame, site: str, config: ConfigManager) -> DataFrame

Takes in a site dataframe and its site name as a string, locates site_info.csv, superheat.csv, and 410a_pt.csv via config, and calculates the refrigerant charge per minute.

Parameters:
df : pd.DataFrame

Pandas Dataframe

site : str

site name as a string

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.get_site_cfm_info(site: str, config: ConfigManager) -> DataFrame

Returns a dataframe of the site cfm information for the given site. NOTE: parsing is necessary because the first row of data contains comments that need to be dropped.

Parameters:
site : str

The site name

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
df : pd.DataFrame

The DataFrame of the site cfm information

ecopipeline.transform.get_site_info(site: str, config: ConfigManager) -> Series

Returns a series of the site information for the given site

Parameters:
site : str

The site name

config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
df : pd.Series

The Series of the site information

ecopipeline.transform.get_storage_gals120(df: DataFrame, location: Series, gals: int, total: int, zones: Series) -> DataFrame

Creates and appends the Gals120 data onto the dataframe.

Parameters:
df : pd.DataFrame

A Pandas Dataframe

location : pd.Series
gals : int
total : int
zones : pd.Series
Returns:
pd.DataFrame:

a Pandas Dataframe

ecopipeline.transform.get_temp_zones120(df: DataFrame) -> DataFrame

Keeps track of the average temperature of each zone. For this function to work, naming conventions for each parallel tank must include ‘Temp1’ as the temperature at the top of the tank, ‘Temp5’ as that at the bottom of the tank, and ‘Temp2’-‘Temp4’ as the temperatures in between.

Parameters:
df : pd.DataFrame

A Pandas Dataframe

Returns:
pd.DataFrame:

a Pandas Dataframe

ecopipeline.transform.heat_output_calc(df: DataFrame, flow_var: str, hot_temp: str, cold_temp: str, heat_out_col_name: str, return_as_kw: bool = True) -> DataFrame

Calculate heat output from flow rate and supply/return temperatures.

Uses the formula Heat (BTU/hr) = 500 * flow (gal/min) * delta_T (°F) and clips negative values to zero. Optionally converts the result to kW.

Parameters:
df : pd.DataFrame

Pandas dataframe with minute-to-minute data.

flow_var : str

Column name of the flow variable. Units must be gal/min.

hot_temp : str

Column name of the hot (supply) temperature variable. Units must be °F.

cold_temp : str

Column name of the cold (return) temperature variable. Units must be °F.

heat_out_col_name : str

Name for the new heat output column added to the dataframe.

return_as_kw : bool, optional

If True, the new column will be in kW. If False, it will be in BTU/hr. Defaults to True.

Returns:
pd.DataFrame

Dataframe with the new heat output column of the specified name.
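The formula can be sketched as follows; the column names are illustrative and the BTU/hr-to-kW constant of 3412.14 is an assumption about the library's conversion:

```python
import pandas as pd

df = pd.DataFrame({"Flow_gpm": [10.0], "Temp_Hot_F": [130.0], "Temp_Cold_F": [110.0]})

# Heat (BTU/hr) = 500 * flow (gal/min) * delta_T (F), with negatives clipped to zero.
btu_per_hr = (500.0 * df["Flow_gpm"] * (df["Temp_Hot_F"] - df["Temp_Cold_F"])).clip(lower=0)
df["HeatOut_Primary"] = btu_per_hr / 3412.14  # BTU/hr -> kW
```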

ecopipeline.transform.join_to_daily(daily_data: DataFrame, cop_data: DataFrame) -> DataFrame

Left-join COP data onto the daily dataframe.

Parameters:
daily_data : pd.DataFrame

Daily sensor dataframe.

cop_data : pd.DataFrame

COP values dataframe to join.

Returns:
pd.DataFrame

Daily dataframe left-joined with the COP dataframe.

ecopipeline.transform.join_to_hourly(hourly_data: DataFrame, noaa_data: DataFrame, oat_column_name: str = 'OAT_NOAA') -> DataFrame

Left-join weather data onto the hourly dataframe.

Parameters:
hourly_data : pd.DataFrame

Hourly sensor dataframe.

noaa_data : pd.DataFrame

Weather (e.g. NOAA) dataframe to join.

oat_column_name : str, optional

Name of the outdoor air temperature column in noaa_data. Defaults to 'OAT_NOAA'.

Returns:
pd.DataFrame

Hourly dataframe left-joined with the weather dataframe. Returns hourly_data unchanged if the OAT column in noaa_data contains no non-null values.

ecopipeline.transform.lbnl_pressure_conversions(df: DataFrame) -> DataFrame
ecopipeline.transform.lbnl_sat_calculations(df: DataFrame) -> DataFrame
ecopipeline.transform.lbnl_temperature_conversions(df: DataFrame) -> DataFrame
ecopipeline.transform.merge_indexlike_rows(df: DataFrame) -> DataFrame

Merges index-like rows together ensuring that all relevant information for a certain timestamp is stored in one row - not in multiple rows. It also rounds the timestamps to the nearest minute.

Parameters:
df : pd.DataFrame

The dataframe whose index-like rows should be merged.

Returns:
df : pd.DataFrame

The DataFrame with all index-like rows merged.

ecopipeline.transform.nclarity_csv_to_df(csv_filenames: List[str]) -> DataFrame

Takes a list of csv filenames containing nclarity data and reads all files into a single dataframe.

Parameters:
csv_filenames : List[str]

List of filenames

Returns:
pd.DataFrame:

Pandas Dataframe containing data from all files

ecopipeline.transform.nclarity_filter_new(date: str, filenames: List[str]) -> List[str]

Filters the filenames list to only those from the given date or later.

Parameters:
date : str

target date

filenames : List[str]

List of filenames to be filtered

Returns:
List[str]:

Filtered list of filenames

ecopipeline.transform.nullify_erroneous(original_df: DataFrame, config: ConfigManager) -> DataFrame

Replace known error-sentinel values in a dataframe with NaN.

Parameters:
original_df : pd.DataFrame

Pandas dataframe that needs to be filtered for error values.

config : ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline’s input folder. The CSV must have at least two columns:

  • variable_name: names of columns that may contain error values.

  • error_value: the sentinel error value for each variable, or null if no error value applies.

Returns:
pd.DataFrame

Dataframe with known error-sentinel values replaced by NaN.
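The replacement step can be sketched as follows; the column and its -999 sentinel are hypothetical stand-ins for variable_name/error_value entries in Variable_Names.csv:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Temp_OAT": [55.0, -999.0, 60.0]})
error_values = {"Temp_OAT": -999.0}  # variable_name -> error_value

# Replace each column's sentinel error value with NaN.
for col, sentinel in error_values.items():
    df.loc[df[col] == sentinel, col] = np.nan
```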

ecopipeline.transform.process_ls_signal(df: DataFrame, hourly_df: DataFrame, daily_df: DataFrame, load_dict: dict = {1: 'normal', 2: 'loadUp', 3: 'shed'}, ls_column: str = 'ls', drop_ls_from_df: bool = False)

Add load-shift signals to hourly and daily aggregated dataframes.

Parameters:
df : pd.DataFrame

Timestamp-indexed Pandas dataframe of minute-by-minute values.

hourly_df : pd.DataFrame

Timestamp-indexed Pandas dataframe of hourly average values.

daily_df : pd.DataFrame

Timestamp-indexed Pandas dataframe of daily average values.

load_dict : dict, optional

Mapping from integer load-shift signal values to descriptive string labels. Defaults to {1: "normal", 2: "loadUp", 3: "shed"}.

ls_column : str, optional

Name of the load-shift column in df. Defaults to 'ls'.

drop_ls_from_df : bool, optional

If True, drops ls_column from df after processing. Defaults to False.

Returns:
df : pd.DataFrame

Minute-by-minute dataframe with ls_column removed if drop_ls_from_df is True.

hourly_df : pd.DataFrame

Hourly dataframe with an added 'system_state' column containing the load-shift command label from load_dict for each hour. Values are mapped from the rounded mean of ls_column within each hour; hours whose rounded mean is not a key in load_dict will be null.

daily_df : pd.DataFrame

Daily dataframe with an added boolean 'load_shift_day' column that is True on days containing at least one non-normal load-shift command in hourly_df.

ecopipeline.transform.remove_outliers(original_df: DataFrame, config: ConfigManager, site: str = '') -> DataFrame

Remove outliers from a dataframe by replacing out-of-bounds values with NaN.

Reads bound information from Variable_Names.csv via config and sets any values outside the defined lower_bound/upper_bound range to NaN.

Parameters:
original_df : pd.DataFrame

Pandas dataframe for which outliers need to be removed.

config : ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline’s input folder. The CSV must have at least three columns: variable_name, lower_bound, and upper_bound.

site : str, optional

Site name to filter bounds data by. Leave as an empty string if not applicable.

Returns:
pd.DataFrame

Dataframe with outliers replaced by NaN.
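The bounds check can be sketched as follows; the column and its bounds are hypothetical stand-ins for Variable_Names.csv rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Temp_OAT": [55.0, 300.0, -80.0]})
bounds = {"Temp_OAT": (-40.0, 130.0)}  # variable_name -> (lower_bound, upper_bound)

# Null out any values outside each column's configured range.
for col, (lo, hi) in bounds.items():
    df.loc[(df[col] < lo) | (df[col] > hi), col] = np.nan
```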

ecopipeline.transform.remove_partial_days(df, hourly_df, daily_df, complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, partial_day_removal_exclusion: list = [])

Remove hourly and daily rows that are derived from insufficient minute-level data.

Parameters:
dfpd.DataFrame

Pandas dataframe of minute-by-minute sensor data.

hourly_dfpd.DataFrame

Aggregated hourly dataframe.

daily_dfpd.DataFrame

Aggregated daily dataframe.

complete_hour_thresholdfloat, optional

Fraction of minutes in an hour required to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8.

complete_day_thresholdfloat, optional

Fraction of hours in a day required to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0.

partial_day_removal_exclusionlist, optional

Column names to skip when evaluating completeness. Defaults to [].

Returns:
hourly_dfpd.DataFrame

Hourly dataframe with incomplete hours removed and sparse columns nullified.

daily_dfpd.DataFrame

Daily dataframe with incomplete days removed and sparse columns nullified.

Raises:
Exception

If complete_hour_threshold or complete_day_threshold is not between 0 and 1.
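The hour-completeness test can be sketched in plain pandas. This is an illustration of the threshold logic only (the function also nullifies sparse columns and applies the analogous day-level check); the column name is hypothetical:

```python
import pandas as pd

complete_hour_threshold = 0.8  # at least 48 of 60 minutes

# 50 minutes of data in hour 0, but only 10 minutes in hour 1
idx = pd.date_range("2024-01-01 00:00", periods=50, freq="min").append(
    pd.date_range("2024-01-01 01:00", periods=10, freq="min")
)
df = pd.DataFrame({"powerW": 1.0}, index=idx)

# Count minutes of real data per hour and keep only complete hours
minutes_per_hour = df["powerW"].resample("h").count()
complete_hours = minutes_per_hour[
    minutes_per_hour >= complete_hour_threshold * 60
].index
hourly_df = df.resample("h").mean().loc[complete_hours]
```

Hour 0 (50 of 60 minutes) survives the 0.8 threshold; hour 1 (10 of 60) is removed.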

ecopipeline.transform.rename_sensors(original_df: DataFrame, config: ConfigManager, site: str = '', system: str = '')

Rename sensor columns from their raw aliases to their true names.

Reads the Variable_Names.csv file via config, renames columns from variable_alias to variable_name, drops columns with no matching true name, and optionally filters by site and/or system.

Parameters:
original_dfpd.DataFrame

A dataframe containing data labeled by raw variable names to be renamed.

configConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline’s input folder. The CSV must have at least two columns: variable_alias (the raw name to change from) and variable_name (the name to change to). Columns without a corresponding variable_name are dropped.

sitestr, optional

Site name to filter by. If provided, only rows whose site column matches this value are retained. Leave as an empty string if not applicable.

systemstr, optional

System name to filter by. If provided, only rows whose system column contains this string are retained. Leave as an empty string if not applicable.

Returns:
pd.DataFrame

Dataframe filtered by site and system (if applicable) with column names matching those specified in Variable_Names.csv.

Raises:
Exception

If the Variable_Names.csv file is not found at the path provided by config.
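The rename-and-drop behavior is equivalent to this pandas sketch, using a hand-built dict in place of the variable_alias/variable_name columns of Variable_Names.csv (the alias and column names are hypothetical):

```python
import pandas as pd

# Hypothetical alias-to-name mapping, standing in for the
# variable_alias / variable_name columns of Variable_Names.csv
alias_map = {"T1": "airTemp_F", "RH1": "Humidity_ODRH"}

df = pd.DataFrame({"T1": [70.2], "RH1": [41.0], "spare": [0.0]})

# Rename known aliases, then drop columns with no matching true name
df = df.rename(columns=alias_map)
df = df[[c for c in df.columns if c in alias_map.values()]]
```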

ecopipeline.transform.replace_humidity(df: DataFrame, od_conditions: DataFrame, date_forward: datetime, site_name: str) DataFrame

Replace all humidity readings for a given site after a given datetime.

Parameters:
dfpd.DataFrame

Dataframe containing the raw sensor data.

od_conditionspd.DataFrame

DataFrame containing outdoor conditions measured by field sensors.

date_forwarddt.datetime

Datetime containing the time after which all humidity readings should be replaced.

site_namestr

String containing the name of the site for which humidity values are to be replaced.

Returns:
pd.DataFrame

Modified DataFrame where the Humidity_ODRH column contains the field readings after the given datetime.
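The replacement step can be sketched as a masked assignment in pandas. This is an illustration only; the od_conditions column name used here is hypothetical:

```python
import datetime as dt
import pandas as pd

date_forward = dt.datetime(2024, 6, 1)

idx = pd.to_datetime(["2024-05-31 12:00", "2024-06-02 12:00"])
df = pd.DataFrame({"Humidity_ODRH": [40.0, 40.0]}, index=idx)
# Field-sensor outdoor conditions; the column name is a placeholder
od_conditions = pd.DataFrame({"field_humidity": [55.0, 65.0]}, index=idx)

# Overwrite humidity readings at or after the cutoff with field readings
mask = df.index >= date_forward
df.loc[mask, "Humidity_ODRH"] = od_conditions.loc[mask, "field_humidity"]
```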

ecopipeline.transform.round_time(df: DataFrame)

Round a dataframe’s DatetimeIndex down to the nearest minute, in place.

Parameters:
dfpd.DataFrame

A dataframe indexed by datetimes. All timestamps will be floored to the nearest minute.

Returns:
bool

True if the index has been rounded down, False if the operation failed (e.g. if df was empty).
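The flooring operation is equivalent to calling DatetimeIndex.floor on the index (round_time performs it in place and reports success via its boolean return):

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01 00:00:37", "2024-01-01 00:01:59"])
df = pd.DataFrame({"powerW": [1.0, 2.0]}, index=idx)

# Floor every timestamp to the minute, discarding seconds
df.index = df.index.floor("min")
```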

ecopipeline.transform.sensor_adjustment(df: DataFrame, config: ConfigManager) DataFrame

Apply sensor adjustments from adjustments.csv to the dataframe.

Deprecated: This function is scheduled for removal. Use a more explicit adjustment approach instead.

Parameters:
dfpd.DataFrame

Dataframe to be adjusted.

configConfigManager

The ConfigManager object that holds configuration data for the pipeline. Points to a file called adjustments.csv in the pipeline’s input folder (e.g. "full/path/to/pipeline/input/adjustments.csv").

Returns:
pd.DataFrame

Adjusted dataframe.

ecopipeline.transform.shift_accumulative_columns(df: DataFrame, column_names: list = [])

Convert accumulative columns to period-difference (non-cumulative) values.

Parameters:
dfpd.DataFrame

Pandas dataframe of sensor data.

column_nameslist, optional

Names of columns to convert from cumulative-sum data to non-cumulative difference data. If an empty list is provided, all columns are converted. Defaults to [].

Returns:
pd.DataFrame

Dataframe with the specified columns (or all columns) converted from cumulative to period-difference values.
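The cumulative-to-difference conversion corresponds to a pandas diff, sketched below for a single hypothetical meter column (the actual function may treat the first, undefined period differently):

```python
import pandas as pd

# A cumulative energy meter reading
df = pd.DataFrame({"Energy_kWh": [10.0, 12.5, 12.5, 15.0]})

# Convert the running total into per-period consumption;
# the first period has no predecessor and becomes NaN
df["Energy_kWh"] = df["Energy_kWh"].diff()
```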

ecopipeline.transform.site_specific(df: DataFrame, site: str) DataFrame

Performs site-specific calculations for LBNL sites. The site name is matched using regular expressions.

Parameters:
dfpd.DataFrame

Pandas dataframe of sensor data.

sitestr

Site name as a string.

Returns:
pd.DataFrame

Modified dataframe with site-specific calculations applied.
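The regex-based site matching can be sketched as a dispatch function. The site names and branches below are entirely hypothetical; the real calculations are internal to the LBNL pipeline:

```python
import re

def site_specific_sketch(site: str) -> str:
    """Hypothetical dispatch mirroring regex matching on the site name."""
    if re.search(r"^lab_a", site):
        return "lab_a calculations"
    if re.search(r"^lab_b", site):
        return "lab_b calculations"
    return "no site-specific calculations"
```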

ecopipeline.transform.verify_power_energy(df: DataFrame, config: ConfigManager)

Verifies that, for each timestamp, corresponding power and energy variables are consistent with one another (power ≈ energy × 60 for minute-level data; the margin of error is TBD). Any rows with conflicting power and energy values are written to a CSV file.

Prereq:

The input dataframe MUST have had get_energy_by_min() called on it previously.

Parameters:
dfpd.DataFrame

Pandas dataframe

configecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
None
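The power-vs-energy consistency test can be sketched in pandas. The column names and the 5% relative margin below are hypothetical placeholders (the documentation notes the real margin of error is TBD):

```python
import pandas as pd

margin = 0.05  # hypothetical relative margin of error

df = pd.DataFrame({
    "PowerIn_W":   [6000.0, 6000.0],
    "EnergyIn_Wh": [100.0, 50.0],  # per-minute energy
})

# For minute-level data, power should be roughly energy * 60;
# rows outside the margin are flagged as conflicts
expected_power = df["EnergyIn_Wh"] * 60
conflicts = df[(df["PowerIn_W"] - expected_power).abs() > margin * expected_power]
```

Here the second row (6000 W against an expected 3000 W) would be written to the conflict CSV.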