Transform Documentation

ecopipeline.transform.add_local_time(df: DataFrame, site_name: str, config: ConfigManager) -> DataFrame

Function adds a column to the dataframe containing the local time.

Parameters:
df: pd.DataFrame

Dataframe

site_name: str

Site name

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame

ecopipeline.transform.add_relative_humidity(df: DataFrame, temp_col: str = 'airTemp_F', dew_point_col: str = 'dewPoint_F', degree_f: bool = True)

Add a column for relative humidity to the DataFrame.

Parameters:
df: pd.DataFrame

DataFrame containing air temperature and dew point temperature.

temp_col: str

Column name for air temperature.

dew_point_col: str

Column name for dew point temperature.

degree_f: bool

True if temperature columns are in °F, False if in °C

Returns:
pd.DataFrame:

DataFrame with an added column for relative humidity.
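
As a concrete illustration, relative humidity can be approximated from air and dew point temperatures with the Magnus formula. This is a standalone sketch of the underlying calculation, not necessarily the formula ecopipeline uses internally; the column names match the parameter defaults above:

```python
import math

import pandas as pd

def approx_relative_humidity(temp_f: float, dew_point_f: float) -> float:
    """Approximate relative humidity (%) via the Magnus formula."""
    t_c = (temp_f - 32) * 5 / 9          # air temperature in deg C
    td_c = (dew_point_f - 32) * 5 / 9    # dew point in deg C
    a, b = 17.625, 243.04                # Magnus coefficients
    # RH = 100 * (actual vapor pressure / saturation vapor pressure)
    return 100 * math.exp(a * td_c / (b + td_c)) / math.exp(a * t_c / (b + t_c))

df = pd.DataFrame({"airTemp_F": [68.0], "dewPoint_F": [50.0]})
df["relative_humidity"] = [
    approx_relative_humidity(t, d)
    for t, d in zip(df["airTemp_F"], df["dewPoint_F"])
]
```

When air temperature equals dew point, the approximation returns 100%, as expected.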

ecopipeline.transform.aggregate_df(df: DataFrame, ls_filename: str = '', complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, remove_partial: bool = True) -> (DataFrame, DataFrame)

Function takes in a pandas dataframe of minute data, aggregates it into hourly and daily dataframes, appends a 'load_shift_day' column onto the daily_df and a 'system_state' column onto the hourly_df to keep track of the load shift schedule for the system, and then returns those dataframes. The function trims the returned dataframes so that only averages from complete hours and complete days are returned, rather than aggregated data from partial datasets.

Parameters:
df: pd.DataFrame

Single pandas dataframe of minute-by-minute sensor data.

ls_filename: str

Path to a csv file containing the load shift schedule (e.g. "full/path/to/pipeline/input/loadshift_matrix.csv"). There should be at least four columns in this csv: 'date', 'startTime', 'endTime', and 'event'

complete_hour_threshold: float

Defaults to 0.8. Percent of minutes in an hour needed to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Only applicable if remove_partial is set to True

complete_day_threshold: float

Defaults to 1.0. Percent of hours in a day needed to count as a complete day, expressed as a float (e.g. 80% = 0.8). Only applicable if remove_partial is set to True

remove_partial: bool

Defaults to True. Removes partial days and hours from the aggregated dataframes

Returns:
daily_df: pd.DataFrame

Aggregated daily dataframe that contains all daily information as well as the 'load_shift_day' column if relevant to the data set.

hourly_df: pd.DataFrame

Aggregated hourly dataframe that contains all hourly information as well as the 'system_state' column if relevant to the data set.
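
The completeness-trimming behavior can be sketched in plain pandas. This is a simplified illustration only; the real function also handles the load shift schedule and other bookkeeping, and the column name 'PowerIn_Total' is just an example:

```python
import numpy as np
import pandas as pd

# Hypothetical minute data: 90 minutes starting on the hour
idx = pd.date_range("2024-01-01 00:00", periods=90, freq="min")
df = pd.DataFrame({"PowerIn_Total": np.ones(90)}, index=idx)

# Hourly averages, keeping only hours with >= 80% of their minutes present
minute_counts = df["PowerIn_Total"].resample("h").count()
hourly_df = df.resample("h").mean()
hourly_df = hourly_df[minute_counts >= 0.8 * 60]   # complete_hour_threshold

# Daily averages, keeping only days where every hour survived
hour_counts = hourly_df["PowerIn_Total"].resample("D").count()
daily_df = hourly_df.resample("D").mean()
daily_df = daily_df[hour_counts >= 24]             # complete_day_threshold = 1.0
```

Here the first hour is complete (60 of 60 minutes) and survives, the second hour has only 30 minutes and is dropped, and the day is dropped because only one of its 24 hours is complete.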

ecopipeline.transform.aggregate_values(df: DataFrame, thermo_slice: str) -> DataFrame

Gets the daily average of data for all relevant variables.

Parameters:
df: pd.DataFrame

Pandas DataFrame of minute-by-minute data

thermo_slice: str

Indicates the time at which slicing begins. If None, no slicing is performed. The format of the thermo_slice string is "HH:MM AM/PM".

Returns:
pd.DataFrame:

Pandas DataFrame which contains the aggregated hourly data.

ecopipeline.transform.apply_equipment_cop_derate(df: DataFrame, equip_cop_col: str, r_val: int = 16) -> DataFrame

Function derates the equipment-method system COP based on R value:

R12 - R16: 12%
R16 - R20: 10%
R20 - R24: 8%
R24 - R28: 6%
R28 - R32: 4%
> R32: 2%

Parameters:
df: pd.DataFrame

Dataframe

equip_cop_col: str

Name of the COP column to derate

r_val: int

R value, defaults to 16

Returns:
pd.DataFrame

df with equip_cop_col derated
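
The derate table above can be read as a lookup on R value. A minimal sketch follows; the boundary handling at shared endpoints (e.g. exactly R16) is an assumption, since the listed ranges overlap:

```python
def equipment_cop_derate_factor(r_val: int) -> float:
    """Return the derate fraction for a given insulation R value."""
    if r_val <= 16:
        return 0.12
    if r_val <= 20:
        return 0.10
    if r_val <= 24:
        return 0.08
    if r_val <= 28:
        return 0.06
    if r_val <= 32:
        return 0.04
    return 0.02

# Derating a hypothetical equipment-method COP of 3.0 at the default R16
derated_cop = 3.0 * (1 - equipment_cop_derate_factor(16))
```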

ecopipeline.transform.aqsuite_filter_new(last_date: str, filenames: List[str], site: str, config: ConfigManager) -> List[str]

Function filters the filenames list to only those newer than the last date.

Parameters:
last_date: str

Latest date loaded prior to current runtime

filenames: List[str]

List of filenames to be filtered

site: str

Site name

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
List[str]:

Filtered list of filenames

ecopipeline.transform.aqsuite_prep_time(df: DataFrame) -> DataFrame

Function takes an aqsuite dataframe and converts the time column into datetime type and sorts the entire dataframe by time. Prereq:

Input dataframe MUST be an aqsuite Dataframe whose columns have not yet been renamed

Parameters:
df: pd.DataFrame

Aqsuite DataFrame

Returns:
pd.DataFrame:

Aqsuite dataframe sorted by time, with the time column converted to datetime type

ecopipeline.transform.avg_duplicate_times(df: DataFrame, timezone: str) -> DataFrame

Function takes in a dataframe and looks for duplicate timestamps (usually due to daylight savings or rounding). The dataframe is altered so that each duplicated timestamp has only one row, taking the average of the duplicate rows' values for each column.

Parameters:
df: pd.DataFrame

Pandas dataframe to be altered

timezone: str

The timezone for the indexes in the output dataframe as a string. Must be a string recognized by the pandas tz_localize() function https://pandas.pydata.org/docs/reference/api/pandas.Series.tz_localize.html

Returns:
pd.DataFrame:

Pandas dataframe with all duplicate timestamps compressed into one, averaging data values
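
The core of this operation can be sketched with a groupby on the index. This simplified illustration skips the timezone handling, and the column name is hypothetical:

```python
import pandas as pd

# A duplicated timestamp, e.g. from a daylight-savings fold or rounding
idx = pd.to_datetime([
    "2024-11-03 01:00", "2024-11-03 01:00", "2024-11-03 01:01",
])
df = pd.DataFrame({"airTemp_F": [60.0, 62.0, 61.0]}, index=idx)

# One row per timestamp, averaging the duplicates' values
deduped = df.groupby(df.index).mean()
```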

ecopipeline.transform.calculate_cop_values(df: DataFrame, heatLoss_fixed: int, thermo_slice: str) -> DataFrame

Performs COP calculations using the daily aggregated data.

Parameters:
df: pd.DataFrame

Pandas DataFrame to add COP columns to

heatLoss_fixed: float

Fixed heat loss value

thermo_slice: str

The time at which slicing begins, if thermo slicing is desired.

Returns:
pd.DataFrame:

Pandas DataFrame with the added COP columns.

ecopipeline.transform.change_ID_to_HVAC(df: DataFrame, site_info: Series) -> DataFrame

Function takes in a site dataframe along with the site's information and assigns a unique event_ID value whenever the system changes state.

Parameters:
df: pd.DataFrame

Pandas Dataframe

site_info: pd.Series

site_info.csv as a pd.Series

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.concat_last_row(df: DataFrame, last_row: DataFrame) -> DataFrame

This function takes in a dataframe with new data and a second dataframe meant to be the last row from the database that the new data is being processed for. The two dataframes are then concatenated so that the new data can later be forward filled from the information in the last row.

Parameters:
df: pd.DataFrame

Dataframe with new data that needs to be forward filled from data in the last row of a database

last_row: pd.DataFrame

Last row of the database to forward fill from, in a pandas dataframe

Returns:
pd.DataFrame:

Pandas dataframe with last row concatenated

ecopipeline.transform.condensate_calculations(df: DataFrame, site: str, site_info: Series) -> DataFrame

Calculates condensate values for the given dataframe

Parameters:
df: pd.DataFrame

Dataframe to be modified

site: str

Name of site

site_info: pd.Series

Series of site info

Returns:
pd.DataFrame:

modified dataframe

ecopipeline.transform.convert_c_to_f(df: DataFrame, column_names: list) -> DataFrame

Function takes in a pandas dataframe of data and a list of column names to convert from degrees Celsius to Fahrenheit.

Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data.

column_names: list of strings

List of columns with data currently in Celsius that need to be converted to Fahrenheit

Returns:
pd.DataFrame: Dataframe with specified columns converted from Celsius to Fahrenheit.

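
The conversion itself is the standard F = C * 9/5 + 32, applied only to the named columns (the column names here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"supplyTemp": [0.0, 100.0], "flowRate": [1.0, 2.0]})

# Convert only the listed columns from Celsius to Fahrenheit
for col in ["supplyTemp"]:
    df[col] = df[col] * 9 / 5 + 32
```

flowRate is left untouched; only columns named in the list are converted.
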
ecopipeline.transform.convert_l_to_g(df: DataFrame, column_names: list) -> DataFrame

Function takes in a pandas dataframe of data and a list of column names to convert from Liters to Gallons.

Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data.

column_names: list of strings

List of columns with data currently in Liters that need to be converted to Gallons

Returns:
pd.DataFrame: Dataframe with specified columns converted from Liters to Gallons.

ecopipeline.transform.convert_on_off_col_to_bool(df: DataFrame, column_names: list) -> DataFrame

Function takes in a pandas dataframe of data and a list of column names to convert from the strings "ON" and "OFF" to the boolean values True and False respectively.

Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data.

column_names: list of strings

List of columns with data currently in the strings "ON" and "OFF" that need to be converted to boolean values

Returns:
pd.DataFrame: Dataframe with specified columns converted from "ON"/"OFF" strings to boolean values.

ecopipeline.transform.convert_time_zone(df: DataFrame, tz_convert_from: str = 'UTC', tz_convert_to: str = 'America/Los_Angeles') -> DataFrame

Converts a dataframe's indexed timezone from tz_convert_from to tz_convert_to.

Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data.

tz_convert_from: str

String value of the timezone the data is currently in

tz_convert_to: str

String value of the timezone the data should be converted to

Returns:
pd.DataFrame:

The dataframe with its index converted to the appropriate timezone.
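
With a tz-aware index, this maps directly onto pandas' tz_convert. A minimal sketch using the function's default timezones:

```python
import pandas as pd

idx = pd.date_range("2024-01-01 08:00", periods=2, freq="h", tz="UTC")
df = pd.DataFrame({"PowerIn_Total": [1.0, 2.0]}, index=idx)

# Convert the index from UTC to US Pacific time (UTC-8 in January)
df = df.tz_convert("America/Los_Angeles")
```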

ecopipeline.transform.cop_method_1(df: DataFrame, recircLosses, heatout_primary_column: str = 'HeatOut_Primary', total_input_power_column: str = 'PowerIn_Total') -> DataFrame

Performs COP calculation method 1 (original AWS method).

Parameters:
df: pd.DataFrame

Pandas dataframe representing daily averaged values from the datastream to add COP columns to. Adds a column called 'COP_DHWSys_1' to the dataframe in place. The dataframe needs to already have two columns, 'HeatOut_Primary' and 'PowerIn_Total', to calculate COP_DHWSys_1.

recircLosses: float or pd.Series

If using a fixed temperature maintenance recirculation loss value from a spot measurement, this should be a float. If recirculation loss measurements are in the datastream, this should be a column of df. Units should be in kW.

heatout_primary_column: str

Name of the column that contains the output power of the primary system in kW. Defaults to 'HeatOut_Primary'

total_input_power_column: str

Name of the column that contains the total input power of the system in kW. Defaults to 'PowerIn_Total'

Returns:
pd.DataFrame: Dataframe with added column for system COP called COP_DHWSys_1
ecopipeline.transform.cop_method_2(df: DataFrame, cop_tm, cop_primary_column_name) -> DataFrame

Performs COP calculation method 2 as defined by Scott's whiteboard image:

COP = COP_primary * (ELEC_primary / ELEC_total) + COP_tm * (ELEC_tm / ELEC_total)

Parameters:
df: pd.DataFrame

Pandas DataFrame to add COP columns to. The dataframe needs to have a column for the COP of the primary system (see cop_primary_column_name), a column called 'PowerIn_Total' for the total system power, columns prefixed with 'PowerIn_HPWH' or 'PowerIn_SecLoopPump' for power readings taken for HPWHs/primary systems, and columns prefixed with 'PowerIn_SwingTank' or 'PowerIn_ERTank' for power readings taken for temperature maintenance systems.

cop_tm: float

Fixed COP value for the temperature maintenance system

cop_primary_column_name: str

Name of the column used for COP_Primary values

Returns:
pd.DataFrame: Dataframe with added column for system COP called COP_DHWSys_2
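
The formula is a power-weighted average of the two COPs, which can be checked with plain arithmetic (the energy values below are hypothetical):

```python
# Hypothetical daily electrical inputs, in kWh
elec_primary = 40.0   # HPWH / primary-loop power
elec_tm = 10.0        # temperature maintenance power
elec_total = elec_primary + elec_tm

cop_primary = 3.2
cop_tm = 1.0          # fixed COP for the temperature maintenance system

# COP = COP_primary*(ELEC_primary/ELEC_total) + COP_tm*(ELEC_tm/ELEC_total)
cop_dhw_sys_2 = (cop_primary * elec_primary / elec_total
                 + cop_tm * elec_tm / elec_total)
```
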
ecopipeline.transform.create_data_statistics_df(df: DataFrame) -> DataFrame

Function must be called on the raw minute data df after rename_varriables() and before ffill_missing() has been called. The function returns a dataframe indexed by day. Each column will be expanded into 3 columns, appended with '_missing_mins', '_avg_gap', and '_max_gap' respectively. The columns carry the following statistics:

_missing_mins -> the number of minutes in the day that have no reported data value for the column
_avg_gap -> the average gap (in minutes) between collected data values that day
_max_gap -> the maximum gap (in minutes) between collected data values that day

Parameters:
df: pd.DataFrame

Minute data df after rename_varriables() and before ffill_missing() has been called

Returns:
daily_data_stats: pd.DataFrame

New dataframe with the columns described in the function's description

ecopipeline.transform.create_fan_curves(cfm_info: DataFrame, site_info: Series) -> DataFrame

Create fan curves for each site.

Parameters:
cfm_info: pd.DataFrame

DataFrame of fan curve information.

site_info: pd.Series

Series containing the site information.

Returns:
pd.DataFrame:

Dataframe containing the fan curves for each site.

ecopipeline.transform.create_summary_tables(df: DataFrame)

Revamped version of the "aggregate_data" function. Creates hourly and daily summary tables.

Parameters:
df: pd.DataFrame

Single pandas dataframe of minute-by-minute sensor data.

Returns:
pd.DataFrame:

Two pandas dataframes of aggregated sensor data, one by hour and one by day.

ecopipeline.transform.delete_erroneous_from_time_pt(df: DataFrame, time_point: Timestamp, column_names: list, new_value=None) -> DataFrame

Function takes a pandas dataframe and deletes specified erroneous values at a specified time point.

Parameters:
df: pd.DataFrame

Timestamp-indexed Pandas dataframe that needs to have erroneous values removed

time_point: pd.Timestamp

The timestamp index at which the erroneous values occur

column_names: list

List of column names as strings that contain erroneous values at this timestamp

new_value: any

New value to populate the erroneous columns at this timestamp with. If set to None, the value is replaced with NaN

Returns:
pd.DataFrame:

Pandas dataframe with error values replaced with new value

ecopipeline.transform.elev_correction(site_name: str, config: ConfigManager) -> DataFrame

Function creates a dataframe for a given site that contains the site name, elevation, and corrected elevation.

Parameters:
site_name: str

Site's name

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

new Pandas dataframe

ecopipeline.transform.ffill_missing(original_df: DataFrame, config: ConfigManager, previous_fill: DataFrame = None) -> DataFrame

Function will take a pandas dataframe and forward fill select variables with no entry.

Parameters:
original_df: pd.DataFrame

Pandas dataframe that needs to be forward filled

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Among other things, this object will point to a file called Variable_Names.csv in the input folder of the pipeline (e.g. "full/path/to/pipeline/input/Variable_Names.csv"). There should be at least three columns in this csv: "variable_name", "changepoint", and "ffill_length". The variable_name column should contain the name of each variable in the dataframe that requires forward filling. The changepoint column should contain one of three values:

"0" if the variable should be forward filled to a certain length (see ffill_length).
"1" if the variable should be forward filled completely until the next change point.
null if the variable should not be forward filled.

The ffill_length column contains the number of rows which should be forward filled if the value in the changepoint column is "0".

previous_fill: pd.DataFrame (default None)

A pandas dataframe with the same index type and at least some of the same columns as original_df (usually taken as the last entry from the pipeline that has been put into the destination database). The values of this will be used to forward fill into the new set of data if applicable.

Returns:
pd.DataFrame:

Pandas dataframe that has been forward filled to the specifications detailed in Variable_Names.csv
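
The two changepoint modes map naturally onto pandas' ffill, with and without a limit. A simplified sketch; the column names and the ffill_length of 2 are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "tankTemp_F": [120.0, np.nan, np.nan, np.nan],
    "systemState": ["loadUp", np.nan, np.nan, np.nan],
})

# changepoint "0": forward fill at most ffill_length rows (here 2)
df["tankTemp_F"] = df["tankTemp_F"].ffill(limit=2)

# changepoint "1": forward fill completely until the next change point
df["systemState"] = df["systemState"].ffill()
```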

ecopipeline.transform.flag_dhw_outage(df: DataFrame, daily_df: DataFrame, dhw_outlet_column: str, supply_temp: int = 110, consecutive_minutes: int = 15) -> DataFrame
Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data on minute intervals.

daily_df: pd.DataFrame

Single pandas dataframe of sensor data on daily intervals.

dhw_outlet_column: str

Name of the column in df and daily_df that contains the temperature of DHW supplied to building occupants

supply_temp: int

The minimum DHW temperature acceptable to supply to building occupants

consecutive_minutes: int

The number of minutes in a row that DHW is not delivered to tenants to qualify as a DHW outage

Returns:
event_df: pd.DataFrame

Dataframe with ‘ALARM’ events on the days in which there was a DHW Outage.

ecopipeline.transform.gas_valve_diff(df: DataFrame, site: str, config: ConfigManager) -> DataFrame

Function takes in the site dataframe and the site name. If the site has gas heating, takes the lagged difference to get per-minute values.

Parameters:
df: pd.DataFrame

Dataframe for site

site: str

Site name as string

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.gather_outdoor_conditions(df: DataFrame, site: str) -> DataFrame

Function takes in a site dataframe and site name as a string. Returns a new dataframe that contains time_utc, <site>_ODT, and <site>_ODRH for the site.

Parameters:
df: pd.DataFrame

Pandas Dataframe

site: str

Site name as string

Returns:
pd.DataFrame:

new Pandas Dataframe

ecopipeline.transform.generate_event_log_df(config: ConfigManager)

Creates an event log df based on user-submitted events in an event log csv.

Parameters:
config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

Returns:
event_df: pd.DataFrame

Dataframe formatted from events in Event_log.csv for pipeline.

ecopipeline.transform.get_cfm_values(df, site_cfm, site_info, site)
ecopipeline.transform.get_cop_values(df: DataFrame, site_info: DataFrame)
ecopipeline.transform.get_energy_by_min(df: DataFrame) -> DataFrame

Energy is recorded cumulatively. Function takes the lagged differences in order to get a per-minute value for each of the energy variables.

Parameters:
df: pd.DataFrame

Pandas dataframe

Returns:
pd.DataFrame:

Pandas dataframe

ecopipeline.transform.get_hvac_state(df: DataFrame, site_info: Series) -> DataFrame
ecopipeline.transform.get_refrig_charge(df: DataFrame, site: str, config: ConfigManager) -> DataFrame

Function takes in a site dataframe, its site name as a string, and the pipeline's ConfigManager (which locates site_info.csv, superheat.csv, and 410a_pt.csv), and calculates the refrigerant charge per minute.

Parameters:
df: pd.DataFrame

Pandas Dataframe

site: str

Site name as a string

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
pd.DataFrame:

modified Pandas Dataframe

ecopipeline.transform.get_site_cfm_info(site: str, config: ConfigManager) -> DataFrame

Returns a dataframe of the site cfm information for the given site. NOTE: The parsing is necessary because the first row of data contains comments that need to be dropped.

Parameters:
site: str

The site name

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
df: pd.DataFrame

The DataFrame of the site cfm information

ecopipeline.transform.get_site_info(site: str, config: ConfigManager) -> Series

Returns a series of the site information for the given site

Parameters:
site: str

The site name

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
df: pd.Series

The Series of the site information

ecopipeline.transform.get_storage_gals120(df: DataFrame, location: Series, gals: int, total: int, zones: Series) -> DataFrame

Function that creates and appends the Gals120 data onto the Dataframe

Parameters:
df: pd.DataFrame

A Pandas Dataframe

location: pd.Series
gals: int
total: int
zones: pd.Series
Returns:
pd.DataFrame:

a Pandas Dataframe

ecopipeline.transform.get_temp_zones120(df: DataFrame) -> DataFrame

Function that keeps track of the average temperature of each zone. For this function to work, naming conventions for each parallel tank must include 'Temp1' as the temperature at the top of the tank, 'Temp5' as that at the bottom of the tank, and 'Temp2'-'Temp4' as the temperatures in between.

Parameters:
df: pd.DataFrame

A Pandas Dataframe

Returns:
pd.DataFrame:

a Pandas Dataframe

ecopipeline.transform.heat_output_calc(df: DataFrame, flow_var: str, hot_temp: str, cold_temp: str, heat_out_col_name: str, return_as_kw: bool = True) -> DataFrame

Function takes a flow variable and two temperature inputs to calculate heat output

Parameters:
df: pd.DataFrame

Pandas dataframe with minute-to-minute data

flow_var: str

The column name of the flow variable for the calculation. Units of the column should be gal/min

hot_temp: str

The column name of the hot temperature variable for the calculation. Units of the column should be degrees F

cold_temp: str

The column name of the cold temperature variable for the calculation. Units of the column should be degrees F

heat_out_col_name: str

The new column name for the heat output calculated from the variables

return_as_kw: bool

Set to True for the new heat output column to have kW units. Set to False to return the column as BTU/hr

Returns:
pd.DataFrame:

Pandas dataframe with new heat output column of specified name.
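
The underlying calculation is the standard water-side heat equation. A scalar sketch (ecopipeline's exact constants may differ; the factor 500 is approximately 8.33 lb/gal x 60 min/hr x 1 BTU/(lb·°F) for water):

```python
# Hypothetical readings: flow in gal/min, temperatures in deg F
flow_gpm = 10.0
t_hot, t_cold = 130.0, 60.0

# BTU/hr = flow (gal/min) * 500 * temperature rise (deg F)
heat_btu_hr = flow_gpm * 500 * (t_hot - t_cold)

# return_as_kw=True would convert BTU/hr to kW
heat_kw = heat_btu_hr / 3412.14
```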

ecopipeline.transform.join_to_daily(daily_data: DataFrame, cop_data: DataFrame) -> DataFrame

Function left-joins the daily data and COP data.

Parameters:
daily_data: pd.DataFrame

Daily dataframe

cop_data: pd.DataFrame

cop_values dataframe

Returns:
pd.DataFrame

A single, joined dataframe

ecopipeline.transform.join_to_hourly(hourly_data: DataFrame, noaa_data: DataFrame) -> DataFrame

Function left-joins the weather data to the hourly dataframe.

Parameters:
hourly_data: pd.DataFrame

Hourly dataframe

noaa_data: pd.DataFrame

NOAA weather dataframe

Returns:
pd.DataFrame:

A single, joined dataframe

ecopipeline.transform.lbnl_pressure_conversions(df: DataFrame) -> DataFrame
ecopipeline.transform.lbnl_sat_calculations(df: DataFrame) -> DataFrame
ecopipeline.transform.lbnl_temperature_conversions(df: DataFrame) -> DataFrame
ecopipeline.transform.merge_indexlike_rows(df: DataFrame) -> DataFrame

Merges index-like rows together, ensuring that all relevant information for a certain timestamp is stored in one row rather than in multiple rows. Also rounds the timestamps to the nearest minute.

Parameters:
df: pd.DataFrame

The DataFrame whose index-like rows are to be merged.

Returns:
df: pd.DataFrame

The DataFrame with all index-like rows merged.

ecopipeline.transform.nclarity_csv_to_df(csv_filenames: List[str]) -> DataFrame

Function takes a list of csv filenames containing nclarity data and reads all files into a single dataframe.

Parameters:
csv_filenames: List[str]

List of filenames

Returns:
pd.DataFrame:

Pandas Dataframe containing data from all files

ecopipeline.transform.nclarity_filter_new(date: str, filenames: List[str]) -> List[str]

Function filters the filenames list to only those from the given date or later.

Parameters:
date: str

Target date

filenames: List[str]

List of filenames to be filtered

Returns:
List[str]:

Filtered list of filenames

ecopipeline.transform.nullify_erroneous(original_df: DataFrame, config: ConfigManager) -> DataFrame

Function takes a pandas dataframe and makes erroneous values NaN.

Parameters:
original_df: pd.DataFrame

Pandas dataframe that needs to be filtered for error values

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Among other things, this object will point to a file called Variable_Names.csv in the input folder of the pipeline (e.g. "full/path/to/pipeline/input/Variable_Names.csv"). There should be at least two columns in this csv: "variable_name" and "error_value". The variable_name column should contain the names of all columns in the dataframe that need to have their erroneous values removed. The error_value column should contain the error value of each variable_name, or null if there isn't an error value for that variable.

Returns:
pd.DataFrame:

Pandas dataframe with error values replaced with NaNs
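
The replacement step can be sketched as a simple mask per variable; the column name and error value below are hypothetical stand-ins for rows of Variable_Names.csv:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"flowRate": [3.1, -999.0, 2.8]})

# variable_name -> error_value, as read from Variable_Names.csv
error_values = {"flowRate": -999.0}

# Replace each column's designated error value with NaN
for col, err in error_values.items():
    df.loc[df[col] == err, col] = np.nan
```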

ecopipeline.transform.remove_outliers(original_df: DataFrame, config: ConfigManager, site: str = '') -> DataFrame

Function takes a pandas dataframe and the location of bounds information in a csv, stores the bounds data in a dataframe, then removes outliers above or below the bounds as designated by the csv. Function then returns the resulting dataframe.

Parameters:
original_df: pd.DataFrame

Pandas dataframe for which outliers need to be removed

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Among other things, this object will point to a file called Variable_Names.csv in the input folder of the pipeline (e.g. "full/path/to/pipeline/input/Variable_Names.csv"). The file must have at least three columns titled "variable_name", "lower_bound", and "upper_bound", which should contain the name of each variable in the dataframe that requires the removal of outliers, the lower bound for acceptable data, and the upper bound for acceptable data respectively.

site: str

String of the site name if processing a particular site in a Variable_Names.csv file with multiple sites. Leave as an empty string if not applicable.

Returns:
pd.DataFrame:

Pandas dataframe with outliers removed and replaced with NaNs
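
The bounds check itself reduces to masking values outside [lower_bound, upper_bound]. A sketch; the column and bounds are hypothetical stand-ins for rows of Variable_Names.csv:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"airTemp_F": [55.0, 300.0, -80.0, 62.0]})

# variable_name -> (lower_bound, upper_bound)
bounds = {"airTemp_F": (-40.0, 130.0)}

# Replace out-of-bounds readings with NaN
for col, (lo, hi) in bounds.items():
    df.loc[(df[col] < lo) | (df[col] > hi), col] = np.nan
```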

ecopipeline.transform.remove_partial_days(df, hourly_df, daily_df, complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, partial_day_removal_exclusion: list = [])

Helper function for removing daily and hourly values that are calculated from incomplete data.

Parameters:
df: pd.DataFrame

Single pandas dataframe of minute-by-minute sensor data.

daily_df: pd.DataFrame

Aggregated daily dataframe that contains all daily information.

hourly_df: pd.DataFrame

Aggregated hourly dataframe that contains all hourly information.

complete_hour_threshold: float

Defaults to 0.8. Percent of minutes in an hour needed to count as a complete hour, expressed as a float (e.g. 80% = 0.8)

complete_day_threshold: float

Defaults to 1.0. Percent of hours in a day needed to count as a complete day, expressed as a float (e.g. 80% = 0.8)

partial_day_removal_exclusion: list[str]

List of column names to ignore when searching through columns to remove sections without enough data

ecopipeline.transform.rename_sensors(original_df: DataFrame, config: ConfigManager, site: str = '', system: str = '')

Function takes in a dataframe and the pipeline's ConfigManager and renames sensors from their alias to their true name. Also filters the dataframe by site and system if specified.

Parameters:
original_df: pd.DataFrame

A dataframe that contains data labeled by the raw variable names to be renamed.

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Among other things, this object will point to a file called Variable_Names.csv in the input folder of the pipeline (e.g. "full/path/to/pipeline/input/Variable_Names.csv"). The csv this points to should have at least 2 columns called "variable_alias" (the raw name to be changed from) and "variable_name" (the name to be changed to). All columns without a corresponding variable_name will be dropped from the dataframe.

site: str

If the pipeline is processing data for a particular site with a dataframe that contains data from multiple sites that need to be processed separately, fill in this optional variable to drop data from all other sites in the returned dataframe. Appropriate variables in your Variable_Names.csv must have a matching substring to this variable in a column called "site".

system: str

If the pipeline is processing data for a particular system with a dataframe that contains data from multiple systems that need to be processed separately, fill in this optional variable to drop data from all other systems in the returned dataframe. Appropriate variables in your Variable_Names.csv must have a matching string to this variable in a column called "system".

Returns:
df: pd.DataFrame

Pandas dataframe that has been filtered by site and system (if either are applicable) with column names that match those specified in Variable_Names.csv.

ecopipeline.transform.replace_humidity(df: DataFrame, od_conditions: DataFrame, date_forward: datetime, site_name: str) -> DataFrame

Function replaces all humidity readings for a given site after a given datetime.

Parameters:
df: pd.DataFrame

Dataframe containing the raw sensor data.

od_conditions: pd.DataFrame

DataFrame containing outdoor conditions measured by field sensors.

date_forward: dt.datetime

Datetime containing the time after which all humidity readings should be replaced.

site_name: str

String containing the name of the site for which humidity values are to be replaced.

Returns:
pd.DataFrame:

Modified DataFrame where the Humidity_ODRH column contains the field readings after the given datetime.

ecopipeline.transform.round_time(df: DataFrame)

Function takes in a dataframe and rounds the datetime index down to the nearest minute. Works in place.

Parameters:
df: pd.DataFrame

A dataframe indexed by datetimes. These datetimes will all be rounded down to the nearest minute.

Returns:
boolean

Returns True if the indexes have been rounded down. Returns False if the function failed (e.g. if df was empty)
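
Rounding a datetime index down to the minute is a one-liner in pandas; a sketch of the behavior, not the library's exact code:

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01 00:00:59", "2024-01-01 00:01:30"])
df = pd.DataFrame({"value": [1, 2]}, index=idx)

# Floor every timestamp to the start of its minute
df.index = df.index.floor("min")
```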

ecopipeline.transform.sensor_adjustment(df: DataFrame, config: ConfigManager) -> DataFrame

TO BE DEPRECATED -- Reads in input/adjustments.csv and applies necessary adjustments to the dataframe

Parameters:
df: pd.DataFrame

DataFrame to be adjusted

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline. Among other things, this object will point to a file called adjustments.csv in the input folder of the pipeline (e.g. "full/path/to/pipeline/input/adjustments.csv")

Returns:
pd.DataFrame:

Adjusted Dataframe

ecopipeline.transform.shift_accumulative_columns(df: DataFrame, column_names: list = [])

Converts a dataframe's accumulative columns to non-accumulative difference values.

Parameters:
df: pd.DataFrame

Single pandas dataframe of sensor data.

column_names: list

The names of columns that need to be changed from accumulative sum data to non-accumulative data. This is applied to all columns if set to an empty list

Returns:
pd.DataFrame:

The dataframe with appropriate columns changed from accumulative sum data to non-accumulative data.
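
The conversion is a lagged difference per column. A sketch; the column name is hypothetical, and the first row necessarily becomes NaN:

```python
import pandas as pd

# A cumulative meter reading
df = pd.DataFrame({"Energy_kWh": [10.0, 10.5, 11.5]})

# Cumulative sum -> per-interval consumption
df["Energy_kWh"] = df["Energy_kWh"].diff()
```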

ecopipeline.transform.site_specific(df: DataFrame, site: str) -> DataFrame

Does site-specific calculations for LBNL. The site name is searched using RegEx.

Parameters:
df: pd.DataFrame

Dataframe of data

site: str

Site name as a string

Returns:
pd.DataFrame:

modified dataframe

ecopipeline.transform.verify_power_energy(df: DataFrame, config: ConfigManager)

Verifies that for each timestamp, corresponding power and energy variables are consistent with one another. Power ~= energy * 60. Margin of error TBD. Outputs to a csv file any rows with conflicting power and energy variables.

Prereq:

Input dataframe MUST have had get_energy_by_min() called on it previously

Parameters:
df: pd.DataFrame

Pandas dataframe

config: ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline

Returns:
None
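
The consistency check can be sketched as follows. The margin of error is a placeholder (the documentation leaves it TBD) and the column names are hypothetical; for minute data, energy in kWh accrued per minute times 60 gives average kW:

```python
import pandas as pd

df = pd.DataFrame({
    "PowerIn_Total": [6.0, 6.0],    # kW, averaged over the minute
    "EnergyIn_Total": [0.1, 0.2],   # kWh accrued during the minute
})

margin = 0.05  # placeholder tolerance
mismatch = (df["PowerIn_Total"] - df["EnergyIn_Total"] * 60).abs() > margin
conflicting_rows = df[mismatch]  # rows that would be written to the csv report
```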