Transform Documentation¶
- ecopipeline.transform.add_local_time(df: DataFrame, site_name: str, config: ConfigManager) DataFrame¶
Function adds a column to the dataframe with the local time.
- Parameters:
- df : pd.DataFrame
Dataframe.
- site_name : str
Site name.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- pd.DataFrame
- ecopipeline.transform.add_relative_humidity(df: DataFrame, temp_col: str = 'airTemp_F', dew_point_col: str = 'dewPoint_F', degree_f: bool = True)¶
Add a 'relative_humidity' column to the dataframe.
Computes relative humidity from air temperature and dew-point temperature using the August-Roche-Magnus approximation. Clips the result to [0, 100].
- Parameters:
- df : pd.DataFrame
Dataframe containing air temperature and dew-point temperature columns.
- temp_col : str, optional
Column name for air temperature. Defaults to 'airTemp_F'.
- dew_point_col : str, optional
Column name for dew-point temperature. Defaults to 'dewPoint_F'.
- degree_f : bool, optional
If True, temperature columns are assumed to be in °F and are internally converted to °C for the calculation. If False, columns are assumed to already be in °C. Defaults to True.
- Returns:
- pd.DataFrame
Dataframe with an added 'relative_humidity' column (percent, 0–100).
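The August-Roche-Magnus approximation named above can be sketched as follows. This is a minimal illustration, not the library's implementation; the Magnus coefficients (a = 17.625, b = 243.04) are the commonly cited values and are an assumption here, as is the sample data.

```python
import numpy as np
import pandas as pd

def relative_humidity_magnus(temp_c, dew_c, a=17.625, b=243.04):
    """August-Roche-Magnus RH (%) from air and dew-point temperature in °C."""
    rh = 100.0 * np.exp(a * dew_c / (b + dew_c)) / np.exp(a * temp_c / (b + temp_c))
    return np.clip(rh, 0.0, 100.0)  # clip to [0, 100] as the docstring describes

df = pd.DataFrame({"airTemp_F": [68.0, 50.0], "dewPoint_F": [68.0, 32.0]})
# Convert °F to °C first, mirroring the degree_f=True behaviour
temp_c = (df["airTemp_F"] - 32.0) * 5.0 / 9.0
dew_c = (df["dewPoint_F"] - 32.0) * 5.0 / 9.0
df["relative_humidity"] = relative_humidity_magnus(temp_c, dew_c)
# When dew point equals air temperature, RH is 100%
```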
- ecopipeline.transform.aggregate_df(df: DataFrame, ls_filename: str = '', complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, remove_partial: bool = True) (DataFrame, DataFrame)¶
Aggregate minute-level data into hourly and daily dataframes.
Energy columns (matching .*Energy.* but not EnergyRate or BTU suffixes) are summed; all other numeric columns are averaged. Optionally appends load-shift schedule data and removes partial hours/days.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of minute-by-minute sensor data.
- ls_filename : str, optional
Path to the load-shift schedule CSV file (e.g. "full/path/to/pipeline/input/loadshift_matrix.csv"). The CSV must have at least four columns: date, startTime, endTime, and event. Defaults to "".
- complete_hour_threshold : float, optional
Fraction of minutes in an hour required to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8. Only applicable when remove_partial is True.
- complete_day_threshold : float, optional
Fraction of hours in a day required to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0. Only applicable when remove_partial is True.
- remove_partial : bool, optional
If True, removes partial hours and days from the aggregated dataframes. Defaults to True.
- Returns:
- hourly_df : pd.DataFrame
Aggregated hourly dataframe, including a 'system_state' column if a valid load-shift file was provided.
- daily_df : pd.DataFrame
Aggregated daily dataframe, including a 'load_shift_day' column if a valid load-shift file was provided.
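The sum-energy/average-everything-else rule can be sketched with pandas resampling. This is a simplified illustration under assumptions: the column names are invented, and the EnergyRate/BTU-suffix exclusions and partial-period removal are reduced to a bare substring check.

```python
import pandas as pd

# One day of minute data: an energy column (summed) and a temperature (averaged)
idx = pd.date_range("2024-01-01", periods=1440, freq="min")
df = pd.DataFrame({"PowerIn_Energy": 0.5, "Temp_OAT": 55.0}, index=idx)

# Mirror the documented split: .*Energy.* columns summed, others averaged
energy_cols = [c for c in df.columns if "Energy" in c and "EnergyRate" not in c]
other_cols = [c for c in df.columns if c not in energy_cols]

hourly_df = pd.concat(
    [df[energy_cols].resample("h").sum(), df[other_cols].resample("h").mean()],
    axis=1,
)
daily_df = pd.concat(
    [df[energy_cols].resample("D").sum(), df[other_cols].resample("D").mean()],
    axis=1,
)
```

With 0.5 units of energy per minute, each hourly row sums to 30.0 while the temperature column stays at its 55.0 average.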
- ecopipeline.transform.aggregate_values(df: DataFrame, thermo_slice: str) DataFrame¶
Gets daily averages of data for all relevant variables.
- Parameters:
- df : pd.DataFrame
Pandas DataFrame of minute-by-minute data.
- thermo_slice : str
Indicates the time at which slicing begins. If None, no slicing is performed. The format of the thermo_slice string is "HH:MM AM/PM".
- Returns:
- pd.DataFrame:
Pandas DataFrame which contains the aggregated hourly data.
- ecopipeline.transform.apply_equipment_cop_derate(df: DataFrame, equip_cop_col: str, r_val: int = 16) DataFrame¶
Derate equipment-method system COP based on building R-value.
Derate percentages applied:
R12–R16: 12%
R16–R20: 10%
R20–R24: 8%
R24–R28: 6%
R28–R32: 4%
> R32: 2%
- Parameters:
- df : pd.DataFrame
Dataframe containing the equipment COP column to derate.
- equip_cop_col : str
Name of the COP column to derate.
- r_val : int, optional
Building R-value used to determine the derate factor. Defaults to 16.
- Returns:
- pd.DataFrame
Dataframe with equip_cop_col multiplied by the appropriate derate factor.
- Raises:
- Exception
If r_val is less than 12.
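The derate schedule above amounts to a banded lookup. A minimal sketch follows; the exact boundary assignment (e.g. whether R16 falls in the 12% or 10% band) is an assumption here, as is the column name.

```python
import pandas as pd

def derate_factor(r_val: int) -> float:
    """Map building R-value to a multiplier per the documented derate bands."""
    if r_val < 12:
        raise Exception("R-value must be at least 12")
    if r_val <= 16:        # R12-R16: 12% derate (boundary handling assumed)
        derate = 0.12
    elif r_val <= 20:      # R16-R20: 10%
        derate = 0.10
    elif r_val <= 24:      # R20-R24: 8%
        derate = 0.08
    elif r_val <= 28:      # R24-R28: 6%
        derate = 0.06
    elif r_val <= 32:      # R28-R32: 4%
        derate = 0.04
    else:                  # > R32: 2%
        derate = 0.02
    return 1.0 - derate

df = pd.DataFrame({"EquipmentCOP": [3.0, 4.0]})  # hypothetical COP column
df["EquipmentCOP"] = df["EquipmentCOP"] * derate_factor(16)
```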
- ecopipeline.transform.aqsuite_filter_new(last_date: str, filenames: List[str], site: str, config: ConfigManager) List[str]¶
Function filters the filenames list to only those newer than the last date.
- Parameters:
- last_date : str
Latest date loaded prior to current runtime.
- filenames : List[str]
List of filenames to be filtered.
- site : str
Site name.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- List[str]:
Filtered list of filenames
- ecopipeline.transform.aqsuite_prep_time(df: DataFrame) DataFrame¶
Function takes an aqsuite dataframe and converts the time column into datetime type and sorts the entire dataframe by time. Prereq:
Input dataframe MUST be an aqsuite Dataframe whose columns have not yet been renamed
- Parameters:
- df : pd.DataFrame
Aqsuite DataFrame
- Returns:
- pd.DataFrame:
Pandas Dataframe containing data from all files
- ecopipeline.transform.avg_duplicate_times(df: DataFrame, timezone: str) DataFrame¶
Collapse duplicate timestamps by averaging numeric values and taking the first non-numeric value.
Looks for duplicate timestamps (typically caused by daylight-saving time transitions or timestamp rounding) and reduces each group of duplicates to a single row, averaging numeric columns and keeping the first value for non-numeric columns.
- Parameters:
- df : pd.DataFrame
Pandas dataframe to be altered.
- timezone : str
Timezone string to apply to the output index. Must be a string recognised by pandas.Series.tz_localize. See https://pandas.pydata.org/docs/reference/api/pandas.Series.tz_localize.html.
- Returns:
- pd.DataFrame
Dataframe with all duplicate timestamps collapsed into one row, averaging numeric data values.
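The collapse rule described above (mean for numeric columns, first value for non-numeric) can be sketched with a groupby on the index. The sample data and column names are hypothetical.

```python
import pandas as pd

# Two rows share the 01:00 timestamp, e.g. after a DST fall-back transition
idx = pd.to_datetime(["2024-11-03 01:00", "2024-11-03 01:00", "2024-11-03 01:01"])
df = pd.DataFrame(
    {"Temp_OAT": [50.0, 52.0, 53.0], "mode": ["heat", "cool", "heat"]}, index=idx
)

num_cols = df.select_dtypes("number").columns
obj_cols = df.columns.difference(num_cols)

# Average numeric columns, keep the first value of non-numeric columns
collapsed = pd.concat(
    [df[num_cols].groupby(level=0).mean(), df[obj_cols].groupby(level=0).first()],
    axis=1,
)[df.columns]  # restore the original column order
```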
- ecopipeline.transform.calculate_cop_values(df: DataFrame, heatLoss_fixed: int, thermo_slice: str) DataFrame¶
Performs COP calculations using the daily aggregated data.
- Parameters:
- df : pd.DataFrame
Pandas DataFrame to add COP columns to.
- heatLoss_fixed : float
Fixed heat-loss value.
- thermo_slice : str
The time at which slicing begins, if thermo slicing is desired.
- Returns:
- pd.DataFrame:
Pandas DataFrame with the added COP columns.
- ecopipeline.transform.central_transform_function(config: ConfigManager, df: DataFrame, weather_df: DataFrame = None, tz_convert_from: str = 'America/Los_Angeles', tz_convert_to: str = 'America/Los_Angeles', oat_column_name: str = 'Temp_OAT', complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, remove_partial: bool = True, pre_aggregation_func=None, post_aggregation_func=None) [DataFrame, DataFrame, DataFrame]¶
Run the full central transform pipeline on raw minute-level site data.
Renames sensors, rounds timestamps, forward-fills missing values, optionally converts timezones, averages duplicate timestamps, aggregates to hourly and daily dataframes, and optionally merges weather data. Supports optional pre- and post-aggregation hooks for custom processing.
- Parameters:
- config : ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- df : pd.DataFrame
Dataframe with raw time-indexed (ideally minute-interval) site data. Important column names should be represented in the variable_alias column in the Variable_Names.csv file.
- weather_df : pd.DataFrame, optional
Dataframe with time-indexed (preferably hourly) weather data. Will be merged with the hourly dataframe.
- tz_convert_from : str, optional
String value of the timezone the data is currently in.
- tz_convert_to : str, optional
String value of the timezone the data should be converted to.
- oat_column_name : str, optional
Name that the Outdoor Air Temperature column should have. Defaults to 'Temp_OAT'.
- complete_hour_threshold : float, optional
Fraction of minutes in an hour needed to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8. Only applicable if remove_partial is True.
- complete_day_threshold : float, optional
Fraction of hours in a day needed to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0. Only applicable if remove_partial is True.
- remove_partial : bool, optional
If True, removes partial days and hours from aggregated dataframes. Defaults to True.
- pre_aggregation_func : callable, optional
A custom function called after minute-level processing and before aggregation. Signature: pre_aggregation_func(df: pd.DataFrame) -> pd.DataFrame.
- post_aggregation_func : callable, optional
A custom function called after weather merging and before returning. Signature: post_aggregation_func(df, hourly_df, daily_df) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame].
- Returns:
- tuple of pd.DataFrame
A three-element tuple (df, hourly_df, daily_df) containing the processed minute-level, hourly, and daily dataframes, respectively.
- Raises:
- TypeError
If pre_aggregation_func or post_aggregation_func is not callable, does not accept the expected parameters, or does not return the expected type.
- ecopipeline.transform.change_ID_to_HVAC(df: DataFrame, site_info: Series) DataFrame¶
Function takes in a site dataframe along with the name and path of the site and assigns a unique event_ID value whenever the system changes state.
- Parameters:
- df : pd.DataFrame
Pandas Dataframe.
- site_info : pd.Series
site_info.csv as a pd.Series.
- Returns:
- pd.DataFrame:
modified Pandas Dataframe
- ecopipeline.transform.column_name_change(df: DataFrame, dt: Timestamp, new_column: str, old_column: str, remove_old_column: bool = True) DataFrame¶
Back-fill new_column with values from old_column for rows before a name-change timestamp.
Overwrites values in new_column with values from old_column for all rows with an index earlier than dt, provided dt is within the index range. Optionally removes old_column afterwards.
- Parameters:
- df : pd.DataFrame
Pandas dataframe with minute-to-minute data.
- dt : pd.Timestamp
Timestamp of the variable name change.
- new_column : str
Name of the column to be overwritten for rows before dt.
- old_column : str
Name of the column to copy values from.
- remove_old_column : bool, optional
If True, drops old_column from the dataframe after the copy. Defaults to True.
- Returns:
- pd.DataFrame
Dataframe with new_column updated for pre-change rows.
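The back-fill behaviour can be sketched with a boolean index mask. The column names and timestamps below are invented for illustration.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="min")
# A sensor was renamed at 00:02: the new column is empty before the change
df = pd.DataFrame(
    {"Temp_new": [None, None, 70.0, 71.0], "Temp_old": [68.0, 69.0, None, None]},
    index=idx,
)
dt = pd.Timestamp("2024-01-01 00:02")

# Back-fill the renamed column from the old one for rows before the change point
mask = df.index < dt
df.loc[mask, "Temp_new"] = df.loc[mask, "Temp_old"]
df = df.drop(columns="Temp_old")  # remove_old_column=True behaviour
```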
- ecopipeline.transform.concat_last_row(df: DataFrame, last_row: DataFrame) DataFrame¶
Concatenate the last database row onto a new-data dataframe to enable forward filling.
Takes a dataframe with new data and a second dataframe representing the last row of the destination database, concatenates them so that subsequent forward filling can use information from the last row.
- Parameters:
- df : pd.DataFrame
Dataframe with new data that needs to be forward filled from data in the last row of a database.
- last_row : pd.DataFrame
Last row of the database to forward fill from.
- Returns:
- pd.DataFrame
Dataframe with the last row concatenated and sorted by index.
- ecopipeline.transform.condensate_calculations(df: DataFrame, site: str, site_info: Series) DataFrame¶
Calculates condensate values for the given dataframe
- Parameters:
- df : pd.DataFrame
Dataframe to be modified.
- site : str
Name of site.
- site_info : pd.Series
Series of site info.
- Returns:
- pd.DataFrame:
modified dataframe
- ecopipeline.transform.convert_c_to_f(df: DataFrame, column_names: list) DataFrame¶
Convert specified columns from degrees Celsius to Fahrenheit.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of sensor data.
- column_names : list
List of column names whose values are currently in Celsius and need to be converted to Fahrenheit.
- Returns:
- pd.DataFrame
Dataframe with the specified columns converted from Celsius to Fahrenheit.
- ecopipeline.transform.convert_l_to_g(df: DataFrame, column_names: list) DataFrame¶
Convert specified columns from liters to gallons.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of sensor data.
- column_names : list
List of column names whose values are currently in liters and need to be converted to gallons.
- Returns:
- pd.DataFrame
Dataframe with the specified columns converted from liters to gallons.
- ecopipeline.transform.convert_on_off_col_to_bool(df: DataFrame, column_names: list) DataFrame¶
Convert "ON"/"OFF" string values to boolean True/False in specified columns.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of sensor data.
- column_names : list
List of column names containing "ON"/"OFF" (or "On"/"Off") strings to be converted to boolean values.
- Returns:
- pd.DataFrame
Dataframe with the specified columns converted to boolean values.
- ecopipeline.transform.convert_temp_resistance_type(df: DataFrame, column_name: str, sensor_model='veris') DataFrame¶
Convert temperature resistance readings using a 10k Type 2 thermistor model.
Applies a two-stage pickle-model conversion (temperature-to-resistance, then resistance-to-temperature) to correct sensor readings in the specified column.
- Parameters:
- df : pd.DataFrame
Timestamp-indexed Pandas dataframe of minute-by-minute values.
- column_name : str
Name of the column containing resistance conversion Type 2 data.
- sensor_model : str, optional
Sensor model to use. Supported values: 'veris', 'tasseron'. Defaults to 'veris'.
- Returns:
- pd.DataFrame
Dataframe with the specified column corrected via the thermistor model.
- Raises:
- Exception
If sensor_model is not a supported value.
- ecopipeline.transform.convert_time_zone(df: DataFrame, tz_convert_from: str = 'UTC', tz_convert_to: str = 'America/Los_Angeles') DataFrame¶
Convert a dataframe’s DatetimeIndex from one timezone to another.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of sensor data whose index should be timezone-converted.
- tz_convert_from : str, optional
Timezone string the index is currently in. Defaults to 'UTC'.
- tz_convert_to : str, optional
Timezone string the index should be converted to. Defaults to 'America/Los_Angeles'.
- Returns:
- pd.DataFrame
Dataframe with its index converted to the target timezone (stored without timezone info as a naive datetime index).
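A conversion returning a naive index, as described, can be sketched with pandas tz handling (localize, convert, then strip the tz info). This is an illustration of the idea, not the library's exact code.

```python
import pandas as pd

idx = pd.date_range("2024-01-15 08:00", periods=2, freq="h")  # naive timestamps
df = pd.DataFrame({"Temp_OAT": [40.0, 41.0]}, index=idx)

# Localize the naive index to the source timezone, convert, then strip tz
# info so the result is again a naive datetime index
converted = df.copy()
converted.index = (
    converted.index.tz_localize("UTC")
    .tz_convert("America/Los_Angeles")
    .tz_localize(None)
)
```

In mid-January, Los Angeles is UTC-8, so 08:00 UTC becomes midnight local time.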
- ecopipeline.transform.cop_method_1(df: DataFrame, recircLosses, heatout_primary_column: str = 'HeatOut_Primary', total_input_power_column: str = 'PowerIn_Total') DataFrame¶
Perform COP calculation method 1 (original AWS method).
Computes COP_DHWSys_1 = (HeatOut_Primary + recircLosses) / PowerIn_Total and adds the result as a new column to the dataframe.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of daily averaged values. Must already contain heatout_primary_column and total_input_power_column.
- recircLosses : float or pd.Series
Recirculation losses in kW. Pass a float for a fixed spot-measured value, or a pd.Series (aligned with df) if measurements are available in the datastream.
- heatout_primary_column : str, optional
Name of the column containing primary system output power in kW. Defaults to 'HeatOut_Primary'.
- total_input_power_column : str, optional
Name of the column containing total system input power in kW. Defaults to 'PowerIn_Total'.
- Returns:
- pd.DataFrame
Dataframe with an added 'COP_DHWSys_1' column.
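The method-1 formula can be applied directly in pandas; the daily values below are invented for illustration.

```python
import pandas as pd

daily_df = pd.DataFrame(
    {"HeatOut_Primary": [10.0, 12.0], "PowerIn_Total": [4.0, 5.0]},  # kW
    index=pd.date_range("2024-01-01", periods=2, freq="D"),
)
recirc_losses = 2.0  # kW, fixed spot-measured value

# COP_DHWSys_1 = (HeatOut_Primary + recircLosses) / PowerIn_Total
daily_df["COP_DHWSys_1"] = (
    daily_df["HeatOut_Primary"] + recirc_losses
) / daily_df["PowerIn_Total"]
```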
- ecopipeline.transform.cop_method_2(df: DataFrame, cop_tm, cop_primary_column_name) DataFrame¶
Perform COP calculation method 2.
Formula: COP = COP_primary * (ELEC_primary / ELEC_total) + COP_tm * (ELEC_tm / ELEC_total)
- Parameters:
- df : pd.DataFrame
Pandas dataframe to add the COP column to. Must contain:
cop_primary_column_name: primary system COP values.
'PowerIn_Total': total system power.
Columns prefixed with 'PowerIn_HPWH' or equal to 'PowerIn_SecLoopPump' (primary system power).
Columns prefixed with 'PowerIn_SwingTank' or 'PowerIn_ERTank' (temperature-maintenance system power).
- cop_tm : float
Fixed COP value for the temperature-maintenance system.
- cop_primary_column_name : str
Name of the column containing primary-system COP values.
- Returns:
- pd.DataFrame
Dataframe with an added 'COP_DHWSys_2' column.
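The weighted-average formula and the prefix-based column selection can be sketched as follows; the specific column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "COP_Primary": [3.5],
    "PowerIn_Total": [10.0],
    "PowerIn_HPWH1": [6.0],
    "PowerIn_SecLoopPump": [1.0],
    "PowerIn_SwingTank1": [3.0],
})
cop_tm = 1.0  # fixed temperature-maintenance COP

# Identify primary and temperature-maintenance power columns by prefix
primary_cols = [c for c in df.columns
                if c.startswith("PowerIn_HPWH") or c == "PowerIn_SecLoopPump"]
tm_cols = [c for c in df.columns
           if c.startswith(("PowerIn_SwingTank", "PowerIn_ERTank"))]

elec_primary = df[primary_cols].sum(axis=1)
elec_tm = df[tm_cols].sum(axis=1)
elec_total = df["PowerIn_Total"]

# COP = COP_primary * (ELEC_primary / ELEC_total) + COP_tm * (ELEC_tm / ELEC_total)
df["COP_DHWSys_2"] = (df["COP_Primary"] * elec_primary / elec_total
                      + cop_tm * elec_tm / elec_total)
```

Here primary power is 7 kW of the 10 kW total, so the result is 3.5 x 0.7 + 1.0 x 0.3 = 2.75.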
- ecopipeline.transform.create_data_statistics_df(df: DataFrame) DataFrame¶
Compute per-column data-gap statistics aggregated by day.
Must be called on the raw minute-level dataframe after rename_sensors() and before ffill_missing(). Each original column is expanded into three derived columns:
<col>_missing_mins: number of minutes in the day with no reported value.
<col>_avg_gap: average consecutive gap length (in minutes) for that day.
<col>_max_gap: maximum consecutive gap length (in minutes) for that day.
- Parameters:
- df : pd.DataFrame
Minute-level dataframe after rename_sensors() and before ffill_missing() has been called.
- Returns:
- pd.DataFrame
Day-indexed dataframe containing the three gap-statistic columns for each original column.
- ecopipeline.transform.create_fan_curves(cfm_info: DataFrame, site_info: Series) DataFrame¶
Create fan curves for each site.
- Parameters:
- cfm_info : pd.DataFrame
DataFrame of fan curve information.
- site_info : pd.Series
Series containing the site information.
- Returns:
- pd.DataFrame:
Dataframe containing the fan curves for each site.
- ecopipeline.transform.create_summary_tables(df: DataFrame)¶
Create hourly and daily summary tables from minute-by-minute data.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of minute-by-minute sensor data.
- Returns:
- hourly_df : pd.DataFrame
Hourly mean aggregation of the input data, with partial hours removed.
- daily_df : pd.DataFrame
Daily mean aggregation of the input data, with partial days removed.
- ecopipeline.transform.delete_erroneous_from_time_pt(df: DataFrame, time_point: Timestamp, column_names: list, new_value=None) DataFrame¶
Replace erroneous values at a specific timestamp with a given replacement value.
- Parameters:
- df : pd.DataFrame
Timestamp-indexed Pandas dataframe that contains the erroneous value.
- time_point : pd.Timestamp
The index timestamp at which the erroneous values occur.
- column_names : list
List of column name strings that contain erroneous values at this timestamp.
- new_value : any, optional
Replacement value to write into the erroneous cells. If None, the cells are replaced with NaN. Defaults to None.
- Returns:
- pd.DataFrame
Dataframe with the erroneous values replaced by new_value.
- ecopipeline.transform.elev_correction(site_name: str, config: ConfigManager) DataFrame¶
Function creates a dataframe for a given site that contains site name, elevation, and the corrected elevation.
- Parameters:
- site_name : str
Site's name.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- pd.DataFrame:
new Pandas dataframe
- ecopipeline.transform.estimate_power(df: DataFrame, new_power_column: str, current_a_column: str, current_b_column: str, current_c_column: str, assumed_voltage: float = 208, power_factor: float = 1) DataFrame¶
Estimate three-phase power from per-phase current readings.
Calculates power as the average phase current multiplied by the assumed voltage, power factor, and sqrt(3), then converts from watts to kilowatts.
- Parameters:
- df : pd.DataFrame
Pandas dataframe with minute-to-minute data.
- new_power_column : str
Column name to store the estimated power. Units will be kW.
- current_a_column : str
Column name of the Phase A current. Units should be amps.
- current_b_column : str
Column name of the Phase B current. Units should be amps.
- current_c_column : str
Column name of the Phase C current. Units should be amps.
- assumed_voltage : float, optional
Assumed line voltage in volts. Defaults to 208.
- power_factor : float, optional
Power factor to apply. Defaults to 1.
- Returns:
- pd.DataFrame
Dataframe with a new estimated power column of the specified name.
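The three-phase power estimate described above (average phase current x voltage x power factor x sqrt(3), W to kW) can be sketched as follows; the current column names and readings are invented.

```python
import math
import pandas as pd

df = pd.DataFrame({
    "CurrentA": [10.0], "CurrentB": [11.0], "CurrentC": [9.0],  # amps
})

assumed_voltage = 208.0  # volts
power_factor = 1.0

# P(kW) = mean(I_a, I_b, I_c) * V * PF * sqrt(3) / 1000
avg_current = df[["CurrentA", "CurrentB", "CurrentC"]].mean(axis=1)
df["PowerIn_Estimated"] = (
    avg_current * assumed_voltage * power_factor * math.sqrt(3) / 1000.0
)
```

An average phase current of 10 A at 208 V gives roughly 3.6 kW.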
- ecopipeline.transform.ffill_missing(original_df: DataFrame, config: ConfigManager, previous_fill: DataFrame = None) DataFrame¶
Forward-fill selected columns of a dataframe according to rules in Variable_Names.csv.
- Parameters:
- original_df : pd.DataFrame
Pandas dataframe that needs to be forward-filled.
- config : ConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline's input folder. The CSV must have at least three columns:
variable_name: name of each variable to forward-fill.
changepoint: 1 to forward-fill unconditionally until the next change point, 0 to forward-fill up to ffill_length rows, or null to skip forward-filling for that variable.
ffill_length: number of rows to forward-fill when changepoint is 0.
- previous_fill : pd.DataFrame, optional
Dataframe with the same index type and at least some of the same columns as original_df (typically the last row from the destination database). Its values are used to seed forward-filling into the new data.
- Returns:
- pd.DataFrame
Dataframe that has been forward-filled per the specifications in the Variable_Names.csv file.
- ecopipeline.transform.flag_dhw_outage(df: DataFrame, daily_df: DataFrame, dhw_outlet_column: str, supply_temp: int = 110, consecutive_minutes: int = 15) DataFrame¶
Detect DHW outage events and return an alarm event dataframe.
Identifies periods where DHW outlet temperature falls below supply_temp for at least consecutive_minutes consecutive minutes, then records an ALARM event for each affected day.
- Parameters:
- df : pd.DataFrame
Pandas dataframe of sensor data on minute intervals.
- daily_df : pd.DataFrame
Pandas dataframe of sensor data on daily intervals.
- dhw_outlet_column : str
Name of the column in df that contains the DHW temperature supplied to building occupants.
- supply_temp : int, optional
Minimum acceptable DHW supply temperature in °F. Defaults to 110.
- consecutive_minutes : int, optional
Number of consecutive minutes below supply_temp required to qualify as a DHW outage. Defaults to 15.
- Returns:
- pd.DataFrame
Dataframe indexed by start_time_pt containing 'ALARM' events for each day on which a DHW outage occurred.
- ecopipeline.transform.gas_valve_diff(df: DataFrame, site: str, config: ConfigManager) DataFrame¶
Function takes in the site dataframe and the site name. If the site has gas heating, takes the lagged difference to get per-minute values.
- Parameters:
- df : pd.DataFrame
Dataframe for site.
- site : str
Site name as string.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- pd.DataFrame:
modified Pandas Dataframe
- ecopipeline.transform.gather_outdoor_conditions(df: DataFrame, site: str) DataFrame¶
Function takes in a site dataframe and site name as a string. Returns a new dataframe that contains time_utc, <site>_ODT, and <site>_ODRH for the site.
- Parameters:
- df : pd.DataFrame
Pandas Dataframe.
- site : str
Site name as string.
- Returns:
- pd.DataFrame:
new Pandas Dataframe
- ecopipeline.transform.generate_event_log_df(config: ConfigManager)¶
Create an event log dataframe from a user-submitted Event_log.csv file.
- Parameters:
- config : ConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to the Event_log.csv file via config.get_event_log_path().
- Returns:
- pd.DataFrame
Dataframe indexed by start_time_pt and formatted from the events in Event_log.csv. Returns an empty dataframe with the expected columns if the file cannot be read.
- ecopipeline.transform.get_cfm_values(df, site_cfm, site_info, site)¶
- ecopipeline.transform.get_cop_values(df: DataFrame, site_info: DataFrame)¶
- ecopipeline.transform.get_energy_by_min(df: DataFrame) DataFrame¶
Energy is recorded cumulatively. Function takes the lagged differences in order to get a per-minute value for each of the energy variables.
- Parameters:
- df : pd.DataFrame
Pandas dataframe.
- Returns:
- pd.DataFrame:
Pandas dataframe
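Turning a cumulative energy register into per-minute values is a one-line lagged difference in pandas; the column name and readings below are hypothetical.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="min")
# Cumulative energy register readings (kWh)
df = pd.DataFrame({"PowerIn_Energy": [100.0, 100.5, 101.5, 101.5]}, index=idx)

# The lagged difference converts the cumulative register into per-minute
# consumption; the first row has no predecessor and becomes NaN
df["PowerIn_Energy"] = df["PowerIn_Energy"].diff()
```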
- ecopipeline.transform.get_hvac_state(df: DataFrame, site_info: Series) DataFrame¶
- ecopipeline.transform.get_refrig_charge(df: DataFrame, site: str, config: ConfigManager) DataFrame¶
Function takes in a site dataframe, its site name as a string, the path to site_info.csv, the path to superheat.csv, and the path to 410a_pt.csv, and calculates the refrigerant charge per minute.
- Parameters:
- df : pd.DataFrame
Pandas Dataframe.
- site : str
Site name as a string.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- pd.DataFrame:
modified Pandas Dataframe
- ecopipeline.transform.get_site_cfm_info(site: str, config: ConfigManager) DataFrame¶
Returns a dataframe of the site cfm information for the given site. NOTE: The parsing is necessary because the first row of data contains comments that need to be dropped.
- Parameters:
- site : str
The site name.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- df : pd.DataFrame
The DataFrame of the site cfm information
- ecopipeline.transform.get_site_info(site: str, config: ConfigManager) Series¶
Returns a Series of the site information for the given site.
- Parameters:
- site : str
The site name.
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- Returns:
- df : pd.Series
The Series of the site information
- ecopipeline.transform.get_storage_gals120(df: DataFrame, location: Series, gals: int, total: int, zones: Series) DataFrame¶
Function that creates and appends the Gals120 data onto the Dataframe
- Parameters:
- df : pd.DataFrame
A Pandas Dataframe.
- location : pd.Series
- gals : int
- total : int
- zones : pd.Series
- Returns:
- pd.DataFrame:
a Pandas Dataframe
- ecopipeline.transform.get_temp_zones120(df: DataFrame) DataFrame¶
Function that keeps track of the average temperature of each zone. For this function to work, naming conventions for each parallel tank must include 'Temp1' as the temperature at the top of the tank, 'Temp5' as that at the bottom of the tank, and 'Temp2'-'Temp4' as the temperatures in between.
- Parameters:
- df : pd.DataFrame
A Pandas Dataframe.
- Returns:
- pd.DataFrame:
a Pandas Dataframe
- ecopipeline.transform.heat_output_calc(df: DataFrame, flow_var: str, hot_temp: str, cold_temp: str, heat_out_col_name: str, return_as_kw: bool = True) DataFrame¶
Calculate heat output from flow rate and supply/return temperatures.
Uses the formula Heat (BTU/hr) = 500 * flow (gal/min) * delta_T (°F) and clips negative values to zero. Optionally converts the result to kW.
- Parameters:
- df : pd.DataFrame
Pandas dataframe with minute-to-minute data.
- flow_var : str
Column name of the flow variable. Units must be gal/min.
- hot_temp : str
Column name of the hot (supply) temperature variable. Units must be °F.
- cold_temp : str
Column name of the cold (return) temperature variable. Units must be °F.
- heat_out_col_name : str
Name for the new heat output column added to the dataframe.
- return_as_kw : bool, optional
If True, the new column will be in kW. If False, it will be in BTU/hr. Defaults to True.
- Returns:
- pd.DataFrame
Dataframe with the new heat output column of the specified name.
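The heat-output formula can be sketched as below. Column names and readings are invented, and the BTU/hr-to-kW constant (approximately 3412.14 BTU/hr per kW) is an assumption; the library may use a slightly different factor.

```python
import pandas as pd

df = pd.DataFrame({
    "Flow_Primary": [5.0],      # gal/min
    "Temp_Supply": [140.0],     # °F
    "Temp_Return": [120.0],     # °F
})

# Heat (BTU/hr) = 500 * flow (gal/min) * delta_T (°F), clipped at zero
heat_btu_hr = (500.0 * df["Flow_Primary"]
               * (df["Temp_Supply"] - df["Temp_Return"])).clip(lower=0)

# Optional conversion to kW (1 kW ≈ 3412.14 BTU/hr)
df["HeatOut_Primary"] = heat_btu_hr / 3412.14
```

A 5 gal/min flow with a 20 °F delta-T gives 50,000 BTU/hr, roughly 14.65 kW.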
- ecopipeline.transform.join_to_daily(daily_data: DataFrame, cop_data: DataFrame) DataFrame¶
Left-join COP data onto the daily dataframe.
- Parameters:
- daily_data : pd.DataFrame
Daily sensor dataframe.
- cop_data : pd.DataFrame
COP values dataframe to join.
- Returns:
- pd.DataFrame
Daily dataframe left-joined with the COP dataframe.
- ecopipeline.transform.join_to_hourly(hourly_data: DataFrame, noaa_data: DataFrame, oat_column_name: str = 'OAT_NOAA') DataFrame¶
Left-join weather data onto the hourly dataframe.
- Parameters:
- hourly_data : pd.DataFrame
Hourly sensor dataframe.
- noaa_data : pd.DataFrame
Weather (e.g. NOAA) dataframe to join.
- oat_column_name : str, optional
Name of the outdoor air temperature column in noaa_data. Defaults to 'OAT_NOAA'.
- Returns:
- pd.DataFrame
Hourly dataframe left-joined with the weather dataframe. Returns hourly_data unchanged if the OAT column in noaa_data contains no non-null values.
- ecopipeline.transform.lbnl_pressure_conversions(df: DataFrame) DataFrame¶
- ecopipeline.transform.lbnl_sat_calculations(df: DataFrame) DataFrame¶
- ecopipeline.transform.lbnl_temperature_conversions(df: DataFrame) DataFrame¶
- ecopipeline.transform.merge_indexlike_rows(df: DataFrame) DataFrame¶
Merges index-like rows together ensuring that all relevant information for a certain timestamp is stored in one row - not in multiple rows. It also rounds the timestamps to the nearest minute.
- Parameters:
- df : pd.DataFrame
The dataframe whose index-like rows should be merged.
- Returns:
- dfpd.DataFrame
The DataFrame with all index-like rows merged.
- ecopipeline.transform.nclarity_csv_to_df(csv_filenames: List[str]) DataFrame¶
Function takes a list of csv filenames containing nclarity data and reads all files into a single dataframe.
- Parameters:
- csv_filenames : List[str]
List of filenames.
- Returns:
- pd.DataFrame:
Pandas Dataframe containing data from all files
- ecopipeline.transform.nclarity_filter_new(date: str, filenames: List[str]) List[str]¶
Function filters the filenames list to only those from the given date or later.
- Parameters:
- date : str
Target date.
- filenames : List[str]
List of filenames to be filtered.
- Returns:
- List[str]:
Filtered list of filenames
- ecopipeline.transform.nullify_erroneous(original_df: DataFrame, config: ConfigManager) DataFrame¶
Replace known error-sentinel values in a dataframe with NaN.
- Parameters:
- original_df : pd.DataFrame
Pandas dataframe that needs to be filtered for error values.
- config : ConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline's input folder. The CSV must have at least two columns:
variable_name: names of columns that may contain error values.
error_value: the sentinel error value for each variable, or null if no error value applies.
- Returns:
- pd.DataFrame
Dataframe with known error-sentinel values replaced by NaN.
- ecopipeline.transform.process_ls_signal(df: DataFrame, hourly_df: DataFrame, daily_df: DataFrame, load_dict: dict = {1: 'normal', 2: 'loadUp', 3: 'shed'}, ls_column: str = 'ls', drop_ls_from_df: bool = False)¶
Add load-shift signals to hourly and daily aggregated dataframes.
- Parameters:
- df : pd.DataFrame
Timestamp-indexed Pandas dataframe of minute-by-minute values.
- hourly_df : pd.DataFrame
Timestamp-indexed Pandas dataframe of hourly average values.
- daily_df : pd.DataFrame
Timestamp-indexed Pandas dataframe of daily average values.
- load_dict : dict, optional
Mapping from integer load-shift signal values to descriptive string labels. Defaults to {1: "normal", 2: "loadUp", 3: "shed"}.
- ls_column : str, optional
Name of the load-shift column in df. Defaults to 'ls'.
- drop_ls_from_df : bool, optional
If True, drops ls_column from df after processing. Defaults to False.
- Returns:
- df : pd.DataFrame
Minute-by-minute dataframe with ls_column removed if drop_ls_from_df is True.
- hourly_df : pd.DataFrame
Hourly dataframe with an added 'system_state' column containing the load-shift command label from load_dict for each hour. Values are mapped from the rounded mean of ls_column within each hour; hours whose rounded mean is not a key in load_dict will be null.
- daily_df : pd.DataFrame
Daily dataframe with an added boolean 'load_shift_day' column that is True on days containing at least one non-normal load-shift command in hourly_df.
- ecopipeline.transform.remove_outliers(original_df: DataFrame, config: ConfigManager, site: str = '') DataFrame¶
Remove outliers from a dataframe by replacing out-of-bounds values with NaN.
Reads bound information from Variable_Names.csv via config and sets any values outside the defined lower_bound/upper_bound range to NaN.
- Parameters:
- original_df : pd.DataFrame
Pandas dataframe for which outliers need to be removed.
- config : ConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline's input folder. The CSV must have at least three columns: variable_name, lower_bound, and upper_bound.
- site : str, optional
Site name to filter bounds data by. Leave as an empty string if not applicable.
- Returns:
- pd.DataFrame
Dataframe with outliers replaced by NaN.
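The bounds-to-NaN behaviour can be sketched with a boolean mask; the bounds dict below stands in for a hypothetical Variable_Names.csv row, and all names and values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Temp_OAT": [55.0, -999.0, 250.0, 60.0]})

# Hypothetical bounds, as they might appear in Variable_Names.csv
bounds = {"Temp_OAT": {"lower_bound": -40.0, "upper_bound": 130.0}}

# Replace any value outside [lower_bound, upper_bound] with NaN
for col, b in bounds.items():
    out_of_range = (df[col] < b["lower_bound"]) | (df[col] > b["upper_bound"])
    df.loc[out_of_range, col] = np.nan
```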
- ecopipeline.transform.remove_partial_days(df, hourly_df, daily_df, complete_hour_threshold: float = 0.8, complete_day_threshold: float = 1.0, partial_day_removal_exclusion: list = [])¶
Remove hourly and daily rows that are derived from insufficient minute-level data.
- Parameters:
- dfpd.DataFrame
Pandas dataframe of minute-by-minute sensor data.
- hourly_dfpd.DataFrame
Aggregated hourly dataframe.
- daily_dfpd.DataFrame
Aggregated daily dataframe.
- complete_hour_thresholdfloat, optional
Fraction of minutes in an hour required to count as a complete hour, expressed as a float (e.g. 80% = 0.8). Defaults to 0.8.
- complete_day_thresholdfloat, optional
Fraction of hours in a day required to count as a complete day, expressed as a float (e.g. 80% = 0.8). Defaults to 1.0.
- partial_day_removal_exclusionlist, optional
Column names to skip when evaluating completeness. Defaults to [].
- Returns:
- hourly_dfpd.DataFrame
Hourly dataframe with incomplete hours removed and sparse columns nullified.
- daily_dfpd.DataFrame
Daily dataframe with incomplete days removed and sparse columns nullified.
- Raises:
- Exception
If complete_hour_threshold or complete_day_threshold is not between 0 and 1.
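The hour-level completeness check can be sketched with plain pandas. This is an illustrative sketch under the default complete_hour_threshold of 0.8, not a call into ecopipeline; the column name powerW is hypothetical.

```python
import pandas as pd

# A complete hour (60 minutes) followed by a sparse hour (10 minutes).
full_hour = pd.date_range("2024-01-01 00:00", periods=60, freq="min")
sparse_hour = pd.date_range("2024-01-01 01:00", periods=10, freq="min")
df = pd.DataFrame({"powerW": 1.0}, index=full_hour.append(sparse_hour))

hourly_df = df.resample("h").mean()

# The complete_hour_threshold check: keep hours with >= 80% of 60 minutes.
counts = df["powerW"].resample("h").count()
hourly_df = hourly_df[counts >= 0.8 * 60]
```

The sparse 01:00 hour (10 of 60 minutes) is dropped; the day-level check works the same way over hours with complete_day_threshold.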
- ecopipeline.transform.rename_sensors(original_df: DataFrame, config: ConfigManager, site: str = '', system: str = '')¶
Rename sensor columns from their raw aliases to their true names.
Reads the Variable_Names.csv file via config, renames columns from variable_alias to variable_name, drops columns with no matching true name, and optionally filters by site and/or system.
- Parameters:
- original_dfpd.DataFrame
A dataframe containing data labeled by raw variable names to be renamed.
- configConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to a file called Variable_Names.csv in the pipeline’s input folder. The CSV must have at least two columns: variable_alias (the raw name to change from) and variable_name (the name to change to). Columns without a corresponding variable_name are dropped.
- sitestr, optional
Site name to filter by. If provided, only rows whose site column matches this value are retained. Leave as an empty string if not applicable.
- systemstr, optional
System name to filter by. If provided, only rows whose system column contains this string are retained. Leave as an empty string if not applicable.
- Returns:
- pd.DataFrame
Dataframe filtered by site and system (if applicable) with column names matching those specified in Variable_Names.csv.
- Raises:
- Exception
If the Variable_Names.csv file is not found at the path provided by config.
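The rename-and-drop behavior of rename_sensors can be sketched with plain pandas. This is a minimal sketch of the equivalent logic, assuming hypothetical Variable_Names.csv content (aliases T1, T2, X9 and the renamed columns are illustrative).

```python
import pandas as pd

# Hypothetical Variable_Names.csv content: one alias has no true name.
names = pd.DataFrame({
    "variable_alias": ["T1", "T2", "X9"],
    "variable_name": ["airTemp_F", "dewPoint_F", None],
})

df = pd.DataFrame({"T1": [70.0], "T2": [50.0], "X9": [0.0]})

# Rename aliases to true names; drop columns with no variable_name.
mapping = names.dropna(subset=["variable_name"]).set_index("variable_alias")["variable_name"]
df = df.rename(columns=mapping)[mapping.tolist()]
```

X9 has no true name, so it is dropped; the real function additionally filters the CSV rows by site and system before building the mapping.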
- ecopipeline.transform.replace_humidity(df: DataFrame, od_conditions: DataFrame, date_forward: datetime, site_name: str) DataFrame¶
Replace all humidity readings for a given site after a given datetime.
- Parameters:
- dfpd.DataFrame
Dataframe containing the raw sensor data.
- od_conditionspd.DataFrame
DataFrame containing outdoor conditions measured by field sensors.
- date_forwarddt.datetime
Datetime containing the time after which all humidity readings should be replaced.
- site_namestr
String containing the name of the site for which humidity values are to be replaced.
- Returns:
- pd.DataFrame
Modified DataFrame where the Humidity_ODRH column contains the field readings after the given datetime.
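The cutoff behavior can be sketched with plain pandas. This is an illustrative sketch only: it assumes both frames share a timestamp index and skips the site_name lookup the real function performs against od_conditions.

```python
import pandas as pd

idx = pd.date_range("2024-01-01 00:00", periods=4, freq="h")
df = pd.DataFrame({"Humidity_ODRH": [40.0, 41.0, 42.0, 43.0]}, index=idx)
field = pd.DataFrame({"Humidity_ODRH": [60.0, 61.0, 62.0, 63.0]}, index=idx)

# Readings at or after the cutoff come from the field sensor data.
cutoff = pd.Timestamp("2024-01-01 02:00")
mask = df.index >= cutoff
df.loc[mask, "Humidity_ODRH"] = field.loc[mask, "Humidity_ODRH"]
```

Readings before the cutoff are left untouched.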
- ecopipeline.transform.round_time(df: DataFrame)¶
Round a dataframe’s DatetimeIndex down to the nearest minute, in place.
- Parameters:
- dfpd.DataFrame
A dataframe indexed by datetimes. All timestamps will be floored to the nearest minute.
- Returns:
- bool
True if the index has been rounded down, False if the operation failed (e.g. if df was empty).
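The flooring that round_time performs on the index amounts to the following pandas operation (a sketch of the equivalent logic, not a call into ecopipeline):

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01 00:00:37", "2024-01-01 00:01:59"])
df = pd.DataFrame({"v": [1, 2]}, index=idx)

# Floor every timestamp in the index to the start of its minute.
df.index = df.index.floor("min")
```

Seconds (and any finer resolution) are discarded, so 00:01:59 becomes 00:01:00 rather than rounding up to 00:02:00.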
- ecopipeline.transform.sensor_adjustment(df: DataFrame, config: ConfigManager) DataFrame¶
Apply sensor adjustments from adjustments.csv to the dataframe.
Deprecated: this function is scheduled for removal. Use a more explicit adjustment approach instead.
- Parameters:
- dfpd.DataFrame
Dataframe to be adjusted.
- configConfigManager
The ConfigManager object that holds configuration data for the pipeline. Points to a file called adjustments.csv in the pipeline’s input folder (e.g. "full/path/to/pipeline/input/adjustments.csv").
- Returns:
- pd.DataFrame
Adjusted dataframe.
- ecopipeline.transform.shift_accumulative_columns(df: DataFrame, column_names: list = [])¶
Convert accumulative columns to period-difference (non-cumulative) values.
- Parameters:
- dfpd.DataFrame
Pandas dataframe of sensor data.
- column_nameslist, optional
Names of columns to convert from cumulative-sum data to non-cumulative difference data. If an empty list is provided, all columns are converted. Defaults to [].
- Returns:
- pd.DataFrame
Dataframe with the specified columns (or all columns) converted from cumulative to period-difference values.
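The cumulative-to-difference conversion can be sketched with plain pandas. This is a minimal sketch of the equivalent logic; the column name energyWh is hypothetical.

```python
import pandas as pd

# A cumulative meter reading.
df = pd.DataFrame({"energyWh": [0.0, 5.0, 12.0, 20.0]})

# Cumulative readings become per-period deltas; the first row has no
# previous reading to difference against, so it becomes NaN.
df["energyWh"] = df["energyWh"].diff()
```

The resulting column holds per-period consumption (5.0, 7.0, 8.0) instead of running totals.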
- ecopipeline.transform.site_specific(df: DataFrame, site: str) DataFrame¶
Perform site-specific calculations for LBNL sites. The site name is matched using a regular expression.
- Parameters:
- dfpd.DataFrame
Dataframe of sensor data.
- sitestr
Site name.
- Returns:
- pd.DataFrame
Modified dataframe.
- ecopipeline.transform.verify_power_energy(df: DataFrame, config: ConfigManager)¶
Verifies that, for each timestamp, corresponding power and energy variables are consistent with one another (power ≈ energy × 60 for minute-level data; margin of error TBD). Any rows with conflicting power and energy variables are written to a CSV file.
- Prereq:
Input dataframe MUST have had get_energy_by_min() called on it previously
- Parameters:
- dfpd.DataFrame
Pandas dataframe
- configecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline
- Returns:
- None
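The power-vs-energy consistency rule can be sketched with plain pandas. This is an illustrative sketch only: the column names PowerIn_W and EnergyIn_Wh and the 10% margin are hypothetical (the real margin is stated above as TBD).

```python
import pandas as pd

df = pd.DataFrame({
    "PowerIn_W": [1200.0, 1200.0, 500.0],
    "EnergyIn_Wh": [20.0, 20.0, 20.0],
})

# The consistency rule for minute data: power ≈ energy * 60.
margin = 0.10  # hypothetical margin of error; the real margin is TBD
implied = df["EnergyIn_Wh"] * 60
conflicting = df[(df["PowerIn_W"] - implied).abs() > margin * implied]
```

Here 20 Wh per minute implies 1200 W, so the 500 W row is flagged as conflicting; the real function writes such rows to a CSV file.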