Load Documentation

ecopipeline.load.check_table_exists(cursor: MySQLCursor, table_name: str, dbname: str) → int

Check if the given table name already exists in the database.

Parameters:
cursor : mysql.connector.cursor.MySQLCursor

Database cursor object.

table_name : str

Name of the table

dbname : str

Name of the database

Returns:
int:

The number of tables in the database with the given table name. This can directly be used as a boolean!
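The count-as-boolean pattern can be sketched with stdlib sqlite3 standing in for mysql.connector, so the example is self-contained; the real function queries MySQL, but the idea is the same (here `sqlite_master` plays the role of MySQL's table catalog, and `count_matching_tables` is a hypothetical stand-in, not an ecopipeline API):

```python
import sqlite3

# Count tables with the given name; 0 is falsy, 1+ is truthy, so the
# return value can be used directly as a boolean, as the docs note.
def count_matching_tables(cursor, table_name: str) -> int:
    cursor.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    )
    return cursor.fetchone()[0]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE site_minute (time_pt TEXT PRIMARY KEY)")

exists = count_matching_tables(cur, "site_minute")      # 1 (truthy)
missing = count_matching_tables(cur, "no_such_table")   # 0 (falsy)
```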

ecopipeline.load.create_new_table(cursor: MySQLCursor, table_name: str, table_column_names: list, table_column_types: list, primary_key: str = 'time_pt', has_primary_key: bool = True) → bool

Creates a new table in the MySQL database.

Parameters:
cursor : mysql.connector.cursor.MySQLCursor

Database cursor object.

table_name : str

Name of the table

table_column_names : list

List of column names for the table.

table_column_types : list

List of MySQL data types corresponding to each column.

primary_key : str

The name of the primary index of the table. Should be a datetime. If has_primary_key is set to False, this will be an ordinary column rather than a key.

has_primary_key : bool

Set to False if the table should not establish a primary key. Defaults to True.

Returns:
bool:

A boolean value indicating whether the table was successfully created.
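A hypothetical sketch of the CREATE TABLE statement such a function might issue, showing how the column lists and the primary_key/has_primary_key parameters could interact (this is an illustration only, not ecopipeline's actual implementation):

```python
# Build a CREATE TABLE statement: the primary_key column always comes first
# as a datetime; it only gets the PRIMARY KEY clause when has_primary_key
# is True, otherwise it is an ordinary column, as described above.
def build_create_table_sql(table_name, column_names, column_types,
                           primary_key="time_pt", has_primary_key=True):
    key_clause = " PRIMARY KEY" if has_primary_key else ""
    columns = [f"{primary_key} datetime{key_clause}"]
    columns += [f"{name} {dtype}" for name, dtype in zip(column_names, column_types)]
    return f"CREATE TABLE {table_name} ({', '.join(columns)});"

sql = build_create_table_sql("site_minute", ["power_kw"], ["float"])
# CREATE TABLE site_minute (time_pt datetime PRIMARY KEY, power_kw float);
```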

ecopipeline.load.load_data_statistics(config: ConfigManager, daily_stats_df: DataFrame, config_daily_indicator: str = 'day', custom_table_name: str = None)

Logs data statistics for the site in a table with name “{daily table name}_stats”

Parameters:
config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

daily_stats_df : pd.DataFrame

DataFrame created by the create_data_statistics_df() function in ecopipeline.transform.

config_daily_indicator : str

The key in the pipeline's config.ini file that identifies the daily table name.

custom_table_name : str

Custom table name for the data statistics. Overrides the default “{daily table name}_stats” name; when set, config_daily_indicator is ignored.

Returns:
bool:

A boolean value indicating if the data was successfully written to the database.
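The naming rule above can be sketched as a small helper (resolve_stats_table_name is hypothetical, not an ecopipeline API): the statistics table is “{daily table name}_stats” unless custom_table_name overrides it.

```python
# Resolve the statistics table name: custom_table_name wins when provided;
# otherwise append "_stats" to the daily table name from the config.
def resolve_stats_table_name(daily_table_name: str, custom_table_name: str = None) -> str:
    if custom_table_name is not None:
        return custom_table_name
    return f"{daily_table_name}_stats"

default_name = resolve_stats_table_name("site_day")             # "site_day_stats"
custom_name = resolve_stats_table_name("site_day", "my_stats")  # "my_stats"
```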

ecopipeline.load.load_event_table(config: ConfigManager, event_df: DataFrame, site_name: str = None)

Loads a given pandas DataFrame into a MySQL table, overwriting any conflicting data. Uses an UPSERT strategy to ensure any gaps in the data are filled. Note that NULL values will not overwrite existing data; a new non-NULL value is required to replace an existing value in the database.

Parameters:
config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

event_df : pd.DataFrame

The pandas DataFrame to be written into the MySQL server. Must have columns event_type and event_detail.

site_name : str

The name of the site to associate the events with. If left blank, defaults to the minute table name.

Returns:
bool:

A boolean value indicating if the data was successfully written to the database.
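A minimal pre-flight check for the event_type/event_detail column requirement noted above; validate_event_columns is a hypothetical helper, not part of ecopipeline, and plain sequences stand in for a DataFrame's columns:

```python
# Columns load_event_table requires, per the documentation above.
REQUIRED_EVENT_COLUMNS = {"event_type", "event_detail"}

# Raise early with a clear message rather than failing mid-upload.
def validate_event_columns(columns) -> bool:
    missing = REQUIRED_EVENT_COLUMNS - set(columns)
    if missing:
        raise ValueError(f"event_df is missing required columns: {sorted(missing)}")
    return True

ok = validate_event_columns(["time_pt", "event_type", "event_detail"])  # True
```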

ecopipeline.load.load_overwrite_database(config: ConfigManager, dataframe: DataFrame, config_info: dict, data_type: str, primary_key: str = 'time_pt', table_name: str = None, auto_log_data_loss: bool = False, config_key: str = 'minute')

Loads a given pandas DataFrame into a MySQL table, overwriting any conflicting data. Uses an UPSERT strategy to ensure any gaps in the data are filled. Note that NULL values will not overwrite existing data; a new non-NULL value is required to replace an existing value in the database.

Parameters:
config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

dataframe : pd.DataFrame

The pandas DataFrame to be written into the mySQL server.

config_info : dict

Dictionary containing the configuration information for the data upload. This can be acquired through the get_login_info() function in this package.

data_type : str

The header name corresponding to the table you wish to write data to.

primary_key : str

The name of the primary key in the database table to upload to. Defaults to 'time_pt'.

table_name : str

Overrides the table name from config_info if needed.

auto_log_data_loss : bool

If set to True, a data loss event will be reported if no data exists in the dataframe for the last two days before the current date, or if an error occurs.

config_key : str

The key in the config.ini file that points to the minute table data for the site. The name of this table is also the site name.

Returns:
bool:

A boolean value indicating if the data was successfully written to the database.
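The UPSERT behavior described above can be illustrated with stdlib sqlite3 standing in for MySQL: conflicting rows are updated, but a NULL never overwrites an existing value. The COALESCE(excluded.col, col) expression mirrors the “must have a new value to overwrite” rule; this is a sketch of the strategy, not ecopipeline's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE minute (time_pt TEXT PRIMARY KEY, power_kw REAL)")
cur.execute("INSERT INTO minute VALUES ('2024-01-01 00:00:00', 4.2)")

# On a primary-key conflict, keep the existing value when the incoming
# value is NULL; otherwise take the new value.
upsert = (
    "INSERT INTO minute (time_pt, power_kw) VALUES (?, ?) "
    "ON CONFLICT(time_pt) DO UPDATE SET "
    "power_kw = COALESCE(excluded.power_kw, power_kw)"
)

cur.execute(upsert, ("2024-01-01 00:00:00", None))
cur.execute("SELECT power_kw FROM minute")
value_after_null = cur.fetchone()[0]      # 4.2 -- NULL did not overwrite

cur.execute(upsert, ("2024-01-01 00:00:00", 5.0))
cur.execute("SELECT power_kw FROM minute")
value_after_update = cur.fetchone()[0]    # 5.0 -- a real value did
```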

ecopipeline.load.report_data_loss(config: ConfigManager, site_name: str = None)

Logs a data loss event in the event database (assumes one exists) as a DATA_LOSS_COP event, noting that COP calculations have been affected.

Parameters:
config : ecopipeline.ConfigManager

The ConfigManager object that holds configuration data for the pipeline.

site_name : str

The name of the site to associate the events with. If left blank, defaults to the minute table name.

Returns:
bool:

A boolean value indicating if the data was successfully written to the database.