Load Documentation
- ecopipeline.load.check_table_exists(cursor: MySQLCursor, table_name: str, dbname: str) → int
Check if the given table name already exists in the database.
- Parameters:
- cursor : mysql.connector.cursor.MySQLCursor
Database cursor object.
- table_name : str
Name of the table.
- dbname : str
Name of the database.
- Returns:
- int:
The number of tables in the database with the given table name. This can be used directly as a boolean.
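The count-as-boolean return can be illustrated with a minimal sketch. `FakeCursor` below is a stand-in for a `mysql.connector` cursor, and the `information_schema` query is an assumption about how the lookup is plausibly implemented, not the package's actual source.

```python
# Sketch of a table-existence check, assuming an information_schema lookup.
class FakeCursor:
    """Stands in for mysql.connector.cursor.MySQLCursor in this sketch."""
    def __init__(self, tables):
        self.tables = tables  # table names present in the fake database
        self._rows = []

    def execute(self, query, params):
        # Emulate: SELECT COUNT(*) FROM information_schema.tables
        #          WHERE table_schema = %s AND table_name = %s
        dbname, table_name = params
        self._rows = [(self.tables.count(table_name),)]

    def fetchall(self):
        return self._rows

def check_table_exists(cursor, table_name, dbname):
    cursor.execute(
        "SELECT COUNT(*) FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name = %s",
        (dbname, table_name),
    )
    return cursor.fetchall()[0][0]

cursor = FakeCursor(["minute_data", "daily_data"])
if check_table_exists(cursor, "daily_data", "mydb"):
    print("table exists")  # the int return is truthy when the table is found
```

The table names and database name here are hypothetical; the point is that the integer count works unchanged in an `if` test.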
- ecopipeline.load.create_new_table(cursor: MySQLCursor, table_name: str, table_column_names: list, table_column_types: list, primary_key: str = 'time_pt', has_primary_key: bool = True) → bool
Creates a new table in the MySQL database.
- Parameters:
- cursor : mysql.connector.cursor.MySQLCursor
A database cursor object.
- table_name : str
Name of the table.
- table_column_names : list
List of column names for the table.
- table_column_types : list
List of column types corresponding to table_column_names.
- primary_key : str
The name of the primary key of the table. Should be a datetime. If has_primary_key is set to False, this will be created as a regular column rather than a key.
- has_primary_key : bool
Set to False if the table should not establish a primary key. Defaults to True.
- Returns:
- bool:
A boolean value indicating whether the table was successfully created.
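The DDL such a call issues can be sketched as follows. `build_create_table_sql` is a hypothetical helper showing one plausible shape of the generated statement (column names like `OAT_F` are made up for illustration), not the package's actual implementation.

```python
def build_create_table_sql(table_name, column_names, column_types,
                           primary_key="time_pt", has_primary_key=True):
    """Sketch the CREATE TABLE statement create_new_table plausibly issues."""
    # The primary key column is assumed to be a datetime, per the docs.
    columns = [f"{primary_key} datetime"]
    columns += [f"{name} {ctype}"
                for name, ctype in zip(column_names, column_types)]
    if has_primary_key:
        columns.append(f"PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table_name} ({', '.join(columns)});"

sql = build_create_table_sql("daily_data",
                             ["OAT_F", "PowerIn_kW"],
                             ["float", "float"])
print(sql)
```

With `has_primary_key=False`, `time_pt` would still appear as an ordinary datetime column, just without the `PRIMARY KEY` clause.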
- ecopipeline.load.load_data_statistics(config: ConfigManager, daily_stats_df: DataFrame, config_daily_indicator: str = 'day', custom_table_name: str = None)
Logs data statistics for the site in a table named “{daily table name}_stats”.
- Parameters:
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- daily_stats_df : pd.DataFrame
DataFrame created by the create_data_statistics_df() function in ecopipeline.transform.
- config_daily_indicator : str
The key in the config.ini file of the data pipeline that points to the daily table name.
- custom_table_name : str
Custom table name for data statistics. Overrides the default “{daily table name}_stats” name; if set, config_daily_indicator is ignored.
- Returns:
- bool:
A boolean value indicating if the data was successfully written to the database.
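The table-naming rule described above can be captured in a small helper. `resolve_stats_table_name` is hypothetical, written only to make the precedence explicit.

```python
def resolve_stats_table_name(daily_table_name, custom_table_name=None):
    """Return "{daily table name}_stats" unless a custom name overrides it."""
    if custom_table_name is not None:
        return custom_table_name  # config_daily_indicator is ignored here
    return f"{daily_table_name}_stats"

print(resolve_stats_table_name("daily_data"))              # daily_data_stats
print(resolve_stats_table_name("daily_data", "my_stats"))  # my_stats
```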
- ecopipeline.load.load_event_table(config: ConfigManager, event_df: DataFrame, site_name: str = None)
Loads the given pandas DataFrame into a MySQL table, overwriting any conflicting data. Uses an UPSERT strategy to ensure any gaps in the data are filled. Note: existing values are not overwritten with NULL; a new non-NULL value is required to replace an existing value in the database.
- Parameters:
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- event_df : pd.DataFrame
The pandas DataFrame to be written to the MySQL server. Must have columns event_type and event_detail.
- site_name : str
The name of the site to associate the events with. Defaults to the minute table name if left blank.
- Returns:
- bool:
A boolean value indicating if the data was successfully written to the database.
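Because event_df must carry event_type and event_detail columns, a caller may want to validate its input before loading. The helper below is hypothetical, and rows are represented as plain dicts (rather than a pandas DataFrame) to keep the sketch dependency-free.

```python
# Columns load_event_table is documented to require.
REQUIRED_EVENT_COLUMNS = {"event_type", "event_detail"}

def validate_event_rows(rows):
    """Raise ValueError if any row lacks a required event column."""
    for i, row in enumerate(rows):
        missing = REQUIRED_EVENT_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row {i} missing columns: {sorted(missing)}")
    return True

events = [{"event_type": "DATA_LOSS_COP",
           "event_detail": "gap detected in minute data"}]
print(validate_event_rows(events))  # True
```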
- ecopipeline.load.load_overwrite_database(config: ConfigManager, dataframe: DataFrame, config_info: dict, data_type: str, primary_key: str = 'time_pt', table_name: str = None, auto_log_data_loss: bool = False, config_key: str = 'minute')
Loads the given pandas DataFrame into a MySQL table, overwriting any conflicting data. Uses an UPSERT strategy to ensure any gaps in the data are filled. Note: existing values are not overwritten with NULL; a new non-NULL value is required to replace an existing value in the database.
- Parameters:
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- dataframe : pd.DataFrame
The pandas DataFrame to be written to the MySQL server.
- config_info : dict
The dictionary containing the configuration information for the data upload. This can be acquired through the get_login_info() function in this package.
- data_type : str
The header name corresponding to the table you wish to write data to.
- primary_key : str
The name of the primary key in the database table to upload to. Defaults to 'time_pt'.
- table_name : str
Overrides the table name from config_info if needed.
- auto_log_data_loss : bool
If set to True, a data loss event will be reported if the dataframe contains no data for the last two days before the current date OR if an error occurs.
- config_key : str
The key in the config.ini file that points to the minute table data for the site. The name of this table is also the site name.
- Returns:
- bool:
A boolean value indicating if the data was successfully written to the database.
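MySQL expresses this UPSERT-without-NULL-overwrite semantics with INSERT ... ON DUPLICATE KEY UPDATE combined with COALESCE. The builder below sketches the kind of statement the function plausibly generates (the table and column names are illustrative); it is not taken from the package's source.

```python
def build_upsert_sql(table_name, columns, primary_key="time_pt"):
    """Sketch an UPSERT that never replaces an existing value with NULL."""
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # COALESCE(VALUES(col), col) keeps the stored value whenever the
    # incoming one is NULL, matching the documented behaviour. The
    # primary key is excluded from the UPDATE list.
    updates = ", ".join(
        f"{col} = COALESCE(VALUES({col}), {col})"
        for col in columns if col != primary_key
    )
    return (
        f"INSERT INTO {table_name} ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates};"
    )

sql = build_upsert_sql("minute_data", ["time_pt", "OAT_F", "PowerIn_kW"])
print(sql)
```

Note that `VALUES()` inside the UPDATE clause is deprecated in MySQL 8.0.20+ in favor of row aliases, but it illustrates the gap-filling, NULL-preserving behavior the docs describe.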
- ecopipeline.load.report_data_loss(config: ConfigManager, site_name: str = None)
Logs a data loss event in the event database (assumes one exists) as a DATA_LOSS_COP event to note that COP calculations have been affected.
- Parameters:
- config : ecopipeline.ConfigManager
The ConfigManager object that holds configuration data for the pipeline.
- site_name : str
The name of the site to associate the events with. Defaults to the minute table name if left blank.
- Returns:
- bool:
A boolean value indicating if the data was successfully written to the database.
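The two-day freshness condition that auto_log_data_loss checks before triggering a report like this one can be sketched with a hypothetical helper:

```python
from datetime import datetime, timedelta

def has_recent_data(last_timestamp, now=None, window_days=2):
    """Return True if the newest data point falls within the last window_days."""
    now = now if now is not None else datetime.now()
    return last_timestamp >= now - timedelta(days=window_days)

now = datetime(2024, 6, 10, 12, 0)
print(has_recent_data(datetime(2024, 6, 9), now=now))  # True: data is fresh
print(has_recent_data(datetime(2024, 6, 1), now=now))  # False: stale, report data loss
```

A pipeline would call report_data_loss only when this check fails (or when an error occurs during the upload).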