Function Reference¶

sqlite_manager.create_sqlite_db(df: DataFrame, schema_file: str | PathLike, db_file: str | PathLike, table_name: str, log_dir: str | PathLike = None) → None¶

Create or update an SQLite database using a schema file and load data into a specified table from a pandas DataFrame.

Parameters:

df (pd.DataFrame) – The data to be loaded into the database.
schema_file (Union[str, os.PathLike]) – Path to the SQL file containing the schema definition.
db_file (Union[str, os.PathLike]) – Path to the SQLite database file to be created or updated.
table_name (str) – Name of the table to insert the DataFrame into.
log_dir (Union[str, os.PathLike], optional) – Directory where the log file is written. Default is the current working directory.

Raises:

FileNotFoundError – If the schema file does not exist.
ValueError – If the specified table name is not found in the schema or the schema does not match the DataFrame structure.

Examples

import pandas as pd
from sqlite_manager import create_sqlite_db

# Define the DataFrame to insert
data = {
    "InvoiceNo": ["A001", "A002", "A003"],
    "StockCode": ["P001", "P002", "P003"],
    "Description": ["Product 1", "Product 2", "Product 3"],
    "Quantity": [10, 5, 20],
    "InvoiceDate": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "UnitPrice": [12.5, 8.0, 15.0],
    "CustomerID": ["C001", "C002", "C003"],
    "Country": ["USA", "UK", "Germany"]
}
df = pd.DataFrame(data)

# Schema file (SQL file defining the database schema)
schema_file = "schema.sql"
# Contents of schema.sql:
# CREATE TABLE IF NOT EXISTS OnlineRetail (
#     InvoiceNo TEXT NOT NULL,
#     StockCode TEXT NOT NULL,
#     Description TEXT,
#     Quantity INTEGER NOT NULL,
#     InvoiceDate TEXT NOT NULL,
#     UnitPrice REAL NOT NULL,
#     CustomerID TEXT,
#     Country TEXT
# );

# SQLite database file to create or update
db_file = "data/online_retail.db"

# Create or update the database and insert data into the table
create_sqlite_db(
    df=df,
    schema_file=schema_file,
    db_file=db_file,
    table_name="OnlineRetail",
    log_dir="."  # Optional
)

sqlite_manager.run_sql_queries(db_file: str | PathLike, query_dir: str | PathLike = None, output_dir: str | PathLike = None, rerun_all: bool = False, rerun_queries: List[str] = None, log_dir: str | PathLike = None) → None¶

Execute all SQL queries in a directory (including subdirectories) on a SQLite database. Output results as CSV files mirroring the SQL directory structure.

Parameters:

db_file (str or Path) – The path to the SQLite database file.
query_dir (str or Path, optional) – Path to the directory containing SQL query files. Default is os.path.abspath(“./sql_queries”)
output_dir (str or Path, optional) – Path to the output directory for CSV files. Default is os.path.abspath(“./query_results”)
rerun_all (bool, optional) – Whether to rerun queries whose output already exists (default is False).
rerun_queries (list of str, optional) – List of specific query filenames to rerun, regardless of existing output.
log_dir (Union[str, os.PathLike], optional) – Directory where the log file is written. Default is the current working directory.

Return type:

None

Examples

from sqlite_manager import run_sql_queries

query_dir = "sql_queries"          # Directory containing SQL files
db_file = "data/online_retail.db"  # SQLite database file
output_dir = "output"              # Directory to store query results

# Run all queries, skipping those with existing outputs
run_sql_queries (
  query_dir,
  db_file,
  output_dir,
  log_dir="."
)

# Rerun all queries regardless of existing outputs
run_sql_queries (
  query_dir,
  db_file,
  output_dir,
  rerun_all=True
)

# Rerun specific queries
run_sql_queries (
  query_dir,
  db_file,
  output_dir,
  rerun_queries=["query1.sql", "query2.sql"]
)

# # Input directory
# sql_queries/
# ├── task_1/
# │   ├── query1.sql
# │   ├── query2.sql
# ├── task_2/
# │   ├── query3.sql
# │   └── query4.sql
#
# # Output Directory (Query Results):
# output/
# ├── task_1/
# │   ├── query1.csv
# │   ├── query2.csv
# ├── task_2/
# │   ├── query3.csv
# │   └── query4.csv

Run plotting functions based on query results and instructions.

Parameters:

query_results_dir (str or Path) – The path to the directory containing .csv results from SQL queries.
plot_functions_dir (str or Path) – The path to the directory containing Python files with plotting functions.
output_dir (str or Path) – The path to the directory where the plots should be saved.
log_dir (str or Path) – The directory where the log file should be saved. A timestamped log file will be created automatically. Default is the current working dir.
rerun_all (bool, optional) – Force rerun of all plotting functions, by default False.
rerun_functions (list of str, optional) – List of specific plotting functions to rerun, by default None.

Examples

Basic usage:

Assuming you have a directory structure like this:

project/
├── query_results/
│   └── data.csv
├── plot_functions/
│   └── plot_example.py
└── output/
└── log.txt

You can call the function as follows:

from pathlib import Path
from sqlite_manager.run_plot_functions import run_plot_functions

run_plot_functions(
    query_results_dir=Path("project/query_results"),
    plot_functions_dir=Path("project/plot_functions"),
    output_dir=Path("project/output"),
    log_file=Path("project/log.txt"),
    rerun_all=True
)

This will: - Load all .csv files from the query_results/ directory. - Execute all plotting functions in plot_functions/. - Save plots to the output/ directory. - Log messages to log.txt.