Database Configuration
PyBA can log all actions to a database, enabling code generation, auditing, and replay capabilities.
Overview
The database stores:
Session information: Unique IDs for each automation run
Actions: Every action taken during automation
Page URLs: Where each action was performed
Status: Success/failure of each action
Extraction results: Data extracted during automation
Supported Databases
PyBA supports three database engines:
| Engine | Type | Best For |
|---|---|---|
| SQLite | File-based | Local development |
| PostgreSQL | Server | Production, teams |
| MySQL | Server | Production, teams |
SQLite Setup
The simplest option—no server required:
from pyba import Engine, Database
database = Database(
engine="sqlite",
name="/tmp/pyba.db" # Path to the database file
)
engine = Engine(
openai_api_key="sk-...",
database=database
)
The database file is created automatically if it doesn’t exist.
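The automatic file creation is standard SQLite behaviour, and it can be observed with Python's built-in `sqlite3` module alone. The following is a minimal sketch that does not use PyBA; the `demo` table and the temporary path are placeholders for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = Path(tmp_dir) / "pyba.db"
    existed_before = db_path.exists()

    # Connecting to a path that does not exist yet creates the file.
    connection = sqlite3.connect(db_path)
    connection.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY)")
    connection.commit()
    connection.close()

    existed_after = db_path.exists()

print(existed_before, existed_after)
```

SQLite creates the file but not missing parent directories, so the directory in `name` has to exist beforehand.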
PostgreSQL Setup
For production environments:
from pyba import Engine, Database
database = Database(
engine="postgres",
name="pyba_db", # Database name
host="localhost", # Or your server IP
port=5432, # Default PostgreSQL port
username="your_user",
password="your_password",
ssl_mode="disable" # Or "require" for encrypted connections
)
engine = Engine(
openai_api_key="sk-...",
database=database
)
SSL Mode options:
"disable" - No encryption (local development)
"require" - Encrypted connection (production)
MySQL Setup
from pyba import Engine, Database
database = Database(
engine="mysql",
name="pyba_db", # Database name
host="localhost", # Or your server IP
port=3306, # Default MySQL port
username="your_user",
password="your_password"
)
engine = Engine(
openai_api_key="sk-...",
database=database
)
Database Schema
PyBA creates an EpisodicMemory table to store automation logs:
CREATE TABLE EpisodicMemory (
id INTEGER PRIMARY KEY,
session_id VARCHAR(64),
actions TEXT, -- JSON array of actions
urls TEXT, -- JSON array of URLs
action_status BOOLEAN, -- Success/failure
fail_reason TEXT -- Error message if failed
);
For BFS mode, there’s an additional BFSEpisodicMemory table with a context_id field to track parallel browser sessions.
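To make the column layout concrete, here is a minimal sketch that builds the documented EpisodicMemory table in an in-memory SQLite database and round-trips one row. It uses only the standard library; the session ID and the contents of the `actions` and `urls` arrays are invented for illustration and may not match what PyBA actually writes.

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """
    CREATE TABLE EpisodicMemory (
        id INTEGER PRIMARY KEY,
        session_id VARCHAR(64),
        actions TEXT,
        urls TEXT,
        action_status BOOLEAN,
        fail_reason TEXT
    )
    """
)

# Hypothetical row: the action strings below are made up for this example.
connection.execute(
    "INSERT INTO EpisodicMemory "
    "(session_id, actions, urls, action_status, fail_reason) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "demo-session",
        json.dumps(["open page", "click submit"]),
        json.dumps(["https://example.com", "https://example.com/contact"]),
        True,
        None,
    ),
)

row = connection.execute(
    "SELECT actions, urls, action_status FROM EpisodicMemory WHERE session_id = ?",
    ("demo-session",),
).fetchone()

actions = json.loads(row[0])   # TEXT column holding a JSON array
urls = json.loads(row[1])      # TEXT column holding a JSON array
succeeded = bool(row[2])       # SQLite stores booleans as 0/1 integers
connection.close()
```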
Querying the Database
SQLite Example
sqlite3 /tmp/pyba.db
# List tables
.tables
# View all sessions
SELECT DISTINCT session_id FROM EpisodicMemory;
# View actions for a session
SELECT * FROM EpisodicMemory WHERE session_id = 'your-session-id';
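The same two queries can be run from Python with the standard `sqlite3` module. The sketch below uses an in-memory database seeded with made-up rows so that it runs on its own; for a real log, connect to the database file instead. It assumes a session can span several rows, which the documentation does not state explicitly.

```python
import sqlite3

# Use "/tmp/pyba.db" here to read a real log; ":memory:" keeps the demo isolated.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE EpisodicMemory ("
    "id INTEGER PRIMARY KEY, session_id VARCHAR(64), actions TEXT, "
    "urls TEXT, action_status BOOLEAN, fail_reason TEXT)"
)

# Sample data, invented for illustration only.
sample_rows = [
    ("session-a", "[]", "[]", 1, None),
    ("session-a", "[]", "[]", 0, "timeout"),
    ("session-b", "[]", "[]", 1, None),
]
connection.executemany(
    "INSERT INTO EpisodicMemory "
    "(session_id, actions, urls, action_status, fail_reason) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)

# Equivalent of: SELECT DISTINCT session_id FROM EpisodicMemory;
session_ids = [
    row[0]
    for row in connection.execute(
        "SELECT DISTINCT session_id FROM EpisodicMemory ORDER BY session_id"
    )
]

# Per-session query with the ID bound as a parameter rather than
# interpolated into the SQL string.
session_rows = connection.execute(
    "SELECT id, action_status, fail_reason FROM EpisodicMemory "
    "WHERE session_id = ? ORDER BY id",
    ("session-a",),
).fetchall()
connection.close()
```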
Python Example
from pyba.database import DatabaseFunctions
# After running automation with a database;
# `database` is the Database instance that was passed to Engine
db_funcs = DatabaseFunctions(database)
# Get logs for a session
logs = db_funcs.get_episodic_memory_by_session_id(session_id="your-session-id")
print(logs.actions) # JSON string of actions
print(logs.urls) # JSON string of URLs
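Because `actions` and `urls` are JSON strings, they need decoding before they can be inspected. The sketch below uses a `SimpleNamespace` as a stand-in for the returned log object so that it runs without PyBA; the action strings are invented, and pairing the two arrays by position is an assumption rather than documented behaviour.

```python
import json
from types import SimpleNamespace

# Stand-in for the object returned by get_episodic_memory_by_session_id().
logs = SimpleNamespace(
    actions=json.dumps(["open page", "fill form"]),
    urls=json.dumps(["https://example.com", "https://example.com/contact"]),
)

# Decode both arrays and pair each action with the URL at the same index
# (assumes the arrays are parallel).
steps = list(zip(json.loads(logs.actions), json.loads(logs.urls)))
for action, url in steps:
    print(f"{url}: {action}")
```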
Code Generation
The main reason to use a database: generating reusable scripts.
from pyba import Engine, Database
db = Database(engine="sqlite", name="/tmp/pyba.db")
engine = Engine(openai_api_key="sk-...", database=db)
# Run the automation
engine.sync_run(prompt="Go to example.com and fill the contact form")
# Generate a standalone script
engine.generate_code(output_path="/tmp/contact_form.py")
The generated script:
Uses playwright.sync_api directly
Replays the exact actions from the database
Has no AI dependencies, so it runs without further API costs
See Code Generation for more details.
Configuration via YAML
You can also configure database defaults in pyba/config.yaml:
database:
engine: "sqlite"
name: "/tmp/pyba.db"
host: "localhost"
port: 5432
username: ""
password: ""
ssl_mode: "disable"
These serve as fallbacks when parameters aren’t provided to the Database constructor.
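The fallback rule can be pictured as a dictionary overlay: start from the configured defaults and replace only the values that were passed explicitly. This is a sketch of the idea, not PyBA's actual loading code; `CONFIG_DEFAULTS` and `resolve_database_settings` are hypothetical names, and treating `None` as "not provided" is an assumption.

```python
from typing import Any, Optional

# Defaults mirroring the YAML shown above.
CONFIG_DEFAULTS: dict[str, Any] = {
    "engine": "sqlite",
    "name": "/tmp/pyba.db",
    "host": "localhost",
    "port": 5432,
    "username": "",
    "password": "",
    "ssl_mode": "disable",
}


def resolve_database_settings(**overrides: Optional[Any]) -> dict[str, Any]:
    """Return the defaults overlaid with every override that is not None."""
    explicit = {key: value for key, value in overrides.items() if value is not None}
    unknown = set(explicit) - set(CONFIG_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown database settings: {sorted(unknown)}")
    return {**CONFIG_DEFAULTS, **explicit}


# engine and name are overridden; host falls back to the configured default.
settings = resolve_database_settings(engine="postgres", name="pyba_db", host=None)
print(settings["engine"], settings["host"], settings["port"])
```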
Best Practices
Security
Never put passwords directly in code:
import os
from pyba import Database
database = Database(
engine="postgres",
name="pyba_db",
host=os.environ["DB_HOST"],
username=os.environ["DB_USER"],
password=os.environ["DB_PASSWORD"],
)
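Indexing `os.environ` directly raises a bare `KeyError` on the first missing variable. One possible refinement is a small helper that checks all required variables up front and reports every missing one at once. `load_database_credentials` is a hypothetical helper, not part of PyBA; the variable names match the example above.

```python
import os
from typing import Mapping

REQUIRED_VARIABLES = ("DB_HOST", "DB_USER", "DB_PASSWORD")


def load_database_credentials(environ: Mapping[str, str] = os.environ) -> dict:
    """Read connection credentials from the environment, failing loudly if any are unset."""
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "host": environ["DB_HOST"],
        "username": environ["DB_USER"],
        "password": environ["DB_PASSWORD"],
    }
```

The returned dictionary uses the same keyword names as the constructor call above, so it could be unpacked as `Database(engine="postgres", name="pyba_db", **load_database_credentials())`.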
File Paths
For SQLite, use absolute paths to avoid confusion:
# Good
database = Database(engine="sqlite", name="/tmp/pyba/runs.db")
# Risky - depends on working directory
database = Database(engine="sqlite", name="runs.db")
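If a path comes from user input or configuration, it can be normalised before it reaches `Database`. The helper below is a sketch using `pathlib`; `absolute_db_path` is a hypothetical name, and anchoring relative paths to an explicit base directory is one possible choice among several.

```python
import tempfile
from pathlib import Path


def absolute_db_path(name: str, base_dir: Path) -> Path:
    """Resolve name to an absolute path and make sure its parent directory exists."""
    path = Path(name).expanduser()
    if not path.is_absolute():
        # Anchor relative paths to a known directory instead of the
        # current working directory.
        path = base_dir / path
    path = path.resolve()
    # SQLite creates the database file, but not missing parent directories.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = absolute_db_path("logs/runs.db", Path(tmp_dir))
    is_absolute = db_path.is_absolute()
    parent_exists = db_path.parent.is_dir()
```

The result can then be passed as `name=str(db_path)`.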
Session Management
Each Engine run gets a unique session_id. You can access it:
engine.sync_run(prompt="Your task")
print(f"Session ID: {engine.session_id}")
Use this ID to query logs or generate code for specific runs.
Cleanup
Database files grow over time. Periodically clean up old sessions:
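-- SQLite: the schema has no timestamp column, so age is approximated by row id.
-- Delete every session whose newest row is more than 1000 rows behind the latest row.
DELETE FROM EpisodicMemory
WHERE session_id IN (
SELECT session_id
FROM EpisodicMemory
GROUP BY session_id
HAVING MAX(id) < (SELECT MAX(id) - 1000 FROM EpisodicMemory)
);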
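The cleanup statement can also be run from Python, for example from a scheduled job. The sketch below wraps it in a hypothetical `prune_old_sessions` function with the row threshold as a parameter, then exercises it on a small in-memory table with made-up rows (only the two columns the query needs).

```python
import sqlite3


def prune_old_sessions(connection: sqlite3.Connection, keep_rows: int = 1000) -> int:
    """Delete sessions whose newest row is more than keep_rows behind the latest id.

    Returns the number of rows deleted.
    """
    cursor = connection.execute(
        """
        DELETE FROM EpisodicMemory
        WHERE session_id IN (
            SELECT session_id
            FROM EpisodicMemory
            GROUP BY session_id
            HAVING MAX(id) < (SELECT MAX(id) - ? FROM EpisodicMemory)
        )
        """,
        (keep_rows,),
    )
    connection.commit()
    return cursor.rowcount


# Demo on an in-memory table; the rows are invented for illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE EpisodicMemory (id INTEGER PRIMARY KEY, session_id VARCHAR(64))"
)
connection.executemany(
    "INSERT INTO EpisodicMemory (id, session_id) VALUES (?, ?)",
    [(1, "old"), (2, "old"), (3, "new"), (4, "new"), (5, "new")],
)

# Latest id is 5, so the cutoff is 5 - 2 = 3; only "old" (newest id 2) falls below it.
deleted = prune_old_sessions(connection, keep_rows=2)
remaining = [
    row[0]
    for row in connection.execute("SELECT DISTINCT session_id FROM EpisodicMemory")
]
connection.close()
```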