Architecture & Code Walkthrough
This guide explains how PyBA works internally. Understanding the architecture will help you contribute, debug issues, or extend the framework.
High-Level Overview
PyBA follows a layered architecture:
┌─────────────────────────────────────────────────────────────┐
│ User Code / CLI │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Entry Points: Engine, Step, DFS, BFS │
│ (pyba/core/main.py) │
│ (pyba/core/lib/mode/*.py) │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────────┐
│ Provider │ │ Agents │ │ BaseEngine │
│ (LLM sel.) │ │ (LLM calls) │ │ (shared logic) │
└─────────────┘ └─────────────┘ └─────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Action Performer │
│ (pyba/core/lib/action.py) │
│ Executes Playwright commands on browser │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Playwright │
│ (Browser control) │
└─────────────────────────────────────────────────────────────┘
Directory Structure
pyba/
├── __init__.py # Public exports: Engine, Database, Step, DFS, BFS
├── config.yaml # Default configuration
├── logger.py # Logging setup
├── version.py # Version string
│
├── core/ # Core automation logic
│ ├── __init__.py # Exports: Engine, DependencyManager
│ ├── main.py # Engine class (main entry point)
│ ├── provider.py # LLM provider selection
│ ├── tracing.py # Playwright trace handling
│ │
│ ├── agent/ # LLM agents
│ │ ├── base_agent.py # Base class with retry logic
│ │ ├── llm_factory.py # Creates LLM clients
│ │ ├── playwright_agent.py # Action decision agent
│ │ ├── planner_agent.py # Plan generation (DFS/BFS)
│ │ └── extraction_agent.py # Data extraction agent
│ │
│ ├── lib/ # Core libraries
│ │ ├── action.py # PlaywrightActionPerformer
│ │ ├── code_generation.py # Script export
│ │ ├── handle_dependencies.py # Playwright setup
│ │ └── mode/ # Exploration modes
│ │ ├── base.py # BaseEngine (shared logic)
│ │ ├── step.py # Step-by-step interactive mode
│ │ ├── DFS.py # Depth-first search
│ │ └── BFS.py # Breadth-first search
│ │
│ ├── helpers/ # Utility helpers
│ │ ├── jitters.py # Random mouse/scroll movements
│ │ └── mem_dsl.py # Action-to-natural-language DSL and rolling history
│ │
│ └── scripts/ # Pre-built scripts
│ ├── js/ # Browser-side JavaScript
│ │ ├── extractions.js # Site-specific link extraction
│ │ └── input_fields.js # Batch input field discovery
│ ├── login/ # Auto-login handlers
│ │ ├── base.py # BaseLogin class
│ │ ├── instagram.py
│ │ ├── facebook.py
│ │ └── gmail.py
│ └── extractions/ # DOM extraction
│ ├── general.py # Generic extraction
│ └── youtube_.py # YouTube-specific
│
├── database/ # Database layer
│ ├── database.py # Database class
│ ├── db_funcs.py # DatabaseFunctions helper
│ ├── models.py # SQLAlchemy models
│ ├── sqlite.py # SQLite handler
│ ├── postgres.py # PostgreSQL handler
│ └── mysql.py # MySQL handler
│
├── utils/ # Utilities
│ ├── common.py # Helper functions
│ ├── exceptions.py # Custom exceptions
│ ├── structure.py # Pydantic models (DSL)
│ ├── load_yaml.py # Config loader
│ └── prompts/ # LLM prompts
│ ├── system_prompt.py
│ ├── general_prompt.py
│ └── planner_agent_prompt.py
│
└── cli/ # Command-line interface
├── cli_entry.py # Entry point
└── cli_core/
├── arg_parser.py # Argument parsing
└── cli_main.py # CLI logic
Entry Points
pyba/__init__.py
The public API. Users import from here:
from pyba import Engine, Database, Step, DFS, BFS
This file re-exports:
Enginefrompyba.core.mainDatabasefrompyba.databaseStep,DFS,BFSfrompyba.core.lib
The Engine Class
Location: pyba/core/main.py
The Engine class is the main entry point for normal mode automation.
Inheritance:
BaseEngine (pyba/core/lib/mode/base.py)
│
├── Engine (pyba/core/main.py)
├── Step (pyba/core/lib/mode/step.py)
├── DFS (pyba/core/lib/mode/DFS.py)
└── BFS (pyba/core/lib/mode/BFS.py)
Key attributes:
session_id: Unique identifier for this runplaywright_agent: The agent that decides actionsmem: MemDSL instance that accumulates a rolling natural language action historymode: “Normal”, “STEP”, “DFS”, or “BFS”max_depth: Maximum actions to take
The run() method flow:
1. Launch browser with Stealth
2. Create browser context (with tracing if enabled)
3. Open new page
4. Extract initial DOM
5. LOOP (up to max_depth times):
a. Check for automated login
b. Pass full action history (from MemDSL) to PlaywrightAgent
c. If action is None → automation complete → get output
d. Execute action via PlaywrightActionPerformer
e. Record outcome in MemDSL (success/failure with reason)
f. If action failed → retry with updated DOM and full history
g. Log to database (for code generation)
h. Extract new DOM
6. Save trace and close browser
BaseEngine
Location: pyba/core/lib/mode/base.py
The BaseEngine contains shared logic used by all modes:
Initialization:
Sets up the
Provider(LLM selection)Creates the
PlaywrightAgentCreates a
MemDSLinstance for rolling action historyInitializes database functions if provided
Handles dependency installation
Key methods:
extract_dom()
Extracts structured data from the current page:
async def extract_dom(self, page=None):
# Wait for page to load
await self.wait_till_loaded(page_obj)
# Get page content
page_html = await page_obj.content()
body_text = await page_obj.inner_text("body")
# Run extraction engine
extraction_engine = ExtractionEngines(
html=page_html,
body_text=body_text,
base_url=base_url,
page=page_obj,
)
cleaned_dom = await extraction_engine.extract_all()
return cleaned_dom
Returns a CleanedDOM dataclass with:
hyperlinks: List of links on pageinput_fields: Fillable form fieldsclickable_fields: Buttons, clickable elementsactual_text: Visible text contentcurrent_url: Current page URL
fetch_action()
Gets the next action from the PlaywrightAgent:
def fetch_action(self, cleaned_dom, user_prompt, action_history, ...):
action = self.playwright_agent.process_action(
cleaned_dom=cleaned_dom,
user_prompt=user_prompt,
action_history=action_history,
extraction_format=extraction_format,
fail_reason=fail_reason,
action_status=action_status,
)
return action
generate_output()
Called when automation is complete (action is None):
async def generate_output(self, action, cleaned_dom, prompt):
if action is None or all(value is None for value in vars(action).values()):
output = self.playwright_agent.get_output(
cleaned_dom=cleaned_dom.to_dict(),
user_prompt=prompt
)
return output
return None
The Agent System
All agents inherit from BaseAgent which provides:
Exponential backoff retry logic
LLM provider handling (OpenAI, VertexAI, Gemini)
Shared utilities
BaseAgent
Location: pyba/core/agent/base_agent.py
Key features:
calculate_next_time(): Exponential backoff calculationhandle_openai_execution(): OpenAI API calls with retrieshandle_vertexai_execution(): VertexAI API calls with retrieshandle_gemini_execution(): Gemini API calls with retries
Retry logic:
while True:
try:
response = agent.send_message(prompt)
break # Success
except Exception:
wait_time = self.calculate_next_time(attempt_number)
time.sleep(wait_time)
attempt_number += 1
PlaywrightAgent
Location: pyba/core/agent/playwright_agent.py
The brain of the operation. Decides what action to take on each page.
Two main methods:
process_action(): Given DOM and prompt, returns the next actionget_output(): Summarizes results when automation completes
process_action() flow:
1. Format prompt with DOM, user instruction, and full action history
2. Call LLM with PlaywrightResponse schema
3. Parse response into action object
4. If extract_info flag is set → trigger ExtractionAgent
5. Return action
The prompt includes:
Current DOM (hyperlinks, inputs, clickables, text)
User’s original task
Full action history from MemDSL (every action taken, with success/failure status and failure reasons)
PlannerAgent
Location: pyba/core/agent/planner_agent.py
Used in DFS and BFS modes to generate high-level plans.
For DFS: Generates one detailed plan, then new plans based on progress.
For BFS: Generates multiple plans upfront for parallel execution.
def generate(self, task, old_plan=None):
prompt = self._initialise_prompt(task=task, old_plan=old_plan)
return self._call_model(agent=self.agent, prompt=prompt)
ExtractionAgent
Location: pyba/core/agent/extraction_agent.py
Extracts structured data from pages when requested.
Runs in a separate thread (non-blocking)
Uses user-provided Pydantic model or generic format
Stores results in database
The Action System
PlaywrightAction (DSL)
Location: pyba/utils/structure.py
Defines all possible browser actions as a Pydantic model:
class PlaywrightAction(BaseModel):
# Navigation
goto: Optional[str]
go_back: Optional[bool]
go_forward: Optional[bool]
reload: Optional[bool]
# Interactions
click: Optional[str]
fill_selector: Optional[str]
fill_value: Optional[str]
# ... many more fields
The LLM fills in the relevant fields based on what action to take.
PlaywrightActionPerformer
Location: pyba/core/lib/action.py
Executes actions on the browser. Maps action fields to Playwright commands.
The perform() dispatcher:
async def perform(self):
a = self.action
if a.goto:
return await self.handle_navigation()
if a.click:
return await self.handle_click()
if a.fill_selector and a.fill_value is not None:
return await self.handle_input()
# ... handles all action types
Special handling in handle_click():
Checks if click target is actually a hyperlink
Extracts href and navigates directly if so
Handles strict mode violations (multiple matches)
Scrolls element into view before clicking
The Provider System
Location: pyba/core/provider.py
Detects which LLM provider the user configured:
class Provider:
def handle_keys(self):
if self.openai_api_key:
self.provider = "openai"
self.model = "gpt-4o"
elif self.vertexai_project_id:
self.provider = "vertexai"
self.model = "gemini-2.0-flash"
elif self.gemini_api_key:
self.provider = "gemini"
self.model = "gemini-2.5-pro"
LLMFactory
Location: pyba/core/agent/llm_factory.py
Creates the actual LLM clients based on provider:
OpenAI: Uses
openai.OpenAI()clientVertexAI: Uses
vertexaiSDKGemini: Uses
google.genaiclient
Database Layer
Location: pyba/database/
Database class (database.py):
Creates SQLAlchemy engine
Initializes tables via handlers
Supports SQLite, PostgreSQL, MySQL
DatabaseFunctions (db_funcs.py):
push_to_episodic_memory(): Log an actionget_episodic_memory_by_session_id(): Retrieve session logs
Models (models.py):
class EpisodicMemory(Base):
__tablename__ = "EpisodicMemory"
id = Column(Integer, primary_key=True)
session_id = Column(String(64))
actions = Column(Text) # JSON array
urls = Column(Text) # JSON array
action_status = Column(Boolean)
fail_reason = Column(Text)
Login System
Location: pyba/core/scripts/login/
BaseLogin (base.py):
Abstract base class for login handlers:
class BaseLogin(ABC):
def __init__(self, page, engine_name):
self.config = load_config("general")["automated_login_configs"][engine_name]
self.username = os.getenv(f"{engine_name}_username")
self.password = os.getenv(f"{engine_name}_password")
@abstractmethod
async def _perform_login(self) -> bool:
raise NotImplementedError
async def run(self):
# Check if we're on a login page
if not verify_login_page(self.page.url, self.config["urls"]):
return None
# Perform the login
success = await self._perform_login()
# Handle 2FA if needed
if self.uses_2FA:
await self._handle_2fa()
return success
Site-specific implementations:
instagram.py: Instagram login flowfacebook.py: Facebook login flowgmail.py: Gmail login flow
Each uses hardcoded selectors from config.yaml for speed.
DOM Extraction
Location: pyba/core/scripts/extractions/
ExtractionEngines (general.py):
Extracts structured data from the current page. Hyperlinks, clickables, and text are extracted via BeautifulSoup on the HTML string. Input fields use a single browser-side JavaScript evaluation for performance.
Input field extraction (pyba/core/scripts/js/input_fields.js):
Instead of iterating elements from Python and fill-testing each one (multiple
Playwright round-trips per element), a single page.evaluate() call runs
JavaScript inside the browser that discovers all fillable fields at once.
The JS checks readOnly, disabled, and getBoundingClientRect() to
determine fillability — no writes to the DOM, no cleanup needed.
async def _extract_input_fields(self):
js_config = {
"valid_tags": [...], # from extraction config
"invalid_input_types": [...], # from extraction config
}
return await self.page.evaluate(self._input_fields_js, js_config)
This reduces input field extraction from ~150 Playwright round-trips on a typical page to a single call.
Code Generation
Location: pyba/core/lib/code_generation.py
CodeGeneration class:
Queries database for session actions
Parses each action string
Maps to Playwright code templates
Generates complete Python script
def generate_script(self):
actions_list = self._get_run_actions()
script_header = "from playwright.sync_api import sync_playwright..."
script_body = []
for action_str in actions_list:
code = self._parse_action_to_code(action_str)
script_body.append(code)
final_script = script_header + "\n".join(script_body) + script_footer
with open(self.output_path, "w") as f:
f.write(final_script)
Stealth & Jitters
Location: pyba/core/helpers/jitters.py
MouseMovements:
async def random_movement(self):
# Generate random bezier curve
# Move mouse along curve with realistic timing
ScrollMovements:
async def apply_scroll_jitters(self):
# Small random scrolls during waits
# Simulates human fidgeting
Used during:
Page load waits
Action execution
2FA waiting
Execution Flow Diagram
Normal Mode:
User calls engine.sync_run(prompt)
│
▼
Launch browser with Stealth
│
▼
Create context & page
│
▼
Extract initial DOM ◄────────────────┐
│ │
▼ │
PlaywrightAgent.process_action() │
(receives full action history) │
│ │
▼ │
Action is None? ───Yes──► get_output() ──► Return result
│
No
│
▼
PlaywrightActionPerformer.perform()
│
▼
Record action in MemDSL
(success or failure with reason)
│
▼
Action failed? ───Yes──► retry_perform_action()
│ │
No │
│ │
▼ │
Log to database │
Extract new DOM ─────────────────┘
Step Mode:
User calls step.start()
│
▼
Launch browser with Stealth
│
▼
Create context & page
│
▼
Extract initial DOM
│
▼
Return control to user
│
▼
User calls step.step(instruction)
│
▼
FOR up to max_actions_per_step: ◄──┐
│ │
▼ │
PlaywrightAgent.process_action() │
│ │
▼ │
Action is None? ───Yes──► Return output to user
│ │
No │
│ │
▼ │
PlaywrightActionPerformer │
│ │
▼ │
Action failed? ───Yes──► Retry │
│ │
No │
│ │
▼ │
Record action in MemDSL │
Extract new DOM ───────────────────┘
│
▼
Return control to user
│
▼
User calls step.stop()
│
▼
Save trace & close browser
DFS Mode:
User calls dfs.sync_run(prompt)
│
▼
Launch browser
│
▼
FOR each breadth iteration:
│
▼
PlannerAgent.generate(task, old_plan)
│
▼
FOR each depth step:
│
▼
[Same as Normal mode loop]
│
▼
old_plan = current_plan
BFS Mode:
User calls bfs.sync_run(prompt)
│
▼
PlannerAgent.generate(task) → List of plans
│
▼
FOR each plan IN PARALLEL:
│
▼
Launch separate browser
│
▼
[Same as Normal mode loop]
│
▼
Collect all results
Extending PyBA
Adding a New Login Handler
Create
pyba/core/scripts/login/newsite.pyInherit from
BaseLoginImplement
_perform_login()Add to
pyba/core/scripts/__init__.pyAdd selectors to
config.yaml
Adding a New Action
Add field to
PlaywrightActioninstructure.pyAdd handler method in
PlaywrightActionPerformerAdd to
perform()dispatcherAdd a natural language template in
MemDSL._resolve()Add to
CodeGeneration.action_mapfor script export
Adding a New LLM Provider
Update
Provider.handle_keys()Add client creation in
LLMFactoryAdd execution handler in
BaseAgentUpdate
config.yamlwith model settings