Architecture & Code Walkthrough
================================

This guide explains how PyBA works internally. Understanding the architecture will help you contribute, debug issues, or extend the framework.

.. contents::
   :local:
   :depth: 3

High-Level Overview
-------------------

PyBA follows a layered architecture:

.. code-block:: text

   ┌─────────────────────────────────────────────────────────────┐
   │                     User Code / CLI                         │
   └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
   ┌─────────────────────────────────────────────────────────────┐
   │           Entry Points: Engine, Step, DFS, BFS              │
   │                    (pyba/core/main.py)                      │
   │                 (pyba/core/lib/mode/*.py)                   │
   └─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐
   │   Provider  │    │   Agents    │    │   BaseEngine    │
   │  (LLM sel.) │    │ (LLM calls) │    │ (shared logic)  │
   └─────────────┘    └─────────────┘    └─────────────────┘
                              │
                              ▼
   ┌─────────────────────────────────────────────────────────────┐
   │                  Action Performer                           │
   │              (pyba/core/lib/action.py)                      │
   │         Executes Playwright commands on browser             │
   └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
   ┌─────────────────────────────────────────────────────────────┐
   │                     Playwright                              │
   │                  (Browser control)                          │
   └─────────────────────────────────────────────────────────────┘

Directory Structure
-------------------

.. code-block:: text

   pyba/
   ├── __init__.py              # Public exports: Engine, Database, Step, DFS, BFS
   ├── config.yaml              # Default configuration
   ├── logger.py                # Logging setup
   ├── version.py               # Version string
   │
   ├── core/                    # Core automation logic
   │   ├── __init__.py          # Exports: Engine, DependencyManager
   │   ├── main.py              # Engine class (main entry point)
   │   ├── provider.py          # LLM provider selection
   │   ├── tracing.py           # Playwright trace handling
   │   │
   │   ├── agent/               # LLM agents
   │   │   ├── base_agent.py    # Base class with retry logic
   │   │   ├── llm_factory.py   # Creates LLM clients
   │   │   ├── playwright_agent.py  # Action decision agent
   │   │   ├── planner_agent.py     # Plan generation (DFS/BFS)
   │   │   └── extraction_agent.py  # Data extraction agent
   │   │
   │   ├── lib/                 # Core libraries
   │   │   ├── action.py        # PlaywrightActionPerformer
   │   │   ├── code_generation.py   # Script export
   │   │   ├── handle_dependencies.py  # Playwright setup
   │   │   └── mode/            # Exploration modes
   │   │       ├── base.py      # BaseEngine (shared logic)
   │   │       ├── step.py      # Step-by-step interactive mode
   │   │       ├── DFS.py       # Depth-first search
   │   │       └── BFS.py       # Breadth-first search
   │   │
   │   ├── helpers/             # Utility helpers
   │   │   ├── jitters.py       # Random mouse/scroll movements
   │   │   └── mem_dsl.py       # Action-to-natural-language DSL and rolling history
   │   │
   │   └── scripts/             # Pre-built scripts
   │       ├── js/              # Browser-side JavaScript
   │       │   ├── extractions.js    # Site-specific link extraction
   │       │   └── input_fields.js   # Batch input field discovery
   │       ├── login/           # Auto-login handlers
   │       │   ├── base.py      # BaseLogin class
   │       │   ├── instagram.py
   │       │   ├── facebook.py
   │       │   └── gmail.py
   │       └── extractions/     # DOM extraction
   │           ├── general.py   # Generic extraction
   │           └── youtube_.py  # YouTube-specific
   │
   ├── database/                # Database layer
   │   ├── database.py          # Database class
   │   ├── db_funcs.py          # DatabaseFunctions helper
   │   ├── models.py            # SQLAlchemy models
   │   ├── sqlite.py            # SQLite handler
   │   ├── postgres.py          # PostgreSQL handler
   │   └── mysql.py             # MySQL handler
   │
   ├── utils/                   # Utilities
   │   ├── common.py            # Helper functions
   │   ├── exceptions.py        # Custom exceptions
   │   ├── structure.py         # Pydantic models (DSL)
   │   ├── load_yaml.py         # Config loader
   │   └── prompts/             # LLM prompts
   │       ├── system_prompt.py
   │       ├── general_prompt.py
   │       └── planner_agent_prompt.py
   │
   └── cli/                     # Command-line interface
       ├── cli_entry.py         # Entry point
       └── cli_core/
           ├── arg_parser.py    # Argument parsing
           └── cli_main.py      # CLI logic

Entry Points
------------

pyba/__init__.py
^^^^^^^^^^^^^^^^

The public API. Users import from here:

.. code-block:: python

   from pyba import Engine, Database, Step, DFS, BFS

This file re-exports:

- ``Engine`` from ``pyba.core.main``
- ``Database`` from ``pyba.database``
- ``Step``, ``DFS``, ``BFS`` from ``pyba.core.lib``

The Engine Class
----------------

Location: ``pyba/core/main.py``

The ``Engine`` class is the main entry point for normal mode automation.

**Inheritance:**

.. code-block:: text

   BaseEngine (pyba/core/lib/mode/base.py)
        │
        ├── Engine (pyba/core/main.py)
        ├── Step (pyba/core/lib/mode/step.py)
        ├── DFS (pyba/core/lib/mode/DFS.py)
        └── BFS (pyba/core/lib/mode/BFS.py)

**Key attributes:**

- ``session_id``: Unique identifier for this run
- ``playwright_agent``: The agent that decides actions
- ``mem``: MemDSL instance that accumulates a rolling natural language action history
- ``mode``: "Normal", "STEP", "DFS", or "BFS"
- ``max_depth``: Maximum actions to take

**The run() method flow:**

.. code-block:: text

   1. Launch browser with Stealth
   2. Create browser context (with tracing if enabled)
   3. Open new page
   4. Extract initial DOM
   5. LOOP (up to max_depth times):
      a. Check for automated login
      b. Pass full action history (from MemDSL) to PlaywrightAgent
      c. If action is None → automation complete → get output
      d. Execute action via PlaywrightActionPerformer
      e. Record outcome in MemDSL (success/failure with reason)
      f. If action failed → retry with updated DOM and full history
      g. Log to database (for code generation)
      h. Extract new DOM
   6. Save trace and close browser

BaseEngine
----------

Location: ``pyba/core/lib/mode/base.py``

The ``BaseEngine`` contains shared logic used by all modes:

**Initialization:**

- Sets up the ``Provider`` (LLM selection)
- Creates the ``PlaywrightAgent``
- Creates a ``MemDSL`` instance for rolling action history
- Initializes database functions if provided
- Handles dependency installation

**Key methods:**

``extract_dom()``
^^^^^^^^^^^^^^^^^

Extracts structured data from the current page:

.. code-block:: python

   async def extract_dom(self, page=None):
       # Wait for page to load
       await self.wait_till_loaded(page_obj)

       # Get page content
       page_html = await page_obj.content()
       body_text = await page_obj.inner_text("body")

       # Run extraction engine
       extraction_engine = ExtractionEngines(
           html=page_html,
           body_text=body_text,
           base_url=base_url,
           page=page_obj,
       )

       cleaned_dom = await extraction_engine.extract_all()
       return cleaned_dom

Returns a ``CleanedDOM`` dataclass with:

- ``hyperlinks``: List of links on page
- ``input_fields``: Fillable form fields
- ``clickable_fields``: Buttons, clickable elements
- ``actual_text``: Visible text content
- ``current_url``: Current page URL

``fetch_action()``
^^^^^^^^^^^^^^^^^^

Gets the next action from the PlaywrightAgent:

.. code-block:: python

   def fetch_action(self, cleaned_dom, user_prompt, action_history, ...):
       action = self.playwright_agent.process_action(
           cleaned_dom=cleaned_dom,
           user_prompt=user_prompt,
           action_history=action_history,
           extraction_format=extraction_format,
           fail_reason=fail_reason,
           action_status=action_status,
       )
       return action

``generate_output()``
^^^^^^^^^^^^^^^^^^^^^

Called when automation is complete (action is None):

.. code-block:: python

   async def generate_output(self, action, cleaned_dom, prompt):
       if action is None or all(value is None for value in vars(action).values()):
           output = self.playwright_agent.get_output(
               cleaned_dom=cleaned_dom.to_dict(),
               user_prompt=prompt
           )
           return output
       return None

The Agent System
----------------

All agents inherit from ``BaseAgent`` which provides:

- Exponential backoff retry logic
- LLM provider handling (OpenAI, VertexAI, Gemini)
- Shared utilities

BaseAgent
^^^^^^^^^

Location: ``pyba/core/agent/base_agent.py``

**Key features:**

- ``calculate_next_time()``: Exponential backoff calculation
- ``handle_openai_execution()``: OpenAI API calls with retries
- ``handle_vertexai_execution()``: VertexAI API calls with retries
- ``handle_gemini_execution()``: Gemini API calls with retries

**Retry logic:**

.. code-block:: python

   while True:
       try:
           response = agent.send_message(prompt)
           break  # Success
       except Exception:
           wait_time = self.calculate_next_time(attempt_number)
           time.sleep(wait_time)
           attempt_number += 1

PlaywrightAgent
^^^^^^^^^^^^^^^

Location: ``pyba/core/agent/playwright_agent.py``

The brain of the operation. Decides what action to take on each page.

**Two main methods:**

1. ``process_action()``: Given DOM and prompt, returns the next action
2. ``get_output()``: Summarizes results when automation completes

**process_action() flow:**

.. code-block:: text

   1. Format prompt with DOM, user instruction, and full action history
   2. Call LLM with PlaywrightResponse schema
   3. Parse response into action object
   4. If extract_info flag is set → trigger ExtractionAgent
   5. Return action

**The prompt includes:**

- Current DOM (hyperlinks, inputs, clickables, text)
- User's original task
- Full action history from MemDSL (every action taken, with success/failure status and failure reasons)

PlannerAgent
^^^^^^^^^^^^

Location: ``pyba/core/agent/planner_agent.py``

Used in DFS and BFS modes to generate high-level plans.

**For DFS**: Generates one detailed plan, then new plans based on progress.

**For BFS**: Generates multiple plans upfront for parallel execution.

.. code-block:: python

   def generate(self, task, old_plan=None):
       prompt = self._initialise_prompt(task=task, old_plan=old_plan)
       return self._call_model(agent=self.agent, prompt=prompt)

ExtractionAgent
^^^^^^^^^^^^^^^

Location: ``pyba/core/agent/extraction_agent.py``

Extracts structured data from pages when requested.

- Runs in a separate thread (non-blocking)
- Uses user-provided Pydantic model or generic format
- Stores results in database

The Action System
-----------------

PlaywrightAction (DSL)
^^^^^^^^^^^^^^^^^^^^^^

Location: ``pyba/utils/structure.py``

Defines all possible browser actions as a Pydantic model:

.. code-block:: python

   class PlaywrightAction(BaseModel):
       # Navigation
       goto: Optional[str]
       go_back: Optional[bool]
       go_forward: Optional[bool]
       reload: Optional[bool]

       # Interactions
       click: Optional[str]
       fill_selector: Optional[str]
       fill_value: Optional[str]
       # ... many more fields

The LLM fills in the relevant fields based on what action to take.

PlaywrightActionPerformer
^^^^^^^^^^^^^^^^^^^^^^^^^

Location: ``pyba/core/lib/action.py``

Executes actions on the browser. Maps action fields to Playwright commands.

**The perform() dispatcher:**

.. code-block:: python

   async def perform(self):
       a = self.action
       if a.goto:
           return await self.handle_navigation()
       if a.click:
           return await self.handle_click()
       if a.fill_selector and a.fill_value is not None:
           return await self.handle_input()
       # ... handles all action types

**Special handling in handle_click():**

- Checks if click target is actually a hyperlink
- Extracts href and navigates directly if so
- Handles strict mode violations (multiple matches)
- Scrolls element into view before clicking

The Provider System
-------------------

Location: ``pyba/core/provider.py``

Detects which LLM provider the user configured:

.. code-block:: python

   class Provider:
       def handle_keys(self):
           if self.openai_api_key:
               self.provider = "openai"
               self.model = "gpt-4o"
           elif self.vertexai_project_id:
               self.provider = "vertexai"
               self.model = "gemini-2.0-flash"
           elif self.gemini_api_key:
               self.provider = "gemini"
               self.model = "gemini-2.5-pro"

LLMFactory
^^^^^^^^^^

Location: ``pyba/core/agent/llm_factory.py``

Creates the actual LLM clients based on provider:

- OpenAI: Uses ``openai.OpenAI()`` client
- VertexAI: Uses ``vertexai`` SDK
- Gemini: Uses ``google.genai`` client

Database Layer
--------------

Location: ``pyba/database/``

**Database class** (``database.py``):

- Creates SQLAlchemy engine
- Initializes tables via handlers
- Supports SQLite, PostgreSQL, MySQL

**DatabaseFunctions** (``db_funcs.py``):

- ``push_to_episodic_memory()``: Log an action
- ``get_episodic_memory_by_session_id()``: Retrieve session logs

**Models** (``models.py``):

.. code-block:: python

   class EpisodicMemory(Base):
       __tablename__ = "EpisodicMemory"

       id = Column(Integer, primary_key=True)
       session_id = Column(String(64))
       actions = Column(Text)      # JSON array
       urls = Column(Text)         # JSON array
       action_status = Column(Boolean)
       fail_reason = Column(Text)

Login System
------------

Location: ``pyba/core/scripts/login/``

**BaseLogin** (``base.py``):

Abstract base class for login handlers:

.. code-block:: python

   class BaseLogin(ABC):
       def __init__(self, page, engine_name):
           self.config = load_config("general")["automated_login_configs"][engine_name]
           self.username = os.getenv(f"{engine_name}_username")
           self.password = os.getenv(f"{engine_name}_password")

       @abstractmethod
       async def _perform_login(self) -> bool:
           raise NotImplementedError

       async def run(self):
           # Check if we're on a login page
           if not verify_login_page(self.page.url, self.config["urls"]):
               return None

           # Perform the login
           success = await self._perform_login()

           # Handle 2FA if needed
           if self.uses_2FA:
               await self._handle_2fa()

           return success

**Site-specific implementations:**

- ``instagram.py``: Instagram login flow
- ``facebook.py``: Facebook login flow
- ``gmail.py``: Gmail login flow

Each uses hardcoded selectors from ``config.yaml`` for speed.

DOM Extraction
--------------

Location: ``pyba/core/scripts/extractions/``

**ExtractionEngines** (``general.py``):

Extracts structured data from the current page. Hyperlinks, clickables, and text
are extracted via BeautifulSoup on the HTML string. Input fields use a single
browser-side JavaScript evaluation for performance.

**Input field extraction** (``pyba/core/scripts/js/input_fields.js``):

Instead of iterating elements from Python and fill-testing each one (multiple
Playwright round-trips per element), a single ``page.evaluate()`` call runs
JavaScript inside the browser that discovers all fillable fields at once.
The JS checks ``readOnly``, ``disabled``, and ``getBoundingClientRect()`` to
determine fillability — no writes to the DOM, no cleanup needed.

.. code-block:: python

   async def _extract_input_fields(self):
       js_config = {
           "valid_tags": [...],          # from extraction config
           "invalid_input_types": [...], # from extraction config
       }
       return await self.page.evaluate(self._input_fields_js, js_config)

This reduces input field extraction from ~150 Playwright round-trips on a
typical page to a single call.

Code Generation
---------------

Location: ``pyba/core/lib/code_generation.py``

**CodeGeneration class:**

1. Queries database for session actions
2. Parses each action string
3. Maps to Playwright code templates
4. Generates complete Python script

.. code-block:: python

   def generate_script(self):
       actions_list = self._get_run_actions()

       script_header = "from playwright.sync_api import sync_playwright..."
       script_body = []

       for action_str in actions_list:
           code = self._parse_action_to_code(action_str)
           script_body.append(code)

       final_script = script_header + "\n".join(script_body) + script_footer

       with open(self.output_path, "w") as f:
           f.write(final_script)

Stealth & Jitters
-----------------

Location: ``pyba/core/helpers/jitters.py``

**MouseMovements:**

.. code-block:: python

   async def random_movement(self):
       # Generate random bezier curve
       # Move mouse along curve with realistic timing

**ScrollMovements:**

.. code-block:: python

   async def apply_scroll_jitters(self):
       # Small random scrolls during waits
       # Simulates human fidgeting

Used during:

- Page load waits
- Action execution
- 2FA waiting

Execution Flow Diagram
----------------------

**Normal Mode:**

.. code-block:: text

   User calls engine.sync_run(prompt)
         │
         ▼
   Launch browser with Stealth
         │
         ▼
   Create context & page
         │
         ▼
   Extract initial DOM ◄────────────────┐
         │                              │
         ▼                              │
   PlaywrightAgent.process_action()     │
   (receives full action history)       │
         │                              │
         ▼                              │
   Action is None? ───Yes──► get_output() ──► Return result
         │
         No
         │
         ▼
   PlaywrightActionPerformer.perform()
         │
         ▼
   Record action in MemDSL
   (success or failure with reason)
         │
         ▼
   Action failed? ───Yes──► retry_perform_action()
         │                          │
         No                         │
         │                          │
         ▼                          │
   Log to database                  │
   Extract new DOM ─────────────────┘

**Step Mode:**

.. code-block:: text

   User calls step.start()
         │
         ▼
   Launch browser with Stealth
         │
         ▼
   Create context & page
         │
         ▼
   Extract initial DOM
         │
         ▼
   Return control to user
         │
         ▼
   User calls step.step(instruction)
         │
         ▼
   FOR up to max_actions_per_step: ◄──┐
         │                            │
         ▼                            │
   PlaywrightAgent.process_action()   │
         │                            │
         ▼                            │
   Action is None? ───Yes──► Return output to user
         │                            │
         No                           │
         │                            │
         ▼                            │
   PlaywrightActionPerformer          │
         │                            │
         ▼                            │
   Action failed? ───Yes──► Retry     │
         │                            │
         No                           │
         │                            │
         ▼                            │
   Record action in MemDSL             │
   Extract new DOM ───────────────────┘
         │
         ▼
   Return control to user
         │
         ▼
   User calls step.stop()
         │
         ▼
   Save trace & close browser

**DFS Mode:**

.. code-block:: text

   User calls dfs.sync_run(prompt)
         │
         ▼
   Launch browser
         │
         ▼
   FOR each breadth iteration:
         │
         ▼
   PlannerAgent.generate(task, old_plan)
         │
         ▼
   FOR each depth step:
         │
         ▼
      [Same as Normal mode loop]
         │
         ▼
   old_plan = current_plan

**BFS Mode:**

.. code-block:: text

   User calls bfs.sync_run(prompt)
         │
         ▼
   PlannerAgent.generate(task) → List of plans
         │
         ▼
   FOR each plan IN PARALLEL:
         │
         ▼
      Launch separate browser
         │
         ▼
      [Same as Normal mode loop]
         │
         ▼
   Collect all results

Extending PyBA
--------------

Adding a New Login Handler
^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Create ``pyba/core/scripts/login/newsite.py``
2. Inherit from ``BaseLogin``
3. Implement ``_perform_login()``
4. Add to ``pyba/core/scripts/__init__.py``
5. Add selectors to ``config.yaml``

Adding a New Action
^^^^^^^^^^^^^^^^^^^

1. Add field to ``PlaywrightAction`` in ``structure.py``
2. Add handler method in ``PlaywrightActionPerformer``
3. Add to ``perform()`` dispatcher
4. Add a natural language template in ``MemDSL._resolve()``
5. Add to ``CodeGeneration.action_map`` for script export

Adding a New LLM Provider
^^^^^^^^^^^^^^^^^^^^^^^^^

1. Update ``Provider.handle_keys()``
2. Add client creation in ``LLMFactory``
3. Add execution handler in ``BaseAgent``
4. Update ``config.yaml`` with model settings