API Reference

This page documents the public API of PyBA. For internal architecture details, see Architecture & Code Walkthrough.

Entry Points 

Engine 

The main entry point for autonomous browser automation.

class pyba.core.main.Engine(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, enable_tracing: bool = True, trace_save_directory: str = None, max_depth: int = 100, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]

Bases: BaseEngine

The main entry point for browser automation. The engine exposes a single entry point, the run() method.

Parameters:

openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
max_depth – The maximum number of actions that you want the model to execute
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
low_memory – Optional, defaults to False. When True, disables some heavy dependencies and runs with additional flags.
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets

Find these default values at pyba/config.yaml.

The Engine inherits from BaseEngine, which handles the methods common to all modes (default, DFS and BFS). The main Engine decides whether execution needs to be passed to a different mode, depending on what the user has set.

async run(prompt: str = None, automated_login_sites: List[str] = None, extraction_format: BaseModel = None)[source]

The most basic implementation of the run function.

Parameters:

prompt – The user’s instructions. This is a well defined instruction.
automated_login_sites – A list of sites that you want the model to automatically login to using env credentials
extraction_format – A pydantic BaseModel which defines the extraction format for any data extraction

Note:

The extraction_format will be decided based on every action. For example:

```python3
from typing import Optional

from pydantic import BaseModel
from pyba import Engine

task = "Go to hackernews. For each post, extract the title, number of upvotes and comments, and the description too"


class Output(BaseModel):
    # Using Optional is a good idea in case the things you're looking for don't exist
    title: Optional[str]
    num_upvotes: Optional[int]
    num_comments: Optional[int]
    desc: Optional[str]


# kwargs stands for your own provider keys and options
engine = Engine(**kwargs)

# Call this from inside an async function, or use engine.sync_run() instead
await engine.run(task, extraction_format=Output)
```

This returns data during execution rather than once the run finishes. The data is also written to the database, and the engine decides on a per-action basis whether anything needs to be extracted.

Using this feature will NOT cost you any more tokens than usual.

sync_run(prompt: str = None, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → str | None[source]: Sync endpoint for running the above function

Step (Step-by-Step)

Entry point for interactive step-by-step mode. The user controls the browser one instruction at a time via start(), step(), and stop().

class pyba.core.lib.mode.step.Step(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, enable_tracing: bool = True, trace_save_directory: str = None, max_actions_per_step: int = 5, database: Database = None, get_output: bool = False, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]

Bases: BaseEngine

Step-by-step browser automation. The user controls the loop externally by calling start(), step(), and stop().

Parameters:

openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
use_random – Enables mouse and scroll randomisations to evade bot detection
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
get_output – When True, asks the model for a summarised output when a step completes. When False (default), step() silently returns None on completion
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets

cancel_current_step()[source]: This is the method to be called to cancel a task

get_step_screenshots() → List[bytes][source]: Returns the screenshots captured during the most recent step() call. Each entry is a PNG image in bytes.

async start(automated_login_sites: List[str] = None)[source]: Creates a persistent browser instance. This needs to be explicitly called by the user when using the Step mode. This handles the automated login for us as well.

async step(prompt_step: str, extraction_format: BaseModel = None) → str | None[source]

The step function is a replica of the Engine.run(). It passes the full action history into context and tries to figure out the best way to achieve the short term prompt given by the user.

Parameters:

prompt_step – A single stepwise prompt given by the user (this might require more than one action to complete)
extraction_format – The final extraction format IF NEEDED

For every step() call, we create a StepRunContext() with a unique ID. This ID can be used to cancel this particular step. For reference, please see structure.py.

async stop()[source]: Kills the persistent browser instance once called. For using the Step engine, this NEEDS to be called explicitly by the user in order to close the instance.

sync_start(automated_login_sites: List[str] = None)[source]

sync_step(prompt_step: str, extraction_format: BaseModel = None) → str | None[source]

sync_stop()[source]
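The start(), step(), stop() lifecycle described above can be sketched as follows. This is a minimal illustration that assumes the import path and constructor arguments match the class signature documented on this page; the prompts and the OPENAI_API_KEY environment variable name are hypothetical. The import is done lazily so the sketch loads even where PyBA is not installed.

```python
import asyncio
import os
from typing import List, Optional


async def run_steps(prompts: List[str]) -> List[Optional[str]]:
    """Drive a Step session through a list of prompts and collect the outputs."""
    # Imported lazily so the sketch can be loaded without PyBA installed.
    from pyba.core.lib.mode.step import Step

    stepper = Step(
        openai_api_key=os.environ.get("OPENAI_API_KEY"),  # assumed env variable name
        headless=True,
        get_output=True,  # ask for a summarised output after each step
    )
    outputs: List[Optional[str]] = []
    await stepper.start()  # creates the persistent browser instance
    try:
        for prompt in prompts:
            outputs.append(await stepper.step(prompt))
    finally:
        await stepper.stop()  # must be called explicitly to close the browser
    return outputs


def main() -> None:
    # Hypothetical prompts, for illustration only.
    print(asyncio.run(run_steps(["Open example.com", "Read out the page heading"])))
```

Wrapping the loop in try/finally means stop() still runs if a step raises, which matters because the browser instance is persistent and is not closed automatically.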

DFS (Depth-First Search)

Entry point for deep exploration mode.

class pyba.core.lib.mode.DFS.DFS(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, max_depth: int = 5, max_breadth: int = 5, enable_tracing: bool = True, trace_save_directory: str = None, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]

Bases: BaseEngine

Methods for handling DFS exploratory searches. The BaseEngine initialises the provider and with that the playwright action and output agents.

This is another entry point engine and can be directly imported by the user.

The following params are defined:

Parameters:

openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
max_depth – The maximum depth to go into for each plan, where each level of depth corresponds to an action
max_breadth – The number of plans to execute one by one in depth
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets

Find these default values at pyba/config.yaml.

async run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → str | None[source]

Run pyba in DFS mode.

Parameters:

prompt – The task assigned to DFS by the user
automated_login_sites – Login site name for pre-written scripts to run
extraction_format – A pydantic BaseModel which defines the extraction format for any data extraction

The task is fed into the planner to get a plan which is then passed to the action models to fetch an actionable element.

sync_run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → str | None[source]: Sync endpoint for running the above function

BFS (Breadth-First Search)

Entry point for wide exploration mode.

class pyba.core.lib.mode.BFS.BFS(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_logger: bool = False, max_depth: int = 5, max_breadth: int = 5, enable_tracing: bool = True, trace_save_directory: str = None, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]

Bases: BaseEngine

Methods for handling BFS exploratory searches. The BaseEngine initialises the provider and with that the playwright action and output agents.

This is another entry point engine and can be directly imported by the user.

The following params are defined:

Parameters:

openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
max_depth – The maximum depth to go into for each plan, where each level of depth corresponds to an action
max_breadth – The number of plans to execute one by one in depth
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets

Find these default values at pyba/config.yaml.

async run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → List[source]

The async run function

Parameters:

prompt – The prompt which needs to be converted to plans
automated_login_sites – List of names for which sites to login automatically
extraction_format – The extraction format for any extraction that needs to be done

Returns:

List

sync_run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None)[source]

Synchronous endpoint for running BFS mode.

Parameters:

prompt – The prompt which needs to be converted to plans
automated_login_sites – List of names for which sites to login automatically
extraction_format – The extraction format for any extraction that needs to be done

Database 

Database Configuration 

class pyba.database.database.Database(engine: Literal['sqlite', 'postgres', 'mysql'], name: str = None, host: str = None, port: int = None, username: str = None, password: str = None, ssl_mode: Literal['disable', 'require'] = None)[source]

Bases: object

Client-side database interface that minimizes config usage.

build_connection_string(engine_name: Literal['sqlite', 'postgres', 'mysql']) → str[source]

Builds connection URLs for different database engines for SQLAlchemy usage.

Parameters: engine_name – The database engine name for initialization.
Returns: Connection string for SQLAlchemy.
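As a rough illustration of what building a SQLAlchemy connection URL involves, here is a standalone sketch. It is not PyBA's implementation: the driver names (psycopg2, pymysql), the treatment of name as a file path for SQLite, and the handling of ssl_mode are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import quote_plus

# Assumed driver choices; the real method may use different ones.
DRIVERS = {"postgres": "postgresql+psycopg2", "mysql": "mysql+pymysql"}


def build_connection_string(
    engine_name: str,
    name: str,
    host: Optional[str] = None,
    port: Optional[int] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    ssl_mode: Optional[str] = None,
) -> str:
    """Return a SQLAlchemy-style URL for the given engine."""
    if engine_name == "sqlite":
        # SQLite needs only a file path (assumed to be carried in `name`).
        return f"sqlite:///{name}"
    if engine_name not in DRIVERS:
        raise ValueError(f"unsupported engine: {engine_name}")
    # Percent-encode credentials so characters like '@' do not break the URL.
    credentials = f"{quote_plus(username or '')}:{quote_plus(password or '')}"
    url = f"{DRIVERS[engine_name]}://{credentials}@{host}:{port}/{name}"
    if engine_name == "postgres" and ssl_mode:
        url += f"?sslmode={ssl_mode}"
    return url


sqlite_url = build_connection_string("sqlite", "pyba.db")
postgres_url = build_connection_string(
    "postgres", "pyba", host="localhost", port=5432,
    username="bot", password="p@ss", ssl_mode="require",
)
print(sqlite_url)
print(postgres_url)
```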

create_connection(engine_name: Literal['sqlite', 'postgres', 'mysql'])[source]

Creates a connection to the database.

Parameters: engine_name – The database engine name.
Returns: Database session if successful, otherwise False.

initialise_tables_and_database()[source]: Manages the creation of database and tables for SQLite, PostgreSQL, and MySQL.

Database Functions 

class pyba.database.db_funcs.DatabaseFunctions(database: Database)[source]

Bases: object

Composition class for database operations.

get_all_bfs_contexts_by_session(session_id: str) → List[BFSEpisodicMemory] | None[source]

Retrieves all BFS context records for a given session.

Parameters: session_id – The parent session ID to query for.
Returns: A list of BFSEpisodicMemory objects for all contexts in the session, or None if no records were found or an error occurred.

get_bfs_episodic_memory_by_context(session_id: str, context_id: str) → BFSEpisodicMemory | None[source]

Retrieves a specific BFS context’s episodic memory. Needs both the session_id and the context_id to retrieve the correct record.

Parameters:

session_id – The parent session ID.
context_id – The specific context ID to retrieve.

Returns:

A BFSEpisodicMemory object if found, else None

get_episodic_memory_by_session_id(session_id: str) → EpisodicMemory | None[source]

Retrieves an episodic memory record by its session_id.

Parameters: session_id – The unique session ID to query for.
Returns: An EpisodicMemory object if found, else None.

get_semantic_memory_by_session_id(session_id: str) → SemanticMemory | None[source]

Retrieves semantic memory from the database.

Parameters: session_id – The unique session ID to query for.
Returns: A SemanticMemory object if found, else None.

push_to_bfs_episodic_memory(session_id: str, context_id: str, action: str, page_url: str) → bool[source]

Pushes a new action and page_url for a specific BFS context. Creates a new record if the (session_id, context_id) pair doesn’t exist, otherwise appends to the existing record.

Note: This function uses a composite primary key of (session_id, context_id) to allow multiple browser windows per session.

Parameters:

session_id – The parent session ID for the BFS run.
context_id – The unique context ID for this browser window.
action – The action string to be pushed.
page_url – The page URL string to be pushed.

Returns:

True if the operation was successful, otherwise False.

push_to_episodic_memory(session_id: str, action: str, page_url: str, action_status: bool, fail_reason: str = None) → bool[source]

Pushes a new action and page_url onto the stack for a given session_id. It retrieves the existing record, appends the new values as JSON strings, and updates/inserts the record.

Parameters:

session_id – The unique session ID.
action – The action string to be pushed.
page_url – The page URL string to be pushed.
action_status – The success or failure of the current action (True for success, False for failure).
fail_reason – A string describing why a particular action failed (defaults to None on success).

Returns:

True if the operation was successful, otherwise False.
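The retrieve, append, and update/insert flow described above can be sketched with the standard-library sqlite3 module. The table and column names here are hypothetical, and the sketch omits action_status and fail_reason for brevity; it only illustrates the idea of keeping per-session stacks as JSON strings.

```python
import json
import sqlite3


def push_to_episodic_memory(
    conn: sqlite3.Connection, session_id: str, action: str, page_url: str
) -> bool:
    """Append an action and its page URL to the JSON-encoded stacks for a session."""
    try:
        row = conn.execute(
            "SELECT actions, page_urls FROM episodic_memory WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        # Start from empty stacks when the session has no record yet.
        actions = json.loads(row[0]) if row else []
        page_urls = json.loads(row[1]) if row else []
        actions.append(action)
        page_urls.append(page_url)
        conn.execute(
            "INSERT OR REPLACE INTO episodic_memory (session_id, actions, page_urls) "
            "VALUES (?, ?, ?)",
            (session_id, json.dumps(actions), json.dumps(page_urls)),
        )
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodic_memory (session_id TEXT PRIMARY KEY, actions TEXT, page_urls TEXT)"
)
first = push_to_episodic_memory(conn, "s1", "open homepage", "https://example.com")
second = push_to_episodic_memory(conn, "s1", "click login", "https://example.com/login")
stored = conn.execute(
    "SELECT actions FROM episodic_memory WHERE session_id = 's1'"
).fetchone()[0]
print(json.loads(stored))
```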

push_to_semantic_memory(session_id: str, logs: str) → bool[source]

Pushes logs to semantic memory.

Parameters:

session_id – The unique session ID.
logs – A dump generated by the memory generator.

Returns:

True if the operation was successful, otherwise False.

submit_query_with_retry()[source]

Commits database transactions with retry logic.

Retries up to 100 times if the connection returns an error. Used for insert, update, and delete operations.

Returns: True if commit was successful, otherwise False.
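A commit-with-retry loop of this kind might look like the sketch below. It is an assumption-laden illustration, not the library's code: the session is any object with commit() and rollback(), the FlakySession class is a hypothetical stand-in for a real database session, and catching a bare Exception is a simplification.

```python
import time


def submit_query_with_retry(session, max_retries: int = 100, delay: float = 0.0) -> bool:
    """Try to commit, rolling back and retrying on failure, up to max_retries times."""
    for _ in range(max_retries):
        try:
            session.commit()
            return True
        except Exception:  # a real implementation would catch a narrower error type
            session.rollback()
            time.sleep(delay)
    return False


class FlakySession:
    """Hypothetical session whose first `failures` commits raise an error."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.rollbacks = 0

    def commit(self) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("temporary failure")

    def rollback(self) -> None:
        self.rollbacks += 1


recovering = FlakySession(failures=3)
recovered = submit_query_with_retry(recovering)
gave_up = submit_query_with_retry(FlakySession(failures=5), max_retries=2)
print(recovered, recovering.rollbacks, gave_up)
```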

Core Components 

BaseEngine 

The base class for all engine modes.

class pyba.core.lib.mode.base.BaseEngine(headless: bool = True, enable_tracing: bool = True, trace_save_directory: str = None, database=None, use_random=None, use_logger: bool = None, mode: Literal['DFS', 'BFS', 'Normal', 'STEP'] = None, handle_dependencies: bool = False, openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]

Bases: object

A reusable base class that encapsulates the shared browser lifecycle, tracing, DOM extraction, and utility helpers.

The following will be initialised by the BaseEngine:

db_funcs: Initializes the database functions to be used for inserting and querying logs

mode: The mode of operation (DFS, BFS or Normal), read the relevant documentation in pyba.readthedocs.io

provider_instance: This will detect the provider you’re using: OpenAI, VertexAI or Gemini

playwright_agent: The actual playwright agent setup via the provider

secrets_manager: The secrets manager provided by the user, it must have a resolve() method

async attempt_login(page=None) → bool[source]

Helper function to attempt and perform a login to chosen sites. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.

Parameters: page – Optional argument to pin the page for removing self dependency
Returns: flag – A boolean to indicate the success or failure of the attempt

The login attempt may fail due to two reasons:

The current page is not a login page

Some selectors changed due to which the login engine returned None

Note that the LoginEngines are hardcoded engines for speed.

async extract_dom(page=None)[source]

Extracts the relevant fields from the DOM of the current page and returns the DOM dataclass. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.

Parameters: page – Optional argument to pin the page for removing self dependency

fetch_action(cleaned_dom: Dict, user_prompt: str, action_history: str = None, extraction_format: BaseModel = None, context_id: str = None, fail_reason: str = None, action_status: bool = None)[source]

Helper function to fetch an actionable PlaywrightResponse element

Parameters:

cleaned_dom – The DOM for the current page
user_prompt – The actual task given by the user
action_history – The full natural language history of actions taken so far
extraction_format – The extraction format requested by the user.
context_id – A unique identifier for this browser window (useful when multiple windows)
fail_reason – The reason for the failure of the previous action
action_status – A boolean to indicate if the previous action was successful or not

For an explanation of the extraction_format read the main file documentation.

Returns: action – An actionable PlaywrightResponse element

generate_code(output_path: str) → bool[source]

Function end-point for code generation

Parameters: output_path – Output file path to save the generated code to

async generate_output(action: str, cleaned_dom: CleanedDOM, prompt: str)[source]

Helper function to generate the output if the action has been completed.

Parameters:

action – The action as given out by the model
cleaned_dom – The latest cleaned_dom for the model to read
prompt – The prompt which was given to the model

get_screenshots() → List[bytes][source]

Returns the list of screenshot bytes captured so far. Each entry is a PNG image in bytes, ordered by capture time.

If a screenshot_directory was specified, this returns an empty list since images are saved to disk instead.

async get_trace_context(browser_instance=None)[source]

Initialises the browser context with tracing configuration. Accepts an optional browser instance to support BFS mode.

Parameters: browser_instance – Optional argument to pin the browser session down
Returns: context – The playwright context to be used for automation

async retry_perform_action(cleaned_dom: Dict, prompt: str, action_history: str, action_status: bool, fail_reason: str, extraction_format: BaseModel = None, page=None, mem=None) → str | None[source]

Helper function to retry the action after a failure. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.

Parameters:

cleaned_dom – The new cleaned DOM for the current page
prompt – The original prompt given by the user
action_history – The full natural language history of actions taken so far
action_status – Boolean indicating the previous action’s success or failure
fail_reason – Reason for the failure for the action
extraction_format – In case the current page needs extraction as well
page – Optional argument to pin the page down to remove self dependency
mem – Optional MemDSL instance (BFS passes its per-window instance)

This function retries the action based on the current DOM and the past action. This will most likely fix failures caused by a stale element or a hallucinated component.

Returns: output – The output if the action was successful and automation is completed; None in the usual case where an action is performed

async run()[source]: Run function which will be defined inside all child classes

async save_trace(context=None)[source]

Saves the trace if tracing is enabled. Accepts an optional context to support BFS mode where multiple browser contexts exist.

Parameters: context – Optional argument to pin the browser context down

static set_secrets(secrets: Dict[str, str])[source]

Method to set the environment for the browser using the secrets manager provided by the user.

Note: This relies on the secret manager class implementing a “resolve() -> dict[str, str]” method.
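A secrets manager that satisfies the resolve() contract could be as small as the sketch below. The EnvSecrets class and the variable names are hypothetical, and whether the object must also subclass PasswordManager is not stated on this page, so treat this as an illustration of the method shape only.

```python
import os
from typing import Dict, Iterable


class EnvSecrets:
    """Hypothetical secrets manager that reads selected environment variables."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = list(names)

    def resolve(self) -> Dict[str, str]:
        # Return only the variables that are actually set, as plain strings.
        return {name: os.environ[name] for name in self._names if name in os.environ}


os.environ["DEMO_SITE_PASSWORD"] = "hunter2"  # demo value only
os.environ.pop("DEMO_MISSING", None)  # make sure the second name is unset
secrets = EnvSecrets(["DEMO_SITE_PASSWORD", "DEMO_MISSING"])
resolved = secrets.resolve()
print(sorted(resolved))
```

An instance like this would then be passed as the secrets argument of an engine constructor.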

async shut_down(context=None, browser=None)[source]

Closes the browser context and browser instance. Accepts optional arguments to support BFS mode where multiple browsers exist.

Parameters:

context – Optional browser context to close
browser – Optional argument to pin the browser instance down

async successful_login_clean_and_get_dom(page=None)[source]

Helper function to obtain the cleaned_dom after a successful login. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.

Parameters: page – Optional argument to pin the page for removing self dependency

Functionality:

- Cleans the automated_login_engine_classes list (that is, we’re assuming only 1 login session for each run)
- Gets the latest page contents and parses the DOM using the extraction engine

async wait_till_loaded(page=None)[source]

Helper function to wait till load state while applying random jitters (if specified by the user). This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.

Parameters: page – Optional argument to pin the page for removing self dependency

Provider 

LLM provider selection and configuration.

class pyba.core.provider.Provider(openai_api_key: str = None, gemini_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, model_name: str = None)[source]

Bases: object

Class to handle the provider instances.

handle_keys()[source]: Handles provider selection, defaults to openai when multiple providers conflict
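The selection rule can be pictured with a small standalone function. This is a guess at the logic, not the library's code: the requirement that VertexAI needs both a project ID and a server location, and the error raised when a conflict does not involve an OpenAI key, are assumptions made for the sketch.

```python
from typing import List, Optional


def select_provider(
    openai_api_key: Optional[str] = None,
    gemini_api_key: Optional[str] = None,
    vertexai_project_id: Optional[str] = None,
    vertexai_server_location: Optional[str] = None,
) -> str:
    """Pick a provider name from whichever credentials were supplied."""
    candidates: List[str] = []
    if openai_api_key:
        candidates.append("openai")
    if vertexai_project_id and vertexai_server_location:
        candidates.append("vertexai")
    if gemini_api_key:
        candidates.append("gemini")

    if not candidates:
        raise ValueError("no provider credentials supplied")
    if len(candidates) == 1:
        return candidates[0]
    if "openai" in candidates:
        # Documented behaviour: fall back to OpenAI when providers conflict.
        return "openai"
    # Assumption: a conflict without an OpenAI key is treated as an error here.
    raise ValueError("ambiguous provider credentials")


only_gemini = select_provider(gemini_api_key="g-key")
conflict = select_provider(openai_api_key="o-key", gemini_api_key="g-key")
print(only_gemini, conflict)
```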

handle_model(provider: str)[source]

Helper function that manages model selection based on the keys chosen.

Note

The default models in config will be used if model name is not provided by the user. The list of valid model names will be present in the config file as well.

Parameters: provider – The name of the provider in question

Agents 

PlaywrightAgent 

The agent responsible for deciding browser actions.

class pyba.core.agent.playwright_agent.PlaywrightAgent(engine)[source]

Bases: BaseAgent

Defines the playwright agent’s actions

Provides two endpoints:

process_action: for returning the right action on a page
get_output: for summarizing the chat and returning a string

get_output(cleaned_dom: Dict[str, List | str], user_prompt: str, context_id: str = None) → str[source]: Gets the final text output from the model based on the current page state.

process_action(cleaned_dom: Dict[str, List | str], user_prompt: str, action_history: str = None, fail_reason: str = None, extraction_format: BaseModel = None, context_id: str = None, action_status: bool = None) → PlaywrightResponse[source]

Processes the current DOM and returns the next PlaywrightAction to execute.

Parameters:

cleaned_dom – Dictionary of extracted DOM elements (hyperlinks, input_fields, clickable_fields, actual_text).
user_prompt – The user’s task instruction.
action_history – The full natural language history of actions taken so far.
fail_reason – Reason the previous action failed, if applicable.
extraction_format – Pydantic model defining the extraction output schema.
context_id – Unique identifier for this browser window (used in BFS mode).
action_status – Whether the previous action succeeded.

Returns:

A PlaywrightAction to execute next, or None if the task is complete.

PlannerAgent 

The agent for generating exploration plans (DFS/BFS).

class pyba.core.agent.planner_agent.PlannerAgent(engine)[source]

Bases: BaseAgent

Planner agent for DFS and BFS exploration modes. Generates execution plans that are then carried out by the action agent.

Parameters: engine – Engine instance holding all user-provided configuration.

generate(task: str, old_plan: str = None) → PlannerAgentOutputBFS | PlannerAgentOutputDFS[source]

Generates exploration plan(s) based on the current mode.

Parameters:

task – The user’s exploratory task.
old_plan – The previous plan to diverge from (DFS mode only).

Returns:

A plan string (DFS) or list of plan strings (BFS).

BaseAgent 

Base class for all agents with retry logic.

class pyba.core.agent.base_agent.BaseAgent(engine)[source]

Bases: object

Base class for all agents. Provides LLM execution with exponential backoff and retry logic. The backoff is blocking per context to avoid overwhelming rate-limited APIs.

Defines the following variables:

exponential_base: 2 (we’re using base 2)
base_timeout: 1 second
max_backoff_time: 60 seconds
attempt_number: The current attempt number, initialised to 1
LLMFactory: The internal agent call is made by agent itself
log: The logger for the agents

calculate_next_time(attempt_number)[source]

Calculates the next backoff wait time in seconds using exponential backoff with jitter.

Parameters: attempt_number – The number of consecutive failed attempts.
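Exponential backoff with jitter, using the constants listed for BaseAgent (base 2, 1 second base timeout, 60 second cap), can be sketched as below. The exact formula and jitter strategy are not given on this page; this version assumes "full jitter", where the wait is drawn uniformly between zero and the capped exponential value.

```python
import random
from typing import Optional

EXPONENTIAL_BASE = 2
BASE_TIMEOUT = 1.0  # seconds
MAX_BACKOFF_TIME = 60.0  # seconds


def calculate_next_time(attempt_number: int, rng: Optional[random.Random] = None) -> float:
    """Return a wait time in seconds for the given number of failed attempts."""
    rng = rng or random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow;
    # 2 ** 10 already exceeds the 60 second cap.
    exponent = min(max(attempt_number, 0), 10)
    ceiling = min(MAX_BACKOFF_TIME, BASE_TIMEOUT * EXPONENTIAL_BASE ** exponent)
    # Full jitter (an assumption): pick uniformly between 0 and the ceiling.
    return rng.uniform(0.0, ceiling)


seeded = random.Random(0)
waits = [calculate_next_time(n, seeded) for n in range(1, 9)]
print([round(w, 2) for w in waits])
```

Jitter spreads retries from parallel browser contexts over time, so that they do not all hit a rate-limited API at the same instant.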

handle_gemini_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle Gemini execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns: response – The raw response from the model. The exact required values are expected to be extracted within each agent.

handle_openai_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle OpenAI execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns: response – The raw response from the model. The exact required values are expected to be extracted within each agent.

handle_vertexai_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle VertexAI execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns: response – The raw response from the model. The exact required values are expected to be extracted within each agent.

initialise_depth_ladder(unique_context_id: str)[source]

Resets the retry counter for a browser session after a successful call.

Parameters: unique_context_id – The context ID for the current browser session

update_depth_ladder(unique_context_id: str)[source]

Increments the retry counter for a browser session after a failed call.

Parameters: unique_context_id – The context ID for the browser
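The two depth-ladder methods amount to a per-context retry counter. The sketch below shows one way such a counter could be kept; the RetryLadder class and its dictionary layout are hypothetical, and only the reset-on-success and increment-on-failure behaviour, plus the starting value of 1, come from the descriptions above.

```python
from typing import Dict


class RetryLadder:
    """Hypothetical per-context retry counter, starting at attempt 1."""

    def __init__(self) -> None:
        self._attempts: Dict[str, int] = {}

    def initialise(self, context_id: str) -> None:
        # Called after a successful call: reset the counter for this context.
        self._attempts[context_id] = 1

    def update(self, context_id: str) -> int:
        # Called after a failed call: bump the counter and return the new value.
        self._attempts[context_id] = self._attempts.get(context_id, 1) + 1
        return self._attempts[context_id]

    def current(self, context_id: str) -> int:
        return self._attempts.get(context_id, 1)


ladder = RetryLadder()
ladder.update("window-a")
ladder.update("window-a")
ladder.update("window-b")
after_failures = (ladder.current("window-a"), ladder.current("window-b"))
ladder.initialise("window-a")
after_reset = ladder.current("window-a")
print(after_failures, after_reset)
```

Keeping one counter per context ID lets each browser window in BFS mode back off independently of the others.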

Action System 

PlaywrightActionPerformer 

Executes browser actions.

class pyba.core.lib.action.PlaywrightActionPerformer(page: Page, action: PlaywrightAction)[source]

Bases: object

The playwright automation class. To add new handles, make a function here and define that under perform()

Below is an exhaustive set of playwright actions that the handler will manage and the dispatcher will execute

Navigation functions
- handle_navigation
- handle_back
- handle_forward
- handle_reload
Interaction functions
- handle_input
- handle_typing
- handle_click
- handle_double_click
- handle_hover
- handle_checkboxes
- handle_select
- handle_file_upload
Keyboard/mouse functions
- handle_press
- handle_keyboard_press
- handle_keyboard_type
- handle_mouse_move
- handle_mouse_click
Scrolling
- handle_scrolling
Waits
- handle_wait
Javascript functions
- handle_evaluate_js
- handle_screenshot
- handle_download
New pages
- handle_switch_page
- handle_new_page
- handle_close_page

async handle_back()[source]

async handle_checkboxes()[source]

async handle_click()[source]

Handles clicking elements. Has additional checks to ensure that the element is not actually a relational hyperlink.

This is done in the following ways:

We first check if the click element is actually an <a> tag
Or if it has a closest ancestor <a> tag

In either case we extract the href from that <a> tag and directly goto that
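The anchor check can be illustrated without a browser by walking a small tree of nodes. Everything here is a stand-in: the Node class is hypothetical, and resolving a relative href against the page URL with urljoin is an assumption about what "directly goto that" involves.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urljoin


@dataclass
class Node:
    """Hypothetical minimal DOM node with a parent pointer."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None


def closest_anchor(node: Optional[Node]) -> Optional[Node]:
    # Walk from the element up through its ancestors, like Element.closest("a").
    while node is not None:
        if node.tag == "a":
            return node
        node = node.parent
    return None


def resolve_click_target(node: Node, page_url: str) -> Optional[str]:
    """Return the absolute URL to navigate to, or None for an ordinary click."""
    anchor = closest_anchor(node)
    if anchor is None or not anchor.attrs.get("href"):
        return None
    return urljoin(page_url, anchor.attrs["href"])


link = Node("a", {"href": "/item?id=1"})
label = Node("span", parent=link)  # clicked element nested inside the link
button = Node("button")  # no anchor anywhere above it
nested_target = resolve_click_target(label, "https://example.com/news")
plain_target = resolve_click_target(button, "https://example.com/news")
print(nested_target, plain_target)
```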

async handle_close_page()[source]

async handle_double_click()[source]: Handles double clicking an element.

async handle_download()[source]

async handle_dropdown_click()[source]: Dispatch function to handle dropdown menus. This function requires both the field_id and the field_value to be specified in the single action.

async handle_evaluate_js()[source]

Handles the evaluation of Javascript in the browser environment and brings the result back to the code.

The recommended way to use it is: `const href = await page.evaluate(() => document.location.href);`

Any return statements are stripped from the JS snippet here because they are not required for inline functions.
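
One possible way to strip a leading return statement is sketched below. The exact rules PyBA applies are not documented on this page, so the `strip_leading_return` helper and its regular expression are assumptions for illustration only.

```python
import re


def strip_leading_return(snippet: str) -> str:
    """Drop a leading `return` keyword and a trailing semicolon from a JS expression."""
    cleaned = snippet.strip()
    # \b ensures identifiers such as `returnValue` are left untouched.
    cleaned = re.sub(r"^return\b\s*", "", cleaned)
    return cleaned.rstrip(";").strip()


expression = strip_leading_return("return document.location.href;")
untouched = strip_leading_return("returnValue + 1")
```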

async handle_file_upload()[source]

async handle_forward()[source]

async handle_hover()[source]: Handles hovering over an element to make new actions visible.

async handle_input()[source]: Inputs a value to a selector field

async handle_keyboard_press()[source]: Handles a keyboard press action on the entire page.

async handle_keyboard_type()[source]

async handle_mouse_click()[source]

async handle_mouse_move()[source]

async handle_navigation()[source]: Handles browser navigation by opening new websites. Waits until the page is loaded.

async handle_new_page()[source]

async handle_press()[source]: Handles a key press.

async handle_reload()[source]

async handle_right_click()[source]: Dispatch function to handle a right click

async handle_screenshot()[source]

async handle_scrolling()[source]: Automates manual scrolling (or scrolls to center)

async handle_select()[source]

async handle_switch_page()[source]

async handle_typing()[source]

async handle_wait()[source]

async perform() → None[source]

The main dispatch function.

All handlers are called here as and when required by the AI models. Contains an exhaustive list of all the functions that can be chosen during the automation process
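
The dispatch idea can be sketched as follows, assuming the dispatcher inspects which action field the model populated and awaits the matching handler. `SketchAction`, `SketchPerformer`, and the `DISPATCH` mapping are hypothetical stand-ins that cover only three fields. The real class takes a Playwright Page and may choose handlers differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchAction:
    """Stand-in carrying three of the PlaywrightAction fields."""
    goto: Optional[str] = None
    click: Optional[str] = None
    reload: Optional[bool] = None


class SketchPerformer:
    # Field name -> handler name; this mapping is an assumption of the sketch.
    DISPATCH = {
        "goto": "handle_navigation",
        "click": "handle_click",
        "reload": "handle_reload",
    }

    def __init__(self, action: SketchAction) -> None:
        self.action = action
        self.calls: list[str] = []  # records what would have been executed

    async def handle_navigation(self) -> None:
        self.calls.append(f"goto {self.action.goto}")

    async def handle_click(self) -> None:
        self.calls.append(f"click {self.action.click}")

    async def handle_reload(self) -> None:
        self.calls.append("reload")

    async def perform(self) -> None:
        # Run the handler for the first populated field, then stop.
        for field_name, handler_name in self.DISPATCH.items():
            if getattr(self.action, field_name):
                await getattr(self, handler_name)()
                return
        raise ValueError("action has no populated field")


performer = SketchPerformer(SketchAction(click="#submit"))
asyncio.run(performer.perform())
```

With a table-driven dispatch like this, adding a handler means writing one method and adding one mapping entry.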

async wait_till_loaded()[source]

Data Structures 

PlaywrightAction 

The DSL for browser actions.

Bases: BaseModel

check: str | None

click: str | None

close_page: bool | None

dblclick: str | None

download_selector: str | None

dropdown_field_id: str | None

dropdown_field_value: str | None

evaluate_js: str | None

fill_selector: str | None

fill_value: str | None

go_back: bool | None

go_forward: bool | None

goto: str | None

hover: str | None

keyboard_press: str | None

keyboard_type: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

mouse_click_x: int | None

mouse_click_y: int | None

mouse_move_x: int | None

mouse_move_y: int | None

new_page: str | None

press_key: str | None

press_selector: str | None

reload: bool | None

right_click: str | None

screenshot_path: str | None

scroll_x: int | None

scroll_y: int | None

select_selector: str | None

select_value: str | None

switch_page_index: int | None

type_selector: str | None

type_text: str | None

uncheck: str | None

upload_path: str | None

upload_selector: str | None

wait_ms: int | None

wait_selector: str | None

wait_timeout: int | None

PlaywrightResponse 

Response format from the PlaywrightAgent.

class pyba.utils.structure.PlaywrightResponse(*, actions: List[PlaywrightAction], extract_info: bool | None)[source]

Bases: BaseModel

actions: List[PlaywrightAction]

extract_info: bool | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

CleanedDOM 

Structured representation of page DOM.

class pyba.utils.structure.CleanedDOM(hyperlinks: List[str] | None = <factory>, input_fields: List[str] | None = <factory>, clickable_fields: List[str] | None = <factory>, actual_text: str | None = None, current_url: str | None = None, youtube: str | None = None)[source]

Bases: object

Represents the cleaned DOM snapshot of the current browser page.

The youtube field is an additional parameter used for YouTube DOM extraction.

actual_text: str | None = None

clickable_fields: List[str] | None

current_url: str | None = None

hyperlinks: List[str] | None

input_fields: List[str] | None

to_dict() → dict[source]

youtube: str | None = None

Login Handlers 

BaseLogin 

Base class for automated login handlers.

class pyba.core.scripts.login.base.BaseLogin(page: Page, engine_name: str)[source]

Bases: ABC

The base class for all login engines. This handles common logic like credential loading, page verification, and 2FA waiting.

async run() → bool | None[source]

The main login execution flow.

Returns: None if the automated login script is not supposed to run here; otherwise True if the login succeeded and False if it failed.
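
Because run() has three possible outcomes, callers need to tell None apart from False. A minimal sketch of that check follows. The `describe_login_result` helper is hypothetical and only illustrates the identity test.

```python
from typing import Optional


def describe_login_result(result: Optional[bool]) -> str:
    """Map the tri-state return value of a login run to a label."""
    # `is None` is required: a plain truthiness test would merge None and False.
    if result is None:
        return "skipped"
    return "success" if result else "failure"


labels = [describe_login_result(value) for value in (None, True, False)]
```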

Code Generation 

CodeGeneration 

Generates standalone Playwright scripts.

class pyba.core.lib.code_generation.CodeGeneration(session_id: str, output_path: str, database_funcs: DatabaseFunctions)[source]

Bases: object

Creates the full automation script from the actions taken by the model.

- Requires the database to be populated with all the actions
- Pulls the actions from the database and writes the script to a user-specified location

Parameters:

session_id – The unique identifier for this session
output_path – Path to save the code to
database_funcs – The Database instantiated by the user

SELECTOR_VALUE_PAIRS = {'fill_selector': 'fill_value', 'press_selector': 'press_key', 'select_selector': 'select_value', 'type_selector': 'type_text', 'upload_selector': 'upload_path'}

TEMPLATES = {'check': 'page.check("{value}")', 'click': 'page.click("{value}")', 'close_page': 'page.close()', 'dblclick': 'page.dblclick("{value}")', 'download_selector': 'with page.expect_download() as download_info:\n page.click("{value}")\ndownload = download_info.value\ndownload.save_as(download.suggested_filename)', 'dropdown_field_id': 'page.locator("{selector}").select_option(label="{value}")', 'evaluate_js': 'page.evaluate({value})', 'fill_selector': 'page.fill("{selector}", "{value}")', 'go_back': 'page.go_back()', 'go_forward': 'page.go_forward()', 'goto': 'page.goto("{value}")', 'hover': 'page.hover("{value}")', 'keyboard_press': 'page.keyboard.press("{value}")', 'keyboard_type': 'page.keyboard.type("{value}")', 'mouse_click_x': 'page.mouse.click({x}, {y})', 'mouse_move_x': 'page.mouse.move({x}, {y})', 'new_page': 'page.context.new_page().goto("{value}")', 'press_selector': 'page.press("{selector}", "{value}")', 'reload': 'page.reload()', 'right_click': 'page.click("{value}", button="right")', 'screenshot_path': 'page.screenshot(path="{value}")', 'scroll_x': 'page.mouse.wheel({x}, {y})', 'select_selector': 'page.select_option("{selector}", "{value}")', 'switch_page_index': 'page = page.context.pages[{value}]', 'type_selector': 'page.type("{selector}", "{value}")', 'uncheck': 'page.uncheck("{value}")', 'upload_selector': 'page.set_input_files("{selector}", "{value}")', 'wait_ms': 'page.wait_for_timeout({value})', 'wait_selector': 'page.wait_for_selector("{value}", timeout={timeout})'}

XY_PAIRS = {'mouse_click_x': 'mouse_click_y', 'mouse_move_x': 'mouse_move_y', 'scroll_x': 'scroll_y'}

generate_script()[source]: Generates the full Playwright script from the sequence of actions and writes it to the output path.

Dependencies 

HandleDependencies 

Manages Playwright browser installation.

class pyba.core.lib.handle_dependencies.HandleDependencies[source]

Bases: object

playwright: alias of PlaywrightDependencies

Exceptions 

exception pyba.utils.exceptions.ActionError(message: str, cause: Exception = None)[source]

Bases: PybaError

An action dispatched to Playwright failed.

category: str = 'action'

exception pyba.utils.exceptions.ActionTimeoutError(message: str, cause: Exception = None)[source]

Bases: ActionError

A Playwright action exceeded its timeout.

category: str = 'timeout'

exception pyba.utils.exceptions.CannotResolveError[source]

Bases: Exception

Exception to be raised when the user provides a PasswordManager class which requires positional arguments to be specified.

exception pyba.utils.exceptions.CredentialsNotSpecified(site_name: str)[source]

Bases: Exception

Exception raised in the login scripts when the relevant credentials haven’t been specified

exception pyba.utils.exceptions.DatabaseNotInitialised[source]

Bases: Exception

Exception to be raised when the user asks for automation code generation but has not initialised the database.

exception pyba.utils.exceptions.ElementNotFoundError(message: str, cause: Exception = None)[source]

Bases: ActionError

A selector did not match any element on the page.

category: str = 'element_not_found'

exception pyba.utils.exceptions.IncorrectMode(mode: str)[source]

Bases: Exception

Exception to be raised when the mode specified by the user is incorrect

exception pyba.utils.exceptions.InvalidModelSelected(model_name: str, provider: str, provider_valid_models: list)[source]

Bases: Exception

Exception to be raised when the model chosen by the user does not belong to the provider whose keys are specified

exception pyba.utils.exceptions.LLMError(message: str, cause: Exception = None)[source]

Bases: PybaError

The LLM provider returned an error or an unparseable response.

category: str = 'llm'

exception pyba.utils.exceptions.LLMRateLimitError(message: str, cause: Exception = None)[source]

Bases: LLMError

The LLM provider rate-limited the request.

category: str = 'llm_rate_limit'

exception pyba.utils.exceptions.LLMResponseParseError(message: str, cause: Exception = None)[source]

Bases: LLMError

The LLM returned a response that could not be parsed into an action.

category: str = 'llm_parse'

exception pyba.utils.exceptions.NavigationError(message: str, cause: Exception = None)[source]

Bases: ActionError

A page navigation (goto, back, forward, reload) failed.

category: str = 'navigation'

exception pyba.utils.exceptions.PromptNotPresent[source]

Bases: Exception

This exception is raised when the user forgets to enter a prompt to the engine

exception pyba.utils.exceptions.PybaError(message: str, cause: Exception = None)[source]

Bases: Exception

Base class for all structured runtime errors raised by Pyba.

Every subclass carries a human-readable message and the original cause exception (if any) so callers can inspect both without parsing tracebacks.

category: str = 'unknown'
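
Since the structured errors share a base class and a category attribute, callers can catch broadly and branch on the category. The sketch below uses local stand-in classes that mirror the documented signatures so it runs without PyBA installed. The `message` and `cause` attribute names and the `classify` helper are assumptions. In real code the classes would be imported from pyba.utils.exceptions.

```python
class PybaError(Exception):
    """Stand-in mirroring the documented base class."""
    category: str = "unknown"

    def __init__(self, message: str, cause: Exception = None):
        super().__init__(message)
        self.message = message  # attribute name assumed for this sketch
        self.cause = cause      # attribute name assumed for this sketch


class ActionError(PybaError):
    category = "action"


class ActionTimeoutError(ActionError):
    category = "timeout"


def classify(error: Exception) -> str:
    """Return the structured category, or a fallback for other exceptions."""
    if isinstance(error, PybaError):
        return error.category
    return "unstructured"


try:
    raise ActionTimeoutError("click timed out", cause=TimeoutError("30s"))
except ActionError as exc:
    # Catching the parent class still yields the subclass's category.
    caught_category = classify(exc)
    caught_cause = exc.cause
```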

exception pyba.utils.exceptions.ServerLocationUndefined(server_location)[source]

Bases: Exception

This exception is raised when the user doesn’t define the server location for a VertexAI project.

exception pyba.utils.exceptions.ServiceNotSelected[source]

Bases: Exception

This exception is raised when the user doesn’t set an API key in the engine

exception pyba.utils.exceptions.UnknownSiteChosen(sites: list)[source]

Bases: Exception

Exception to be raised when the user chooses a site for automated login that isn’t implemented yet.

exception pyba.utils.exceptions.UnsupportedModelUsed(model_name: str, valid_model_names: list)[source]

Bases: Exception

Exception to be raised when the model specified by the user is not supported