API Reference

This page documents the public API of PyBA. For internal architecture details, see Architecture & Code Walkthrough.

Entry Points

Engine

The main entry point for autonomous browser automation.

Step (Step-by-Step)

Entry point for interactive step-by-step mode. The user controls the browser one instruction at a time via start(), step(), and stop().

Database

Database Configuration

Database Functions

Core Components

BaseEngine

The base class for all engine modes.

Provider

LLM provider selection and configuration.

Agents

PlaywrightAgent

The agent responsible for deciding browser actions.

class pyba.core.agent.playwright_agent.PlaywrightAgent(engine)

Bases: BaseAgent

Defines the playwright agent’s actions

Provides two endpoints:
  • process_action: for returning the right action on a page

  • get_output: for summarizing the chat and returning a string

get_output(cleaned_dom: Dict[str, List | str], user_prompt: str, context_id: str = None) -> str

Gets the final output from the model, if the user requested one.

process_action(cleaned_dom: Dict[str, List | str], user_prompt: str, previous_action: str = None, fail_reason: str = None, extraction_format: BaseModel = None, context_id: str = None, action_status: bool = None) -> PlaywrightResponse

Processes the DOM and returns an actionable Playwright response.

Parameters:
  • cleaned_dom – Dictionary of the items extracted from the DOM:
      - hyperlinks: List
      - input_fields (all fillable boxes): List
      - clickable_fields: List
      - actual_text: string

  • user_prompt – The instructions given by the user

  • previous_action – The previously executed action

  • fail_reason – The reason the previous action failed (None if not provided, meaning the action passed)

  • extraction_format – The extraction format for the task

  • context_id – A unique identifier for this browser window (useful when multiple windows are open)

  • action_status – The success or the failure of an action

output:

A predefined pydantic model called PlaywrightResponse which defines our DSL
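The way previous_action, fail_reason and action_status feed a failure back into the next call can be sketched as follows. This is only an illustration under stated assumptions: StubAgent is a hypothetical stand-in rather than PyBA's PlaywrightAgent, it returns plain strings instead of a PlaywrightResponse, and the selector strings and DOM values are made up.

```python
from typing import Dict, List, Union

# Shape of cleaned_dom as documented above; the values are made up.
cleaned_dom: Dict[str, Union[List, str]] = {
    "hyperlinks": ["https://example.com/login"],
    "input_fields": ["#username", "#password"],
    "clickable_fields": ["button#submit"],
    "actual_text": "Sign in to continue",
}


class StubAgent:
    """Hypothetical stand-in that records the feedback arguments it receives."""

    def __init__(self) -> None:
        self.calls: List[dict] = []

    def process_action(self, cleaned_dom, user_prompt, previous_action=None,
                       fail_reason=None, action_status=None) -> str:
        self.calls.append({
            "previous_action": previous_action,
            "fail_reason": fail_reason,
            "action_status": action_status,
        })
        # Propose a different action once told that the last one failed.
        return "click button#submit" if fail_reason else "click #missing"


def run_two_steps(agent: StubAgent) -> List[str]:
    first = agent.process_action(cleaned_dom, "log in")
    # Pretend the first action failed and pass the failure back in.
    second = agent.process_action(
        cleaned_dom,
        "log in",
        previous_action=first,
        fail_reason="selector not found",
        action_status=False,
    )
    return [first, second]


agent = StubAgent()
steps = run_two_steps(agent)
```

On the first call the feedback arguments keep their None defaults; on the second they describe what was tried and why it failed.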

PlannerAgent

The agent for generating exploration plans (DFS/BFS).

class pyba.core.agent.planner_agent.PlannerAgent(engine)

Bases: BaseAgent

Planner agent for DFS and BFS modes in exploratory cases. It also inherits from the Retry class and supports all agents under LLM_factory.

Parameters:

engine – Engine to hold all arguments provided by the user

Initialises the max_breadth for the maximum number of plans to generate for BFS mode

Note

context_id is not relevant here because this is a higher-level class

generate(task: str, old_plan: str = None) -> PlannerAgentOutputBFS | PlannerAgentOutputDFS

Endpoint to generate the plan(s) depending on the set mode (the agent encodes the mode)

Parameters:
  • task – The task provided by the user

  • old_plan – The previous plan if using DFS mode

Function:
  • Takes in the user prompt which serves as the task for the model to perform

  • Depending on DFS or BFS mode generates plan(s)
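The mode-dependent return type can be pictured with the sketch below. Everything in it is an assumption made for illustration: BFSPlans and DFSPlan are hypothetical stand-ins for PlannerAgentOutputBFS and PlannerAgentOutputDFS (whose real fields are not documented here), the mode is passed explicitly instead of being encoded in the agent, and the plan strings are placeholders for what the LLM would produce.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class BFSPlans:
    """Hypothetical stand-in: several alternative plans explored in parallel."""
    plans: List[str]


@dataclass
class DFSPlan:
    """Hypothetical stand-in: a single plan explored in depth."""
    plan: str


def generate_sketch(task: str, mode: str, max_breadth: int = 3,
                    old_plan: Optional[str] = None) -> Union[BFSPlans, DFSPlan]:
    if mode == "BFS":
        # BFS: produce up to max_breadth alternative plans.
        return BFSPlans([f"{task} (approach {i + 1})" for i in range(max_breadth)])
    if mode == "DFS":
        # DFS: produce one plan, optionally informed by the previous plan.
        suffix = f" (revising: {old_plan})" if old_plan else ""
        return DFSPlan(f"{task}{suffix}")
    raise ValueError(f"unknown mode: {mode}")


bfs_result = generate_sketch("find pricing", "BFS", max_breadth=2)
dfs_result = generate_sketch("find pricing", "DFS", old_plan="try homepage")
```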

BaseAgent

Base class for all agents with retry logic.

class pyba.core.agent.base_agent.BaseAgent(engine)

Bases: object

The base class for all Agents to define common methods

Also contains methods for exponential backoff and retry. Note: this backoff and retry blocks that specific context.

Defines the following variables:

  • exponential_base: 2 (we’re using base 2)

  • base_timeout: 1 second

  • max_backoff_time: 60 seconds

  • attempt_number: The current attempt number, initialised to 1

  • LLMFactory: The internal agent call is made by the agent itself

  • log: The logger for the agents

calculate_next_time(attempt_number)

Function to calculate the next wait time in seconds

Parameters:

attempt_number – The number of failed attempts
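A capped exponential backoff using the documented constants might look like the sketch below. The exact formula is an assumption: the reference above gives only the base, the base timeout and the maximum, so the real method may use a different exponent offset or add jitter.

```python
EXPONENTIAL_BASE = 2     # documented: base 2
BASE_TIMEOUT = 1         # documented: 1 second
MAX_BACKOFF_TIME = 60    # documented: 60 seconds


def next_wait_seconds(attempt_number: int) -> int:
    """Assumed formula: base_timeout * base ** (attempt - 1), capped at the maximum."""
    delay = BASE_TIMEOUT * EXPONENTIAL_BASE ** (attempt_number - 1)
    return min(delay, MAX_BACKOFF_TIME)


# Wait times for attempts 1 through 8 under this assumed formula.
delays = [next_wait_seconds(n) for n in range(1, 9)]
```

Under these assumptions the wait doubles with each failed attempt until it reaches the 60-second cap.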

handle_gemini_execution(agent: Any, prompt: str, context_id: str = None)

Helper method to handle Gemini execution

Parameters:
  • agent – The agent to use (action_agent or output_agent)

  • prompt – The fully formatted prompt string

  • context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

If context_id is None, there is only one browser session.

Returns:

The raw response from the model. The exact required values are expected to be extracted within each agent

Return type:

response

handle_openai_execution(agent: Any, prompt: str, context_id: str = None)

Helper method to handle OpenAI execution

Parameters:
  • agent – The agent to use (action_agent or output_agent)

  • prompt – The fully formatted prompt string

  • context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

If context_id is None, there is only one browser session.

Returns:

The raw response from the model. The exact required values are expected to be extracted within each agent

Return type:

response

handle_vertexai_execution(agent: Any, prompt: str, context_id: str = None)

Helper method to handle VertexAI execution

Parameters:
  • agent – The agent to use (action_agent or output_agent)

  • prompt – The fully formatted prompt string

  • context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

If context_id is None, there is only one browser session.

Returns:

The raw response from the model. The exact required values are expected to be extracted within each agent

Return type:

response

initialise_depth_ladder(unique_context_id: str)

Initialises and helps manage the depth-ladder for different browser sessions

Parameters:

unique_context_id – The context ID for the current browser session

update_depth_ladder(unique_context_id: str)

Increments the depth value for each browser

Parameters:

unique_context_id – The context ID for the browser
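One way to picture the depth ladder is a mapping from context ID to a depth counter. The sketch below is a hypothetical stand-in, not BaseAgent's implementation: the class name, the dictionary storage and the starting depth of 0 are all assumptions.

```python
from typing import Dict


class DepthLadderSketch:
    """Hypothetical stand-in: one depth counter per browser context."""

    def __init__(self) -> None:
        self.depths: Dict[str, int] = {}

    def initialise(self, unique_context_id: str) -> None:
        # Assumption: a new context starts at depth 0; an existing one is left alone.
        self.depths.setdefault(unique_context_id, 0)

    def update(self, unique_context_id: str) -> int:
        # Make sure the context exists, then increment its depth by one.
        self.initialise(unique_context_id)
        self.depths[unique_context_id] += 1
        return self.depths[unique_context_id]


ladder = DepthLadderSketch()
ladder.initialise("window-a")
ladder.update("window-a")
ladder.update("window-a")
ladder.update("window-b")
```

Keeping the counters keyed by context ID lets parallel browser windows advance independently of each other.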

Action System

PlaywrightActionPerformer

Executes browser actions.

Data Structures

PlaywrightAction

The DSL for browser actions.

class pyba.utils.structure.PlaywrightAction(*, goto: str | None = None, go_back: bool | None = None, go_forward: bool | None = None, reload: bool | None = None, click: str | None = None, dblclick: str | None = None, hover: str | None = None, right_click: str | None = None, dropdown_field_id: str | None = None, dropdown_field_value: str | None = None, fill_selector: str | None = None, fill_value: str | None = None, type_selector: str | None = None, type_text: str | None = None, press_selector: str | None = None, press_key: str | None = None, check: str | None = None, uncheck: str | None = None, select_selector: str | None = None, select_value: str | None = None, upload_selector: str | None = None, upload_path: str | None = None, scroll_x: int | None = None, scroll_y: int | None = None, wait_selector: str | None = None, wait_timeout: int | None = None, wait_ms: int | None = None, keyboard_press: str | None = None, keyboard_type: str | None = None, mouse_move_x: int | None = None, mouse_move_y: int | None = None, mouse_click_x: int | None = None, mouse_click_y: int | None = None, new_page: str | None = None, close_page: bool | None = None, switch_page_index: int | None = None, evaluate_js: str | None = None, screenshot_path: str | None = None, download_selector: str | None = None)

Bases: BaseModel

The BaseModel for playwright automations

Goal:

This contains an exhaustive list of commands that Playwright can execute. The LLM fills it in according to the DOM received from Playwright and the goal of the task.

check: str | None
click: str | None
close_page: bool | None
dblclick: str | None
download_selector: str | None
dropdown_field_id: str | None
dropdown_field_value: str | None
evaluate_js: str | None
fill_selector: str | None
fill_value: str | None
go_back: bool | None
go_forward: bool | None
goto: str | None
hover: str | None
keyboard_press: str | None
keyboard_type: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

mouse_click_x: int | None
mouse_click_y: int | None
mouse_move_x: int | None
mouse_move_y: int | None
new_page: str | None
press_key: str | None
press_selector: str | None
reload: bool | None
right_click: str | None
screenshot_path: str | None
scroll_x: int | None
scroll_y: int | None
select_selector: str | None
select_value: str | None
switch_page_index: int | None
type_selector: str | None
type_text: str | None
uncheck: str | None
upload_path: str | None
upload_selector: str | None
wait_ms: int | None
wait_selector: str | None
wait_timeout: int | None
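Because every field defaults to None, an action is described by the few fields that are actually set. The sketch below illustrates that idea with a stdlib dataclass so it runs without pydantic installed; ActionSketch and populated_fields are hypothetical names, only a handful of the fields listed above are mirrored, and the real class is a pydantic BaseModel.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional, Union


@dataclass
class ActionSketch:
    """Stand-in mirroring a few PlaywrightAction fields; all default to None."""
    goto: Optional[str] = None
    click: Optional[str] = None
    fill_selector: Optional[str] = None
    fill_value: Optional[str] = None
    wait_ms: Optional[int] = None


def populated_fields(action: ActionSketch) -> Dict[str, Union[str, int]]:
    """Return only the fields that were set, in declaration order."""
    return {
        f.name: getattr(action, f.name)
        for f in fields(action)
        if getattr(action, f.name) is not None
    }


# A fill action sets the selector/value pair and leaves everything else as None.
fill = ActionSketch(fill_selector="#email", fill_value="user@example.com")
populated = populated_fields(fill)
```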

PlaywrightResponse

Response format from the PlaywrightAgent.

class pyba.utils.structure.PlaywrightResponse(*, actions: List[PlaywrightAction], extract_info: bool | None)

Bases: BaseModel

actions: List[PlaywrightAction]
extract_info: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

CleanedDOM

Structured representation of page DOM.

class pyba.utils.structure.CleanedDOM(hyperlinks: List[str] | None = <factory>, input_fields: List[str] | None = <factory>, clickable_fields: List[str] | None = <factory>, actual_text: str | None = None, current_url: str | None = None, youtube: str | None = None)

Bases: object

Represents the cleaned DOM snapshot of the current browser page.

The youtube field is an additional parameter used for YouTube DOM extraction.

actual_text: str | None = None
clickable_fields: List[str] | None
current_url: str | None = None
input_fields: List[str] | None
to_dict() -> dict
youtube: str | None = None
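The `<factory>` defaults in the signature suggest a dataclass whose list fields use default_factory. A minimal sketch of that shape is below; CleanedDOMSketch is a hypothetical stand-in, and implementing to_dict() with dataclasses.asdict is an assumption about behaviour, not a description of PyBA's code.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CleanedDOMSketch:
    # List fields get a fresh empty list per instance (the "<factory>" defaults).
    hyperlinks: Optional[List[str]] = field(default_factory=list)
    input_fields: Optional[List[str]] = field(default_factory=list)
    clickable_fields: Optional[List[str]] = field(default_factory=list)
    actual_text: Optional[str] = None
    current_url: Optional[str] = None
    youtube: Optional[str] = None

    def to_dict(self) -> dict:
        # Assumption: a plain mapping of field name to value.
        return asdict(self)


dom = CleanedDOMSketch(
    hyperlinks=["https://example.com/about"],
    current_url="https://example.com",
)
dom_dict = dom.to_dict()
```

The dictionary form matches the Dict[str, List | str] shape that the agent methods above accept as cleaned_dom.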

Login Handlers

BaseLogin

Base class for automated login handlers.

Code Generation

CodeGeneration

Generates standalone Playwright scripts.

Dependencies

HandleDependencies

Manages Playwright browser installation.

class pyba.core.lib.handle_dependencies.HandleDependencies

Bases: object

playwright

alias of PlaywrightDependencies

Exceptions

exception pyba.utils.exceptions.CredentialsnotSpecified(site_name: str)

Bases: Exception

Exception raised in the login scripts when the relevant credentials haven’t been specified

exception pyba.utils.exceptions.DatabaseNotInitialised

Bases: Exception

Exception to be raised when the user asks for automation code generation but has not initialised the database!

exception pyba.utils.exceptions.IncorrectMode(mode: str)

Bases: Exception

Exception to be raised when the mode specified by the user is incorrect

exception pyba.utils.exceptions.InvalidModelSelected(model_name: str, provider: str, provider_valid_models: list)

Bases: Exception

Exception to be raised when the model chosen by the user doesn’t belong to the provider for which the keys are specified

exception pyba.utils.exceptions.PromptNotPresent

Bases: Exception

This exception is raised when the user forgets to enter a prompt to the engine

exception pyba.utils.exceptions.ServerLocationUndefined(server_location)

Bases: Exception

This exception is raised when the user doesn’t define the server location for a VertexAI project.

exception pyba.utils.exceptions.ServiceNotSelected

Bases: Exception

This exception is raised when the user doesn’t set an API key in the engine

exception pyba.utils.exceptions.UnknownSiteChosen(sites: list)

Bases: Exception

Exception to be raised when the user chooses a site for automated login that isn’t implemented yet.

exception pyba.utils.exceptions.UnsupportedModelUsed(model_name: str, valid_model_names: list)

Bases: Exception

Exception to be raised when the model specified by the user is not supported
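Several of these exceptions take constructor arguments that describe what went wrong. The sketch below shows the general pattern with a locally defined stand-in, so it runs without PyBA installed. InvalidModelSelectedSketch, check_model, the message wording and the model names are all hypothetical; only the constructor parameters mirror the documented InvalidModelSelected signature.

```python
from typing import List


class InvalidModelSelectedSketch(Exception):
    """Stand-in with the same constructor parameters as InvalidModelSelected."""

    def __init__(self, model_name: str, provider: str, provider_valid_models: list):
        self.model_name = model_name
        self.provider = provider
        self.provider_valid_models = provider_valid_models
        # The message format is an assumption made for this sketch.
        super().__init__(
            f"Model {model_name!r} is not valid for provider {provider!r}; "
            f"choose one of {provider_valid_models}"
        )


def check_model(model_name: str, provider: str, valid_models: List[str]) -> str:
    """Hypothetical validation helper that raises on a provider/model mismatch."""
    if model_name not in valid_models:
        raise InvalidModelSelectedSketch(model_name, provider, valid_models)
    return model_name


try:
    check_model("model-x", "example-provider", ["model-a", "model-b"])
    caught = None
except InvalidModelSelectedSketch as exc:
    caught = exc
```

Storing the arguments on the exception lets calling code report or recover from the mismatch without parsing the message.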