API Reference

This page documents the public API of PyBA. For internal architecture details, see Architecture & Code Walkthrough.

Entry Points 

Engine 

The main entry point for autonomous browser automation.

Step (Step-by-Step)

Entry point for interactive step-by-step mode. The user controls the browser one instruction at a time via start(), step(), and stop().

Core Components 

Provider 

LLM provider selection and configuration.

Agents 

PlaywrightAgent 

The agent responsible for deciding browser actions.

class pyba.core.agent.playwright_agent.PlaywrightAgent(engine)[source]

Bases: BaseAgent

Defines the playwright agent’s actions

Provides two endpoints:

process_action: for returning the right action on a page
get_output: for summarizing the chat and returning a string

get_output(cleaned_dom: Dict[str, List | str], user_prompt: str, context_id: str = None) → str[source]: Returns the final output from the model, if the user requested one

process_action(cleaned_dom: Dict[str, List | str], user_prompt: str, previous_action: str = None, fail_reason: str = None, extraction_format: BaseModel = None, context_id: str = None, action_status: bool = None) → PlaywrightResponse[source]

Method to process the DOM and provide an actionable playwright response

Parameters:

cleaned_dom – Dictionary of the items extracted from the DOM:
    - hyperlinks: List
    - input_fields (all fillable boxes): List
    - clickable_fields: List
    - actual_text: string
user_prompt – The instructions given by the user
previous_action – The previously executed action
fail_reason – The reason the previous action failed (None if not provided, meaning the action passed)
extraction_format – The extraction format for the task
context_id – A unique identifier for this browser window (useful when multiple windows are open)
action_status – Whether the previous action succeeded or failed

Returns:: A predefined pydantic model called PlaywrightResponse, which defines our DSL

PlannerAgent 

The agent for generating exploration plans (DFS/BFS).

class pyba.core.agent.planner_agent.PlannerAgent(engine)[source]

Bases: BaseAgent

Planner agent for DFS and BFS modes in exploratory cases. It also inherits from the Retry class and supports all agents under LLM_factory.

Parameters:: engine – Engine to hold all arguments provided by the user

Initialises max_breadth, the maximum number of plans to generate in BFS mode

Note

context_id is not relevant here because this is a higher-level class

generate(task: str, old_plan: str = None) → PlannerAgentOutputBFS | PlannerAgentOutputDFS[source]

Endpoint to generate the plan(s) depending on the set mode (the agent encodes the mode)

Parameters:

task – The task provided by the user
old_plan – The previous plan if using DFS mode

Function:

Takes in the user prompt which serves as the task for the model to perform
Depending on DFS or BFS mode generates plan(s)

BaseAgent 

Base class for all agents with retry logic.

class pyba.core.agent.base_agent.BaseAgent(engine)[source]

Bases: object

The base class for all Agents to define common methods

Also contains methods for exponential backoff and retry.

Note: this backoff and retry is blocking for that specific context.

Defines the following variables:

exponential_base: 2 (base 2)
base_timeout: 1 second
max_backoff_time: 60 seconds
attempt_number: The current attempt number, initialised to 1
LLMFactory: The internal agent call is made by the agent itself
log: The logger for the agents

calculate_next_time(attempt_number)[source]

Function to calculate the next wait time in seconds

Parameters:: attempt_number – The number of failed attempts
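
The wait-time calculation can be illustrated with a minimal sketch built only from the variables listed above (base 2, a 1-second base timeout, a 60-second cap). The exact formula is an assumption: the real method may start the exponent at a different attempt number or add jitter, so treat this as an illustration of capped exponential backoff rather than the library's implementation.

```python
# Values taken from the documented BaseAgent variables.
EXPONENTIAL_BASE = 2
BASE_TIMEOUT = 1.0       # seconds
MAX_BACKOFF_TIME = 60.0  # seconds


def calculate_next_time(attempt_number: int) -> float:
    """Return the next wait time in seconds for a given attempt number.

    Assumption: attempt numbers start at 1, so the first retry waits
    BASE_TIMEOUT seconds and each later retry doubles the wait.
    """
    delay = BASE_TIMEOUT * (EXPONENTIAL_BASE ** (attempt_number - 1))
    # Never wait longer than the configured ceiling.
    return min(delay, MAX_BACKOFF_TIME)


if __name__ == "__main__":
    for attempt in range(1, 9):
        print(attempt, calculate_next_time(attempt))
```

Under these assumptions the wait doubles on each attempt until it reaches the 60-second ceiling, after which it stays constant.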

handle_gemini_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle gemini’s execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns:: The raw response from the model. The exact required values are expected to be extracted within each agent
Return type:: response

handle_openai_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle OpenAI execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns:: The raw response from the model. The exact required values are expected to be extracted within each agent
Return type:: response

handle_vertexai_execution(agent: Any, prompt: str, context_id: str = None)[source]

Helper method to handle VertexAI execution

Parameters:

agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window

The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.

`context_id`=None => There is only one browser session.

Returns:: The raw response from the model. The exact required values are expected to be extracted within each agent
Return type:: response

initialise_depth_ladder(unique_context_id: str)[source]

Initialises and helps manage the depth-ladder for different browser sessions

Parameters:: unique_context_id – The context ID for the current browser session

update_depth_ladder(unique_context_id: str)[source]

Increments the depth value for the given browser session

Parameters:: unique_context_id – The context ID for the browser
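
One way to picture the depth ladder is as a mapping from context ID to a depth counter. The sketch below is hypothetical: the class name DepthLadder, the dictionary storage, and the starting depth of 0 are all assumptions made for illustration, and the real BaseAgent methods may store or initialise this state differently.

```python
from typing import Dict


class DepthLadder:
    """Hypothetical stand-in for the per-context depth bookkeeping."""

    def __init__(self) -> None:
        # Maps each browser session's context ID to its current depth.
        self._depths: Dict[str, int] = {}

    def initialise(self, unique_context_id: str) -> None:
        # Assumption: a new context starts at depth 0; an existing
        # context keeps its current depth.
        self._depths.setdefault(unique_context_id, 0)

    def update(self, unique_context_id: str) -> int:
        # Make sure the context exists, then move one step deeper.
        self.initialise(unique_context_id)
        self._depths[unique_context_id] += 1
        return self._depths[unique_context_id]


if __name__ == "__main__":
    ladder = DepthLadder()
    ladder.initialise("window-a")
    print(ladder.update("window-a"))
    print(ladder.update("window-b"))
```

Keying the counter by context ID keeps parallel browser windows independent, which matches the role context_id plays in BFS mode.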

Action System 

PlaywrightActionPerformer 

Executes browser actions.

Data Structures 

PlaywrightAction 

The DSL for browser actions.

Bases: BaseModel

The BaseModel for playwright automations

Goal:: This contains an exhaustive list of commands that playwright can execute. It will be filled accordingly by the LLM depending on the DOM received from playwright and the goal of the task.

check: str | None

click: str | None

close_page: bool | None

dblclick: str | None

download_selector: str | None

dropdown_field_id: str | None

dropdown_field_value: str | None

evaluate_js: str | None

fill_selector: str | None

fill_value: str | None

go_back: bool | None

go_forward: bool | None

goto: str | None

hover: str | None

keyboard_press: str | None

keyboard_type: str | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

mouse_click_x: int | None

mouse_click_y: int | None

mouse_move_x: int | None

mouse_move_y: int | None

new_page: str | None

press_key: str | None

press_selector: str | None

reload: bool | None

right_click: str | None

screenshot_path: str | None

scroll_x: int | None

scroll_y: int | None

select_selector: str | None

select_value: str | None

switch_page_index: int | None

type_selector: str | None

type_text: str | None

uncheck: str | None

upload_path: str | None

upload_selector: str | None

wait_ms: int | None

wait_selector: str | None

wait_timeout: int | None

PlaywrightResponse 

Response format from the PlaywrightAgent.

class pyba.utils.structure.PlaywrightResponse(*, actions: List[PlaywrightAction], extract_info: bool | None)[source]

Bases: BaseModel

actions: List[PlaywrightAction]

extract_info: bool | None

model_config: ClassVar[ConfigDict] = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
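
To show how a response composes actions, here is a minimal sketch that uses plain dataclasses as stand-ins for the pydantic models, so it runs without pydantic installed. ActionSketch and ResponseSketch are hypothetical names; they mirror only a handful of the documented fields, and the None defaults on the action fields are an assumption. The helper populated_fields is likewise illustrative and not part of PyBA.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class ActionSketch:
    """Stand-in for PlaywrightAction with a small subset of its fields."""
    goto: Optional[str] = None
    click: Optional[str] = None
    fill_selector: Optional[str] = None
    fill_value: Optional[str] = None
    wait_ms: Optional[int] = None


@dataclass
class ResponseSketch:
    """Stand-in for PlaywrightResponse: a list of actions plus a flag."""
    actions: List[ActionSketch]
    extract_info: Optional[bool]


def populated_fields(action: ActionSketch) -> Dict[str, Any]:
    """Return only the fields that were actually set on an action."""
    return {
        f.name: getattr(action, f.name)
        for f in fields(action)
        if getattr(action, f.name) is not None
    }


# Each action sets only the fields relevant to one browser command.
response = ResponseSketch(
    actions=[
        ActionSketch(goto="https://example.com"),
        ActionSketch(fill_selector="#search", fill_value="pyba"),
        ActionSketch(click="#submit"),
    ],
    extract_info=False,
)

if __name__ == "__main__":
    for action in response.actions:
        print(populated_fields(action))
```

In this sketch each action leaves every unused field as None, so a consumer can tell which command was requested by looking at the populated fields.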

CleanedDOM 

Structured representation of page DOM.

class pyba.utils.structure.CleanedDOM(hyperlinks: List[str] | None = <factory>, input_fields: List[str] | None = <factory>, clickable_fields: List[str] | None = <factory>, actual_text: str | None = None, current_url: str | None = None, youtube: str | None = None)[source]
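
The signature above can be read as a dataclass whose list fields use default factories. The sketch below mirrors that shape under a hypothetical name, CleanedDOMSketch; it assumes the factory is list (the signature shows only <factory>) and is not the library's own definition.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CleanedDOMSketch:
    """Stand-in mirroring the documented CleanedDOM signature."""
    # Assumption: each <factory> default produces a fresh empty list.
    hyperlinks: Optional[List[str]] = field(default_factory=list)
    input_fields: Optional[List[str]] = field(default_factory=list)
    clickable_fields: Optional[List[str]] = field(default_factory=list)
    actual_text: Optional[str] = None
    current_url: Optional[str] = None
    youtube: Optional[str] = None


if __name__ == "__main__":
    dom = CleanedDOMSketch(
        hyperlinks=["https://example.com/about"],
        actual_text="Welcome",
        current_url="https://example.com",
    )
    # asdict gives a plain dictionary with the same keys as the fields.
    print(asdict(dom))
```

A default factory gives every instance its own list, so appending to one instance's hyperlinks does not leak into another.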