API Reference
This page documents the public API of PyBA. For internal architecture details, see Architecture & Code Walkthrough.
Entry Points
Engine
The main entry point for autonomous browser automation.
Step (Step-by-Step)
Entry point for interactive step-by-step mode. The user controls the browser one instruction at a time via start(), step(), and stop().
DFS (Depth-First Search)
Entry point for deep exploration mode.
BFS (Breadth-First Search)
Entry point for wide exploration mode.
Database
Database Configuration
Database Functions
Core Components
BaseEngine
The base class for all engine modes.
Provider
LLM provider selection and configuration.
Agents
PlaywrightAgent
The agent responsible for deciding browser actions.
- class pyba.core.agent.playwright_agent.PlaywrightAgent(engine)[source]
Bases: BaseAgent
Defines the Playwright agent’s actions.
- Provides two endpoints:
process_action: returns the appropriate action for a page
get_output: summarizes the chat and returns a string
- get_output(cleaned_dom: Dict[str, List | str], user_prompt: str, context_id: str = None) → str[source]
Gets the final output from the model if the user requested one.
- process_action(cleaned_dom: Dict[str, List | str], user_prompt: str, previous_action: str = None, fail_reason: str = None, extraction_format: BaseModel = None, context_id: str = None, action_status: bool = None) → PlaywrightResponse[source]
Method to process the DOM and provide an actionable playwright response
- Parameters:
cleaned_dom – Dictionary of the items extracted from the DOM: hyperlinks (List), input_fields (List, all fillable boxes), clickable_fields (List), actual_text (string)
user_prompt – The instructions given by the user
previous_action – The previously executed action
fail_reason – The reason the previous action failed (None if not provided, meaning the action passed)
extraction_format – The extraction format for the task
context_id – A unique identifier for this browser window (useful when there are multiple windows)
action_status – Whether the previous action succeeded or failed
- output:
A predefined pydantic model called PlaywrightResponse which defines our DSL
PlannerAgent
The agent for generating exploration plans (DFS/BFS).
- class pyba.core.agent.planner_agent.PlannerAgent(engine)[source]
Bases: BaseAgent
Planner agent for DFS and BFS modes in exploratory cases. It also inherits from the Retry class and supports all agents under LLM_factory.
- Parameters:
engine – Engine to hold all arguments provided by the user
Initialises the max_breadth for the maximum number of plans to generate for BFS mode
Note
context_id is not relevant here because this is a higher-level class
- generate(task: str, old_plan: str = None) → PlannerAgentOutputBFS | PlannerAgentOutputDFS[source]
Endpoint to generate the plan(s) depending on the set mode (the agent encodes the mode)
- Parameters:
task – The task provided by the user
old_plan – The previous plan if using DFS mode
- Function:
Takes in the user prompt which serves as the task for the model to perform
Depending on DFS or BFS mode generates plan(s)
BaseAgent
Base class for all agents with retry logic.
- class pyba.core.agent.base_agent.BaseAgent(engine)[source]
Bases: object
The base class for all agents; defines their common methods.
Also contains methods for exponential backoff and retry.
Note: this backoff and retry blocks that specific context.
Defines the following variables:
exponential_base: 2 (we’re using base 2)
base_timeout: 1 second
max_backoff_time: 60 seconds
attempt_number: The current attempt number, initialised to 1
LLMFactory: The internal agent call is made by the agent itself
log: The logger for the agents
- calculate_next_time(attempt_number)[source]
Function to calculate the next wait time in seconds
- Parameters:
attempt_number – The number of failed attempts
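As a rough illustration of how a wait time could be derived from the variables listed for BaseAgent, the sketch below assumes the common formula base_timeout * exponential_base ** (attempt_number - 1), capped at max_backoff_time. The actual formula used by calculate_next_time (and whether it adds jitter) is not documented on this page, so this is an assumption; next_wait_seconds is a hypothetical stand-in name, not part of PyBA.

```python
# Hypothetical stand-in for BaseAgent.calculate_next_time; the formula is an assumption.
EXPONENTIAL_BASE = 2    # documented value: base 2
BASE_TIMEOUT = 1        # documented value: 1 second
MAX_BACKOFF_TIME = 60   # documented value: 60 seconds


def next_wait_seconds(attempt_number: int) -> int:
    """Return the capped exponential wait time for a 1-based attempt number."""
    if attempt_number < 1:
        raise ValueError("attempt_number must be >= 1")
    wait = BASE_TIMEOUT * EXPONENTIAL_BASE ** (attempt_number - 1)
    # Never wait longer than the configured ceiling.
    return min(wait, MAX_BACKOFF_TIME)


if __name__ == "__main__":
    for attempt in range(1, 9):
        print(attempt, next_wait_seconds(attempt))
```

Under these assumptions the wait doubles on each failed attempt until it reaches the 60-second ceiling, after which it stays constant.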
- handle_gemini_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle Gemini’s execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent
- Return type:
response
- handle_openai_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle OpenAI execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent
- Return type:
response
- handle_vertexai_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle VertexAI execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent
- Return type:
response
Action System
PlaywrightActionPerformer
Executes browser actions.
Data Structures
PlaywrightAction
The DSL for browser actions.
- class pyba.utils.structure.PlaywrightAction(*, goto: str | None = None, go_back: bool | None = None, go_forward: bool | None = None, reload: bool | None = None, click: str | None = None, dblclick: str | None = None, hover: str | None = None, right_click: str | None = None, dropdown_field_id: str | None = None, dropdown_field_value: str | None = None, fill_selector: str | None = None, fill_value: str | None = None, type_selector: str | None = None, type_text: str | None = None, press_selector: str | None = None, press_key: str | None = None, check: str | None = None, uncheck: str | None = None, select_selector: str | None = None, select_value: str | None = None, upload_selector: str | None = None, upload_path: str | None = None, scroll_x: int | None = None, scroll_y: int | None = None, wait_selector: str | None = None, wait_timeout: int | None = None, wait_ms: int | None = None, keyboard_press: str | None = None, keyboard_type: str | None = None, mouse_move_x: int | None = None, mouse_move_y: int | None = None, mouse_click_x: int | None = None, mouse_click_y: int | None = None, new_page: str | None = None, close_page: bool | None = None, switch_page_index: int | None = None, evaluate_js: str | None = None, screenshot_path: str | None = None, download_selector: str | None = None)[source]
Bases: BaseModel
The BaseModel for Playwright automations
- Goal:
This contains an exhaustive list of commands that Playwright can execute. The LLM fills in the relevant fields depending on the DOM received from Playwright and the goal of the task.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
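Because every PlaywrightAction field is optional and defaults to None, an action is expressed by setting only the fields it needs. The sketch below illustrates that idea with a plain dataclass stand-in that mirrors a handful of the documented field names; it is not the real Pydantic model (which is keyword-only), and ActionSketch and set_fields are hypothetical names introduced for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ActionSketch:
    """Dataclass stand-in mirroring a few documented PlaywrightAction fields."""
    goto: Optional[str] = None
    click: Optional[str] = None
    fill_selector: Optional[str] = None
    fill_value: Optional[str] = None
    press_key: Optional[str] = None


def set_fields(action: ActionSketch) -> Dict[str, Any]:
    """Hypothetical helper: keep only the fields that were actually set."""
    return {name: value for name, value in asdict(action).items() if value is not None}


# A "fill" action sets the selector/value pair and leaves everything else as None.
fill_search = ActionSketch(fill_selector="#search", fill_value="pyba")
print(set_fields(fill_search))
```

Filtering out the None fields this way is one possible approach for a consumer to work out which command an action encodes; how PyBA itself dispatches actions is not described on this page.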
PlaywrightResponse
Response format from the PlaywrightAgent.
- class pyba.utils.structure.PlaywrightResponse(*, actions: List[PlaywrightAction], extract_info: bool | None)[source]
Bases: BaseModel
- actions: List[PlaywrightAction]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
CleanedDOM
Structured representation of page DOM.
- class pyba.utils.structure.CleanedDOM(hyperlinks: List[str] | None = <factory>, input_fields: List[str] | None = <factory>, clickable_fields: List[str] | None = <factory>, actual_text: str | None = None, current_url: str | None = None, youtube: str | None = None)[source]
Bases: object
Represents the cleaned DOM snapshot of the current browser page.
Includes an additional parameter, youtube, for YouTube DOM extraction.
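The `<factory>` defaults in the signature suggest a dataclass whose list fields are created fresh for each instance. The sketch below reproduces the documented fields in a stand-in class to show that shape; CleanedDOMSketch is a hypothetical name, and the assumption that the real class is a standard dataclass is inferred from the signature rather than confirmed by this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CleanedDOMSketch:
    """Stand-in with the field names and defaults shown in the CleanedDOM signature."""
    hyperlinks: Optional[List[str]] = field(default_factory=list)
    input_fields: Optional[List[str]] = field(default_factory=list)
    clickable_fields: Optional[List[str]] = field(default_factory=list)
    actual_text: Optional[str] = None
    current_url: Optional[str] = None
    youtube: Optional[str] = None


# Each instance gets its own empty lists, so appending to one snapshot
# does not leak into another.
first = CleanedDOMSketch()
second = CleanedDOMSketch(current_url="https://example.com")
first.hyperlinks.append("/docs")
print(first.hyperlinks, second.hyperlinks)
```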
Login Handlers
BaseLogin
Base class for automated login handlers.
Code Generation
CodeGeneration
Generates standalone Playwright scripts.
Dependencies
HandleDependencies
Manages Playwright browser installation.
Exceptions
- exception pyba.utils.exceptions.CredentialsnotSpecified(site_name: str)[source]
Bases: Exception
Exception raised in the login scripts when the relevant credentials haven’t been specified
- exception pyba.utils.exceptions.DatabaseNotInitialised[source]
Bases: Exception
Exception raised when the user asks for automation code generation but has not initialised the database
- exception pyba.utils.exceptions.IncorrectMode(mode: str)[source]
Bases: Exception
Exception raised when the mode specified by the user is incorrect
- exception pyba.utils.exceptions.InvalidModelSelected(model_name: str, provider: str, provider_valid_models: list)[source]
Bases: Exception
Exception raised when the model chosen by the user doesn’t belong to the provider whose keys are specified
- exception pyba.utils.exceptions.PromptNotPresent[source]
Bases: Exception
This exception is raised when the user forgets to enter a prompt to the engine
- exception pyba.utils.exceptions.ServerLocationUndefined(server_location)[source]
Bases: Exception
This exception is raised when the user doesn’t define the server location for a VertexAI project.
- exception pyba.utils.exceptions.ServiceNotSelected[source]
Bases: Exception
This exception is raised when the user doesn’t set an API key in the engine
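To show how an exception with a constructor like InvalidModelSelected(model_name, provider, provider_valid_models) might be raised and handled, here is a self-contained sketch. It does not import PyBA: InvalidModelSelectedSketch is a stand-in that copies only the documented constructor signature, while the message text, the stored attributes, the check_model helper, and the provider and model names are all assumptions made for illustration.

```python
from typing import List


class InvalidModelSelectedSketch(Exception):
    """Stand-in with the documented constructor signature; the message text is assumed."""

    def __init__(self, model_name: str, provider: str, provider_valid_models: List[str]):
        self.model_name = model_name
        self.provider = provider
        self.provider_valid_models = provider_valid_models
        super().__init__(
            f"Model {model_name!r} is not available for provider {provider!r}; "
            f"valid models: {', '.join(provider_valid_models)}"
        )


def check_model(model_name: str, provider: str, valid_models: List[str]) -> str:
    """Hypothetical validation helper: return the model name or raise."""
    if model_name not in valid_models:
        raise InvalidModelSelectedSketch(model_name, provider, valid_models)
    return model_name


try:
    check_model("model-x", "example-provider", ["model-a", "model-b"])
except InvalidModelSelectedSketch as exc:
    # The caller can report the problem or fall back to a valid model.
    print(exc)
```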