API Reference
This page documents the public API of PyBA. For internal architecture details, see Architecture & Code Walkthrough.
Entry Points
Engine
The main entry point for autonomous browser automation.
- class pyba.core.main.Engine(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, enable_tracing: bool = True, trace_save_directory: str = None, max_depth: int = 100, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]
Bases: BaseEngine
The main entrypoint for browser automation. This engine exposes the main entry point, which is the run() method.
- Parameters:
openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
max_depth – The maximum number of actions that you want the model to execute
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
low_memory – Optional parameter, defaults to False. When True, disables some heavy dependencies and runs with additional flags.
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets
Find these default values at pyba/config.yaml.
The Engine inherits from the BaseEngine. The BaseEngine handles the methods common to all the modes (default, DFS and BFS). The main Engine decides whether execution needs to be passed to a different mode, depending on what the user has set.
- async run(prompt: str = None, automated_login_sites: List[str] = None, extraction_format: BaseModel = None)[source]
The most basic implementation for the run function
- Parameters:
prompt – The user’s instructions. This is a well defined instruction.
automated_login_sites – A list of sites that you want the model to automatically login to using env credentials
extraction_format – A pydantic BaseModel which defines the extraction format for any data extraction
Note:
The extraction_format will be decided based on every action. For example:
```python3
from typing import Optional

from pydantic import BaseModel
from pyba import Engine

task = "Go to hackernews. For each post, extract the title, number of upvotes and comments, and the description too"

class Output(BaseModel):
    # Using Optional (with a None default) is a good idea in case the things you're looking for don't exist
    title: Optional[str] = None
    num_upvotes: Optional[int] = None
    num_comments: Optional[int] = None
    desc: Optional[str] = None

engine = Engine(**kwargs)  # kwargs: your own Engine configuration
await engine.run(task, extraction_format=Output)
```
With an extraction_format set, data is returned during the execution rather than only once the run finishes. It is also written to the database, and the engine decides on a per-action basis whether data needs to be extracted.
Using this feature will NOT cost you any more tokens than usual.
Step (Step-by-Step)
Entry point for interactive step-by-step mode. The user controls the browser one instruction at a time via start(), step(), and stop().
- class pyba.core.lib.mode.step.Step(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, enable_tracing: bool = True, trace_save_directory: str = None, max_actions_per_step: int = 5, database: Database = None, get_output: bool = False, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]
Bases: BaseEngine
Step-by-step browser automation. The user controls the loop externally by calling start(), step(), and stop().
- Parameters:
openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
use_random – Enables mouse and scroll randomisations to evade bot detection
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
get_output – When True, asks the model for a summarised output when a step completes. When False (default), step() silently returns None on completion
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets
- get_step_screenshots() → List[bytes][source]
Returns the screenshots captured during the most recent step() call. Each entry is a PNG image in bytes.
- async start(automated_login_sites: List[str] = None)[source]
Creates a persistent browser instance. This needs to be explicitly called by the user when using the Step mode. This handles the automated login for us as well.
- async step(prompt_step: str, extraction_format: BaseModel = None) → str | None[source]
The step function is a replica of the Engine.run(). It passes the full action history into context and tries to figure out the best way to achieve the short term prompt given by the user.
- Parameters:
prompt_step – A single stepwise prompt given by the user (this may require more than one action)
extraction_format – The final extraction format IF NEEDED
For every step() call, we create a StepRunContext() with a unique ID. This ID can be used to cancel this particular step. For reference, please see structure.py.
DFS (Depth-First Search)
Entry point for deep exploration mode.
- class pyba.core.lib.mode.DFS.DFS(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_random: bool = False, use_logger: bool = False, max_depth: int = 5, max_breadth: int = 5, enable_tracing: bool = True, trace_save_directory: str = None, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]
Bases: BaseEngine
Methods for handling DFS exploratory searches. The BaseEngine initialises the provider and with that the playwright action and output agents.
This is another entry point engine and can be directly imported by the user.
The following params are defined:
- Parameters:
openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
max_depth – The maximum depth to go into for each plan, where each level of depth corresponds to an action
max_breadth – The number of plans to execute one by one in depth
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets
Find these default values at pyba/config.yaml.
- async run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → str | None[source]
Run pyba in DFS mode.
- Parameters:
prompt – The task assigned to DFS by the user
automated_login_sites – Login site name for pre-written scripts to run
extraction_format – A pydantic BaseModel which defines the extraction format for any data extraction
The task is fed into the planner to get a plan which is then passed to the action models to fetch an actionable element.
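As a rough illustration of how max_breadth and max_depth might bound a DFS run, the sketch below loops over plans and actions. The make_plan and next_action callables are hypothetical stand-ins, not part of PyBA; the real planner and action agents call an LLM and drive a browser.

```python
from typing import Callable, List, Optional


def dfs_explore(
    task: str,
    make_plan: Callable[[str, Optional[str]], str],
    next_action: Callable[[str, List[str]], Optional[str]],
    max_depth: int = 5,
    max_breadth: int = 5,
) -> List[List[str]]:
    """Run up to max_breadth plans, each for at most max_depth actions."""
    runs: List[List[str]] = []
    old_plan: Optional[str] = None
    for _ in range(max_breadth):
        # Each new plan is asked to diverge from the previous one.
        plan = make_plan(task, old_plan)
        history: List[str] = []
        for _ in range(max_depth):
            action = next_action(plan, history)
            if action is None:  # the stand-in agent signals completion
                break
            history.append(action)
        runs.append(history)
        old_plan = plan
    return runs


# Stand-in planner and agent, used only to exercise the loop.
def fake_plan(task: str, old_plan: Optional[str]) -> str:
    return f"plan after {old_plan!r}"


def fake_action(plan: str, history: List[str]) -> Optional[str]:
    return None if len(history) >= 3 else f"step {len(history) + 1}"


result = dfs_explore("find docs", fake_plan, fake_action, max_depth=2, max_breadth=2)
```

Here each plan is cut off after two actions because max_depth is 2, even though the stand-in agent would otherwise take three.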
BFS (Breadth-First Search)
Entry point for wide exploration mode.
- class pyba.core.lib.mode.BFS.BFS(openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, headless: bool = False, handle_dependencies: bool = False, use_logger: bool = False, max_depth: int = 5, max_breadth: int = 5, enable_tracing: bool = True, trace_save_directory: str = None, database: Database = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]
Bases: BaseEngine
Methods for handling BFS exploratory searches. The BaseEngine initialises the provider and with that the playwright action and output agents.
This is another entry point engine and can be directly imported by the user.
The following params are defined:
- Parameters:
openai_api_key – API key for OpenAI models should you want to use that
vertexai_project_id – Create a VertexAI project to use that instead of OpenAI
vertexai_server_location – VertexAI server location
gemini_api_key – API key for Gemini-2.5-pro native support without VertexAI
headless – Choose if you want to run in the headless mode or not
handle_dependencies – Choose if you want to automatically install dependencies during runtime
use_logger – Choose if you want to use the logger (that is enable logging of data)
max_depth – The maximum depth to go into for each plan, where each level of depth corresponds to an action
max_breadth – The number of plans to execute one by one in depth
enable_tracing – Choose if you want to enable tracing. This will create a .zip file which you can use in traceviewer
trace_save_directory – The directory where you want the .zip file to be saved
database – An instance of the Database class which will define all database specific configs
model_name – The model name which you want to run. The default is set to None (because it depends on the provider).
secrets – A password manager class which implements a resolve() method to give out a dictionary of secrets
Find these default values at pyba/config.yaml.
- async run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None) → List[source]
The async run function
- Parameters:
prompt – The prompt which needs to be converted to plans
automated_login_sites – List of names for which sites to login automatically
extraction_format – The extraction format for any extraction that needs to be done
- Returns:
List
- sync_run(prompt: str, automated_login_sites: List[str] = None, extraction_format: BaseModel = None)[source]
Synchronous endpoint for running BFS mode.
- Parameters:
prompt – The prompt which needs to be converted to plans
automated_login_sites – List of names for which sites to login automatically
extraction_format – The extraction format for any extraction that needs to be done
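A synchronous endpoint over an async one is commonly built with asyncio.run. The sketch below shows that pattern on a hypothetical ExampleMode class; it is not PyBA's implementation, and the run() body is a placeholder.

```python
import asyncio
from typing import List


class ExampleMode:
    """Hypothetical stand-in for a mode with async and sync entry points."""

    async def run(self, prompt: str) -> List[str]:
        await asyncio.sleep(0)  # placeholder for real browser work
        return [f"result for: {prompt}"]

    def sync_run(self, prompt: str) -> List[str]:
        # asyncio.run creates an event loop, runs the coroutine, then closes the loop.
        return asyncio.run(self.run(prompt))


output = ExampleMode().sync_run("compare three laptops")
```

Note that asyncio.run cannot be called from code that is already running inside an event loop, so a wrapper of this kind is meant for plain synchronous scripts.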
Database
Database Configuration
- class pyba.database.database.Database(engine: Literal['sqlite', 'postgres', 'mysql'], name: str = None, host: str = None, port: int = None, username: str = None, password: str = None, ssl_mode: Literal['disable', 'require'] = None)[source]
Bases: object
Client-side database interface that minimizes config usage.
- build_connection_string(engine_name: Literal['sqlite', 'postgres', 'mysql']) → str[source]
Builds connection URLs for different database engines for SQLAlchemy usage.
- Parameters:
engine_name – The database engine name for initialization.
- Returns:
Connection string for SQLAlchemy.
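For orientation, SQLAlchemy connection URLs for these three engines generally look like the ones produced below. This is a minimal sketch: the build_url helper is hypothetical, and the driver names (psycopg2, pymysql) are assumptions rather than what PyBA necessarily uses.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_url(engine: str, name: str, host: Optional[str] = None,
              port: Optional[int] = None, username: Optional[str] = None,
              password: Optional[str] = None, ssl_mode: Optional[str] = None) -> str:
    """Return a SQLAlchemy-style URL (driver names are assumptions)."""
    if engine == "sqlite":
        # Three slashes followed by a relative file path.
        return f"sqlite:///{name}"
    drivers = {"postgres": "postgresql+psycopg2", "mysql": "mysql+pymysql"}
    if engine not in drivers:
        raise ValueError(f"unsupported engine: {engine}")
    # Percent-encode credentials so characters like '@' do not break the URL.
    auth = f"{quote_plus(username or '')}:{quote_plus(password or '')}"
    url = f"{drivers[engine]}://{auth}@{host}:{port}/{name}"
    if engine == "postgres" and ssl_mode:
        url += f"?sslmode={ssl_mode}"
    return url


sqlite_url = build_url("sqlite", "pyba.db")
pg_url = build_url("postgres", "pyba", "localhost", 5432, "user", "p@ss", "require")
```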
Database Functions
- class pyba.database.db_funcs.DatabaseFunctions(database: Database)[source]
Bases: object
Composition class for database operations.
- get_all_bfs_contexts_by_session(session_id: str) → List[BFSEpisodicMemory] | None[source]
Retrieves all BFS context records for a given session.
- Parameters:
session_id – The parent session ID to query for.
- Returns:
A list of BFSEpisodicMemory objects for all contexts in the session, or None if no records found or error occurred.
- get_bfs_episodic_memory_by_context(session_id: str, context_id: str) → BFSEpisodicMemory | None[source]
Retrieves a specific BFS context’s episodic memory. Needs both the session_id and the context_id to retrieve the correct record.
- Parameters:
session_id – The parent session ID.
context_id – The specific context ID to retrieve.
- Returns:
A BFSEpisodicMemory object if found, else None
- get_episodic_memory_by_session_id(session_id: str) → EpisodicMemory | None[source]
Retrieves an episodic memory record by its session_id.
- Parameters:
session_id – The unique session ID to query for.
- Returns:
An EpisodicMemory object if found, else None.
- get_semantic_memory_by_session_id(session_id: str) → SemanticMemory | None[source]
Retrieves semantic memory from the database.
- Parameters:
session_id – The unique session ID to query for.
- Returns:
A SemanticMemory object if found, else None.
- push_to_bfs_episodic_memory(session_id: str, context_id: str, action: str, page_url: str) → bool[source]
Pushes a new action and page_url for a specific BFS context. Creates a new record if the (session_id, context_id) pair doesn’t exist, otherwise appends to the existing record.
Note: This function uses a composite primary key of (session_id, context_id) to allow multiple browser windows per session.
- Parameters:
session_id – The parent session ID for the BFS run.
context_id – The unique context ID for this browser window.
action – The action string to be pushed.
page_url – The page URL string to be pushed.
- Returns:
True if the operation was successful, otherwise False.
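The create-or-append behaviour on a composite (session_id, context_id) key can be pictured with the standard-library sqlite3 module. The table name, column names, and JSON-list storage below are assumptions made for this sketch and may differ from PyBA's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bfs_memory (
           session_id TEXT, context_id TEXT, actions TEXT, page_urls TEXT,
           PRIMARY KEY (session_id, context_id))"""
)


def push(session_id: str, context_id: str, action: str, page_url: str) -> bool:
    """Create the row for this window, or append to its JSON lists."""
    try:
        row = conn.execute(
            "SELECT actions, page_urls FROM bfs_memory "
            "WHERE session_id = ? AND context_id = ?",
            (session_id, context_id),
        ).fetchone()
        actions = json.loads(row[0]) if row else []
        urls = json.loads(row[1]) if row else []
        actions.append(action)
        urls.append(page_url)
        # INSERT OR REPLACE overwrites the row that has the same composite key.
        conn.execute(
            "INSERT OR REPLACE INTO bfs_memory VALUES (?, ?, ?, ?)",
            (session_id, context_id, json.dumps(actions), json.dumps(urls)),
        )
        conn.commit()
        return True
    except sqlite3.Error:
        return False


push("s1", "window-a", "click #login", "https://example.com")
push("s1", "window-a", "fill #user", "https://example.com/login")
push("s1", "window-b", "goto /docs", "https://example.com/docs")
rows = conn.execute(
    "SELECT context_id, actions FROM bfs_memory ORDER BY context_id"
).fetchall()
```

Two windows in the same session end up as two rows, each holding its own action list.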
- push_to_episodic_memory(session_id: str, action: str, page_url: str, action_status: bool, fail_reason: str = None) → bool[source]
Pushes a new action and page_url onto the stack for a given session_id. It retrieves the existing record, appends the new values as JSON strings, and updates/inserts the record.
- Parameters:
session_id – The unique session ID.
action – The action string to be pushed.
page_url – The page URL string to be pushed.
action_status – The success or failure of the current action (True for success, False for failure).
fail_reason – A string describing why a particular action failed (defaults to None on success).
- Returns:
True if the operation was successful, otherwise False.
Core Components
BaseEngine
The base class for all engine modes.
- class pyba.core.lib.mode.base.BaseEngine(headless: bool = True, enable_tracing: bool = True, trace_save_directory: str = None, database=None, use_random=None, use_logger: bool = None, mode: Literal['DFS', 'BFS', 'Normal', 'STEP'] = None, handle_dependencies: bool = False, openai_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, gemini_api_key: str = None, model_name: str = None, low_memory: bool = False, secrets: PasswordManager = None, enable_screenshots: bool = False, screenshot_directory: str = None)[source]
Bases: object
A reusable base class that encapsulates the shared browser lifecycle, tracing, DOM extraction, and utility helpers.
The following will be initialised by the BaseEngine:
db_funcs: Initializes the database functions to be used for inserting and querying logs
mode: The mode of operation (DFS, BFS, Normal or STEP); read the relevant documentation at pyba.readthedocs.io
provider_instance: This will detect the provider you’re using: OpenAI, VertexAI or Gemini
playwright_agent: The actual playwright agent setup via the provider
secrets_manager: The secrets manager provided by the user, it must have a resolve() method
- async attempt_login(page=None) → bool[source]
Helper function to attempt and perform a login to chosen sites. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.
- Parameters:
page – Optional argument to pin the page for removing self dependency
- Returns:
A boolean to indicate the success or failure for the attempt
- Return type:
flag
The login attempt may fail due to two reasons:
The current page is not a login page
Some selectors changed due to which the login engine returned None
Note that the LoginEngines are hardcoded engines for speed.
- async extract_dom(page=None)[source]
Extracts the relevant fields from the DOM of the current page and returns the DOM dataclass. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.
- Parameters:
page – Optional argument to pin the page for removing self dependency
- fetch_action(cleaned_dom: Dict, user_prompt: str, action_history: str = None, extraction_format: BaseModel = None, context_id: str = None, fail_reason: str = None, action_status: bool = None)[source]
Helper function to fetch an actionable PlaywrightResponse element
- Parameters:
cleaned_dom – The DOM for the current page
user_prompt – The actual task given by the user
action_history – The full natural language history of actions taken so far
extraction_format – The extraction format requested by the user.
context_id – A unique identifier for this browser window (useful when multiple windows)
fail_reason – The reason for the failure of the previous action
action_status – A boolean to indicate if the previous action was successful or not
For an explanation of the extraction_format read the main file documentation.
- Returns:
An actionable playwrightresponse element
- Return type:
action
- generate_code(output_path: str) → bool[source]
Function end-point for code generation
- Parameters:
output_path – output file path to save the generated code to
- async generate_output(action: str, cleaned_dom: CleanedDOM, prompt: str)[source]
Helper function to generate the output if the action has been completed.
- Parameters:
action – The action as given out by the model
cleaned_dom – The latest cleaned_dom for the model to read
prompt – The prompt which was given to the model
- get_screenshots() → List[bytes][source]
Returns the list of screenshot bytes captured so far. Each entry is a PNG image in bytes, ordered by capture time.
If a screenshot_directory was specified, this returns an empty list since images are saved to disk instead.
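Since each entry is raw PNG data, the list can be written to disk with nothing beyond the standard library. The save_screenshots helper and the step_NNN.png naming below are hypothetical, and the fake payloads only stand in for what get_screenshots() would return.

```python
import tempfile
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def save_screenshots(shots: List[bytes], directory: str) -> List[Path]:
    """Write each PNG byte string to directory as step_000.png, step_001.png, ..."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, data in enumerate(shots):
        if not data.startswith(PNG_SIGNATURE):
            continue  # skip anything that is not PNG data
        path = out_dir / f"step_{index:03d}.png"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Fake payloads stand in for the list returned by get_screenshots().
fake_shots = [PNG_SIGNATURE + b"first", b"not a png", PNG_SIGNATURE + b"third"]
with tempfile.TemporaryDirectory() as tmp:
    saved_names = [p.name for p in save_screenshots(fake_shots, tmp)]
```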
- async get_trace_context(browser_instance=None)[source]
Initialises the browser context with tracing configuration. Accepts an optional browser instance to support BFS mode.
- Parameters:
browser_instance – Optional argument to pin the browser session down
- Returns:
The Playwright browser context to be used for automation
- Return type:
context
- async retry_perform_action(cleaned_dom: Dict, prompt: str, action_history: str, action_status: bool, fail_reason: str, extraction_format: BaseModel = None, page=None, mem=None) → str | None[source]
Helper function to retry the action after a failure. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.
- Parameters:
cleaned_dom – The new cleaned DOM for the current page
prompt – The original prompt given by the user
action_history – The full natural language history of actions taken so far
action_status – Boolean indicating the previous action’s success or failure
fail_reason – Reason for the failure for the action
extraction_format – In case the current page needs extraction as well
page – Optional argument to pin the page down to remove self dependency
mem – Optional MemDSL instance (BFS passes its per-window instance)
This function will retry the action based on the current DOM and the past action. This should fix most failures caused by a stale element or a hallucinated component.
- Returns:
The output if the action was successful and automation is completed, or None in the usual case where an action is performed
- Return type:
output
- async save_trace(context=None)[source]
Saves the trace if tracing is enabled. Accepts an optional context to support BFS mode where multiple browser contexts exist.
- Parameters:
context – Optional argument to pin the browser context down
- static set_secrets(secrets: Dict[str, str])[source]
Method to set the environment for the browser using the secrets manager provided by the user.
Note: This relies on the secret manager class implementing a “resolve() -> dict[str, str]” method.
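A secrets manager that satisfies this contract only needs a resolve() method returning a dictionary of strings. The sketch below is one hypothetical shape: the StaticPasswordManager class, the export_secrets helper, and the variable name are illustrative assumptions, not PyBA's own classes.

```python
import os
from typing import Dict


class StaticPasswordManager:
    """Hypothetical secrets manager backed by an in-memory dictionary."""

    def __init__(self, secrets: Dict[str, str]):
        self._secrets = dict(secrets)

    def resolve(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the stored secrets.
        return dict(self._secrets)


def export_secrets(secrets: Dict[str, str]) -> None:
    """Copy every resolved secret into the process environment."""
    os.environ.update(secrets)


manager = StaticPasswordManager({"EXAMPLE_SITE_USERNAME": "alice"})
export_secrets(manager.resolve())
```

A real manager would more likely fetch values from a vault or keyring inside resolve(); the dictionary here just keeps the example self-contained.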
- async shut_down(context=None, browser=None)[source]
Closes the browser context and browser instance. Accepts optional arguments to support BFS mode where multiple browsers exist.
- Parameters:
context – Optional browser context to close
browser – Optional argument to pin the browser instance down
- async successful_login_clean_and_get_dom(page=None)[source]
Helper function to obtain the cleaned_dom after a successful login. This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.
- Parameters:
page – Optional argument to pin the page for removing self dependency
Functionality:
Cleans the automated_login_engine_classes list (that is, we’re assuming only 1 login session for each run)
Gets the latest page contents and parses the DOM using the extraction engine
- async wait_till_loaded(page=None)[source]
Helper function to wait till load state while applying random jitters (if specified by the user). This is backwards compatible with Engine and DFS while it supports BFS by pinning the page down.
- Parameters:
page – Optional argument to pin the page for removing self dependency
Provider
LLM provider selection and configuration.
- class pyba.core.provider.Provider(openai_api_key: str = None, gemini_api_key: str = None, vertexai_project_id: str = None, vertexai_server_location: str = None, model_name: str = None)[source]
Bases: object
Class to handle the provider instances.
- handle_keys()[source]
Handles provider selection, defaults to openai when multiple providers conflict
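One way to picture this selection rule is the sketch below. The pick_provider function is hypothetical; in particular, requiring both VertexAI values and falling back to the first configured provider when OpenAI is absent are assumptions, since the text above only states that OpenAI wins a conflict.

```python
from typing import Optional


def pick_provider(openai_api_key: Optional[str] = None,
                  gemini_api_key: Optional[str] = None,
                  vertexai_project_id: Optional[str] = None,
                  vertexai_server_location: Optional[str] = None) -> str:
    """Return the provider name implied by whichever credentials were supplied."""
    candidates = []
    if openai_api_key:
        candidates.append("openai")
    if vertexai_project_id and vertexai_server_location:
        candidates.append("vertexai")
    if gemini_api_key:
        candidates.append("gemini")
    if not candidates:
        raise ValueError("no provider credentials supplied")
    # When several providers are configured, prefer OpenAI.
    return "openai" if "openai" in candidates else candidates[0]


choice = pick_provider(openai_api_key="key-a", gemini_api_key="key-b")
```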
- handle_model(provider: str)[source]
Helper function that manages model selection based on the keys chosen.
Note
The default models in config will be used if model name is not provided by the user. The list of valid model names will be present in the config file as well.
- Parameters:
provider – The name of the provider in question
Agents
PlaywrightAgent
The agent responsible for deciding browser actions.
- class pyba.core.agent.playwright_agent.PlaywrightAgent(engine)[source]
Bases: BaseAgent
Defines the playwright agent’s actions
- Provides two endpoints:
process_action: for returning the right action on a page
get_output: for summarizing the chat and returning a string
- get_output(cleaned_dom: Dict[str, List | str], user_prompt: str, context_id: str = None) → str[source]
Gets the final text output from the model based on the current page state.
- process_action(cleaned_dom: Dict[str, List | str], user_prompt: str, action_history: str = None, fail_reason: str = None, extraction_format: BaseModel = None, context_id: str = None, action_status: bool = None) → PlaywrightResponse[source]
Processes the current DOM and returns a PlaywrightResponse containing the next action(s) to execute.
- Parameters:
cleaned_dom – Dictionary of extracted DOM elements (hyperlinks, input_fields, clickable_fields, actual_text).
user_prompt – The user’s task instruction.
action_history – The full natural language history of actions taken so far.
fail_reason – Reason the previous action failed, if applicable.
extraction_format – Pydantic model defining the extraction output schema.
context_id – Unique identifier for this browser window (used in BFS mode).
action_status – Whether the previous action succeeded.
- Returns:
A PlaywrightResponse holding the next PlaywrightAction(s) to execute, or None if the task is complete.
PlannerAgent
The agent for generating exploration plans (DFS/BFS).
- class pyba.core.agent.planner_agent.PlannerAgent(engine)[source]
Bases: BaseAgent
Planner agent for DFS and BFS exploration modes. Generates execution plans that are then carried out by the action agent.
- Parameters:
engine – Engine instance holding all user-provided configuration.
- generate(task: str, old_plan: str = None) → PlannerAgentOutputBFS | PlannerAgentOutputDFS[source]
Generates exploration plan(s) based on the current mode.
- Parameters:
task – The user’s exploratory task.
old_plan – The previous plan to diverge from (DFS mode only).
- Returns:
A plan string (DFS) or list of plan strings (BFS).
BaseAgent
Base class for all agents with retry logic.
- class pyba.core.agent.base_agent.BaseAgent(engine)[source]
Bases: object
Base class for all agents. Provides LLM execution with exponential backoff and retry logic. The backoff is blocking per context to avoid overwhelming rate-limited APIs.
Defines the following variables:
exponential_base: 2 (we’re using base 2)
base_timeout: 1 second
max_backoff_time: 60 seconds
attempt_number: The current attempt number initialised to 1
LLMFactory: The internal agent call is made by agent itself
log: The logger for the agents
- calculate_next_time(attempt_number)[source]
Calculates the next backoff wait time in seconds using exponential backoff with jitter.
- Parameters:
attempt_number – The number of consecutive failed attempts.
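Using the values listed above (base 2, a 1 second base timeout, a 60 second cap), exponential backoff with jitter can be sketched as follows. The next_wait function and the exact jitter formula (up to one extra second, applied before the cap) are assumptions for illustration, not the library's code.

```python
import random

EXPONENTIAL_BASE = 2
BASE_TIMEOUT = 1.0       # seconds
MAX_BACKOFF_TIME = 60.0  # seconds


def next_wait(attempt_number: int) -> float:
    """Return the wait in seconds before the next retry."""
    # Attempt 1 waits about 1 s, attempt 2 about 2 s, attempt 3 about 4 s, ...
    delay = BASE_TIMEOUT * EXPONENTIAL_BASE ** (attempt_number - 1)
    # Jitter spreads retries out so parallel callers do not retry in lockstep.
    jitter = random.uniform(0, 1)
    return min(delay + jitter, MAX_BACKOFF_TIME)


waits = [next_wait(n) for n in range(1, 9)]
```

From the seventh attempt onward the uncapped delay would exceed 60 seconds, so the cap takes over.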
- handle_gemini_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle Gemini execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent.
- Return type:
response
- handle_openai_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle OpenAI execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent.
- Return type:
response
- handle_vertexai_execution(agent: Any, prompt: str, context_id: str = None)[source]
Helper method to handle VertexAI execution
- Parameters:
agent – The agent to use (action_agent or output_agent)
prompt – The fully formatted prompt string
context_id – A unique identifier for the current browser window
The context_id is to help in differentiating between different browser windows during parallel execution for BFS mode.
`context_id`=None => There is only one browser session.
- Returns:
The raw response from the model. The exact required values are expected to be extracted within each agent.
- Return type:
response
Action System
PlaywrightActionPerformer
Executes browser actions.
- class pyba.core.lib.action.PlaywrightActionPerformer(page: Page, action: PlaywrightAction)[source]
Bases: object
The playwright automation class. To add new handles, make a function here and define that under perform()
Below is an exhaustive set of playwright actions that the handler will manage and the dispatcher will execute
- Navigation functions
handle_navigation
handle_back
handle_forward
handle_reload
- Interaction functions
handle_input
handle_typing
handle_click
handle_double_click
handle_hover
handle_checkboxes
handle_select
handle_file_upload
- Keyboard/mouse functions
handle_press
handle_keyboard_press
handle_keyboard_type
handle_mouse_move
handle_mouse_click
- Scrolling
handle_scrolling
- Waits
handle_wait
- Javascript functions
handle_evaluate_js
handle_screenshot
handle_download
- New pages
handle_switch_page
handle_new_page
handle_close_page
- async handle_click()[source]
Handles clicking elements. Has additional checks to ensure that the element is not actually a relational hyperlink.
This is done in the following ways:
We first check if the click element is actually an <a> tag
Or if it has a closest ancestor <a> tag
In either case we extract the href from that <a> tag and directly goto that
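The "element or closest ancestor <a>" check can be illustrated without a browser by walking up a small parsed tree. The link_target helper and the sample markup are hypothetical; in a real page the equivalent lookup would typically be done in the DOM, for example with element.closest("a").

```python
import xml.etree.ElementTree as ET
from typing import Optional

PAGE = """
<div>
  <a href="/pricing"><span id="label">Pricing</span></a>
  <button id="submit">Send</button>
</div>
""".strip()


def link_target(root: ET.Element, element_id: str) -> Optional[str]:
    """Return the href of the element or of its closest <a> ancestor, else None."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = root.find(f".//*[@id='{element_id}']")
    while node is not None:
        if node.tag == "a" and node.get("href"):
            return node.get("href")
        node = parents.get(node)
    return None


root = ET.fromstring(PAGE)
span_target = link_target(root, "label")     # inside a link: navigate to its href
button_target = link_target(root, "submit")  # not a link: fall back to a click
```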
- async handle_dropdown_click()[source]
Dispatch function to handle dropdown menus. This function requires both the field_id and the field_value to be specified in the single action.
- async handle_evaluate_js()[source]
Handles the evaluation of Javascript in the browser environment and brings the result back to the code.
This is the recommended way to use it:
`const href = await page.evaluate(() => document.location.href);`
We strip any return statements from the JS snippet here because those aren’t required for inline functions.
Handles browser navigation by opening new websites. Waits until the page is loaded.
Data Structures
PlaywrightAction
The DSL for browser actions.
- class pyba.utils.structure.PlaywrightAction(*, goto: str | None = None, go_back: bool | None = None, go_forward: bool | None = None, reload: bool | None = None, click: str | None = None, dblclick: str | None = None, hover: str | None = None, right_click: str | None = None, dropdown_field_id: str | None = None, dropdown_field_value: str | None = None, fill_selector: str | None = None, fill_value: str | None = None, type_selector: str | None = None, type_text: str | None = None, press_selector: str | None = None, press_key: str | None = None, check: str | None = None, uncheck: str | None = None, select_selector: str | None = None, select_value: str | None = None, upload_selector: str | None = None, upload_path: str | None = None, scroll_x: int | None = None, scroll_y: int | None = None, wait_selector: str | None = None, wait_timeout: int | None = None, wait_ms: int | None = None, keyboard_press: str | None = None, keyboard_type: str | None = None, mouse_move_x: int | None = None, mouse_move_y: int | None = None, mouse_click_x: int | None = None, mouse_click_y: int | None = None, new_page: str | None = None, close_page: bool | None = None, switch_page_index: int | None = None, evaluate_js: str | None = None, screenshot_path: str | None = None, download_selector: str | None = None)[source]
Bases: BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
PlaywrightResponse
Response format from the PlaywrightAgent.
- class pyba.utils.structure.PlaywrightResponse(*, actions: List[PlaywrightAction], extract_info: bool | None)[source]
Bases: BaseModel
- actions: List[PlaywrightAction]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
CleanedDOM
Structured representation of page DOM.
- class pyba.utils.structure.CleanedDOM(hyperlinks: ~typing.List[str] | None = <factory>, input_fields: ~typing.List[str] | None = <factory>, clickable_fields: ~typing.List[str] | None = <factory>, actual_text: str | None = None, current_url: str | None = None, youtube: str | None = None)[source]
Bases: object
Represents the cleaned DOM snapshot of the current browser page.
The youtube field is an additional parameter used for YouTube DOM extraction.
Login Handlers
BaseLogin
Base class for automated login handlers.
Code Generation
CodeGeneration
Generates standalone Playwright scripts.
- class pyba.core.lib.code_generation.CodeGeneration(session_id: str, output_path: str, database_funcs: DatabaseFunctions)[source]
Bases: object
Creates the full automation script from the actions taken by the model.
Requires the database to be populated with all the actions.
Pulls the actions from the database and writes the script to a user-specified location.
- Parameters:
session_id – The unique identifier for this session
output_path – Path to save the code to
database_funcs – The Database instantiated by the user
- SELECTOR_VALUE_PAIRS = {'fill_selector': 'fill_value', 'press_selector': 'press_key', 'select_selector': 'select_value', 'type_selector': 'type_text', 'upload_selector': 'upload_path'}
- TEMPLATES = {'check': 'page.check("{value}")', 'click': 'page.click("{value}")', 'close_page': 'page.close()', 'dblclick': 'page.dblclick("{value}")', 'download_selector': 'with page.expect_download() as download_info:\n page.click("{value}")\ndownload = download_info.value\ndownload.save_as(download.suggested_filename)', 'dropdown_field_id': 'page.locator("{selector}").select_option(label="{value}")', 'evaluate_js': 'page.evaluate({value})', 'fill_selector': 'page.fill("{selector}", "{value}")', 'go_back': 'page.go_back()', 'go_forward': 'page.go_forward()', 'goto': 'page.goto("{value}")', 'hover': 'page.hover("{value}")', 'keyboard_press': 'page.keyboard.press("{value}")', 'keyboard_type': 'page.keyboard.type("{value}")', 'mouse_click_x': 'page.mouse.click({x}, {y})', 'mouse_move_x': 'page.mouse.move({x}, {y})', 'new_page': 'page.context.new_page().goto("{value}")', 'press_selector': 'page.press("{selector}", "{value}")', 'reload': 'page.reload()', 'right_click': 'page.click("{value}", button="right")', 'screenshot_path': 'page.screenshot(path="{value}")', 'scroll_x': 'page.mouse.wheel({x}, {y})', 'select_selector': 'page.select_option("{selector}", "{value}")', 'switch_page_index': 'page = page.context.pages[{value}]', 'type_selector': 'page.type("{selector}", "{value}")', 'uncheck': 'page.uncheck("{value}")', 'upload_selector': 'page.set_input_files("{selector}", "{value}")', 'wait_ms': 'page.wait_for_timeout({value})', 'wait_selector': 'page.wait_for_selector("{value}", timeout={timeout})'}
- XY_PAIRS = {'mouse_click_x': 'mouse_click_y', 'mouse_move_x': 'mouse_move_y', 'scroll_x': 'scroll_y'}
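The three tables above suggest how an action record could be turned into a line of Playwright code: SELECTOR_VALUE_PAIRS names the field holding the value for a selector, XY_PAIRS names the field holding the y coordinate for an x coordinate, and TEMPLATES supplies the format string. The following is a minimal sketch of that lookup under those assumptions. It copies only a subset of the entries, and render_action is a hypothetical helper, not the method CodeGeneration actually uses.

```python
# Subset of the TEMPLATES, SELECTOR_VALUE_PAIRS and XY_PAIRS tables listed above.
TEMPLATES = {
    "goto": 'page.goto("{value}")',
    "click": 'page.click("{value}")',
    "fill_selector": 'page.fill("{selector}", "{value}")',
    "mouse_click_x": "page.mouse.click({x}, {y})",
}
SELECTOR_VALUE_PAIRS = {"fill_selector": "fill_value"}
XY_PAIRS = {"mouse_click_x": "mouse_click_y"}


def render_action(action: dict) -> list:
    """Render every populated field of an action dict into a line of code.

    Hypothetical helper: the real rendering logic is not shown in this page.
    """
    lines = []
    for key, template in TEMPLATES.items():
        if action.get(key) is None:
            continue  # field not set for this action
        if key in SELECTOR_VALUE_PAIRS:
            # Selector fields take their value from the paired field.
            paired = action.get(SELECTOR_VALUE_PAIRS[key])
            lines.append(template.format(selector=action[key], value=paired))
        elif key in XY_PAIRS:
            # Coordinate fields take their y component from the paired field.
            paired = action.get(XY_PAIRS[key])
            lines.append(template.format(x=action[key], y=paired))
        else:
            lines.append(template.format(value=action[key]))
    return lines


print(render_action({"fill_selector": "#q", "fill_value": "pyba"}))
print(render_action({"mouse_click_x": 10, "mouse_click_y": 20}))
```

This sketch does no escaping, so a value containing a double quote would produce invalid code; a real generator would need to handle that.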
Dependencies
HandleDependencies
Manages Playwright browser installation.
Exceptions
- exception pyba.utils.exceptions.ActionError(message: str, cause: Exception = None)[source]
Bases: PybaError
An action dispatched to Playwright failed.
- exception pyba.utils.exceptions.ActionTimeoutError(message: str, cause: Exception = None)[source]
Bases: ActionError
A Playwright action exceeded its timeout.
- exception pyba.utils.exceptions.CannotResolveError[source]
Bases: Exception
Exception to be raised when the user provides a PasswordManager class which requires positional arguments to be specified.
- exception pyba.utils.exceptions.CredentialsNotSpecified(site_name: str)[source]
Bases: Exception
Exception raised in the login scripts when the relevant credentials haven’t been specified.
- exception pyba.utils.exceptions.DatabaseNotInitialised[source]
Bases: Exception
Exception to be raised when the user asks for automation code generation but has not initialised the database.
- exception pyba.utils.exceptions.ElementNotFoundError(message: str, cause: Exception = None)[source]
Bases: ActionError
A selector did not match any element on the page.
- exception pyba.utils.exceptions.IncorrectMode(mode: str)[source]
Bases: Exception
Exception to be raised when the mode specified by the user is incorrect.
- exception pyba.utils.exceptions.InvalidModelSelected(model_name: str, provider: str, provider_valid_models: list)[source]
Bases: Exception
Exception to be raised when the model chosen by the user is not offered by the provider whose keys were specified.
- exception pyba.utils.exceptions.LLMError(message: str, cause: Exception = None)[source]
Bases: PybaError
The LLM provider returned an error or an unparseable response.
- exception pyba.utils.exceptions.LLMRateLimitError(message: str, cause: Exception = None)[source]
Bases: LLMError
The LLM provider rate-limited the request.
- exception pyba.utils.exceptions.LLMResponseParseError(message: str, cause: Exception = None)[source]
Bases: LLMError
The LLM returned a response that could not be parsed into an action.
Bases: ActionError
A page navigation (goto, back, forward, reload) failed.
- exception pyba.utils.exceptions.PromptNotPresent[source]
Bases: Exception
This exception is raised when the user does not pass a prompt to the engine.
- exception pyba.utils.exceptions.PybaError(message: str, cause: Exception = None)[source]
Bases: Exception
Base class for all structured runtime errors raised by Pyba.
Every subclass carries a human-readable message and the original cause exception (if any) so callers can inspect both without parsing tracebacks.
- exception pyba.utils.exceptions.ServerLocationUndefined(server_location)[source]
Bases: Exception
This exception is raised when the user doesn’t define the server location for a VertexAI project.
- exception pyba.utils.exceptions.ServiceNotSelected[source]
Bases: Exception
This exception is raised when the user doesn’t set an API key in the engine.
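The structured errors above form a hierarchy rooted at PybaError, so a caller can catch at whatever level of detail it needs and read message and cause directly. The sketch below mirrors part of that hierarchy with stand-in classes so it runs without PyBA installed. The constructor body and the handle function are assumptions made for illustration; in real code the classes would be imported from pyba.utils.exceptions.

```python
from typing import Callable, Optional


class PybaError(Exception):
    """Stand-in for the documented base class (message plus optional cause)."""

    def __init__(self, message: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.message = message
        self.cause = cause


class ActionError(PybaError): ...
class ElementNotFoundError(ActionError): ...
class LLMError(PybaError): ...
class LLMRateLimitError(LLMError): ...


def handle(step: Callable[[], None]) -> str:
    """Run a step and summarise any structured error it raises (hypothetical helper)."""
    try:
        step()
    except LLMRateLimitError as exc:
        # Most specific handler first: a rate limit is usually worth retrying.
        return f"rate limited: {exc.message}"
    except ActionError as exc:
        # Also catches subclasses such as ElementNotFoundError.
        cause = type(exc.cause).__name__ if exc.cause else "no cause"
        return f"action failed: {exc.message} ({cause})"
    except PybaError as exc:
        return f"pyba error: {exc.message}"
    return "ok"


def missing_element() -> None:
    raise ElementNotFoundError("no match for #login", cause=KeyError("#login"))


print(handle(missing_element))
```

Several of the documented exceptions, for example CannotResolveError and ServiceNotSelected, derive from Exception directly, so an `except PybaError` clause would not catch them.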