Performance & Benchmarking

Performance Tips 

Use headless mode for faster execution (headless=True)
Enable low memory mode on resource-constrained environments (low_memory=True)
Enable database logging only when you need code generation
Set appropriate max_depth — higher isn’t always better
Use extraction_format when you need structured data
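
The first two tips map onto constructor arguments that also appear in the benchmarking script later on this page. A minimal sketch of choosing them per deployment, assuming a hypothetical helper `step_kwargs` and a hypothetical environment variable `PYBA_CONSTRAINED` (neither is part of PyBA):

```python
import os


def step_kwargs(environ=None):
    """Build keyword arguments for pyba.Step from the environment.

    step_kwargs and PYBA_CONSTRAINED are hypothetical names used for
    illustration. headless, low_memory and enable_tracing are the Step
    arguments used in the benchmarking script on this page.
    """
    environ = os.environ if environ is None else environ
    constrained = environ.get("PYBA_CONSTRAINED", "0") == "1"
    return {
        "headless": True,            # no visible browser window
        "low_memory": constrained,   # only on resource-constrained hosts
        "enable_tracing": False,     # same value the benchmark script uses
    }
```

The result could then be passed along as `Step(gemini_api_key="...", **step_kwargs())`, which keeps the deployment-specific choice in one place.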

Benchmarking Memory 

You can measure PyBA’s memory usage on your own machine using this script. It reads resident set size (RSS) from /proc, so it runs on Linux only:

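import asyncio, gc, glob, os

# Linux only: both helpers read resident set size (RSS) from /proc.
PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

def chromium_rss():
    """Summed RSS in MB of every process whose command line mentions chromium.

    Pages shared between Chromium processes are counted once per process,
    so treat the result as an upper bound.
    """
    total = 0
    for p in glob.glob("/proc/[0-9]*/stat"):
        try:
            pid = int(p.split("/")[2])
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                if b"chromium" not in f.read().lower():
                    continue
            with open(f"/proc/{pid}/statm") as f:
                total += int(f.read().split()[1])  # second field = resident pages
        except Exception:
            pass  # the process exited or is not readable
    return total * PAGE_SIZE // 1048576

def py_rss():
    """RSS in MB of the current Python process."""
    with open(f"/proc/{os.getpid()}/statm") as f:
        return int(f.read().split()[1]) * PAGE_SIZE // 1048576

async def bench(low_memory):
    from pyba import Step
    print(f"--- low_memory={low_memory} ---")
    print(f"Before: python={py_rss()} MB")
    step = Step(gemini_api_key="...", headless=True, low_memory=low_memory, enable_tracing=False)
    await step.start()
    await step.page.goto("https://www.amazon.com", wait_until="domcontentloaded")
    await asyncio.sleep(5)
    try:
        await step.step("Search for headphones")
    except Exception:
        pass  # still report memory if the step itself fails
    # A single sample taken right after the step, reported as the peak.
    print(f"Peak: chromium={chromium_rss()} MB, python={py_rss()} MB")
    await step.stop()
    del step
    gc.collect()
    print(f"After: python={py_rss()} MB")

# Run each setting in a fresh interpreter: imported modules stay cached,
# so a second run in the same process would not show the idle difference.
asyncio.run(bench(low_memory=True))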
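
The script samples memory once, right after the step, so a short spike during the step can go unnoticed. A minimal sketch of a background sampler that keeps the highest reading; `PeakSampler` is a hypothetical helper, not part of PyBA, and `probe` can be any zero-argument function such as `chromium_rss` from the script:

```python
import asyncio
import contextlib


class PeakSampler:
    """Poll a zero-argument probe in the background and keep the maximum."""

    def __init__(self, probe, interval=0.25):
        self.probe = probe
        self.interval = interval
        self.peak = 0
        self._task = None

    async def _run(self):
        # Sample until the task is cancelled on exit.
        while True:
            self.peak = max(self.peak, self.probe())
            await asyncio.sleep(self.interval)

    async def __aenter__(self):
        self._task = asyncio.create_task(self._run())
        return self

    async def __aexit__(self, *exc):
        self._task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._task
        # One last sample so a very short block still gets a reading.
        self.peak = max(self.peak, self.probe())
        return False
```

Inside `bench`, the step could be wrapped as `async with PeakSampler(chromium_rss) as s: await step.step("Search for headphones")`, after which `s.peak` holds the largest sampled value. The probe runs on the event loop, so a short interval adds some overhead of its own.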

Measured Results 

Tested on Amazon.com, headless mode, Gemini provider:

Metric	`low_memory=True`	`low_memory=False`
Idle (before any session)	~60 MB	~180 MB
Peak (during session)	~940 MB	~940 MB
After session cleanup	~130 MB	~130 MB

Where the Savings Are 

The ~120MB saving is at idle — before any browser session is launched. This comes from lazy-loading heavy Python dependencies:

oxymouse (numpy/scipy): ~46MB
google-genai: ~64MB (skipped when using OpenAI)
openai: ~9MB (skipped when using Gemini)
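
Per-module figures like these can be approximated by comparing the RSS of a fresh interpreter with and without the import. A minimal sketch (Linux only; `fresh_rss_mb` and `import_cost_mb` are hypothetical helpers, not part of PyBA):

```python
import subprocess
import sys

# Child snippet: print the child interpreter's resident set size in MB.
_REPORT = (
    "import os\n"
    "with open('/proc/self/statm') as f:\n"
    "    pages = int(f.read().split()[1])\n"
    "print(pages * os.sysconf('SC_PAGE_SIZE') // 1048576)\n"
)


def fresh_rss_mb(setup=""):
    """Run `setup` in a new interpreter and return that process's RSS in MB."""
    result = subprocess.run(
        [sys.executable, "-c", setup + "\n" + _REPORT],
        capture_output=True,
        text=True,
        check=True,
    )
    # The RSS line is printed last, after any output from `setup`.
    return int(result.stdout.split()[-1])


def import_cost_mb(module):
    """Approximate RSS cost of importing `module`, relative to a bare interpreter."""
    return fresh_rss_mb(f"import {module}") - fresh_rss_mb()
```

For example, `import_cost_mb("numpy")` gives a rough figure for numpy if it is installed. Readings vary by a megabyte or so between runs, so small modules can come out as zero or slightly negative.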

Peak memory during a session is dominated by Chromium (~800MB), which is unaffected by low_memory. After the first session, lazy-loaded modules remain cached in sys.modules for the process lifetime, so the Python process does not drop back to its low_memory idle figure.
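
The caching behaviour is standard Python rather than anything PyBA-specific. A generic illustration of a deferred import (not PyBA's actual implementation; `get_module` is a hypothetical helper, and `colorsys` simply stands in for a heavy dependency):

```python
import importlib
import sys


def get_module(name):
    """Import `name` on first use.

    importlib.import_module returns the entry already in sys.modules when
    there is one, so only the first call pays the import cost.
    """
    return importlib.import_module(name)


first = get_module("colorsys")    # imported here, on first use
second = get_module("colorsys")   # served from the sys.modules cache

print(first is second)            # the same module object both times
print("colorsys" in sys.modules)  # stays cached until the process exits
```

Because the module object stays referenced from `sys.modules`, its memory is not released when the code that first needed it finishes.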

Note

The Chromium flags in low memory mode (--disable-gpu, --disable-dev-shm-usage, etc.) do not measurably reduce browser RSS. They improve stability in containerized environments. The most useful is --disable-dev-shm-usage, which makes Chromium keep its shared-memory files in /tmp instead of /dev/shm and so prevents out-of-memory crashes when /dev/shm is too small.

Memory Lifecycle in a Server 

If you run PyBA as a long-lived server (e.g. with aiohttp), the memory lifecycle is:

Server starts: ~60MB (with low_memory=True) or ~180MB (without)
First session opens: Jumps to ~600-1000MB (Chromium + lazy-loaded modules)
First session closes: Drops to ~130-220MB (modules stay loaded, Chromium exits)
Subsequent sessions: Same peak, returns to ~130-220MB each time
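
To check these figures in your own server, you can log RSS around each session. A minimal sketch, assuming a hypothetical async context manager `log_session_memory` (not part of PyBA) and a Linux host for the default probe:

```python
import contextlib
import os


def py_rss_mb():
    """RSS in MB of the current process (Linux only, reads /proc)."""
    with open(f"/proc/{os.getpid()}/statm") as f:
        pages = int(f.read().split()[1])
    return pages * os.sysconf("SC_PAGE_SIZE") // 1048576


@contextlib.asynccontextmanager
async def log_session_memory(label, probe=py_rss_mb, log=print):
    """Log the probe's reading before and after the wrapped block."""
    before = probe()
    try:
        yield
    finally:
        # Runs even if the session raised, so failed sessions are logged too.
        after = probe()
        log(f"{label}: rss before={before} MB, after={after} MB, delta={after - before:+d} MB")
```

A request handler could wrap its session in `async with log_session_memory("session-1"):`. A delta that keeps growing across sessions points to a leak rather than to the one-time cost of lazy-loaded modules.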

To help freed memory get returned to the OS after sessions close, set these environment variables in your container:

MALLOC_TRIM_THRESHOLD_=65536
MALLOC_MMAP_THRESHOLD_=65536

These tell glibc’s allocator to return freed pages to the OS more aggressively. They have no effect on images that use a different C library, such as musl-based Alpine.
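
glibc reads these variables when a process starts, so they need to be in the environment before the server is launched. Changing `os.environ` inside an already running server does not retune its allocator. If you launch the server from a Python supervisor rather than a container entrypoint, a minimal sketch looks like this (`run_with_malloc_tuning` is a hypothetical helper, not part of PyBA):

```python
import os
import subprocess
import sys

# The two settings recommended above, as environment variables.
MALLOC_ENV = {
    "MALLOC_TRIM_THRESHOLD_": "65536",
    "MALLOC_MMAP_THRESHOLD_": "65536",
}


def run_with_malloc_tuning(argv, **kwargs):
    """Run argv as a child process with the glibc malloc settings applied."""
    env = {**os.environ, **MALLOC_ENV}
    return subprocess.run(argv, env=env, check=True, **kwargs)
```

For example, `run_with_malloc_tuning([sys.executable, "server.py"])` starts a server script with both variables set, where `server.py` is a placeholder for your own entry point.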
