rtrvr.ai: Retrieve, Research, Automate the Web with AI

The field of AI Web Agents is rapidly evolving, promising to revolutionize how we interact with the internet. These technologies automate tasks, extract valuable data, and streamline workflows directly within web browsers and applications. This report analyzes four leading AI Web Agent platforms: rtrvr.ai, OpenAI's Operator, Convergence Lab's Proxy, and the Browser Use open-source library. We dive deep into their core features, strengths, and weaknesses across key metrics such as security, cost, website accessibility, performance, and integration with other tools, to offer a comparative guide for users and developers.

Fundamental Design Choices in AI Web Agents

Before diving into individual agents, it's crucial to understand the fundamental design choices that shape their capabilities and limitations. These choices impact everything from website compatibility to security and the types of tasks an agent can effectively perform. Let's explore some of the key architectural decisions:

Local vs. Remote Cloud Browsers

One of the primary design decisions is whether the AI agent operates using a local browser on your own device or a remote browser hosted in the cloud.

Remote Cloud Browsers: Agents like OpenAI Operator and Convergence Lab's Proxy utilize cloud-hosted browsers. This approach offers scalability and removes the processing burden from your local machine. However, it introduces significant challenges. Websites are increasingly sophisticated at detecting and blocking traffic originating from cloud IP addresses, often flagging them as bots. This leads to issues like:
- Bot Detections and CAPTCHAs: Cloud-based agents frequently encounter CAPTCHAs and bot detection mechanisms, interrupting workflows and requiring human intervention.
- Website Incompatibility: Some websites, particularly those with robust security measures like YouTube and LinkedIn, may simply not function correctly or block access entirely when accessed from cloud IPs.
Local Web Agents: Agents like rtrvr.ai and Browser Use, when implemented locally, operate within your own browser on your device. This offers a significant advantage in terms of website accessibility.
- Bypass Bot Detection: Because network requests originate from your familiar local IP address, local agents largely bypass bot detection and CAPTCHA challenges.
- Limited Permissions: rtrvr.ai is the only agent that does not use Debugger permissions, which can expose the browser to exploits and also trigger bot detection.
- Signed-In Profiles: Local agents can simply reuse your local signed-in profiles and subscriptions, without you having to share sensitive credentials with the AI Agent provider.

rtrvr can interact with a signed-in Instagram session, which is usually blocked from cloud browsers

Vision-Based vs. DOM-Based Agentic Approaches

Another critical design choice is how the AI agent "sees" and interacts with web pages. There are two primary approaches: vision-based and DOM-based.

Vision-Based Approaches: Agents like OpenAI Operator adopt a vision-based approach, using annotated screenshots of web pages. While mimicking human vision, this method has limitations:
- Active Tab Rendering: Browsers primarily render only the active tab fully. Vision-based agents simply won't work on background tabs, limiting their ability to perform actions across multiple tabs or parallelize tasks efficiently.
- Increased Hallucination: Relying on visual representations tends to produce more hallucinations than working from the underlying DOM representation.
- Limited Information Access: Vision-based agents have access only to what is visually rendered. They lack direct access to the underlying website structure and data. For example, to understand the options in a dropdown menu, a vision-based agent might need to click through and visually parse the options, whereas a DOM-based agent can directly access the list of options from the HTML structure.
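The dropdown example can be made concrete with a small sketch. This is only an illustration using Python's standard-library HTML parser, not how any of the agents discussed here is implemented; the `SelectOptionParser` class, the `dropdown_options` helper, and the sample markup are all hypothetical. It shows that the option labels are already present in the HTML, so nothing has to be clicked or rendered to read them.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect the text of every <option> element in a document."""

    def __init__(self):
        super().__init__()
        self.options = []        # option labels in document order
        self._in_option = False  # True while inside an <option> tag
        self._buffer = []        # text pieces of the current option

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            self.options.append("".join(self._buffer).strip())
            self._in_option = False


def dropdown_options(html):
    """Return the option labels found in the given HTML string."""
    parser = SelectOptionParser()
    parser.feed(html)
    parser.close()
    return parser.options


# Hypothetical page fragment; the dropdown is never opened or rendered.
SAMPLE_HTML = """
<select name="country">
  <option value="us">United States</option>
  <option value="ca">Canada</option>
  <option value="mx">Mexico</option>
</select>
"""

print(dropdown_options(SAMPLE_HTML))
```

A vision-based agent would need at least one click and one extra screenshot to obtain the same list.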
DOM-Based Approaches: Agents like rtrvr.ai leverage the Document Object Model (DOM), the underlying HTML structure of a webpage. This approach offers several advantages:
- Background Tab Operation: DOM-based agents can operate seamlessly in background tabs as they directly interact with the website's structure, regardless of whether the tab is actively rendered. This enables powerful multi-tab workflows and efficient parallel task execution.
- Accurate Scrapers: DOM-based agents can be effective scrapers because they only need to reproduce text that is already present in their context.
- Faster Agentic Workflows: These agents can leverage the full DOM information to skip actions as well as distribute subtasks onto new tabs.
- Collapse Exponential Failure Rates: For a sequential agent, the success rate for a task is simply (success rate per step)^(# of steps). Thus, even an agent with a 90% per-step success rate has an overall success rate of only (0.9)^10 ≈ 0.35 on a 10-step task. If you can instead spread the steps across parallel tabs, each tab only has to get through its own shorter chain of steps, so failures no longer compound exponentially across one long sequence.
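The compounding arithmetic can be checked with a few lines of Python. This is a simplified model, not a measurement: it assumes every step succeeds independently with the same probability, and that a 10-step job can be split into five independent 2-step subtasks, one per tab. Both function names are hypothetical.

```python
def sequential_success(per_step, steps):
    """Probability that every step of one sequential chain succeeds."""
    return per_step ** steps


def expected_completed_fraction(per_step, steps_per_subtask):
    """Expected share of independent subtasks that finish, one subtask per tab.

    Assumes each subtask is its own short chain, so a failure in one tab
    does not affect the others.
    """
    return per_step ** steps_per_subtask


one_chain = sequential_success(0.9, 10)          # ten steps in a single chain
per_tab = expected_completed_fraction(0.9, 2)    # five tabs with two steps each

print(f"single 10-step chain succeeds with probability {one_chain:.3f}")
print(f"expected share of 2-step subtasks completed: {per_tab:.3f}")
```

Under these assumptions, the gain comes from isolating failures: a failed tab costs one subtask that can be retried on its own rather than the whole run. If every subtask must succeed on the first attempt, the joint probability is still the same product as the single chain.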

rtrvr can distribute applying to jobs as new tab subtasks with our Explore feature
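The general fan-out pattern behind this kind of workflow can be sketched with `asyncio`. This is not rtrvr.ai's Explore implementation; `run_subtask` is a hypothetical stand-in for an agent working through one tab, and the URLs are placeholders. The sketch shows subtasks running concurrently while a failure in one stays contained.

```python
import asyncio


async def run_subtask(url):
    """Stand-in for an agent working through one tab (no real browser involved)."""
    await asyncio.sleep(0)  # yield control, as real page I/O would
    if "broken" in url:
        raise RuntimeError(f"could not complete {url}")
    return f"done: {url}"


async def fan_out(urls):
    """Run one subtask per URL concurrently and keep failures isolated per tab."""
    results = await asyncio.gather(
        *(run_subtask(u) for u in urls), return_exceptions=True
    )
    return dict(zip(urls, results))


urls = [
    "https://example.com/jobs/1",
    "https://example.com/jobs/broken",
    "https://example.com/jobs/3",
]
outcome = asyncio.run(fan_out(urls))
for url, result in outcome.items():
    status = "failed" if isinstance(result, Exception) else "ok"
    print(status, url)
```

Passing `return_exceptions=True` is the design choice that matters here: one failed subtask is reported alongside the successful ones instead of aborting the whole batch.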

Custom Integrations vs. User-Supplied Functions

Finally, consider how AI web agents extend their capabilities beyond basic web interactions. Two main strategies emerge: custom integrations and user-supplied functions.

Custom Integrations: Some agents, like OpenAI Operator, have created custom, pre-built integrations with specific services (e.g., DoorDash). These integrations offer streamlined workflows for particular tasks but are inherently limited to the services the provider has chosen to integrate with.
User-Supplied Functions (AI Function Calling): Agents like rtrvr.ai empower users to define and supply their own custom code or functions that the AI agent can autonomously call using AI Function Calling. This approach provides immense flexibility and extensibility, allowing users to tailor the agent's capabilities to virtually any external tool, API, or custom workflow. This shifts the power to the user, enabling them to create integrations as needed rather than relying on pre-defined partnerships.
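The user-supplied function pattern can be sketched as follows. This shows the general function-calling idea, not rtrvr.ai's actual API: the `register_tool` decorator, the `dispatch` helper, the `add_lead` function, and the JSON shape of the model output are all assumptions made for illustration. The user registers a function together with a parameter schema, the model emits a JSON call, and a dispatcher validates and runs it.

```python
import json

# Registry of user-supplied functions: name -> callable plus the schema shown to the model.
TOOLS = {}


def register_tool(name, description, parameters):
    """Decorator that records a function together with its parameter schema."""
    def decorator(func):
        TOOLS[name] = {
            "func": func,
            "description": description,
            "parameters": parameters,
        }
        return func
    return decorator


@register_tool(
    name="add_lead",
    description="Add a sales lead to the CRM.",
    parameters={
        "type": "object",
        "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
        "required": ["name", "email"],
    },
)
def add_lead(name, email):
    # Stand-in for a real CRM API request.
    return {"status": "created", "name": name, "email": email}


def dispatch(model_output):
    """Run the function named in a model's JSON tool call and return its result."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']}")
    args = call.get("arguments", {})
    missing = [p for p in tool["parameters"].get("required", []) if p not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return tool["func"](**args)


# Hypothetical model output after reading a contact page.
model_output = (
    '{"name": "add_lead", '
    '"arguments": {"name": "Ada Lovelace", "email": "ada@example.com"}}'
)
print(dispatch(model_output))
```

Adding a new integration in this style means registering one more function; nothing in the dispatcher changes.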

rtrvr can use AI Function Calling to auto add leads to a CRM

Understanding the Landscape: AI Web Agents vs. Web Automation Infrastructure

As we explore the world of web automation, it's important to distinguish between fully autonomous AI Web Agents and the underlying infrastructure that powers web automation. AI Web Agents, like those we primarily focus on in this article, are designed to be intelligent and autonomous, capable of understanding natural language instructions and performing complex tasks on the web with minimal human intervention.

On the other hand, platforms like Browserbase (and their Stagehand framework) provide infrastructure and frameworks to host cloud browsers and functions that web agents can leverage to take actions on a page. Stagehand, specifically, just generates Playwright code to take actions on a page given a prompt, so it is not an end-to-end autonomous agent itself.

RTRVR.AI

rtrvr.ai is an AI Web Agent Chrome Extension that autonomously completes tasks on the web, effortlessly scrapes data directly into Google Sheets, and calls APIs as you browse using AI Function Calling - all with simple prompts and your own Chrome tabs/browser!

Key Features

Web Data Extraction: Extract structured data from any website, with just prompts.
Website Crawling: Crawl through pages/directories/infinite scroll websites and extract data from nested listings.
Bulk Scraping: Launch a Sheets column of URLs as tabs and scrape sites in bulk.
Cross-Tab Actions: Perform complex workflows across multiple tabs with cross-tab context.
Scheduled Tasks: Schedule tasks to run automatically in the background.
Google Sheets Integration: Custom logic to interact with Sheets and maintain tab-to-row relationships. Spreadsheets are notoriously tricky for AI Web Agents to interact with and need custom integration logic that only rtrvr.ai supports natively.
Function Calling: Integrate with external tools and APIs using natural language.
Graph Visualizations: Generate interactive graphs to represent any website using natural language commands.
Recording Grounding: Record step-by-step demonstrations to guide the AI agent.

Security

rtrvr.ai prioritizes user security and privacy. It operates as a client-side Chrome extension, meaning it has been vetted and tested by Google and can only operate within the sandboxed execution environment that Chrome provides for extensions.

Cost

rtrvr.ai offers a free tier with 50 credits. Paid plans start at $9.99 per month for the Basic tier (1,000 credits) and $29.99 per month for the Pro tier (3,000 credits). We have hammered our cost down to less than $0.002 per page interaction and expect that, in the future, our AI Web Agent will be free for users!

Website Accessibility

rtrvr.ai can access paywalled content as it operates within the user's browser, leveraging existing logins and subscriptions.

Performance

rtrvr.ai is known for its fast performance, and has been tested to be up to 4x faster than alternatives like OpenAI Operator.

Integration with Third Party Tools

rtrvr.ai supports AI function calling for integration with external tools and APIs. It also seamlessly integrates with Google Sheets.

OpenAI's Operator

OpenAI Operator is an AI-powered agent designed to automate web-based tasks. It utilizes computer vision to interact with websites, mimicking human actions. Operator is currently only available as a research preview for ChatGPT Pro users ($200/mo) in the US.

Key Features

Task Automation: Automate form filling, booking travel, ordering groceries, and more.
User Collaboration: Allows users to take over control at any time.
Saved Tasks: Save frequently used workflows for quick execution.

Security

Operator incorporates safeguards like takeover mode, user confirmations, and a monitoring system to protect user data. However, your passwords and logged-in sessions are stored in their cloud environment. Additionally, if you don't opt out, your usage will be used as training data for future models.

Cost

Operator is currently only available to ChatGPT Pro users, which costs $200 per month.

Website Accessibility

Since Operator makes network requests from the cloud, some websites simply block these requests, and Operator won't work on them. For example, LinkedIn and YouTube are not accessible with Operator.

Performance

Due to its vision-based approach as well as the underlying implementation, Operator is slow on each step and needs to take extra steps compared to DOM-based approaches.

Integration with Other Tools

Operator is primarily designed for its own ecosystem, with limited integration with external tools, but it can interact with partnered services such as DoorDash.

Convergence Lab's Proxy

Convergence Lab's Proxy is an AI-powered digital assistant designed to automate daily tasks through a conversational interface. It focuses on ease of use and automating routine activities.

Key Features

Task Automation: Automates scheduling, email management, data entry, and more.
Conversational Interface: Interacts with users through a chat-like interface.
Scheduling: Schedule tasks to run automatically at specific times.
Templates: Offers pre-built templates for common tasks.

Security

Proxy runs remotely in a cloud browser environment, and user usage and trajectories are retained for long-term memory and training.

Cost

Proxy offers a free tier with basic access. The Pro tier costs $20 per month and provides unlimited sessions and automations.

Website Accessibility

Since Proxy makes network requests from the cloud, some websites simply block these requests, and Proxy won't work on them. There are reports of frequent CAPTCHAs, 2FA prompts, and sites such as Indeed not working.

Performance

We found Proxy to be pretty slow and prone to getting stuck in loops, and the model sometimes selects or types into the wrong fields (for example, mishandling my 2FA codes, or choosing to deactivate my Twitter account instead of logging out).

Integration with Other Tools

Proxy doesn't have custom integrations.

Browser Use: Open-Source Web Automation Library for Developers

Browser Use is an open-source Python library that allows developers to build AI agents that interact with web browsers. It provides a framework for creating custom web automation tools and applications.

Key Features

Web Automation: Enables AI agents to perform web actions like clicking, typing, and navigating.
Open Source: Open-source nature allows for customization and flexibility.
Multi-Model Support: Supports multiple large language models, including local ones (GPT-4, Claude 3, Llama 2).

Security

It's important to note significant security concerns have been raised regarding Browser Use's architecture, particularly when used in local setups as guided by their documentation. Key issues include:

Debugger Tools (CDP): Browser Use utilizes the Chrome DevTools Protocol (CDP) for browser control, which is a low-level debugger tool. These tools have known security exploits that are not patched by Google, as they are intended for development and debugging, not for consumer use. Additionally, these tools consume a significant amount of resources and slow down Browser Use as well as your device.
Unsandboxed Browser & Debug Mode: Instructions for local setup involve launching Chrome in debug mode and connecting Browser Use to a user's main browser profile. This is done without a sandbox, further increasing security risks.
Exploit Potential: The combination of debugger tools and unsandboxed browser environments creates a potential attack vector. You or the agent could accidentally go to a malicious website that could exploit these vulnerabilities to compromise the user's browser and potentially escalate to the local machine.

Users should be acutely aware of these risks when deploying Browser Use, especially in local environments, and carefully consider the security implications before using it with sensitive data or in production systems. When deploying Browser Use, it is crucial to deviate from the insecure default setup instructions and implement robust security measures, and ideally run in isolated, sandboxed environments independent of your own usual Chrome browser.
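One small step toward that kind of separation is to launch Chrome with a throwaway profile directory instead of your main profile. The sketch below only builds the launch command; `isolated_chrome_command` is a hypothetical helper, the `google-chrome` binary name is an assumption that varies by platform, and the flags used are standard Chrome command-line switches. A separate profile keeps your saved logins and cookies out of reach of the debugging session, but it is not a sandbox on its own; a container or virtual machine provides stronger isolation.

```python
import tempfile


def isolated_chrome_command(chrome_binary="google-chrome", port=9222):
    """Build a Chrome launch command that uses a throwaway profile directory."""
    # Fresh, empty directory so the session carries no saved logins or cookies.
    profile_dir = tempfile.mkdtemp(prefix="agent-profile-")
    return [
        chrome_binary,
        f"--remote-debugging-port={port}",  # CDP endpoint the agent connects to
        f"--user-data-dir={profile_dir}",   # keeps the session off your main profile
        "--no-first-run",                   # skip first-run setup dialogs
    ]


command = isolated_chrome_command()
print(" ".join(command))
# To actually launch the browser: subprocess.Popen(command)
```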

Cost

Browser Use is free to use as an open-source library and can be used with your own locally running model.

Website Accessibility

Since Browser Use runs locally and network requests are originating from your own device, this AI Agent has access to the same websites you do.

Performance

Browser Use has been documented to be only 3x as fast as Operator, compared to rtrvr.ai's 4x. This could be due to the insecure container it is deployed on, Playwright overhead, and its vision-based approach.

Integration with Other Tools

Browser Use has support for Custom Functions, but they require in-depth code implementation and can't be called autonomously by the AI Web Agent with AI Function Calling.

Comparison Table

A side-by-side comparison of features, security, cost, and other key aspects:

Feature	rtrvr.ai	Operator	Proxy	Browser Use
Primary Function	Web extraction & automation	Web automation	Web automation	Web automation framework
Interface	Browser extension	Web Chat	Web Chat	Python library
Security	Secure Chrome Sandbox	Cloud browsers, sessions stored	Cloud browsers, sessions stored	Debugger tools, Unsandboxed browser (High Risk)
Cost	Free tier, from $9.99/month	$200/month	Free tier, from $20/month	Free
Blocked Websites	None, own browser	Some blocked	Some blocked	None, own browser
Integrations	Sheets Native, AI Function Calling	Limited Site Specific	None	Custom Functions w/o AI Calling

Conclusion

At each stage of our development process, we designed rtrvr.ai to take unique and novel approaches that have unlocked a ton of differentiators and unforeseen capabilities. Our leading performance on metrics such as security, cost, and speed, along with our use of novel AI capabilities, lets us unabashedly present ourselves as the best AI Web Agent on the market!
