By Richard Zhang | February 27, 2025
OpenAI released Operator last month, its first computer-using agent that controls software like a human. Others have come before, from both bigger companies and startups, like Anthropic’s Computer Use, Skyvern, and Browser Use. While this launch is exciting for consumers, it’s equally intriguing for developers. In particular, developers are wondering: Are these agents the best way to control external platforms when official APIs don’t exist?
In more technical terms, we’re talking about “integrations”, which refers to the process of enabling an application to communicate and interact with another platform. Traditionally, integrations are done via APIs, but now, with Operator and others, agentic browser automation is an additional way to integrate. Here, we will compare the different options and discuss the pros and cons of each.
Three ways to integrate with web platforms that lack official APIs: Agentic Browser Automation, Hardcoded Browser Automation, and Internal API Connection.
Here’s a chart comparing the 3 options today:
Agentic Browser Automation, like Operator, is the most flexible approach because an agent can theoretically use any UI designed for a human. When working properly, the agent should perform any actions on any platform even when visiting it for the first time. Agentic Browser Automation’s step-by-step reasoning is great for handling unforeseen platforms and use cases, but it also has drawbacks: reliability and latency.
Reliability issues stem from hallucination. Anthropic’s Computer Use famously took a break in the middle of a task to look at photos of Yellowstone National Park. However, we can be confident that hallucinations will become less frequent as models improve.
Latency, the time it takes to accomplish a given task, grows with both reasoning time and infrastructure overhead. Introducing reasoning to every step means the agent needs to think on every page. This causes both increased errors and time delays. While we expect reasoning speeds to improve, browser automation infrastructure is the harder part to speed up, irrespective of foundation model improvements. To enable browser automation, developers need to spin up browsers and wait for each page to fully load before triggering actions on the page. The typically long time it takes to spin up browsers is called the “cold start problem”, which is solvable with third-party services. However, the main chunk of the latency comes from page load times, which are much harder to accelerate.
In practice, many actions are at least 3-4 page clicks away. For instance, to download a bank statement, one needs to select a bank account, go to the “statements” tab, and then select the relevant month. These cumulative steps can cause 30-40 seconds of latency to download just one statement. A typical workflow, like downloading all available statements, can comprise multiple smaller actions, therefore accumulating each action’s time delay.
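To make the accumulation concrete, here is a minimal sketch of the arithmetic. The per-step timings are illustrative assumptions chosen to land in the 30-40 second range mentioned above, not measurements, and the `Step` and `action_latency` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    page_load_s: float  # assumed time for the page to finish loading
    reasoning_s: float  # assumed time the agent spends deciding what to do


def action_latency(steps: list[Step], cold_start_s: float = 0.0) -> float:
    """Total latency of one action: browser start-up plus every step's cost."""
    return cold_start_s + sum(s.page_load_s + s.reasoning_s for s in steps)


# Illustrative numbers only, not measurements.
download_statement = [
    Step("select bank account", page_load_s=4.0, reasoning_s=5.0),
    Step("open statements tab", page_load_s=4.0, reasoning_s=5.0),
    Step("select month", page_load_s=4.0, reasoning_s=5.0),
    Step("click download", page_load_s=3.0, reasoning_s=5.0),
]

one_statement = action_latency(download_statement)
# A workflow that repeats the action sequentially pays the cost each time.
twelve_statements = 12 * one_statement

print(one_statement, twelve_statements)
```

Under these assumptions, one statement costs 35 seconds and a year of monthly statements costs 7 minutes if downloaded one after another.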
Hardcoded Browser Automation is a more direct approach than the agentic alternative. Instead of dynamically reasoning through each step, developers program fixed scripts that navigate a platform by clicking buttons, filling out forms, and manipulating data based on known UI structures.
Common tools include Puppeteer, Playwright, and Selenium. The scripts follow a predefined flow: (1) spin up a browser, (2) navigate to specific URLs, (3) locate elements using CSS selectors or XPath, and (4) execute actions. This approach yields lower latency than Agentic Browser Automation due to a lack of AI reasoning at every step.
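The predefined flow can be sketched as a fixed sequence of navigation and click calls. In this minimal sketch, the URL, the CSS selectors, and the `download_statement` function are hypothetical. The `Page` protocol only assumes `goto` and `click` methods, which Playwright's Python `Page` object also provides, so a real page should be usable in place of the recording stand-in. Step (1), spinning up the browser, is left out so the example runs without a browser installed.

```python
from typing import Protocol


class Page(Protocol):
    """The two page methods this script relies on."""

    def goto(self, url: str) -> object: ...
    def click(self, selector: str) -> object: ...


def download_statement(page: Page, base_url: str, month: str) -> None:
    """Fixed script: every URL and selector is hardcoded (all hypothetical)."""
    page.goto(f"{base_url}/accounts")            # (2) navigate to a known URL
    page.click("#account-checking")              # (3) locate by CSS selector
    page.click("a[href='/statements']")          # (4) execute the action
    page.click(f"button[data-month='{month}']")  # pick the statement month


class RecordingPage:
    """Stand-in page that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))


fake = RecordingPage()
download_statement(fake, "https://bank.example", "2025-01")
for call in fake.calls:
    print(call)
```

Because each selector is written into the script, a renamed element ID or class is enough to break the flow, which is the fragility discussed next.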
However, Hardcoded Browser Automation suffers from the same core limitations as Agentic Browser Automation: lower reliability and higher latency. Platforms often update their frontend code, changing element structures and class names, or requiring new user interactions (e.g. CAPTCHAs). This causes the scripts to break during UI changes. Latency remains an issue as well because these scripts still require spinning up browsers and waiting for pages to load.
In short, Hardcoded Browser Automation trades flexibility for speed but is still limited by the fundamental problems of browser-based automation.
Internal API Connection is the most reliable and low-latency approach. It is best for products that need the most performant integrations possible. Every web application has internal APIs — hidden interfaces that let the frontend and the backend communicate and power everything behind the scenes. Instead of relying on the frontend, Internal API Connection sends needed requests straight to the backend, avoiding browser-based automation’s latency and reliability issues.
Internal APIs change less frequently than frontend elements because they support core platform functionality. Requests to them are also harder to detect and block because they closely mimic the platform’s own network traffic. This means higher integration reliability.
Since requests are directly sent to the backend, Internal API Connection doesn’t require spinning up browsers or waiting for pages to load. Even actions hidden behind many pages on the UI can trigger in seconds. This approach is the fastest of the three, adding only a few seconds on average to the platform’s native request latency.
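As a rough illustration, the bank statement example could collapse into a single HTTP request once the endpoint is known. Everything platform-specific below is an assumption for illustration: the `/internal/api/statements` path, the query parameter names, and cookie-based session authentication. Real platforms will differ.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_statement_request(
    base_url: str, session_cookie: str, account_id: str, month: str
) -> urllib.request.Request:
    """Build the request the frontend would send (hypothetical endpoint)."""
    query = urlencode({"accountId": account_id, "month": month})
    return urllib.request.Request(
        url=f"{base_url}/internal/api/statements?{query}",
        headers={
            "Accept": "application/json",
            "Cookie": f"session={session_cookie}",  # assumed auth scheme
        },
        method="GET",
    )


def fetch_statement(request: urllib.request.Request, timeout_s: float = 10.0) -> dict:
    """Send the request and decode the JSON body. No browser is involved."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


request = build_statement_request(
    "https://bank.example", "abc123", "chk-001", "2025-01"
)
print(request.get_method(), request.full_url)
```

Building the request is kept separate from sending it so the request shape can be inspected or tested without touching the network.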
Since internal APIs aren’t publicly documented, developers must reverse-engineer them by figuring out how a platform’s frontend communicates with its backend. This means digging into network requests to uncover hidden endpoints and data structures. Tools like Chrome DevTools, mitmproxy, and Burp Suite help capture and analyze these requests, but even with these tools, the process is still quite complicated. Platforms often encrypt payloads, generate authentication tokens on the fly, or intentionally scramble API traffic to make reverse engineering harder. Understanding these patterns takes patience, technical expertise, and trial and error.
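A first pass at this digging can be partly scripted. Chrome DevTools can export captured network traffic as a HAR file, which is JSON with one entry per request. The sketch below lists the unique method and path pairs that returned JSON, which are usually the most promising candidates for internal endpoints. The sample entries and the `candidate_endpoints` helper are made up for illustration, and this step does nothing about encrypted payloads or dynamic tokens.

```python
from urllib.parse import urlsplit


def candidate_endpoints(har: dict) -> list[tuple[str, str]]:
    """Return unique (method, path) pairs whose responses were JSON."""
    seen: list[tuple[str, str]] = []
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" not in mime:
            continue  # skip scripts, stylesheets, images, HTML
        request = entry["request"]
        key = (request["method"], urlsplit(request["url"]).path)
        if key not in seen:
            seen.append(key)
    return seen


# Hand-written sample in HAR shape; a real file would come from
# json.load() on a DevTools export.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://bank.example/static/app.js"},
                "response": {"content": {"mimeType": "application/javascript"}},
            },
            {
                "request": {"method": "GET", "url": "https://bank.example/internal/api/accounts?page=1"},
                "response": {"content": {"mimeType": "application/json"}},
            },
            {
                "request": {"method": "GET", "url": "https://bank.example/internal/api/accounts?page=2"},
                "response": {"content": {"mimeType": "application/json"}},
            },
            {
                "request": {"method": "POST", "url": "https://bank.example/internal/api/statements/search"},
                "response": {"content": {"mimeType": "application/json; charset=utf-8"}},
            },
        ]
    }
}

endpoints = candidate_endpoints(sample_har)
for method, path in endpoints:
    print(method, path)
```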
Due to these difficulties, Internal API Connection is the most technically challenging approach. Unlike browser-based automation, it requires highly custom solutions for each platform, which leads to less flexibility and higher setup costs.
To ease the resource-intensive work of Internal API Connection, Integuru created the first AI agent to automate reverse-engineering internal APIs and generate the entire integration code. Given how difficult reverse engineering can be, automation significantly reduces development costs and helps companies reap the benefits of Internal API Connections without sacrificing development resources.
Agentic Browser Automation doesn’t need to be a standalone approach; it can also enhance the other two methods’ maintenance processes. Instead of introducing AI reasoning at every step, developers can use it only when needed for maximum efficiency.
For Hardcoded Browser Automation, AI agents like Operator can be deployed when scripts break. Developers can run the agent to complete the desired action and then use its recorded steps to generate an updated script. This hybrid approach reduces maintenance overhead and thereby increases reliability. However, it’s important to note that the fundamental latency issue of browser-based automation still persists.
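One way to picture this hybrid is a wrapper that runs the hardcoded steps first and only calls the agent when they fail. This is a minimal sketch under stated assumptions: steps are represented as plain strings, and `execute_steps` and `run_agent` are hypothetical callables standing in for a script runner and an agent that returns the steps it recorded.

```python
from typing import Callable

Steps = list[str]


def run_with_agent_fallback(
    script_steps: Steps,
    execute_steps: Callable[[Steps], None],
    run_agent: Callable[[], Steps],
) -> Steps:
    """Run the fixed script; if it breaks, let the agent redo the action.

    Returns the steps to keep as the script for the next run.
    """
    try:
        execute_steps(script_steps)
        return script_steps  # script still works, no reasoning needed
    except Exception:  # broad on purpose: any failure triggers the agent
        return run_agent()  # the agent's recorded steps become the new script


# Stand-ins for illustration only.
def execute(steps: Steps) -> None:
    if "click #old-download" in steps:
        raise RuntimeError("selector not found")


def agent() -> Steps:
    return ["goto /statements", "click #download-v2"]


stale = ["goto /statements", "click #old-download"]
updated = run_with_agent_fallback(stale, execute, agent)
unchanged = run_with_agent_fallback(updated, execute, agent)
print(updated)
```

The agent's cost is paid only on the run where the script breaks. Later runs go back to the faster fixed path.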
For Internal API Connections, Agentic Browser Automation can also help with maintenance. When websites change their internal APIs, a web-browsing agent can redo the desired action, triggering the relevant network requests along the way. At that point, a reverse-engineering agent like Integuru can analyze the newly updated network requests to generate a working integration.
By introducing Agentic Browser Automation only when needed, developers can minimize reasoning time and errors while maximizing reliability. We can expect most approaches to involve Agentic Browser Automation at some level in the near future.
So when is the best time to use browser-based automation (including Agentic Browser Automation and Hardcoded Browser Automation) vs Internal API Connection?
Given its flexibility, browser-based automation is best for cases where there is a high quantity of platforms and/or actions to automate, including scraping. For instance, when a company needs to scrape thousands of websites or streamline a workflow across dozens of platforms, browser-based automation is the easiest and fastest approach. Agentic Browser Automation is also the only way to deal with unforeseen cases as of now. Or, if you’re a small team trying to spin up automation quickly and willing to accept the trade-offs, manually writing hardcoded UI-based scripts can be much faster.
On the other hand, if you need specific functionalities on specific platforms and integrations are core parts of your product, Internal API Connection should be the top choice. For example, an AI voice agent should reverse-engineer an electronic health record (EHR) system that lacks official APIs if it wants to check, schedule, and cancel patient appointments. In this example, low latency and reliability are especially important because humans are waiting on the line, and reducing errors is important for the product's usefulness. In essence, if you know exactly what you need and you’ll use those actions repeatedly, Internal API Connection is the most effective approach.
Over time, both browser-based automation and Internal API Connection will improve alongside foundation model improvements. Models will have faster reasoning speeds and become more accurate at browsing the web, resulting in higher reliability and lower latency for browser-based automation. With efforts from companies like Integuru that also benefit from model advancements, Internal API Connection will become more flexible and approachable. In other words, the two categories will see drastic improvements.
Here’s a chart showing where the future is heading:
At Integuru, we’re the first company to use AI to improve the Internal API Connection approach. While we focus on improving this method, we don’t shy away from recommending that teams choose another approach when we sense a better fit elsewhere. Our goal is to make the web interoperable, which enables better products and experiences for all. We’d love to hear your ideas and feedback as we continue shaping the future of integrations. Let’s build a more connected world together.