August 8, 2025
Give it a task — CTA takes over your computer and gets things done. Autonomously. Intelligently. With precision.
Today, we are announcing the research preview of the Computer Tasking Agent (CTA)—an autonomous agent engineered to perform tasks directly on your computer. CTA leverages SVECTOR’s advanced vision models alongside chain-of-thought reasoning and planning, enabling it to perceive your screen, understand context, and interact with graphical user interfaces (GUIs) as a human would. By operating through the universal interface of screen, mouse, and keyboard—without dependence on app-specific APIs—CTA delivers flexible, intelligent automation across any application or workflow.
CTA is the result of extensive research in autonomous system control and intelligent automation. Unlike traditional automation tools, it operates directly at the system level rather than through per-application integrations. It breaks complex tasks into actionable steps and adapts as the environment changes, which lets it automate virtually any workflow or application and opens up flexible, intelligent computer use for everyone.
CTA is available to SVECTOR Pro users, with early access applications open for users in India. If you’re interested in trying CTA, apply for early access and join the research preview as we continue to refine safety and capabilities through real-world feedback.
CTA begins with your prompt. Whether it's "Search for nearby coffee shops" or "Open Xcode and build the to-do app project", CTA uses a natural language processing pipeline to understand your intent. It breaks down your instruction into actionable steps, forming a high-level plan for execution. This plan is then passed to the execution layer for dynamic task realization.
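As a rough illustration of what "a high-level plan" could look like in code, here is a minimal sketch. The `Plan` and `PlanStep` classes and the hand-written step list are assumptions made for this example; the post does not describe CTA's internal plan format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanStep:
    """One high-level step of a plan (hypothetical structure)."""
    description: str
    done: bool = False


@dataclass
class Plan:
    """A high-level plan derived from a user prompt (hypothetical structure)."""
    prompt: str
    steps: List[PlanStep] = field(default_factory=list)

    def next_step(self) -> Optional[PlanStep]:
        """Return the first unfinished step, or None when every step is done."""
        for step in self.steps:
            if not step.done:
                return step
        return None


# Hand-written decomposition of one of the example prompts. In a real agent
# the steps would be produced by the language pipeline, not typed in by hand.
plan = Plan(
    prompt="Open Xcode and build the to-do app project",
    steps=[
        PlanStep("Launch Xcode"),
        PlanStep("Open the to-do app project"),
        PlanStep("Trigger a build"),
        PlanStep("Check that the build succeeded"),
    ],
)

# Mark the first step finished and ask what the execution layer should do next.
plan.steps[0].done = True
print(plan.next_step().description)
```

Keeping the plan as an ordered list of steps with a completion flag is one simple way to hand work to an execution layer one step at a time.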
The Computer Tasking Agent (CTA) runs directly at your system's core layer, capturing screen-level data to understand what's happening in real time. It uses vision only when needed, building a visual context from screenshots to identify UI elements, text, buttons, and forms — all without relying on app-specific APIs or accessibility layers.
After seeing the screen, CTA thinks before acting. Using a chain-of-thought reasoning process, it evaluates the current screen state, the history of previous actions, and the remaining task goals. This iterative thought loop allows it to plan next steps, handle multi-stage flows, recover from unexpected changes, and dynamically adjust when something goes off-plan.
Once CTA knows what to do, it acts like a real user: moving the mouse, clicking, scrolling, typing, and navigating across apps. These low-level actions are guided entirely by its perception and reasoning layers — enabling it to automate virtually anything on your computer, from filling out forms to operating full software suites.
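The perceive, reason, and act stages described above can be sketched as a simple loop. This is a toy model under stated assumptions: `run_agent`, `Action`, `AgentState`, and the dictionary standing in for a screen are all hypothetical names invented for the example, and the hard-coded `decide` function replaces the vision and reasoning models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A low-level UI action (hypothetical): click, type, scroll, or done."""
    kind: str
    argument: str = ""


@dataclass
class AgentState:
    """What the reasoning step sees besides the screen: goal and action history."""
    goal: str
    history: List[Action] = field(default_factory=list)


def run_agent(
    goal: str,
    capture_screen: Callable[[], Dict[str, str]],
    decide: Callable[[Dict[str, str], AgentState], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat perceive -> reason -> act until the goal is met or steps run out."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        screen = capture_screen()        # perceive the current screen
        action = decide(screen, state)   # reason over screen, history, and goal
        if action.kind == "done":
            break
        execute(action)                  # act through mouse/keyboard primitives
        state.history.append(action)
    return state.history


# Toy environment: the "screen" is a dict holding the contents of a search box.
screen_state = {"search_box": ""}


def capture_screen() -> Dict[str, str]:
    return dict(screen_state)


def decide(screen: Dict[str, str], state: AgentState) -> Action:
    # Stand-in for the reasoning model: focus the box, type the query, then stop.
    if screen["search_box"] == "coffee shops":
        return Action("done")
    if not state.history:
        return Action("click", "search_box")
    return Action("type", "coffee shops")


def execute(action: Action) -> None:
    if action.kind == "type":
        screen_state["search_box"] = action.argument


history = run_agent("Search for nearby coffee shops", capture_screen, decide, execute)
print([a.kind for a in history])  # prints ['click', 'type']
```

Because the loop re-reads the screen on every iteration, a decision function can react to unexpected changes instead of replaying a fixed script, which is the property the text attributes to the iterative thought loop.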
While CTA can operate fully autonomously, safety remains a priority. For sensitive actions such as entering passwords, making purchases, or responding to CAPTCHA challenges, CTA will pause and request user confirmation. This ensures both trust and control in high-risk or privacy-sensitive operations.
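A confirmation gate of the kind described here might look like the following sketch. The action names in `SENSITIVE_KINDS` and the `execute_with_confirmation` helper are assumptions chosen for illustration; the post does not specify how CTA classifies sensitive actions or how it prompts the user.

```python
from typing import Callable

# Hypothetical labels for the sensitive categories named in the text.
SENSITIVE_KINDS = {"enter_password", "purchase", "solve_captcha"}


def execute_with_confirmation(
    kind: str,
    perform: Callable[[], None],
    confirm: Callable[[str], bool],
) -> str:
    """Run `perform`, pausing for user approval first when the action is sensitive.

    Returns "performed" if the action ran and "skipped" if the user declined.
    """
    if kind in SENSITIVE_KINDS and not confirm(kind):
        return "skipped"
    perform()
    return "performed"


log = []

# A routine action runs without asking, even if the user would say no.
print(execute_with_confirmation("scroll", lambda: log.append("scroll"), lambda k: False))

# A sensitive action is skipped when the user declines...
print(execute_with_confirmation("purchase", lambda: log.append("purchase"), lambda k: False))

# ...and runs only after the user approves.
print(execute_with_confirmation("purchase", lambda: log.append("purchase"), lambda k: True))
```

Passing `confirm` in as a callback keeps the gate independent of how the user is actually asked, whether that is a dialog, a notification, or a terminal prompt.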
In our internal evaluations, CTA sets a new state of the art on both computer-use and browser-use benchmarks while using the same universal interface of screen, mouse, and keyboard for every task.
Benchmark type | Benchmark | SVECTOR CTA (universal interface) | Previous SOTA (universal interface) | Previous SOTA (web browsing agents) | Human |
---|---|---|---|---|---|
Computer use | OSWorld | 45.1% | 40.1% | - | 70.4% |
Browser use | WebArena | 53.8% | 35.8% | 52.2% | 73.6% |
Browser use | WebVoyager | 92.2% | 88.1% | 87.0% | - |
Note: Benchmark results are obtained in controlled environments and have been evaluated internally. No external entities were involved in the assessment process.
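To make the size of these margins concrete, the short script below computes CTA's lead over the previous universal-interface result and the remaining gap to human performance, in percentage points. It assumes the table is read as CTA score, previous universal-interface SOTA, previous web-browsing-agent SOTA, and human score, in that order; the dictionary keys are names chosen for this example.

```python
from typing import Optional

# Scores in percent, copied from the table above. None marks a "-" cell.
RESULTS = {
    "OSWorld": {"cta": 45.1, "prev_universal": 40.1, "prev_browser": None, "human": 70.4},
    "WebArena": {"cta": 53.8, "prev_universal": 35.8, "prev_browser": 52.2, "human": 73.6},
    "WebVoyager": {"cta": 92.2, "prev_universal": 88.1, "prev_browser": 87.0, "human": None},
}


def margin(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Return a - b in percentage points, or None if either score is missing."""
    if a is None or b is None:
        return None
    return round(a - b, 1)


for name, row in RESULTS.items():
    lead = margin(row["cta"], row["prev_universal"])
    gap = margin(row["human"], row["cta"])
    print(f"{name}: lead over previous SOTA = {lead}, gap to human = {gap}")
```

On these figures the lead over the previous universal-interface result is 5.0 points on OSWorld, 18.0 on WebArena, and 4.1 on WebVoyager, while human performance remains 25.3 points ahead on OSWorld and 19.8 points ahead on WebArena.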
As an agentic system with direct, system-level access to your entire computer — including applications, files, system settings, and network connections — the Computer Tasking Agent (CTA) introduces unprecedented safety and security challenges. Unlike traditional AI tools that operate within sandboxed environments or limited interfaces, CTA requires comprehensive protective measures across multiple layers. We've implemented an extensive multi-layered defense strategy that addresses system-wide access risks, data security, misuse prevention, and frontier AI safety concerns.
CTA operates with direct access to your operating system (currently macOS), applications, and data. To protect against unauthorized actions and system compromise:
Given CTA's ability to access and interact with your personal data, applications, and files:
To prevent malicious use of CTA's system-level capabilities, we enforce strict policies:
To ensure user trust and control, CTA incorporates human-in-the-loop safety mechanisms:
CTA's system access capabilities require protection against sophisticated attack vectors:
Security in system-level AI agents requires ongoing vigilance and adaptation:
As CTA represents a significant advancement in AI system capabilities, our research preview includes additional protections:
The Computer Tasking Agent (CTA) introduces a new paradigm in human-computer interaction by leveraging direct system-level integration. Unlike traditional automation tools that rely on specialized APIs or custom configurations, CTA operates through the universal interface of screen perception and native system control. This approach enables seamless adaptation to any application or workflow, addressing the complexity and diversity of real-world digital environments.
As CTA enters its early access phase, our focus remains on refining its capabilities and ensuring robust safety, privacy, and user control. By combining advanced visual perception, autonomous reasoning, and intelligent action execution, CTA sets the foundation for truly autonomous digital agents—unlocking new possibilities for productivity and transforming the future of human-computer interaction.