Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents
Agentic systems are stochastic, context-dependent, and policy-bounded. Conventional QA (unit tests, static prompts, or scalar “LLM-as-a-judge” scores) fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence that can gate releases with confidence.
Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and outputs deterministic reports suitable for CI/CD and compliance reviews.
Quick Start
Prerequisites
- uvx – If not installed, follow the uv installation guide
- Python 3.10+
- An API key for an LLM provider (e.g., OpenAI, Google, Anthropic).
Installation
Option 1: Quick Install (Recommended)
Run Rogue directly with uvx to get up and running quickly:
# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli
Option 2: Manual Install
(a) Clone the repository:
git clone https://github.com/qualifire-dev/rogue.git
cd rogue
(b) Install dependencies:
If you are using uv:
uv sync
Or, if you are using pip:
pip install -e .
(c) Optional: Set up your environment variables. Create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
Running Rogue
Rogue uses a client-server architecture: the core evaluation logic runs in a backend server, and various clients connect to it to provide different interfaces.
Default Behavior
When you run uvx rogue-ai without specifying a mode, it:
- Starts the Rogue server in the background
- Launches the TUI (Terminal User Interface) client
uvx rogue-ai
Available Modes
- Default (Server + TUI): uvx rogue-ai – Starts the server in the background plus the TUI client
- Server: uvx rogue-ai server – Runs only the backend server
- TUI: uvx rogue-ai tui – Runs only the TUI client (requires a running server)
- Web UI: uvx rogue-ai ui – Runs only the Gradio web interface client (requires a running server)
- CLI: uvx rogue-ai cli – Runs a non-interactive command-line evaluation (requires a running server; ideal for CI/CD)
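Because every mode except the default expects a server that is already running, a CI job typically starts uvx rogue-ai server in the background and waits before launching a client. The sketch below, using only the Python standard library, polls the documented default address (127.0.0.1:8000) until a TCP connection succeeds. The function name and polling intervals are illustrative assumptions, and an open port only shows that something is listening, not that the server is ready to evaluate.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 8000, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Success only proves that something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

A CI script could launch the server with subprocess.Popen, fail fast if wait_for_server() returns False, and only then run uvx rogue-ai cli with the project's own configuration.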
Mode Arguments
Server Mode
uvx rogue-ai server [OPTIONS]
Options:
- --host HOST – Host to run the server on (default: 127.0.0.1 or the HOST env var)
- --port PORT – Port to run the server on (default: 8000 or the PORT env var)
- --debug – Enable debug logging
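Each server setting above has two fallbacks: an environment variable (HOST or PORT) and a hard-coded default. Rogue's parsing code is not shown in the article, so the following is a hypothetical argparse equivalent that assumes an explicit flag takes precedence over the environment variable, which in turn takes precedence over the default.

```python
import argparse
import os
from collections.abc import Mapping


def resolve_server_options(argv: list[str], env: Mapping[str, str] = os.environ) -> tuple[str, int, bool]:
    """Hypothetical resolution order: explicit flag > HOST/PORT env var > default."""
    parser = argparse.ArgumentParser(prog="rogue-server-sketch")
    # The env var (if set) becomes the default, so a command-line flag still overrides it.
    parser.add_argument("--host", default=env.get("HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int, default=int(env.get("PORT", "8000")))
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args(argv)
    return args.host, args.port, args.debug
```

Feeding the environment variable in as the argparse default is a common way to get this precedence without extra branching.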
TUI Mode
uvx rogue-ai tui [OPTIONS]
Web UI Mode
uvx rogue-ai ui [OPTIONS]
Options:
- --rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
- --port PORT – Port to run the UI on
- --workdir WORKDIR – Working directory (default: ./.rogue)
- --debug – Enable debug logging
Example: Testing the T-Shirt Store Agent
This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.
Install the example dependencies:
If you are using uv:
uv sync --group examples
or, if you are using pip:
pip install -e '.[examples]'
(a) Start the example agent server in a separate terminal:
If you are using uv:
uv run examples/tshirt_store_agent
If not:
python examples/tshirt_store_agent
This will start the agent on http://localhost:10001.
(b) Configure Rogue in the UI to point to the example agent:
- Agent URL: http://localhost:10001
- Authentication: no-auth
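A2A agents advertise themselves through a JSON Agent Card served at a well-known URL, so one quick way to confirm that the example agent is reachable before running an evaluation is to fetch that card. This is a minimal sketch using only urllib, assuming the example agent exposes a card. The path /.well-known/agent.json is an assumption (it has changed between A2A specification versions, with newer versions using /.well-known/agent-card.json), so it is left as a parameter; the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_agent_card(base_url: str, path: str = "/.well-known/agent.json", timeout: float = 5.0) -> dict:
    """GET the agent's JSON card; raises urllib.error.URLError if unreachable."""
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def skill_names(card: dict) -> list[str]:
    """Return advertised skill names, falling back to the skill id if no name is given."""
    return [s.get("name", s.get("id", "?")) for s in card.get("skills", [])]
```

For the example agent, fetch_agent_card("http://localhost:10001") should return its card if the assumed path matches the A2A version it implements; a URLError means the agent is not up yet.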
(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies!
You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui) mode.
Where Rogue Fits: Practical Use Cases
- Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies with transcript-anchored evidence.
- E-Commerce & Support Agents: Enforce OTP-gated discounts, refund rules, SLA-aware escalation, and tool-use correctness (order lookup, ticketing) under adversarial and failure conditions.
- Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe command prevention.
- Multi-Agent Systems: Verify planner-executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
- Regression & Drift Monitoring: Run nightly suites against new model versions or prompt changes; detect behavioral drift and enforce policy-critical pass criteria before release.
What Exactly Is Rogue, and Why Should Agent Dev Teams Care?
Rogue is an end-to-end testing framework designed to evaluate the performance, compliance, and reliability of AI agents. Rogue synthesizes business context and risk into structured tests with clear objectives, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in fast single-turn or deep multi-turn adversarial modes. Bring your own model, or let Rogue use Qualifire’s bespoke SLM judges to drive the tests. Rogue also provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.
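The paragraph above describes two kinds of structured data: scenarios (objective, tactics, success criteria) and verdicts (pass/fail, a rationale tied to transcript spans, timing, and model lineage). Rogue’s real schema is not given in the article, so the dataclasses below are a hypothetical sketch of how such records might be modeled and serialized deterministically, with sorted keys so that identical results produce byte-identical reports.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Scenario:
    objective: str
    tactics: list[str]
    success_criteria: str
    multi_turn: bool = True  # fast single-turn vs deep multi-turn mode


@dataclass
class Verdict:
    scenario: Scenario
    passed: bool
    rationale: str
    evidence_turns: list[int] = field(default_factory=list)  # transcript indices cited
    duration_s: float = 0.0
    judge_model: str = "unknown"  # model/version lineage


def to_report(verdicts: list[Verdict]) -> str:
    """Serialize verdicts with sorted keys so identical runs give identical bytes."""
    payload = {
        "total": len(verdicts),
        "passed": sum(v.passed for v in verdicts),
        "results": [asdict(v) for v in verdicts],
    }
    return json.dumps(payload, sort_keys=True, indent=2)
```

Deterministic serialization matters for the CI/CD use case: two runs with the same outcomes can be diffed directly, and any change in the report signals a real behavioral difference.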
Under the Hood: How Rogue Is Built
Rogue uses a client-server architecture:
- Rogue Server: Contains the core evaluation logic
- Client Interfaces: Multiple interfaces that connect to the server:
  - TUI (Terminal UI): A modern terminal interface built with Go and Bubble Tea
  - Web UI: A Gradio-based web interface
  - CLI: A command-line interface for automated evaluation and CI/CD
This architecture allows flexible deployment and usage patterns: the server can run independently, and multiple clients can connect to it simultaneously.
Summary
Rogue helps developer teams test agent behavior the way it actually runs in production. It turns written policies into concrete scenarios, exercises those scenarios over A2A, and records what happened in transcripts you can audit. The result is a clear, repeatable signal you can use in CI/CD to catch policy breaks and regressions before they ship.
Thanks to the Qualifire team for the thought leadership and resources for this article. The Qualifire team has supported this content.
The publish Qualifire AI Open-Sources Rogue: An End-to-End Agentic AI Testing Framework Designed to Evaluate the Performance, Compliance, and Reliability of AI Agents appeared first on MarkTechPost.