High-Level Overview
Rikugan is a generator-based agentic loop embedded inside IDA Pro and Binary Ninja. The agent runs in a background thread and communicates with the Qt UI via a stream of TurnEvent objects. All host API calls (IDA/BN) are marshalled to the main thread via @idasync.
Key Files
| File | Purpose |
|---|---|
rikugan/agent/loop.py | AgentLoop + BackgroundAgentRunner |
rikugan/agent/turn.py | TurnEvent / TurnEventType definitions |
rikugan/tools/base.py | @tool decorator, ToolDefinition |
rikugan/tools/registry.py | ToolRegistry |
rikugan/ui/panel_core.py | RikuganPanelCore (Qt UI) |
rikugan/ui/session_controller_base.py | SessionControllerBase |
The Agentic Loop
File: rikugan/agent/loop.py
AgentLoop.run() Pipeline
The entry point is AgentLoop.run(), a Python generator. It yields TurnEvent objects that the UI consumes.
Turn Loop Detail
while True:
yield TURN_START
stream = provider.chat_stream(messages, tools, system)
text, tool_calls = _stream_llm_turn(stream) # yields TEXT_DELTA events
yield TEXT_DONE
if not tool_calls:
break # LLM is done, no more tool calls
results = _execute_tool_calls(tool_calls) # yields TOOL_RESULT events
append results to messages
yield TURN_END
_stream_llm_turn()
Consumes the provider's chat_stream() generator. Each StreamChunk is processed:
| Chunk Field | Action | Event Yielded |
|---|---|---|
chunk.text | Accumulate text | TEXT_DELTA |
chunk.is_tool_call_start | Start tool accumulation | TOOL_CALL_START |
chunk.tool_args_delta | Accumulate JSON args | TOOL_CALL_ARGS_DELTA |
chunk.is_tool_call_end | Finalize ToolCall | TOOL_CALL_DONE |
chunk.usage | Update context manager | USAGE_UPDATE |
_execute_tool_calls()
For each ToolCall in the batch:
- Pseudo-tool check — Handled inline with
continue - Approval gate —
execute_pythonrequires user approval - Pre-state capture — For mutating tools,
capture_pre_state()records current state - Execution —
ToolRegistry.execute(name, args)dispatches to the handler - Mutation recording — If
defn.mutating, createsMutationRecord, yieldsMUTATION_RECORDED - Error handling —
ToolErrorandExceptioncaught, returned as error results - Result — Each result becomes a
ToolResultand yieldsTOOL_RESULT
BackgroundAgentRunner
class BackgroundAgentRunner:
def start(self, user_message):
self._thread = Thread(target=self._run, args=(user_message,))
self._thread.start()
def _run(self, message):
for event in self.agent_loop.run(message):
self._event_queue.put(event)
def get_event(self, timeout=0):
return self._event_queue.get(timeout=timeout)
TurnEvent System
File: rikugan/agent/turn.py — All communication from the agent loop to the UI goes through TurnEvent objects. Each event type has a static factory method for clean construction.
Tool Framework
Files: rikugan/tools/base.py, rikugan/tools/registry.py
@tool Decorator
@tool(category="annotations", mutating=True)
def rename_function(
old_name: Annotated[str, "Current function name"],
new_name: Annotated[str, "New name to assign"],
) -> str:
"""Rename a function in the database."""
# ... implementation
The decorator inspects the function signature, extracts parameter descriptions from Annotated metadata, generates a ToolDefinition with JSON schema, and wraps the handler with @idasync for IDA thread-safety.
ToolDefinition
@dataclass
class ToolDefinition:
name: str
description: str
parameters: List[ParameterSchema]
category: str = "general"
requires_decompiler: bool = False
mutating: bool = False # marks tool as modifying the database
timeout: Optional[float] = None # per-tool timeout in seconds
handler: Optional[Callable] = None
Argument Coercion
The registry automatically coerces arguments during execution:
| Input | Target | Result |
|---|---|---|
"0x401000" | int | 4198400 |
"true" / "false" | bool | True / False |
0 / 1 | bool | False / True |
Tool Categories
| Category | Examples |
|---|---|
| Navigation | get_cursor_position, jump_to, get_name_at |
| Functions | list_functions, search_functions, get_function_info |
| Strings | list_strings, search_strings |
| Database | list_segments, list_imports, list_exports, read_bytes |
| Disassembly | read_disassembly, read_function_disassembly |
| Decompiler | decompile_function, get_pseudocode |
| Xrefs | xrefs_to, xrefs_from, function_xrefs |
| Annotations | rename_function, set_comment, set_type |
| Types | create_struct, modify_struct, set_function_prototype |
| Scripting | execute_python (requires approval) |
| Microcode (IDA) | get_microcode, nop_microcode |
| IL (BN) | get_il, get_cfg, il_replace_expr, il_set_condition, nop_instructions, patch_branch |
Timeout Wrapping
future = _executor.submit(defn.handler, **arguments)
result = future.result(timeout=timeout) # default 30s
Pseudo-Tools
Pseudo-tools are tool schemas injected into the LLM's tool list but handled directly in _execute_tool_calls() rather than dispatched through the registry. They use a continue statement to skip normal execution.
function_purpose, hypothesis, data_structure, constant, string_ref, import_usage, patch_result. When category="patch_result", also creates a PatchRecord.ExplorationState.can_transition_to(). Denied if KnowledgeBase lacks minimum findings.RIKUGAN.md in the IDB/BNDB directory. Categories: function_purpose, architecture, naming_convention, prior_analysis, general.SubagentRunner with its own SessionState. Accepts task and max_turns parameters.exploration_report Schema
{
"category": "function_purpose|hypothesis|patch_result|...",
"summary": "Description of the finding",
"address": 4198400,
"function_name": "main",
"relevance": "high|medium|low",
"original_hex": "74 05",
"new_hex": "75 05"
}
Skill System
Files: rikugan/skills/loader.py, rikugan/skills/registry.py
Skill Format
Skills are Markdown files with YAML frontmatter:
---
name: Malware Analysis
description: Windows PE malware analysis workflow
tags: [malware, windows]
allowed_tools: [decompile_function, list_imports, search_strings]
mode: exploration
---
Task: Analyze this binary as potential malware.
## Approach
1. Check imports for suspicious APIs...
Discovery
SkillRegistry.discover() scans:
- Built-in skills:
rikugan/skills/builtins/*/SKILL.md - User skills (IDA Pro):
~/.idapro/rikugan/skills/*/SKILL.md(Linux / macOS) ·%APPDATA%\Hex-Rays\IDA Pro\rikugan\skills\*\SKILL.md(Windows) - User skills (Binary Ninja):
~/.binaryninja/rikugan/skills/*/SKILL.md(Linux) ·~/Library/Application Support/Binary Ninja/rikugan/skills/*/SKILL.md(macOS) ·%APPDATA%\Binary Ninja\rikugan\skills\*\SKILL.md(Windows)
Reference files in references/*.md subdirectories are automatically appended to the skill body.
Built-in Skills
Exploration Mode
Files: rikugan/agent/exploration_mode.py, rikugan/agent/loop.py
Exploration mode is a 4-phase autonomous agent flow for binary modification.
Phase 1: EXPLORE
The agent autonomously investigates the binary to understand the user's goal.
- Triggered by
/modify <goal>,/explore <goal>, or skills withmode: exploration - Uses all analysis tools +
exploration_report+phase_transitionpseudo-tools - Findings accumulated in
KnowledgeBase: relevant functions, structured findings, hypotheses - Turn limit: 30 turns (
max_explore_turns) - For
/modify: runs as a subagent (isolated context window)
Phase Transition Gate
To move EXPLORE → PLAN, all of these must be true:
- At least 1 relevant function discovered
- At least 1 hypothesis formed
- At least 1 hypothesis with
relevance="high"
If the gate fails, the agent receives a gap description and continues exploring.
Phase 2: PLAN
Receives PLAN_SYNTHESIS_PROMPT with KnowledgeBase.to_summary(). Outputs a numbered list of changes, each with target address, current/proposed behavior, and patch strategy. User must approve before execution.
Phase 3: EXECUTE
Iterates over ModificationPlan.changes. Each step activates the platform-specific patching skill. After each patch, exploration_report(category="patch_result") creates a PatchRecord.
Phase 4: SAVE
Emits SAVE_APPROVAL_REQUEST with patch details. User responds "Save All" or "Discard All". Discard rolls back by writing PatchRecord.original_bytes back.
/explore vs /modify
| Aspect | /explore | /modify |
|---|---|---|
| Phases | EXPLORE only | EXPLORE → PLAN → EXECUTE → SAVE |
| Subagent | No (inline) | Yes (Phase 1 in subagent) |
| Patching | No | Yes |
| Knowledge base | Accumulated, returned | Accumulated, passed to Phase 2 |
ExplorationState
@dataclass
class ExplorationState:
phase: ExplorationPhase
knowledge_base: KnowledgeBase
modification_plan: Optional[ModificationPlan]
patches_applied: List[PatchRecord]
explore_turns: int
execute_turns: int
total_turns: int # monotonic counter for UI
max_explore_turns: int # default 30
max_execute_turns: int # default 20
explore_only: bool # True for /explore (no patching)
Plan Mode
Files: rikugan/agent/plan_mode.py, rikugan/agent/loop.py
Plan mode is a simpler two-step workflow: plan first, then execute. Triggered by /plan <message>.
- Plan Generation — LLM receives
_PLAN_GENERATION_PROMPT, outputs a numbered list - Plan Parsing —
parse_plan()extracts numbered steps from text - User Approval —
PLAN_GENERATEDevent; user approves or rejects - Step Execution — For each step: emit
PLAN_STEP_START, run a full turn cycle, emitPLAN_STEP_DONE
Subagents
File: rikugan/agent/subagent.py
Subagents are isolated AgentLoop instances with their own SessionState. They keep the parent's context window clean from verbose tool output.
class SubagentRunner:
def run_task(self, task, max_turns=20) -> Generator[TurnEvent, None, str]:
# General-purpose: returns final text
loop = AgentLoop(provider, tools, config, fresh_session)
for event in loop.run(augmented_task):
yield event
return final_text
def run_exploration(self, user_goal, max_turns=30) -> Generator[..., None, KnowledgeBase]:
# Phase 1 specific: returns KnowledgeBase
loop = AgentLoop(provider, tools, config, fresh_session)
for event in loop.run(f"/explore {user_goal}"):
yield event
return loop.last_knowledge_base
Knowledge Base Transfer
_clear_exploration_state()saves theKnowledgeBaseto_last_knowledge_base- The parent accesses it via the
last_knowledge_baseproperty - The parent populates its own
ExplorationState.knowledge_basefrom the subagent's results - Phases 2-4 proceed in the parent with a clean context window
Mutation Tracking & Undo
File: rikugan/agent/mutation.py
Every mutating tool call (defn.mutating=True) is recorded in AgentLoop._mutation_log for undo support.
MutationRecord
@dataclass
class MutationRecord:
tool_name: str # e.g., "rename_function"
arguments: Dict[str, Any] # original arguments
reverse_tool: str # tool to call for undo
reverse_arguments: Dict # arguments for undo
timestamp: float
description: str # human-readable
reversible: bool # False for execute_python, etc.
Reverse Strategies
| Tool | Reverse Strategy |
|---|---|
rename_function | Swap old_name ↔ new_name |
rename_variable | Swap variable_name ↔ new_name |
set_comment | Restore old_comment (from pre-state) or delete_comment |
set_function_comment | Restore old_comment or delete_function_comment |
rename_data | Restore old_name (from pre-state) |
set_function_prototype | Restore old_prototype (from pre-state) |
retype_variable | Restore old_type (from pre-state) |
execute_python | Not reversible |
/undo Flow
Context Window Management
File: rikugan/agent/context_window.py
class ContextWindowManager:
max_tokens: int # from config (default 128000)
compaction_threshold: 0.8 # compact when usage > 80%
def should_compact() -> bool
def compact_messages(messages) -> List[Message]
def estimate_tokens(text) -> int # ~3.5 chars/token heuristic
Compaction Strategy
When should_compact() returns True: keep the first message (system/initial), summarize all middle messages into one [Context summary] message, and keep the last 4 messages (recent context).
SessionState._truncate_results() also caps tool results with [...N chars omitted] markers to prevent individual messages from consuming too much context.
Persistent Memory
Files: rikugan/agent/system_prompt.py, rikugan/agent/loop.py
RIKUGAN.md
A per-binary Markdown file stored alongside the IDB/BNDB. It acts as cross-session memory.
- Location:
<idb_directory>/RIKUGAN.md - Loading: First 200 lines loaded into the system prompt
- Writing: Via the
save_memorypseudo-tool or plan persistence
save_memory Pseudo-Tool
{"fact": "sub_401230 is the snake initializer, length at +0x1A", "category": "function_purpose"}
Categories: function_purpose, architecture, naming_convention, prior_analysis, general.
Session Management
Files: rikugan/state/session.py, rikugan/state/history.py, rikugan/ui/session_controller_base.py
SessionState
@dataclass
class SessionState:
id: str # unique hex ID
created_at: float
messages: List[Message] # full conversation history
total_usage: TokenUsage # cumulative token usage
last_prompt_tokens: int # most recent prompt size
current_turn: int
is_running: bool
provider_name: str
model_name: str
idb_path: str
metadata: Dict[str, str]
Multi-Tab & Fork
class SessionControllerBase:
_sessions: Dict[str, SessionState] # tab_id -> session
_active_tab_id: str
def create_tab() -> str
def close_tab(tab_id)
def switch_tab(tab_id)
def fork_session(source_tab_id) -> Optional[str] # deep copy
fork_session() creates a deep copy of a session's messages and state into a new tab. The forked session gets metadata["forked_from"] set to the source session ID.
Persistence
Sessions are JSON-serialized to <config_dir>/rikugan/sessions/. Auto-saved after each agent turn if checkpoint_auto_save is enabled. Full round-trip: messages, token usage, tool calls, and tool results are all preserved.
MCP Integration
Files: rikugan/mcp/client.py, rikugan/mcp/bridge.py, rikugan/mcp/manager.py
MCPClient
Communicates with an MCP server subprocess via JSON-RPC 2.0 + Content-Length framing.
- Heartbeat: Background thread pings every 30s. Marks
_healthy=Falseon failure - Per-request timeout: Configurable default
- Tool discovery:
tools/listRPC call at startup
MCPBridge
Converts MCP tool schemas to ToolDefinition objects and registers them in the ToolRegistry with the prefix mcp_<server>_<tool>.
Provider Layer
Files: rikugan/providers/base.py, rikugan/providers/*.py
Prompt Caching (Anthropic)
cache_control: {"type": "ephemeral"} is set on the system prompt, last tool result message, and last user message. This enables 2-10x cost reduction on long conversations.
Retry Logic
for attempt in range(max_retries):
try:
yield from stream
break
except RateLimitError as e:
wait = e.retry_after or (2 ** attempt)
yield TEXT_DELTA(f"Rate limited, retrying in {wait}s...")
time.sleep(wait)
System Prompt Architecture
Files: rikugan/agent/system_prompt.py, rikugan/agent/prompts/
Shared Prompt Sections
Defined in prompts/base.py:
DISCIPLINE_SECTION— "Do exactly what was asked"RENAMING_SECTION— Renaming/retyping guidelinesANALYSIS_SECTION— Analysis approachSAFETY_SECTION— Safety guidelinesTOKEN_EFFICIENCY_SECTION— Prefer search over listingCLOSING_SECTION— Final reminders
UI Layer
Files: rikugan/ui/panel_core.py, rikugan/ui/chat_view.py, rikugan/ui/message_widgets.py
Event Polling
A QTimer fires every 50ms, calling _poll_events(). Dequeues up to 20 events per tick and routes each to the appropriate handler.
ChatView Widget Mapping
| Event | Widget |
|---|---|
TEXT_DELTA / TEXT_DONE | AssistantMessageWidget (Markdown rendered) |
TOOL_CALL_* | ToolCallWidget (collapsible, syntax-highlighted) |
TURN_START | ThinkingWidget (animated dots) |
ERROR | ErrorMessageWidget |
PLAN_GENERATED | PlanView (step list with status indicators) |
TOOL_APPROVAL_REQUEST | ToolApprovalWidget (Allow/Deny) |
EXPLORATION_PHASE_CHANGE | ExplorationPhaseWidget |
EXPLORATION_FINDING | ExplorationFindingWidget |
Thread Safety Model
Architecture
IDA API Marshalling
IDA Pro requires all API calls on the main thread. The @idasync decorator marshalls calls via ida_kernwin.execute_sync(). Binary Ninja tools run directly — BN's API is thread-safe.
User Answer/Approval Queues
Two queue.Queue(maxsize=1) instances:
_user_answer_queue— ForUSER_QUESTIONresponses_tool_approval_queue— Forexecute_pythonapproval
The agent waits with queue.get(timeout=0.5) in a loop, checking for cancellation between attempts. The UI thread calls put(). No race condition possible.
Error Handling & Retry
Exception Hierarchy
├── AgentError — loop-level errors
├── CancellationError — user cancelled
├── ProviderError — LLM API errors
└── RateLimitError — HTTP 429
├── ToolError — tool execution errors
├── ToolValidationError — argument validation
├── MCPError — MCP protocol errors
├── MCPConnectionError
└── MCPTimeoutError
└── SkillError — skill loading errors
Consecutive Error Tracking
_consecutive_errors counts sequential tool failures. After 3 consecutive errors, tools are disabled for the current turn, forcing the LLM to respond with text instead of looping on broken calls.
Logging
File: rikugan/core/logging.py
[Rikugan] LEVEL: messageflushed + fsynced after every write
Append-mode JSONL, machine-parseable
JSON Log Format
{"ts": 1709500000.123, "level": "INFO", "thread": "Thread-1", "msg": "Subagent started"}
Commands Reference
| Command | Description |
|---|---|
/plan <msg> | Enter plan mode: generate plan, then execute step-by-step |
/modify <msg> | Enter exploration mode: EXPLORE → PLAN → EXECUTE → SAVE |
/explore <msg> | Enter explore-only mode: autonomous read-only analysis |
/memory | Show current RIKUGAN.md contents |
/undo [N] | Undo last N mutations (default 1) |
/mcp | Show MCP server health status |
/doctor | Diagnose provider, API key, tools, skills, config issues |
/<skill-slug> | Activate a skill (e.g., /malware-analysis, /ctf) |
Data Flow Diagrams
Normal Turn
User "Explain main()" │ ├─→ SessionState.add_message(USER) ├─→ build_system_prompt() ├─→ provider.chat_stream(messages, tools, system) │ ├─→ TEXT_DELTA "The main function..." │ ├─→ TOOL_CALL_START "decompile_function" │ ├─→ TOOL_CALL_DONE │ └─→ USAGE_UPDATE ├─→ ToolRegistry.execute("decompile_function", {name: "main"}) │ └─→ TOOL_RESULT "int main() { ... }" ├─→ SessionState.add_message(TOOL) ├─→ provider.chat_stream(messages + tool_result) │ ├─→ TEXT_DELTA "This function initializes..." │ └─→ TEXT_DONE └─→ TURN_END
Exploration Mode (/modify)
User "/modify Change score from 100 to 999" │ ├─→ Phase 1: EXPLORE (subagent) │ ├─→ SubagentRunner.run_exploration() │ │ ├─→ [subagent uses tools, logs findings] │ │ ├─→ exploration_report → KnowledgeBase │ │ └─→ phase_transition("plan") → KnowledgeBase returned │ └─→ Parent receives KnowledgeBase summary (~1-2KB) │ ├─→ Phase 2: PLAN │ ├─→ PLAN_SYNTHESIS_PROMPT + KB summary → LLM │ ├─→ Parse plan → ModificationPlan │ └─→ User approves plan │ ├─→ Phase 3: EXECUTE │ ├─→ For each PlannedChange: │ │ ├─→ EXECUTE_STEP_PROMPT → LLM │ │ ├─→ Smart patch skill activated │ │ ├─→ execute_python (with approval) → patch bytes │ │ ├─→ redecompile_function → verify │ │ └─→ exploration_report(category="patch_result") → PatchRecord │ └─→ All patches applied │ └─→ Phase 4: SAVE ├─→ SAVE_APPROVAL_REQUEST → User ├─→ "Save All" → write to file → SAVE_COMPLETED └─→ "Discard All" → restore original bytes → SAVE_DISCARDED
Mutation Tracking
LLM calls rename_function(old="sub_401000", new="main") │ ├─→ capture_pre_state() → {} (no pre-state needed for renames) ├─→ ToolRegistry.execute("rename_function", {...}) ├─→ build_reverse_record() → MutationRecord( │ reverse_tool="rename_function", │ reverse_args={"old_name": "main", "new_name": "sub_401000"}) ├─→ _mutation_log.append(record) └─→ MUTATION_RECORDED event → UI (MutationLogPanel) User "/undo" │ ├─→ Pop last MutationRecord from _mutation_log ├─→ ToolRegistry.execute("rename_function", │ {"old_name": "main", "new_name": "sub_401000"}) └─→ TEXT_DONE "Undone: Rename function main → sub_401000"