
GTIG Reports First Autonomous GenAI-Powered Malware in the Wild: PromptSpy Uses Gemini API for Real-Time Device Control; Russia-Nexus CANFAIL and LONGSTREAM Deploy LLM-Generated Decoy Code

Date: 2026-05-25
Tags: malware, nation-state

Executive Summary

Google Threat Intelligence Group (GTIG) published a report on May 11, 2026 documenting the shift from experimental to industrial-scale use of generative AI in adversarial workflows. The report describes previously unreported capabilities in PromptSpy, an Android backdoor that uses the Gemini API to navigate victim devices autonomously in real time, without human supervision. These capabilities include biometric data capture and anti-uninstall overlays. Separately, GTIG confirmed that the Russia-nexus malware families CANFAIL and LONGSTREAM use LLM-generated decoy code to obfuscate malicious payloads targeting Ukrainian organizations. Defenders should hunt for Gemini API calls from non-standard applications and implement detection for LLM-characteristic code patterns in malware samples.

Campaign Summary

Field | Detail
Campaign / Malware | PromptSpy (Android), CANFAIL (Russia-nexus), LONGSTREAM (Russia-nexus)
Actor / Attribution | PromptSpy: Unknown (samples from Hong Kong and Argentina). CANFAIL/LONGSTREAM: Russia-nexus (high confidence, per GTIG)
Target | PromptSpy: Android users. CANFAIL/LONGSTREAM: Ukrainian organizations
Vector | PromptSpy: sideloaded APK via dedicated website. CANFAIL/LONGSTREAM: targeted intrusion operations
Status | Active
First Observed | PromptSpy: 2026-01-13 (VT upload). CANFAIL/LONGSTREAM: ongoing since at least 2025

Detailed Findings

PromptSpy: Autonomous GenAI-Powered Android Backdoor

ESET researchers identified PromptSpy in February 2026 and described it as the first known Android malware to abuse generative AI in its execution flow. According to ESET researcher Lukas Stefanko, PromptSpy uses Google's Gemini to interpret on-screen elements and generate step-by-step instructions for UI manipulation to maintain persistence.

GTIG's May 11 report revealed additional capabilities that go significantly beyond the initial ESET findings. According to GTIG, PromptSpy contains an autonomous agent module called GeminiAutomationAgent. The module serializes the device's visible user interface hierarchy into XML via the Accessibility API and sends it to the gemini-2.5-flash-lite model. Gemini returns structured JSON responses containing action types and spatial coordinates, which PromptSpy parses to simulate physical gestures: clicks, swipes, and navigation. The model interprets the device state and generates commands in real time without human supervision.

According to Google, PromptSpy can capture victim biometric data to replay authentication gestures and regain access to compromised devices. If a victim attempts to uninstall the malware, it identifies the on-screen coordinates of the uninstall button and renders an invisible overlay that intercepts touch events, making the button appear unresponsive. Its command-and-control infrastructure, including Gemini API keys and VNC relay servers, can be updated dynamically at runtime.

ESET noted that PromptSpy has not yet appeared in its wider telemetry, suggesting it may still be a proof of concept. ESET also confirmed that it has not been observed on Google Play. Google stated that it has disabled the assets associated with this activity.
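For triage of suspect APKs, the Accessibility Service dependency described in this section suggests a simple manifest check. The sketch below is a heuristic under stated assumptions, not a PromptSpy signature: it assumes the manifest has already been decoded to text XML (for example with a tool such as apktool), and the package and service names in the sample are hypothetical. Many legitimate apps, such as screen readers and password managers, will also match, so a hit is a reason to look closer and not a verdict.

```python
import xml.etree.ElementTree as ET

# Standard Android manifest attribute namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = f"{{{ANDROID_NS}}}name"
PERMISSION = f"{{{ANDROID_NS}}}permission"


def manifest_flags(manifest_xml: str) -> dict:
    """Report whether a decoded manifest declares an accessibility
    service together with the INTERNET permission."""
    root = ET.fromstring(manifest_xml)

    # Permissions the app requests via <uses-permission>.
    requested = {el.get(NAME) for el in root.iter("uses-permission")}

    # Accessibility services are <service> elements protected by
    # the BIND_ACCESSIBILITY_SERVICE permission.
    accessibility_services = [
        svc.get(NAME)
        for svc in root.iter("service")
        if svc.get(PERMISSION) == "android.permission.BIND_ACCESSIBILITY_SERVICE"
    ]

    has_internet = "android.permission.INTERNET" in requested
    return {
        "internet": has_internet,
        "accessibility_services": accessibility_services,
        "suspicious": has_internet and bool(accessibility_services),
    }


# Hypothetical manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.hypothetical">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <service android:name=".HypotheticalService"
             android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"/>
  </application>
</manifest>"""

if __name__ == "__main__":
    print(manifest_flags(SAMPLE))
```

In practice the result would be combined with an organizational allowlist so that only unapproved packages are escalated.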

CANFAIL and LONGSTREAM: LLM-Generated Code Obfuscation

GTIG confirmed that Russia-nexus threat actors targeting Ukrainian organizations are deploying two malware families that use LLM-generated decoy code to obfuscate their malicious functionality.

According to Help Net Security, CANFAIL contains LLM-authored comments that explicitly describe blocks of code as "unused filler," indicating the threat actor specifically requested the model generate large volumes of inert code for obfuscation purposes.
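A self-describing filler comment of this kind lends itself to a simple static check. The following is a minimal sketch, assuming line comments that start with `#` or `//`. Only the phrase "unused filler" comes from the reporting; the other markers and the sample text are assumptions added for illustration and would need tuning against real samples.

```python
import re

# "unused filler" is the phrase reported for CANFAIL; the remaining
# markers are assumed examples, not taken from any published sample.
FILLER_MARKERS = ("unused filler", "filler code", "placeholder logic")

# Captures the text of a line comment introduced by '#' or '//'.
# This is a heuristic: it will also match '//' inside URLs.
COMMENT_RE = re.compile(r"(?:#|//)\s*(.*)$")


def find_filler_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for comments that describe
    the surrounding code as filler."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = COMMENT_RE.search(line)
        if match and any(m in match.group(1).lower() for m in FILLER_MARKERS):
            hits.append((lineno, match.group(1).strip()))
    return hits


# Hypothetical snippet used only to exercise the function.
SAMPLE = """\
// unused filler: helper below is never called
function padA() { return 1; }
var x = run();  // start
"""

if __name__ == "__main__":
    for lineno, text in find_filler_comments(SAMPLE):
        print(lineno, text)
```

A match is weak evidence on its own, since ordinary code occasionally carries similar comments, so it is best treated as one signal among several.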

According to GTIG, LONGSTREAM contains 32 separate instances of code querying the system's daylight saving time status, a repetitive and functionally irrelevant pattern designed to make the script appear benign to analysts. This bloating technique exploits the tendency of automated analysis tools and human reviewers to evaluate code volume as an indicator of legitimacy.
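The repetition described above can be approximated with a line-frequency count. This is a rough sketch, assuming the decoy statements are textually identical after whitespace normalization, which may not hold for real samples that vary variable names. The threshold of 10 and the PowerShell-style sample line are illustrative assumptions, not details of LONGSTREAM itself.

```python
import re
from collections import Counter


def repeated_statements(source: str, threshold: int = 10,
                        min_length: int = 8) -> dict[str, int]:
    """Return statements that occur at least `threshold` times.

    Lines are compared after collapsing whitespace. Lines shorter than
    `min_length` (for example a lone closing brace) are ignored to
    reduce noise.
    """
    normalized = (re.sub(r"\s+", " ", line).strip()
                  for line in source.splitlines())
    counts = Counter(line for line in normalized if len(line) >= min_length)
    return {line: n for line, n in counts.items() if n >= threshold}


# Hypothetical decoy line, repeated 32 times to mirror the reported count.
DECOY = "$dst = (Get-Date).IsDaylightSavingTime()"
SAMPLE = "\n".join([DECOY] * 32 + ["Write-Output 'done'"])

if __name__ == "__main__":
    for statement, count in repeated_statements(SAMPLE).items():
        print(count, statement)
```

Grouping by the called function name instead of the whole line would be more robust against small variations, at the cost of more false positives in ordinary scripts.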

APT45 AI-Industrialized Vulnerability Research

GTIG also documented North Korean state actor APT45 sending thousands of recursive prompts to AI models to systematically analyze CVEs and validate proof-of-concept exploits. According to the Korea JoongAng Daily, this pattern represents efforts to automate vulnerability analysis and attack-code testing at industrial scale.

MITRE ATT&CK Mapping

Technique | ID | Context
Abuse Elevation Control Mechanism: Accessibility Features | T1548 | PromptSpy uses Accessibility API to serialize UI hierarchy for Gemini
Input Capture | T1056 | PromptSpy captures biometric data for authentication replay
Obfuscated Files or Information | T1027 | CANFAIL and LONGSTREAM use LLM-generated junk code for obfuscation
Application Layer Protocol: Web Protocols | T1071.001 | PromptSpy communicates with Gemini API over HTTPS
Impair Defenses: Disable or Modify Tools | T1562.001 | PromptSpy renders invisible overlay to prevent uninstallation
Exploit Public-Facing Application | T1190 | APT45 uses AI to automate CVE analysis and PoC validation

IOCs

Domains

No domain IOCs published by source

Full URL Paths

No URL IOCs published by source

Splunk Format

No IOCs available for Splunk query

File Hashes

No hash IOCs published by source

Detection Recommendations

PromptSpy (Android): Monitor for outbound API calls to generativelanguage.googleapis.com from non-standard applications or processes, particularly on Android endpoints managed via MDM. Alert on Accessibility Service registrations by applications that are not on the organization's approved app list. Hunt for Android APKs that combine an Accessibility Service, the Internet permission, and VNC-related functionality.

CANFAIL/LONGSTREAM: Implement static analysis rules that flag source code with LLM-characteristic patterns: educational docstrings in operational code, commented blocks explicitly labeled as filler, and high repetition of functionally irrelevant system calls, such as repeated daylight saving time queries. Monitor for scripts containing 10 or more identical system information queries that serve no functional purpose.

Network detection: Flag outbound traffic to Gemini API endpoints from servers or endpoints where such traffic is not expected.
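As an illustration of the network recommendation, the sketch below filters flow records for the Gemini API host and keeps only those from packages outside an approved list. The record shape, the allowlist contents, and all package and device names are hypothetical; real MDM or proxy logs will need their own parsing step before a check like this applies.

```python
from dataclasses import dataclass

GEMINI_API_HOST = "generativelanguage.googleapis.com"

# Hypothetical allowlist of packages expected to call the Gemini API.
APPROVED_PACKAGES = frozenset({"com.example.approved.assistant"})


@dataclass(frozen=True)
class FlowRecord:
    """Assumed minimal shape of one outbound connection log entry."""
    device_id: str
    package: str
    dest_host: str


def unexpected_gemini_flows(records, approved=APPROVED_PACKAGES):
    """Return records that reach the Gemini API host from a package
    that is not on the approved list."""
    return [
        r for r in records
        # Normalize case and a trailing dot before comparing hostnames.
        if r.dest_host.lower().rstrip(".") == GEMINI_API_HOST
        and r.package not in approved
    ]


# Hypothetical records used only to exercise the function.
SAMPLE_RECORDS = [
    FlowRecord("dev-1", "com.example.approved.assistant", GEMINI_API_HOST),
    FlowRecord("dev-2", "com.example.unknown.app", "GenerativeLanguage.googleapis.com."),
    FlowRecord("dev-3", "com.example.unknown.app", "example.org"),
]

if __name__ == "__main__":
    for record in unexpected_gemini_flows(SAMPLE_RECORDS):
        print(record.device_id, record.package)
```

Matching on the hostname alone only works where DNS or SNI visibility is available. Where it is not, the same allowlist logic would have to run on whatever identifier the logs expose.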

References