GTIG Reports First Autonomous GenAI-Powered Malware in the Wild: PromptSpy Uses Gemini API for Real-Time Device Control; Russia-Nexus CANFAIL and LONGSTREAM Deploy LLM-Generated Decoy Code

Date: 2026-05-25
Tags: malware, nation-state

Executive Summary

Google Threat Intelligence Group published a May 11, 2026 report documenting the transition from experimental to industrial-scale use of generative AI in adversarial workflows. The report reveals previously unreported capabilities in PromptSpy, an Android backdoor that uses the Gemini API to autonomously navigate victim devices in real time without human supervision, including biometric data capture and anti-uninstall overlays. Separately, GTIG confirmed Russia-nexus malware families CANFAIL and LONGSTREAM are using LLM-generated decoy code to obfuscate malicious payloads targeting Ukrainian organizations. Defenders should hunt for Gemini API calls from non-standard applications and implement detection for LLM-characteristic code patterns in malware samples.

Campaign Summary

Field	Detail
Campaign / Malware	PromptSpy (Android), CANFAIL (Russia-nexus), LONGSTREAM (Russia-nexus)
Actor / Attribution	PromptSpy: Unknown (samples from Hong Kong and Argentina). CANFAIL/LONGSTREAM: Russia-nexus (high confidence, per GTIG)
Target	PromptSpy: Android users. CANFAIL/LONGSTREAM: Ukrainian organizations
Vector	PromptSpy: sideloaded APK via dedicated website. CANFAIL/LONGSTREAM: targeted intrusion operations
Status	Active
First Observed	PromptSpy: 2026-01-13 (VirusTotal upload). CANFAIL/LONGSTREAM: ongoing since at least 2025

Detailed Findings

PromptSpy: Autonomous GenAI-Powered Android Backdoor

ESET researchers first identified PromptSpy in February 2026 as the first known Android malware to abuse generative AI in its execution flow. According to ESET researcher Lukas Stefanko, PromptSpy uses Google's Gemini to interpret on-screen elements and generate step-by-step instructions for UI manipulation to maintain persistence.

Google's GTIG report from May 11 revealed additional capabilities that go significantly beyond the initial ESET findings. According to GTIG, PromptSpy contains an autonomous agent module called GeminiAutomationAgent. The module serializes the device's visible user interface hierarchy into an XML format via the Accessibility API and sends it to the gemini-2.5-flash-lite model. Gemini returns structured JSON responses containing action types and spatial coordinates, which PromptSpy parses to simulate physical gestures: clicks, swipes, and navigation. The AI interprets the device state and generates commands in real time without human supervision.
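Because the agent module depends on both an accessibility service and network access to the Gemini API, one possible triage signal is a manifest that declares both. The following is a minimal sketch, assuming the AndroidManifest.xml has already been decoded to text XML (for example with apktool). The function name and the package and service names in the sample are hypothetical, and many legitimate apps (password managers, screen readers) share this combination, so the result is a review flag rather than a verdict.

```python
import xml.etree.ElementTree as ET

# Android attribute namespace as it appears after ElementTree parsing.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
INTERNET = "android.permission.INTERNET"
BIND_A11Y = "android.permission.BIND_ACCESSIBILITY_SERVICE"


def triage_manifest(manifest_xml: str) -> dict:
    """Report whether a decoded manifest declares INTERNET plus an accessibility service."""
    root = ET.fromstring(manifest_xml)

    # Permissions the app requests via <uses-permission>.
    requested = {
        elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission")
    }

    # Services that can only be bound by the system as accessibility services.
    a11y_services = [
        svc.get(ANDROID_NS + "name")
        for svc in root.iter("service")
        if svc.get(ANDROID_NS + "permission") == BIND_A11Y
    ]

    return {
        "internet": INTERNET in requested,
        "accessibility_services": a11y_services,
        "review": INTERNET in requested and bool(a11y_services),
    }


# Hypothetical manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.hypothetical">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <service android:name=".HypotheticalA11yService"
             android:permission="android.permission.BIND_ACCESSIBILITY_SERVICE"/>
  </application>
</manifest>"""

print(triage_manifest(SAMPLE))
```

In practice this flag would be combined with other evidence, such as the Gemini API hostname or a model identifier appearing in the decompiled strings, before escalating a sample.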

According to Google, PromptSpy can capture victim biometric data to replay authentication gestures and regain access to compromised devices. If a victim attempts to uninstall the malware, it identifies the on-screen coordinates of the uninstall button and renders an invisible overlay that intercepts touch events, making the button appear unresponsive. Its command-and-control infrastructure, including Gemini API keys and VNC relay servers, can be updated dynamically at runtime.

ESET noted that PromptSpy has not yet appeared in their wider telemetry, suggesting it may still be a proof of concept. ESET confirmed it has not been observed on Google Play. Google stated it has disabled the assets associated with this activity.

CANFAIL and LONGSTREAM: LLM-Generated Code Obfuscation

GTIG confirmed that Russia-nexus threat actors targeting Ukrainian organizations are deploying two malware families that use LLM-generated decoy code to obfuscate their malicious functionality.

According to Help Net Security, CANFAIL contains LLM-authored comments that explicitly describe blocks of code as "unused filler," indicating the threat actor specifically requested the model generate large volumes of inert code for obfuscation purposes.
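A static rule for self-describing filler comments can be prototyped in a few lines. This is a sketch under stated assumptions: only the phrase "unused filler" comes from the reporting above, while the other phrases, the comment leaders, and the function name are hypothetical placeholders to tune against real samples.

```python
import re

# Only "unused filler" is taken from the reporting; the rest are hypothetical additions.
FILLER_PHRASES = ("unused filler", "filler code", "dummy code", "placeholder logic")

# Assumed comment leaders for common scripting languages: #, //, ', REM.
COMMENT_RE = re.compile(r"^\s*(#|//|'|REM\b)\s*(.*)$", re.IGNORECASE)


def find_filler_comments(source: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for comments that label code as filler."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = COMMENT_RE.match(line)
        if not match:
            continue  # not a whole-line comment
        text = match.group(2).lower()
        if any(phrase in text for phrase in FILLER_PHRASES):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical snippet used only to exercise the function.
sample = "x = 1\n# unused filler block 3\ny = 2\n// real logic\n"
print(find_filler_comments(sample))
```

The rule only inspects whole-line comments, so trailing comments on code lines would need a language-aware tokenizer.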

According to GTIG, LONGSTREAM contains 32 separate instances of code querying the system's daylight saving time status, a repetitive and functionally irrelevant pattern designed to make the script appear benign to analysts. This bloating technique exploits the tendency of automated analysis tools and human reviewers to evaluate code volume as an indicator of legitimacy.
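The repetition pattern lends itself to a simple counting heuristic. The sketch below is illustrative only: it normalizes each line by stripping a leading assignment target and collapsing whitespace, then reports statements that recur at or above a threshold. The function names, the threshold default, and the PowerShell-style sample line are assumptions, since the scripting language of LONGSTREAM is not stated here.

```python
import re
from collections import Counter

# Strips a leading "name =" or "$name =" so differently named variables compare equal.
ASSIGN_RE = re.compile(r"^\s*(?:(?:var|let|const|set)\s+)?[$@]?\w+\s*=(?!=)\s*")


def normalize(line: str) -> str:
    """Drop the assignment target, collapse whitespace, and lowercase the statement."""
    stripped = ASSIGN_RE.sub("", line.strip())
    return re.sub(r"\s+", " ", stripped).lower()


def repeated_statements(source: str, threshold: int = 10) -> dict[str, int]:
    """Return normalized statements that occur at least `threshold` times."""
    counts = Counter(
        normalize(line) for line in source.splitlines() if line.strip()
    )
    # Ignore very short statements such as lone braces or "end".
    return {
        stmt: n for stmt, n in counts.items() if n >= threshold and len(stmt) > 8
    }


# Hypothetical script: 32 daylight saving time queries plus one unrelated line.
sample = "\n".join(
    f"$dst{i} = [System.TimeZoneInfo]::Local.IsDaylightSavingTime((Get-Date))"
    for i in range(32)
) + "\nInvoke-Something -Hypothetical\n"

result = repeated_statements(sample)
print(result)
```

Counting normalized statements is deliberately crude. Legitimate generated code (for example large lookup tables) can also trip it, so hits are best treated as a prompt for manual review.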

APT45 AI-Industrialized Vulnerability Research

GTIG also documented North Korean state actor APT45 sending thousands of recursive prompts to AI models to systematically analyze CVEs and validate proof-of-concept exploits. According to the Korea JoongAng Daily, this pattern represents efforts to automate vulnerability analysis and attack-code testing at industrial scale.

MITRE ATT&CK Mapping

Technique	ID	Context
Input Injection	T1516	PromptSpy abuses the Android Accessibility API to serialize the UI hierarchy for Gemini and to replay the returned gestures
Input Capture	T1056	PromptSpy captures biometric data for authentication replay
Obfuscated Files or Information	T1027	CANFAIL and LONGSTREAM use LLM-generated junk code for obfuscation
Application Layer Protocol: Web Protocols	T1071.001	PromptSpy communicates with Gemini API over HTTPS
Impair Defenses: Prevent Application Removal	T1629.001	PromptSpy renders invisible overlay to prevent uninstallation
Develop Capabilities: Exploits	T1587.004	APT45 uses AI to automate CVE analysis and PoC validation

IOCs

Domains

No domain IOCs published by source

Full URL Paths

No URL IOCs published by source

Splunk Format

No IOCs available for Splunk query

File Hashes

No hash IOCs published by source

Detection Recommendations

Network: Monitor for outbound API calls to generativelanguage.googleapis.com from applications or processes that are not expected to use the Gemini API, particularly on Android endpoints managed via MDM, and from servers or endpoints where such traffic is not expected.

Android: Alert on Accessibility Service registrations by applications not on the organization's approved app list. Hunt for APKs that combine an Accessibility Service declaration and the Internet permission with VNC-related functionality.

CANFAIL/LONGSTREAM: Implement static analysis rules that flag source code containing LLM-characteristic patterns: educational docstrings in operational code, commented blocks explicitly labeled as filler, and high repetition of functionally irrelevant system calls such as repeated daylight saving time queries. As a starting threshold, flag scripts containing 10 or more identical system information queries that serve no functional purpose.
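The network hunt above could be prototyped against exported proxy or MDM flow records. This is a minimal sketch, not a production rule: the Flow record layout, the approved package list, and all device and package names are hypothetical, and only the hostname comes from the recommendation itself.

```python
from dataclasses import dataclass

GEMINI_API_HOST = "generativelanguage.googleapis.com"


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record, e.g. exported from a proxy or MDM agent."""
    device_id: str
    package: str
    dest_host: str


def unexpected_gemini_flows(flows, approved_packages):
    """Return flows to the Gemini API host from packages outside the approved set."""
    return [
        flow
        for flow in flows
        # Normalize case and a trailing dot before comparing hostnames.
        if flow.dest_host.lower().rstrip(".") == GEMINI_API_HOST
        and flow.package not in approved_packages
    ]


# Hypothetical data used only to exercise the function.
approved = {"com.example.approved.assistant"}
flows = [
    Flow("dev-01", "com.example.approved.assistant", "generativelanguage.googleapis.com"),
    Flow("dev-02", "com.example.unknown.app", "Generativelanguage.Googleapis.com."),
    Flow("dev-03", "com.example.unknown.app", "example.org"),
]

for hit in unexpected_gemini_flows(flows, approved):
    print(hit.device_id, hit.package)
```

An exact hostname match keeps false positives low. Whether per-application attribution is available at all depends on the telemetry source, which is why the package field is treated as an assumption here.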

References

[Google Cloud Blog / GTIG] Adversaries Leverage AI for Vulnerability Exploitation, Augmented Operations, and Initial Access (2026-05-11) — https://cloud.google.com/blog/topics/threat-intelligence/ai-vulnerability-exploitation-initial-access
[ESET] PromptSpy ushers in the era of Android threats using GenAI (2026-02-19) — https://www.welivesecurity.com/en/eset-research/promptspy-ushers-in-era-android-threats-using-genai/
[Help Net Security] Google researchers uncover criminal zero-day exploit likely built with AI (2026-05-11) — https://www.helpnetsecurity.com/2026/05/11/google-ai-vulnerability-exploitation/
[The Next Web] Google identifies first AI-developed zero-day exploit and thwarts planned mass exploitation event (2026-05-11) — https://thenextweb.com/news/google-ai-zero-day-exploit-cybersecurity-arms-race
[Korea JoongAng Daily] Rise of AI raises fears of North Korean hacking capabilities (2026-05-14) — https://koreajoongangdaily.joins.com/news/2026-05-14/national/northKorea/Rise-of-AI-raises-fears-of-North-Korean-hacking-capabilities/2592651
[Kaspersky / Securelist] Disclosing new PebbleDash-based tools by Kimsuky (2026-05-14) — https://securelist.com/kimsuky-appleseed-pebbledash-campaigns/119785/