Customer portal
Articles Tagged with

cyber threat intelligence

"SOS
SOS Intelligence Weekly News Round Up

Weekly News Round Up

19 – 25 August 2024

CVE Discussion

Over the past week, we’ve monitored our vast collection of new data to identify discussions of CVEs. 

Ransomware Top 5s

News Roundup

Emergence of New Stealer Malware: QWERTY & Styx

A new strain of malware, named “QWERTY Info Stealer,” has been identified as a significant threat to Windows systems, utilising advanced anti-debugging techniques and data exfiltration capabilities. Hosted on the domain mailservicess[.]com, the malware is designed to evade detection, making it particularly dangerous for both individuals and organisations. Discovered on a Linux-based server in Frankfurt, Germany, the malware is distributed via the URL hxxps://mailservicess[.]com/res/data/i.exe.

QWERTY Info Stealer employs multiple anti-debugging strategies, such as using Windows API functions like IsProcessorFeaturePresent() and IsDebuggerPresent(), and the lesser-known __CheckForDebuggerJustMyCode function. These techniques enable the malware to terminate if it detects a debugging environment, complicating efforts by security researchers to analyse its behaviour. After bypassing these checks, the malware begins collecting data, including system information and browser data, which it stores in specific directories on the infected system. It then communicates with Command and Control (C2) servers, downloading additional payloads and exfiltrating data using HTTP POST requests, underlining its sophistication and the ongoing threat it poses to cybersecurity.

Cybersecurity researchers at Check Point have uncovered a new malware strain called “Styx Stealer,” designed to steal browser and instant messenger data. Emerging in April 2024 and based on the Phemedrone Stealer, Styx Stealer enhances its predecessor’s capabilities with features like crypto-clipping, real-time clipboard monitoring, and auto-start functionality. It targets Chromium and Gecko-based browsers to extract sensitive information such as passwords, cookies, and cryptocurrency wallet data, while also compromising Telegram and Discord sessions. The malware resists analysis by antivirus programs and sandboxes, making it a formidable tool for cybercriminals.

Styx Stealer was developed by a Turkish hacker known as “Sty1x,” who marketed it via Telegram, charging between $75 per month and $350 for unlimited access. An operational security lapse exposed his identity and connections with a Nigerian cybercriminal linked to an Agent Tesla campaign. This revelation highlighted the broader network of cybercriminals involved in various illicit activities, including targeting Chinese firms. Despite Sty1x’s efforts, there are no confirmed victims beyond their own systems and a few security sandboxes, suggesting that their attempts to widely distribute Styx Stealer were largely unsuccessful.

New Phishing Attack Targets Android & iOS Users

A new phishing attack targeting both Android and iOS users has been discovered, combining traditional social engineering techniques with the use of Progressive Web Applications (PWAs) and WebAPKs. First identified in November 2023, the attack primarily targets clients of Czech banks, though cases have also been reported in Hungary and Georgia, indicating a wider spread. The attackers employ various delivery methods, such as automated voice calls, SMS messages, and social media ads, which often use official bank mascots and logos to lure victims to a phishing link mimicking a Google Play page. If accessed via a mobile device, the page prompts the installation of a phishing app disguised as a legitimate banking application.

This phishing app, installed as a PWA or WebAPK, is almost indistinguishable from the real banking app, leading victims to a fake login page that captures their banking credentials. The stolen information is then transmitted to the attackers’ Command and Control (C&C) servers, which are operated by two distinct groups—one using a Telegram bot for real-time logging, and the other using a traditional C&C server. The attackers have managed to evade detection by frequently changing domains and launching new campaigns. To mitigate the risk, users should be cautious when installing apps, verify the authenticity of downloads, and keep their devices updated with the latest security patches.

Linux Kernel Vulnerability

Researchers have identified a vulnerability in the Linux kernel’s dmam_free_coherent() function, caused by a race condition during the process of freeing DMA (Direct Memory Access) allocations and managing associated resources. This flaw can lead to system instabilities, as DMA is essential for allowing hardware devices to transfer data directly to and from system memory without CPU involvement. The vulnerability arises from an improper order of operations within the function, which could result in incorrect memory access, data corruption, or system crashes.

The vulnerability is particularly concerning because an attacker could exploit the race condition by timing their operations to coincide with the freeing and reallocation of DMA memory. If successful, this could cause the devres_destroy function to free the wrong memory entry, triggering a WARN_ON assertion in the dmam_match function, which is part of the DMA management subsystem. This issue occurs when a concurrent task allocates memory with the same virtual address before the original entry is removed from the tracking list, potentially leading to significant system errors.

To address this vulnerability, Greg Kroah-Hartman committed a patch (CVE-2024-43856) authored by Lance Richardson from Google, which modifies the dmam_free_coherent function. The patch swaps the order of the function calls, ensuring that the tracking data structure is destroyed before the DMA allocation is freed, thereby preventing the race condition. The patch has been tested on Google’s internal network encryption project and has been approved for inclusion in the mainline Linux kernel, mitigating the risk associated with this vulnerability. Exploiting this vulnerability to achieve arbitrary code execution would be complex and would likely require additional vulnerabilities or precise control over the target system.

Zero-day Vulnerability in Google Chrome

Google recently patched a high-severity zero-day vulnerability in its Chrome browser, CVE-2024-7971. This flaw, found in the V8 JavaScript engine, is a type confusion issue that can be exploited to execute arbitrary code. The vulnerability was reported by the Microsoft Threat Intelligence Center (MSTIC) and the Microsoft Security Response Center (MSRC) on August 19, 2024, and it is actively being exploited in the wild. In response, Google quickly released updates to mitigate the risk, urging users to update their browsers to the latest version.

The latest Chrome update, version 128.0.6613.84/.85, addresses a total of 38 security vulnerabilities, including several high-severity issues. Among these are CVE-2024-7964, a use-after-free vulnerability in the Passwords component; CVE-2024-7965, an inappropriate implementation in the V8 engine; and CVE-2024-7966, an out-of-bounds memory access flaw in the Skia graphics library. Each of these vulnerabilities could allow attackers to execute arbitrary code, leading to serious security breaches or system compromises.

Users are strongly advised to update to the latest version of Google Chrome to ensure protection against these vulnerabilities. While Chrome generally updates automatically, users can manually check for updates via Settings > About Chrome. Additionally, those using Chromium-based browsers such as Microsoft Edge, Brave, Opera, and Vivaldi should also apply the latest security updates as they become available. This patch highlights the need for vigilance and prompt action in the face of zero-day exploits in widely used software.

Chinese Hackers Exploiting Cisco Zero-day

A sophisticated cyber espionage group known as Velvet Ant, linked to China, has been found exploiting a zero-day vulnerability in Cisco NX-OS Software to deploy custom malware on network switches. The vulnerability, identified as CVE-2024-20399, was discovered by cybersecurity firm Sygnia during a forensic investigation and promptly reported to Cisco. This flaw, with a CVSS score of 6.0, allows an authenticated local attacker with administrative privileges to execute arbitrary commands as root on the affected devices due to insufficient validation of arguments passed to specific CLI commands.

Velvet Ant exploited this vulnerability to install a custom malware, dubbed VELVETSHELL, on compromised Cisco Nexus devices. The malware, which combines elements of the TinyShell Unix backdoor and the 3proxy tool, enables attackers to execute arbitrary commands, upload and download files, and create tunnels to proxy network traffic. Sygnia’s investigation revealed that Velvet Ant has been operating for approximately three years, targeting inadequately protected network appliances like outdated F5 BIG-IP systems to maintain long-term access and steal sensitive information.

Cisco has released software updates to patch the vulnerability and strongly advises customers to apply these updates immediately. Experts warn that network appliances, especially switches, are often under-monitored, with logs rarely forwarded to centralized logging systems, making it difficult to detect and investigate such malicious activities. To mitigate this threat, organizations are urged to apply Cisco’s updates, enhance monitoring of network appliances, regularly update administrator credentials, and adopt stringent security practices to prevent unauthorized access.

""/
SOS Intelligence Weekly News Round Up

Weekly News Round Up

12 – 18 August 2024

CVE Discussion

Over the past week, we’ve monitored our vast collection of new data to identify discussions of CVEs. 

Ransomware Top 5s

News Roundup

Hackers’ Toolkit Exposed

Cybersecurity researchers have uncovered an extensive hacker toolkit, revealing a sophisticated set of tools designed for various stages of cyberattacks. The toolkit, discovered in an open directory in December 2023, comprises a range of batch scripts and malware targeting both Windows and Linux systems. These tools illustrate the hackers’ ability to execute a variety of malicious activities, from initial system compromise to long-term control and data exfiltration.

Among the most significant tools found were PoshC2 and Sliver, two command and control (C2) frameworks commonly used by penetration testers but repurposed for malicious purposes. The toolkit also included custom scripts designed for defence evasion and system manipulation, such as those for removing remote management agents, deleting system backups, and erasing event logs. These components reflect the attackers’ intent to maintain persistent access while covering their tracks.

The discovery of this toolkit highlights the advanced methods used by modern cybercriminals and emphasises the need for robust cybersecurity measures. Experts recommend that organisations adopt comprehensive security strategies, including regular updates, employee training, and advanced threat detection, to protect against these sophisticated attacks. The presence of tools aimed at stopping services, deleting backups, and disabling antivirus software suggests that the toolkit was likely used in ransomware activities.

Critical Vulnerabilities in AWS Identified

Researchers from Aqua identified critical vulnerabilities in six Amazon Web Services (AWS) offerings: CloudFormation, Glue, EMR, SageMaker, ServiceCatalog, and CodeStar. These vulnerabilities, varying in severity, posed significant risks such as remote code execution, service user takeover, AI module manipulation, data exposure, exfiltration, and denial of service (DoS) attacks, potentially affecting any organisation globally that utilised these services. Aqua introduced two key attack vectors, “Shadow Resource” and “Bucket Monopoly,” which exploit automatically generated AWS resources, like S3 buckets, created without explicit user commands. These techniques could allow attackers to execute code, steal data, or take over user accounts.

The vulnerabilities were reported to AWS between February and March 2024, with AWS confirming fixes for most by June 2024. However, a subsequent report indicated that the CloudFormation fix left users vulnerable to a DoS attack, prompting AWS to announce further work on this issue. By August 2024, the vulnerabilities and fixes were publicly discussed at prominent cybersecurity conferences, Black Hat USA and DEF CON 32. AWS’s response included adding random sequences to bucket names if a name conflict arose and planning the deprecation of CodeStar, which had been vulnerable but would no longer allow new projects.

One of the most critical vulnerabilities was in AWS Glue, where attackers could exploit predictable S3 bucket naming to inject malicious code into Glue jobs, leading to remote code execution. To mitigate these risks, it is recommended that organisations implement scoped policies, verify bucket ownership, and avoid using predictable bucket names. While AWS has addressed these specific vulnerabilities, similar risks may exist in other services, underscoring the importance of following best practices and implementing robust security measures to protect against evolving threats.

0-Click Vulnerability leading to RCE found in Outlook

Morphisec researchers have identified a critical vulnerability in Microsoft Outlook, labelled as CVE-2024-30103, which allows remote code execution when a malicious email is opened. This flaw builds on a previously discovered vulnerability, CVE-2024-21378, that exposed Outlook to remote code execution via synchronized form objects. The new vulnerability exploits weaknesses in the allow-listing mechanism, which fails to properly validate form server properties, enabling attackers to instantiate unauthorized custom forms.

The vulnerability hinges on how the Windows API function RegCreateKeyExA handles registry paths. Specifically, the function removes trailing backslashes, allowing attackers to manipulate registry keys and bypass security checks. This manipulation can lead to the loading of malicious executables when a specially crafted email is opened in Outlook. By exploiting this behaviour, attackers can execute arbitrary code within the Outlook process, potentially leading to data breaches, unauthorized access, and other malicious activities.

In response, Microsoft has issued a security update that revises the allow-listing matching algorithm to prevent such exploits. The update modifies how subkeys are matched by removing trailing backslashes before performing an exact match, enhancing system defences. Additionally, Microsoft has strengthened its denylist to block remote code execution attacks exploiting subkey manipulation. Despite these improvements, the evolving nature of security threats means organisations must remain vigilant, regularly updating and auditing their systems to protect against future vulnerabilities.

APT42 targeting US Presidential Election

The Iranian government-backed cyber group APT42 has launched a phishing campaign targeting high-profile individuals connected to the U.S. presidential election, according to Google’s Threat Analysis Group (TAG). This sophisticated threat actor, linked to Iran’s Islamic Revolutionary Guard Corps (IRGC), has been focusing on individuals affiliated with both the Biden and Trump campaigns. The campaign is part of APT42’s broader efforts to support Iran’s political and military objectives through cyber espionage, with a notable focus on the U.S. and Israel, which together represent 60% of the group’s known targets.

APT42 employs a range of tactics in its phishing campaigns, including the use of malware, phishing pages, and malicious redirects, often hosted on popular services like Google Drive and OneDrive. The group is known for creating fake domains that closely resemble legitimate organizations, a tactic called typosquatting, to deceive their targets. Their phishing emails, often designed to seem credible, encourage recipients to enter credentials on fake landing pages, with the capability to bypass multi-factor authentication, making them particularly dangerous.

In response to these activities, Google has taken measures to secure compromised accounts and issued warnings to targeted individuals. They have also reported the malicious activities to law enforcement and are working with authorities to mitigate the threat. As the U.S. presidential election nears, the actions of APT42 highlight the ongoing risk of foreign interference, emphasizing the need for robust cybersecurity measures to protect democratic processes. High-risk individuals are advised to enhance their security, including enrolling in Google’s Advanced Protection Program.

Phishing Campaign masquerading as Google Safety Center

A sophisticated phishing campaign has been identified, where cybercriminals impersonate the Google Safety Centre to trick users into downloading a malicious file disguised as the Google Authenticator app. This attack threatens personal data by installing two types of malware, Latrodectus and ACR Stealer, on victims’ devices. Latrodectus allows attackers to remotely control the infected device, while ACR Stealer uses advanced techniques to obscure its command and control server, making it difficult for cybersecurity experts to trace and neutralize the threat.

What makes this campaign particularly concerning is the attackers’ use of advanced evasion techniques, which indicate a high level of sophistication and ongoing refinement of their methods. As cybercriminals continue to evolve, cybersecurity experts urge users to be cautious when receiving unsolicited emails or messages, especially those prompting software downloads. Verifying the authenticity of such communications and keeping software and security systems up to date are crucial steps in protecting against these increasingly sophisticated threats.

Photo by Kenny Eliason on Unsplash

"SOS
Ransomware

Ransomware – State of Play July 2024

Ransomware – State of Play

July 2024

SOS Intelligence is currently tracking 206 distinct ransomware groups, with data collection covering 424 relays and mirrors.

In the reporting period, SOS Intelligence has identified 388 instances of publicised ransomware attacks.  These have been identified through the publication of victim details and data on ransomware blog sites accessible via Tor.  While this data represents known and publicised data breaches and ransomware attacks, the nature and operation of these groups means that not every successful attack is published and made public, so true figures on the volume of attacks are likely to be higher.   Our analysis of available public data is presented below:

Threat Group Activity and Trends

Ransomware activity showed a 2% increase in July when compared to the previous month, and a 16% increase in activity when compared to the previous year.  Furthermore, the number of active groups has decreased to 34 from 37 the previous month.

This month has seen a significant increase in activity from Ransomhub, making a significant charge to fill the void left by LockBit.  Data for this strain may be skewed, however, by the group using multiple data leak sites to advertise and disseminate stolen data.

Significant activity has been noted from the Handala group, who have exclusively targeted Israel and Israel-based entities over the month.  Handala (Arabic: حنظلة) is a prominent national symbol and personification of the Palestinian people, so this activity is highly likely a response to the continued conflict in the Middle East.  Handala has been increasing activity against Israel throughout the year, including significant attacks against Zerto, and allegedly Israel’s Iron Dome air defence system.

Analysis of Geographic Targeting

Over the last month, targeting continues to follow financial lines, with the majority of attacks targeted at G7, EU and BRICS bloc countries.  Furthermore, a significant number of attacks have been directed towards Israel, with likely political motivations.

Compared to June, 4% more countries were targeted in July.  Our data is also showing interesting geographic targeting data.  We have observed emerging or developing strains targeting developing countries in Southeast Asia, Africa and South America, whereas more established variants focus more on North America, Western Europe and Australia.

Industry Targeting

Targeting has broadly increased across all victim sectors, however significant increases have been seen in the Manufacturing, Construction & Engineering and IT & Technology industries.

Notably, there appears to have been increased targeting against public-sector entities.  This is likely a result of many groups abandoning their affiliate rules on targeting of such victims.

Significant Events

Scattered Spider, a threat actor group known for its social engineering tactics and attacks on VMware ESXi servers, has recently incorporated new ransomware strains into its operations. The group has adopted RansomHub, a rebranded variant of Knight ransomware, and Qilin ransomware. Previously, Scattered Spider used the now-defunct BlackCat ransomware, but it has since shifted to deploying RansomHub in post-compromise scenarios, reflecting its evolving tactics and adaptation to new tools within the cybercriminal landscape.

A flaw in the cryptographic scheme of the DoNex ransomware family has been identified, enabling victims to recover their files for free using a newly released decryptor. This vulnerability, affecting all variants of DoNex, was revealed at a recent cybersecurity conference and involves issues with the encryption key generation and application of ChaCha20 and RSA-4096 algorithms. The decryptor, available through private channels since March 2024, was publicly released following the flaw’s disclosure. Victims are advised to use a large example file for decryption and to back up their encrypted data before proceeding.

Two Russian nationals have pleaded guilty to their involvement in LockBit ransomware attacks that targeted victims worldwide. As affiliates of LockBit’s ransomware-as-a-service operation, they breached vulnerable systems, stole data, and deployed ransomware to encrypt files. One of the individuals has been arrested and faces up to 25 years in prison, while the other has been sentenced to four years. Despite recent law enforcement actions that have seized LockBit’s infrastructure and decryption keys, the ransomware group remains active, continuing to target victims and release stolen data.

Threat Group Development

Change in threat group TTPs to target VMWare ESXi

Play ransomware has recently expanded its focus to target VMware ESXi environments, marking a significant shift in its operations toward broader Linux platform attacks. Utilizing a dedicated Linux locker, Play ransomware encrypts virtual machines (VMs) by first verifying the environment, then scanning for and shutting down active VMs before proceeding with encryption. This approach highlights the group’s advanced evasion techniques and adaptability in the ransomware landscape. The encryption process affects critical VM files, such as disks and configurations, with files receiving a .PLAY extension. Additionally, Play has started using URL-shortening services for its operations, further showcasing its sophistication.

Similarly, Eldorado ransomware, which initially targeted Windows systems, has expanded its scope to include VMware ESXi VMs since its emergence in March. This ransomware employs ChaCha20 encryption across both platforms, allowing affiliates to customise attacks. Meanwhile, the SEXi ransomware operation, rebranded as APT INC, has intensified its focus on VMware ESXi servers since February 2024, leveraging leaked Babuk and LockBit 3 encryptors. APT INC has gained notoriety with high-profile attacks, such as the one on Chilean hosting provider IxMetro Powerhost, with ransom demands reaching millions. The operation continues to use the same encrypted messaging application for negotiations, with no known weaknesses in its encryption for file recovery.

Evolution of BlackBasta

In 2024, Black Basta ransomware has shown significant evolution, adapting to challenges by shifting to custom malware and incorporating new tools after the disruption of its previous partner, QBot. The group now utilizes sophisticated malware like the SilentNight backdoor, memory-only droppers such as DawnCry and KnowTrap, and custom tunneling tools including PortYard and SystemBC. Additionally, it has integrated reconnaissance and execution utilities like CogScan and KnockTrock into its attack processes. These developments underscore Black Basta’s resilience and sophistication, as it continues to pose a formidable global threat by employing advanced tactics and exploiting zero-day vulnerabilities.

New & Emerging Groups

MAD LIBERATOR is a newly emerged ransomware group that launched its leak site in July 2024. The group claims to offer services to help companies fix security issues and recover their files, demanding a fee for their assistance. If the payment is not made, MAD LIBERATOR threatens to publicly list the companies and publish their stolen data. They employ AES/RSA encryption for securing the files. As of the report’s writing, the group had already listed eight victims on its leak site, showcasing their active and ongoing operations.

Ransomcortex is a lesser-known ransomware group with limited information available. However, the group has claimed responsibility for three attacks, all targeting the healthcare sector in Brazil. Despite the lack of detailed information, the choice of victims within such a critical industry highlights the potentially serious impact of their activities.

Vanir Group is a new ransomware group that has quickly gained notoriety for its aggressive and professional tactics. They publicize their attacks via a data leak site and issue intimidating messages to CEOs and domain administrators of the affected companies. These messages warn that the companies’ internal infrastructure has been compromised, backups deleted or encrypted, and critical data stolen. The Vanir Group stresses the importance of cooperation to avoid further damage, threatening to sell or distribute the stolen data if demands are not met. Their website also features an interactive terminal for updates and invites potential affiliates to join their operations. Interestingly, their leak site bears a resemblance to that of Akira, another notorious ransomware group.

Vulnerabilities Observed in Use

"SOS
SOS Intelligence Webinar

Our next webinar – AMA with the team

Submit your questions and we look forward to answering them!

We often get asked questions from how SOS Intelligence is built, to the state of threats right now in the world and everything in between.

So we thought it would be a good idea to involve you as well in the form of an AMA Webinar…

If you have a question on anything to do with cyber threats, security, what we do at SOS Intelligence or perhaps what we are currently working on, then send your question to [email protected] with the subject line AMA Webinar.

Anything goes, so get your thinking caps on now 🙂

Join us on Wednesday 28th August at 4pm BST

Hosted by Jon Moss with SOS Intelligence Founder and CEO Amir Hadzipasic and Threat Analyst, Daniel Collyer.

Sign up to the webinar to receive a recording via email if you cannot attend on the day. By signing up you will also receive our newsletter for future events. You can always unsubscribe with one click.

Submit your question and then…

Join the webinar

"Compromised
SOS Intelligence Weekly News Round Up

Weekly News Round-up

29 July – 4 August 2024

CVE Discussion

Over the past week, we’ve monitored our vast collection of new data to identify discussions of CVEs. 

News Roundup

Linux Servers Exposed to Data Exfiltration from TgRat

The TgRat trojan, first discovered in 2022, is now targeting Linux servers to steal data. Controlled via a private Telegram group, it can download files, take screenshots, execute commands remotely, and upload files. TgRat verifies the computer name’s hash upon startup and establishes a network connection if it matches, using Telegram to communicate with its control server.

Due to Telegram’s popularity and the anonymity it provides, TgRat’s use of it as a control mechanism makes detection difficult. It executes commands via the bash interpreter, encrypted with RSA, and manages multiple bots using unique IDs.

This unique control mechanism complicates detection, as typical network traffic to Telegram servers can mask malicious activity. Installing antivirus software on all local network nodes is recommended to prevent infection.

Threat Actors Using Fake Authenticator Sites to Deliver Malware

Researchers from ANY RUN identified a malware campaign called DeerStealer, which uses fake websites mimicking legitimate Google Authenticator download pages to deceive users. The primary site, “authentificcatorgoolglte[.]com,” looks similar to the genuine Google page to trick users into downloading malware. Clicking the download button on this fake site transmits the visitor’s IP address and country to a Telegram bot and redirects users to a malicious file on GitHub, likely containing DeerStealer, which can steal sensitive data once executed.

The Delphi-based DeerStealer malware employs obfuscation techniques to hide its activities and runs directly in memory without leaving a persistent file. It initiates communication with a Command and Control (C2) server by sending a POST request with the device’s hardware ID to “paradiso4.fun.” Subsequent POST requests suggest data exfiltration.

Analysis revealed the use of single-byte XOR encryption for transmitted data, uncovering PKZip archives containing system information. Researchers also linked DeerStealer to the XFiles malware family, noting that both use fake software sites for distribution but differ in their communication methods.

Threat Actors Abusing TryCloudflare to Deliver Malware

Cybercriminals are increasingly using TryCloudflare Tunnel to deliver Remote Access Trojans (RATs) like Xworm, AsyncRAT, VenomRAT, GuLoader, and Remcos in financially motivated attacks. TryCloudflare allows developers to experiment with Cloudflare Tunnel without adding a site to Cloudflare’s DNS, which attackers exploit to create temporary infrastructures that bypass traditional security controls.

This tactic, initiated in February 2024, has intensified, posing a significant threat due to its rapid deployment and evasion capabilities. Recent campaigns often use URL links or attachments to download malicious files, which execute scripts to install RATs and other malware.

Campaigns frequently target global organisations, using high-volume email campaigns with lures in multiple languages, often exceeding the volume of other malware campaigns. Attackers dynamically adapt their attack chains and obfuscate scripts to evade defences, demonstrating a sophisticated and persistent threat.

By abusing TryCloudflare tunnels, attackers generate random subdomains on trycloudflare.com, routing traffic through Cloudflare to avoid detection. For example, on May 28, 2024, and July 11, 2024, targeted campaigns used tax-themed lures and order invoice themes, respectively, to deliver AsyncRAT and Xworm via malicious email attachments and PowerShell scripts, providing remote system access and data exfiltration capabilities.

Ransomware Threat Actors Exploiting VMWare ESXi

Microsoft researchers have identified a critical vulnerability in VMware’s ESXi hypervisors, CVE-2024-37085, which allows ransomware operators to gain full administrative permissions on domain-joined ESXi hypervisors. This flaw, associated with the “ESX Admins” group, enables any domain user who can create or rename groups to escalate their privileges, potentially gaining full control over the ESXi hypervisor. Exploiting this vulnerability can result in the encryption of the hypervisor’s file system, access to virtual machines, data exfiltration, and lateral movement within the network.

Ransomware groups such as Storm-0506, Storm-1175, Octo Tempest, and Manatee Tempest have been observed exploiting this vulnerability, deploying ransomware like Akira and Black Basta to encrypt ESXi file systems.

A notable attack by Storm-0506 involved using Qakbot and exploiting a Windows vulnerability to elevate privileges, followed by deploying Black Basta ransomware. In response, VMware has released a security update to address CVE-2024-37085. Microsoft urges organisations to apply this update, validate and secure the “ESX Admins” group, deny access or change administrative group settings, use multifactor authentication for privileged accounts, and secure critical assets with the latest security updates and monitoring procedures.

Photo by Joshua Hoehne on Unsplash

"Crowdstrike
SOS Intelligence Weekly News Round Up

Weekly News Round-up

15 – 21 July 2024

CVE Discussion

Over the past week, we’ve monitored our vast collection of new data to identify discussions of CVEs.

News Roundup

Ransom paid by AT&T

AT&T recently paid $370,000 to a hacker affiliated with the ShinyHunters group to delete manipulated client data, including call and text metadata, which had been compromised between May 2022 and January 2023. The breach occurred from April 14th to April 25th, 2024, through unauthorised access to AT&T’s third-party cloud platform. The compromised data included phone numbers, communication dates, and call durations, but did not involve the actual content of conversations or text messages.

The payment was made in Bitcoin, and the hacker confirmed the data deletion through a demonstration video. Despite this effort to erase evidence, there is concern that some information might still be accessible, potentially posing ongoing security risks for AT&T’s consumers.

Compromise of Squarespace domain names

Squarespace customer accounts were compromised by hackers, leading to unauthorised access to sensitive information such as email addresses and account details. The breach was attributed to a third-party vendor, highlighting concerns about the security measures in place for customer data. In response, Squarespace has notified affected users and is working to enhance their security protocols.

To protect their accounts, customers are urged to change their passwords and enable two-factor authentication. This incident underscores the persistent risks associated with third-party integrations in the digital environment and the importance of robust security measures.

22 minutes to exploit

Cloudflare’s Q1 2024 Application Security Report reveals that it takes hackers an average of just 22 minutes to exploit newly disclosed vulnerabilities, highlighting a concerning trend in cybersecurity. The report indicates that Distributed Denial-of-Service (DDoS) attacks remain a significant threat, constituting 37.1% of mitigated traffic, while automated traffic makes up one-third of all internet activities, a substantial portion of which is malicious.

Additionally, API traffic has increased to 60%, with many organisations regularly missing a large number of their public-facing API endpoints. The report also underscores the growing use of zero-day exploits and the challenges posed by third-party integrations in web applications, emphasising the constantly evolving cybersecurity threat landscape.

Exploiting the Crowdstrike Issue

On July 19, 2024, Windows systems were impacted by an issue with the CrowdStrike Falcon sensor, which cybersecurity experts have flagged as a serious concern. Hackers exploited this vulnerability to target CrowdStrike customers through phishing campaigns, social engineering, and the distribution of potentially harmful software. The attackers impersonated CrowdStrike support, falsely claiming the issue was a content update error rather than a security problem.

This incident underscores the need for companies to authenticate communication channels and adhere to official guidance on modern threats. Additionally, it highlights the importance of educating employees about behaviours that could compromise security, helping to strengthen defences against such opportunistic attacks.

Photo by Joshua Hoehne on Unsplash

"SOS
Ransomware

Ransomware – State of Play June 2024

Ransomware – State of Play

June 2024

SOS Intelligence is currently tracking 199 distinct ransomware groups, with data collection covering 393 relays and mirrors.

In the reporting period, SOS Intelligence has identified 381 instances of publicised ransomware attacks.  These have been identified through the publication of victim details and data on ransomware blog sites accessible via Tor.  While this data represents known and publicised data breaches and ransomware attacks, the nature and operation of these groups means that not every successful attack is published and made public, so true figures on the volume of attacks are likely to be higher.   Our analysis of available public data is presented below:

Threat Group Activity and Trends

Ransomware activity showed a 20% decrease in June when compared to the previous month, but a 10% increase in activity when compared to the previous year.  Furthermore, the number of active groups has decreased to 34 from 37 the previous month.

This month has seen a significant increase in activity from Ransomhub, making a significant charge to fill the void left by LockBit.  Data for this strain may be skewed, however, by the group using multiple data leak sites to advertise and disseminate stolen data.

Analysis of Geographic Targeting

Over the last month, the percentage volume of attacks against the US dropped by 7%.  Targeting continues to follow financial lines, with the majority of remaining attacks targeted at G7 and BRICS bloc countries.

Compared to May, 22% fewer countries were targeted in June.  Our data is also showing interesting geographic targeting data.  We have observed emerging or developing strains targeting developing countries in Southeast Asia, Africa and South America, whereas more established variants focus more on North America, Western Europe and Australia.

Industry Targeting

Targeting has broadly increased across all victim sectors, however significant increases have been seen in the Manufacturing, Construction & Engineering and IT & Technology industries.

Notably, there appears to have been increased targeting against public-sector entities.  This is likely a result of many groups abandoning their affiliate rules on targeting of such victims.

Significant Events

Australian Mining Company Reports Breach

Northern Minerals, an Australian rare earth exploration firm, reported a security breach where data was stolen and later appeared on the dark web. The stolen information includes corporate, operational, and financial data, as well as details about personnel and shareholders. The company informed authorities and affected individuals. Despite the breach, its mining operations continue uninterrupted. The BianLian ransomware group claimed responsibility and published the data after Northern Minerals refused to pay the ransom.

Black Basta Hits Keytronic

Keytronic, a leading PCBA manufacturer, suffered a major data breach due to the Black Basta ransomware gang. The attack halted operations in the U.S. and Mexico for two weeks. Black Basta leaked 530GB of stolen data, including personal and corporate information. In an SEC filing, Keytronic reported that the breach would significantly impact their fourth-quarter financials, with expenses of around $600,000 for external cybersecurity services. The company is notifying affected parties and regulatory agencies as required by law.

Conti and LockBit Ransomware Specialist Arrested

Ukrainian cyber police have apprehended a 28-year-old Russian in Kyiv for assisting Conti and LockBit ransomware groups. The individual specialized in making their malware undetectable and also carried out a ransomware attack himself. The arrest, part of ‘Operation Endgame,’ was the result of a Dutch police investigation. The suspect could face up to 15 years in prison.

Qilin Targets Healthcare Sector

A ransomware attack by the Qilin group disrupted multiple London hospitals by targeting Synnovis (formerly Viapath). This affected operations at Guy’s and St Thomas’ NHS Foundation Trust and King’s College Hospital NHS Foundation Trust. Over 800 surgeries and 700 appointments had to be rescheduled, with IT system recovery expected to take months. The attack caused shortages in blood supplies, especially O-positive and O-negative types, prompting urgent donation appeals from NHS Blood and Transplant.

Ransomhub Makes Headlines

British auction house Christie’s faced a data breach by the RansomHub ransomware group, resulting in the theft of sensitive client information. Christie’s quickly secured their network, enlisted external cybersecurity experts, and notified law enforcement. The breach affected potentially 500,000 clients. Christie’s is offering a free twelve-month subscription to an identity theft and fraud monitoring service for those impacted. RansomHub, known for its data-theft extortion methods, claimed responsibility and announced the sale of the stolen data on their dark web auction platform.

New & Emerging Groups

El Dorado Ransomware

In mid-June 2024, researchers identified a new ransomware group named El Dorado. This variant, derived from the LostTrust ransomware, encrypts files and appends the “.00000001” extension to filenames. It also generates a ransom note titled “HOW_RETURN_YOUR_DATA.TXT.”

Cicada3301

Cicada3301 has recently emerged as a significant cyber threat, targeting organizations worldwide. Known for their sophisticated attack methods, they have been involved in several high-profile ransomware attacks. Their operations typically involve detailed reconnaissance and exploiting network vulnerabilities, especially those neglected in IT security protocols. As of this report, the group has listed four victims, with all the companies’ data leaked on their site.

SenSayQ

SenSayQ is a new ransomware actor recently observed in the threat landscape. While their exact methods remain unclear, they are known to use double-extortion tactics, involving both data exfiltration and file encryption. SenSayQ utilizes a variant of LockBit ransomware, leaving ransom notes in most folders. Currently, the group has two victims listed on its leak site.

Trinity

Trinity is a newly identified ransomware variant, believed to be an updated version of “2023Lock.” This malware encrypts files and appends the “.trinitylock” extension. Trinity shares some of its code with another variant known as Venus. The threat actors behind Trinity use double extortion techniques, exfiltrating confidential files and threatening to release them publicly if their demands are not met. By mid-June, the group listed five victims, but this was later reduced to three, suggesting possible ransom payments or data sales to third parties.

VULNERABILITIES EXPLOITED IN JUNE 2024 BY RANSOMWARE GROUPS

CVECVSSVulnerability NameAssociated Threat actor
CVE-2024-45779.8PHP-CGI OS Command Injection VulnerabilityTellYouThePass ransomware
CVE-2024-261697.8Microsoft Windows Error Reporting Service Improper Privilege Management VulnerabilityBlackbasta
"SOS
Ransomware

Ransomware – State of Play May 2024

SOS Intelligence is currently tracking 193 distinct ransomware groups, with data collection covering 384 relays and mirrors.

In the reporting period, SOS Intelligence has identified 474 instances of publicised ransomware attacks.  These have been identified through the publication of victim details and data on ransomware blog sites accessible via Tor.  While this data represents known and publicised data breaches and ransomware attacks, the nature and operation of these groups means that not every successful attack is published and made public, so true figures on the volume of attacks are likely to be higher.   Our analysis of available public data is presented below:

Threat Group Activity and Trends

Ransomware activity showed a 30% increase in May when compared to the previous month, and a 4% increase in activity when compared to the previous year.  Furthermore, the number of active groups has increased to 37 from 36 the previous month.

This significant increase in activity has been driven by a surprise surge of activity from the Lockbit group.  In May, the group published 176 victims to its Data Leak Site (DLS), representing 37% of all publicised attacks for the month.  Further, this is a 633% increase in activity from the previous month and comes at a time when Lockbit was expected to be showing a continued decrease in activity.  Rather what we have seen is Lockbit’s busiest month on record.

The sudden surge from Lockbit has been a surprise to many.  The first tranche of published data emerged shortly after further law enforcement announcements regarding the group and its takedown.  Notable among the data released is an unusually high volume of affected victims in Spain and India being released quickly.  This may indicate the activity of an affiliate or affiliates with a particular proclivity for targeting those countries.  It should be noted that some of the victims had previously had their data released in the previous year, suggesting that Lockbit might be recycling data for additional ransoms and also to appear active.  Furthermore, it isn’t clear when these victims were targeted, so the actual point of breach may have been before law enforcement activity against Lockbit in February 2024.

Analysis of Geographic Targeting

Over the last month, the percentage volume of attacks against the US dropped by 7%.  Targeting continues to follow financial lines, with the majority of remaining attacks targeted at G7 and BRICS bloc countries.

Compared to April, 41% more countries were targeted in May.  Our data is also showing interesting geographic targeting data.  We have observed emerging or developing strains targeting developing countries in Southeast Asia, Africa and South America, whereas more established variants focus more on North America, Western Europe and Australia.

Industry Targeting

Targeting has broadly increased across all victim sectors, however significant increases have been seen in the Manufacturing, Construction & Engineering and IT & Technology industries.

Notably, there appears to have been increased targeting against public-sector entities.  This is likely a result of many groups abandoning their affiliate rules on targeting of such victims.

Significant Events

LockBit Black distributed via Botnet in the wild

Since April, the Phorpiex botnet has sent millions of phishing emails to distribute LockBit Black ransomware. These emails, often sent using aliases with simple names, include ZIP attachments containing executables that install the ransomware. Leveraging LockBit 3.0’s leaked builder, the campaign targets various industries worldwide. Active for over a decade, the Phorpiex botnet has evolved from a worm to an IRC-controlled trojan, and has been implicated in sextortion and cryptocurrency theft.

Social engineering attacks delivering Blackbasta

Researchers have observed the threat actor Storm-1811 using Microsoft Teams and Quick Assist for social engineering attacks that result in the deployment of Blackbasta ransomware. Storm-1811 employs voice phishing (vishing) and malicious links to gain access through Quick Assist. They use tools such as Qakbot, remote monitoring and management (RMM) tools like ScreenConnect and NetSupport Manager, and Cobalt Strike. Additionally, Storm-1811 utilises EvilProxy phishing sites and SystemBC for persistence and command-and-control. After compromising a system, they use PsExec to deploy Black Basta ransomware.

INC Ransomware source code for sale

Threat actor “salfetka” is alleging to have for sale the source code to INC Ransom, valued at $300,000.  The legitimacy of the sale is uncertain.   This comes at a time where there have been changes within the groups operation, which suggests possible plans for a new encryptor.

Threat actors targeting Windows admins with fake ads

A ransomware campaign is targeting Windows system administrators by promoting fake download sites for Putty and WinSCP through search engine ads. These fraudulent sites offer Trojanized installers that deploy the Sliver toolkit, facilitating further network access and potential ransomware deployment. The campaign employs tactics similar to those used by BlackCat/ALPHV ransomware, highlighting an increasing threat from search engine advertisements for popular software.

New Groups

SpiderX

SpiderX, a new ransomware-as-a-service promoted by threat actors on underground forums, is designed for Windows systems and boasts advanced features surpassing its predecessor, Diablo. Key capabilities include ChaCha20-256 encryption for rapid file encryption, offline functionality for stealth operations, comprehensive targeting of all connected drives, and a built-in information stealer that exfiltrates data to MegaNz. Priced at $150, SpiderX poses a significant cybersecurity threat due to its affordability and efficiency.

Fakepenny

Researchers have identified a new North Korean hacking group, Moonstone Sleet, active since August 2023. This threat actor employs custom ransomware called ‘FakePenny,’ first detected in April 2024, which includes a loader and an encryptor with ransom notes resembling those used by Seashell Blizzard’s NotPetya. Moonstone Sleet’s ransom demands are notably high, with one reaching $6.6 million in Bitcoin, surpassing previous North Korean ransomware demands such as WannaCry 2.0 and H0lyGh0st.

Arcusmedia

First identified in May, the Arcusmedia group has been responsible for at least 17 incidents to date, primarily targeting South America across a wide range of sectors, including government, banking, finance, construction, architecture, music, entertainment, IT, manufacturing, professional services, healthcare, and education.

"SOS
Ransomware

Ransomware – State of Play April 2024

SOS Intelligence is currently tracking 192 distinct ransomware groups, with data collection covering 382 relays and mirrors.

In the reporting period, SOS Intelligence has identified 365 instances of publicised ransomware attacks.  These have been identified through the publication of victim details and data on ransomware blog sites accessible via Tor. Our analysis is presented below:

Group Activity and Trends

Ransomware activity showed a 13% decrease in April when compared to the previous month, and a 7 % decrease in activity when compared to the previous year. However, the number of active groups has increased to 36 from 33 the previous month.

The overall drop in victim numbers for April is likely an ongoing effect of the dissolution of AlphV/BlackCat and the significant decrease in activity from Lockbit as a result of law enforcement activity in February.

Since February, we have closely monitored group activity for signs of where AlphV and Lockbit affiliates would take their business. The top six groups for the year-to-date are represented above and as yet, no one group has emerged above the others. Hunters International, Play and Ransomhub established themselves as the most active across April, but over the three months, we have also seen significant activity from Blackbasta and 8base. This could suggest that displaced affiliates are not settled on a final product, and have been utilising different ransomware services in the wake of the downfall of AlphV and Lockbit.

Analysis of Geographic Targeting

The volume of targeting against US-based victims has remained steady at around 50% of all reported ransomware attacks.  Targeting continues to follow financial lines, with the majority of remaining attacks targeted at G7 and BRICS bloc countries.

Compared to March, 11% fewer countries were targeted in April.  Our data is also showing interesting geographic targeting data.  We have observed emerging or developing strains targeting developing countries, whereas more established variants focus more on North America, Western Europe and Australia.

Top Strains per Country

United States
Canada
United Kingdom
Germany
Italy
– play
– play
– snatch
– ragroup
– ransomhub
– hunters
– blacksuit
– dragonforce
– 8base
– rhysida
– blacksuit
– akira
– lockbit3
– lockbit3
– ciphbit

Industry Targeting

Despite a reduction in victim volume, Manufacturing and IT & Technology remain at the forefront of threat actor targeting. Health & Social Care and Retail & Wholesale continue to see an emergence as a target of choice amongst multiple different variants, likely due to many groups removing targeting restrictions in the wake of law enforcement activity and continued western support for Ukraine.

Top Strains per Industry

Manufacturing
IT & Technology
Health & Social Care
Construction & Engineering
Retail & Wholesale
– play
– ransomhub
– incransom
– play
– hunters
– hunters
– darkvault
– qiulong
– cactus
– ransomhub
– blackbasta
– cactus
– ransomhub
– lockbit3
– lockbit3

Significant Events

8base targets the United Nations

The United Nations Development Programme (UNDP) was subject to an 8base ransomware attack, resulting in the exfiltration of human resources and procurement information. Despite significant demands being made, the UN has stood fast in its decision to not make payment.

Akira collects ransoms worth USD 42 million

An advisory provided cyber security centre’s in the USA, Netherlands and Europe has revealed that, since March 2023, the Akira ransomware strain has been responsible for attacks against 250 victims, with an estimated total ransom value of USD 42 million.

Lockbit not disappearing without a fight

The District of Columbia Department of Insurance, Securities & Banking,a local government department in the US capital, was added to the long list of Lockbit victims.  An estimated 800GB of sensitive data was obtained in the breach, which has not been made available to the public amid reports of it being sold privately.

New Groups

APT73

  • Suspected to be a LockBit spin-off – several pages on their leaksite resemble those used by LockBit
  • Listed 4 victims since appearing in late April

DarkVault

  • Suspected to be a LockBit spin-off – several pages on their leaksite resemble those used by LockBit
  • Also involved in other illicit activities, such as bomb threats, doxing, and fraud.
  • Listed 22 victims since appearing in April

Quilong

  • Currently exclusively targeting victims in Brazil
  • Listed 6 victims since appearing in April

SEXi

  • Emerged in April 2024, targeting a hosting company in Chile. 
  • Encrypts VMware ESXi servers and backups, appending the .SEXi extension to encrypted files and dropping ransom notes named SEXi.txt. The name ‘SEXi’ is believed to be a play on ‘ESXi,’ as the attacks exclusively target VMWare ESXi servers.

Space Bears

  • Sports a unique front end with corporate stock images but also maintains a classic “wall of shame” for their victims.
  • Alongside instructions for affected companies, they operate both a .onion site and a clearnet website.

Vulnerability Exploitation

Threat actors are maintaining techniques focusing on the exploitation of vulnerabilities in public-facing corporate infrastructure.  

In recent weeks, Linux variants of the Cerber ransomware have been seen to be deployed utilising exploitation of Atlassian Confluence Data Center and Server, specifically CVE-2023-22518.  CVE-2023-22518 is a critical severity (CVSS 9.1) Improper Authorisation Vulnerability which allows an unauthenticated attacker to reset Confluence and create an administrator account for persistent access.

"SOS
Investigation, Product news

Cracking CAPTCHAs for fun and profit

Through synthetic training sample dataset generation and ML training.

Preface

Cracking CAPTCHAs is already a well-documented and established process which this article looks to expand on. We will approach this article with a general view of how we’ve cracked CAPTCHAs within undesirable conditions. This article is not meant to be a how-to or detailed guide to replicate our steps. However, it may give you some inspiration for your specific challenge. 

We believe that the methods laid out in this article are novel and significantly improve the efficiency of automated CAPTCHA solving in contrast to traditional approaches. Especially when considering a target CAPTCHA system with poor sample harvesting opportunities.

Ethics

We bypass human verification checks to maintain automatic information collection pipelines. The use of the methods we have developed only extends as far as what is required to automate our collection process. 

If a CAPTCHA or other human verification check system is poorly designed and not adequately rate limited, condition checked etc. bypassing it on scale may lead to a DDoS (Distributed Denial of Service) attack in the worst of cases. But with correctly implemented human verification systems, you should mitigate this even with the system bypassed. At best, unethical manipulation of these verification systems can lead to spam posts/comments and otherwise undesirable automated “bot” interaction. We do not condone this type of use. 

The Problem

There are several well-established methods to automate the solving of CAPTCHAs, depending on the complexity of the CAPTCHA, and if we start at the easy end of the spectrum we are presented with a fairly basic alphabetical captcha. 

With a simple distortion background, one might choose to apply a straightforward process of applying denoise filters or Gaussian blurring to an image to reduce or remove the amount of “stars” or random dot pixels present in its background that are applied at random. 

This process can give us a less noisy picture and we can further convert the image to grayscale.  If the source sample is a colour image doing so improves edge detection. 

The image can then be processed through a standard OCR (Optical Character Recognition) library and in our experience can result in a 0.1% failure rate yielding excellent stable solutions. 

In some cases, a good test of CAPTCHA ease of solvability is to feed it to Google Translate as an image; have Google Translate attempt to read the text and translate the letters back into English. If it can, then you have a very good chance that rudimentary OCR libraries will also work for you.

But this article is not about the easy end of the challenge…

What we are dealing with is a CAPTCHA that is both alphanumeric, upper and lower case with random character placement and rotation, and random disruption lines across the image and characters.  Furthermore, most importantly, a point that we will discuss in more detail is where the target source is a Tor Onion website that, at the best of times loads slowly and at the worst of times is offline or responds with backend timeout errors. 

The image complexity of the source CAPTCHA means it’s nearly impossible to effectively read it by OCR. This is made challenging due to the disruption patterns provided by the background random line arrangement (an outward star pattern) and each of our characters are independently disrupted with seemingly random lines of various length and width. Combining all that with offset angles of each character it’s beyond what most OCR or OpenCV methods can handle. 

Therefore, for more complex CAPTCHAs image manipulation (removing noise, grey scaling etc.) is typically not sufficient. These challenges usually require machine learning to get a reasonable failure rate and sufficient solving speed. 

The biggest factor in achieving a good model that will solve accurately is having a large enough sample base. In some cases, many thousands of samples are required for training. Certainly, when dealing with a CAPTCHA that may have upper, lowercase and numerical characters with randomisation of all these points plus randomisation on disruption patterns or lines the larger the sample set, the more accurate a model the training will produce. 

So how do you get thousands of samples from a source that is slow to load and has poor availability, both conditions of the source being a Tor website? Harvesting samples this way would be far too inefficient and we can’t hang around! 

Even with a target source that responds reasonably quickly, has good availability, and can be harvested without aggressively hitting rate limits, who would want to sit there endlessly solving eight thousand captchas to feed to an optical character recognition model? 

I know that’s not going to be me! Sure, there are options to outsource these problems and crowdsource them, but those options take time, money and are likely to introduce errors in our training sample data. Neither of these is desirable, so how do we get 100% accurate sample data cheaply without human solving, without having to harvest the source, and that can scale? 

The Solution

The solution we came up with was first to not focus on the solving of the CAPTCHAs, or the training of our model, or anything that was a direct result or outcome of the end goal we are driving towards. Instead, we looked at how the CAPTCHAs are constructed; what do they look like and what are their elemental parts. 

We know harvesting is not an optimal option, so we have put that aside. Doing so leaves us with a handful of maybe 20 or so harvested solved CAPTCHA samples. Nowhere near enough to start training but it’s enough to start focusing on the sample set we have.

If we look at how the CAPTCHA is constructed and try and break its construction down piece by piece, in a way “reverse engineering” the construction of the CAPTCHA we might either: 1) be able to generate our own `synthetic` CAPTCHAs on demand and at scale all 100% accurately pre solved, or 2) sufficiently understand the method of construction to identify the library or process in which the CAPTCHA is constructed and reimplement it for ourselves with the same 100% accurately pre-solved outcome. 

In our case and the example, we are writing this article from the path of the former option. This option was chosen as some time was spent trying to identify the particular CAPTCHA library but no exact match was found, and in the interest of not burning too much time, and depending on external factors we decided to attempt to create our own synthetic CAPTCHA generation process.

To create our CAPTCHAs, we used Pillow (a PIL Python Fork), a Python Image Manipulation Library that offers a wide range of features all well suited for the job at hand. 

We start by defining a few values that we have observed to be fixed, such as a defined image size (in our case, 280 by 50 pixels) and use this to create a simple image. 

Then we define our letter set (a to z, A to Z, 0 to 9) as we know these to be fixed. 

Using `random.choice` we can pick a required amount of characters.  In our case, the CAPTCHA uses a fixed length of 6 characters. 

The text font is also important and from our source samples we see it is fixed: therefore we try to match the font type as closely as possible. Font size also remains constant. This will be important in ensuring that our training is as accurate as possible when our model is presented with real sample data.

To kick things off, the process carefully establishes the dimensions of the image canvas, akin to laying out a pristine piece of paper before beginning a drawing. Then, with a deft stroke, we construct a blank background canvas, pristine and white, awaiting the arrival of the CAPTCHA characters. But here’s where the true artistry takes centre stage; the process methodically layers complexity onto the character, 

With each character in the CAPTCHA text, our process doesn’t simply slap it onto the canvas; instead, it treats each letter as an individual brushstroke, adding specific characteristics at every turn. We begin by precisely measuring the width and height of each character, ensuring that characters will not be chopped off the edges, correctly fit and fill the CAPTCHA, and that they resemble the source CAPTCHA text. Then, like with the source samples, we introduce randomness into the mix, spacing out the letters with varying degrees of separation, akin to scattering scrabble pieces.

We are also introducing a touch of chaos by randomly rotating each character, giving them a tilt that defies conventional alignment. This clever sleight of hand resembles the source samples accurately and adds to the difficulty level of solving this CAPTCHA. 

Yet the process doesn’t stop there. No, it goes above and beyond, adorning our canvas with a riotous display of crisscrossing lines, as if an abstract artist had gone wild with a brush. These random lines serve as a digital labyrinth, obscuring the text beneath a veil of confusion and intrigue.

We then add and overlay lines of random length and weight across each character, aligned to the character’s angle closely matching that of the source sample. 

Now that we have a way to populate our image canvas, we have a working framework with which we can iterate to get an output that resembles the source samples as closely as possible. 

For now, we generate a few hundred samples, each image file is named the randomly selected CAPTCHA text, assisting us by essentially generating a sample set that has already been solved. 

After that, we compared each iteration’s output closely to the source and made tweaks and adaptations. For each iteration of the CAPTCHA generator we looked closely at just one specific attribute to simplify the synthesis process. We adjust the random scattered background lines, adjusting their length, width and count.  Moving then onto tweaking the letter placement and random angles, to closely match the apparent pseudo randomness of the sample data set.

Following sufficient tweaking and iterations, we are producing a CAPTCHA that is at least visually very closely matching our source samples. It matches so closely that if mixed with real samples it’s difficult to distinguish. This is the ideal level of synthesis we are looking to achieve. 

Example synthetic captcha on the left, real on the right

Next steps

Now that we have a way to produce synthetic CAPTCHAs that very closely match our target, it’s time to produce a few thousand of them. This is easily and quickly done by specifying the total count in our process loop and out pops 5,000 freshly generated pre-solved captchas all nicely labelled and ready for shoving into our training process. 

For model training, we’ve chosen to use the TensorFlow framework alongside the ONNX Runtime machine learning model accelerator. This combination worked well for us for both training accuracy and efficiency. All training was conducted with the use of a Nvidia GPU.

Following initial training, using just our best-produced synthetic CAPTCHA samples as our data set, we achieved a CER (Character error rate) of 3.26%. For a first batch run of a model trained against a synthetic data set was not too bad at all. But we knew we could do better. 

Now that we had a model to work with, we could use it to start solving actual real target CAPTCHAs.  This would allow us to generate a larger pool of real CAPTCHA samples, with a solve set, and mix those in with our synthetic set.  We were looking to generate 5k synthetic and 1k real harvested CAPTCHAs with our newly trained, albeit unoptimized model. 

With a framework in place that would interface with the target website, collect CAPTCHAs, generate a text prediction, check that with the website and if solved, store the solved and labelled CAPTCHA image we generated about 1,000 samples over a short time.

Feeding this back into the mix of training model data we dropped the CER down to 2.77%.

A screen shot of a black screen

Description automatically generated

We were confident that even with 2.7% it was a rate better than a human could achieve, and we were also confident that our methodology was working. 

Our remaining tasks were to reiterate the model once more, using this slightly more optimised model and generate a slightly larger set of labelled real CAPTCHAs. 

We were able to go from the initial model, with a worse CER (orange line) to the best model (green line) in only a few training iterations.

The model training improvements are best shown in the graph below with each improvement yielding a lower CER, for longer (more stable) and at a sooner point in time. 

At which point we settled on a final model, with a CER of 1.4%, opting for an optimal  mix real CAPTCHAs to synthetic. 

Our final ML model diagram: 

Once the efficacy of this model was validated it was then a task of simply plugging it into the collection pipeline process and enlivening it into our production collection system. The automated solver process has been running stable ever since and most of the disruption we’ve observed has solely been to the target source going offline and being unavailable. 

Bias and Variance

A key consideration during the training process was to be aware of and mitigate where possible Overfitting and Overtraining our model. Instead of using the terms `overfitting` and `overtraining` I like to instead use Bias and Variance as two potential pitfalls of ML training as they better explain undesirable conditions that may occur. Without diving into too many details around these ML concepts as to fully understand them you would probably need a PhD. The best way I can describe what my simple mind can understand is as follows.

Due to the nature of our novel, one might say clever iterative process to train a CAPTCHA solver on a very low original source data set we are by virtue potentially adding bias into our training process. For example, from the first model any solved data sets will be solved by a model that has a predefined bias to solving a particular set, style or character combination potentially resulting in a new data set that is biassed towards what that previous model was good at solving thereby amplifying the bias in our next model’s training. 

This bias would result in a real world regression of CER as the model is unoptimised to solve a wider range of character combinations and randomisation characteristics. 

Our second pitfall: overfitting slides at both ends of the extreme in terms of providing an overly varied training set or an insufficiently varied training set, i.e. creeping into bias. Whereby we must consider that although we could train a model to solve many different types of CAPTCHAs, beyond just this one example, from one model using a very varied data set doing so and if not carefully tuned could result in `overfitting` our data set thereby introducing an unoptimised CER as our model is essentially training on more noise than signal. 

We therefore considered both Bias and Variance closely, ensuring a healthy mix of varied real correctly labelled CAPTCHAs harvested from source to a ratio of synthetically generated CAPTCHAs with a randomly distributed character set. An optimal CER band was then discovered through iterative AB testing of data set mix, training iterations until a stable plateau was identified. 

Conclusion

We deploy a final model, incorporating a mix of synthetic and real CAPTCHAs, achieving a CER of 1.4%. The automated solver process seamlessly integrates into our production collection system, ensuring stability and efficiency.

By leveraging synthetic sample training data generation, we’ve advanced CAPTCHA cracking. Our approach offers an effective and efficient solution for CAPTCHA cracking without significant human involvement or effort allowing for effective automated data collection.

With this capability, we are able to add value to our customers by automating the collection from otherwise programmatically inaccessible sources, where we would have to manually have a human solve the CAPTCHA access the page, insert any updates and then alert our customers. Automation is key to what we do at speed and at scale especially when dealing with many hundreds of collection sources as we do.

Photo by Kaffeebart on Unsplash.

1 2 3 4 5
Privacy Settings
We use cookies to enhance your experience while using our website. If you are using our Services via a browser you can restrict, block or remove cookies through your web browser settings. We also use content and scripts from third parties that may use tracking technologies. You can selectively provide your consent below to allow such third party embeds. For complete information about the cookies we use, data we collect and how we process them, please check our Privacy Policy
Youtube
Consent to display content from - Youtube
Vimeo
Consent to display content from - Vimeo
Google Maps
Consent to display content from - Google
Spotify
Consent to display content from - Spotify
Sound Cloud
Consent to display content from - Sound