Algorithm Upgrades: How the GSE is Innovating to Protect Vulnerable Groups

The Global Signal Exchange (GSE) is harnessing the unique capabilities of large language models (LLMs) as part of an upgrade to our algorithms.
Algorithm upgrades are routine, and reflect the latest challenges in the ever-growing scam and fraud landscape. The July 2025 upgrade will provide more data about the sectors being exploited by cybercriminals when they carry out scams and fraud. Additionally, we are now able to capture information about the vulnerable populations being targeted.
What are signals?
The Global Signal Exchange shares information in the form of “signals”. A single signal could be a URL, a domain name, a subdomain or a hostname. For the consumer, a “signal” is a “suspicious link”. Keywords are strategically extracted from URLs, ignoring the underlying URL structure. This keeps the analysis focused on the core lexical and semantic components of each signal.
How are signals scored?
Each URL is scored on conceptual relevance against a list of pre-determined groups; these may be business sectors, age groups or vulnerable populations. Similarity scores from 0% to 100% are assigned according to conceptual relevance. Brands do not factor into scoring: the score is uniform per signal-sector combination. Any brands that are directly lexically related are identified as part of the analysis.
Previously, each signal would be sorted into one of twenty discrete sector categories. For example, the signal “calibreaudio.org.uk” would be matched with the highest similarity score for “Entertainment & Media” and placed in this sector. However, this would fail to capture the nuance that Calibre Audio provides audiobook lending to people with print disabilities (such as dyslexia, stroke, blindness, or Parkinson’s disease).
With our upgraded algorithm, Google Gemini will provide similarity scores for multiple sectors, as well as for vulnerability categories. Signals are fed into Gemini, which is then prompted to score each signal in terms of conceptual relevance to a list of sectors, age groups, and vulnerability groups.
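The prompting step described above might look roughly like the following Python sketch. It only builds the prompt text and validates a JSON reply; it does not call Gemini. The category lists, the prompt wording, the JSON reply format, and the function names `build_prompt` and `parse_scores` are all assumptions made for illustration, not the GSE's actual implementation.

```python
import json

# Hypothetical category lists; the article names only some of these labels.
SECTORS = ["Entertainment & Media", "Technology and Electronics", "Healthcare"]
VULNERABILITY_GROUPS = ["Lifelong Physical Disabilities", "Acquired Cognitive Impairment"]


def build_prompt(signal: str, categories: list[str]) -> str:
    """Assemble a scoring prompt (the wording is illustrative only)."""
    listing = "\n".join(f"- {c}" for c in categories)
    return (
        f"Score the signal '{signal}' for conceptual relevance to each category below.\n"
        "Reply with a JSON object mapping each category name to an integer from 0 to 100.\n"
        f"{listing}"
    )


def parse_scores(reply: str, categories: list[str]) -> dict[str, int]:
    """Keep only known categories whose scores are numbers in the range 0-100."""
    raw = json.loads(reply)
    scores = {}
    for category in categories:
        value = raw.get(category)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and 0 <= value <= 100:
            scores[category] = int(value)
    return scores
```

Validating the reply against the known category list guards against a model returning unexpected labels or out-of-range values before any score is stored.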
The signal “calibreaudio.org.uk” will now be scored with a similarity of 95% for the sector “Entertainment & Media”, as well as 60% for “Technology and Electronics” and 70% for “Healthcare”. This will help to capture the intersection between different sectors, and will provide nuance where a signal does not fit easily into a single sector. This approach is less prescriptive than the previous one, where a finite set of brands was developed and assigned to each sector. Now, brands are not strictly limited to individual sectors.
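The difference between the previous single-sector assignment and the new multi-sector scoring can be sketched in a few lines of Python, using the scores quoted above for “calibreaudio.org.uk”. The function names and the cut-off threshold are hypothetical; the article does not say whether any threshold is applied to the scores.

```python
# Scores quoted in the article for the signal "calibreaudio.org.uk".
scores = {
    "Entertainment & Media": 95,
    "Technology and Electronics": 60,
    "Healthcare": 70,
}


def top_sector(scores: dict[str, int]) -> str:
    """Previous behaviour: keep only the single highest-scoring sector."""
    return max(scores, key=scores.get)


def relevant_sectors(scores: dict[str, int], threshold: int = 50) -> list[tuple[str, int]]:
    """Upgraded behaviour (sketch): keep every sector at or above a threshold.

    The threshold is an assumption made for this example.
    """
    kept = [(sector, score) for sector, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


print(top_sector(scores))
print(relevant_sectors(scores))
```

With a single label, the “Healthcare” and “Technology and Electronics” associations are lost; keeping the full set of scores preserves them.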
URL parsing has been significantly refined, using regular expressions for dynamic keyword extraction. This enhancement ensures more accurate brand and sector mapping by handling a broader array of keywords. Brand extraction has also been upgraded to account for diverse keyword combinations. Together, these changes support robust brand identification and precise sector scoring, with correct handling of lexical nuance.
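As a rough illustration of regex-based keyword extraction that ignores URL structure, here is a minimal Python sketch. The list of structural tokens, the one-letter filter, and the function name `extract_keywords` are assumptions for this example; the GSE's actual expressions and rules are not described in this article.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list of structural tokens to drop; not the GSE's actual list.
STRUCTURAL_TOKENS = {"www", "http", "https", "com", "org", "net", "co", "uk", "html", "php"}


def extract_keywords(signal: str) -> list[str]:
    """Return lowercase alphabetic keywords from a URL, domain, or hostname."""
    # urlsplit only recognises the host when a "//" separator is present,
    # so bare domains and hostnames get a "//" prefix first.
    parts = urlsplit(signal if "//" in signal else "//" + signal)
    text = f"{parts.hostname or ''} {parts.path} {parts.query}"
    # Matching runs of letters discards dots, slashes, digits, and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STRUCTURAL_TOKENS and len(t) > 1]


print(extract_keywords("calibreaudio.org.uk"))
print(extract_keywords("https://www.example-bank.com/login?user=1"))
```

Note that this sketch leaves compound tokens such as "calibreaudio" intact; splitting them into separate words would need an additional step that is outside the scope of this example.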
How will this impact the consumer?
In addition to capturing sector data, the GSE is working to develop insights into which vulnerable populations scam and fraud actors might be targeting. The same Levenshtein methodology can be applied to age groups, as well as to vulnerability groups. The GSE is working to develop an evidence-based selection of the populations most vulnerable to scams and fraud. In a small pilot study, the signal “calibreaudio.org.uk” received a similarity score of 95% for “Lifelong Physical Disabilities”, 95% for “Acquired Physical Disabilities or Chronic Illness”, and 80% for “Acquired Cognitive Impairment”. The vulnerability groups will be refined so that they represent, as accurately as possible, the existing evidence base on the populations most vulnerable to scams and fraud.
Our upgraded algorithm will provide more information on 1) scammers’ motivations and techniques, 2) how scammers operate within different sectors, and 3) who scammers might be targeting.