
Artificial Intelligence (AI) as an emergent technology is progressing at breakneck speed. Across the globe, we are starting to see AI systems being trialled in critical infrastructure such as hospitals, financial services and public services, to make these systems more efficient and effective.
As the adoption of AI expands, cybersecurity concerns have also increased. These concerns mainly centre on how AI can be used to undermine cybersecurity, for example by automating or optimising cyberattacks, or by enabling fraud and identity theft to obtain unauthorised access. Much less discussed is how AI systems themselves can be compromised by malicious actors, degrading their efficacy or turning them into entry points for data breaches.
Views are short opinion pieces by the author(s) to encourage the exchange of ideas on current issues. They may not necessarily represent the official views of KRI. All errors remain the authors’ own.
This view was prepared by Laventhen Sivashanmugam, an independent researcher, and Dr Jun-E Tan, a researcher from the Khazanah Research Institute (KRI). The authors are grateful for the valuable comments from Dr Rachel Gong, Gregory Ho Wai Son, Nik Syafiah Anis Nik Sharifulden, Khoo Wei Yang and Salbiah Idris.
Authors’ email address: laventhens@gmail.com and june.tan@krinstitute.org
Attribution – Please cite the work as follows: Laventhen Sivashanmugam and Jun-E Tan. 2024. Cybersecurity Risks of AI. Kuala Lumpur: Khazanah Research Institute. License: Creative Commons Attribution CC BY 3.0.
In this article, we consider vulnerabilities in AI systems, a topic that is increasingly relevant as our reliance on these systems grows. Broadly speaking, an AI system “learns” from the data we feed into it and makes predictions based on that data. It can learn wrongly, it can be tricked, and it can be reverse engineered. These three issues map onto three AI hacking methods that we discuss in turn:
1) Data poisoning affects how an AI model learns, thus reducing its predictive accuracy.
2) Evasion attacks attempt to trick an AI model, invalidating the model’s predictions.
3) Model inversion is when hackers try to reverse-engineer an AI model, to steal either the model itself or the model’s training data.
This article is the third in our AI risk series. Interested readers can refer to the first article, “Introducing the AI Risk Series”, for our baseline assumptions and understanding of AI, and to the second, on the risks of uneven AI adoption by MSMEs.
AI Learning Wrongly: Data Poisoning
Data poisoning occurs when a hacker feeds tampered data into an AI system, forcing it to “behave the way the attacker wants, as opposed to its creator’s intent.” As previously mentioned, data poisoning affects the predictive accuracy of an AI model, thus affecting its ability to classify data and make correct decisions.
By tampering with the training dataset, injecting mislabelled or malicious data, a hacker “teaches” the AI model to behave in a way that benefits the hacker. Figure 1 contains a visual representation of data poisoning to help us understand how a model can be “taught” to behave differently. The data points fed into the system affect the model’s accuracy, eventually rendering it incapable of making correct classifications. For example, Google has admitted that its email spam filter has been tricked in this way. By repeatedly marking malicious spam emails as non-spam using thousands of burner accounts, hackers eventually “taught” the spam filter algorithm to mark spam emails as legitimate.
Figure 1: Feature space classification depicting the process of data poisoning
Source: Visualisation created by the authors, adapted from Miller, Xiang, and Kesidis (2019). In the first diagram, the AI model (represented by the black line) can classify the data points (represented by the orange triangles and blue circles) into groups A and B. During data poisoning, the hacker feeds manipulated data points into the model, gradually skewing the model’s outputs away from the originally accurate classifications. Eventually, the model becomes fully poisoned and starts misclassifying data points into the wrong groups.
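To make the mechanics concrete, the short sketch below trains a simple classifier twice on a made-up two-class dataset: once on clean labels, and once after an attacker has relabelled most examples of one class, much as the spammers above “taught” the spam filter to wave spam through. This is a minimal illustration in Python using scikit-learn; the dataset and model are hypothetical stand-ins, not a reproduction of any real attack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical two-class dataset, standing in for the two groups in Figure 1.
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean model, trained on unmodified data.
clean_model = LogisticRegression().fit(X_train, y_train)

# Poisoned model: the attacker relabels most examples of class 1 (think "spam")
# as class 0 ("not spam"), teaching the model to let that class through.
rng = np.random.default_rng(0)
poisoned_y = y_train.copy()
class1 = np.where(y_train == 1)[0]
flipped = rng.choice(class1, size=int(0.7 * len(class1)), replace=False)
poisoned_y[flipped] = 0
poisoned_model = LogisticRegression().fit(X_train, poisoned_y)

print("Accuracy before poisoning:", round(clean_model.score(X_test, y_test), 2))
print("Accuracy after poisoning: ", round(poisoned_model.score(X_test, y_test), 2))
```

The drop in accuracy on clean test data is the footprint of the poisoned labels: the model has faithfully learned the wrong lesson.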
Another example illustrates the real-world implications of data poisoning. Company A uses an AI system to filter out fake business enquiries received on its website, to save time. A rival company (Company B) finds out about this and starts spamming Company A’s website with fake enquiries and nonsense text, but uses the names of real potential customers. The AI system eventually starts to filter out genuine enquiries from those potential customers in the name of efficiency, and Company A loses business. Applied to government agencies that receive many reports or enquiries a day, the same scenario illustrates how public services using AI-based input filters can be hacked.
The strength of an AI model lies in its ability to “learn”, but that is also an entry point for hackers to exploit. An AI model reinforces its learning to improve; but errors can become embedded into the model’s functions and have far-reaching effects. In the same way a shaky foundation results in a shaky building, data poisoning will affect a model’s ability to engage with new data objectively, reinforcing its own inaccurate conclusions. Integrating AI systems into critical infrastructure must therefore be done with a high level of quality control at every stage of development.
AI Being Tricked: Evasion Attacks and Adversarial Examples
The goal of an evasion attack is to mislead an AI model in its classifications, affecting the validity of its predictions and thus of its decisions. To distinguish the two methods: a data poisoning attack goes after the initial training data in the hope of skewing the AI model itself, whereas an evasion attack does not try to change the model per se but instead tries to “trick” it. A hacker can trick an AI model by changing the input in a specific way, so that the model misreads the input and the system takes the wrong course of action.
Here is an example to illustrate the process. An image recognition AI system assesses every pixel in an image, assigns markers based on the data it is trained on, and classifies images accordingly. Because the statistical analysis an AI model undertakes to classify images is complex, we may not immediately understand what those markers are. Take, for example, a case presented at an international conference of the Association for Computing Machinery, where a group of researchers trained an image classifier to differentiate between a wolf and a dog. The model succeeded at classifying images correctly, but after analysing its processes, the researchers realised that it was differentiating dogs and wolves based on the presence of snow in the images rather than the characteristics of the animals themselves.
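Such statistical quirks can also be exploited programmatically. The sketch below illustrates one widely studied evasion technique, the fast gradient sign method (FGSM), which nudges every pixel of an image a tiny step in the direction that most confuses the model. It is written in Python with PyTorch; the toy model, input and label are placeholders invented for illustration, not any of the systems discussed in this article.

```python
import torch
import torch.nn as nn

# Toy image classifier, a stand-in for a real image recognition system.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

def fgsm_attack(image, true_label, epsilon=0.1):
    """Craft an adversarial example with the fast gradient sign method."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    # Shift every pixel slightly in the direction that increases the loss,
    # keeping values in the valid [0, 1] range. To a human the change is
    # barely visible, but it can push the model towards the wrong label.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 1, 28, 28)   # placeholder "image"
y = torch.tensor([3])          # placeholder true label
x_adv = fgsm_attack(x, y)
print("Largest pixel change:", (x_adv - x).abs().max().item())
```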
Someone who understands how an AI model classifies objects can exploit it. In 2010, Adam Harvey, a postgraduate student at New York University, discovered unusual ways of tricking computer vision algorithms with hairstyles and makeup. Figure 2 shows a few examples of these looks, which strategically cover up the parts of the face that face detection models would immediately recognise; elements such as colour, shape and facial asymmetry also played a part. Through trial and error and an understanding of how computer vision algorithms process light and shape, Harvey was able to make his test subjects invisible to face detection AI models.
Figure 2: Makeup and hairstyles that can trick face detection algorithms
Source: Adam Harvey
Evasion attacks can be employed in both the physical and digital worlds. An example in the physical world would be attacks on self-driving cars: the autonomous driving systems of these cars can be made to misread traffic signals and thus make the wrong driving choices, potentially resulting in accidents. In the digital world, evasion attacks can trick AI-powered detection systems, with known cases of fraudulent claims slipping past detection in the insurance industry and of offensive or problematic content bypassing automated content moderation.
AI Being Reverse-Engineered: Model Inversion, Model Extraction and Membership Inference
There are two possible objectives of a model inversion attack: a hacker is interested either in stealing the target AI model itself or in stealing data that can be obtained through the target model. A hacker conducting such an attack engages directly with an AI system, feeding it inputs, observing its behaviour and inferring its parameters from that behaviour, a bit like feeling the shape of a present through the wrapping paper. If the hacker wants to steal the model itself, this is known as model extraction. If the hacker wants to steal or retrieve the model’s training data, this is known as membership inference.
Model Extraction: Stealing Models
If a hacker has access to a target model’s training data (e.g. it is publicly available or the dataset has been leaked) and can guess the model’s parameters from its behaviour, they can try to replicate the model. With a replicated model, the hacker can then accomplish other objectives, such as creating and testing adversarial examples for evasion attacks, or copying an AI system without permission to compete against the business selling the original product.
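The sketch below illustrates one common extraction approach under simplified assumptions: the attacker can only query a “black-box” target model and read back its predictions, which they then use to label their own probe inputs and train a look-alike surrogate. The target model, datasets and query function here are all invented stand-ins for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for the victim's deployed model; in a real attack the hacker sees
# only its predictions through an API, not its parameters or training data.
X_secret = rng.normal(size=(500, 4))
y_secret = (X_secret[:, 0] + X_secret[:, 1] > 0).astype(int)
target_model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                             random_state=0).fit(X_secret, y_secret)

def query_api(inputs):
    """Black-box access: submit inputs, read back predictions, nothing more."""
    return target_model.predict(inputs)

# The attacker generates probe inputs, labels them with the API's answers,
# and trains a surrogate that mimics the target's decision boundary.
X_probe = rng.normal(size=(2000, 4))
surrogate = DecisionTreeClassifier(random_state=0).fit(X_probe, query_api(X_probe))

# Agreement on fresh inputs shows how closely the copy tracks the original.
X_new = rng.normal(size=(500, 4))
agreement = (surrogate.predict(X_new) == query_api(X_new)).mean()
print(f"Surrogate agrees with the target on {agreement:.0%} of new inputs")
```

In practice an attacker would need far more, and better chosen, queries, but the principle is the same: every answer the model gives away leaks a little of its decision boundary.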
A real-life example of the risks of model extraction can be seen in cybersecurity itself. Modern cybersecurity mechanisms employ AI systems to detect vulnerabilities or threats, such as viruses or firewall breaches. An AI-powered cybersecurity system can detect these threats automatically and quickly, without needing constant monitoring by a cybersecurity expert. However, if a hacker can replicate the system, its threat detection abilities can be circumvented and rendered useless. With the replicated system, the hacker can create an evasion attack tailor-made to trick the target cybersecurity AI system. Once the cybersecurity model has been “extracted”, the hacker would know what kinds of code or files the system registers as malware and could formulate the exact code or virus that would slip past its defences.
Membership Inference: Stealing Data
Membership inference refers to the process of inferring, from a model’s outputs, whether particular records were part of its training data, or other characteristics of that data, which presents security and privacy risks when the data is sensitive. To put it in broad terms, an AI model forms its internal parameters (such as weights) based on the data it receives; by understanding how the model behaves, a hacker can potentially reverse engineer the data it was originally trained on.
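A minimal sketch of this idea is shown below, assuming a simple confidence-threshold attack on an overfitted model: records the model is unusually confident about are guessed to have been part of its training data. The “sensitive” dataset, the model and the threshold are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Hypothetical sensitive dataset: half is used for training ("members"),
# half is never shown to the model ("non-members").
X = rng.normal(size=(400, 8))
y = (X[:, 0] > 0).astype(int)
X_members, y_members = X[:200], y[:200]
X_nonmembers, y_nonmembers = X[200:], y[200:]

# An overfitted model memorises its training data, which is what leaks membership.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_members, y_members)

def confidence_on_true_label(inputs, labels):
    """Confidence the model assigns to each record's true label."""
    probs = model.predict_proba(inputs)
    return probs[np.arange(len(labels)), labels]

# Records with very high confidence are guessed to be training-set members.
threshold = 0.9
member_hits = confidence_on_true_label(X_members, y_members) > threshold
nonmember_hits = confidence_on_true_label(X_nonmembers, y_nonmembers) > threshold
print(f"Flagged as members: {member_hits.mean():.0%} of true members, "
      f"{nonmember_hits.mean():.0%} of non-members")
```

The gap between the two percentages is the leak: the model’s own confidence betrays which records it has seen before.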
The health sector is often mentioned as a site for membership inference, given the amount of sensitive patient data held in healthcare facilities and medical research databases. However, high-value and confidential data range widely beyond sensitive personal data, from business trade secrets to governmental data affecting national security. There are options to safeguard against membership inference attacks, such as differential privacy measures, which insert noise or randomness into the data or the training process, but these techniques can be expensive to apply to large datasets.
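As a rough illustration of how differential privacy works, the sketch below releases a simple count after adding calibrated random noise (the Laplace mechanism), so that the published figure barely changes whether or not any single person’s record is included. The dataset and the privacy budget (epsilon) are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def private_count(values, epsilon=1.0):
    """Release a count with Laplace noise calibrated to the query's sensitivity.

    Adding or removing one person's record changes a count by at most 1, so
    noise drawn from Laplace(1/epsilon) hides any individual's contribution.
    """
    true_count = int(np.sum(values))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical sensitive attribute: whether each patient has a given condition.
has_condition = rng.integers(0, 2, size=1000)
print("True count:   ", int(has_condition.sum()))
print("Private count:", round(private_count(has_condition, epsilon=0.5), 1))
```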
Conclusion
As AI adoption expands, the cybersecurity of AI systems should receive commensurate attention. Data poisoning and evasion attacks are well documented within the cybersecurity world, while threats like membership inference are still relatively new and exist mainly as proofs of concept in academic papers. Even so, the possibility of AI systems being hacked needs to be taken seriously, given the speed and scale of harms that can be perpetrated using automated decision-making systems.
With these security vulnerabilities in mind, what do potential solutions look like?
Data poisoning is ultimately a data security issue, and countermeasures such as data quality control, routine auditing and continuous monitoring can therefore be very effective. Evasion attacks can be defended against by training AI models to recognise adversarial inputs, or by building AI systems with multiple AI models as components, so that hackers have a harder time slipping through.
Model inversion is a threat because it allows hackers to glean sensitive information by drawing inferences from the target model’s outputs; differential privacy counters this by adding noise so that those outputs reveal very little about any individual record, making it very difficult for hackers to “steal” private information.
Ultimately, the solutions boil down to being intentional with AI implementation, conscientious with data, and sparing no effort during the AI audit process. Security is especially pertinent in the case of AI being implemented in national critical infrastructure and systems involving public interest. State actors as well as businesses looking to maximise efficiency and effectiveness using AI must not only implement data governance policies that ensure security and privacy, but also invest in cybersecurity measures that cover AI system vulnerabilities.