Machine Learning Project Security: 5 Machine Learning Security Risks You Should Watch Out For

Last updated on Sep 23, 2021

Machine learning can offer your business plenty of benefits, but they unfortunately come hand in hand with plenty of security vulnerabilities. Although you may want your ML project to run as fast as possible at the lowest cost, good security is often precisely the opposite: slow and expensive. This article looks at how secure ML systems are and lists the main risks you should guard against.


1. Adversarial Examples

Adversarial examples are the most commonly encountered attacks. They aim to fool your ML system by feeding it malicious inputs that contain very small, often imperceptible perturbations. They work like optical illusions for your system, causing it to make false predictions and misclassifications.
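To make the "optical illusion" idea concrete, here is a minimal sketch of one well-known way to craft adversarial examples, the Fast Gradient Sign Method (FGSM), applied to a tiny logistic-regression model. The weights, input, and `eps` budget are made-up values chosen for illustration. Real attacks target much larger models and usually rely on a framework's automatic differentiation to compute the gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    # Linear score w·x + b of a logistic-regression model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if sigmoid(logit(w, b, x)) >= 0.5 else 0


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method against logistic regression.

    For binary cross-entropy loss, the gradient with respect to the input
    is (sigmoid(w·x + b) - y) * w, so it can be computed in closed form here.
    """
    p = sigmoid(logit(w, b, x))
    grad = [(p - y) * wi for wi in w]
    # Move every feature by at most eps in the direction that increases the loss
    return [xi + eps * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]


# Hypothetical model and input, for illustration only
w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.3, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.1)
print(predict(w, b, x), predict(w, b, x_adv))  # 1 0
```

No feature moves by more than 0.1, yet the predicted class flips from 1 to 0, because every small change pushes the model's score in the same harmful direction.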


2. Data Poisoning

Data poisoning happens when an attacker manipulates the data fed into your ML system, compromising the model that learns from it. Your ML engineers should review your training data, identify any weaknesses that could leave it open to manipulation, and assess how far an attacker could exploit them. Attackers can even tamper with the raw data used to train models, so that the training process itself produces a flawed model.
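As a rough illustration of how a few injected records can corrupt training, the sketch below trains a toy nearest-centroid classifier on one-dimensional data, then retrains it after an attacker adds three deliberately mislabeled points. The data and the classifier are hypothetical and far simpler than a real pipeline; the point is only that poisoned training data changes what the model learns.

```python
from statistics import mean


def train_centroids(data):
    """Train a nearest-centroid classifier: one mean feature value per label."""
    by_label = {}
    for x, label in data:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def classify(centroids, x):
    # Predict the label whose centroid is closest to x
    return min(centroids, key=lambda label: abs(x - centroids[label]))


clean = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
# Attacker-injected records: large values deliberately labelled as class 0
poison = [(10, 0)] * 3

clean_model = train_centroids(clean)              # centroids: 0 -> 2, 1 -> 8
poisoned_model = train_centroids(clean + poison)  # centroids: 0 -> 6, 1 -> 8

print(classify(clean_model, 6), classify(poisoned_model, 6))  # 1 0
```

Three bad records are enough to drag the class-0 centroid from 2 to 6 and flip the prediction for an input at 6. Checking where training data comes from, and screening it for outliers and suspicious labels before training, limits how far an attacker can shift the model.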


3. Online System Manipulation

Online ML systems continuously learn during operational use and can change their behavior over time. A simple attack is to nudge the still-learning system with carefully chosen inputs, gradually retraining the model to do the wrong thing. To guard against this, your ML engineers should consider data provenance and algorithm choice very carefully.
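The sketch below shows the nudging idea on a hypothetical online anomaly detector that keeps an exponentially weighted moving average (EWMA) of the values it accepts and flags anything more than `tolerance` away from it. The class, its parameters, and the numbers are assumptions for illustration. A single large value is caught, but a stream of inputs that each stay just inside the tolerance slowly drags the learned baseline until the same large value looks normal.

```python
class OnlineAnomalyDetector:
    """Hypothetical online detector that keeps learning during operation."""

    def __init__(self, mean, alpha=0.1, tolerance=5.0):
        self.mean = mean            # learned notion of a "normal" value
        self.alpha = alpha          # how strongly each accepted input updates the mean
        self.tolerance = tolerance  # maximum distance from the mean still considered normal

    def observe(self, x):
        """Return True if x is flagged; otherwise learn from it."""
        is_anomaly = abs(x - self.mean) > self.tolerance
        if not is_anomaly:
            # EWMA update: every accepted input shifts the baseline toward itself
            self.mean = (1 - self.alpha) * self.mean + self.alpha * x
        return is_anomaly


target = 50.0

# A direct attempt is caught: 50 is far from the learned mean of 10
direct = OnlineAnomalyDetector(mean=10.0)
print(direct.observe(target))  # True

# Nudging: each input stays just inside the tolerance, so every one is learned
nudged = OnlineAnomalyDetector(mean=10.0)
for _ in range(100):
    nudged.observe(min(nudged.mean + 4.9, target))
print(nudged.observe(target))  # False
```

Mitigations follow from the advice above: restrict which sources may feed the learning loop (data provenance), and prefer update rules that limit how far the model can drift from a trusted baseline, for example by capping cumulative change or periodically re-validating against vetted data.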


4. Transfer-Learning Attack

ML systems are often built by fine-tuning an already trained base model: its generic abilities are adapted to your task through specialized training. If the pre-trained model is widely available, attackers can use it to develop attacks that are likely to succeed against your tuned model as well. Make sure that the base model you use for fine-tuning does not exhibit unanticipated behavior. There is also a risk when you take models for transfer from other groups. If you do so, make sure they provide documentation describing exactly what their system does and how they control its risks.
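One concrete precaution when you reuse a pre-trained model from another group is to confirm that the file you load is exactly the one its publishers documented, by comparing its SHA-256 digest with a checksum they publish through a trusted channel. The helper functions below are a hypothetical sketch using only Python's standard library, and the temporary file merely stands in for downloaded weights. This check protects against a tampered or swapped file, not against attacks that exploit a genuine but publicly available base model.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Raise ValueError unless the file matches the publisher's checksum."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking how many characters matched via timing
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return actual


# Demo: a temporary file stands in for downloaded model weights
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend these are model weights")
published = hashlib.sha256(b"pretend these are model weights").hexdigest()

ok = verify_model_file(tmp.name, published) == published

with open(tmp.name, "ab") as f:
    f.write(b"backdoor")  # simulate tampering after publication
try:
    verify_model_file(tmp.name, published)
    tampered_detected = False
except ValueError:
    tampered_detected = True
finally:
    os.remove(tmp.name)

print(ok, tampered_detected)  # True True
```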


5. Data Confidentiality

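ML systems are often trained on highly sensitive and confidential data, which is then built right into the model and can be attacked there. Subtle extraction attacks can recover that data from the model, and extracting sub-symbolic ‘features’ can even help attackers hone adversarial attacks.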
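To see how confidential training data can leak through the model itself, consider a toy version of a membership-inference attack, one known class of extraction attacks: an attacker who can query a model guesses whether a specific record was in its training set from how confident the model is about it. The deliberately overfit `MemorizingModel` and the sample records below are hypothetical. Real attacks and models are more sophisticated, but the principle is the same.

```python
import math


class MemorizingModel:
    """Hypothetical, deliberately overfit 1-nearest-neighbour classifier.

    Confidence decays with distance to the closest training record, so the
    model is maximally confident on records it was trained on.
    """

    def __init__(self, records):
        self.records = records  # list of (features, label) pairs

    def predict_proba(self, x):
        dist, label = min((math.dist(x, feats), lbl) for feats, lbl in self.records)
        return label, math.exp(-dist)  # confidence is 1.0 for an exact training record


def infer_membership(model, x, threshold=0.95):
    """Attacker's guess: very high confidence suggests x was a training record."""
    _, confidence = model.predict_proba(x)
    return confidence >= threshold


# Hypothetical sensitive records: (age, systolic blood pressure) -> diagnosis flag
training = [((34, 120), 1), ((51, 145), 0), ((47, 130), 1)]
model = MemorizingModel(training)

print(infer_membership(model, (34, 120)))  # True: this record was in the training data
print(infer_membership(model, (40, 125)))  # False: not a training record
```

The model never reveals its records directly, yet its outputs tell the attacker which people were in the training set. Reducing overfitting, limiting what prediction APIs return (for example, labels instead of raw confidence scores), and training with differential privacy are common ways to reduce this kind of leakage.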

So, by now, you hopefully have a good idea of why risk management is an essential part of any data science project. Threats are always present and vary in severity, so stay cautious and prepared.

About the author

Claudia is a data scientist, consultant and trainer. She is the CEO of Edlitera, a data science and machine learning training and consulting company helping teams and businesses future-proof themselves and turn their data into profits.

Before Edlitera, Claudia taught Computer Science at Harvard, and worked in biotech (Qiagen), marketing tech (ZoomInfo), and ecommerce (Wayfair). Claudia earned her degree in Economics from Yale, with a focus on Statistics and Computer Science.