Machine learning can offer your business plenty of benefits that unfortunately come hand in hand with plenty of security vulnerabilities. Although you may want your machine learning project to work as fast as possible at the lowest cost, good security can be precisely the opposite - slow and expensive. This article explains how secure machine learning is and provides you with a list of the main risks you should guard against.
1. Adversarial Examples
These are the most encountered attacks that aim to fool your machine learning system by feeding it with malicious input that includes very small and unnoticeable perturbations. They function like optical illusions for your system, and they can cause it to make false predictions and categorizations.
2. Data Poisoning
This happens when the attacker manipulates the data fed into your machine learning system, thus compromising it. Your machine learning engineers should consider your training data and be aware of any weaknesses that could make it prone to an attacker and to what extent that could happen. Attackers can even manipulate raw data used to train models so that even your machine learning training could go bad.
3. Online System Manipulation
Online machine learning systems are the ones that continuously learn during operational use and can modify behavior throughout time. An easy to carry out attack consists of nudging the still-learning system through system input and then retraining the model to do the wrong thing. For this, your machine learning engineers should consider data provenance and algorithm choice very carefully.
4. Transfer-Learning Attack
Machine learning systems are usually made by tuning an already trained-based model - basically, its generic abilities are fine-tuned with specialized training. If the pre-trained model is widely available, attackers can use it and succeed against your tuned model. Make sure that your machine learning system used for fine-tuning does not include unanticipated behaviors. There is also a risk when you take models for transfer from groups. If you do so, make sure that there is a description of exactly what their system does and how they control the risks in the document.
5. Data Confidentiality
Machine learning systems often include highly sensitive and confidential data that can be attacked. In this case, sub-symbolic ‘feature’ extraction may be helpful because it can hone adversarial attacks.
So, by now, you hopefully have a good idea about why risk management is an essential part of any data science project. Find out more about how to design a data science experiment here. Threats are always there and can range in severity, so be cautious and prepared.