foolbox.criteria

Provides classes that define what is adversarial.

Criteria

We provide criteria for untargeted and targeted adversarial attacks.
Misclassification
    Defines adversarials as inputs for which the predicted class is not the original class.
TopKMisclassification
    Defines adversarials as inputs for which the original class is not one of the top k predicted classes.
OriginalClassProbability
    Defines adversarials as inputs for which the probability of the original class is below a given threshold.
ConfidentMisclassification
    Defines adversarials as inputs for which the probability of any class other than the original is above a given threshold.
TargetClass
    Defines adversarials as inputs for which the predicted class is the given target class.
TargetClassProbability
    Defines adversarials as inputs for which the probability of a given target class is above a given threshold.
Examples
Untargeted criteria:
>>> from foolbox.criteria import Misclassification
>>> criterion1 = Misclassification()
>>> from foolbox.criteria import TopKMisclassification
>>> criterion2 = TopKMisclassification(k=5)
Targeted criteria:
>>> from foolbox.criteria import TargetClass
>>> criterion3 = TargetClass(22)
>>> from foolbox.criteria import TargetClassProbability
>>> criterion4 = TargetClassProbability(22, p=0.99)
Criteria can be combined to create a new criterion:
>>> criterion5 = criterion2 & criterion3
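A criterion built with & only reports an input as adversarial when both operands do. As a rough sketch of that semantics in plain numpy (the helper functions below are illustrative re-implementations of the decision rules, not foolbox API):

```python
import numpy as np

def topk_misclassified(predictions, label, k):
    # True if the reference label is not among the k largest logits
    return label not in np.argsort(predictions)[-k:]

def is_target_class(predictions, target_class):
    # True if the top-1 predicted class equals the target class
    return np.argmax(predictions) == target_class

def combined(predictions, label, k, target_class):
    # Mirrors criterion2 & criterion3: both conditions must hold
    return (topk_misclassified(predictions, label, k)
            and is_target_class(predictions, target_class))

logits = np.array([0.1, 0.2, 5.0, 0.3, 0.4, 0.5, 0.6])
# class 0 is outside the top 5 and class 2 is predicted
print(combined(logits, label=0, k=5, target_class=2))
```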
Detailed description
class foolbox.criteria.Criterion

    Base class for criteria that define what is adversarial.

    The Criterion class represents a criterion used to determine if predictions for an image are adversarial given a reference label. It should be subclassed when implementing new criteria. Subclasses must implement is_adversarial.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label.

        Parameters:
            predictions : numpy.ndarray
                A vector with the pre-softmax predictions for some image.
            label : int
                The label of the unperturbed reference image.

        Returns:
            bool
                True if an image with the given predictions is an adversarial example when the ground-truth class is given by label, False otherwise.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters.

        Returns:
            str
                Human readable name that uniquely identifies the criterion with its hyperparameters.

        Notes

        Defaults to the class name but subclasses can provide more descriptive names and must take hyperparameters into account.
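Subclassing follows directly from this interface. Below is a minimal sketch of a custom criterion; the class and its margin rule are purely illustrative and not part of foolbox, and it is written as a plain class here so the example is self-contained (in foolbox it would subclass Criterion):

```python
import numpy as np

class MarginMisclassification:
    """Illustrative criterion: adversarial if the largest non-label
    logit exceeds the label logit by at least a fixed margin."""

    def __init__(self, margin):
        self.margin = margin

    def is_adversarial(self, predictions, label):
        # compare the best competing logit against the reference label's logit
        others = np.delete(predictions, label)
        return others.max() - predictions[label] >= self.margin

    def name(self):
        # include the hyperparameter so the name is unique
        return 'MarginMisclassification-{}'.format(self.margin)
```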
class foolbox.criteria.Misclassification

    Defines adversarials as inputs for which the predicted class is not the original class.

    Notes

    Uses numpy.argmax to break ties.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
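The decision rule is a single top-1 comparison. A minimal numpy sketch of it (illustrative, not the foolbox implementation):

```python
import numpy as np

def misclassification(predictions, label):
    # adversarial iff the top-1 prediction differs from the reference label;
    # np.argmax returns the first maximum, which is how ties are broken
    return np.argmax(predictions) != label

logits = np.array([1.0, 3.0, 2.0])
print(misclassification(logits, label=1))  # prints False: class 1 is still predicted
```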
class foolbox.criteria.ConfidentMisclassification(p)

    Defines adversarials as inputs for which the probability of any class other than the original is above a given threshold.

    Parameters:
        p : float
            The threshold probability. If the probability of any class other than the original is at least p, the image is considered an adversarial. It must satisfy 0 <= p <= 1.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
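Since the criterion receives pre-softmax predictions, probabilities have to be derived first. A minimal numpy sketch of the rule as stated above (illustrative, not the foolbox implementation):

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability
    e = np.exp(logits - logits.max())
    return e / e.sum()

def confident_misclassification(predictions, label, p):
    # adversarial iff some class other than the original has probability >= p
    probs = softmax(predictions)
    return np.delete(probs, label).max() >= p

logits = np.array([0.0, 0.0, 10.0])
print(confident_misclassification(logits, label=0, p=0.9))  # prints True
```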
class foolbox.criteria.TopKMisclassification(k)

    Defines adversarials as inputs for which the original class is not one of the top k predicted classes.

    For k = 1, the Misclassification class provides a more efficient implementation.

    Parameters:
        k : int
            Number of top predictions to which the reference label is compared.

    See also

    Misclassification
        Provides a more efficient implementation for k = 1.

    Notes

    Uses numpy.argsort to break ties.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
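A minimal numpy sketch of the top-k rule, including the equivalence with Misclassification for k = 1 (illustrative, not the foolbox implementation):

```python
import numpy as np

def topk_misclassification(predictions, label, k):
    # adversarial iff the label is not among the k classes with the largest
    # logits; np.argsort gives a deterministic order, which breaks ties
    return label not in np.argsort(predictions)[-k:]

logits = np.array([1.0, 3.0, 2.0, 0.5])
print(topk_misclassification(logits, label=0, k=2))  # prints True: class 0 not in top 2
```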
class foolbox.criteria.TargetClass(target_class)

    Defines adversarials as inputs for which the predicted class is the given target class.

    Parameters:
        target_class : int
            The target class that needs to be predicted for an image to be considered an adversarial.

    Notes

    Uses numpy.argmax to break ties.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
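A minimal numpy sketch of the rule (illustrative, not the foolbox implementation). Note that the reference label plays no role in the decision:

```python
import numpy as np

def target_class(predictions, target):
    # adversarial iff the top-1 predicted class equals the target class
    return np.argmax(predictions) == target

logits = np.array([1.0, 3.0, 2.0])
print(target_class(logits, target=1))  # prints True: class 1 is predicted
```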
class foolbox.criteria.OriginalClassProbability(p)

    Defines adversarials as inputs for which the probability of the original class is below a given threshold.

    This criterion alone does not guarantee that the class predicted for the adversarial image is not the original class (unless p < 1 / number of classes). Therefore, it should usually be combined with a classification criterion.

    Parameters:
        p : float
            The threshold probability. If the probability of the original class is below this threshold, the image is considered an adversarial. It must satisfy 0 <= p <= 1.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
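A minimal numpy sketch of the rule, deriving probabilities from the pre-softmax predictions (illustrative, not the foolbox implementation):

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability
    e = np.exp(logits - logits.max())
    return e / e.sum()

def original_class_probability(predictions, label, p):
    # adversarial iff the softmax probability of the original class is below p
    return softmax(predictions)[label] < p

logits = np.array([0.0, 0.0])  # two classes with equal logits: probabilities 0.5 each
print(original_class_probability(logits, label=0, p=0.6))  # prints True: 0.5 < 0.6
```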
class foolbox.criteria.TargetClassProbability(target_class, p)

    Defines adversarials as inputs for which the probability of a given target class is above a given threshold.

    If the threshold is below 0.5, this criterion does not guarantee that the class predicted for the adversarial image is not the original class. In that case, it should usually be combined with a classification criterion.

    Parameters:
        target_class : int
            The target class for which the predicted probability must be above the threshold probability p, otherwise the image is not considered an adversarial.
        p : float
            The threshold probability. If the probability of the target class is above this threshold, the image is considered an adversarial. It must satisfy 0 <= p <= 1.

    is_adversarial(self, predictions, label)

        Decides if predictions for an image are adversarial given a reference label. See Criterion.is_adversarial for the parameters and return value.

    name(self)

        Returns a human readable name that uniquely identifies the criterion with its hyperparameters. See Criterion.name for details.
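A minimal numpy sketch of the rule (illustrative, not the foolbox implementation); like OriginalClassProbability, it works on probabilities derived from the pre-softmax predictions:

```python
import numpy as np

def softmax(logits):
    # subtract the max logit for numerical stability
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_class_probability(predictions, target, p):
    # adversarial iff the softmax probability of the target class exceeds p
    return softmax(predictions)[target] > p

logits = np.array([0.0, 5.0])  # class 1 gets probability of roughly 0.993
print(target_class_probability(logits, target=1, p=0.9))  # prints True
```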