foolbox.criteria

Provides classes that define what is adversarial.

Criteria

We provide criteria for untargeted and targeted adversarial attacks.

Misclassification : Defines adversarials as images for which the predicted class is not the original class.
TopKMisclassification : Defines adversarials as images for which the original class is not one of the top k predicted classes.
OriginalClassProbability : Defines adversarials as images for which the probability of the original class is below a given threshold.
TargetClass : Defines adversarials as images for which the predicted class is the given target class.
TargetClassProbability : Defines adversarials as images for which the probability of a given target class is above a given threshold.

Examples

Untargeted criteria:

>>> from foolbox.criteria import Misclassification
>>> criterion1 = Misclassification()
>>> from foolbox.criteria import TopKMisclassification
>>> criterion2 = TopKMisclassification(k=5)

Targeted criteria:

>>> from foolbox.criteria import TargetClass
>>> criterion3 = TargetClass(22)
>>> from foolbox.criteria import TargetClassProbability
>>> criterion4 = TargetClassProbability(22, p=0.99)

Criteria can be combined to create a new criterion:

>>> criterion5 = criterion2 & criterion3
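A combined criterion considers an input adversarial only if every combined criterion is satisfied. The following sketch spells out those semantics in plain numpy for the combination above (k = 2 instead of 5 and all prediction values are hypothetical, chosen for illustration):

```python
import numpy as np

# Hypothetical pre-softmax scores; adversarial under criterion2 & criterion3
# only if BOTH the top-k misclassification and the target-class check hold.
predictions = np.array([0.1, 3.2, 0.5, 2.9])
label, target_class, k = 0, 1, 2

in_top_k = label in np.argsort(predictions)[-k:]      # TopKMisclassification part
hits_target = np.argmax(predictions) == target_class  # TargetClass part
is_adversarial = (not in_top_k) and hits_target
print(is_adversarial)
```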

Detailed description

class foolbox.criteria.Criterion[source]

Base class for criteria that define what is adversarial.

The Criterion class represents a criterion used to determine if predictions for an image are adversarial given a reference label. It should be subclassed when implementing new criteria. Subclasses must implement is_adversarial.

is_adversarial(predictions, label)[source]

Decides if predictions for an image are adversarial given a reference label.

Parameters:

predictions : numpy.ndarray

A vector with the pre-softmax predictions for some image.

label : int

The label of the unperturbed reference image.

Returns:

bool

True if an image with the given predictions is an adversarial example when the ground-truth class is given by label, False otherwise.

name()[source]

Returns a human-readable name that uniquely identifies the criterion with its hyperparameters.

Returns:

str

Human-readable name that uniquely identifies the criterion with its hyperparameters.

Notes

Defaults to the class name, but subclasses can provide more descriptive names and must take their hyperparameters into account.
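To illustrate the interface, here is a minimal sketch of a custom criterion; the class name ProbabilityMisclassification and its parameter p are illustrative and not part of foolbox, and a real implementation would subclass foolbox.criteria.Criterion rather than stand alone:

```python
import numpy as np

class ProbabilityMisclassification:
    """Illustrative criterion: adversarial iff the predicted class is not
    the reference label AND its softmax probability is at least p."""

    def __init__(self, p):
        assert 0 <= p <= 1
        self.p = p

    def is_adversarial(self, predictions, label):
        # softmax over the pre-softmax predictions (numerically stable)
        exps = np.exp(predictions - np.max(predictions))
        probabilities = exps / np.sum(exps)
        top1 = np.argmax(predictions)
        return top1 != label and probabilities[top1] >= self.p

    def name(self):
        # encode the hyperparameter so the name uniquely
        # identifies the criterion
        return '{}-{:.4f}'.format(self.__class__.__name__, self.p)
```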

class foolbox.criteria.Misclassification[source]

Defines adversarials as images for which the predicted class is not the original class.

Notes

Uses numpy.argmax to break ties.
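The decision rule amounts to a single argmax comparison; a sketch with hypothetical prediction values:

```python
import numpy as np

# Adversarial iff the argmax of the pre-softmax predictions
# differs from the reference label (values are hypothetical).
predictions = np.array([0.2, 1.7, 0.1])
label = 0
is_adversarial = np.argmax(predictions) != label
print(is_adversarial)
```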

class foolbox.criteria.TopKMisclassification(k)[source]

Defines adversarials as images for which the original class is not one of the top k predicted classes.

For k = 1, the Misclassification class provides a more efficient implementation.

Parameters:

k : int

Number of top predictions to which the reference label is compared.

See also

Misclassification
Provides a more efficient implementation for k = 1.

Notes

Uses numpy.argsort to break ties.
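The top-k rule can be sketched with numpy.argsort as follows (prediction values are hypothetical):

```python
import numpy as np

# Adversarial iff the reference label is not among the indices
# of the k largest predictions (values are hypothetical).
predictions = np.array([2.0, 0.3, 1.5, 3.1])
label, k = 1, 2
top_k = np.argsort(predictions)[-k:]  # ascending sort, so take the last k
is_adversarial = label not in top_k
print(is_adversarial)
```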

class foolbox.criteria.TargetClass(target_class)[source]

Defines adversarials as images for which the predicted class is the given target class.

Parameters:

target_class : int

The target class that needs to be predicted for an image to be considered an adversarial.

Notes

Uses numpy.argmax to break ties.

class foolbox.criteria.OriginalClassProbability(p)[source]

Defines adversarials as images for which the probability of the original class is below a given threshold.

This criterion alone does not guarantee that the class predicted for the adversarial image is not the original class (unless p < 1 / number of classes). Therefore, it should usually be combined with a classification criterion.

Parameters:

p : float

The threshold probability. If the probability of the original class is below this threshold, the image is considered an adversarial. It must satisfy 0 <= p <= 1.
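Since the criterion receives pre-softmax predictions, the probabilities must first be computed via a softmax; a sketch with hypothetical values:

```python
import numpy as np

# Adversarial iff the softmax probability of the original class
# falls below the threshold p (values are hypothetical).
predictions = np.array([1.0, 3.0, 2.0])
label, p = 0, 0.1
exps = np.exp(predictions - np.max(predictions))  # numerically stable softmax
probabilities = exps / np.sum(exps)
is_adversarial = probabilities[label] < p
print(is_adversarial)
```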

class foolbox.criteria.TargetClassProbability(target_class, p)[source]

Defines adversarials as images for which the probability of a given target class is above a given threshold.

If the threshold is below 0.5, this criterion does not guarantee that the class predicted for the adversarial image is not the original class. In that case, it should usually be combined with a classification criterion.

Parameters:

target_class : int

The target class whose predicted probability must be above the threshold probability p for the image to be considered an adversarial.

p : float

The threshold probability. If the probability of the target class is above this threshold, the image is considered an adversarial. It must satisfy 0 <= p <= 1.
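The rule mirrors OriginalClassProbability but checks that the target class exceeds the threshold rather than that the original class falls below it; a sketch with hypothetical values:

```python
import numpy as np

# Adversarial iff the softmax probability of the target class
# exceeds the threshold p (values are hypothetical).
predictions = np.array([0.5, 4.0, 1.0])
target_class, p = 1, 0.9
exps = np.exp(predictions - np.max(predictions))  # numerically stable softmax
probabilities = exps / np.sum(exps)
is_adversarial = probabilities[target_class] > p
print(is_adversarial)
```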