`foolbox.v1.adversarial`

Provides a class that represents an adversarial example.

class foolbox.v1.adversarial.Adversarial(model, criterion, unperturbed, original_class, distance=<class 'foolbox.distances.MeanSquaredDistance'>, threshold=None, verbose=False)

Defines an adversarial that should be found and stores the result.

The Adversarial class represents a single adversarial example for a given model, criterion and reference input. It can be passed to an adversarial attack to find the actual adversarial perturbation.

Parameters:

model : a Model instance: The model that should be fooled by the adversarial.
criterion : a Criterion instance: The criterion that determines which inputs are adversarial.
unperturbed : a numpy.ndarray: The unperturbed input to which the adversarial input should be as close as possible.
original_class : int: The ground-truth label of the unperturbed input.
distance : a Distance class: The measure used to quantify how close inputs are.
threshold : float or Distance: If not None, the attack will stop as soon as the adversarial perturbation has a size smaller than this threshold. Can be an instance of the Distance class passed to the distance argument, or a float assumed to have the same unit as the given distance. If None, the attack will simply minimize the distance as well as possible. Note that the threshold only influences early stopping of the attack; the returned adversarial does not necessarily have a smaller perturbation size than this threshold. The reached_threshold() method can be used to check whether the threshold has been reached.
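The early-stopping behaviour of threshold can be illustrated with a small toy model. This is only a sketch in plain Python: BestSoFarTracker and run_toy_attack are hypothetical names invented for this illustration and are not part of foolbox, and distances are represented as plain floats.

```python
class BestSoFarTracker:
    """Hypothetical stand-in for illustration; NOT the foolbox Adversarial class."""

    def __init__(self, threshold=None):
        self.threshold = threshold
        self.best_distance = float("inf")
        self.best_input = None

    def record(self, candidate, distance, is_adversarial):
        """Keep the candidate if it is adversarial and closer than the best so far."""
        if is_adversarial and distance < self.best_distance:
            self.best_distance = distance
            self.best_input = candidate
        return self.reached_threshold()

    def reached_threshold(self):
        """True only if a threshold is set and the best distance is below it."""
        return self.threshold is not None and self.best_distance < self.threshold


def run_toy_attack(tracker, candidates):
    """Feed (candidate, distance, is_adversarial) triples; stop once the threshold is hit."""
    steps = 0
    for candidate, distance, is_adversarial in candidates:
        steps += 1
        if tracker.record(candidate, distance, is_adversarial):
            break  # early stopping: the result is good enough, not necessarily minimal
    return steps


# Made-up candidates: (name, distance, is_adversarial)
candidates = [("a", 0.9, True), ("b", 0.2, False), ("c", 0.4, True), ("d", 0.1, True)]

with_threshold = BestSoFarTracker(threshold=0.5)
steps_with = run_toy_attack(with_threshold, candidates)

without_threshold = BestSoFarTracker()
steps_without = run_toy_attack(without_threshold, candidates)
```

With a threshold, the toy attack stops at the first adversarial candidate that falls below it, even though a closer one exists later. Without a threshold, it keeps minimizing over all candidates and reached_threshold() stays False.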

adversarial_class

The argmax of the model predictions for the best adversarial found so far.

None if no adversarial has been found.

backward_one(self, gradient, x=None, strict=True)

Interface to model.backward_one for attacks.

Parameters:

gradient : numpy.ndarray: Gradient of some loss w.r.t. the logits.
x : numpy.ndarray: Single input with shape as expected by the model (without the batch dimension).

Returns:

gradient : numpy.ndarray: The gradient w.r.t. the input.

See also

gradient()

channel_axis(self, batch)

Interface to model.channel_axis for attacks.

Parameters:

batch : bool: Controls whether the index of the axis for a batch of inputs (4 dimensions) or a single input (3 dimensions) should be returned.
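The relationship between the two axis indices can be sketched as follows. The helper channel_axis_for is hypothetical, and the sketch assumes that the model reports its channel axis for batched inputs; this page does not confirm that detail.

```python
def channel_axis_for(model_channel_axis, batch):
    """Hypothetical helper: convert a batch-level channel axis to a single-input axis.

    Assumes model_channel_axis is the channel index for a batch of inputs
    (4 dimensions), which is an assumption made for this illustration.
    """
    # Dropping the leading batch dimension shifts every remaining axis down by one.
    return model_channel_axis if batch else model_channel_axis - 1


# Channels-first batch layout (N, C, H, W): axis 1 for a batch, axis 0 for one input.
channels_first_single = channel_axis_for(1, batch=False)

# Channels-last batch layout (N, H, W, C): axis 3 for a batch, axis 2 for one input.
channels_last_single = channel_axis_for(3, batch=False)
```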

distance: The distance of the adversarial input to the original input.

forward(self, inputs, greedy=False, strict=True, return_details=False)

Interface to model.forward for attacks.

Parameters:

inputs : numpy.ndarray: Batch of inputs with shape as expected by the model.
greedy : bool: Whether the first adversarial should be returned.
strict : bool: Controls if the bounds for the pixel values should be checked.
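The idea behind greedy can be illustrated with a toy batch check. This is only a sketch: first_adversarial is a hypothetical function, plain misclassification stands in for the criterion, and the (flags, index) return shape is an assumption of this example rather than the documented return value of forward.

```python
def first_adversarial(predicted_classes, original_class, greedy=False):
    """Hypothetical sketch: flag the adversarial entries of a batch.

    With greedy=True, checking stops at the first adversarial entry and its
    index is returned. Otherwise every entry is checked and the index is None.
    """
    flags = []
    for index, predicted in enumerate(predicted_classes):
        # Stand-in criterion: any prediction that differs from the original class.
        is_adversarial = predicted != original_class
        flags.append(is_adversarial)
        if greedy and is_adversarial:
            return flags, index
    return flags, None


# Made-up predictions for a batch of four inputs whose original class is 3.
greedy_flags, greedy_index = first_adversarial([3, 3, 7, 5], 3, greedy=True)
all_flags, no_index = first_adversarial([3, 3, 7, 5], 3, greedy=False)
```

In the greedy case the fourth entry is never inspected, which is the point of the flag: the caller only needs one adversarial from the batch.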

forward_and_gradient(self, x, label=None, strict=True, return_details=False)

Interface to model.forward_and_gradient for attacks.

Parameters:

x : numpy.ndarray: Multiple inputs with shape as expected by the model (with the batch dimension).
label : numpy.ndarray: Labels used to calculate the loss that is differentiated. Defaults to the original label.
strict : bool: Controls if the bounds for the pixel values should be checked.

forward_and_gradient_one(self, x=None, label=None, strict=True, return_details=False)

Interface to model.forward_and_gradient_one for attacks.

Parameters:

x : numpy.ndarray: Single input with shape as expected by the model (without the batch dimension). Defaults to the original input.
label : int: Label used to calculate the loss that is differentiated. Defaults to the original label.
strict : bool: Controls if the bounds for the pixel values should be checked.

forward_one(self, x, strict=True, return_details=False)

Interface to model.forward_one for attacks.

Parameters:

x : numpy.ndarray: Single input with shape as expected by the model (without the batch dimension).
strict : bool: Controls if the bounds for the pixel values should be checked.

gradient_one(self, x=None, label=None, strict=True)

Interface to model.gradient_one for attacks.

Parameters:

x : numpy.ndarray: Single input with shape as expected by the model (without the batch dimension). Defaults to the original input.
label : int: Label used to calculate the loss that is differentiated. Defaults to the original label.
strict : bool: Controls if the bounds for the pixel values should be checked.

has_gradient(self): Returns True if _backward and _forward_backward can be called by an attack, False otherwise.

normalized_distance(self, x)

Calculates the distance of a given input x to the original input.

Parameters:

x : numpy.ndarray: The input x that should be compared to the original input.

Returns:

Distance: The distance between the given input and the original input.
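As a rough illustration of what a normalized distance can look like, the sketch below computes a mean squared distance scaled by the width of the valid value range. The function name, the flat-list inputs and the bounds parameter are assumptions made for this example; the page itself does not spell out how the default MeanSquaredDistance is normalized.

```python
def normalized_mean_squared_distance(reference, other, bounds=(0.0, 1.0)):
    """Hypothetical sketch: mean squared difference scaled by the squared value range.

    reference and other are flat sequences of numbers of equal length, and
    bounds is the assumed (min, max) range of valid input values.
    """
    if len(reference) != len(other):
        raise ValueError("inputs must have the same number of elements")
    min_, max_ = bounds
    # Dividing by the squared range makes the result independent of the input scale.
    scale = len(reference) * (max_ - min_) ** 2
    return sum((o - r) ** 2 for r, o in zip(reference, other)) / scale


# One of four pixels changed by the full range of a 0..255 image.
original = [0, 0, 0, 0]
perturbed = [255, 0, 0, 0]
example_distance = normalized_mean_squared_distance(original, perturbed, bounds=(0, 255))
```

Under this normalization the same relative perturbation gives the same value whether inputs are scaled to 0..1 or 0..255, which is what lets a float threshold be compared across models with different bounds.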

original_class: The class of the original input (ground-truth, not model prediction).

output

The model predictions for the best adversarial found so far.

None if no adversarial has been found.

perturbed: The best adversarial example found so far.

reached_threshold(self): Returns True if a threshold is given and the distance of the best adversarial found so far is smaller than the threshold.

target_class: Interface to criterion.target_class for attacks.

unperturbed: The original input.
