foolbox.criteria

Criteria are used to define which inputs are adversarial. We provide common criteria for untargeted and targeted adversarial attacks, e.g. Misclassification and TargetedMisclassification. New criteria can easily be implemented by subclassing Criterion and implementing Criterion.__call__().

Criteria can be combined using a logical and (criterion1 & criterion2) to create a new criterion, as sketched below.
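A minimal sketch of the combination syntax; the labels and target classes are made up, and combining these two particular criteria is redundant whenever the targets differ from the labels, but it shows the mechanics:

import torch
from foolbox.criteria import Misclassification, TargetedMisclassification

labels = torch.tensor([3, 7])          # true classes of two inputs
target_classes = torch.tensor([5, 5])  # attacker-chosen classes

# adversarial iff misclassified AND classified as the target class
criterion = Misclassification(labels) & TargetedMisclassification(target_classes)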

Misclassification

from foolbox.criteria import Misclassification
criterion = Misclassification(labels)
class foolbox.criteria.Misclassification(labels)

Considers those perturbed inputs adversarial whose predicted class differs from the label.

Parameters

labels (Any) – Tensor with labels of the unperturbed inputs (batch,).
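A minimal end-to-end sketch, assuming foolbox 3 with the PyTorch backend; the randomly initialized model and random batch are stand-ins for a real classifier and data:

import torch
import foolbox as fb

# stand-in classifier for 28x28 grayscale images with 10 classes
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10)).eval()
fmodel = fb.PyTorchModel(model, bounds=(0, 1))

images = torch.rand(8, 1, 28, 28)    # batch of inputs in [0, 1]
labels = torch.randint(0, 10, (8,))  # labels of the unperturbed inputs

criterion = fb.criteria.Misclassification(labels)
attack = fb.attacks.LinfPGD()
raw, clipped, is_adv = attack(fmodel, images, criterion, epsilons=0.03)
# is_adv: boolean tensor of shape (8,) as judged by the criterion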

TargetedMisclassification

from foolbox.criteria import TargetedMisclassification
criterion = TargetedMisclassification(target_classes)
class foolbox.criteria.TargetedMisclassification(target_classes)

Considers those perturbed inputs adversarial whose predicted class matches the target class.

Parameters

target_classes (Any) – Tensor with target classes (batch,).
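A short sketch with made-up class indices; whether a given attack honors a targeted criterion depends on the attack:

import torch
from foolbox.criteria import TargetedMisclassification

# hypothetical 10-class setting: ask for every input to be classified as class 5
target_classes = torch.full((8,), 5)
criterion = TargetedMisclassification(target_classes)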

Criterion

class foolbox.criteria.Criterion

Abstract base class to implement new criteria.

abstract __call__(perturbed, outputs)

Returns a boolean tensor indicating which perturbed inputs are adversarial.

Parameters
  • perturbed (T) – Tensor with perturbed inputs (batch, ...).

  • outputs (T) – Tensor with model outputs for the perturbed inputs (batch, ...).

Returns

A boolean tensor indicating which perturbed inputs are adversarial (batch,).

Return type

T
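As an example of subclassing, here is a sketch of a hypothetical ConfidentMisclassification criterion (not part of foolbox) that only accepts a perturbed input as adversarial if the predicted class differs from the label and the model assigns it high confidence. It mirrors how the built-in criteria use eagerpy to stay framework-agnostic:

import eagerpy as ep

from foolbox.criteria import Criterion

class ConfidentMisclassification(Criterion):
    """Hypothetical criterion: adversarial iff misclassified with
    softmax confidence of at least `threshold`."""

    def __init__(self, labels, threshold):
        super().__init__()
        self.labels = ep.astensor(labels)
        self.threshold = threshold

    def __repr__(self):
        return f"{self.__class__.__name__}({self.labels!r}, threshold={self.threshold})"

    def __call__(self, perturbed, outputs):
        outputs_, restore_type = ep.astensor_(outputs)
        del perturbed, outputs  # the perturbed inputs are not needed here
        classes = outputs_.argmax(axis=-1)
        confidence = ep.softmax(outputs_, axis=-1).max(axis=-1)
        is_adv = ep.logical_and(classes != self.labels, confidence >= self.threshold)
        return restore_type(is_adv)

An instance of such a subclass can then be passed to an attack or combined with other criteria via &, just like the built-in criteria.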