# foolbox.models

Provides classes to wrap existing models in different frameworks so that they provide a unified API to the attacks.

## Models

| Class | Description |
| --- | --- |
| Model | Base class to provide attacks with a unified interface to models. |
| DifferentiableModel | Base class for differentiable models that provide gradients. |
| TensorFlowModel | Creates a Model instance from existing TensorFlow tensors. |
| TensorFlowEagerModel | Creates a Model instance from a TensorFlow model using eager execution. |
| PyTorchModel | Creates a Model instance from a PyTorch module. |
| KerasModel | Creates a Model instance from a Keras model. |
| TheanoModel | Creates a Model instance from existing Theano tensors. |
| LasagneModel | Creates a Model instance from a Lasagne network. |
| MXNetModel | Creates a Model instance from existing MXNet symbols and weights. |
| MXNetGluonModel | Creates a Model instance from an existing MXNet Gluon Block. |

## Wrappers

| Class | Description |
| --- | --- |
| ModelWrapper | Base class for models that wrap other models. |
| DifferentiableModelWrapper | Base class for models that wrap other models and provide gradient methods. |
| ModelWithoutGradients | Turns a model into a model without gradients. |
| ModelWithEstimatedGradients | Turns a model into a model with gradients estimated by the given gradient estimator. |
| CompositeModel | Combines predictions of a (black-box) model with the gradient of a (substitute) model. |

## Detailed description

class foolbox.models.Model(bounds, channel_axis, preprocessing=(0, 1))

Base class to provide attacks with a unified interface to models.

The Model class represents a model and provides a unified interface to its predictions. Subclasses must implement batch_predictions and num_classes.

Model instances can be used as context managers and subclasses can require this to allocate and release resources.
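The context-manager pattern mentioned above can be illustrated with a small stand-in class. This is only a sketch: `ToyModel` and its `session` attribute are hypothetical and not part of foolbox, and how real subclasses manage their resources is not specified here.

```python
class ToyModel:
    """Hypothetical stand-in for a Model subclass that owns a resource."""

    def __init__(self):
        self.session = None  # placeholder for e.g. a framework session

    def __enter__(self):
        # Allocate the resource when the `with` block is entered.
        self.session = "open"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the resource when the block exits, even on error.
        self.session = None
        return None  # do not suppress exceptions


with ToyModel() as model:
    state_inside = model.session

state_after = model.session
```

Inside the `with` block the resource is available; once the block exits it has been released again.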

Parameters:

- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

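The preprocessing rule (subtract the first element, then divide by the second) can be written out in a few lines of NumPy. This is a minimal sketch of the arithmetic only; `apply_preprocessing` and the per-channel statistics are hypothetical names and values chosen for illustration, not foolbox functions.

```python
import numpy as np


def apply_preprocessing(x, preprocessing=(0, 1)):
    """Subtract the first element, then divide by the second (sketch)."""
    subtract, divide = preprocessing
    return (np.asarray(x, dtype=np.float64) - subtract) / divide


# Hypothetical per-channel statistics for a channels-last RGB image.
mean = np.array([0.5, 0.4, 0.3]).reshape((1, 1, 3))
std = np.array([0.2, 0.2, 0.1]).reshape((1, 1, 3))

image = np.full((2, 2, 3), 0.5)
processed = apply_preprocessing(image, (mean, std))
```

With the default `(0, 1)` the input is returned unchanged; with arrays, NumPy broadcasting applies the values along the channel axis.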
batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions(image)

Convenience method that calculates predictions for a single image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- numpy.ndarray: Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).

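To make the relationship between the three methods concrete, the sketch below builds a toy linear classifier that exposes the same method names. `ToyLinearModel` is hypothetical and deliberately does not import foolbox; it only illustrates how a single-image `predictions` call can be expressed in terms of `batch_predictions`.

```python
import numpy as np


class ToyLinearModel:
    """Hypothetical model (not a foolbox class): logits = x_flat @ W + b."""

    def __init__(self, weights, bias):
        self._weights = weights  # shape (num_features, num_classes)
        self._bias = bias        # shape (num_classes,)

    def num_classes(self):
        return self._bias.shape[0]

    def batch_predictions(self, images):
        # Flatten each image and compute logits for the whole batch.
        flat = images.reshape((images.shape[0], -1))
        return flat @ self._weights + self._bias

    def predictions(self, image):
        # Add a batch dimension, predict, then drop it again.
        return self.batch_predictions(image[np.newaxis])[0]


rng = np.random.RandomState(0)
model = ToyLinearModel(rng.randn(12, 5), rng.randn(5))

batch = rng.rand(4, 2, 2, 3)  # four 2x2 RGB images
batch_logits = model.batch_predictions(batch)
single_logits = model.predictions(batch[0])
```

The batch result has shape (batch size, number of classes), while the single-image result is a vector with one entry per class.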
class foolbox.models.DifferentiableModel(bounds, channel_axis, preprocessing=(0, 1))

Base class for differentiable models that provide gradients.

The DifferentiableModel class can be used as a base class for models that provide gradients. Subclasses must implement predictions_and_gradient.

A model should be considered differentiable based on whether it provides a predictions_and_gradient() method and a gradient() method, not based on whether it subclasses DifferentiableModel.

A differentiable model does not necessarily provide reasonable values for the gradients; the gradient can be wrong. It only guarantees that the relevant methods can be called.
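The duck-typing rule described above (check for the methods, not the base class) could be expressed as a small helper. This is a sketch: `is_differentiable` and the two example classes are hypothetical names introduced here for illustration.

```python
def is_differentiable(model):
    """Duck-typed check: both gradient methods must be callable."""
    return all(
        callable(getattr(model, name, None))
        for name in ("predictions_and_gradient", "gradient")
    )


class WithGradients:  # hypothetical; does not subclass anything
    def predictions_and_gradient(self, image, label):
        raise NotImplementedError

    def gradient(self, image, label):
        raise NotImplementedError


class WithoutGradients:  # hypothetical
    def batch_predictions(self, images):
        raise NotImplementedError
```

Note that the check only confirms the methods exist; as stated above, it says nothing about whether the returned gradients are meaningful.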

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

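For intuition, the chain rule behind `backward` can be worked out by hand for a linear model. The sketch below assumes logits computed as `x_flat @ W`; `backward_linear` is a hypothetical helper, not the implementation used by any of the framework wrappers.

```python
import numpy as np


def backward_linear(weights, gradient, image):
    """Backpropagate a logit-space gradient through logits = x_flat @ W.

    weights:  (num_features, num_classes)
    gradient: (num_classes,), gradient of some loss w.r.t. the logits
    image:    single input without batch dimension
    """
    # Chain rule: dL/dx = W @ dL/dlogits, reshaped back to the image shape.
    return (weights @ gradient).reshape(image.shape)


rng = np.random.RandomState(0)
weights = rng.randn(12, 5)
image = rng.rand(2, 2, 3)

# Loss = logit of class 2, so its gradient w.r.t. the logits is one-hot.
logit_gradient = np.zeros(5)
logit_gradient[2] = 1.0

image_gradient = backward_linear(weights, logit_gradient, image)
```

Because the loss here is a single logit, the resulting image gradient is simply the corresponding column of the weight matrix, reshaped to the image shape.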
gradient(image, label)

Calculates the gradient of the cross-entropy loss w.r.t. the image.

The default implementation calls predictions_and_gradient. Subclasses can provide more efficient implementations that only calculate the gradient.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

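As a worked example of what this method returns, the sketch below computes logits and the cross-entropy gradient for a toy linear model and compares one entry against a finite-difference estimate. All function names here are hypothetical stand-ins written for this illustration; the linear model is an assumption, not part of foolbox.

```python
import numpy as np


def softmax(logits):
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    return exp / exp.sum()


def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])


def predictions_and_gradient(weights, bias, image, label):
    """Logits and cross-entropy gradient for logits = x_flat @ W + b."""
    logits = image.reshape(-1) @ weights + bias
    # d(cross-entropy)/d(logits) = softmax(logits) - one_hot(label)
    logit_gradient = softmax(logits)
    logit_gradient[label] -= 1.0
    gradient = (weights @ logit_gradient).reshape(image.shape)
    return logits, gradient


rng = np.random.RandomState(0)
weights, bias = rng.randn(12, 5), rng.randn(5)
image, label = rng.rand(2, 2, 3), 3

logits, gradient = predictions_and_gradient(weights, bias, image, label)

# Finite-difference check on one pixel.
eps = 1e-6
perturbed = image.copy()
perturbed[0, 1, 2] += eps
numeric = (cross_entropy(perturbed.reshape(-1) @ weights + bias, label)
           - cross_entropy(logits, label)) / eps
```

The gradient has the same shape as the image, and the finite-difference value for the perturbed pixel should agree closely with the analytic entry.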
class foolbox.models.TensorFlowModel(images, logits, bounds, channel_axis=3, preprocessing=(0, 1))

Creates a Model instance from existing TensorFlow tensors.

Parameters:

- images (tensorflow.Tensor): The input to the model, usually a tensorflow.placeholder.
- logits (tensorflow.Tensor): The predictions of the model, before the softmax.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

classmethod from_keras(model, bounds, input_shape=None, channel_axis=3, preprocessing=(0, 1))

Alternative constructor for a TensorFlowModel that accepts a tf.keras.Model instance.

Parameters:

- model (tensorflow.keras.Model): A tensorflow.keras.Model that accepts a single input tensor and returns a single output tensor representing logits.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- input_shape (tuple): The shape of a single input, e.g. (28, 28, 1) for MNIST. If None, tries to get the shape from the model's input_shape attribute.
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

gradient(image, label)

Calculates the gradient of the cross-entropy loss w.r.t. the image.

The default implementation calls predictions_and_gradient. Subclasses can provide more efficient implementations that only calculate the gradient.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

class foolbox.models.TensorFlowEagerModel(model, bounds, num_classes=None, channel_axis=3, preprocessing=(0, 1))

Creates a Model instance from a TensorFlow model using eager execution.

Parameters:

- model (a TensorFlow eager model): The TensorFlow eager model that should be attacked. It will be called with input tensors and should return logits.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- num_classes (int): If None, will try to infer it from the model's output shape.
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

See also: gradient()

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

See also: gradient()

class foolbox.models.PyTorchModel(model, bounds, num_classes, channel_axis=1, device=None, preprocessing=(0, 1))

Creates a Model instance from a PyTorch module.

Parameters:

- model (torch.nn.Module): The PyTorch model that should be attacked.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- num_classes (int): Number of classes for which the model will output predictions.
- channel_axis (int): The index of the axis that represents color channels.
- device (string): A string specifying the device to do computation on. If None, will default to "cuda:0" if torch.cuda.is_available() or "cpu" if not.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

See also: gradient()

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

See also: gradient()

class foolbox.models.KerasModel(model, bounds, channel_axis=3, preprocessing=(0, 1), predicts='probabilities')

Creates a Model instance from a Keras model.

Parameters:

- model (keras.models.Model): The Keras model that should be attacked.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.
- predicts (str): Specifies whether the Keras model predicts logits or probabilities. Logits are preferred, but probabilities are the default.

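When a model outputs probabilities rather than logits, one common way to obtain logit-like scores is to take the logarithm, since applying a softmax to log-probabilities reproduces the original probabilities. The sketch below illustrates that identity; it is an assumption-laden illustration (including the `eps` clipping value) and not necessarily what KerasModel does internally.

```python
import numpy as np


def probabilities_to_logits(probabilities, eps=1e-12):
    """Map probabilities to scores whose softmax reproduces them."""
    # Clip to avoid log(0); eps is an arbitrary choice for this sketch.
    return np.log(np.clip(probabilities, eps, 1.0))


def softmax(logits):
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


probabilities = np.array([0.7, 0.2, 0.1])
logits = probabilities_to_logits(probabilities)
recovered = softmax(logits)
```

The recovered vector matches the input probabilities, and the ranking of classes is unchanged by the transformation.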
backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

See also: gradient()

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

See also: gradient()

class foolbox.models.TheanoModel(images, logits, bounds, num_classes, channel_axis=1, preprocessing=[0, 1])

Creates a Model instance from existing Theano tensors.

Parameters:

- images (theano.tensor): The input to the model.
- logits (theano.tensor): The predictions of the model, before the softmax.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- num_classes (int): Number of classes for which the model will output predictions.
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

gradient(image, label)

Calculates the gradient of the cross-entropy loss w.r.t. the image.

The default implementation calls predictions_and_gradient. Subclasses can provide more efficient implementations that only calculate the gradient.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

class foolbox.models.LasagneModel(input_layer, logits_layer, bounds, channel_axis=1, preprocessing=(0, 1))

Creates a Model instance from a Lasagne network.

Parameters:

- input_layer (lasagne.layers.Layer): The input to the model.
- logits_layer (lasagne.layers.Layer): The output of the model, before the softmax.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

gradient(image, label)

Calculates the gradient of the cross-entropy loss w.r.t. the image.

The default implementation calls predictions_and_gradient. Subclasses can provide more efficient implementations that only calculate the gradient.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

class foolbox.models.MXNetModel(data, logits, args, ctx, num_classes, bounds, channel_axis=1, aux_states=None, preprocessing=(0, 1))

Creates a Model instance from existing MXNet symbols and weights.

Parameters:

- data (mxnet.symbol.Variable): The input to the model.
- logits (mxnet.symbol.Symbol): The predictions of the model, before the softmax.
- args (dictionary mapping str to mxnet.nd.array): The parameters of the model.
- ctx (mxnet.context.Context): The device, e.g. mxnet.cpu() or mxnet.gpu().
- num_classes (int): The number of classes.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- aux_states (dictionary mapping str to mxnet.nd.array): The states of auxiliary parameters of the model.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

See also: gradient()

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

See also: gradient()

class foolbox.models.MXNetGluonModel(block, bounds, num_classes, ctx=None, channel_axis=1, preprocessing=(0, 1))

Creates a Model instance from an existing MXNet Gluon Block.

Parameters:

- block (mxnet.gluon.Block): The Gluon Block representing the model to be run.
- ctx (mxnet.context.Context): The device, e.g. mxnet.cpu() or mxnet.gpu().
- num_classes (int): The number of classes.
- bounds (tuple): Tuple of lower and upper bounds for the pixel values, usually (0, 1) or (0, 255).
- channel_axis (int): The index of the axis that represents color channels.
- preprocessing (2-element tuple of floats or numpy arrays): Elementwise preprocessing of the input; the first element of preprocessing is subtracted from the input, and the result is divided by the second element.

backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

See also: gradient()

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

See also: gradient()

class foolbox.models.ModelWrapper(model)

Base class for models that wrap other models.

This base class can be used to implement model wrappers that turn models into new models, for example by preprocessing the input or modifying the gradient.

Parameters:

- model (Model): The model that is wrapped.

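The wrapper idea can be sketched with plain Python objects: the wrapper holds a reference to another model, adjusts the input, and delegates everything else. Both classes below are hypothetical and use plain lists instead of arrays to stay dependency-free; they do not subclass the real ModelWrapper.

```python
class ToyModel:
    """Hypothetical wrapped model returning fixed scores per input."""

    def num_classes(self):
        return 3

    def batch_predictions(self, images):
        return [[float(sum(image)), 0.0, 1.0] for image in images]


class ScalingWrapper:
    """Hypothetical wrapper that rescales inputs before delegating."""

    def __init__(self, model, scale):
        self.wrapped_model = model
        self._scale = scale

    def num_classes(self):
        # Pure delegation.
        return self.wrapped_model.num_classes()

    def batch_predictions(self, images):
        # Preprocess the input, then delegate to the wrapped model.
        scaled = [[value * self._scale for value in image] for image in images]
        return self.wrapped_model.batch_predictions(scaled)

    def predictions(self, image):
        return self.batch_predictions([image])[0]


wrapper = ScalingWrapper(ToyModel(), scale=0.5)
scores = wrapper.predictions([2.0, 4.0])
```

From the caller's point of view the wrapper behaves like any other model, which is what allows wrappers to be stacked.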
batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions(image)

Convenience method that calculates predictions for a single image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- numpy.ndarray: Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).

class foolbox.models.DifferentiableModelWrapper(model)

Base class for models that wrap other models and provide gradient methods.

This base class can be used to implement model wrappers that turn models into new models, for example by preprocessing the input or modifying the gradient.

Parameters:

- model (Model): The model that is wrapped.

class foolbox.models.ModelWithoutGradients(model)

Turns a model into a model without gradients.

class foolbox.models.ModelWithEstimatedGradients(model, gradient_estimator)

Turns a model into a model with gradients estimated by the given gradient estimator.

Parameters:

- model (Model): The model that is wrapped.
- gradient_estimator (callable): Callable taking three arguments (pred_fn, image, label) and returning the estimated gradients. pred_fn will be the batch_predictions method of the wrapped model.

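A gradient estimator with the three-argument form described above could look like the following central-difference sketch. `finite_difference_estimator`, its `eps` default, and the toy `pred_fn` are all hypothetical and written only for illustration; estimators shipped with the library may use different strategies and step sizes.

```python
import numpy as np


def cross_entropy(logits, label):
    shifted = logits - np.max(logits)
    return np.log(np.sum(np.exp(shifted))) - shifted[label]


def finite_difference_estimator(pred_fn, image, label, eps=1e-4):
    """Estimate d(cross-entropy)/d(image) with central differences."""
    gradient = np.zeros_like(image, dtype=np.float64)
    for index in np.ndindex(*image.shape):
        offset = np.zeros_like(image, dtype=np.float64)
        offset[index] = eps
        # pred_fn works on batches: evaluate both perturbed inputs at once.
        logits = pred_fn(np.stack([image + offset, image - offset]))
        loss_plus = cross_entropy(logits[0], label)
        loss_minus = cross_entropy(logits[1], label)
        gradient[index] = (loss_plus - loss_minus) / (2 * eps)
    return gradient


# Hypothetical black-box prediction function: a fixed linear map.
rng = np.random.RandomState(0)
weights = rng.randn(4, 3)


def pred_fn(images):
    return images.reshape((images.shape[0], -1)) @ weights


image = rng.rand(2, 2)
estimated = finite_difference_estimator(pred_fn, image, label=1)

# Analytic gradient for comparison: W @ (softmax - one_hot).
logits = pred_fn(image[np.newaxis])[0]
probs = np.exp(logits - logits.max())
probs /= probs.sum()
probs[1] -= 1.0
exact = (weights @ probs).reshape(image.shape)
```

This approach needs two model evaluations per input element, so it is only practical for small inputs; it is shown here because it makes the estimator contract easy to verify against an analytic gradient.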
class foolbox.models.CompositeModel(forward_model, backward_model)

Combines predictions of a (black-box) model with the gradient of a (substitute) model.

Parameters:

- forward_model (Model): The model that should be fooled and will be used for predictions.
- backward_model (Model): The model that provides the gradients.

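The division of labour between the two models can be sketched as follows: predictions come from the forward (black-box) model, while gradients are taken from the backward (substitute) model. `ToyLinear` and `ToyComposite` are hypothetical classes written for this illustration and are not the library's implementation.

```python
import numpy as np


class ToyLinear:
    """Hypothetical differentiable model: logits = x_flat @ W."""

    def __init__(self, weights):
        self._weights = weights

    def batch_predictions(self, images):
        return images.reshape((images.shape[0], -1)) @ self._weights

    def gradient(self, image, label):
        logits = self.batch_predictions(image[np.newaxis])[0]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        probs[label] -= 1.0  # softmax - one_hot
        return (self._weights @ probs).reshape(image.shape)


class ToyComposite:
    """Predictions from one model, gradients from another (sketch)."""

    def __init__(self, forward_model, backward_model):
        self._forward = forward_model
        self._backward = backward_model

    def batch_predictions(self, images):
        return self._forward.batch_predictions(images)

    def predictions_and_gradient(self, image, label):
        predictions = self._forward.batch_predictions(image[np.newaxis])[0]
        gradient = self._backward.gradient(image, label)
        return predictions, gradient


rng = np.random.RandomState(0)
black_box = ToyLinear(rng.randn(4, 3))
substitute = ToyLinear(rng.randn(4, 3))
composite = ToyComposite(black_box, substitute)

image = rng.rand(2, 2)
predictions, gradient = composite.predictions_and_gradient(image, label=0)
```

The returned predictions are exactly those of the black-box model, while the gradient is only as good as the substitute's approximation of it.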
backward(gradient, image)

Backpropagates the gradient of some loss w.r.t. the logits through the network and returns the gradient of that loss w.r.t. the input image.

Parameters:

- gradient (numpy.ndarray): Gradient of some loss w.r.t. the logits.
- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).

Returns:

- gradient (numpy.ndarray): The gradient w.r.t. the image.

batch_predictions(images)

Calculates predictions for a batch of images.

Parameters:

- images (numpy.ndarray): Batch of inputs with shape as expected by the model.

Returns:

- numpy.ndarray: Predictions (logits, i.e. before the softmax) with shape (batch size, number of classes).

See also: predictions()

gradient(image, label)

Calculates the gradient of the cross-entropy loss w.r.t. the image.

The default implementation calls predictions_and_gradient. Subclasses can provide more efficient implementations that only calculate the gradient.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.

num_classes()

Determines the number of classes.

Returns:

- int: The number of classes for which the model creates predictions.

predictions_and_gradient(image, label)

Calculates predictions for an image and the gradient of the cross-entropy loss w.r.t. the image.

Parameters:

- image (numpy.ndarray): Single input with shape as expected by the model (without the batch dimension).
- label (int): Reference label used to calculate the gradient.

Returns:

- predictions (numpy.ndarray): Vector of predictions (logits, i.e. before the softmax) with shape (number of classes,).
- gradient (numpy.ndarray): The gradient of the cross-entropy loss w.r.t. the image. Will have the same shape as the image.