public abstract class NadamSGD<E,L> extends Object

Type Parameters:
- E - The type of the data entries.
- L - The type of the labels for the data entries.

Subclasses implement computeCost(double[], List), which should ideally be a differentiable, smooth, convex function whose (global) minimum is to be found (if the function is not convex, the algorithm might converge to a local minimum); getTrainingData(long), which fetches batches of the training data; resetTrainingDataReader(), which resets the data reader to the beginning of the training data; and getTestData(long) and resetTestDataReader(), which do the same for the test data.

See Also:
Nadam: http://cs229.stanford.edu/proj2015/054_report.pdf

Modifier and Type | Field and Description |
---|---|
protected long | costCalculationBatchSize |
protected int | epochs |
protected double | epsilon |
protected static double | EPSILON - The default constant used to avoid division by zero. |
protected static double | FIRST_MOMENT_DECAY_RATE - The default decay constant of the accumulated Nesterov momentum. |
protected double | firstMomentDecayRate |
protected double | h |
protected static double | H - The default constant employed in the numerical differentiation formulas used to derive the derivatives of the cost function. |
protected Set<Integer> | indicesToIgnore |
protected static double | L1_REGULARIZATION_COEFF - The default L1 regularization coefficient. |
protected double | l1RegularizationCoeff |
protected static double | L2_REGULARIZATION_COEFF - The default L2 regularization coefficient. |
protected double | l2RegularizationCoeff |
protected static double | LEARNING_ANNEALING_RATE - The default factor by which the learning rate is multiplied after every epoch. |
protected static double | LEARNING_RATE - The default base learning rate. |
protected double | learningAnnealingRate |
protected double | learningRate |
protected Logger | logger |
protected double[] | maxValues |
protected double[] | minValues |
protected double[] | parameters |
protected static double | SECOND_MOMENT_DECAY_RATE - The default decay constant of the accumulated gradient squares. |
protected double | secondMomentDecayRate |
protected long | trainingBatchSize |
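The default constants above (FIRST_MOMENT_DECAY_RATE = 0.99, SECOND_MOMENT_DECAY_RATE = 0.999, EPSILON = 1e-8, per the constructor documentation below) parameterize the Nadam update. As a minimal standalone sketch of a single Nadam step following the linked report (the class and method names here are illustrative, not part of this API):

```java
import java.util.Arrays;

/**
 * Illustrative sketch of one Nadam update step using the documented
 * default constants. Not the library's implementation.
 */
public class NadamStep {
    static final double BETA1 = 0.99;   // FIRST_MOMENT_DECAY_RATE
    static final double BETA2 = 0.999;  // SECOND_MOMENT_DECAY_RATE
    static final double EPS = 1e-8;     // EPSILON

    /**
     * Updates params in place. m and v are the running first and second
     * moment accumulators; t is the 1-based step index (t >= 1, otherwise
     * the bias-correction denominators would be zero).
     */
    static void step(double[] params, double[] grad, double[] m, double[] v,
                     double learningRate, int t) {
        for (int i = 0; i < params.length; i++) {
            m[i] = BETA1 * m[i] + (1 - BETA1) * grad[i];
            v[i] = BETA2 * v[i] + (1 - BETA2) * grad[i] * grad[i];
            double mHat = m[i] / (1 - Math.pow(BETA1, t)); // bias-corrected momentum
            double vHat = v[i] / (1 - Math.pow(BETA2, t)); // bias-corrected RMS term
            // Nesterov-style look-ahead: blend corrected momentum with the raw gradient.
            double mBar = BETA1 * mHat
                    + (1 - BETA1) * grad[i] / (1 - Math.pow(BETA1, t));
            params[i] -= learningRate * mBar / (Math.sqrt(vHat) + EPS);
        }
    }

    public static void main(String[] args) {
        double[] p = {1.0}, g = {0.5}, m = {0.0}, v = {0.0};
        step(p, g, m, v, 0.1, 1);
        System.out.println(Arrays.toString(p)); // the parameter moves against the gradient
    }
}
```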
Modifier | Constructor and Description |
---|---|
protected | NadamSGD(double[] parameters, double[] minValues, double[] maxValues, long trainingBatchSize, long costCalculationBatchSize, int epochs, Double h, Double baseLearningRate, Double learningAnnealingRate, Double firstMomentDecayRate, Double secondMomentDecayRate, Double l1RegularizationCoeff, Double l2RegularizationCoeff, Double epsilon, Logger logger) - Constructs an instance according to the specified parameters. |
Modifier and Type | Method and Description |
---|---|
protected abstract double | computeCost(double[] parameters, List<Map.Entry<E,L>> dataSample) - Calculates the cost associated with the given parameter set for the specified data sample. |
protected abstract double[] | computeGradient(double[] parameters, List<Map.Entry<E,L>> dataSample) - Calculates the derivative of the cost function with respect to the parameters. |
protected abstract List<Map.Entry<E,L>> | getTestData(long batchSize) - Extracts a sample from the test data set and loads it into a list of key-value pairs where the key is the data and the value is the ground truth. |
protected abstract List<Map.Entry<E,L>> | getTrainingData(long batchSize) - Extracts a sample from the training data set and loads it into a list of key-value pairs where the key is the data and the value is the ground truth. |
double[] | optimize() - Optimizes the parameters and returns the set associated with the minimum of the cost function (whether it is a local or global minimum depends on the convexity of the function). |
protected abstract void | resetTestDataReader() - Resets the test data reader, enabling the resampling of already sampled data points. |
protected abstract void | resetTrainingDataReader() - Resets the training data reader, enabling the resampling of already sampled data points. |
protected boolean | verifyGradient(List<Map.Entry<E,L>> dataSample, double absTol, double relTol) - Verifies the correctness of the symbolic gradient. |
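How the pieces above fit together in optimize() (the epoch loop, the batch reader, and the per-epoch learning rate annealing) can be illustrated with a schematic stand-in; plain gradient descent substitutes for the full Nadam update, and every name here is hypothetical rather than taken from the library:

```java
/**
 * Schematic epoch loop, not the library's optimize(): read mini-batches
 * until the reader is exhausted, take a gradient step per batch, reset
 * the reader, and multiply the learning rate by the annealing factor
 * after every epoch. Fits a single parameter (the mean of the data) by
 * minimizing the mean squared error.
 */
public class TrainingLoopSketch {
    static double train(double[] data, double lr, double annealing,
                        int epochs, int batchSize) {
        double param = 0.0; // single parameter: the mean of the data
        for (int e = 0; e < epochs; e++) {
            int cursor = 0; // plays the role of resetTrainingDataReader()
            while (cursor < data.length) { // plays the role of getTrainingData(batchSize)
                int end = Math.min(cursor + batchSize, data.length);
                double grad = 0;
                for (int i = cursor; i < end; i++) {
                    grad += 2 * (param - data[i]); // derivative of (param - x)^2
                }
                grad /= (end - cursor);
                param -= lr * grad; // gradient descent step
                cursor = end;
            }
            lr *= annealing; // anneal the learning rate after every epoch
        }
        return param;
    }

    public static void main(String[] args) {
        // Fit the mean of a toy data set; the true mean is 2.5.
        System.out.println(train(new double[]{1, 2, 3, 4}, 0.1, 0.95, 200, 2));
    }
}
```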
protected static final double H
protected static final double LEARNING_RATE
protected static final double LEARNING_ANNEALING_RATE
protected static final double FIRST_MOMENT_DECAY_RATE
protected static final double SECOND_MOMENT_DECAY_RATE
protected static final double L1_REGULARIZATION_COEFF
protected static final double L2_REGULARIZATION_COEFF
protected static final double EPSILON
protected final double[] parameters
protected final double[] minValues
protected final double[] maxValues
protected final double h
protected final double learningRate
protected final double learningAnnealingRate
protected final double firstMomentDecayRate
protected final double secondMomentDecayRate
protected final double l1RegularizationCoeff
protected final double l2RegularizationCoeff
protected final double epsilon
protected final long trainingBatchSize
protected final long costCalculationBatchSize
protected final int epochs
protected final Logger logger
protected NadamSGD(double[] parameters, double[] minValues, double[] maxValues, long trainingBatchSize, long costCalculationBatchSize, int epochs, Double h, Double baseLearningRate, Double learningAnnealingRate, Double firstMomentDecayRate, Double secondMomentDecayRate, Double l1RegularizationCoeff, Double l2RegularizationCoeff, Double epsilon, Logger logger) throws IllegalArgumentException
Parameters:
- parameters - The starting values of the parameters to optimize.
- minValues - The minimum allowed values for the parameters. Each element corresponds to the element at the same index in the parameters array. If the length of the array is greater than that of the parameters array, the extra elements are ignored. If it is smaller, the array is extended with the greatest negative double value to match the length of the parameters array. If it is null, an array filled with the greatest negative double value is used. Each element has to be smaller than the corresponding element in the maxValues array by at least twice the absolute value of h; otherwise the corresponding parameter is ignored.
- maxValues - The maximum allowed values for the parameters. Each element corresponds to the element at the same index in the parameters array. If the length of the array is greater than that of the parameters array, the extra elements are ignored. If it is smaller, the array is extended with the greatest positive double value to match the length of the parameters array. If it is null, an array filled with the greatest positive double value is used. Each element has to be greater than the corresponding element in the minValues array by at least twice the absolute value of h; otherwise the corresponding parameter is ignored.
- trainingBatchSize - The number of samples in the mini-batches used for training.
- costCalculationBatchSize - The number of samples in the batches used for calculating the total training and test costs. Using batches allows costs to be calculated over data sets that do not fit into memory. However, using small batches may incur a significant IO overhead if the data source is in a file system.
- epochs - The maximum number of iterations. If it is 0, the loop is endless.
- h - A constant employed in the numerical differentiation formula used to derive the derivative of the cost function. If the function is smooth, the smaller it is, the more accurate the approximation of the derivatives usually is. It should nonetheless never be 0. If it is null, a default value of 1e-3 is used. However, the optimal value is highly application dependent (e.g. if the cost function treats the parameters as integers, a value of less than 1, or any non-integer value, would make no sense), so it is recommended to provide a non-null value for h.
- baseLearningRate - The base step size for the gradient descent. If it is null, the default base learning rate of 1.0 is used.
- learningAnnealingRate - The factor by which the learning rate is multiplied after every epoch. If it is null, a default value of 0.95 is used.
- firstMomentDecayRate - A constant that determines the base decay rate of the accumulated Nesterov momentum. If it is null, a default value of 0.99 is used. The lower this value, the faster the decay. Changing it is not recommended; if it is changed, the new value has to be within the range of 0 (inclusive) and 1 (inclusive).
- secondMomentDecayRate - A constant that determines the base decay rate of the accumulated gradient squares. If it is null, a default value of 0.999 is used. The lower this value, the faster the decay. Changing it is not recommended; if it is changed, the new value has to be within the range of 0 (inclusive) and 1 (inclusive).
- l1RegularizationCoeff - The coefficient to use for L1 parameter regularization; 0 by default.
- l2RegularizationCoeff - The coefficient to use for L2 parameter regularization; 0 by default.
- epsilon - A constant used to better condition the denominator when calculating the root mean squares. If it is null, the default value of 1e-8 is used. Changing it is not recommended.
- logger - A logger to log the status of the optimization. If it is null, no logging is performed.

Throws:
- IllegalArgumentException - If parameters is null or its length is 0, if a decay rate is greater than 1 or smaller than 0, or if an element in minValues is greater than the respective element in maxValues.

public double[] optimize()
protected boolean verifyGradient(List<Map.Entry<E,L>> dataSample, double absTol, double relTol)
Parameters:
- dataSample - The data batch for which the gradients are to be computed.
- absTol - The maximum acceptable absolute difference between the symbolic gradient and the numerical gradient.
- relTol - The maximum acceptable relative difference between the symbolic gradient and the numerical gradient.

protected abstract void resetTrainingDataReader()

protected abstract void resetTestDataReader()
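The comparison verifyGradient performs can be sketched as follows, assuming a central-difference numerical gradient with step h and acceptance of a component when either the absolute or the relative tolerance is met (the exact acceptance rule is an assumption, as are all names here):

```java
import java.util.function.Function;

/**
 * Sketch of a gradient check: each component of the supplied symbolic
 * gradient is compared against a central-difference estimate of the
 * cost function's derivative.
 */
public class GradientCheck {
    static boolean verify(Function<double[], Double> cost,
                          double[] params, double[] symbolicGrad,
                          double h, double absTol, double relTol) {
        for (int i = 0; i < params.length; i++) {
            double[] plus = params.clone(), minus = params.clone();
            plus[i] += h;
            minus[i] -= h;
            // Central difference: (f(x + h) - f(x - h)) / (2h).
            double numeric = (cost.apply(plus) - cost.apply(minus)) / (2 * h);
            double absDiff = Math.abs(numeric - symbolicGrad[i]);
            double relDiff = absDiff
                    / Math.max(Math.abs(numeric), Math.abs(symbolicGrad[i]));
            if (absDiff > absTol && relDiff > relTol) {
                return false; // both tolerances violated for this component
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // f(x) = x0^2 + 3*x1 has gradient (2*x0, 3); at (2, 1) that is (4, 3).
        boolean ok = verify(x -> x[0] * x[0] + 3 * x[1],
                new double[]{2, 1}, new double[]{4, 3}, 1e-4, 1e-6, 1e-6);
        System.out.println(ok);
    }
}
```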
protected abstract List<Map.Entry<E,L>> getTrainingData(long batchSize)
Consecutive calls should return consecutive batches of the training data until the resetTrainingDataReader() method is called. If there is no more training data left, an empty list should be returned. The list should never be null.

Parameters:
- batchSize - The maximum number of entries the returned list is to have. It is never less than 1.

protected abstract List<Map.Entry<E,L>> getTestData(long batchSize)
Consecutive calls should return consecutive batches of the test data until the resetTestDataReader() method is called. If there is no more test data left, an empty list should be returned. The list should never be null.

Parameters:
- batchSize - The maximum number of entries the returned list is to have. It is never less than 1.

protected abstract double computeCost(double[] parameters, List<Map.Entry<E,L>> dataSample)
Parameters:
- parameters - An array of parameters.
- dataSample - A list of the training data mapped to the correct labels on which the cost function is to be calculated.

protected abstract double[] computeGradient(double[] parameters, List<Map.Entry<E,L>> dataSample)
Parameters:
- parameters - An array of parameters.
- dataSample - A list of the training data mapped to the correct labels on which the cost function is to be calculated.

Copyright © 2020. All rights reserved.
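As an illustration of the computeCost/computeGradient contract, a hypothetical subclass for a linear least-squares model might implement the pair like this (a sketch under assumed types, not code from the library):

```java
import java.util.List;
import java.util.Map;

/**
 * Illustrative computeCost/computeGradient pair for a linear model
 * y = w . x: the cost is the mean squared error over the sample, and the
 * gradient is its exact derivative with respect to the weights, so a
 * gradient check against numerical differentiation should pass.
 */
public class LeastSquaresExample {
    static double computeCost(double[] w, List<Map.Entry<double[], Double>> sample) {
        double sum = 0;
        for (Map.Entry<double[], Double> e : sample) {
            double pred = 0;
            for (int i = 0; i < w.length; i++) pred += w[i] * e.getKey()[i];
            double err = pred - e.getValue(); // prediction minus ground truth
            sum += err * err;
        }
        return sum / sample.size();
    }

    static double[] computeGradient(double[] w, List<Map.Entry<double[], Double>> sample) {
        double[] grad = new double[w.length];
        for (Map.Entry<double[], Double> e : sample) {
            double pred = 0;
            for (int i = 0; i < w.length; i++) pred += w[i] * e.getKey()[i];
            double err = pred - e.getValue();
            // d/dw_i of mean (w.x - y)^2 is 2 * err * x_i averaged over the sample.
            for (int i = 0; i < w.length; i++) {
                grad[i] += 2 * err * e.getKey()[i] / sample.size();
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        List<Map.Entry<double[], Double>> sample =
                List.of(Map.entry(new double[]{1, 0}, 2.0));
        System.out.println(computeCost(new double[]{0, 0}, sample));
    }
}
```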