public abstract class NadamSGD<E,L> extends Object

Type Parameters:
- E - The type of the data entries.
- L - The type of the labels for the data entries.

Subclasses implement computeCost(double[], List), which should ideally be a differentiable, smooth, convex function whose (global) minimum is to be found (if the function is not convex, the algorithm might converge to a local minimum); getTrainingData(long), which fetches batches of the training data; resetTrainingDataReader(), which resets the data reader to the beginning of the training data; and getTestData(long) and resetTestDataReader(), which do the same for the test data.

See Also:
Nadam: http://cs229.stanford.edu/proj2015/054_report.pdf

Modifier and Type | Field and Description |
---|---|
protected long | costCalculationBatchSize |
protected int | epochs |
protected double | epsilon |
protected static double | EPSILON - The default constant used to avoid division by zero. |
protected static double | FIRST_MOMENT_DECAY_RATE - The default decay constant of the accumulated Nesterov momentum. |
protected double | firstMomentDecayRate |
protected double | h |
protected static double | H - The default constant employed in the numerical differentiation formulas used to derive the derivatives of the cost function. |
protected Set<Integer> | indicesToIgnore |
protected static double | L1_REGULARIZATION_COEFF - The default L1 regularization coefficient. |
protected double | l1RegularizationCoeff |
protected static double | L2_REGULARIZATION_COEFF - The default L2 regularization coefficient. |
protected double | l2RegularizationCoeff |
protected static double | LEARNING_ANNEALING_RATE - The default factor by which the learning rate is multiplied after every epoch. |
protected static double | LEARNING_RATE - The default base learning rate. |
protected double | learningAnnealingRate |
protected double | learningRate |
protected Logger | logger |
protected double[] | maxValues |
protected double[] | minValues |
protected double[] | parameters |
protected static double | SECOND_MOMENT_DECAY_RATE - The default decay constant of the accumulated gradient squares. |
protected double | secondMomentDecayRate |
protected long | trainingBatchSize |
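The default constants above (FIRST_MOMENT_DECAY_RATE = 0.99, SECOND_MOMENT_DECAY_RATE = 0.999, EPSILON = 1e-8, per the constructor documentation below) parameterize the Nadam update. As a minimal standalone sketch of a single Nadam step following the linked report (the class and method names here are illustrative, not part of this API):

```java
import java.util.Arrays;

/**
 * Illustrative sketch of one Nadam update step using the documented
 * default constants. Not the library's implementation.
 */
public class NadamStep {
    static final double BETA1 = 0.99;   // FIRST_MOMENT_DECAY_RATE
    static final double BETA2 = 0.999;  // SECOND_MOMENT_DECAY_RATE
    static final double EPS = 1e-8;     // EPSILON

    /**
     * Updates params in place. m and v are the running first and second
     * moment accumulators; t is the 1-based step index (t >= 1, otherwise
     * the bias-correction denominators would be zero).
     */
    static void step(double[] params, double[] grad, double[] m, double[] v,
                     double learningRate, int t) {
        for (int i = 0; i < params.length; i++) {
            m[i] = BETA1 * m[i] + (1 - BETA1) * grad[i];
            v[i] = BETA2 * v[i] + (1 - BETA2) * grad[i] * grad[i];
            double mHat = m[i] / (1 - Math.pow(BETA1, t)); // bias-corrected momentum
            double vHat = v[i] / (1 - Math.pow(BETA2, t)); // bias-corrected RMS term
            // Nesterov-style look-ahead: blend corrected momentum with the raw gradient.
            double mBar = BETA1 * mHat
                    + (1 - BETA1) * grad[i] / (1 - Math.pow(BETA1, t));
            params[i] -= learningRate * mBar / (Math.sqrt(vHat) + EPS);
        }
    }

    public static void main(String[] args) {
        double[] p = {1.0}, g = {0.5}, m = {0.0}, v = {0.0};
        step(p, g, m, v, 0.1, 1);
        System.out.println(Arrays.toString(p)); // the parameter moves against the gradient
    }
}
```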
Modifier | Constructor and Description |
---|---|
protected | NadamSGD(double[] parameters, double[] minValues, double[] maxValues, long trainingBatchSize, long costCalculationBatchSize, int epochs, Double h, Double baseLearningRate, Double learningAnnealingRate, Double firstMomentDecayRate, Double secondMomentDecayRate, Double l1RegularizationCoeff, Double l2RegularizationCoeff, Double epsilon, Logger logger) - Constructs an instance according to the specified parameters. |
Modifier and Type | Method and Description |
---|---|
protected abstract double | computeCost(double[] parameters, List<Map.Entry<E,L>> dataSample) - Calculates the cost associated with the given parameter set for the specified data sample. |
protected abstract double[] | computeGradient(double[] parameters, List<Map.Entry<E,L>> dataSample) - Calculates the derivative of the cost function with respect to the parameters. |
protected abstract List<Map.Entry<E,L>> | getTestData(long batchSize) - Extracts a sample from the test data set and loads it into a list of key-value pairs where the key is the data and the value is the ground truth. |
protected abstract List<Map.Entry<E,L>> | getTrainingData(long batchSize) - Extracts a sample from the training data set and loads it into a list of key-value pairs where the key is the data and the value is the ground truth. |
double[] | optimize() - Optimizes the parameters and returns the set associated with the minimum of the cost function (whether it is a local or global minimum depends on the convexity of the function). |
protected abstract void | resetTestDataReader() - Resets the test data reader, enabling the resampling of already sampled data points. |
protected abstract void | resetTrainingDataReader() - Resets the training data reader, enabling the resampling of already sampled data points. |
protected boolean | verifyGradient(List<Map.Entry<E,L>> dataSample, double absTol, double relTol) - Verifies the correctness of the symbolic gradient. |
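How the pieces above fit together in optimize() (the epoch loop, the batch reader, and the per-epoch learning rate annealing) can be illustrated with a schematic stand-in; plain gradient descent substitutes for the full Nadam update, and every name here is hypothetical rather than taken from the library:

```java
/**
 * Schematic epoch loop, not the library's optimize(): read mini-batches
 * until the reader is exhausted, take a gradient step per batch, reset
 * the reader, and multiply the learning rate by the annealing factor
 * after every epoch. Fits a single parameter (the mean of the data) by
 * minimizing the mean squared error.
 */
public class TrainingLoopSketch {
    static double train(double[] data, double lr, double annealing,
                        int epochs, int batchSize) {
        double param = 0.0; // single parameter: the mean of the data
        for (int e = 0; e < epochs; e++) {
            int cursor = 0; // plays the role of resetTrainingDataReader()
            while (cursor < data.length) { // plays the role of getTrainingData(batchSize)
                int end = Math.min(cursor + batchSize, data.length);
                double grad = 0;
                for (int i = cursor; i < end; i++) {
                    grad += 2 * (param - data[i]); // derivative of (param - x)^2
                }
                grad /= (end - cursor);
                param -= lr * grad; // gradient descent step
                cursor = end;
            }
            lr *= annealing; // anneal the learning rate after every epoch
        }
        return param;
    }

    public static void main(String[] args) {
        // Fit the mean of a toy data set; the true mean is 2.5.
        System.out.println(train(new double[]{1, 2, 3, 4}, 0.1, 0.95, 200, 2));
    }
}
```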
protected static final double H
protected static final double LEARNING_RATE
protected static final double LEARNING_ANNEALING_RATE
protected static final double FIRST_MOMENT_DECAY_RATE
protected static final double SECOND_MOMENT_DECAY_RATE
protected static final double L1_REGULARIZATION_COEFF
protected static final double L2_REGULARIZATION_COEFF
protected static final double EPSILON
protected final double[] parameters
protected final double[] minValues
protected final double[] maxValues
protected final double h
protected final double learningRate
protected final double learningAnnealingRate
protected final double firstMomentDecayRate
protected final double secondMomentDecayRate
protected final double l1RegularizationCoeff
protected final double l2RegularizationCoeff
protected final double epsilon
protected final long trainingBatchSize
protected final long costCalculationBatchSize
protected final int epochs
protected final Logger logger
protected NadamSGD(double[] parameters, double[] minValues, double[] maxValues, long trainingBatchSize, long costCalculationBatchSize, int epochs, Double h, Double baseLearningRate, Double learningAnnealingRate, Double firstMomentDecayRate, Double secondMomentDecayRate, Double l1RegularizationCoeff, Double l2RegularizationCoeff, Double epsilon, Logger logger) throws IllegalArgumentException
Parameters:
- parameters - The starting values of the parameters to optimize.
- minValues - The minimum allowed values for the parameters. Each element corresponds to the element at the same index in the parameters array. If the length of the array is greater than that of the parameters array, the extra elements are ignored. If it is smaller, the array is extended with the greatest negative double value to match the length of the parameters array. If it is null, an array filled with the greatest negative double value is used. Each element has to be smaller than the corresponding element in the maxValues array by at least twice the absolute value of h; otherwise the corresponding parameter is ignored.
- maxValues - The maximum allowed values for the parameters. Each element corresponds to the element at the same index in the parameters array. If the length of the array is greater than that of the parameters array, the extra elements are ignored. If it is smaller, the array is extended with the greatest positive double value to match the length of the parameters array. If it is null, an array filled with the greatest positive double value is used. Each element has to be greater than the corresponding element in the minValues array by at least twice the absolute value of h; otherwise the corresponding parameter is ignored.
- trainingBatchSize - The number of samples in the mini-batches used for training.
- costCalculationBatchSize - The number of samples in the batches used for calculating the total training and test costs. Using batches allows costs to be calculated over data sets that do not fit into memory. However, using small batches may incur a significant IO overhead if the data source is in a file system.
- epochs - The maximum number of iterations. If it is 0, the loop is endless.
- h - A constant employed in the numerical differentiation formula used to derive the derivative of the cost function. If the function is smooth, the smaller it is, the more accurate the approximation of the derivatives usually is. It should nonetheless never be 0. If it is null, a default value of 1e-3 is used. However, the optimal value is highly application dependent (e.g. if the cost function treats the parameters as integers, a value of less than 1, or any non-integer value, would make no sense), so it is recommended to provide a non-null value for h.
- baseLearningRate - The base step size for the gradient descent. If it is null, the default base learning rate of 1.0 is used.
- learningAnnealingRate - The factor by which the learning rate is multiplied after every epoch. If it is null, a default value of 0.95 is used.
- firstMomentDecayRate - A constant that determines the base decay rate of the accumulated Nesterov momentum. If it is null, a default value of 0.99 is used. The lower this value, the faster the decay. Changing it is not recommended; if it is changed, the new value has to be within the range of 0 (inclusive) and 1 (inclusive).
- secondMomentDecayRate - A constant that determines the base decay rate of the accumulated gradient squares. If it is null, a default value of 0.999 is used. The lower this value, the faster the decay. Changing it is not recommended; if it is changed, the new value has to be within the range of 0 (inclusive) and 1 (inclusive).
- l1RegularizationCoeff - The coefficient to use for L1 parameter regularization; 0 by default.
- l2RegularizationCoeff - The coefficient to use for L2 parameter regularization; 0 by default.
- epsilon - A constant used to better condition the denominator when calculating the root mean squares. If it is null, the default value of 1e-8 is used. Changing it is not recommended.
- logger - A logger to log the status of the optimization. If it is null, no logging is performed.

Throws:
- IllegalArgumentException - If parameters is null or its length is 0, if a decay rate is greater than 1 or smaller than 0, or if an element in minValues is greater than the respective element in maxValues.

public double[] optimize()
protected boolean verifyGradient(List<Map.Entry<E,L>> dataSample, double absTol, double relTol)
Parameters:
- dataSample - The data batch for which the gradients are to be computed.
- absTol - The maximum acceptable absolute difference between the symbolic gradient and the numerical gradient.
- relTol - The maximum acceptable relative difference between the symbolic gradient and the numerical gradient.

protected abstract void resetTrainingDataReader()

protected abstract void resetTestDataReader()
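The comparison verifyGradient performs can be sketched as follows, assuming a central-difference numerical gradient with step h and acceptance of a component when either the absolute or the relative tolerance is met (the exact acceptance rule is an assumption, as are all names here):

```java
import java.util.function.Function;

/**
 * Sketch of a gradient check: each component of the supplied symbolic
 * gradient is compared against a central-difference estimate of the
 * cost function's derivative.
 */
public class GradientCheck {
    static boolean verify(Function<double[], Double> cost,
                          double[] params, double[] symbolicGrad,
                          double h, double absTol, double relTol) {
        for (int i = 0; i < params.length; i++) {
            double[] plus = params.clone(), minus = params.clone();
            plus[i] += h;
            minus[i] -= h;
            // Central difference: (f(x + h) - f(x - h)) / (2h).
            double numeric = (cost.apply(plus) - cost.apply(minus)) / (2 * h);
            double absDiff = Math.abs(numeric - symbolicGrad[i]);
            double relDiff = absDiff
                    / Math.max(Math.abs(numeric), Math.abs(symbolicGrad[i]));
            if (absDiff > absTol && relDiff > relTol) {
                return false; // both tolerances violated for this component
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // f(x) = x0^2 + 3*x1 has gradient (2*x0, 3); at (2, 1) that is (4, 3).
        boolean ok = verify(x -> x[0] * x[0] + 3 * x[1],
                new double[]{2, 1}, new double[]{4, 3}, 1e-4, 1e-6, 1e-6);
        System.out.println(ok);
    }
}
```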
protected abstract List<Map.Entry<E,L>> getTrainingData(long batchSize)
Consecutive calls should return consecutive batches of the training data until the resetTrainingDataReader() method is called. If there is no more training data left, an empty list should be returned. The list should never be null.

Parameters:
- batchSize - The maximum number of entries the returned list is to have. It is never less than 1.

protected abstract List<Map.Entry<E,L>> getTestData(long batchSize)
Consecutive calls should return consecutive batches of the test data until the resetTestDataReader() method is called. If there is no more test data left, an empty list should be returned. The list should never be null.

Parameters:
- batchSize - The maximum number of entries the returned list is to have. It is never less than 1.

protected abstract double computeCost(double[] parameters, List<Map.Entry<E,L>> dataSample)
Parameters:
- parameters - An array of parameters.
- dataSample - A list of the training data mapped to the correct labels on which the cost function is to be calculated.

protected abstract double[] computeGradient(double[] parameters, List<Map.Entry<E,L>> dataSample)
Parameters:
- parameters - An array of parameters.
- dataSample - A list of the training data mapped to the correct labels on which the cost function is to be calculated.

Copyright © 2020. All rights reserved.
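As an illustration of the computeCost/computeGradient contract, a hypothetical subclass for a linear least-squares model might implement the pair like this (a sketch under assumed types, not code from the library):

```java
import java.util.List;
import java.util.Map;

/**
 * Illustrative computeCost/computeGradient pair for a linear model
 * y = w . x: the cost is the mean squared error over the sample, and the
 * gradient is its exact derivative with respect to the weights, so a
 * gradient check against numerical differentiation should pass.
 */
public class LeastSquaresExample {
    static double computeCost(double[] w, List<Map.Entry<double[], Double>> sample) {
        double sum = 0;
        for (Map.Entry<double[], Double> e : sample) {
            double pred = 0;
            for (int i = 0; i < w.length; i++) pred += w[i] * e.getKey()[i];
            double err = pred - e.getValue(); // prediction minus ground truth
            sum += err * err;
        }
        return sum / sample.size();
    }

    static double[] computeGradient(double[] w, List<Map.Entry<double[], Double>> sample) {
        double[] grad = new double[w.length];
        for (Map.Entry<double[], Double> e : sample) {
            double pred = 0;
            for (int i = 0; i < w.length; i++) pred += w[i] * e.getKey()[i];
            double err = pred - e.getValue();
            // d/dw_i of mean (w.x - y)^2 is 2 * err * x_i averaged over the sample.
            for (int i = 0; i < w.length; i++) {
                grad[i] += 2 * err * e.getKey()[i] / sample.size();
            }
        }
        return grad;
    }

    public static void main(String[] args) {
        List<Map.Entry<double[], Double>> sample =
                List.of(Map.entry(new double[]{1, 0}, 2.0));
        System.out.println(computeCost(new double[]{0, 0}, sample));
    }
}
```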