machine learning: in Markov decision processes, why is R0 omitted?

I am in the process of learning about MDPs, and a rather small thing is bothering me. Everywhere I look, I see the trajectory written in this order:

$S_0, A_0, R_1, S_1, A_1, R_2, \ldots, R_t, S_t, A_t$

My question is: why is $R_0$ omitted?

proof assistants: What is the problem with the learning algorithms used for automated theorem provers?

I've searched a lot on Google, but I haven't found an automated theorem prover / problem solver capable of solving problems like a human.

I am interested in a specific type of prover: learning-based ones, such as Support Vector Machines.

Support Vector Machines fail at other tasks such as recommendation systems due to data sparsity, that is, on datasets where there are very few or even no examples of some higher-order interactions (read the paper Multi-view Machines for more details).

Support vector machines are used for premise selection, but I have not found an explanation as to why they do not work so well there.

Question: Do the learning algorithms used for theorem provers suffer from data sparsity?

machine learning: is there a search platform that builds indexes based on the semantics of the words in the text?

I want to store emails for my data science project and search for different phrases across my entire collection. The phrases I search for may differ from the actual words in the emails, but the search should still return those emails.

What is the best platform for this? I need a search database that indexes an email based on semantics (think stemming, synonyms, etc.); Elasticsearch or CloudSearch out of the box will not work.

Also, how effective is the FREETEXT function in SQL Server? Can it serve this purpose?

machine learning – Interpretability of the feature weights of a Gaussian process classifier

Suppose I trained a Gaussian process classifier with a linear kernel (using the GPML toolbox) and obtained some weights for each input feature.

My question is then:

Does it make sense to interpret the weights as indicating the importance of each feature in real life, or to interpret, at the group level, the average over the weights of a group of features?

machine learning: why do we need to take the derivative of the activation function in backpropagation?

I was reading this article here:

When it reaches the part where it calculates the loss at each node, it says to use the following formula:

"delta_0 = w . delta_1 . f'(z)
where the delta_0, w and f ’(z) values ​​are those of the same unit, while delta_1 is the loss of the unit on the other side of the weighted link."

AND $ f $ It is the activation function.

Then it says:

"You can think of it this way, to get the loss of a node (for example, Z0), we multiply the value of its corresponding f & # 39; (z) by the loss of the node to which it is connected in the next layer (delta_1 ), by the weight of the link that connects both nodes. "

However, it doesn't really explain why we need the derivative term. Where does that term come from, and why do we need it?

My idea so far is this:

The fact that the identity activation function makes the term disappear is a clue. The node does not feed the next one exactly as-is; it depends on the activation function. When the activation function is the identity, the loss at that node simply passes to the next one according to the weight.
Basically, you need to account for the activation function in some way, specifically in a way that has no effect when it is the identity, and of course the derivative is one way of doing that.

The problem is that this is not very rigorous, so I am looking for a more detailed explanation.
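Edit: my attempt to make this concrete with the chain rule (my own notation, assuming $z_0$ is the pre-activation of the node, $a_0 = f(z_0)$ its output, and $z_1 = w \, a_0 + \ldots$ the pre-activation of the next node):

$$ \delta_0 = \frac{\partial L}{\partial z_0} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial a_0} \cdot \frac{\partial a_0}{\partial z_0} = \delta_1 \cdot w \cdot f'(z_0) $$

The $f'(z)$ factor would then be exactly $\partial a_0 / \partial z_0$, which is $1$ for the identity activation. Is this the right way to see where the derivative comes from?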

machine learning – Java neural network implementation

Recently I have been trying to get a better handle on machine learning from an implementation point of view rather than a statistics one. I have read several explanations of a neural network implementation in pseudocode, and this is the result: a toy neural network.
I have used several sources (if it is necessary to list all sources, I will make an edit).

I originally created a naive custom Matrix class with O(N^3) matrix multiplication, but deleted it to use ujmp.
I used its Matrix class for faster matrix multiplication, but due to my lack of understanding of how to use the accelerations, I am not sure I am taking full advantage of them.

This is the final code. Please comment and suggest improvements! Thank you. I will include SGD, backpropagation, feed forward and mini-batch calculation. Backprop is private, accessed through a wrapper method called train.

The NetworkInput class is a container for the data and a DenseMatrix label.

All functions here are interfaces: activation functions, functions for evaluating test data, and functions for calculating losses and errors.

This is the SGD.

/**
 * Provides an implementation of SGD for this neural network.
 *
 * @param training  a Collections object with {@link NetworkInput} objects,
 *                  NetworkInput.getData() is the data, NetworkInput.getLabel() is the label.
 * @param test      a Collections object with {@link NetworkInput} objects,
 *                  NetworkInput.getData() is the data, NetworkInput.getLabel() is the label.
 * @param epochs    how many iterations are we doing SGD for
 * @param batchSize how big is the batch size, typically 32
 */
public void stochasticGradientDescent(@NotNull List<NetworkInput> training,
    @NotNull List<NetworkInput> test,
    int epochs,
    int batchSize) {

    int trDataSize = training.size();
    int teDataSize = test.size();

    for (int i = 0; i < epochs; i++) {
        // Randomize training sample.
        Collections.shuffle(training);

        System.out.println("Calculating epoch: " + (i + 1) + ".");

        // Do backpropagation.
        for (int j = 0; j < trDataSize - batchSize; j += batchSize) {
            calculateMiniBatch(training.subList(j, j + batchSize));
        }

        // Feed forward the test data.
        List<NetworkInput> feedForwardData = this.feedForwardData(test);

        // Evaluate the prediction with the interface EvaluationFunction.
        int correct = this.evaluationFunction.evaluatePrediction(feedForwardData).intValue();
        // Calculate the loss with the interface ErrorFunction.
        double loss = errorFunction.calculateCostFunction(feedForwardData);

        // Add the plotting data, x, y_1, y_2 to the global
        // lists of xValues, correctValues, lossValues.
        addPlotData(i, correct, loss);

        System.out.println("Loss: " + loss);
        System.out.println("Epoch " + (i + 1) + ": " + correct + "/" + teDataSize);

        // Lower the learning rate each iteration? Might implement? Don't know how to.
        // ADAM? Is that here? Are they different algorithms altogether?
        // TODO: Implement Adam, RMSProp, Momentum?
        // this.learningRate = i % 10 == 0 ? this.learningRate / 4 : this.learningRate;
    }
}
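One simple answer to the learning-rate comment in the code above that I have been considering is plain step decay: every N epochs, multiply the rate by a factor (a standalone sketch with made-up numbers, not wired into the class; Adam and RMSProp are separate, more involved optimizers):

```java
public class StepDecaySketch {
    public static void main(String[] args) {
        double learningRate = 0.1; // starting rate (hypothetical)
        int epochs = 30;
        int decayEvery = 10;       // shrink every 10 epochs (hypothetical)
        double decayFactor = 0.5;  // halve the rate each time (hypothetical)

        for (int epoch = 1; epoch <= epochs; epoch++) {
            // ... run one epoch of SGD at the current rate ...
            if (epoch % decayEvery == 0) {
                learningRate *= decayFactor;
            }
        }
        // After 30 epochs the rate has been halved 3 times: 0.1 -> 0.0125.
        System.out.println(Math.abs(learningRate - 0.0125) < 1e-12);
    }
}
```
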


Here we calculate the mini-batches and update our weights with an average.

private void calculateMiniBatch(List<NetworkInput> subList) {
    int size = subList.size();

    double scaleFactor = this.learningRate / size;

    DenseMatrix[] dB = new DenseMatrix[this.totalLayers - 1];
    DenseMatrix[] dW = new DenseMatrix[this.totalLayers - 1];
    for (int i = 0; i < this.totalLayers - 1; i++) {
        DenseMatrix bias = getBias(i);
        DenseMatrix weight = getWeight(i);
        dB[i] = (DenseMatrix) Matrix.Factory.zeros(bias.getRowCount(), bias.getColumnCount());
        dW[i] = (DenseMatrix) Matrix.Factory
            .zeros(weight.getRowCount(), weight.getColumnCount());
    }

    for (NetworkInput data : subList) {
        DenseMatrix dataIn = data.getData();
        DenseMatrix label = data.getLabel();
        List<DenseMatrix[]> deltas = backPropagate(dataIn, label);
        DenseMatrix[] deltaB = deltas.get(0);
        DenseMatrix[] deltaW = deltas.get(1);

        for (int j = 0; j < this.totalLayers - 1; j++) {
            dB[j] = (DenseMatrix) dB[j].plus(deltaB[j]);
            dW[j] = (DenseMatrix) dW[j].plus(deltaW[j]);
        }
    }

    for (int i = 0; i < this.totalLayers - 1; i++) {
        DenseMatrix cW = getWeight(i);
        DenseMatrix cB = getBias(i);

        DenseMatrix scaledDeltaB = (DenseMatrix) dB[i].times(scaleFactor);
        DenseMatrix scaledDeltaW = (DenseMatrix) dW[i].times(scaleFactor);

        DenseMatrix nW = (DenseMatrix) cW.minus(scaledDeltaW);
        DenseMatrix nB = (DenseMatrix) cB.minus(scaledDeltaB);

        setWeight(i, nW);
        setLayerBias(i, nB);
    }
}
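As a cross-check of the averaging logic in calculateMiniBatch, here is the same update rule with plain doubles and made-up numbers (a single weight, two per-example gradients): w -= (learningRate / batchSize) * sum of gradients.

```java
public class MiniBatchUpdateSketch {
    public static void main(String[] args) {
        double learningRate = 0.1;
        double w = 1.0;                  // a single weight (hypothetical)
        double[] gradients = {0.5, 1.5}; // per-example gradients (hypothetical)

        double scaleFactor = learningRate / gradients.length;

        // Accumulate the batch gradient, then apply the averaged step.
        double dW = 0.0;
        for (double g : gradients) dW += g;
        w -= scaleFactor * dW; // 1.0 - 0.05 * 2.0 = 0.9

        System.out.println(Math.abs(w - 0.9) < 1e-12);
    }
}
```
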

This is the backward propagation algorithm.

private List<DenseMatrix[]> backPropagate(DenseMatrix toPredict, DenseMatrix correct) {
    List<DenseMatrix[]> totalDeltas = new ArrayList<>();

    DenseMatrix[] weights = getWeights();
    DenseMatrix[] biases = getBiasesAsMatrices();

    DenseMatrix[] deltaBiases = this.initializeDeltas(biases);
    DenseMatrix[] deltaWeights = this.initializeDeltas(weights);

    // Perform feed forward here...
    List<DenseMatrix> activations = new ArrayList<>();
    List<DenseMatrix> xVector = new ArrayList<>();

    // Alters all arrays and lists.
    this.backPropFeedForward(toPredict, activations, xVector, weights, biases);
    // End feed forward.

    // Calculate the error signal for the last layer.
    DenseMatrix deltaError;

    // Applies the error function gradient to the last layer.
    deltaError = errorFunction
        .applyErrorFunctionGradient(activations.get(activations.size() - 1), correct);

    // Set the deltas to the error signals of bias and weight.
    deltaBiases[deltaBiases.length - 1] = deltaError;
    deltaWeights[deltaWeights.length - 1] = (DenseMatrix) deltaError
        .mtimes(activations.get(activations.size() - 2).transpose());

    // Now iteratively apply the chain rule backwards.
    for (int k = deltaBiases.length - 2; k >= 0; k--) {
        DenseMatrix z = xVector.get(k);
        DenseMatrix differentiate = functions[k + 1].applyDerivative(z);

        deltaError = (DenseMatrix) weights[k + 1].transpose().mtimes(deltaError)
            .times(differentiate);

        deltaBiases[k] = deltaError;
        deltaWeights[k] = (DenseMatrix) deltaError.mtimes(activations.get(k).transpose());
    }

    totalDeltas.add(deltaBiases);
    totalDeltas.add(deltaWeights);

    return totalDeltas;
}
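For clarity, the backwards loop above is meant to implement the standard backpropagation recurrence (with $\odot$ denoting element-wise multiplication, $z^{(k)}$ the pre-activations and $a^{(k)}$ the activations feeding layer $k$'s weights):

$$ \delta^{(k)} = \left( W^{(k+1)} \right)^{\top} \delta^{(k+1)} \odot f'\!\left( z^{(k)} \right), \qquad \frac{\partial C}{\partial b^{(k)}} = \delta^{(k)}, \qquad \frac{\partial C}{\partial W^{(k)}} = \delta^{(k)} \left( a^{(k)} \right)^{\top} $$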

I forgot to include the feed forward algorithm.

private void backPropFeedForward(DenseMatrix starter, List<DenseMatrix> actives,
    List<DenseMatrix> vectors,
    DenseMatrix[] weights, DenseMatrix[] biases) {
    DenseMatrix toPredict = starter;
    // The input is the first activation.
    actives.add(starter);
    for (int i = 0; i < getTotalLayers() - 1; i++) {
        DenseMatrix x = (DenseMatrix) weights[i].mtimes(toPredict).plus(biases[i]);
        vectors.add(x);

        toPredict = this.functions[i + 1].applyFunction(x);
        actives.add(toPredict);
    }
}
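To double-check my understanding of the shapes in the feed forward pass, here is the same computation sketched with plain double[][] arrays instead of ujmp (a hypothetical 2-3-1 network with sigmoid activations and made-up weights, not the class's actual API):

```java
public class FeedForwardSketch {
    // Element-wise sigmoid activation.
    static double[] sigmoid(double[] z) {
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) out[i] = 1.0 / (1.0 + Math.exp(-z[i]));
        return out;
    }

    // One layer: a_next = f(W * a + b).
    static double[] layer(double[][] w, double[] b, double[] a) {
        double[] z = new double[w.length];
        for (int r = 0; r < w.length; r++) {
            z[r] = b[r];
            for (int c = 0; c < a.length; c++) z[r] += w[r][c] * a[c];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] input = {1.0, 0.5};
        // Hypothetical weights and biases for a 2-3-1 network.
        double[][] w1 = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}};
        double[] b1 = {0.0, 0.0, 0.0};
        double[][] w2 = {{0.7, 0.8, 0.9}};
        double[] b2 = {0.0};

        double[] hidden = layer(w1, b1, input);
        double[] output = layer(w2, b2, hidden);
        // A sigmoid output always lies strictly between 0 and 1.
        System.out.println("output in (0,1): " + (output[0] > 0 && output[0] < 1));
    }
}
```
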

Learning TypeScript before JavaScript

With a background consisting mainly of Java and other OOP languages, is it recommended to learn TypeScript before JavaScript, or should one first go deeper into JavaScript before TypeScript?

Learning – What makes this Eggleston image great?

First, as @rfusca more or less suggested in the comments, much of this answer is quite general, not really about this particular image. Second, much of it more or less represents a cynic's point of view. I am not sure it is entirely true, but I am convinced some elements of it have a good amount of validity.

I will posit that at least 95% of art critics are basically over-educated morons. They have studied art enough to know the names of the schools and the prominent members (and often at least some obscure ones) of each, but they have still memorized much more than they have understood.

I will also posit that many are (for various reasons) angry at the world (or more willing than most to vent perfectly normal levels of anger at the world), and use "art" as their way of getting even. Art is the ideal medium for this, because everything is a matter of taste. They proclaim that a "piece" is "high art", and anyone who has the nerve to point out that the emperor is naked is obviously a cretin.

However, to maximize the effectiveness of this tactic, what they advocate is mostly mediocre to poor. After all, even ordinary people can admire images (sculptures, whatever) that are honestly worth seeing. Anyone can see that a sunset is beautiful. Only a true connoisseur can recognize the greatness of a dirty white canvas with a mostly-painted black square, and those who do not look long enough to realize that the upper right corner had red under the black are obviously too blind for their opinions to matter at all!

Looking more specifically at this image, it clearly reminds me of the first dozen images I took after getting a wide-angle lens (24 mm on full frame). Exaggerated perspective is fun for a while. If I had to guess, I would say this was probably taken when the widest angle most people had was 28 mm, and this is enough wider to be noticeable (probably 20 or 24 mm).

For what it's worth, I think the same general trend has continued much more recently: although I have sold very little stock photography, my biggest sellers were the ones I took right after I got an 11-18 mm lens. I am pretty sure most of them were simply the first images on the market of those particular subjects with that wide a lens. Frankly, I was a little worried at the time about submitting those photos; none of them was (IMO) particularly great, either technically or artistically, but they gave a look and a point of view that was unique at the time, and apparently that was enough.

What are the basic principles of CS that I should know before starting my journey towards machine learning?

I am a non-CS graduate and I would love to be a machine learning engineer.

I have learned to code and know the basics of machine learning as well. Now I would like to know what "basics of CS" I should learn to be fully prepared for work.

Sometimes I have difficulty reading CS documentation, and I don't know how programs and computers work in the background. I'm also naive about topics like memory management, operating systems, networks, and electronics such as microprocessors, compiler design, etc. Are all of these necessary for my transition to AI? If so, could you recommend a short learning path, or books, or videos? I hope I don't have to delve too deeply into these areas. Thank you.

dnd 5e – How can I help my players invest more in learning the mechanics of the game?

Many people do not want to learn things that will not benefit them. An easy way to show them the benefits without spoon-feeding them is to have the NPCs do what you want the players to learn.

For example, have the party fight a group of goblins. Make the goblins force the players to flee or reposition constantly. Players will quickly learn that being hit by opportunity attacks stinks. Then have elves constantly run away and reposition, but always Disengage to avoid being hit. Be open about the way the elves behave: "This elf takes the Disengage action to avoid provoking opportunity attacks, and then moves here."

Present situations where the mechanics matter.

Often in D&D, players face encounters appropriate to their skill level. This prevents players from learning. You should consider throwing encounters at them that exceed their current skill level. Give them a fight they won't win unless they use the mechanics you want them to learn. Just be sure to show them the mechanics with the NPCs first, so they can tell what happened.

Give them the opportunity to discover and apply knowledge. You may be surprised: they may understand the mechanics but not think they are useful, because the fights are too easy or it is more fun to do something else.

Disengaging is a great example of this problem. Most classes have to give up an attack to Disengage, so I think many players will reason, "well, I can attack and try to finish the fight sooner instead of running and being hit next turn anyway."

Accept the players' weaknesses.

Keep in mind that learning can be a slow process. It can take several fights against different species and classes to get the idea across.

It is possible that even with an incentive your players will not want to learn. That's okay; they will enjoy other parts of the game, and their combat will suffer. There are many people who play D&D for the combat and are bad role-players. Does it really matter if the party is not great at combat?