machine learning – Modifying the code to read .ogg datasets and apply LSTM

Deep learning/LSTM/Matlab

There is a MATLAB script that performs the deep-learning steps below using an LSTM. I need to change the first three steps to use our own dataset to train this model; the other steps do not need to change.

I need to apply this to .ogg audio files, so please create and use some audio files in .ogg format as sample data and give me the code.

The following steps are for your information:

Three classes of audio signals are generated and labeled as ‘white’, ‘brown’, and ‘pink’. Each class has 1000 samples.

800 samples from each class are used as the training samples to train the deep neural network, so in total 800×3 = 2400 samples in the training dataset. Their labels are their class names ‘white’, ‘brown’, and ‘pink’. (Lines 29 and 30)

200 samples from each class are used as the validation samples to test the performance of the deep neural network, so in total 200×3 = 600 samples in the validation dataset. Their labels are their class names ‘white’, ‘brown’, and ‘pink’. (Lines 32 and 33)

Extract features from the training dataset and validation dataset.

  • define the structure of the neural network model (LSTM)
  • set training options
  • train the model iteratively using the training dataset and test the model using the validation dataset every iteration.
  • finish training and get the trained model.
  • generate test dataset and use the trained model to classify the test dataset into three classes, ‘white’, ‘brown’, and ‘pink’


fs = 44.1e3;
duration = 0.5;
N = duration*fs;

% Generate and label the three noise classes (1000 clips each).
wNoise = 2*rand([N,1000]) - 1;
wLabels = repelem(categorical("white"),1000,1);

bNoise = filter(1,[1,-0.999],wNoise);
bNoise = bNoise./max(abs(bNoise),[],'all');
bLabels = repelem(categorical("brown"),1000,1);

pNoise = pinknoise([N,1000]);
pLabels = repelem(categorical("pink"),1000,1);

% 800 clips per class for training, 200 per class for validation.
audioTrain = [wNoise(:,1:800),bNoise(:,1:800),pNoise(:,1:800)];
labelsTrain = [wLabels(1:800);bLabels(1:800);pLabels(1:800)];
audioValidation = [wNoise(:,801:end),bNoise(:,801:end),pNoise(:,801:end)];
labelsValidation = [wLabels(801:end);bLabels(801:end);pLabels(801:end)];

% Feature extraction (the extractor call was truncated in the original;
% completed here following the standard Audio Toolbox example).
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralCentroid",true, ...
    "spectralSlope",true);

featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain)
featuresTrain = permute(featuresTrain,[2,1,3]);
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));
numSignals = numel(featuresTrain)
[numFeatures,numHopsPerSequence] = size(featuresTrain{1})

featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));

% Network and training options (also truncated in the original; completed
% following the same example).
layers = [ ...
    sequenceInputLayer(numFeatures), ...
    lstmLayer(50,"OutputMode","last"), ...
    fullyConnectedLayer(3), ...
    softmaxLayer, ...
    classificationLayer];

options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false);

net = trainNetwork(featuresTrain,labelsTrain,layers,options);

% Generate one test clip per class for the trained model.
wNoiseTest = 2*rand([N,1]) - 1;
bNoiseTest = filter(1,[1,-0.999],wNoiseTest);
bNoiseTest = bNoiseTest./max(abs(bNoiseTest),[],'all');
pNoiseTest = pinknoise(N);
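For anyone porting the pipeline, here is a rough Python/NumPy sketch of the first steps (noise generation, normalisation, and the 800/200-style split), scaled down to 100 clips per class for brevity. Pink noise would need a 1/f shaping filter and is omitted. The optional `soundfile` call at the end is one way to export clips as .ogg sample files (an assumption on my part; libsndfile, which backs `soundfile`, can write OGG/Vorbis):

```python
import numpy as np

fs = 44_100
n = int(0.5 * fs)      # samples per clip
n_clips = 100          # scaled down from 1000 for illustration
n_train = 80           # 80/20 split, mirroring 800/200 in the MATLAB code

rng = np.random.default_rng(0)

# White noise: uniform in [-1, 1), one clip per column, as in the MATLAB code.
w_noise = 2 * rng.random((n, n_clips)) - 1

# Brown noise: leaky integrator y[k] = x[k] + 0.999*y[k-1], then peak-normalise,
# matching filter(1,[1,-0.999],wNoise) followed by max(abs(...)) scaling.
b_noise = np.empty_like(w_noise)
acc = np.zeros(n_clips)
for k in range(n):
    acc = w_noise[k] + 0.999 * acc
    b_noise[k] = acc
b_noise /= np.max(np.abs(b_noise))

# Train/validation split and labels, mirroring columns 1:800 / 801:end.
audio_train = np.concatenate([w_noise[:, :n_train], b_noise[:, :n_train]], axis=1)
audio_val = np.concatenate([w_noise[:, n_train:], b_noise[:, n_train:]], axis=1)
labels_train = ["white"] * n_train + ["brown"] * n_train

# Optional: export a clip as a .ogg sample file (requires the soundfile package).
try:
    import soundfile as sf
    sf.write("white_000.ogg", w_noise[:, 0], fs)
except ImportError:
    pass
```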

python – LSTM Model – Validation Accuracy is not changing

I am working on a classification problem. My input data is labels and the expected output is also labels.
I made (X, Y) pairs by shifting X, and Y is converted to a categorical value.

    X   Y
    2   1.0
    1   2.0
    1   1.0
    2   1.0
    2   2.0
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

encoder = LabelEncoder()
test_labels = to_categorical(encoder.fit_transform(values[:, 1]), num_classes=3)
train_X, test_X, train_y, test_y = train_test_split(values[:, 0], test_labels,
                                                    test_size=0.30, random_state=42)


(154076, 3)
(66033, 3)
Converting this to LSTM format

train_X = train_X.reshape(train_X.shape[0], 1, 1)
test_X = test_X.reshape(test_X.shape[0], 1, 1)

# configure network
n_batch = 1
n_epoch = 10
n_neurons = 100



model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(n_neurons,
                         batch_input_shape=(n_batch, train_X.shape[1], train_X.shape[2]),
                         stateful=True),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu',
                          kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.Dense(3, activation='softmax')
])

# compile step (not shown in the original post)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history =, train_y, validation_data=(test_X, test_y),
                    epochs=n_epoch, batch_size=n_batch, verbose=1, shuffle=False)

Validation Accuracy is not Changing

Epoch 1/5
154076/154076 [==============================] - 356s 2ms/step - loss: 1.0844 - acc: 0.4269 - val_loss: 1.0814 - val_acc: 0.4310
Epoch 2/5
154076/154076 [==============================] - 354s 2ms/step - loss: 1.0853 - acc: 0.4256 - val_loss: 1.0813 - val_acc: 0.4310
Epoch 3/5
154076/154076 [==============================] - 355s 2ms/step - loss: 1.0861 - acc: 0.4246 - val_loss: 1.0814 - val_acc: 0.4310
Epoch 4/5
154076/154076 [==============================] - 356s 2ms/step - loss: 1.0874 - acc: 0.4228 - val_loss: 1.0825 - val_acc: 0.4310
Epoch 5/5
154076/154076 [==============================] - 353s 2ms/step - loss: 1.0887 - acc: 0.4208 - val_loss: 1.0828 - val_acc: 0.4310

What changes can I make to improve the model?
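One quick diagnostic (my suggestion, not from the post): a validation accuracy frozen at exactly 0.4310 often means the network has collapsed to always predicting the majority class, so it is worth comparing that number with the majority-class frequency of the labels:

```python
import numpy as np

# Illustrative label stream; with the real data, compare the result
# against the stuck val_acc of 0.4310.
y = np.array([2, 1, 1, 2, 2, 1, 3, 2])
classes, counts = np.unique(y, return_counts=True)
majority_acc = counts.max() / counts.sum()  # accuracy of always guessing the mode
```

If the real data's majority frequency is about 0.431, the model is learning nothing beyond the base rate, which would point to the single shifted label carrying too little information as a feature.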

time series: LSTM cannot predict well when the ground truth is close to zero

While training the LSTM model, I ran into a problem that I couldn't solve. To start, let me describe my model: I used a PyTorch stacked LSTM with 3 layers and 256 hidden units per layer to predict human joint angles and torques from EMG features. After training, the model predicts well when the ground truth is far from 0, but when the ground truth is near zero there is always an offset between the predicted value and the ground truth. I suppose the reason is that large ground-truth values have more impact on reducing the loss function during training. You can see the result here

I have tried different loss functions but the situation did not improve. Since I am a beginner in this field, I hope someone can point out the problem in my method and how to solve it. Thank you!
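One remedy worth trying for the near-zero offset (an assumption on my part, not something from the post) is to reweight the loss so that small-magnitude targets are not drowned out by large ones, e.g. a magnitude-weighted MSE, shown here in NumPy (the same expression carries over to a PyTorch loss):

```python
import numpy as np

def weighted_mse(pred, target, eps=0.1):
    # Up-weight samples whose ground truth is near zero; eps bounds the weight.
    w = 1.0 / (np.abs(target) + eps)
    w = w / w.mean()                 # keep the overall loss scale unchanged
    return np.mean(w * (pred - target) ** 2)

# The same absolute error now costs far more on the near-zero target
# than on the large one.
loss_small_err = weighted_mse(np.array([0.1, 10.0]), np.array([0.0, 10.0]))
loss_large_err = weighted_mse(np.array([0.0, 10.1]), np.array([0.0, 10.0]))
```

The `eps` value trades off how aggressively near-zero targets are emphasised; it would need tuning on the validation set.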

python – How to train LSTM with several time series of different sizes and multiple variables?

Hi, I'm trying to train an LSTM with data from flight messages. The messages have the following fields:
Flight id = 2, Timestamp = time in seconds, altitude = xxx, latitude = xxxx, longitude = xxxx, speed = xxxx, airport Origin = xxxx, Destination = xxxx, Landing = (timestamp).

Note that there are several flight ids (idVuelo), so each idVuelo is its own time series; the fields are the same for all of them.

What I want is to train on all the series so that, given an unfinished series (for example, one 15 minutes short of landing), I can predict the Landing field.

Because of how LSTMs work, and given my problem, I think I should use the whole series rather than x steps back to predict the series I give it. For now, this is what I have done:

I converted all the variables to numeric. Then, following an LSTM book applied to time series, which recommends recasting the dataset as a supervised ML problem:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Form dataset matrix
def create_dataset(dataset, previous=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - previous - 1):
        a = dataset[i:(i + previous), 0]
        dataX.append(a)
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)

I have divided the dataset into train and test sets.


It corresponds to a 70/30 split.

Here comes the tricky part. When it comes to training, my understanding is that each series has to be fed to the model independently and incrementally, and that to use the entire series I set `previous` to the length of the series minus 10 points. All of these are assumptions and I don't know if I'm doing it correctly.

Nmodel = 0

for name, group in dftrain.groupby('idVuelo'):
    seleccionX = group
    scaler = MinMaxScaler(feature_range=(0, 1))
    train = scaler.fit_transform(seleccionX)
    # Select the full time series: len - 10
    previous = len(group) - 10
    X_train, Y_train = create_dataset(train, previous)
    X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(50, input_shape=(1, previous)))
    model.compile(loss='mse', optimizer='adam'), Y_train, epochs=10, batch_size=128, verbose=2)
    Nmodel = Nmodel + 1
    if (Nmodel % 5) == 0:
        print("Model saved:", Nmodel)
    print("Remaining series:", Nmodel, "of 367")

I do not know if this will cause problems later when making the prediction, because I am not sure whether I am actually predicting the Landing variable, since that is what is supposed to correspond to the target value. Any help is welcome!

Thank you!
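An alternative to training one model per flight (my suggestion, not from the post) is to pad every series to a common length and let one network ignore the padding; in Keras this is what a `Masking` layer is for. A minimal NumPy sketch of the padding step, with made-up flight lengths:

```python
import numpy as np

# Hypothetical: three flights of different lengths, 2 features per timestep.
series = [np.ones((5, 2)), np.ones((8, 2)), np.ones((3, 2))]
max_len = max(len(s) for s in series)

# Pre-pad with 0.0 so a Masking(mask_value=0.0) layer can skip padded steps;
# this assumes 0.0 never occurs as a real (scaled) feature value.
batch = np.zeros((len(series), max_len, 2))
for i, s in enumerate(series):
    batch[i, -len(s):, :] = s
```

The resulting `batch` of shape `(n_flights, max_len, n_features)` can then be fed to a single LSTM model, instead of rebuilding the model once per idVuelo.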

javascript: What would be the best approach to normalize data for an LSTM model (using Tensorflow) with this wide range of values?

I am new to machine learning and still getting my head around the concepts, so keep this in mind if my question is not as concise as it should be.

I am building a Tensorflow JS model with LSTM layers for time series prediction (RNN).

The data set is sampled every few hundred milliseconds (at random intervals). However, the values produced span a very wide range: most of the readings are around 20, 40, 45, etc., but occasionally the value reaches 75,000.

Therefore, the data range is from 1 to 75,000.

When I normalize this data with a standard min/max method to produce a value between 0 and 1, most of the normalized values end up as tiny decimals with many significant digits, for example: 0.0038939328722009236

So my questions are:

1) Is this min/max method the best approach to normalize this type of data?

2) Will the RNN model cope with so many significant decimal digits of precision?

3) Should I also normalize the output label? (of which there will be one output)
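One option for such a heavy-tailed range (a suggestion, not an established answer to the post) is to take the log before min/max scaling, which spreads the small values across much more of the [0, 1] interval. Sketched here in Python for clarity; TensorFlow.js has equivalent ops:

```python
import math

def log_minmax(x, lo=1.0, hi=75_000.0):
    """Log-transform then min/max scale a value from the assumed range [1, 75000]."""
    return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))
```

With this scaling, a typical reading of 40 maps to about 0.33 instead of roughly 0.0005 under plain min/max, so the network sees better-separated inputs.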

python – Tensorflow, CNN + LSTM: An easier way to reuse CNN?

I am using TensorFlow 1.15 to train an LSTM on 2D images presented sequentially in time, so the pipeline is (Input (3x) -> CNN -> LSTM -> Output). As I am training on several images, I want to apply the complete CNN subgraph to each image (reusing all the weights, since the current frame is no different, as an image, from frames 1..X), then feed all the outputs to the LSTM.

In my current code I use 3 input frames, so I use a loop to create 3 placeholders (Input0/Input1/Input2) and 3 CNNs (sharing weights by calling tf.Variable outside the loop, although tf.get_variable could simplify that a little).

Is there a simpler way to express in TensorFlow that I have a subgraph I want to call 'CNN' and use with X placeholders, then feed all those outputs to something like an LSTM?
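The usual pattern (not specific to the post's code) is to build the weights once and apply the same function to every frame; in graph-mode TF 1.x this is `tf.variable_scope(..., reuse=tf.AUTO_REUSE)`, or a single Keras `Model` called on several input tensors. The idea, reduced to NumPy so the sharing is explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))    # shared "CNN" weights, created exactly once

def cnn(frame):
    # Stand-in for the CNN subgraph; every call reuses the same W.
    return frame @ W

frames = [rng.standard_normal((1, 4)) for _ in range(3)]
features = np.stack([cnn(f) for f in frames])   # shape (3, 1, 8), fed to the LSTM
```

In Keras terms: instantiate the CNN layers once as a `tf.keras.Model` and call that model object on each placeholder/tensor, which reuses the same variables automatically.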

Neural networks: how does the forget gate of an LSTM work?

Can anyone explain the mathematical intuition behind the forget gate of an LSTM?

So, as far as I understand, the cell state is essentially an embedding of long-term memory (correct me if I am wrong), and I also assume it is a matrix. The forget vector is then calculated by concatenating the previous hidden state and the current input, multiplying by a weight matrix, adding a bias, and passing the result through a sigmoid function, which produces a vector that is then multiplied element-wise by the cell state.
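For reference, the standard forget-gate equations the paragraph is describing are:

```latex
f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \qquad
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

Each component of $f_t$ lies in $(0, 1)$, so it scales the corresponding component of the previous cell state $C_{t-1}$: a value near 0 erases that memory slot, a value near 1 keeps it.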

How does concatenating the previous hidden state and the current input, plus a bias, help decide what to forget?

Why are the previous hidden state, the current input and the bias put through a sigmoid function? Is there some special property of the sigmoid that makes it produce a useful gating vector?

I would really like to understand the theory behind the calculation of the cell state and hidden state. Most people tell me to treat it like a black box, but I think that, to apply an LSTM successfully to a problem, I need to know what happens under the hood. If someone has good resources on why the cell-state and hidden-state calculations capture both short- and long-term features, I would love to read them.

tensorflow – How to add dropout and attention to an LSTM in Python Keras

I have a dataset of about 1000 nodes, where each node has 4 time series. Each time series is exactly 6 steps long. The label is 0 or 1 (i.e., binary classification).

More precisely, my data set looks as follows.

node, time_series1, time_series2, time_series3, time_series4, label
n1, [1.2, 2.5, 3.7, 4.2, 5.6, 8.8], [6.2, 5.5, 4.7, 3.2, 2.6, 1.8], …, 1
n2, [5.2, 4.5, 3.7, 2.2, 1.6, 0.8], [8.2, 7.5, 6.7, 5.2, 4.6, 1.8], …, 0
and so on.

I normalise my time series before feeding them to my LSTM model for classification.

model = Sequential()
model.add(LSTM(10, input_shape=(6, 4)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

print(data.shape)  # (1000, 6, 4), target)

I am new to Keras, which is why I started with the simplest LSTM model. However, I would now like to take it to a level I can use in production.

I read that it is good to add dropout and attention layers to LSTM models. Let me know if you think adding those layers is applicable to my problem and, if so, how to do it? 🙂

Note: I am not limited to dropout and attention layers; I am happy to receive other suggestions I can use to improve my model.

I am pleased to provide more details if necessary.
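Dropout, at least, is straightforward to retrofit (this is a sketch of one reasonable configuration, not a tuned model): Keras LSTM layers accept `dropout` and `recurrent_dropout` arguments, and a `Dropout` layer can sit before the output head. The rates below are illustrative and should be tuned on validation data:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(6, 4)),   # 6 timesteps, 4 series per node
    tf.keras.layers.LSTM(10, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dropout(0.2),          # illustrative rate
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

out = model.predict(np.zeros((2, 6, 4)), verbose=0)
```

For attention, the LSTM would need `return_sequences=True` so a subsequent attention mechanism (e.g. `tf.keras.layers.Attention`) can weight the 6 timestep outputs before pooling; whether that helps on sequences only 6 steps long is something to verify empirically.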

Machine learning: run an LSTM on several columns separately

I have an LSTM neural network that I use to test the predictability of my data, and it works for a single column. But now I want to run it on several columns (different items) and calculate 'ABSE' for each column. For example, if I have two columns:

[image: the two example columns]

I need to calculate 'ABSE' for each column separately.

My code below fails. Can someone help me?

This is what I tried, but I get a value error:

ValueError: non-broadcastable output operand with shape (1,1) doesn't 
match the broadcast shape (1,2)

This happens on the line:

 ---> 51     trainPredict = scaler.inverse_transform(trainPredict)

the code:

import numpy
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

def ABSE(a, b):
    ABSE = abs((b - a) / b)
    return numpy.mean(ABSE)

columns = df[['Item1', 'Item2']]

for i in columns:
    # normalize the dataset
    scaler = MinMaxScaler(feature_range=(0, 1))
    dataset = scaler.fit_transform(dataset)
    # split into train and test sets
    train_size = int(len(dataset) * 0.5)
    test_size = len(dataset) - train_size
    train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
    look_back = 1
    trainX, trainY = create_dataset(train, look_back)
    testX, testY = create_dataset(test, look_back)
    trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(1, input_shape=(1, look_back)))
    model.compile(loss='mean_squared_error', optimizer='adam'), trainY, epochs=1, batch_size=1, verbose=0)
    # make predictions
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)
    # invert predictions
    trainPredict = scaler.inverse_transform(trainPredict)
    trainY = scaler.inverse_transform([trainY])
    testPredict = scaler.inverse_transform(testPredict)
    testY = scaler.inverse_transform([testY])
    # calculate ABSE
    trainScore = ABSE(trainY[0], trainPredict[:, 0])
    print('Train Score: %.2f ABSE' % (trainScore))
    testScore = ABSE(testY[0], testPredict[:, 0])
    print('Test Score: %.2f ABSE' % (testScore))
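The broadcast error itself comes from fitting the scaler on a two-column array and then calling `inverse_transform` on single-column predictions. A sketch of the shape discipline that avoids it (my suggestion, not from the post): fit one scaler per column, on an `(n, 1)` array, inside the loop:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])   # two example columns, as in the question

for col in range(data.shape[1]):
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(data[:, [col]])   # (n, 1): scaler sees one column
    restored = scaler.inverse_transform(scaled)     # (n, 1): shapes match, no error
```

In the question's loop this would mean building `dataset` from `df[[i]]` for the current column `i`, so the scaler that inverse-transforms the predictions was fitted on exactly one column.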