I am calculating the average word vector of each tweet with two nested for loops. This is very slow, and when I have many tweets it consumes all my memory and my computer crashes.
The function builds a matrix that I will use to predict the sentiment (positive or negative) of the tweets.
Here is my code to create the matrix:
```python
import numpy as np

def create_X(list_of_tweets, w2v, features):
    X = np.zeros((len(list_of_tweets), features))
    for index, tweet in enumerate(list_of_tweets):
        for word in tweet:
            try:
                X[index, :] = X[index, :] + w2v.wv[str(word)]
            except KeyError:
                pass  # word is not in the vocabulary
        N = len(tweet)
        if N > 0:  # skip empty tweets to avoid dividing by zero
            X[index] = X[index] / N  # taking the average of the word vectors
    return X
```
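A minimal sketch of a faster per-tweet average, assuming the embeddings can be looked up like a dict that maps a word to a NumPy vector (the `.wv` attribute of a gensim model supports `word in ...` and `[...]` the same way; the toy dict below is hypothetical). Instead of adding one vector at a time into `X`, it gathers the known vectors of a tweet and lets `np.mean` average them in one call, and it stores `X` as `float32`, which halves the memory of NumPy's default `float64`. One difference from the loop above: it divides by the number of words actually found in the vocabulary, not by `len(tweet)`.

```python
import numpy as np

def create_X(list_of_tweets, lookup, features):
    """Average the word vectors of each tweet into one row of X.

    `lookup` is assumed to behave like a dict {word: np.ndarray of shape (features,)}.
    Out-of-vocabulary words are skipped; tweets with no known words stay all-zero.
    """
    # float32 halves memory compared with NumPy's default float64
    X = np.zeros((len(list_of_tweets), features), dtype=np.float32)
    for i, tweet in enumerate(list_of_tweets):
        # collect vectors only for words present in the vocabulary
        vecs = [lookup[w] for w in tweet if w in lookup]
        if vecs:
            # np.mean over axis 0 averages the stacked vectors into shape (features,)
            X[i] = np.mean(vecs, axis=0)
    return X

# hypothetical toy embeddings with features = 3
toy = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}
tweets = [["good", "movie"], ["unknown"], []]
X = create_X(tweets, toy, 3)
print(X)
```

The first row is the mean of the two known vectors; the tweet with only an unknown word and the empty tweet both stay zero.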
I have tried to replace the inner loop with a list comprehension:
```python
X[index, :] = [X[index, :] + w2v.wv[str(word)] for word in tweet if word]
```
but then I get the following error: `ValueError: cannot copy sequence with size 7 to array axis with dimension 20`
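The error comes from the shapes: the list comprehension builds a Python list with one 20-dimensional vector per word (7 of them here), and NumPy cannot fit that 7 × 20 block into a single row with 20 slots. The per-word vectors have to be reduced to one vector first, for example with `np.mean(..., axis=0)`. A small sketch with random stand-in vectors (hypothetical data, 7 words × 20 features):

```python
import numpy as np

features = 20
rng = np.random.default_rng(0)
# hypothetical stand-in for the 7 word vectors of one tweet
word_vecs = [rng.standard_normal(features) for _ in range(7)]

X = np.zeros((1, features))
error_raised = False
try:
    # the comprehension yields 7 vectors, i.e. shape (7, 20), which cannot fit one row of length 20
    X[0, :] = [X[0, :] + v for v in word_vecs]
except ValueError as e:
    error_raised = True
    print("ValueError:", e)

# reducing first: np.mean over axis 0 turns the 7 vectors into one vector of length 20
X[0, :] = np.mean(word_vecs, axis=0)
print(X.shape)  # (1, 20)
```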
What I mainly need help with is code that uses less RAM (this is the most important thing) and runs faster. Any suggestions are much appreciated.
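For the memory side, a sketch under these assumptions: tweets arrive from an iterable (for example a generator that reads a file line by line, so the full list of tweets never sits in RAM), the number of tweets is known up front, and `lookup` behaves like a dict of word vectors. The matrix itself needs n_tweets × features × 4 bytes in `float32`; for 1,000,000 tweets and 300 features that is 1.2 GB, so if even that does not fit, `np.lib.format.open_memmap` can back the same array with a `.npy` file on disk. The helper names `fill_X` and `tweet_stream` and the whitespace tokenizer are hypothetical.

```python
import os
import tempfile

import numpy as np

def fill_X(tweets, n_tweets, lookup, features, out=None):
    """Write the average word vector of each tweet into `out` (or a new float32 array).

    `tweets` may be any iterable (e.g. a generator), so tweets are processed one at a time.
    """
    if out is None:
        out = np.zeros((n_tweets, features), dtype=np.float32)
    for i, tweet in enumerate(tweets):
        # only words present in the vocabulary contribute to the average
        vecs = [lookup[w] for w in tweet if w in lookup]
        if vecs:
            out[i] = np.mean(vecs, axis=0)
    return out

def tweet_stream(lines):
    # hypothetical tokenizer: lowercase and split on whitespace, one tweet per line
    for line in lines:
        yield line.lower().split()

toy = {"great": np.array([1.0, 1.0]), "day": np.array([3.0, 0.0])}
lines = ["Great day", "meh", "great great"]

# disk-backed output: the array lives in a .npy file instead of RAM
path = os.path.join(tempfile.mkdtemp(), "X.npy")
X = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=(len(lines), 2))
fill_X(tweet_stream(lines), len(lines), toy, 2, out=X)
X.flush()
print(np.load(path))
```

Without the `out` argument the same function allocates an ordinary in-memory `float32` array, which is usually enough once the tweets themselves are streamed instead of kept in a list.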