I want to create a network in which links are formed based on a similarity metric derived from the Euclidean distance between the nodes. The distance is calculated from socio-demographic features of customers, such as gender and age. The problem is that the code takes 200 seconds just to create the network, and because I am tuning my model and the code executes at least 100 times, the long execution time of this piece makes the whole program run slowly.

The nodes are in fact customers, and I defined a class for them. They have two attributes, gender (numerical; 0 or 1) and age (varies from 24 to 44), which are stored in a CSV file. I have generated a sample CSV file here:

```
import random
import pandas as pd

# number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 45) for i in range(ncons)]
customer_df = pd.DataFrame(
    {'customer_gender': gender,
     'customer_age': age
    })
customer_df.to_csv('customer_df.csv', mode='w', index=False)
```

The Euclidean distance delta_ik has the following form:

$$\delta_{ik} = \sqrt{\sum_{f=1}^{n} \left( \frac{S_{f,i} - S_{f,k}}{\max d_f} \right)^2}$$

In the formula, `n` is the number of attributes (here n=2, age and gender). For customers `i` and `k`, `S_f,i - S_f,k` is the difference in attribute `f = 1,2`, which is divided by the maximum range of attribute `f` over all customers (`max d_f`). So the distance is a distance between the values of socio-demographic attributes, not between geographical positions.
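To make the distance concrete, here is a minimal sketch on three made-up customers. The function name `normalized_distance` and the toy array `S` are assumptions for illustration only, and the sketch assumes every attribute has a non-zero range (otherwise the division fails).

```python
import numpy as np

def normalized_distance(S):
    """Pairwise delta_ik for an (n_customers, n_attributes) array S."""
    S = np.asarray(S, dtype=float)
    # max d_f: range (max - min) of each attribute over all customers
    max_d = S.max(axis=0) - S.min(axis=0)
    # diff[i, k, f] = (S[i, f] - S[k, f]) / max d_f, via broadcasting
    diff = (S[:, None, :] - S[None, :, :]) / max_d
    # Sum the squared, range-normalized differences over attributes
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three hypothetical customers: columns are gender (0/1) and age
S = np.array([[0, 24], [1, 24], [0, 44]])
delta = normalized_distance(S)
```

Customers 0 and 1 differ only in gender, and customers 0 and 2 differ only in age by the full age range, so both pairs are at distance 1; customers 1 and 2 differ maximally in both attributes.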

Then I define the similarity metric H_ik, which maps delta_ik to a number between 0 and 1, as follows:

$$H_{ik} = 1 - \frac{\delta_{ik}}{\max \delta}$$

where `max delta` is the largest distance over all customer pairs. Finally, for customers `i` and `k`, I generate a random number rho between 0 and 1. If rho is smaller than H_ik, the nodes are connected.
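The similarity and the linking rule can be sketched on a small hand-written distance matrix. The helper names `similarity` and `draw_edges`, the toy matrix, and the fixed seed are assumptions for illustration; this is not the code from the model itself.

```python
import numpy as np

def similarity(delta):
    """H_ik = 1 - delta_ik / max(delta), so every entry lies in [0, 1]."""
    delta = np.asarray(delta, dtype=float)
    return 1.0 - delta / delta.max()

def draw_edges(H, rng):
    """Return (i, k) pairs with i != k where a uniform draw rho is below H_ik."""
    rho = rng.random(H.shape)        # one rho in [0, 1) per ordered pair
    linked = rho < H
    np.fill_diagonal(linked, False)  # no self-loops
    i, k = np.nonzero(linked)
    return list(zip(i.tolist(), k.tolist()))

# Hypothetical distances between three customers
delta = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [2.0, 1.0, 0.0]])
H = similarity(delta)
edges = draw_edges(H, np.random.default_rng(0))
```

The most distant pair (0 and 2) has H = 0 and can never be linked, while the other pairs are linked with probability 0.5 per draw. Like a double loop over all nodes, this sketch draws rho once per ordered pair.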

The code that stores delta_ik in a matrix and then uses it to generate the network looks like this:

```
import random
import pandas as pd
import time
import csv
import networkx as nx
import numpy as np
import math

# Read the CSV file containing the part-worth utilities of 184 consumers
def readCSVPWU():
    global headers
    global Attr
    Attr = []
    with open('customer_df.csv') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=',')
        headers = next(csvreader)  # skip the first row of the CSV file.
        # CSV header cells are strings and should be turned into float numbers.
        for i in range(len(headers)):
            if headers[i].isnumeric():
                headers[i] = float(headers[i])
        for row in csvreader:
            AttrS = row
            Attr.append(AttrS)
    # convert strings to float numbers
    Attr = [[float(j) for j in i] for i in Attr]
    # Return the CSV as a matrix with 17 columns and 184 rows
    return Attr

# customer class
class Customer:
    def __init__(self, PWU=None, Ut=None):
        self.Ut = Ut
        # Pick a random row from the survey utility data
        self.PWU = Attr[random.randint(0, len(Attr) - 1)]

# Generate a network by connecting nodes based on their similarity metric
def Network_generation(cust_agent):
    start_time = time.time()  # track execution time
    # We form links/connections between consumer agents based on their degree of socio-demographic similarity.
    global ncons
    Gcons = nx.Graph()
    # add nodes
    for i in range(ncons):
        Gcons.add_node(i, data=cust_agent[i])
    # ********** Compute the node-to-node distance
    # Initialize Deltaik with zeros
    Deltaik = [[0 for xi in range(ncons)] for yi in range(ncons)]
    # For each attribute, find the maximum range of that attribute; for instance max age diff = max age - min age = 53-32=21
    maxdiff = []
    allval = []
    # The last two columns of Attr keep income and age data
    # Make a 2D numpy array to slice the last 2 columns (THE ACTUAL CSV FILE HAS MORE THAN 2 COLUMNS)
    np_Attr = np.array(Attr)
    # Take the last two columns, income and age of the participants, respectively
    socio = np_Attr[:, [len(Attr[0]) - 2, len(Attr[0]) - 1]]
    # convert numpy array to a list of lists
    socio = socio.tolist()
    # Max diff for each attribute
    for f in range(len(socio[0])):
        for node1 in Gcons.nodes():
            # keep all values of an attribute to find the max range
            allval.append(Gcons.nodes[node1]['data'].PWU[-2:][f])
        maxdiff.append(max(allval) - min(allval))
        allval = []
    # THE SECOND MOST TIME CONSUMING PART ********************
    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            tempdelta = 0
            # for each feature (attribute)
            for f in range(len(socio[0])):
                Deltaik[node1][node2] = (Gcons.nodes[node1]['data'].PWU[-2:][f] - Gcons.nodes[node2]['data'].PWU[-2:][f])
                # divide by the max difference of this attribute
                insidepar = (Deltaik[node1][node2] / maxdiff[f])**2
                tempdelta += insidepar
            Deltaik[node1][node2] = math.sqrt(tempdelta)
    # THE END OF THE SECOND MOST TIME CONSUMING PART ********************
    # Find the maximum of the matrix
    maxdel = max(map(max, Deltaik))
    # Find the homophily weight
    import copy
    Hik = copy.deepcopy(Deltaik)
    for i in range(len(Deltaik)):
        for j in range(len(Deltaik[0])):
            Hik[i][j] = 1 - (Deltaik[i][j] / maxdel)
    # Define a dataframe to save Hik
    dfHik = pd.DataFrame(columns=list(range(ncons)), index=list(range(ncons)))
    temp_h = []
    # For every consumer pair $i$ and $k$, a random number $\rho$ from a uniform distribution $U(0,1)$ is drawn and compared with $H_{i,k}$. The two consumers are connected in the social network if $\rho$ is smaller than $H_{i,k}$~\cite{wolf2015changing}.
    # THE MOST TIME CONSUMING PART ********************
    for node1 in Gcons.nodes():
        for node2 in Gcons.nodes():
            # Add Hik to the dataframe
            temp_h.append(Hik[node1][node2])
            rho = np.random.uniform(0, 1, 1)
            if node1 != node2:
                if rho < Hik[node1][node2]:
                    Gcons.add_edge(node1, node2)
        # Row node1 keeps the homophily of consumer node1 with every other consumer
        dfHik.loc[node1] = temp_h
        temp_h = []
    # nx.draw(Gcons, with_labels=True)
    print("Simulation time: %.3f seconds" % (time.time() - start_time))
    # THE END OF THE MOST TIME CONSUMING PART ********************
    return Gcons

#%%
# number of customers
ncons = 5000
gender = [random.randint(0, 1) for i in range(ncons)]
age = [random.randint(22, 39) for i in range(ncons)]
customer_df = pd.DataFrame(
    {'customer_gender': gender,
     'customer_age': age
    })
customer_df.to_csv('customer_df.csv', mode='w', index=False)
readCSVPWU()
customer_agent = dict(enumerate([Customer(PWU=[], Ut=[]) for ij in range(ncons)]))  # Ut=[]
G = Network_generation(customer_agent)
```

I realized that the two nested loops marked above are more time-consuming than the rest of the code, but I am not sure how to write them more efficiently. I would greatly appreciate any advice on how to reduce the elapsed time.

Thank you so much