I am trying to perform the following task:

For a given feature column (stored in a NumPy matrix), group the data in a greedy way: I test the current value together with the next one and calculate the entropy of that pair.

Pseudocode would look like this:

```
split_data(feature):
    BestValues = 0
    BestGain = 0
    for each value in the feature:
        CurrentGain = Entropy(feature) - Entropy(value, next value)   # information gain
        if CurrentGain > BestGain:
            set BestValues = value, next value
            set BestGain = CurrentGain
    return BestValues
```
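To make the pseudocode concrete, here is a sketch of what I think it translates to in Python. The names `entropy_of_labels` and `split_data` are just illustrative, and I'm assuming the feature's unique values are tested in sorted order, pairing each value with the next:

```python
import numpy as np

def entropy_of_labels(labels):
    # Shannon entropy of a 1-D array of class labels (illustrative helper)
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def split_data(dataset, feature):
    # Greedy pairwise search: test each value together with the next one
    # and keep the pair whose rows have the highest information gain.
    values = np.unique(dataset[:, feature])
    base_entropy = entropy_of_labels(dataset[:, -1])
    best_gain, best_values = 0.0, None   # None if no pair beats zero gain
    for i in range(len(values) - 1):
        pair = (values[i], values[i + 1])
        # Restrict the labels to the rows whose feature value is in the pair
        mask = np.isin(dataset[:, feature], pair)
        pair_entropy = entropy_of_labels(dataset[:, -1][mask])
        gain = base_entropy - pair_entropy
        if gain > best_gain:
            best_gain, best_values = gain, pair
    return best_values
```

I'm not sure this is the right interpretation of "value + next value", but it is the greedy loop I have in mind.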

I currently have Python code that looks like the following:

```
import math
import numpy

# This function finds the total entropy for a given dataset
def entropy(dataset):
    total_entropy = 0
    # Determine the classes, i.e. the unique labels in the last column
    classes = numpy.unique(dataset[:, -1])
    # Loop through each class, or label
    for aclass in classes:
        currFreq = 0
        # Walk through each row in the dataset; if the row has the same
        # label as the current class, increment the frequency
        for row in dataset:
            if aclass == row[-1]:
                currFreq = currFreq + 1
        # The current probability is the # of occurrences / total occurrences
        currProb = currFreq / len(dataset)
        # A zero-probability class contributes nothing, so skip it
        if currFreq > 0:
            total_entropy = total_entropy + (-currProb * math.log(currProb, 2))
    # Return the total entropy
    return total_entropy

# This function gets the entropy for a single attribute.
def entropy_by_attribute(dataset, feature):
    # The attribute is the specific feature column of the dataset
    attribute = dataset[:, feature]
    # The target_variables are the unique labels in the last column
    target_variables = numpy.unique(dataset[:, -1])
    # The unique values in the column that we are evaluating
    variables = numpy.unique(attribute)
    # The entropy of the attribute in question
    entropy_attribute = 0
    # Go through each of the possible values
    for variable in variables:
        denominator = 0
        entropy_each_feature = 0
        # Count the rows that take the current value
        for row in attribute:
            if row == variable:
                denominator = denominator + 1
        # Now go through each class
        for target_variable in target_variables:
            numerator = 0
            # Count the rows whose feature equals the value being evaluated
            # and whose label equals the class being evaluated
            for row in dataset:
                if row[feature] == variable and row[-1] == target_variable:
                    numerator = numerator + 1
            # Use eps to protect against dividing by 0
            fraction = numerator / (denominator + numpy.finfo(float).eps)
            entropy_each_feature = entropy_each_feature + (-fraction * math.log(fraction + numpy.finfo(float).eps, 2))
        # Weight this value's entropy by how often the value occurs
        big_fraction = denominator / len(dataset)
        entropy_attribute = entropy_attribute + (big_fraction * entropy_each_feature)
    # Return that entropy
    return entropy_attribute

# This function calculates the information gain.
def infogain(dataset, feature):
    # Grab the entropy of the total dataset
    total_entropy = entropy(dataset)
    # Grab the entropy for the feature that is being evaluated
    feature_entropy = entropy_by_attribute(dataset, feature)
    # Information gain is the reduction in entropy
    gain = total_entropy - feature_entropy
    return gain
```
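For what it's worth, a compact vectorized version can serve as a cross-check for the loop-based `entropy` above (`shannon_entropy` is just an illustrative name, not part of my code):

```python
import numpy as np

def shannon_entropy(labels):
    # Vectorized Shannon entropy of a 1-D array of class labels
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    # abs() folds the -0.0 that a pure column produces back to 0.0
    return abs(float(-np.sum(probs * np.log2(probs))))

print(shannon_entropy(np.array([0, 0, 1, 1])))  # a 50/50 split is exactly 1 bit
print(shannon_entropy(np.array([1, 1, 1, 1])))  # a pure column carries 0 bits
```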

However, I'm not sure how to do the following:

1. For a feature, compute its total entropy.
2. For a single feature, compute the entropy using a grouping technique where I test two values at a time.

I can't work out how to write the code for 1 and 2, and I'm struggling a lot. I will keep updating this question with my progress.
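To show what I mean by point 2, this is one interpretation I've been considering: treat the pair of values as one branch and every other value as the second branch, then take the weighted (conditional) entropy of the labels over the two branches. This is only a sketch of my intent, with illustrative names:

```python
import numpy as np

def label_entropy(labels):
    # Shannon entropy of a 1-D array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def grouped_entropy(dataset, feature, pair):
    # Conditional entropy of the labels after splitting the rows into
    # "feature value in pair" vs "everything else" -- a two-branch split.
    mask = np.isin(dataset[:, feature], pair)
    h = 0.0
    for branch in (mask, ~mask):
        if branch.any():
            # Weight each branch's label entropy by its share of the rows
            weight = branch.sum() / len(dataset)
            h += weight * label_entropy(dataset[:, -1][branch])
    return h
```

The pair with the lowest `grouped_entropy` would then be the one with the highest information gain, if I understand the idea correctly.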