algorithms – How does secondary clustering occur in hashing?

One of my friends said that secondary clustering is the phenomenon that occurs when probe sequences have the same initial value. By this definition, secondary clustering occurs in linear probing as well, and not only in quadratic probing (I had misunderstood this point while reading the CLRS text).

Secondary clustering does occur as my friend said, and CLRS says the same thing. But because the definition is given in the quadratic probing section, the wording made it look as if secondary clustering occurs only in quadratic probing.

“Also, if two keys have the same initial probe position, then their probe sequences are the same, since $h(k_1, 0) = h(k_2,0)$ implies $h(k_1, i) = h(k_2, i)$. This property leads to a milder form of clustering, called secondary clustering.”- CLRS 2nd edition page 240.

What I understood after rereading the excerpt is that secondary clustering means small clusters start building up at larger distances from one another.

I think this is caused by keys sharing the same probe sequence. Say keys $k_1, k_2, \dots$ all have the probe sequence $x_1, x_2, \dots$ and are inserted in that order; then $k_1$ is placed at $x_1$, $k_2$ at $x_2$, and so on. This leads to clusters spaced at larger distances. But the explanation I came up with feels quite vague; at least, I could not visualize the situation with an example.

How exactly do two keys with the same initial probe position lead to secondary clustering?
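A small simulation can make the "same initial probe implies same sequence" point concrete. It is a sketch under assumptions not taken from CLRS: quadratic probing with $h(k, i) = (h'(k) + i^2) \bmod m$, table size $m = 17$, and the auxiliary hash $h'(k) = k \bmod m$; the function names `probe` and `insert` are illustrative.

```python
def probe(key, i, m):
    # Quadratic probing h(k, i) = (h'(k) + i*i) mod m, with the
    # auxiliary hash h'(k) = k mod m chosen here purely for illustration.
    return (key % m + i * i) % m


def insert(table, key):
    """Insert key; return (slot it landed in, number of probes used)."""
    m = len(table)
    for i in range(m):
        slot = probe(key, i, m)
        if table[slot] is None:
            table[slot] = key
            return slot, i + 1
    raise RuntimeError("no free slot found on the probe sequence")


m = 17
table = [None] * m
# 0, 17, 34 and 51 all have initial probe position 0, so they share the
# whole probe sequence 0, 1, 4, 9, ... and each new one re-walks it.
same_start = [insert(table, key) for key in (0, 17, 34, 51)]
print(same_start)   # [(0, 1), (1, 2), (4, 3), (9, 4)]

# Key 1 also collides (at slot 1), but its sequence is 1, 2, 5, 10, ...,
# so it leaves the other keys' path after one extra probe.
other = insert(table, 1)
print(other)        # (2, 2)
```

The j-th key that shares a start position needs j probes, all along the same path, and the slots it fills (0, 1, 4, 9) are spread out rather than contiguous. The same property holds for linear probing, since its probe sequence is also determined entirely by the initial probe.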

Primary clustering means that long runs of occupied slots form and tend to grow in size, because an empty slot that follows a cluster of $i$ occupied slots in a hash table with $m$ slots is filled next with probability $\frac{i+1}{m}$. This is the cause of primary clustering. I want an explanation this concrete to convince me.
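The $\frac{i+1}{m}$ figure can be checked by counting: under uniform hashing each of the $m$ initial probe positions is equally likely, so the chance that a given empty slot is filled next is the number of positions whose linear-probe sequence ends there, divided by $m$. A minimal sketch, assuming $m = 10$ and a cluster at slots 2, 3, 4 (so $i = 3$); the names are illustrative.

```python
def landing_slot(table, home):
    """Slot a new key with this initial probe position would fill under linear probing."""
    m = len(table)
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:
            return slot
    return None  # table is full


m = 10
table = [None] * m
for s in (2, 3, 4):          # a cluster of i = 3 occupied slots
    table[s] = "occupied"


def fill_probability(slot):
    # Each initial probe position has probability 1/m under uniform hashing.
    return sum(landing_slot(table, h) == slot for h in range(m)) / m


print(fill_probability(5))   # 0.4 = (3 + 1) / 10: slot right after the cluster
print(fill_probability(7))   # 0.1 = 1 / 10: an isolated empty slot
```

Initial positions 2, 3, 4 and 5 all end at slot 5, which is why the slot after a cluster is $i + 1$ times as likely to be filled as an isolated one.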

algorithms – Evolution strategies for TSP

I’m trying to figure out how evolution strategies would work for the traveling salesman problem. I’ve done this with the genetic algorithm already, but am curious about how this would be done using ES.

To my understanding, the main genetic operator for the GA is crossover, and for the ES it’s mutation. The ES can adapt its mutation step size for each gene to converge faster to an optimal solution. How would this work for a combinatorial problem like the TSP, though? Start by mutating 5 cities per mutation and slowly progress to 1?

I know that ES are not primarily designed for combinatorial problems, but some papers compare different evolutionary algorithms for the TSP and ES are usually one of them, so I know it can be done. But I’m having a hard time finding some explicit implementations.

Thank you in advance!
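One possible way to carry the step-size idea over to permutations is sketched below under assumptions: the "step size" is the number of random segment reversals (2-opt moves) applied per mutation, and it is adapted with Rechenberg's 1/5 success rule in a (1+1)-ES, so it can grow as well as shrink rather than only counting down from 5 to 1. The function names, the reversal mutation, and the 20-generation window are illustrative choices, not a standard ES-for-TSP recipe.

```python
import math
import random


def tour_length(tour, coords):
    """Length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def mutate(tour, strength, rng):
    """Return a copy of tour with `strength` random segment reversals (2-opt moves)."""
    child = list(tour)
    for _ in range(strength):
        i, j = sorted(rng.sample(range(len(child)), 2))
        child[i:j + 1] = child[i:j + 1][::-1]
    return child


def one_plus_one_es(coords, generations=2000, strength=5, window=20, seed=0):
    """(1+1)-ES on permutations; `strength` plays the role of the step size."""
    rng = random.Random(seed)
    max_strength = max(1, len(coords) // 2)
    strength = min(strength, max_strength)
    parent = list(range(len(coords)))
    rng.shuffle(parent)
    parent_len = tour_length(parent, coords)
    successes = 0
    for gen in range(1, generations + 1):
        child = mutate(parent, strength, rng)
        child_len = tour_length(child, coords)
        if child_len < parent_len:          # elitist: keep the better of parent and child
            parent, parent_len = child, child_len
            successes += 1
        if gen % window == 0:               # 1/5 success rule over the last `window` trials
            rate = successes / window
            if rate > 0.2:
                strength = min(strength + 1, max_strength)
            elif rate < 0.2:
                strength = max(strength - 1, 1)
            successes = 0
    return parent, parent_len


# 8 cities on a unit circle; the optimal tour is the octagon's perimeter.
coords = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
best, length = one_plus_one_es(coords, generations=500)
print(best, length)
```

Because the (1+1) scheme is elitist, the returned tour is never worse than the random starting tour, but it is not guaranteed to be optimal.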

algorithms – Formatting lines of text

I am working with a text file that needs to be formatted so I can use it for creating a database. So in my text file, I have words in the following format.

floorэтаж|[flɔː]

As you can see, there is one problem: the English word runs straight into the non-English word with no separator.
The desired format is:

floor|этаж|[flɔː]

My question is: how do I convert my list to the desired format? I guess it has something to do with the different encodings.
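Both words are already in the same (Unicode) text, so this is less an encoding problem than a missing separator, and the boundary can be found where the script changes. A minimal sketch, assuming every line has the English word ending in a Latin letter immediately followed by a Russian word starting with a Cyrillic letter; the function name and file names are hypothetical.

```python
import re

# A Latin letter immediately followed by a Cyrillic letter (U+0400 to U+04FF)
# marks the boundary between the English and the Russian word.
BOUNDARY = re.compile(r"([A-Za-z])([\u0400-\u04FF])")


def add_separator(line):
    """Insert '|' at the first Latin-to-Cyrillic boundary in the line."""
    return BOUNDARY.sub(r"\1|\2", line, count=1)


print(add_separator("floorэтаж|[flɔː]"))   # floor|этаж|[flɔː]

# Hypothetical file names; reading and writing explicitly as UTF-8 keeps the
# Cyrillic and IPA characters intact.
# with open("words.txt", encoding="utf-8") as src, \
#         open("words_fixed.txt", "w", encoding="utf-8") as dst:
#     for line in src:
#         dst.write(add_separator(line))
```

Lines that are already separated are left unchanged, since the letter before the Cyrillic word is then `|`, not a Latin letter.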

algorithms – Max-Flow Problems

Given the network below, we can formulate the maximum flow problem as a linear program. The general formulation is below:

[Image: general linear-program formulation of the maximum-flow problem]

(a) Reduce the number of constraints in the general formulation above by indexing edges instead of vertices.

(b) Write the linear program that solves the maximum-flow problem for the network below.

[Image: the flow network]

(c) Convert the linear program into standard form.

(d) Convert the linear program into slack form.

(e) Perform the first two iterations of the simplex algorithm on the linear program. Show your work.
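For part (a), assuming the general formulation in the missing image is the usual vertex-pair one (a flow variable $f_{uv}$ and a capacity constraint for every ordered pair $u, v \in V$, as in CLRS), indexing by edges gives one variable $f_e$ per edge $e \in E$ with capacity $c_e$. Writing $\delta^{+}(u)$ and $\delta^{-}(u)$ for the sets of edges leaving and entering $u$ (notation assumed here), a sketch of the result is:

$$
\begin{aligned}
\text{maximize} \quad & \sum_{e \in \delta^{+}(s)} f_e - \sum_{e \in \delta^{-}(s)} f_e \\
\text{subject to} \quad & f_e \le c_e && \text{for each } e \in E, \\
& \sum_{e \in \delta^{-}(u)} f_e = \sum_{e \in \delta^{+}(u)} f_e && \text{for each } u \in V \setminus \{s, t\}, \\
& f_e \ge 0 && \text{for each } e \in E.
\end{aligned}
$$

The capacity and nonnegativity constraints drop from $|V|^2$ each to $|E|$ each, while the $|V| - 2$ conservation constraints remain.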

genetic algorithms – Travelling Salesman Problem: Distance between solutions

I’m designing a genetic algorithm to solve the travelling salesman problem. So far, I’ve gotten fairly good results. I’m now trying to improve on them by implementing some sort of diversification scheme (like fitness sharing and crowding), although I’m struggling with the conceptualisation of the inter-solution distance a bit.

Solutions represent a path that goes through all cities, i.e. a permutation of the order in which they are visited. In my code these are NumPy arrays (np.array). If I want to know how similar two solutions are, I basically want to find the distance between two permutations of n_cities elements. I currently have two ideas for the distance function.

  1. Levenshtein distance, which is simply how many atomic edits (insertions, deletions, substitutions) separate the two sequences.
  2. Hamming distance, which denotes the number of positions at which the two sequences differ.

Note that I rotate each solution so that it starts at the same city. Otherwise these metrics would not make sense.

Which of them is more appropriate? Is there a better solution? I’ve browsed a number of articles, but haven’t really found an answer yet.
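A sketch of the rotate-then-Hamming idea, next to one alternative sometimes used for the TSP: an edge-based distance that counts the tour edges two solutions do not share, which ignores both the starting city and the direction of travel. Function names are illustrative; the functions accept lists or NumPy arrays.

```python
def rotate_to_start(tour, start=0):
    """Rotate a tour (list or np.array) so that it begins at city `start`."""
    tour = list(tour)
    k = tour.index(start)
    return tour[k:] + tour[:k]


def hamming(a, b):
    """Number of positions at which two tours differ after rotation."""
    a, b = rotate_to_start(a), rotate_to_start(b)
    return sum(x != y for x, y in zip(a, b))


def tour_edges(tour):
    """Undirected edges of a closed tour, as a set of 2-element frozensets."""
    t = list(tour)
    return {frozenset((t[i], t[(i + 1) % len(t)])) for i in range(len(t))}


def edge_distance(a, b):
    """Number of edges of tour a that do not appear in tour b."""
    return len(tour_edges(a) - tour_edges(b))


a = [0, 1, 2, 3, 4]
b = [0, 4, 3, 2, 1]    # the same cycle, traversed backwards
c = [2, 3, 4, 0, 1]    # the same cycle, rotated
print(hamming(a, b), edge_distance(a, b))   # 4 0
print(hamming(a, c), edge_distance(a, c))   # 0 0
```

Rotation alone does not make the reversed tour `b` look identical to `a`, even though both describe the same route; the edge-based distance treats them as equal.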

algorithms – Two-Sum Design – Pre-sort, Range Allowance, Many Sums

I would love feedback on the pseudocode design objectives outlined below.

  • Pre-Sort: Optimize the two-sum solution for a pre-sorted input.
  • Range Allowance: Find a two-sum solution within a range of plus/minus the given capacity.
  • Many Sums: Extend the two-sum solution to use as many items as possible within a given capacity.

Two-Sum: Determine whether there are two items whose lengths add up to the total length, without using the same item twice. This optimizes for runtime over memory.

  • Time complexity: $O(n)$
  • Space complexity: $O(n)$

Samples

Input: (4, 5, 2, 6)

  • Total length: 10
  • Expect: true

Input: (4, 5, 2, 5)

  • Total length: 10
  • Expect: true

Input: (4, 5, 2, 7)

  • Total length: 10
  • Expect: false

Code

fun isLengthMatch(totalLength: Int, lengths: IntArray): Boolean {
    handleErrors(totalLength, lengths)
    val searchSet = hashSetOf<Int>()
    for (length in lengths) {
        val targetLength = totalLength - length
        if (searchSet.contains(targetLength)) return true
        else searchSet.add(length)
    }
    return false
}

fun handleErrors(totalLength: Int, lengths: IntArray) {
    if (totalLength <= 0) throw IllegalArgumentException("\"totalLength\" must be greater than 0.")
    if (lengths.size < 2) throw IllegalArgumentException("\"lengths\" cannot be less than two items.")
}

Pre-Sort

  1. Save a new var, lastTargetLength.
  2. If the current length is less than lastTargetLength, no two-sum is possible, so return false.

e.g.

Input: (6, 2, 1, 0), sorted in descending order

  • Total length: 9

Iterations

  1. targetLength = 9 - 6, lastTargetLength = 3
  2. Return false because the length of 2 < lastTargetLength of 3.
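A sketch of the Pre-Sort idea for input sorted in descending order, as in the example (written in Python; the function name is illustrative). One detail matters: if lastTargetLength were refreshed on every iteration, the early exit could fire wrongly. For (8, 5, 1) with total length 9, it would become 9 − 5 = 4 before the length 1 is reached, so the function would return false even though 8 + 1 = 9. Keeping the bound fixed at totalLength minus the first (largest) length is safe: once the current length is below it, even pairing it with the largest item falls short, and every later length is smaller still.

```python
def is_length_match_sorted_desc(total, lengths):
    """Two-sum on lengths sorted in descending order, with an early exit."""
    if total <= 0:
        raise ValueError('"total" must be greater than 0.')
    if len(lengths) < 2:
        raise ValueError('"lengths" cannot be less than two items.')
    bound = total - lengths[0]      # fixed: based on the largest length only
    seen = set()
    for length in lengths:
        if length < length_floor(bound):
            return False            # no remaining pair can reach total
        if total - length in seen:
            return True
        seen.add(length)
    return False


def length_floor(bound):
    # Identity helper, kept separate only to name the comparison.
    return bound


print(is_length_match_sorted_desc(9, [6, 2, 1, 0]))   # False, as in the example
print(is_length_match_sorted_desc(9, [8, 5, 1]))      # True: 8 + 1 = 9
```

With a descending sort, any pair seen before the exit has already been checked, so returning false at that point does not miss a match.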

Range Allowance

  1. For each length, generate a targetLengthsArray containing every value from (totalLength - length - range allowance) to (totalLength - length + range allowance).

  2. Check if the searchSet contains any of the values in the targetLengthsArray.
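A minimal sketch of the Range Allowance steps, assuming integer lengths and a non-negative allowance (written in Python; the function name is illustrative). Checking every candidate target against the hash set costs O(allowance) per item, so the running time becomes O(n · allowance) rather than O(n).

```python
def is_length_match_in_range(total, lengths, allowance):
    """True if two different items sum to a value in [total - allowance, total + allowance]."""
    seen = set()
    for length in lengths:
        target = total - length
        # Acceptable partners: target - allowance .. target + allowance
        if any(t in seen for t in range(target - allowance, target + allowance + 1)):
            return True
        seen.add(length)
    return False


print(is_length_match_in_range(10, [4, 5, 2, 7], 0))   # False, as in the third sample
print(is_length_match_in_range(10, [4, 5, 2, 7], 1))   # True: 4 + 5 = 9 is within 10 +/- 1
```

With an allowance of 0 this reduces to the original two-sum.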

Many Sums

Find the items that will produce the optimum sum with the given capacity.

  1. Create a 2D array
    • Rows: Items, ascending in capacity
    • Columns: Capacity, from 0 up to the max capacity
  2. For each item and each capacity, find the max value
    • a. If the item capacity > current capacity: Choose the previous item’s value at the current capacity.
    • b. Else: Choose the max of i and ii.
      • i. Item value + previous item’s value at (current capacity – item capacity)
      • ii. Previous item’s value at the current capacity
      • iii. If i == ii: Choose ii, the previous item’s value.
  3. Find the max value items, starting from the last item at the max capacity.
    • a. Move up through the previous items until capacity == 0
      • i. If current value == previous item’s value at the same capacity: The current item is not used; move to the previous item.
      • ii. Else: Add the current item and look at the previous item at (current capacity – item capacity).

See: 0/1 Knapsack Problem Dynamic Programming by Tushar Roy – Coding Made Simple
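The Many Sums steps describe the standard 0/1 knapsack table and its backtracking. A sketch in Python, assuming separate `weights` (item capacities) and `values` lists; for Many Sums one might set each value equal to the item's length, which is an assumption about the intended objective. Sorting the rows is not required for correctness.

```python
def knapsack_01(weights, values, capacity):
    """0/1 knapsack: return (best value, sorted indices of the chosen items)."""
    n = len(weights)
    # dp[i][c] = best value using only the first i items with capacity c
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]                               # skip item i
            if w <= c:
                dp[i][c] = max(dp[i][c], v + dp[i - 1][c - w])    # take item i; ties keep "skip"
    # Backtrack from the last item at full capacity
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:    # value changed, so item i was taken
            chosen.append(i - 1)
            c -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)


print(knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7))    # (9, [1, 2])
print(knapsack_01([4, 5, 2, 6], [4, 5, 2, 6], 10))   # (10, [0, 3]): lengths 4 + 6
```

The table takes O(n · capacity) time and space.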

algorithms – How can I make the variance of a multiple sum of set of fixed number of variables minimum?

Here is the problem:

There are $MN$ people, divided into $M$ seeds with $N$ people in each seed.
We have to form teams of $M$ people in which every member comes from a different seed (so there are $N$ teams).
Each person has their own value, and the seeds are ordered: for any seeds $I$ and $J$, either $a<b$ for all $a \in I$ and $b \in J$, or $a>b$ for all of them. Assume no two people have the same value.

  1. People are assigned individually.
  2. People may be assigned individually or as a couple. In the latter case, the two must be in the same team.

For each case, design an algorithm that minimizes the variance of the teams’ sums.

The problem above asks for the optimal solution, and since it seems to be NP-complete (?), I would accept heuristics.
I hope the heuristic gives a solution whose variance is at most 25% larger than the optimal one, as I don’t want to harm the balance of the teams.

Could you please give an algorithm or heuristic that solves this problem? Including its time complexity would be highly appreciated! Thanks!

Edit: All values of each person are positive integers.
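For case 1, one simple greedy heuristic (a sketch, not a known-good answer to this problem) processes the seeds from strongest to weakest and gives each seed's strongest members to the teams with the smallest sums so far. It runs in about $O(MN \log N + M \log M)$ time, but it carries no 25% guarantee: on the small example below it leaves a variance of 2/3, while the split 10+5+1, 9+4+3, 8+6+2 achieves 0. Function names are illustrative, and the variance is the population variance of the team sums.

```python
def greedy_teams(seeds):
    """Greedy heuristic for case 1 (no couples).

    seeds: M lists of N distinct positive values each.
    Returns (teams, sums): N teams holding exactly one value from every seed.
    """
    n = len(seeds[0])
    teams = [[] for _ in range(n)]
    sums = [0] * n
    # Seeds occupy disjoint value ranges, so process the strongest seed first.
    for seed in sorted(seeds, key=max, reverse=True):
        # The weakest teams so far receive this seed's strongest members.
        weakest_first = sorted(range(n), key=lambda t: sums[t])
        for team, value in zip(weakest_first, sorted(seed, reverse=True)):
            teams[team].append(value)
            sums[team] += value
    return teams, sums


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


teams, sums = greedy_teams([[10, 9, 8], [6, 5, 4], [3, 2, 1]])
print(teams)            # [[10, 4, 3], [9, 5, 2], [8, 6, 1]]
print(sums)             # [17, 16, 15]
print(variance(sums))   # about 0.667
```

A local-search pass (swapping two members of the same seed between teams whenever that lowers the variance) is a natural next step, and for case 2 a couple could be treated as a single unit occupying two seeds; neither is shown here.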

algorithms – Minimum sum of squared Euclidean distance between two arrays

Question:
Given two sequences $X$ and $Y$, both sorted in increasing order, where $Y$ has size $k$ and $X$ has size $m$.
I would like to find a subset $X'$ of $X$ of size $k$ that minimizes
$$d(Y,X') = \sum_{j=1}^{k}(y_{j}-x'_{j})^{2},$$
where $y_{j}$ and $x'_{j}$ are the $j$-th elements of $Y$ and $X'$.
Note that $X'$ could be arranged in $k!$ ways, so its order is totally unknown.


What I have come up with so far:
I would like to approach it using dynamic programming. I think I would first compute the squared distance between each element of $Y$ and each element of $X$, but I’m having trouble determining what the subproblem is and how to solve this using DP. Thank you!
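Because both sequences are sorted, the ordering question resolves itself. Expanding the square gives $\sum y_j^2 + \sum x_j'^2 - 2\sum y_j x'_j$, and by the rearrangement inequality the cross term is largest when the chosen elements are matched to $Y$ in increasing order. So only order-preserving matchings need to be considered, which gives an $O(km)$ DP with $dp[i][j] = \min(dp[i][j-1],\ dp[i-1][j-1] + (y_i - x_j)^2)$. A sketch, with illustrative names:

```python
def min_squared_distance(X, Y):
    """Minimum of sum_j (y_j - x'_j)^2 over size-k subsets X' of X.

    Assumes X and Y are sorted increasingly and len(Y) <= len(X).
    dp[i][j] = best cost matching the first i elements of Y using the first j elements of X.
    """
    k, m = len(Y), len(X)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(k + 1)]
    for j in range(m + 1):
        dp[0][j] = 0.0                                              # nothing left to match
    for i in range(1, k + 1):
        for j in range(i, m + 1):
            skip = dp[i][j - 1]                                     # x_j not used
            take = dp[i - 1][j - 1] + (Y[i - 1] - X[j - 1]) ** 2    # x_j matched to y_i
            dp[i][j] = min(skip, take)
    return dp[k][m]


print(min_squared_distance([1, 2, 3, 10], [2, 9]))   # 1.0: choose X' = (2, 10)
```

Recording which branch was taken in each cell would let you recover $X'$ itself by backtracking from `dp[k][m]`.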

algorithms – What is your approach to writing code?

I enrolled in computer science at school. I mainly want to work in IT, but first I have to get through two years of study in which I must pass programming. In my first semester I got a C (because I couldn’t come up with the code), and my second semester was cancelled because of COVID. This is my third semester, and I have a huge problem with coding. My exam gives a specific question with all the details and variables; I UNDERSTAND what the program should do and I know the syntax, but I just don’t know where to begin. Then my code (especially longer programs) has errors whose origin I can’t figure out, even though I follow the rules very closely. When I look at my friend’s solution, I think, “That was easy, why couldn’t I think of this?”

What is your train of thought when solving a problem?

algorithms – How can I create the table and fix the error? Why did the error happen?

import world_population.csv
population = Table.with_columns(
    "Population", population_amounts,
    "Year", years
)
population

No module named 'world_population'
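`import` loads Python modules, not data files: `import world_population.csv` is read as "import the submodule `csv` of a package named `world_population`", and since no such package exists, Python reports that module as missing. The file has to be read instead. If you are using the datascience package (which provides `Table`), `Table.read_table("world_population.csv")` reads it directly into a table. Otherwise, a standard-library sketch follows; the column names "Year" and "Population" and their integer values are assumptions, since the file's contents are not shown, and `read_columns` is a hypothetical helper.

```python
import csv


def read_columns(lines, *labels):
    """Read the given integer columns from CSV text.

    `lines` can be an open file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    columns = {label: [] for label in labels}
    for row in reader:
        for label in labels:
            columns[label].append(int(row[label]))
    return [columns[label] for label in labels]


# Hypothetical usage, assuming the file has integer "Year" and "Population" columns:
# with open("world_population.csv", newline="") as f:
#     years, population_amounts = read_columns(f, "Year", "Population")
# With the datascience package (assumed), the table could then be built with
# population = Table().with_columns("Population", population_amounts, "Year", years)
```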