sorting – Fast hashing algorithm for mapping/quantizing a collection of floats onto a small set of sorted floats

With my application, I have

  • Collection X: thousands of floating-point numbers with values in the range [0, 1].
  • Collection Y: 10 sorted floats in the range [0, 1].
  • The size of X is known; let it be L.

The goal is to quantize X onto Y, so that we get a hash array of indices of Y for X.

Attempts

Naively, I’ve tried linear and binary search to position each element of X between two consecutive elements of Y.

However, the performance is not good enough for my application.
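For reference, the binary-search attempt can be sketched with the standard library alone (a minimal sketch; mapping each x to the nearest element of the sorted Y via the midpoints between consecutive Y values is one common quantization choice — the sample Y and X below are made up):

```python
import bisect

# Y is assumed sorted; the decision boundaries are the midpoints between
# consecutive Y values, so bisect returns the index of the nearest Y element.
Y = [i / 9 for i in range(10)]                    # 10 sorted floats in [0, 1]
boundaries = [(a + b) / 2 for a, b in zip(Y, Y[1:])]

X = [0.03, 0.52, 0.97]                            # unsorted input floats
hash_indices = [bisect.bisect(boundaries, x) for x in X]
print(hash_indices)  # [0, 5, 9]
```

Since Y has only 10 elements, each lookup costs at most four comparisons; any further speedup would have to come from vectorizing over X rather than from a better search.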

Question

What is the best way to do this kind of hashing/quantization? Note that X is not sorted.

Thanks!

authentication – Does client side hashing reduce insider risk?

No, that’s not a good reason:

This never helps. This architecture merely replaces the secret “password” with the secret “hash of the password”. Neither may ever fall into the wrong hands, because the hash alone is sufficient for logging on. (You can never assume the client runs unmodified software; a simple debugger is enough to make a client program send whatever you want.)
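A toy model of this point (hypothetical function and variable names, not any real protocol): once the server compares against the client-computed hash, anyone holding that hash — say, from a log — can authenticate just as well as the honest client.

```python
import hashlib

# What the server stores and compares against: the client-side hash.
stored = hashlib.sha256(b"password").hexdigest()

def login(sent_hash):
    # The server never sees the password, only the hash -- so the hash
    # *is* the credential as far as the server is concerned.
    return sent_hash == stored

honest_client = hashlib.sha256(b"password").hexdigest()
leaked_hash = stored   # e.g. scraped from an overzealous server log
print(login(honest_client), login(leaked_hash))  # True True
```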

And they’re logging these hashes, which they simply shouldn’t do, because now they’ve got a plaintext log of all the credentials necessary to log on.

The issue at hand is the overzealous logging, not the passwords.

Plain hashing, as far as I can tell, also never contributes any further factor to the credentials, so I see only very narrow use cases where it might help – for example, limiting password-reuse damage should the server be compromised; but if your users reuse passwords, your server being compromised is the least of their worries. Instead, I often see it used as snake oil (a “magical cure”) to distract from security issues that are much more severe – exactly like this logging issue, or things like unencrypted database connections.

appsec – Encryption (not hashing) of credentials in a Python connection string

I would like to know how to encrypt a database connection string in Python – ideally Python 3 – and store it in a secure wallet. I am happy to use something from pip. Since the connection string needs to be passed to the database connection code verbatim, no hashing is possible. This is motivated by:

  • a desire to avoid hard-coding the database connection credentials in a Python source file (bad for security and configurability);
  • a desire to avoid leaving them in plain text in a configuration file (not much better, due to the same security concerns).

In a different universe, I have seen an equivalent procedure done in .NET using built-in machineKey / EncryptedData set up by aspnet_regiis -pe, but that is not portable.

Though this problem arises from an example where an OP is connecting via pymysql to a MySQL database,

  • the current question is specific neither to pymysql nor MySQL, and
  • the content from that example is not applicable as a minimal reproducible example here.

The minimal reproducible example is literally

#!/usr/bin/env python3

PASSWORD='foo'

Searching for this on the internet is difficult because the results I get are about storing user passwords in a database, not storing connection passwords to a database in a separate wallet.

I would like to do better than a filesystem approach that relies on the user account of the service being the only user authorized to view what is otherwise a plain-text configuration file.
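One candidate (a sketch, not a recommendation from the question itself): the third-party `keyring` package from pip delegates storage to the operating system's credential store. The service and user names below are made up, and the fallback branch exists only so the sketch runs on machines without a keyring backend — a real deployment would fail loudly instead.

```python
# Hypothetical sketch: keep the DB password in the OS credential store via
# the third-party `keyring` package (pip install keyring).
SERVICE, USER = "myapp-db", "dbuser"
try:
    import keyring
    keyring.set_password(SERVICE, USER, "s3cret")   # done once, at setup time
    password = keyring.get_password(SERVICE, USER)  # done at connection time
except Exception:
    # No keyring backend available (e.g. headless CI) -- fall back so the
    # sketch still runs; production code should not silently do this.
    password = "s3cret"

# `password` can now be interpolated verbatim into the connection string.
```

This keeps the secret out of both the source file and the configuration file, at the cost of depending on the host having a working credential store.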

algorithm analysis – Uniform Hashing. Understanding space occupancy and choice of functions

I’m having troubles understanding two things from some notes about Uniform Hashing.
Here’s the copy-pasted part of the notes:

Let us first argue by a counting argument why the uniformity property we required of good hash functions is computationally hard to
guarantee. Recall that we are interested in hash functions which map
keys in $U$ to integers in $\{0, 1, \ldots, m-1\}$. The total number of such
hash functions is $m^{|U|}$, given that each key among the $|U|$ ones can be
mapped into $m$ slots of the hash table. In order to guarantee uniform
distribution of the keys and independence among them, **our hash
function should be any one of those**.
But, in this case, its
representation would need $\Omega(\log_2 m^{|U|}) = \Omega(|U| \log_2 m)$ bits, which is
really too much in terms of space occupancy and in terms of
computing time (i.e. it would take at least $\Omega\left(\frac{|U| \log_2 m}{\log_2 |U|}\right)$ time just to
read the hash encoding).

The part I put in bold is the first thing confusing me.

Why should the function be any one of those? Shouldn't you avoid a good part of them, like the ones sending every element of the universe $U$ to the same number and thus not distributing the elements at all?

The second thing is the last “$\Omega$”. Why would it take $\Omega\left(\frac{|U| \log_2 m}{\log_2 |U|}\right)$ time just to read the hash encoding?

The numerator is the number of bits needed to index every hash function in the space of such functions, and the denominator is the size in bits of a key. Why does this ratio give a lower bound on the time needed to read the encoding? And what hash encoding?
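To make the space bound concrete, the numbers below are an illustrative instance (my choice, not from the notes): a universe of 32-bit keys and a table of $m = 1024$ slots already push the representation into gigabytes.

```python
import math

# Counting argument instantiated: |U| = 2^32 keys, m = 1024 slots.
U = 2 ** 32                    # size of the key universe |U|
m = 1024                       # number of hash-table slots
bits = U * math.log2(m)        # Omega(|U| log2 m) bits to name one of m^|U| functions
gib = bits / 8 / 2 ** 30
print(gib)  # 5.0 -- ~5 GiB just to write down which function was chosen
```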

Combine Multiple Minimal Perfect Hashing

I’m building some Minimal Perfect Hash functions from multiple datasets using the CHD algorithm.

I would like to know whether there is a "smart" way to combine all the different MPHs obtained from the previous steps and use them to build a new one at minimal cost.
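One straightforward composition (a sketch, and not necessarily the minimal-cost construction asked about) is to offset each MPH's output range, so the union stays minimal and perfect. This assumes you can decide which dataset a key belongs to, which a bare MPH does not provide by itself; the dicts below are stand-ins for the CHD-built functions.

```python
# Hypothetical stand-ins for two CHD-built minimal perfect hashes over
# disjoint datasets; each maps its keys bijectively onto [0, n).
mph1 = {"apple": 0, "pear": 1}          # dataset 1, range [0, 2)
mph2 = {"cat": 0, "dog": 1, "emu": 2}   # dataset 2, range [0, 3)

def combined(key):
    """MPH over the union: shift dataset-2 slots past dataset 1."""
    if key in mph1:                     # membership test -- extra info a plain MPH lacks
        return mph1[key]
    return len(mph1) + mph2[key]

print([combined(k) for k in ["apple", "pear", "cat", "dog", "emu"]])
# [0, 1, 2, 3, 4] -- still minimal and perfect over the union
```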


What is the probability of mining a block by a miner controlling a fraction of hashing power?

Suppose a miner controls a fraction α of the hashing power, and the chain he is trying to mine on has a mining rate given by f. How do I find the probability that this miner mines a block?
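Under the commonly used idealized model (an assumption, not stated in the question) where every hash attempt is an independent Bernoulli trial, each block goes to the miner with probability equal to his share of the hashing power, i.e. α, independently of the overall rate f. A toy simulation of that model:

```python
import random

# Model assumption: each block is won by our miner with probability alpha,
# because every unit of hashing power is an equally likely "lottery ticket".
random.seed(0)
alpha = 0.3
trials = 100_000
wins = sum(random.random() < alpha for _ in range(trials))
rate = wins / trials
print(rate)  # close to 0.3
```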

My question is about hashing and Autopsy

1] Can a particular sector of a disk image be accessed in the Autopsy software?

2] Can the hash of an image be calculated in the Autopsy software?

Quadratic probing and double hashing

What is the time complexity of quadratic probing and of double hashing in a hash table?
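For intuition, the two probe sequences can be sketched as follows (hash values and table size are made-up illustrations). Under the usual uniform-hashing assumptions, both schemes take expected constant time per operation at constant load factor, and Θ(m) in the worst case.

```python
m = 11  # illustrative table size

def quadratic_probe(h, i):
    """i-th probe of quadratic probing from initial slot h."""
    return (h + i * i) % m

def double_hash_probe(h1, h2, i):
    """i-th probe of double hashing: step size comes from a second hash h2."""
    return (h1 + i * h2) % m

print([quadratic_probe(3, i) for i in range(4)])        # [3, 4, 7, 1]
print([double_hash_probe(3, 5, i) for i in range(4)])   # [3, 8, 2, 7]
```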

hashing: match every element of a list against each line of an input file

I have two text files. One is a list of items (key–value pairs) and the other is an input file whose text must be matched against those keys. When a match is found, it is marked with its corresponding value in the input file.

For example:

my list file:

food = ###food123
food microbiology = ###food mircobiology456
mirco organism = ###micro organims789
geo tagging = ###geo tagging614
gross income = ###gross income630
fermentation = fermentation###929
contamination = contamination##878
Salmonella species = Salmonella species###786
Lactic acid bacteria = Lactic acid bacteria###654

input file

There are certain attributes necessary for fermentation of meat. 
It should be fresh, with low level of contamination, the processing should be hygienic and the refrigeration should be resorted at different stages.
Elimination of pathogens like Coliform, Staphylococci, Salmonella species may be undertaken either by heat or by irradiation. There is need to lower the water activity and this can be achieved by either drying or addition of the salts. 
Inoculation with an effective, efficient inoculum consisting of Lactic acid bacteria and, or Micrococci which produces lactic acid and also contributes to the flavor development of the product. 
Effective controlled time, temperature humidity during the production is essential.
And, Salt ensures the low pH value and extends the shelf-life of the fermented meats like Sausages.

Expected output

    There are certain attributes necessary for ((fermentation###929)) of meat. 
    It should be fresh, with low level of ((contamination##878)), the processing should be hygienic and the refrigeration should be resorted at different stages.
    Elimination of pathogens like Coliform, Staphylococci, ((Salmonella species###786)) may be undertaken either by heat or by irradiation. There is need to lower the water activity and this can be achieved by either drying or addition of the salts. 
    Inoculation with an effective, efficient inoculum consisting of ((Lactic acid bacteria###654)) and, or Micrococci which produces lactic acid and also contributes to the flavor development of the product. 
    Effective controlled time, temperature humidity during the production is essential.
    And, Salt ensures the low pH value and extends the shelf-life of the fermented meats like Sausages.

For this I am using Python 3, parsing the list file and storing it in a hash (dict) that holds all the items in the list as key–value pairs.
Then each line of the input file is matched against every key in the hash, and when a match is found the key is replaced by the corresponding value, as shown in the output above. This works well when the list and the input are small, but as they grow it takes a long time. How can I improve the time complexity of this matching method?

**Algorithm that I am using**:

import re

#parse the list file and store key -> value pairs in a dict
hash = {}
for l in list:
    key, value = l.split("=", 1)
    hash[key.strip()] = value.strip()

#iterate over the input and match each line against every key
for i, line in enumerate(lines):
    if line != "":
        for key in hash.keys():
            # the key must be delimited by whitespace/punctuation on both sides
            my_regex = r"(^|[\s,\"'(/-])" + re.escape(key) + r"([\s,.!\"।'/()-]|$)"
            if re.search(my_regex, line, re.IGNORECASE | re.UNICODE):
                line = re.sub(my_regex, r"\1((" + hash[key] + r"))\2", line)
        lines[i] = line
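One way the inner loop over all keys could be avoided (a sketch under the assumption that all keys fit in a single alternation) is to compile one combined regex up front, so each line is scanned once instead of once per key; the small `mapping` below is a made-up excerpt of the list file.

```python
import re

# Stand-in for the parsed list file (keys stored lowercase for the lookup).
mapping = {
    "fermentation": "fermentation###929",
    "contamination": "contamination##878",
}

# One alternation over all keys, longest first so multi-word keys win.
pattern = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, mapping), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

line = "attributes necessary for fermentation of meat"
out = pattern.sub(lambda m: "((" + mapping[m.group(1).lower()] + "))", line)
print(out)  # attributes necessary for ((fermentation###929)) of meat
```

This makes the per-line cost roughly independent of the number of keys, since the compiled automaton handles all alternatives in a single pass.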