binary – Why does the BOM consist of two bytes instead of one, for example in the UTF-16 encoding?

The BOM started out as an encoding trick. It was not part of Unicode, it was something that people discovered and cleverly (ab)used.

Basically, they found the character U+FEFF ZERO WIDTH NO-BREAK SPACE. Now, what does a space character that has a width of zero and does not induce a line break do at the very beginning of a document? Well, absolutely nothing! Adding a U+FEFF ZWNBSP to the beginning of your document will not change anything about how that document is rendered.

They also found that the code point U+FFFE (which is what you would decode U+FEFF as, if you decoded UTF-16 “the wrong way round”) was not assigned. (U+FFFE0000, which is what you would get from reading UTF-32 the wrong way round, is simply illegal: code points can be at most 21 bits long.)

So, what this means is that when you add U+FEFF to the beginning of your UTF-16 (or UTF-32) encoded document, then:

  • If you read it back with the correct Byte Order, it does nothing.
  • If you read it back with the wrong Byte Order, it is a non-existing character (or not a code point at all).

Therefore, it allows you to add a code point to the beginning of the document that detects the Byte Order in a way that works 100% of the time and does not alter the meaning of your document. It is also 100% backwards-compatible. In fact, it is more than backwards-compatible: it actually works as designed even with software that doesn’t know about this trick!
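
Here is a minimal sketch of that detection logic (in Python, with a hypothetical function name); only the first two octets of a UTF-16 stream need to be inspected:

def detect_utf16_byte_order(data: bytes) -> str:
    # U+FEFF encoded big-endian is FE FF; read "the wrong way round"
    # it comes out as FF FE, which is not a valid character.
    if data[:2] == b"\xfe\xff":
        return "big-endian"
    if data[:2] == b"\xff\xfe":
        return "little-endian"
    return "unknown"  # no BOM present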

It was only later, after this trick had been widely used for many years, that the Unicode Consortium made it official, in three ways:

  • They explicitly specified that the code point U+FEFF as the first code point of a document is a Byte Order Mark. When not at the beginning of the document, it is still a ZWNBSP.
  • They explicitly specified that U+FFFE will never be assigned, and is reserved for Byte Order Detection.
  • They deprecated U+FEFF ZWNBSP in favor of U+2060 Word Joiner. New documents that have a U+FEFF somewhere in the document other than as the first code point should no longer be created; U+2060 should be used instead.

So, the reason why the Byte Order Mark is a valid Unicode character and not some kind of special flag is that it was introduced with maximum backwards-compatibility as a clever hack: adding a BOM to a document will never change it, and you don’t need to do anything to add BOM detection to existing software. If it can open the document at all, the byte order is correct; if the byte order is incorrect, it is guaranteed to fail.

If, on the other hand, you tried to add a special one-octet signature, then all UTF-16 and UTF-32 reader software in the entire world would have to be updated to recognize and process this signature. If the software did not know about this signature, it would simply try to decode the signature as the first octet of the first code point, the first octet of the first code point as the second octet of the first code point, and so on, decoding the entire document shifted by one octet. In other words: such a signature would completely destroy any document, unless every single piece of software in the entire world that deals with Unicode were updated before the first document carrying it got produced.
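
A quick illustration of that off-by-one-octet corruption (a Python sketch; every character gets mangled, not just the first):

text = "Hello, world".encode("utf-16-be")
print(text.decode("utf-16-be"))        # Hello, world
print(text[1:-1].decode("utf-16-be"))  # completely unrelated characters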

However, going back to the very beginning, and to your original question:

Why does the BOM consist of two bytes

It seems that you have a fundamental misunderstanding here: the BOM does not consist of two bytes. It consists of one character.

It’s just that in UTF-16, each code point (in the Basic Multilingual Plane, at least) gets encoded as two octets. (To be fully precise: a byte does not have to be 8 bits wide, so we should talk about octets here, not bytes.) Note that in UTF-32, for example, the BOM is not 2 octets but 4 (0000FEFF or FFFE0000), again because that’s just how code points are encoded in UTF-32.
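
You can watch this happen, for example in Python, where the single code point U+FEFF comes out as two, three, or four octets depending on the encoding:

bom = "\ufeff"
print(bom.encode("utf-16-be").hex())  # feff      (2 octets)
print(bom.encode("utf-16-le").hex())  # fffe      (2 octets)
print(bom.encode("utf-32-be").hex())  # 0000feff  (4 octets)
print(bom.encode("utf-32-le").hex())  # fffe0000  (4 octets)
print(bom.encode("utf-8").hex())      # efbbbf    (3 octets)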

data structures – Binary tree as 2D array with variable-length rows

Usually we use a tree data structure when we care about the time complexity of insert/delete/…

-In this particular problem, saving space is mandatory too; that is, two pointers per node are unaffordable. The actual data live in the leaves, so in fact even the internal nodes are considered overhead.

-So I thought of storing it as a 2D array with variable row sizes. We can assume the tree is almost always complete, with a power-of-2 number of leaves, something like

R[0]= N leaf nodes

R[1]= N/2 level-1 nodes

R[2]= N/4 nodes

..
..

R[logN]= root

-I can derive the formulas for delete/insert/… since the tree is easy to visualize from this representation, without any pointers at all (see the sketch after the questions below).

-Now, did I miss something?
-Is there any flaw in this?
-I’m looking for brainstorming or some opinions.
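
A minimal sketch of the index arithmetic this layout gives you (hypothetical names; it assumes N is a power of 2 and the data sit in row 0):

# levels[0] holds the N leaves, levels[k] the N / 2**k nodes of level k,
# and levels[-1] the single root -- no pointers stored anywhere.

def build_levels(leaves):
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        # placeholder internal nodes; a real tree would compute them
        levels.append([None] * (len(levels[-1]) // 2))
    return levels

def children(k, i):
    # children of node i on level k live one row below
    return (k - 1, 2 * i), (k - 1, 2 * i + 1)

def parent(k, i):
    # the parent lives one row above
    return (k + 1, i // 2)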

python – Climbing the Leaderboard (HackerRank) via binary search

I see three ways to improve your runtime. I have never done this problem so I can’t guarantee that it will be enough to pass all the tests.

Sorted input

The leaderboard scores are sorted in decreasing order. Your duplicate-removal code converts a sorted list to a set to remove duplicates (O(n)), then converts it back to a list (O(n)), then sorts it (O(n*log(n))). You could use the fact that it is sorted: duplicates will always be side by side. You could do something like the following, for example:

prev = ranked[0]
duplicateIndices = set()
for index, val in enumerate(ranked[1:], 1):
    if val == prev:
        duplicateIndices.add(index)
    prev = val
ranked2 = [x for i, x in enumerate(ranked) if i not in duplicateIndices]

This may not be the most efficient way to remove duplicates, but it runs in O(n).

Sorted input 2

player is sorted as well. That means that after each iteration, the player’s rank is at most the previous one. This has two consequences:

  1. You don’t need to add the player’s score to ranked2 (either two consecutive scores are equal, which is easy enough to detect without altering ranked2, or the second one is strictly better than the first and having inserted the first in ranked2 will not change the insertion point of the second)
  2. The right boundary of your binary search is your previous insertion point.

This will not change your asymptotic complexity, but it should have an interesting effect on your runtime anyway.
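
Here is a sketch combining both observations (assuming ranked2 holds the distinct leaderboard scores in decreasing order and player is sorted in increasing order, as the problem statement guarantees):

def climbing_ranks(ranked2, player):
    ranks = []
    hi = len(ranked2)              # right boundary of the search
    for score in player:
        lo = 0
        while lo < hi:             # find first index with ranked2[i] <= score
            mid = (lo + hi) // 2
            if ranked2[mid] > score:
                lo = mid + 1
            else:
                hi = mid
        ranks.append(lo + 1)       # exactly lo scores are strictly better
        # hi == lo here, so the next (higher) score searches only ranked2[:lo]
    return ranks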

Skipping the linear search

With a single line, your code goes from O(m*log(n)) to O(m*n). Since ranked2 is a list, and Python does not know that it is sorted, if i in ranked2 will iterate over the entire list searching for a match. This operation runs in O(n) and is repeated for each of the player’s scores. Besides, you have already handled the equality case in your binary search, so why bother?

ubuntu – Lynis Audit Difference Between Found Known Binary and Just Found

When I run a Lynis audit of my Xubuntu system, under System Tools, binary scan, most of the binaries are returned as

Found known binary: package (pkg category) - /path/to/binary

But under the /usr/bin directory scan, it also shows three binaries that are just “Found” instead of “Found known binary”, and the output is formatted differently from the above.

Found /usr/bin/openssl (version 1.1.1f)
Found /usr/bin/perl (version 5.30.0)
Found /usr/bin/wget (version 1.20.3)

Does that indicate that these three binaries are “not known”?

binary – Can the possibility of hash collision be “zero” when we hash the same file in different formats?

Let’s say I have a file A, which is any normal file (PDF, JPEG, MP3, etc.).

Now I take the binary dump of the file, giving another file, B{A}.

And the hexdump of the file, giving, say, file H{A}.

Now I hash all three files with any 256-bit hash (SHA-256, BLAKE-256, etc.).
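
A sketch of the setup (hypothetical file name; here the “binary dump” B{A} is taken to be the bits rendered as text, and the hexdump H{A} the bytes rendered as hex text):

import hashlib

data = open("A", "rb").read()                            # file A itself
b_a = "".join(f"{byte:08b}" for byte in data).encode()   # B{A}: bits as text
h_a = data.hex().encode()                                # H{A}: bytes as hex text

for name, blob in (("A", data), ("B{A}", b_a), ("H{A}", h_a)):
    print(name, hashlib.sha256(blob).hexdigest())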

I want to know:

1. What is the possibility of a hash collision in this case? (Considering that if I somehow find a collision for file A, I can still generate the hex and binary dumps of that file to see whether the hashes of B{A} and H{A} match or not.)

2. Will it still be 1/256*256*256? Or

3. Will there be 0 collisions? (Considering collisions between files of exactly the same size.)

binary search tree – BST implementation in Rust

I am looking for some feedback on my implementation of a binary search tree in Rust, and I would appreciate someone taking the time to go through it and suggest any improvements or corrections as they see fit. More specifically, I am wondering whether I should have the Clone derives in the BST and Node structs, since I am not using them here.

#![allow(unused)]

use std::fmt::Debug;
fn main() {
    let mut bst = BST::new(3_i32);
    bst.append(4);
    bst.append(1);
    bst.append(12);
    bst.display();
}
#[derive(Debug)]
pub struct BST<T> {
    root: Box<Node<T>>,
}
#[derive(Clone, Debug)]
pub struct Node<T> {
    val: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

impl<T> BST<T>
where
    T: PartialOrd + Debug + Clone,
{
    pub fn new(val: T) -> Self {
        let root = Box::new(Node {
            val,
            left: None,
            right: None,
        });
        Self { root }
    }
    pub fn append(&mut self, new_val: T) {
        let new_node = Box::new(Node {
            val: new_val,
            left: None,
            right: None,
        });
        Self::push_node(new_node, &mut self.root);
    }
    // Private and recursive method
    // recursively search through every node until the value is inserted
    fn push_node(new_node: Box<Node<T>>, current_node: &mut Box<Node<T>>) {
        let ref new_val = new_node.val;
        let ref current_val = current_node.val;
        if *current_val <= *new_val {
            if let Some(ref mut left) = current_node.left {
                Self::push_node(new_node, left);
            } else {
                current_node.left = Some(new_node);
            }
        } else if *current_val > *new_val {
            if let Some(ref mut right) = current_node.right {
                Self::push_node(new_node, right);
            } else {
                current_node.right = Some(new_node);
            }
        }
    }

    fn display(&self) {
        println!("{:#?}", self);
    }
}

nt.number theory – Enumerating multi-core binary partitions

An integer partition $\lambda$ of $n$ is called a binary partition provided that its parts are powers of $2$ (dyadic). Example: Let $n=3$. The binary partitions are $\lambda=(2,1)$ and $\lambda=(1,1,1)$, but not $\lambda=(3)$.

Given such a partition $\lambda$, let $Y_{\lambda}$ be its corresponding Young diagram. If $\square$ is a cell in $Y_{\lambda}$, construct its hook length $h_{\square}$ in the usual manner. Example: the multiset of hooks of $\lambda=(2,1)$ is $\{h_{\square} : \square \in Y_{\lambda}\} = \{3,1,1\}$.

A partition $\lambda$ is called an $(s,t)$-core if both $s$ and $t$ are absent from the hook lengths in $Y_{\lambda}$. There has been a flurry of activity regarding this notion.

Recall 1. The number of $(s,s+1)$-core (unrestricted) partitions is the Catalan number $\frac{1}{s+1}\binom{2s}{s}$.

Recall 2. The number of $(s,s+1)$-core partitions with distinct parts is the Fibonacci number $F_{s+1}$.

I would like to ask:

QUESTION. What is the total number $f_{s,s+1}$ of $(s,s+1)$-core binary partitions?

Example. Here are the first few values I can compute: $f_{1,2}=1, f_{2,3}=2, f_{3,4}=4, f_{4,5}=9$.
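
A brute-force sketch that reproduces these values (it assumes the known fact that the largest $(s,t)$-core, for coprime $s$ and $t$, has $\frac{(s^2-1)(t^2-1)}{24}$ cells, so a finite search suffices):

def hooks(lam):
    # multiset of hook lengths of a partition given as a weakly
    # decreasing tuple of parts
    if not lam:
        return []
    conj = [sum(1 for p in lam if p > j) for j in range(lam[0])]
    return [(lam[i] - j) + (conj[j] - i) - 1
            for i in range(len(lam)) for j in range(lam[i])]

def binary_partitions(n, max_part):
    # yield binary partitions of n with parts <= max_part (a power of 2)
    if n == 0:
        yield ()
        return
    p = max_part
    while p > n:
        p //= 2
    while p >= 1:
        for rest in binary_partitions(n - p, p):
            yield (p,) + rest
        p //= 2

def f(s):
    # count (s, s+1)-core binary partitions by exhausting all sizes
    bound = (s * s - 1) * ((s + 1) * (s + 1) - 1) // 24
    count = 0
    for n in range(bound + 1):
        for lam in binary_partitions(n, 1 << n.bit_length()):
            hs = hooks(lam)
            if s not in hs and s + 1 not in hs:
                count += 1
    return count

print([f(s) for s in range(1, 5)])  # expected: [1, 2, 4, 9]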

c++ – Binary search algorithm for logarithm


base58 – Does a Bitcoin address encode the binary form of a public key hash, or the hexadecimal form?

In the bitcoin.org developer guide, within the transaction section, there is the following excerpt:

Pubkey hashes are almost always sent encoded as Bitcoin addresses, which are base58-encoded strings containing an address version number, the hash, and an error-detection checksum to catch typos.

Base58 encoding uses byte values from 0 to 57 to encode alphanumeric characters as binary. My thinking is that it would not be possible to encode the binary form of the Pubkey hash using Base58, as that isn’t what the encoding scheme is for; it encodes alphanumeric characters, not binary. So is it the case that the Pubkey hash is sent as one part of a Bitcoin address, which itself is encoded using Base58, and that it is the hexadecimal representation of the Pubkey hash which is encoded, not the binary representation?
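
For reference, here is a sketch of Base58Check encoding (using the standard Bitcoin alphabet); note that it consumes raw bytes (a version byte, the 20-byte pubkey hash, and a 4-byte checksum), not a hexadecimal string:

import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(version: bytes, payload: bytes) -> str:
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    data += checksum
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = ALPHABET[rem] + encoded
    # each leading zero byte becomes a literal '1' in the address
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + encoded

# e.g. a mainnet P2PKH address: base58check(b"\x00", pubkey_hash),
# where pubkey_hash is the 20 raw bytes of RIPEMD160(SHA256(pubkey))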

complexity theory – What if we had an algorithm that could generate an NFA of 42 states for any binary string of length 2^32?

For example, suppose we have an algorithm that could generate, from any binary string of length 2^32, an NFA of at most 42 states. So this algorithm would not just recognize the string but actually recreate it from an NFA of 42 states.
The DFA obtained from the NFA could have at most 42^2 = 1764 states.
So even the DFA could generate the same binary string, and it could easily be programmed by anyone.
What would be the implications of such a method in CS theory?
