binary – Why does the BOM consist of two bytes instead of one, for example in UTF-16 encoding?

The BOM started out as an encoding trick. It was not part of Unicode, it was something that people discovered and cleverly (ab)used.

Basically, they found the U+FEFF zero width no-break space character. Now, what does a space character that has zero width and does not induce a line break do at the very beginning of a document? Well, absolutely nothing! Adding a U+FEFF ZWNBSP to the beginning of your document will not change anything about how that document is rendered.

And they also found that the code point U+FFFE (which is what you would decode it as, if you decoded UTF-16 “the wrong way round”) was not assigned. (U+FFFE0000, which is what you would get from reading UTF-32 the wrong way round, is simply illegal: code points can be at most 21 bits long.)

So, what this means is that when you add U+FEFF to the beginning of your UTF-16 (or UTF-32) encoded document, then:

  • If you read it back with the correct byte order, it does nothing.
  • If you read it back with the wrong byte order, it decodes to an unassigned code point (or to something that is not a valid code point at all).

Therefore, it allows you to add a code point to the beginning of the document that detects the byte order in a way that works 100% of the time and does not alter the meaning of your document. It is also 100% backwards-compatible. In fact, it is more than backwards-compatible: it actually works as designed even with software that doesn’t know about this trick!
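
A minimal sketch of what that detection looks like in practice (Python, purely for illustration; the function name is made up):

def detect_utf16_byte_order(data: bytes) -> str:
    # U+FEFF encoded big-endian is FE FF; read with swapped order it looks like FF FE (U+FFFE).
    if data[:2] == b"\xfe\xff":
        return "big-endian"
    if data[:2] == b"\xff\xfe":
        return "little-endian"
    return "unknown (no BOM)"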

It was only later, after this trick had been widely used for many years, that the Unicode Consortium made it official, in three ways:

  • They explicitly specified that the code point U+FEFF, when it is the first code point of a document, is a Byte Order Mark. Anywhere else in the document, it is still a ZWNBSP.
  • They explicitly specified that U+FFFE will never be assigned, and is reserved for Byte Order Detection.
  • They deprecated U+FEFF ZWNBSP in favor of U+2060 Word Joiner. New documents that have a U+FEFF anywhere other than as the first code point should no longer be created; U+2060 should be used instead.

So, the reason the Byte Order Mark is a valid Unicode character and not some kind of special flag is that it was introduced as a maximally backwards-compatible, clever hack: adding a BOM to a document never changes its meaning, and you don’t need to do anything to add BOM detection to existing software. If the software can open the document at all, the byte order is correct; and if the byte order is incorrect, decoding is guaranteed to fail.

If, on the other hand, you try to add a special one-octet signature, then all UTF-16 and UTF-32 reader software in the entire world has to be updated to recognize and process this signature. If the software does not know about this signature, it will try to decode the signature as the first octet of the first code point, the first octet of the first code point as the second octet of the first code point, and so on, decoding the entire document shifted by one octet. In other words: such a signature would completely destroy any document, unless every single piece of software in the entire world that deals with Unicode were updated before the first document carrying the signature was produced.
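
You can see the shift effect directly (Python; the 0xBB signature byte is a made-up example):

signature = b"\xbb"                  # hypothetical one-octet signature byte
payload = "Hi".encode("utf-16-be")   # 00 48 00 69
print((signature + payload).decode("utf-16-be", errors="replace"))
# Every 16-bit code unit is now misaligned, so the text decodes as garbage.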

However, going back to the very beginning, and to your original question:

Why does the BOM consist of two bytes

It seems that you have a fundamental misunderstanding here: the BOM does not consist of two bytes. It consists of one character.

It’s just that in UTF-16, code points are encoded in 16-bit units of two octets each. (To be fully precise: a byte does not have to be 8 bits wide, so we should really talk about octets here, not bytes.) Note that in UTF-32, for example, the BOM is not 2 octets but 4 (00 00 FE FF or FF FE 00 00), again because that’s just how code points are encoded in UTF-32.
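
For illustration, here is how the single code point U+FEFF comes out on the wire in each encoding (Python):

bom = "\ufeff"
print(bom.encode("utf-16-be").hex())  # feff
print(bom.encode("utf-16-le").hex())  # fffe
print(bom.encode("utf-32-be").hex())  # 0000feff
print(bom.encode("utf-32-le").hex())  # fffe0000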

How to decrypt a MultiBit private key with a 128-byte line and a 52-byte line?

I want to recover my bitcoin private key stored by MultiBit. I wrote down the private key as text while using MultiBit Classic, though I don’t remember how I obtained that text. The text consists of two lines. The first line has 128 characters, starts with U, and contains a +. The second line has 52 characters, starts with q, and contains a /.

As far as I know, the text may be encrypted. I also wrote down the password, so I assume I can decrypt the text with it, but I don’t know how.

I’ve tried the instructions in “Export and limited import of private keys”, pasting the text into a file and executing the openssl command. However, I got a bad decrypt message and a 64-byte result file, so it doesn’t seem to have been successful.

I want to know how to decrypt my private key text and get a valid private key.

math – What is the size of the number 65535 in bytes?

There are likely several variables at play here, but 65535 is 5 characters, and at 1 byte per character (in an 8-bit encoding such as ASCII) that is 5 bytes. Text editors don’t have a notion of what a number is the way your code does, so everything is (at a basic level) stored as a set of characters, i.e. a string.

When you write, for example, C# code like

int number = 65535;

you’re explicitly allocating 4 bytes of memory for that number – even though in this case it could use only 2 bytes. Similarly, you could write

ushort number = 65535;

and then only 2 bytes would be allocated.

But if you write

string number = "65535"; or string otherNumber = "65534"; or string anotherNumber = "95536"; they will all be the same size, and that size has no relation to what the number actually is (with a few exceptions).
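
The same point, sketched in Python (the struct module mirrors C-style fixed-width types, so the sizes match the C# example above):

import struct
print(struct.calcsize("<i"))  # 4 bytes for a 32-bit int
print(struct.calcsize("<H"))  # 2 bytes for an unsigned short
print(len("65535"))           # 5 characters when stored as text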

When you write things to disk, all of this can go out the window, as there are many ways of compressing and optimizing things for storage.

mysql – mysqldump is returning 0 bytes. Storage and permissions are OK!

The mysqldump command is failing, whereas it was working before: it now produces 0 bytes of output. I have checked that permissions and disk space are OK!

I expect a dump around 3GB in size.

The script is as follows:

#mysqldump -u root -p dbname >/path/dump.dmp

python – loading an image into memory (numpy arrays) from a database bytes field, fast

I am looking for feedback on the function below, which loads a PNG image stored as bytes in MongoDB into numpy arrays.

import io

from PIL import Image
import numpy as np


def bytes_to_matricies(image_bytes):
    """image bytes into Pillow image object
    image_bytes: image bytes accessed from Mongodb
    """

    raw_image = Image.open(io.BytesIO(image_bytes))
    greyscale_matrix = np.array(raw_image.convert("L"))
    color_matrix = np.array(raw_image.convert("RGB"))

    n = greyscale_matrix.shape[0]  # image height in pixels
    m = greyscale_matrix.shape[1]  # image width in pixels
    return greyscale_matrix, color_matrix, n, m
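
For context, a hypothetical call site (the database, collection, and field names below are made up, not taken from the project):

from pymongo import MongoClient

client = MongoClient()
doc = client.mydb.images.find_one()
greyscale, color, n, m = bytes_to_matricies(doc["image_bytes"])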

I have profiled my code with cProfile and found this function to be a big bottleneck. Any way to optimise it would be great. Note that I have compiled most of the project with Cython, which is why you’ll see .pyx files; this hasn’t made much of a difference.

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       72  331.537    4.605  338.226    4.698 cleaner.pyx:154(clean_image)
        1  139.401  139.401  139.401  139.401 {built-in method builtins.input}
      356   31.144    0.087   31.144    0.087 {method 'recv_into' of '_socket.socket' objects}
    11253   15.421    0.001   15.421    0.001 {method 'encode' of 'ImagingEncoder' objects}
      706   10.561    0.015   10.561    0.015 {method 'decode' of 'ImagingDecoder' objects}
       72    5.044    0.070    5.047    0.070 {built-in method scipy.ndimage._ni_label._label}
     7853    0.881    0.000    0.881    0.000 cleaner.pyx:216(is_period)
       72    0.844    0.012    1.266    0.018 cleaner.pyx:349(get_binarized_matrix)
       72    0.802    0.011    0.802    0.011 {method 'convert' of 'ImagingCore' objects}
       72    0.786    0.011   13.167    0.183 cleaner.pyx:57(bytes_to_matricies)

If you are wondering how the images are encoded before being written to MongoDB, here is that code:

def get_encoded_image(filename: str):
    """Binary encodes image.
    """
    image = filesystem_io.read_as_pillow(filename)  # Just reads the file on disk into a Pillow Image object
    stream = io.BytesIO()
    image.save(stream, format='PNG')

    encoded_string = stream.getvalue()
    return encoded_string  # This will be written to MongoDB

Things I have tried:

  1. As mentioned above I tried compiling with Cython
  2. I have tried to use the lycon library but could not see how to load from bytes.
  3. I have tried using Pillow SIMD. It made things slower.
  4. I am able to use multiprocessing, but I want to optimise the function before I parallelise it.

Thank you!

database – Best way to store and fetch image bytes?

I have a case in my startup where I need to fetch images as bytes as fast as possible.

At the moment I’m storing the images in Azure Storage, fetching them on the fly, and caching them in my database as bytes; on future requests the bytes are fetched from the database directly. Fetching the bytes from the database is much faster. However, over time the cache accumulated, and now my database storage is 80% used.

My database is also on Azure, and reserving more space would be expensive, so I’m looking for a better, cheaper way to achieve this.

php – POST Content-Length of 54042524 bytes exceeds the limit of 41943040 bytes

I have a PHP server, built on a microservices architecture, so it works more like an API, but that is not the exact problem. When uploading files to the server, it returns the error

Warning: POST Content-Length of 54042524 bytes exceeds the limit of 41943040 bytes in Unknown on line 0

What I am trying to build is a web page for uploading applications (similar to a Play Store), so we are in the scenario of uploading .exe or .apk files, and that is when I ran into this error. Apparently it is due to the file size.

elseif ($_SERVER['REQUEST_METHOD'] == "POST") {
    header("Content-type: application/json; charset=utf-8");

    check_basic(); // checks that the session exists
    // check that the data is present and not empty.
    if (!isset($_POST['name']) || !isset($_POST['type']) || !isset($_POST['version']) || !isset($_POST['descripcion']) ||
        !isset($_FILES['image']) || !isset($_FILES['app']) || empty($_POST['name']) || empty($_POST['type']) || empty($_POST['version']) ||
        empty($_POST['descripcion']) || empty($_FILES['image']) || empty($_FILES['app'])) {
        echo json_encode(array('message' => "Data incomplete"));
        return http_response_code(403);
    }
//...
}

Below the message I quoted above, I get {"message":"Data incomplete"}, which means it does not get past the first if; that is why I am not posting more code, since everything stops at that first condition.

I have seen this question, which describes the same problem, but it has not received an answer. However, the problem is not necessarily the file size limit itself (which could be raised by modifying php.ini), because, for example, a desktop application can be very large, and in that case it still could not be uploaded to my server.

My actual problem is being able to upload files without a size restriction, so that any application can be uploaded. I do not know whether there is a way to upload a file in chunks. So, how can I avoid an error caused by the size of a file?
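
For reference, the 41943040-byte limit in the warning is exactly 40M (40 × 1024 × 1024), so the server’s php.ini presumably contains something along these lines (values inferred from the error message, not taken from the actual configuration):

; php.ini (assumed)
post_max_size = 40M
upload_max_filesize = 40M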

8 – setError() breaks validation and causes an “Allowed memory size of xxx bytes exhausted” error

I’m creating a simple form with the following code.

<?php

namespace Drupal\my_example_module\Form;

use Drupal\Core\Form\FormBase;
use Drupal\Core\Form\FormStateInterface;

/**
 * Class TestForm.
 */
class TestForm extends FormBase {

  /**
   * {@inheritdoc}
   */
  public function getFormId() {
    return 'test_form';
  }

  /**
   * {@inheritdoc}
   */
  public function buildForm(array $form, FormStateInterface $form_state) {
    $form['name'] = [
      '#type' => 'textfield',
      '#title' => $this->t('Name'),
      '#maxlength' => 64,
      '#size' => 64,
      '#weight' => '0',
    ];

    $form['submit'] = [
      '#type' => 'submit',
      '#value' => $this->t('Submit'),
    ];

    return $form;
  }

  /**
   * {@inheritdoc}
   */
  public function validateForm(array &$form, FormStateInterface $form_state) {
    $form_state->setErrorByName('name', $this->t('Error'));
  }

  /**
   * {@inheritdoc}
   */
  public function submitForm(array &$form, FormStateInterface $form_state) {
    $form_state->setRebuild(true);
  }

}

When I use setError() or setErrorByName(), I get an Allowed memory size of XXX bytes exhausted error when I submit the form.

I get the same result if I use setRebuild().

The logs give me thousands of 404 errors for the form page, as if it were stuck in an infinite loop.

It can happen on any custom form, not specifically this one.

How can I fix this error?

react – Generated APK has zero bytes

Hey folks, I am having trouble generating an APK with React Native.

It builds the project and runs it in the simulator, but the app-release.apk file generated in the android/app/build/outputs/apk folder comes out with zero bytes.


Has anyone run into something similar?

512 bytes missing from micro sd card when in Chipal CF card adapter

Background:
I am on Linux Mint 20.

I just bought a “Chipal” micro-SD to CF memory card adapter. I have several others, bought on other occasions, some SD card to CF, some micro-SD to CF.

Now I have a problem: with this latest purchase, one block is missing when the micro SD card is in the adapter! 512 bytes are missing from the SD card when it is inserted in this adapter!
The normal size for the SD card in my two direct micro-SD card readers is 31116288 blocks (16 GB cards); however, in this odd CF converter it is 31116287 blocks, which is exactly one 512-byte block (0.5 KiB) less.

That makes me fear foul play: could someone have planted BadUSB code in this CF card adapter? Please allay my fears!
I don’t really know what to ask.
On the very first insertion, I also got an error message in dmesg about the FAT partition being too big and some wraparound truncation being done.

Please, any insight or info is appreciated!

Best regards