binary – Why does the BOM consist of two bytes instead of one for example in encoding utf-16

The BOM started out as an encoding trick. It was not part of Unicode; it was something that people discovered and cleverly (ab)used.

Basically, they found the character U+FEFF ZERO WIDTH NO-BREAK SPACE (ZWNBSP). Now, what does a space character that has a width of zero and does not induce a line break do at the very beginning of a document? Well, absolutely nothing! Adding a U+FEFF ZWNBSP to the beginning of your document will not change anything about how that document is rendered.

They also found that the code point U+FFFE (which is what you get if you decode U+FEFF in UTF-16 “the wrong way round”) was not assigned. (0xFFFE0000, which is what you would get from reading UTF-32 the wrong way round, is simply illegal: code points are at most 21 bits long, with a maximum of U+10FFFF.)

So, what this means is that when you add U+FEFF to the beginning of your UTF-16 (or UTF-32) encoded document, then:

  • If you read it back with the correct Byte Order, it does nothing.
  • If you read it back with the wrong Byte Order, it is a non-existing character (or not a code point at all).

Therefore, it allows you to add a code point to the beginning of the document to detect the Byte Order in a way that works 100% of the time and does not alter the meaning of your document. It is also 100% backwards-compatible. In fact, it is more than backwards-compatible: it actually works as designed even with software that doesn’t even know about this trick!
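As a minimal sketch of the detection described above (in Python; the function name `detect_utf16_byte_order` is made up for illustration), a reader can inspect the first two octets. A reader that simply decodes with the wrong byte order sees U+FFFE instead of U+FEFF:

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from a leading BOM, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):  # octets FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # octets FF FE
        return "little-endian"
    return "unknown"  # no BOM: the byte order must be known some other way


# U+FEFF followed by "hi", written out big-endian.
document = "\ufeffhi".encode("utf-16-be")

byte_order = detect_utf16_byte_order(document)

# Decoding with the correct byte order yields U+FEFF as the first code point,
right = document.decode("utf-16-be")
# while the wrong byte order yields the noncharacter U+FFFE first.
wrong = document.decode("utf-16-le")
```

This sketch only distinguishes the two UTF-16 byte orders. A real detector would check for the four-octet UTF-32 marks first, because the UTF-32 little-endian mark starts with the same two octets as the UTF-16 little-endian one.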

It was only later, after this trick had been widely used for many years, that the Unicode Consortium made it official, in three ways:

  • They explicitly specified that the code point U+FEFF as the first code point of a document is a Byte Order Mark. When not at the beginning of the document, it is still a ZWNBSP.
  • They explicitly specified that U+FFFE will never be assigned, and is reserved for Byte Order Detection.
  • They deprecated U+FEFF ZWNBSP in favor of U+2060 Word Joiner. New documents that have a U+FEFF somewhere in the document other than as the first code point should no longer be created. U+2060 should be used instead.

So, the reason why the Byte Order Mark is a valid Unicode character and not some kind of special flag, is that it was introduced with maximum backwards-compatibility as a clever hack: adding a BOM to a document will never change it, and you don’t need to do anything to add BOM detection to existing software: if it can open the document at all, the byte order is correct, and if the byte order is incorrect, it is guaranteed to fail.

If, on the other hand, you try to add a special one-octet signature, then all UTF-16 and UTF-32 reader software in the entire world has to be updated to recognize and process this signature. If the software does not know about this signature, it will simply try to decode the signature as the first octet of the first code point, and the first octet of the first code point as the second octet of the first code point, and then decode the entire rest of the document shifted by one octet. In other words: adding such a signature would completely destroy any document, unless every single piece of software in the entire world that deals with Unicode is updated before the first document with the signature gets produced.
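The shift can be illustrated with a small Python sketch. The one-octet signature `0x01` used here is purely hypothetical; no such signature exists in Unicode.

```python
# Hypothetical one-octet signature, invented for this illustration only.
SIGNATURE = b"\x01"

text = "hi".encode("utf-16-be")  # octets 00 68 00 69
signed = SIGNATURE + text        # octets 01 00 68 00 69

# A reader that does not know about the signature pairs up octets starting
# at the signature, so every code unit is built from the wrong two octets.
shifted = signed[:4].decode("utf-16-be")  # code units 0x0100 and 0x6800

# With U+FEFF the octet pairs stay aligned: an unaware reader just sees one
# extra invisible character in front of the unchanged text.
with_bom = "\ufeffhi".encode("utf-16-be").decode("utf-16-be")
```

Only the first four octets are decoded in the shifted case because the fifth octet no longer has a partner, which a strict decoder would reject.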

However, going back to the very beginning, and to your original question:

Why does the BOM consist of two bytes

It seems that you have a fundamental misunderstanding here: the BOM does not consist of two bytes. It consists of one character.

It’s just that in UTF-16, each code point gets encoded as two octets. (To be fully precise: a byte does not have to be 8 bits wide, so we should talk about octets here, not bytes.) Note that in UTF-32, for example, the BOM is not 2 octets but 4 (0000FEFF or FFFE0000), again, because that’s just how code points are encoded in UTF-32.
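To see that the number of octets depends only on the encoding, here is a short Python sketch that encodes the single character U+FEFF in several encoding forms (the dictionary name `bom_octets` is just illustrative):

```python
bom = "\ufeff"  # one character, one code point

# Encode the same single code point in different encoding forms and show
# the resulting octets in hexadecimal.
bom_octets = {
    name: bom.encode(name).hex(" ")
    for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
}

for name, octets in bom_octets.items():
    print(f"{name:10} {octets}")
```

The explicit `-be` and `-le` codec names are used so that Python does not add a byte order mark of its own.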

encoding – Convert PathBuf to file URL in Rust

I’ve implemented a custom trait on std::path::PathBuf with two methods, one which turns a PathBuf into a file url and one which constructs a PathBuf from a file url.

I would love some feedback on the code, especially on any violations of naming conventions or unnecessary allocations; I’m pretty new to Rust.

Also (of course) if there’s some edge case I’m failing to consider please let me know!

use std::error::Error;
use std::fmt;
use std::path::PathBuf;
use std::string::FromUtf8Error;

use lazy_static::lazy_static;
use regex::Regex;
use urlencoding::{decode, encode};

lazy_static! {
    // We don't want to percent encode the colon on a Windows drive letter.
    static ref WINDOWS_DRIVE: Regex = Regex::new(r"[a-zA-Z]:").unwrap();
    static ref SEPARATOR: Regex = Regex::new(r"[/\\]").unwrap();
}

#[derive(Debug)]
pub struct UTFDecodeError {
    details: String,
}

impl UTFDecodeError {
    fn new(msg: &str) -> UTFDecodeError {
        UTFDecodeError {
            details: msg.to_string(),
        }
    }
}

impl fmt::Display for UTFDecodeError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.details)
    }
}

impl Error for UTFDecodeError {
    fn description(&self) -> &str {
        &self.details
    }
}

pub fn encode_file_component(path_part: &str) -> String {
    // If it's a separator char or a Windows drive
    // return as-is.
    if SEPARATOR.is_match(path_part) || WINDOWS_DRIVE.is_match(path_part) {
        path_part.to_owned()
    } else {
        encode(path_part).to_string()
    }
}

pub trait PathBufUrlExt {
    fn to_file_url(&self) -> Result<String, UTFDecodeError>;
    fn from_file_url(file_url: &str) -> Result<PathBuf, FromUtf8Error>;
}

impl PathBufUrlExt for PathBuf {
    fn to_file_url(&self) -> Result<String, UTFDecodeError> {
        let path_parts: Result<PathBuf, UTFDecodeError> = self
            .components()
            .map(|part| match part.as_os_str().to_str() {
                Some(part) => Ok(encode_file_component(part)),
                None => Err(UTFDecodeError::new("File path not UTF-8 compatible!")),
            })
            .collect();

        match path_parts {
            // Unwrap shouldn't fail here since everything should be properly encoded.
            Ok(parts) => Ok(format!("file://{}", parts.to_str().unwrap())),
            Err(e) => Err(e),
        }
    }

    fn from_file_url(file_url: &str) -> Result<PathBuf, FromUtf8Error> {
        let without_prefix = file_url;
        let res: Result<Vec<String>, FromUtf8Error> = SEPARATOR.split(without_prefix)
            .enumerate()
            .map(|(i, url_piece)| {
                if i == 0 && url_piece == "file:" {
                    // File url should always be abspath
                    Ok("/".to_owned())
                } else {
                    let s = decode(url_piece);
                    match s {
                        Ok(cow) => Ok(cow.into_owned()),
                        Err(e) => Err(e),
                    }
                }
            })
            .collect();

        match res {
            Ok(parts) => Ok(parts.iter().collect::<PathBuf>()),
            Err(e) => Err(e),
        }
    }
}

#[cfg(test)]
mod tests {
    use std::path::PathBuf;
    use crate::PathBufUrlExt;

    #[test]
    fn basic_pathbuf_to_url() {
        let p = PathBuf::from("/some/file.txt");
        let url = p.to_file_url().unwrap();
        let s = url.as_str();
        assert_eq!(s, "file:///some/file.txt");
    }

    #[test]
    fn oddball_pathbuf_to_url() {
        let p = PathBuf::from("/gi>/some & what.whtvr");
        let url = p.to_file_url().unwrap();
        let s = url.as_str();
        assert_eq!(s, "file:///gi%3E/some%20%26%20what.whtvr");
    }

    #[cfg(target_os = "windows")]
    #[test]
    fn windows_pathbuf_to_url() {
        let p = PathBuf::from(r"c:\WINDOWS\clock.avi");
        let url = p.to_file_url().unwrap();
        let s = url.as_str();
        assert_eq!(s, "file:///c:/WINDOWS/clock.avi");
    }

    #[test]
    fn basic_pathbuf_from_url() {
        let one = PathBuf::from("/some/file.txt");
        let two = PathBuf::from_file_url("file:///some/file.txt").unwrap();
        assert_eq!(one, two);
    }

    #[test]
    fn oddball_pathbuf_from_url() {
        let one = PathBuf::from_file_url("file:///gi%3E/some%20%26%20what.whtvr").unwrap();
        let two = PathBuf::from("/gi>/some & what.whtvr");
        assert_eq!(one, two);
    }
}


computability – Necessity of encoding for certain models of computation

Consider the following model of computation (from here).
FRACTRAN

Although Fractran is Turing-complete, it assumes that the “user” is able to perform the steps of encoding the input ($2^{n + 1}$) and decoding the output ($\log_3(x) - 1$). That is, as mentioned by the second bullet, there are computable functions that cannot be computed by Fractran without an external encoder/decoder.
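To make the role of the external encoder and decoder concrete, here is a minimal Python sketch of a Fractran interpreter. It uses the classic one-fraction addition program $3/2$, with the input encoded as $2^a 3^b$ and the output read off as the exponent of $3$. That encoding is an assumption chosen for this example and differs from the $2^{n+1}$ / $\log_3(x) - 1$ convention quoted above; the function names are illustrative.

```python
from fractions import Fraction


def run_fractran(program, n, max_steps=10_000):
    """Run a Fractran program (a list of Fractions) on the integer n."""
    for _ in range(max_steps):
        for fraction in program:
            product = n * fraction
            if product.denominator == 1:  # first fraction giving an integer
                n = product.numerator
                break
        else:
            return n  # no fraction applies: the program halts
    raise RuntimeError("step limit reached")


def exponent_of(prime, n):
    """Decoder: how many times prime divides n."""
    exponent = 0
    while n % prime == 0:
        n //= prime
        exponent += 1
    return exponent


ADD = [Fraction(3, 2)]  # moves every factor 2 into a factor 3

a, b = 3, 4
encoded_input = 2**a * 3**b                    # encoding step, done outside Fractran
raw_output = run_fractran(ADD, encoded_input)  # the only step Fractran performs
decoded_output = exponent_of(3, raw_output)    # decoding step, done outside Fractran
```

The interpreter itself never sees the numbers 3, 4, or 7; it only sees 648 and produces 2187, which is exactly the gap the question is asking about.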

Is there any theory about this topic? I want to be able to characterize whether a computational model is “actually” Turing-complete, in that it can “natively” represent every computable function on the natural numbers, or if it presupposes the existence of an external encoder and decoder.

I would appreciate any terminology or references, since I’m not even sure what to Google in this case.

Is this a valid encoding of a tree structure using set theory and a valid way to extract the leaves from it?

I’m looking to formally define a tree and then extract the leaves from it in a concise way.
Does this look ok?
What is the best way of doing this?

$
Y = \{a,b,c,d,e,f,g\} \\
R = \{a \mapsto b, a \mapsto d, d \mapsto e, d \mapsto f, f \mapsto g\} \text{, where $R$ is a relation on $Y$.} \\
R^+ = \{a \mapsto b, a \mapsto d, d \mapsto e, d \mapsto f, a \mapsto e, a \mapsto f, a \mapsto g\} \text{, where $R^+$ is the transitive closure of $R$.} \\
leaves = \{x \in \operatorname{range}(R^+) \mid R(x) \notin \operatorname{dom}(R^+)\}
$
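One way to sanity-check a definition like this is to compute it. The Python sketch below is only one possible reading of the intent: it treats $R$ as a set of (parent, child) pairs and takes the leaves to be the elements that occur as a child but never as a parent. The function names are illustrative.

```python
def transitive_closure(relation):
    """Smallest transitive relation containing the given set of pairs."""
    closure = set(relation)
    while True:
        # Compose the closure with itself: x -> y and y -> w give x -> w.
        new_pairs = {
            (x, w) for (x, y) in closure for (z, w) in closure if y == z
        }
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def leaves(relation):
    """Elements that appear as a child but never as a parent."""
    domain = {parent for parent, _ in relation}
    value_range = {child for _, child in relation}
    return value_range - domain


R = {("a", "b"), ("a", "d"), ("d", "e"), ("d", "f"), ("f", "g")}
R_plus = transitive_closure(R)
tree_leaves = leaves(R)
```

Under this reading the leaves are $b$, $e$ and $g$, and the transitive closure is not needed to find them, since the domain of $R^+$ equals the domain of $R$. The computed closure has nine pairs: it also contains $f \mapsto g$ and $d \mapsto g$. The isolated element $c$ is in neither the domain nor the range, so it is not reported as a leaf.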

web application – Exploit an XSS without and valid encoding

I’m performing a web penetration test and trying to exploit an XSS. I’ve noticed that if I use a payload like the following:

<script>alert(document.domain)</script>

The output in the page will be

&lt;script&gt;alert(document.domain)&lt;/script&gt;
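This output is consistent with the application HTML-escaping the input before writing it into the page. A minimal sketch of that kind of output encoding, using Python's standard `html.escape` as a stand-in (the application's actual implementation is unknown):

```python
import html

payload = "<script>alert(document.domain)</script>"

# html.escape replaces &, <, > (and, by default, quotes) with character
# references, so the browser renders the payload as text instead of markup.
escaped = html.escape(payload)
print(escaped)
```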

So I’m looking for alternative payloads or a valid encoding. I have already tried these encodings:

http://evuln.com/tools/xss-encoder/

Does anyone have an idea how to exploit this XSS, or is it possible to exploit an XSS without using </> ?

Change Azure encoding for PostgreSQL to SQL_ASCII

I need to create a database with the SQL_ASCII encoding on my Azure PostgreSQL server, however, it doesn’t allow me to.

Here is the SQL I attempt to execute:

CREATE DATABASE sipis OWNER sipis ENCODING 'SQL_ASCII' LC_COLLATE 'C' LC_CTYPE 'C';

And I get the following errors:

[2021-08-09 20:01:44] [22023] ERROR: new encoding (SQL_ASCII) is incompatible with the encoding of the template database (UTF8)
[2021-08-09 20:01:44] Hint: Use the same encoding as in the template database, or use template0 as template.

I have tried changing the client_encoding on the Azure portal to no avail (it still results in the exact same error).

Is this even possible? The following is how it can be done on a local server installed with PostgreSQL:

UPDATE pg_database SET datistemplate = FALSE WHERE datname = 'template1';
DROP DATABASE template1;
CREATE DATABASE template1 WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' LC_COLLATE 'C' LC_CTYPE 'C';
UPDATE pg_database SET datistemplate = TRUE WHERE datname = 'template1';

algorithms – If in a given text the frequency of the letter A is 0.5, then the number of bits encoding A in the Huffman code for the text is 1

Suppose that no symbol has frequency $0$ (otherwise the claim is false).

Consider the tree $T$ built by the standard greedy algorithm to construct the Huffman code. This algorithm maintains a forest $F$ where each node $v$ is associated with a frequency $f_v$. Initially $F$ contains a collection of isolated vertices, one per input symbol (with the corresponding frequencies).
Then the algorithm greedily selects the two trees $T_1, T_2 \in F$ rooted in the vertices with minimum frequencies and replaces them with the tree obtained by merging $T_1$ and $T_2$ into a single tree via the addition of a new root $r$. The frequency $f_r$ of $r$ is the sum of the frequencies of the roots of $T_1$ and $T_2$.

Let $a$ be the node corresponding to symbol $A$ and suppose towards a contradiction that the depth of $a$ in $T$ is at least $2$.
Let $a'$ and $b$ be the parent and the sibling of $a$ in $T$, respectively.
Since $a'$ cannot be the root (it has depth $\ge 1$), it must have a sibling $x$. Some vertex $y$ of the subtree of $T$ rooted in $x$ was the root of a tree in $F$ when the isolated vertex $a$ was merged. Therefore the frequency of $y$ is at least the frequency of $a$ (otherwise either $a$ or $b$ would have been merged with $y$ instead).

This is a contradiction, since the frequencies of disjoint subtrees sum to at most $1$, but $f_a + f_b + f_y \ge 0.5 + f_b + 0.5 = 1 + f_b > 1$.
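The claim can be checked on a small instance with a sketch of the greedy construction in Python. The symbol counts below are made up for illustration; a frequency of 0.5 is modelled as 50 occurrences out of 100 to avoid floating-point ties, and a running counter breaks ties between equal weights.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: codeword length} for a dict of positive weights."""
    tiebreak = count()  # unique second key, so the dicts are never compared
    # Each heap entry is (weight of root, tiebreak, {symbol: depth in tree}).
    forest = [(w, next(tiebreak), {symbol: 0}) for symbol, w in weights.items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        # Greedy step: merge the two trees whose roots have minimum weight.
        w1, _, depths1 = heapq.heappop(forest)
        w2, _, depths2 = heapq.heappop(forest)
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(forest, (w1 + w2, next(tiebreak), merged))
    return forest[0][2]


# Hypothetical text of 100 letters in which A accounts for half of them.
counts = {"A": 50, "B": 20, "C": 15, "D": 15}
lengths = huffman_code_lengths(counts)
```

Here `A` ends up at depth 1, in line with the argument above, while the rarer symbols receive longer codewords.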

Encoding problems in drush 7 running on Windows

I am not very familiar with PHP, Drupal and drush but I inherited a site running on Windows Server 2019 (in the process of being migrated from Ubuntu). When I want to download some webforms using drush wfx, special characters like German “Umlaute” ä, ö and ü in the webform contents are not correctly displayed.

I am running Drupal 7.81 and using drush 7.0.0. I also tried to specifically set the encoding to UTF-8 in the config file drushrc.php (even though the description in the file says it should not be necessary because drush will use UTF-8 anyway). In the browser, the characters in the webform show up correctly.

The database character set is utf8.

Any idea what to try and how to troubleshoot?

PS > Get-CimInstance -ClassName Win32_OperatingSystem | fl Caption,Version

Caption : Microsoft Windows Server 2019 Standard
Version : 10.0.17763
PS C:\xampp\mysql\bin> .\mysql.exe -uroot
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 7405
Server version: 10.1.38-MariaDB mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SELECT @@character_set_database,@@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| latin1                   | latin1_swedish_ci    |
+--------------------------+----------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> USE drupal;
Database changed
MariaDB [drupal]> SELECT @@character_set_database,@@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)
PS C:\xampp\htdocs\onboarding> drush version
 Drush Version   :  7.0.0

PS C:\xampp\htdocs\onboarding> drush status
 Drupal version                  :  7.81
 Site URI                        :  http://default
 Database driver                 :  mysql
 Database hostname               :  localhost
 Database port                   :
 Database username               :  drupal
 Database name                   :  drupal
 Drupal bootstrap                :  Successful
 Drupal user                     :
 Default theme                   :  garland
 Administration theme            :  seven
 PHP executable                  :  php.exe
 PHP configuration               :  C:\drush\php\php.ini
 PHP OS                          :  WINNT
 Drush script                    :  C:\drush\vendor\drush\drush\drush.php
 Drush version                   :  7.0.0
 Drush temp directory            :  C:\Users\%USERNAME%\AppData\Local\Temp\13
 Drush configuration             :  C:\ProgramData\drush\etc\drush\drushrc.php
 Drush alias files               :
 Install profile                 :  standard
 Drupal root                     :  C:\xampp\htdocs\onboarding
 Site path                       :  sites/default
 File directory path             :  sites/default/files
 Temporary file directory path   :  /tmp

0DayHost.com 2x E5-2678 v3/AMD Ryzen 3700X, NVMe, SSD, Streaming, Encoding 1/4/10Gbps

Hello WHT,

We are presenting cheap 1Gbps, 4Gbps, and 10Gbps RDP plans at supreme speeds.

Server Location: Amsterdam, Netherlands (NL)

Regular RDP 4Gbps, 10Gbps Plans:

CPU | RAM | Storage | Bandwidth | Price
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 200GB SAS | 10Gbit @ Unlimited | $30.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 300GB SAS | 10Gbit @ Unlimited | $40.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 500GB SAS | 10Gbit @ Unlimited | $50.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 500GB HDD | 4Gbit @ Unlimited | $15.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 750GB HDD | 4Gbit @ Unlimited | $20.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 1TB HDD | 4Gbit @ Unlimited | $25.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 1.5TB HDD | 4Gbit @ Unlimited | $35.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 2TB HDD | 4Gbit @ Unlimited | $40.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 4TB HDD | 4Gbit @ Unlimited | $60.00 p/m

SSD RDP 4Gbps, 10Gbps Plans:

CPU | RAM | Storage | Bandwidth | Price
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 100GB SSD | 10Gbit @ Unlimited | $30.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 200GB SSD | 10Gbit @ Unlimited | $40.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 300GB SSD | 10Gbit @ Unlimited | $50.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 500GB SSD | 10Gbit @ Unlimited | $60.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 930GB SSD | 10Gbit @ Unlimited | $70.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 200GB SSD | 4Gbit @ Unlimited | $15.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 250GB SSD | 4Gbit @ Unlimited | $20.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 300GB SSD | 4Gbit @ Unlimited | $25.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 400GB SSD | 4Gbit @ Unlimited | $30.00 p/m

NVMe RDP 4Gbps, 10Gbps Plans:

CPU | RAM | Storage | Bandwidth | Price
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 200GB NVMe | 10Gbit @ Unlimited | $45.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 300GB NVMe | 10Gbit @ Unlimited | $55.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 450GB NVMe | 10Gbit @ Unlimited | $60.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 930GB NVMe | 10Gbit @ Unlimited | $75.00 p/m
AMD Ryzen 7 3700X | 32GB DDR4 | 232GB NVMe | 4Gbit @ Unlimited | $25.00 p/m

10Gbps Encoding RDP Plans:

CPU | RAM | Storage | Bandwidth | Price
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 500GB SAS | 10Gbit @ Unlimited | $35.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 750GB SAS | 10Gbit @ Unlimited | $40.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 2TB SAS | 10Gbit @ Unlimited | $50.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 300GB SSD | 10Gbit @ Unlimited | $40.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 400GB SSD | 10Gbit @ Unlimited | $50.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 300GB NVMe | 10Gbit @ Unlimited | $50.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 500GB SSD | 10Gbit @ Unlimited | $60.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 500GB NVMe | 10Gbit @ Unlimited | $65.00 p/m
2x Intel Xeon E5-2678 v3 | 128GB DDR4 | 930GB SSD | 10Gbit @ Unlimited | $70.00 p/m

Streaming RDP Plans:

CPU | RAM | Storage | GPU | Bandwidth | Price
2x Intel Xeon Silver 4208 | 64GB DDR4 | 50GB SSD | RTX 4000 | 1Gbit @ Unlimited | $20.00 p/m
2x Intel Xeon Silver 4208 | 64GB DDR4 | 100GB SSD | RTX 4000 | 1Gbit @ Unlimited | $25.00 p/m
2x Intel Xeon Silver 4208 | 64GB DDR4 | 150GB SSD | RTX 4000 | 1Gbit @ Unlimited | $30.00 p/m

You can create a ticket for custom plans.

We also provide a demo so you can check speeds and anything else you are interested in: https://www.webhostingtalk.com/

Payment Methods

Credit/Debit Cards, PayPal, Webmoney, Bitcoin & Altcoins, Perfect Money Also Accepted

https://0dayhost.com

Regards
0DayHost

Prevent Hydra from URL encoding spaces

I’m having some problems with JSON requests in Hydra. I have a wordlist which consists of lots of lines; each with 2 words separated by a space. I run Hydra using the following command:

sudo hydra -l "" -P wordlist.txt example.com http-post-form "/api/:{"password":"^PASS^"}:"success":false}:H=Accept: */*:H=Accept-encoding: gzip, deflate, br:H=Accept-Language: en-US,en;q=0.9:H=Content-Type: application/json"

When looking at the request body in Wireshark, it appears like this:

{"password":"example%20words"}

When what I would hope to see is:

{"password":"example words"}

I have looked around for a little bit, and can’t seem to find a way to disable this. Hope someone can help me out, thanks.