file systems: force ZFS to ignore checksum errors without removing the offending disk

Please approach this question with a sense of humor, and don't just downvote it because it's a bad idea. Sometimes (very rarely!) a user is totally fine with data loss and just needs help loading his gun! After all, ZFS offers other benefits beyond data integrity, and I'd rather use them on my defective drives than ext4. If you're the type of system administrator who reads this with a sly smile, remembering the moment you lost data doing exactly this, this question is for you.

I am running a pool of USB drives on a non-critical server with non-critical data, and I do not care if it gets corrupted. I am trying to configure ZFS so that it does not forcibly remove USB drives when they hit checksum errors (the same way ext4 or FAT handle this scenario: without noticing or worrying about data loss).

Disclaimer:

For readers who arrive here through Google trying to fix their ZFS pool: do not try anything described in this question or its answers. You will lose your data.

Because the ZFS police love to shout at people who use USB drives
or have any other non-standard configuration: for the sake of this discussion,
assume these are cat videos that I have backed up in 32 other physically
remote locations on 128 redundant SSDs. I fully recognize that 100% of
my data on this pool is unrecoverable (many times over) if I try this.
I am addressing this question to people who are curious about
just how bad an environment ZFS is capable of running in (people
who like to push systems to their breaking points and beyond, just for
fun).

So here is the configuration:

  • HP EliteDesk server running FreeNAS-11.2-U5
  • 2x WD Elements 8TB drives connected via USB 3.0
  • An unreliable power environment: the server and the drives are often reset/disconnected without warning. (Yes, I have a UPS. No, I do not want to use it; I want to break this server. Didn't you read the disclaimer?)
  • a mirrored pool named hdd holding the two drives (with failmode=continue set)
  • one drive is stable; even after multiple reboots and forced disconnections, it never seems to report checksum errors or any other problem in ZFS
  • one drive is unreliable, with occasional checksum errors during normal operation (even when it is not unexpectedly disconnected); the errors appear to be unrelated to the bad power environment, as it will work fine for 10+ hours and then suddenly get kicked out of the pool due to checksum errors

I have confirmed that the unreliable drive's problem is due to a software or hardware issue with the USB bus on the server, and not a flaky cable or a physical problem with the drive itself. The way I confirmed it: I connected it to my MacBook (whose USB ports are in good condition), zeroed it, then wrote random data across the whole drive and verified it (done 3 times, 100% success each time). The drive is almost new, with no SMART indicators below 100% health. However, even if the drive were failing gradually and losing a few bits here and there, I'm fine with that.

Here is the problem:

When the defective drive hits checksum errors, ZFS removes it from the pool. Unfortunately, FreeNAS does not let me re-add it to the pool without physically rebooting, or disconnecting and reconnecting both the USB cable and the power supply. This means I cannot schedule the re-add process or do it remotely without restarting the whole server; I would have to be physically present to unplug things, or wire up an Internet-connected Arduino with a relay on both cables.

Possible solutions

I have already done some research into whether this kind of thing is possible, and it has been difficult, because every time I find a relevant thread, the data-integrity police intervene and convince the asker to abandon their unreliable configuration instead of ignoring the errors or working around them. I am resorting to asking here because I have not been able to find documentation or other answers on how to achieve this.

  • turning off checksums completely with zfs set checksum=off hdd; I have not done this yet because, ideally, I would like to keep the checks so I know when the drive is misbehaving, I just want to ignore the failures
  • a flag that keeps checksumming but ignores checksum errors / tries to repair them without removing the drive from the pool
  • a ZFS flag that raises the maximum number of checksum errors allowed before the drive is removed (currently, the drive gets kicked after roughly ~13 errors)
  • a FreeBSD/FreeNAS command that lets me force the device online after it has been removed, without having to restart the entire server
  • a FreeBSD/FreeNAS kernel option to force this drive to never be removed
  • a FreeBSD sysctl option that magically solves the USB bus problem causing errors/timeouts on this drive only (unlikely)
  • a ZFS on Linux option that does the same (I'd be willing to move these drives to my Ubuntu box if I knew it were possible there)
  • running zpool clear hdd in a loop every 500 ms to wipe checksum errors before they reach the threshold
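The last two ideas in the list above can be sketched as a shell loop. This is hypothetical and untested on FreeNAS; the pool name `hdd` and device `da4` come from this question, and running it is exactly the kind of thing the disclaimer warns about:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep clearing ZFS error counters so the ~13-error
# removal threshold is never reached. "hdd" and "da4" are from the question.
clear_errors() {
    zpool online hdd da4 2>/dev/null   # re-attach the device if already kicked out
    zpool clear hdd                    # zero the pool's error counters
}

if command -v zpool >/dev/null; then
    while sleep 0.5; do clear_errors; done   # in real use, run under tmux or a service
else
    echo "zpool not found; this sketch needs a ZFS host"
fi
```

Note that `zpool online` only helps if the vdev is merely FAULTED; once the USB device node itself is gone (as in the dmesg output below), there is nothing for ZFS to reopen until the bus re-enumerates.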

I'm really trying to avoid resorting to ext4 or another file system that doesn't forcibly remove drives after USB errors, because I want all the other ZFS features like snapshots, datasets, send/recv, etc. I'm simply trying to disable the data-integrity enforcement.

Relevant logs

This is the dmesg output every time the drive misbehaves and is removed:

Jul 7 04:10:35 freenas-lemon ZFS: vdev state changed, pool_guid=13427464797767151426 vdev_guid=11823196300981694957
Jul 7 04:10:35 freenas-lemon ugen0.8: at usbus0 (disconnected)
Jul 7 04:10:35 freenas-lemon umass4: at uhub2, port 20, addr 7 (disconnected)
Jul 7 04:10:35 freenas-lemon da4 at umass-sim4 bus 4 scbus7 target 0 lun 0
Jul 7 04:10:35 freenas-lemon da4: s/n 5641474A4D56574C detached
Jul 7 04:10:35 freenas-lemon (da4:umass-sim4:4:0:0): Periph destroyed
Jul 7 04:10:35 freenas-lemon umass4: detached
Jul 7 04:10:46 freenas-lemon usbd_req_re_enumerate: addr=9, set address failed! (USB_ERR_IOERROR, ignored)
Jul 7 04:10:52 freenas-lemon usbd_setup_device_desc: getting device descriptor at addr 9 failed, USB_ERR_TIMEOUT
Jul 7 04:10:52 freenas-lemon usbd_req_re_enumerate: addr=9, set address failed! (USB_ERR_IOERROR, ignored)
Jul 7 04:10:58 freenas-lemon usbd_setup_device_desc: getting device descriptor at addr 9 failed, USB_ERR_TIMEOUT
Jul 7 04:10:58 freenas-lemon usb_alloc_device: set config index 0 failed, error=USB_ERR_TIMEOUT, port 20, addr 9 (ignored)
Jul 7 04:10:58 freenas-lemon ugen0.8: at usbus0
Jul 7 04:10:58 freenas-lemon ugen0.8: at usbus0 (disconnected)

Can I get the CRC32C checksum of a content range when using HTTP Range GET requests to fetch part of an object from Google Cloud Storage?

When I want to get a partial range of a file's content in Google Cloud Storage, I use the XML API with an HTTP Range header. In the response from Google Cloud, I can find the x-goog-hash header, which contains CRC32C and MD5 checksums. But those checksums are calculated over the entire file. What I need is the CRC32C checksum of the partial content range in the response. With that partial CRC32C checksum, I can verify the data in the response; otherwise, I cannot validate the response at all.
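To see why the whole-object hash can't validate a range, here is a local sketch. The file paths and sample string are placeholders of mine, and `cksum` (plain CRC-32) stands in for CRC32C; the point is that a checksum over the whole object says nothing about any sub-range of it:

```shell
#!/usr/bin/env bash
# A checksum of the whole object cannot verify a byte range of it.
printf 'The quick brown fox jumps over the lazy dog\n' > /tmp/object.bin

whole=$(cksum /tmp/object.bin | awk '{print $1}')
# Slice bytes 4-8 ("quick"), the way a "Range: bytes=4-8" response would.
dd if=/tmp/object.bin of=/tmp/range.bin bs=1 skip=4 count=5 2>/dev/null
range=$(cksum /tmp/range.bin | awk '{print $1}')

echo "whole=$whole range=$range"
[ "$whole" != "$range" ] && echo "whole-object hash cannot verify the range"
```

As far as I know, the x-goog-hash values on a range response still describe the whole object, so you would have to hash the downloaded bytes yourself and compare them against per-chunk checksums you recorded at upload time.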


kubernetes – Compaction error: Corruption: block checksum mismatch: expected 862584094, got 1969278739

Currently trying to deal with this over my 3-day holiday weekend :D

  • Ceph 13.2.4 (Filestore)
  • Rook 0.9
  • Kubernetes 1.14.1

https://gist.github.com/sfxworks/ce77473a93b96570af319120e74535ec

My setup is a Kubernetes cluster with Rook managing Ceph. Running 13.2.4, I have this problem with one of my OSDs constantly restarting. It started recently; there was no power failure or anything else at the node.

The only thing I could find that might help was https://tracker.ceph.com/issues/21303, but that one seems to be a year old. I'm not sure where to start with this. Any hint, pointer to documentation to follow, or a solution if you have one, would be a great help. I see some tools for Bluestore, but I do not know whether they are applicable, and I want to be very careful given the situation.

In the worst case, I have backups. Willing to try things within reason.

sql server 2012 – Will I be notified that BACKUP encountered a CHECKSUM error if I use CONTINUE_AFTER_ERROR?

I am in the process of adding the WITH CHECKSUM flag to our daily SQL backups as part of an effort to better guarantee data integrity.

I definitely want to know if there is ever a checksum error, but I do not want my job to stop dead in the water in the middle of the night; I want it to finish backing up the "bad" database and then continue backing up the other databases on the server.

If I use BACKUP WITH CHECKSUM, CONTINUE_AFTER_ERROR, will it still throw the appropriate error (severity 22, or error 825, or whatever) that will trigger my associated alert? Or does CONTINUE_AFTER_ERROR suppress this completely, so that I would only learn about the problem by parsing the output of the job step?

I would just try it, but I do not have a database with known CHECKSUM inconsistencies.

validation: checksum hash verification functions for bash scripts

Just a small script, or rather a template version of one that I used to submit an assignment.
I think everything works well; I just wanted to show it to other people and get feedback, maybe make some improvements. I have several variations of the main check_hash function. I have included two of them here, and that is the main element on which I would like feedback. Not necessarily just for this single script, but more to help me design/write better scripts in the future.

I could not really decide whether I should hardcode the variables in place (like variation #1), or make the functions more modular and have them take arguments (like variation #2). You can have static values like the ones I have here, or you could read values into the script or accept positional parameters, adapting them to your liking.

Some bits are commented out. I only used them while testing bits and pieces, and figured I would leave them in for quick debugging in the future. You probably want to know about md5 vs md5sum: I was taking a class in Unix & C programming, where we used a lot of different UNIX-based systems. Some (macOS especially) do not ship a native md5sum utility, but have a similar utility called md5. So that's all that is.

It is not really meant to be any kind of security/protection mechanism; it is simply there to verify the integrity of the file and make sure it is not corrupted.

#!/usr/bin/env bash

directory='2CvAwTx'
tarball="$directory.tar.gz"
md5='135c72bc1e201819941072fcea882d6f'
sha='8e96fd35806cd008fe79732edba00908fcbeff'

Variation #1

############################################################################
check_hash ()
{
    ########################################################################
    check_md5 ()
    {
        if [[ "$(command -v md5)" ]];
        then [[ "$(md5 ./"$tarball" | awk '{print $4}')" == "$md5" ]];
        elif [[ "$(command -v md5sum)" ]];
        then [[ "$(md5sum ./"$tarball" | awk '{print $1}')" == "$md5" ]];
        fi
    }
    ########################################################################
    check_sha ()
    {
        [[ "$(shasum ./"$tarball" | awk '{print $1}')" == "$sha" ]];
    }
    ########################################################################
    check_md5 "$@" ||
    check_sha "$@";
}
############################################################################
# check_hash &&
# printf '%s\n' 'true' ||
# printf '%s\n' 'false';

Variation #2

############################################################################
check_hash ()
{
    ########################################################################
    check_md5 ()
    {
        if [[ "$(command -v md5)" ]];
        then read -r hash _ < <(md5 -q "$1");
            [[ $hash == "$2" ]];
        elif [[ "$(command -v md5sum)" ]];
        then read -r hash _ < <(md5sum "$1");
            [[ $hash == "$2" ]];
        fi
    }
    ########################################################################
    check_sha ()
    {
        read -r hash _ < <(shasum "$1");
        [[ $hash == "$2" ]];
    }
    ########################################################################
    check_md5 "$@" ||
    check_sha "$@";
}
############################################################################
# check_hash ./"$tarball" "$md5" ||
# check_hash ./"$tarball" "$sha" &&
# printf '%s\n' 'true' ||
# printf '%s\n' 'false';

wget "http://www.example.com/$tarball" &&
check_hash "$tarball" "$md5" &&
tar -xzf "$tarball" &&
cd "$directory" &&
make && make clean &&
./a.out
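One way to sanity-check the variation-#2 style without the real tarball is to hash a throwaway file. This is a sketch of mine, not part of the assignment; the /tmp path and sample string are placeholders, and it assumes md5sum is available (falling back to md5 as the script does):

```shell
#!/usr/bin/env bash
# Exercise the argument-taking variation against a temp file.
check_md5_v2 ()
{
    if [[ "$(command -v md5sum)" ]];
    then read -r hash _ < <(md5sum "$1");
    elif [[ "$(command -v md5)" ]];
    then hash=$(md5 -q "$1");
    fi
    [[ $hash == "$2" ]]
}

printf 'hello\n' > /tmp/hashdemo.txt
good=$(md5sum /tmp/hashdemo.txt 2>/dev/null | awk '{print $1}')
check_md5_v2 /tmp/hashdemo.txt "$good"  && echo "match accepted"
check_md5_v2 /tmp/hashdemo.txt deadbeef || echo "mismatch rejected"
```

The `< <(...)` process substitution (not `<< (...)`) is what feeds the hasher's output into `read`; that spacing matters in bash.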

Why is it so slow to set the UUID on a checksummed file system, and should I worry?

I have cloned a drive and am now trying to change the UUID of the clone. There are several answers on AskUbuntu about how to do this. They told me to run e2fsck again first, which I did. But when I ran tune2fs, I received this message unexpectedly:

Setting the UUID on a checksummed filesystem could take some time.
Proceed anyway (or wait 5 seconds to proceed)? (y,N)

While I was trying to work out what this meant so I could decide what to do, it went ahead and started on its own. It took a few minutes to complete, but now I'm worried. What is the explanation for why this takes so long? Is it something I should worry about?
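The same prompt can be reproduced safely on a small loopback image instead of a real drive. This is a sketch; the /tmp path is a placeholder, `-F` lets mke2fs work on a regular file, and `-O metadata_csum` forces the checksummed layout that triggers the warning:

```shell
#!/usr/bin/env bash
# Sketch: reproduce the UUID-change prompt on a throwaway ext4 image (no root).
img=/tmp/uuid-demo.img
if command -v mkfs.ext4 >/dev/null && command -v tune2fs >/dev/null; then
    truncate -s 16M "$img"
    mkfs.ext4 -q -F -O metadata_csum "$img"        # checksummed filesystem
    e2fsck -fp "$img" >/dev/null                   # tune2fs wants a freshly checked fs
    yes | tune2fs -U random "$img" >/dev/null 2>&1 # answer the 5-second prompt
    dumpe2fs -h "$img" 2>/dev/null | grep 'Filesystem UUID'
else
    echo "e2fsprogs not installed; skipping"
fi
```

The slowness itself is expected rather than a fault: with metadata_csum, the filesystem UUID is an input to every metadata block's checksum, so changing the UUID means rewriting all of those checksums, which takes minutes on a large drive.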

hash – MD5 Checksum and injection DLL

Simply checking the MD5 hash of a DLL may not be secure enough, since MD5 is not collision resistant and cannot be considered secure. An attacker could inject malicious code into a DLL and tweak it so that the malicious DLL returns the same MD5 hash as the "real" DLL.

As @schroeder mentioned, you may want to use a signature-based approach, in which only DLLs signed with a known, trusted certificate (for example, the primary developer's) can be used.

However, this introduces some new problems of its own, for example certificate management, trust management, etc.

magento2: error verifying the file checksum (downloaded from https://repo.magento.com/archives/vertex/module-tax/vertex-module-tax-3.0.0.0.zip)
