Subobject for a given subobject of the kernel

Let $\mathcal{C}$ be an abelian category and $f\colon A\rightarrow B$ a morphism in $\mathcal{C}$. Suppose there is a subobject $T \leq \ker(f)$. Is there always a subobject $A'\leq A$ such that $\ker(f_{|A'})=T$ and $\mathrm{im}(f_{|A'})=\mathrm{im}(f)$?
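
A quick test case (my own check, worth verifying) suggests the answer is no in general. In $\mathbf{Ab}$, take $f\colon \mathbb{Z}\to \mathbb{Z}/2$ the quotient map, so $\ker(f)=2\mathbb{Z}$, and take $T=4\mathbb{Z}$. A nonzero subobject $A'=n\mathbb{Z}$ has $\mathrm{im}(f_{|A'})=\mathbb{Z}/2$ only for odd $n$, but then $\ker(f_{|A'})=n\mathbb{Z}\cap 2\mathbb{Z}=2n\mathbb{Z}$, which equals $4\mathbb{Z}$ only for $n=2$, which is not odd.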

Missing usbhid driver. 18.04 kernel 5.4.0-1040-azure

The USB keyboard and mouse are not working because the usbhid driver is not present.

```
# uname -r
5.4.0-1040-azure

# modprobe usbhid
modprobe: FATAL: Module usbhid not found in directory /lib/modules/5.4.0-1040-azure

# ls /lib/modules/5.4.0-1040-azure/kernel/drivers/hid
hid-asus.ko     hid-hyperv.ko  hid.ko          hid-maltron.ko  hid-redragon.ko       hid-steam.ko      hid-wiimote.ko
hid-cougar.ko   hid-ite.ko     hid-led.ko      hid-mf.ko       hid-sensor-custom.ko  hid-udraw-ps3.ko  intel-ish-hid
hid-generic.ko  hid-jabra.ko   hid-macally.ko  hid-nti.ko      hid-sensor-hub.ko     hid-viewsonic.ko  uhid.ko

# apt list --installed | egrep '^linux'|grep "$(uname -r)"

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

linux-headers-5.4.0-1040-azure/bionic-updates,bionic-security,now 5.4.0-1040.42~18.04.1 amd64 (installed)
linux-image-5.4.0-1040-azure/bionic-updates,bionic-security,now 5.4.0-1040.42~18.04.1 amd64 (installed,automatic)
linux-modules-5.4.0-1040-azure/bionic-updates,bionic-security,now 5.4.0-1040.42~18.04.1 amd64 (installed,automatic)
linux-modules-extra-5.4.0-1040-azure/bionic-updates,bionic-security,now 5.4.0-1040.42~18.04.1 amd64 (installed,automatic)
linux-tools-5.4.0-1040-azure/bionic-updates,bionic-security,now 5.4.0-1040.42~18.04.1 amd64 (installed,automatic)
```

I've reinstalled linux-modules-extra-5.4.0-1040-azure without success. Any suggestions?
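
One thing worth checking (a guess on my part): whether USB HID support is built into the Azure kernel rather than shipped as a module, or was left out of this build entirely. The kernel config in /boot should say:

```
# Check how USB HID support is configured in this kernel build
grep -E '^CONFIG_USB_HID|^CONFIG_HID=' /boot/config-$(uname -r)
# CONFIG_USB_HID=y            -> built in; no usbhid.ko exists, so modprobe reports "not found"
# CONFIG_USB_HID=m            -> usbhid.ko should be under kernel/drivers/hid/usbhid/
# "CONFIG_USB_HID is not set" -> the Azure kernel was built without it
```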

testing – Basic kernel test fails with Type of Drupal\KernelTests\Core\Entity\EntityKernelTestBase::$modules must be array

Trying to port a Drupal 7 module to 9 with accompanying test and getting this error immediately when running it:

```
PHP Fatal error:  Type of Drupal\KernelTests\Core\Entity\EntityKernelTestBase::$modules must be array (as in class Drupal\KernelTests\KernelTestBase) in /var/www/html/docroot/core/tests/Drupal/KernelTests/Core/Entity/EntityKernelTestBase.php on line 12
```

The test:

```
class MyModuleTest extends EntityKernelTestBase {

  use NodeCreationTrait;

  /**
   * @var \Drupal\mymodule\Query\YearsQuery
   */
  protected $query;

  /**
   * @var array $modules
   */
  protected static $modules = [
    'node',
    'mymodule',
    'mymodule_test',
  ];

  /**
   * {@inheritdoc}
   */
  protected function setUp() : void {
    parent::setUp();
    ...
```

I checked around some of the core classes – the $modules declaration looks the same. Note that I am using PHPUnit 9.
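
For what it's worth, the PHP 7.4 typed-property rule behind this message is: once any class in the hierarchy declares the property with a type, every redeclaration in a subclass must repeat that exact type. A minimal sketch with hypothetical class names (not the Drupal classes themselves):

```
<?php

class Base {
    protected static array $modules = [];
}

// PHP Fatal error: Type of Child::$modules must be array (as in class Base)
class Child extends Base {
    protected static $modules = ['node'];
}

// OK: the redeclaration repeats the parent's type.
class FixedChild extends Base {
    protected static array $modules = ['node'];
}
```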

ubuntu – How can I add a kernel argument to a debian preseed file?

I’ve been using Debian preseed files for a while now doing netinstalls of Debian and Ubuntu.

Ubuntu 20.04 has a weird video problem, even on the text terminal. After installation, sometimes you can't see anything. The host works properly over the network, but nothing shows on the VGA output.

Adding nomodeset to the kernel command line fixes it.

I'd like to just add nomodeset to the kernel arguments that the system is installed with, but I'm having a hard time finding the preseed option for specifying additional kernel arguments / the kernel command line.

I tried merely adding nomodeset during the launch of the installer, but that didn't appear to take hold in the installed environment either.

What is the proper way in a Debian Installer preseed file, to specify additional kernel arguments that should be applied to the installed system?
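
If memory serves (worth verifying against the preseed appendix of the installation guide for your release), debian-installer has a key for exactly this, which appends options to the installed system's kernel command line:

```
# Append extra arguments to the kernel command line of the installed system.
d-i debian-installer/add-kernel-opts string nomodeset
```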

os kernel – OS: Why is it necessary to have hardware support for implementing Preemptive Scheduling Strategies?

I think preemption can easily be done in kernel mode, where the kernel just has to call the context-switch procedure. Also, based on the algorithm, we can select the new process from the ready queue as well.

I'm unable to work out why and where timers come into play, and what exactly the significance of hardware is in preemptive scheduling.

Can we not do preemptive scheduling without hardware support?

Why didn't I ask on Stack Overflow?

Because I need the theoretical point of view, although I'll appreciate it if someone mentions actual Unix implementations.
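
A user-space analogy that might sharpen the question (my own sketch; SIGALRM stands in for the hardware timer interrupt): the CPU-bound loop below never yields, and the only way the "scheduler" regains control is the asynchronous timer event. Without a hardware-generated interrupt, a purely software scheduling routine never gets a chance to run against an uncooperative task.

```
#include <csignal>
#include <cstdio>
#include <unistd.h>

// SIGALRM plays the role of the hardware timer interrupt.
volatile sig_atomic_t need_resched = 0;

void timer_tick(int) { need_resched = 1; }  // the "timer ISR" sets a flag

int main() {
    std::signal(SIGALRM, timer_tick);
    alarm(1);                     // arm the "hardware timer" for one second
    while (!need_resched) {
        // CPU-bound task that never performs a system call or yields
    }
    std::puts("preempted: control was regained without the task cooperating");
}
```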

real analysis – A question on a simple integral with a singular kernel?

I asked this question on math.stackexchange:

https://math.stackexchange.com/posts/4041263/edit

There were no answers or very useful comments there. Maybe it is more appropriate for MathOverflow.

Fix a small $\delta>0$ and let $p,q>1$. Consider the integral

$$I(p,q):=\int\limits_{1-\delta}^{1+\delta}
\int\limits_{y/2}^{2y}\frac{1}{|y-x|^{\frac{1}{p}}\,|1-x|^{\frac{1}{q}}}
\,\mathrm{d}x\,\mathrm{d}y.
$$

I am trying to show that $I(p,q)$ diverges if $\frac{1}{p}+\frac{1}{q}\geq 1$, but I am not even sure this is the case. Any hints on how to handle this?
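
For what it's worth, a standard merging-singularities estimate (my own sketch, so worth double-checking) seems to point the other way. For exponents $0<a,b<1$ with $a+b>1$, one has
$$\int_{y/2}^{2y} \frac{\mathrm{d}x}{|y-x|^{a}\,|1-x|^{b}} \;\asymp\; |y-1|^{1-a-b}\qquad (y\to 1),$$
and $\int_{1-\delta}^{1+\delta}|y-1|^{1-a-b}\,\mathrm{d}y<\infty$ whenever $a+b<2$. Taking $a=\frac{1}{p}<1$ and $b=\frac{1}{q}<1$, this suggests $I(p,q)$ converges for all $p,q>1$, even when $\frac{1}{p}+\frac{1}{q}\geq 1$, with divergence only at the endpoint exponents $a=1$ or $b=1$.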

Remark: This seems to be related to the failure of the Hardy-Littlewood-Sobolev inequality (HLS) at the endpoint $p=1$. HLS reads:

If $1<p,q<\infty$, $f\in L^p$ and

$$Tf(x):=\int_{\mathbb{R}^n} \frac{f(y)}{|x-y|^{\gamma}}\,dy,$$

then $$\|Tf\|_q\leq \|f\|_p$$
if and only if
$$\frac{1}{p}-\frac{1}{q}=1-\frac{\gamma}{n}.$$

Many thanks.

nt.number theory – Sum of inverse squares of numbers divisible only by primes in the kernel of a quadratic character

Let $\chi$ be a primitive quadratic Dirichlet character of modulus $m$, and consider the product
$$\prod_{\substack{p \text{ prime} \\ \chi(p) = 1}} (1-p^{-2})^{-1}.$$

What can we say about the value of this product? Do we have good upper or lower bounds?

Some observations, ideas, and auxiliary questions

  • When $\chi$ is trivial, it has value $\zeta(2)$.
  • In general, since the Chebotarev density theorem (CDT) tells us that $\chi(p)$ is equidistributed in the limit, I would “want” the value to be something like

$$\Big(\zeta(2)\prod_{p \mid m} (1-p^{-2})\Big)^{\frac{1}{2}}.$$

However, if I’m not mistaken, it seems that the error terms in effective forms of CDT may cause this to be very far from the truth. We can’t ignore what happens before we are close to equidistribution as the tail and the head are both $O(1)$. We can’t even control the error term well (without GRH) because of Siegel zeroes.

  • I don’t think we can appeal to Dirichlet density versions of CDT since those only tell us things in the limit as $s$ goes to $1$ and here $s = 2$.
  • Is there a way to “Dirichlet character”-ify a proof of $\zeta(2) = \pi^2/6$ to get a formula for this more general case? At least with Euler’s proof via Weierstrass factorization, it seems that we would need some holomorphic function which has zeroes whenever $\chi(n) = 1$.

I had a few other ideas but they all seem to run into the same basic problem of “can’t ignore the stuff before the limit”… am I missing something?
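
One reformulation I can offer (a sketch; the Euler-product algebra is easy to verify): splitting the Euler products of $\zeta(2)$ and $L(2,\chi)$ over $\chi(p)=1$, $\chi(p)=-1$, and $p\mid m$ gives
$$\prod_{\chi(p)=1}(1-p^{-2})^{-1} \;=\; \Big(\zeta(2)\,L(2,\chi)\,\prod_{p\mid m}(1-p^{-2})\,\prod_{\chi(p)=-1}(1-p^{-4})\Big)^{\frac{1}{2}},$$
which is exactly the heuristic value above multiplied by $\big(L(2,\chi)\prod_{\chi(p)=-1}(1-p^{-4})\big)^{\frac{1}{2}}$. Since everything converges absolutely at $s=2$, bounding the product reduces to bounding $L(2,\chi)$, with no Siegel-zero issues at that point.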

linux – How do I boot xen kernel on Fedora

I installed Xen on Fedora 33 and get this error when I try to load the Xen kernel:

```
Loading Xen 4.14.1 ...
error: ../../grub-core/fs/fshelp.c:257:file
`/EFI/fedora/x86_64-efi/multiboot2.mod' not found.
error: ../../grub-core/script/function.c:119:can't find command `multiboot2'.
Loading Linux 5.8.15-301.fc33.x86_64 ...
error: ../../grub-core/script/function.c:119:can't find command `module2'.
Loading initial ramdisk ...
error: ../../grub-core/script/function.c:119:can't find command `module2'.
```

Is this the right kernel (a 1.1M gzip file), and how do I boot it to test?
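
Not a fix, but a diagnostic that may narrow it down (assuming the standard Fedora GRUB layout): check whether the multiboot2 GRUB module exists where the error message is looking.

```
# Is the module shipped by the grub2 packages at all?
ls /usr/lib/grub/x86_64-efi/ | grep multiboot

# Is it present on the EFI system partition where GRUB is looking?
ls /boot/efi/EFI/fedora/x86_64-efi/ 2>/dev/null | grep multiboot
```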

kernel – How do I install drivers for an AMD graphics card?

So I’ve just installed Ubuntu (as a dual boot) on my machine, which has a 6800 graphics card and a 5900X processor. Although the install was successful, the screen is very laggy/unresponsive, as if it were using integrated graphics. I’ve heard you don’t need to install proprietary drivers, but it feels like I’m running on integrated graphics and it’s somehow not picking up my GPU. I’ve tried installing the amdgpu-pro drivers but keep getting the same error. I’ve also noticed that it won’t let me change my resolution or refresh rate in settings. Any help would be appreciated.

[screenshot: trying to install amdgpu-pro]
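
A quick check of which kernel driver is actually bound to the GPU (a diagnostic, not a fix) can tell you whether you are on amdgpu or on an unaccelerated fallback framebuffer:

```
# Show display devices and the kernel driver bound to each
lspci -k | grep -EA3 'VGA|Display'
# "Kernel driver in use: amdgpu" is the healthy case for an RX 6800
```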

c++ – Optimizing a diagonal matrix-vector multiplication (?diamv) kernel

For a (completely optional) assignment in an introductory C++ programming course, I am trying to implement a diagonal matrix-vector multiplication (?diamv) kernel, i.e. mathematically
$$\mathbf{y} \leftarrow \alpha\mathbf{y} + \beta\mathbf{M}\mathbf{x}$$
for a diagonally clustered matrix $\mathbf{M}$, dense vectors $\mathbf{x}$ and $\mathbf{y}$, and scalars $\alpha$ and $\beta$. I believe that I can reasonably motivate the following assumptions:

  1. The processors executing the compute threads are capable of executing the SSE4.2 instruction set extension (but not necessarily AVX2),
  2. The access scheme of the matrix $mathbf{M}$ does not affect the computation and therefore temporal cache locality between kernel calls does not need to be considered,
  3. The matrix $mathbf{M}$ does not fit in cache, is very diagonally clustered with a diagonal pattern that is known at compile time, and square,
  4. The matrix $mathbf{M}$ does not contain regularly occurring sequences in its diagonals that would allow for compression along an axis,
  5. No reordering function exists for the structure of the matrix $mathbf{M}$ that would lead to a cache-oblivious product with a lower cost than an ideal multilevel-memory optimized algorithm,
  6. The source data is aligned on an adequate boundary,
  7. OpenMP, chosen for its popularity, is available to enable shared-memory parallelism. No distributed memory parallelism is necessary as it is assumed that a domain decomposition algorithm, e.g. DP-FETI, will decompose processing to the node level due to the typical problem size.

Having done a literature review, I have come to the following conclusions on its design and implementation (this is a summary, in increasing granularity, with the extensive literature review being available upon request to save space):

  1. “In order to achieve high performance, a parallel implementation of a sparse matrix-vector multiplication must maintain scalability” per White and Sadayappan, 1997.
  2. The diagonal matrix storage scheme,
    $$\operatorname{vec}\left(\operatorname{val}(i,j) \equiv a_{i,i+j}\right),$$

    where $\operatorname{vec}$ is the matrix vectorization operator, which obtains a vector by stacking the columns of the operand matrix on top of one another. By storing the matrix in this format, I believe the cache locality to be as optimal as possible to allow for row-wise parallelization. Checkerboard partitioning reduces to row-wise partitioning for diagonal matrices. Furthermore, this allows for source-vector re-use, which is necessary unless the matrix is re-used while still in cache (Frison 2016).
  3. I believe that the aforementioned should always hold before vectorization is even considered. The non-regular padded areas of the matrix, i.e. the top-left and bottom-right, can be handled separately without incurring extra cost in the asymptotic sense (because the matrix is diagonally clustered and very large).
  4. Because access to this matrix is linear, software prefetching should not be necessary. I have included it anyway, for code review, at the spot I considered the most logical.

The following snippet represents my best effort, taking the aforementioned into consideration:

```
#include <algorithm>     // std::ranges::min / std::ranges::max
#include <cstddef>       // std::size_t, std::ptrdiff_t
#include <cstdint>
#include <cstdlib>       // std::abs
#include <type_traits>

#include <xmmintrin.h>   // SSE intrinsics
#include <emmintrin.h>   // SSE2 intrinsics

#include <omp.h>

#include "tensors.hpp"


/* ceil(num / denom) for positive integer arguments */
#define CEIL_INT_DIV(num, denom)        (((num) + (denom) - 1) / (denom))

/* Macros are not expanded inside #pragma, so wrap _Pragma instead */
#define PRAGMA(X)                       _Pragma(#X)
#if defined(__INTEL_COMPILER)
#define AGNOSTIC_UNROLL(N)              PRAGMA(unroll (N))
#elif defined(__clang__)
#define AGNOSTIC_UNROLL(N)              PRAGMA(clang loop unroll_count(N))
#elif defined(__GNUG__)
#define AGNOSTIC_UNROLL(N)              PRAGMA(GCC unroll N)
#else
#warning "Compiler not supported"
#endif

/* Computer-specific optimization parameters */
#define PREFETCH                        true
#define OMP_SIZE                        16
#define BLK_I                           8
#define SSE_REG_SIZE                    128
#define SSE_ALIGNMENT                   16
#define SSE_UNROLL_COEF                 3


namespace ranges = std::ranges;


/* Calculate the largest absolute value ..., TODO more elegant? */
template <typename T1, typename T2>
auto static inline largest_abs_val(T1 x, T2 y) {
    return std::abs(x) > std::abs(y) ? std::abs(x) : std::abs(y);
}


/* Define intrinsics agnostically; compiler errors thrown automatically */
namespace mm {
    /* _mm_load_px - (...) */
    inline auto load_px(float const *__p) { return _mm_load_ps(__p); };
    inline auto load_px(double const *__dp) { return _mm_load_pd(__dp); };

    /* _mm_store_px - (...) */
    inline auto store_px(float *__p, __m128 __a) { return _mm_store_ps(__p, __a); };
    inline auto store_px(double *__dp, __m128d __a) { return _mm_store_pd(__dp, __a); };

    /* _mm_set1_px - (...) */
    inline auto set_px1(float __w) { return _mm_set1_ps(__w);};
    inline auto set_px1(double __w) { return _mm_set1_pd(__w); };

    /* _mm_mul_px - (...) */
    inline auto mul_px(__m128 __a, __m128 __b) { return _mm_mul_ps(__a, __b);};
    inline auto mul_px(__m128d __a, __m128d __b) { return _mm_mul_pd(__a, __b); };
}


namespace tensors {
    template <typename T1, typename T2>
    int diamv(matrix<T1> const &M, 
              vector<T1> const &x,
              vector<T1> &y,
              vector<T2> const &d,
              T1 alpha, T1 beta) noexcept {
        /* Initializations */
        /* - Compute the size of an SSE vector */
        constexpr size_t sse_size =  SSE_REG_SIZE / (8*sizeof(T1));
        /* - Validation of arguments */
        static_assert((BLK_I >= sse_size && BLK_I % sse_size == 0), "Cache blocking is invalid");
        /* - Reinterpretation of the data as aligned */
        auto M_ = static_cast<T1 const *>(__builtin_assume_aligned(M.data(), SSE_ALIGNMENT));
        auto x_ = static_cast<T1 const *>(__builtin_assume_aligned(x.data(), SSE_ALIGNMENT));
        auto y_ = static_cast<T1 *>(__builtin_assume_aligned(y.data(), SSE_ALIGNMENT));
        auto d_ = static_cast<T2 const *>(__builtin_assume_aligned(d.data(), SSE_ALIGNMENT));
        /* - Number of diagonals */
        auto n_diags = d.size();
        /* - Number of zeroes for padding TODO more elegant? */
        auto n_padding_zeroes = largest_abs_val(ranges::min(d), ranges::max(d));
        /* - No. of rows lower padding needs to be extended with */
        auto n_padding_ext = (y.size() - 2*n_padding_zeroes) % sse_size;
        /* - Broadcast α and β into vectors outside of the kernel loop */
        auto alpha_ = mm::set_px1(alpha);
        auto beta_ = mm::set_px1(beta);

        /* Compute y := αy + βMx in two steps */
        /* - Pre-compute the bounding areas of the two non-vectorizable and single vect. areas */
        size_t conds_begin[] = {0, M.size() - (n_padding_ext+n_padding_zeroes)*n_diags};
        size_t conds_end[] = {n_padding_zeroes*n_diags, M.size()};
        /* - Non-vectorizable areas (top-left and bottom-right resp.) */
AGNOSTIC_UNROLL(2)
        for (size_t NONVEC_LOOP=0; NONVEC_LOOP<2; NONVEC_LOOP++) {
            for (size_t index_M=conds_begin[NONVEC_LOOP]; index_M<conds_end[NONVEC_LOOP]; index_M++) {
                auto index_y = index_M / n_diags;
                /* Signed arithmetic: the diagonal offset may be negative */
                auto index_x = static_cast<std::ptrdiff_t>(index_y) + d_[index_M % n_diags];
                if (index_x >= 0 && index_x < static_cast<std::ptrdiff_t>(x.size()))
                    y_[index_y] = (alpha * y_[index_y]) + (beta * M_[index_M] * x_[index_x]);
            }
        }
        /* - Vectorized area - (parallel) iteration over the x parallelization blocks */
#pragma omp parallel for shared(M_, x_, y_) schedule(static)
        for (size_t j_blk=conds_end[0]; j_blk<conds_begin[1]; j_blk+=BLK_I*n_diags) {
            /* Iteration over the x cache blocks */
            for (size_t j_bare = 0; j_bare < CEIL_INT_DIV(BLK_I, sse_size); j_bare++) {
                size_t j = j_blk + (j_bare*n_diags*sse_size);
                /* Perform y = ... for this block, potentially with unrolling */
                /* *** microkernel goes here *** */
#if PREFETCH
                /* _mm_prefetch() */
#endif
            }
        }

        return 0;
    }
}
```

Some important notes:

  1. tensors.hpp is a simple header-only library that I’ve written for the occasion to act as a uniform abstraction layer to tensors of various orders (with the CRTP) having different storage schemes. It also contains aliases to e.g. vectors and dense matrices.

  2. For the microkernel, I believe there to be two possibilities

    a. Iterate linearly over the vectorized matrix within each cache block; this would amount to row-wise iteration over the matrix $mathbf{M}$ within each cache block and therefore a dot product. To the best of my knowledge, dot products are inefficient in dense matrix-vector products due to both data dependencies and how the intrinsics decompose into μops.

    b. Iterate over rows in cache blocks in the vectorized matrix, amounting to iteration over diagonals in the matrix $mathbf{M}$ within each cache block. Because of the way the matrix $mathbf{M}$ is stored, i.e. in its vectorized form, this would incur the cost of broadcasting the floating-point numbers (which, to the best of my knowledge is a complex matter) but allow rows within blocks to be performed in parallel.

    I’m afraid that I’ve missed other, better options; this is the primary reason for opening this question, as I’m completely stuck. Furthermore, I believe the differences in how well the source/destination vectors are re-used are too close to call. Does anyone know how I would approach gaining more insight into this? (A minimal sketch of option (b) follows these notes.)

  3. Even if the cache hit rate is high, I’m afraid of the bottleneck shifting to e.g. inadequate instruction scheduling. Is there a way to check this in a machine-independent way other than having to rely on memory bandwidth?

  4. Is there a way to make the “ugly” non-vectorizable code more elegant?
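
To make option (b) concrete, here is a minimal float-only SSE sketch of the microkernel I have in mind (my own illustration under the assumptions above: rows $[i, i+4)$ lie strictly inside the vectorizable region so every index into x_ is valid, and the row-major diagonal layout from point 2 is used):

```
/* Hypothetical option-(b) microkernel: four rows at a time, float/SSE. */
inline void microkernel_f32(float const *M_, float const *x_, float *y_,
                            int const *d_, size_t n_diags, size_t i,
                            __m128 alpha_, __m128 beta_) {
    __m128 acc = _mm_setzero_ps();
    for (size_t k = 0; k < n_diags; ++k) {
        /* The four entries of diagonal k for rows i..i+3 are strided by
           n_diags, so they must be gathered element-wise; this is the
           broadcast/shuffle cost mentioned in option (b). */
        __m128 m = _mm_set_ps(M_[(i + 3) * n_diags + k],
                              M_[(i + 2) * n_diags + k],
                              M_[(i + 1) * n_diags + k],
                              M_[(i + 0) * n_diags + k]);
        /* Four consecutive x entries shifted by the diagonal offset;
           the shift makes this load unaligned in general. */
        __m128 xv = _mm_loadu_ps(x_ + i + d_[k]);
        acc = _mm_add_ps(acc, _mm_mul_ps(m, xv));
    }
    __m128 yv = _mm_load_ps(y_ + i);  /* y itself is aligned */
    yv = _mm_add_ps(_mm_mul_ps(alpha_, yv), _mm_mul_ps(beta_, acc));
    _mm_store_ps(y_ + i, yv);
}
```

Whether these gathers beat the horizontal reductions of option (a) is precisely the part I cannot judge.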

Proofreading the above, I feel like a total amateur; all feedback is (very) much appreciated. Thank you in advance.