performance – Prerendering vs. loadtime optimization

“It depends” on the functionality of the site and the tradeoffs one is willing to make. Pre-rendering is indeed very useful for static, high-volume pages, or for static pages where CPU is very constrained, but you give up an awful lot to get there, including a lot of user interactivity, and maintenance becomes harder.

Computers are unbelievably fast (and some software, like WordPress with lots of plugins, is stupidly bloated). The problem here is not dynamic content; it's trying to be all things to all people. Of course, many of these frameworks have plugins which effectively pre-render pages on demand using caching.
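To make "pre-render on demand using caching" concrete, here is a minimal sketch in Python. It is not how any particular WordPress plugin works; `RenderCache`, `render_page` and the 60-second TTL are names and values invented for illustration. The idea is only that the expensive dynamic render runs once per page per time window, and later requests get the stored HTML.

```python
import time


class RenderCache:
    """Keep rendered pages in memory so each page is rendered at most once per TTL."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render      # hypothetical function: path -> HTML string
        self._ttl = ttl_seconds    # assumed freshness window
        self._clock = clock        # injectable clock, handy for testing
        self._store = {}           # path -> (expiry_time, html)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]                    # cache hit: no rendering work
        html = self._render(path)              # cache miss: render dynamically once
        self._store[path] = (now + self._ttl, html)
        return html


def render_page(path):
    # Stand-in for expensive dynamic rendering (templates, DB queries, plugins).
    render_page.calls += 1
    return f"<html><body>Content for {path}</body></html>"


render_page.calls = 0

cache = RenderCache(render_page, ttl_seconds=60.0)
first = cache.get("/about")    # rendered
second = cache.get("/about")   # served from the cache
```

The tradeoff is the usual one: anything per-user or time-sensitive either has to stay out of the cached HTML or needs a short TTL.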

Lightweight dynamic content can be almost indistinguishably as fast as static content, and in these circumstances other factors like latency, bandwidth, disk I/O and memory (caching) can be a larger concern than CPU.

Of course, static files can often cache better and can be easier to secure.

Prerendering can speed up complex sites. In the simplest cases, though, MySQL is a database and a filesystem is a database, but MySQL will allow me to index multiple keys, making searches faster on things other than a filename index.
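As a rough illustration of the indexing point, the sketch below uses SQLite from Python's standard library instead of MySQL (a swap made only so that the example is self-contained). The `pages` table, its columns and the index name are invented. A lookup by `author` can use a secondary index, which a directory of files keyed only by filename cannot offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# filename plays the role of the filesystem's only "index"
conn.execute("CREATE TABLE pages (filename TEXT PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(f"page{i}.html", f"author{i % 10}", "...") for i in range(1000)],
)
# A second key that a plain filesystem lookup would not give you
conn.execute("CREATE INDEX idx_pages_author ON pages (author)")

# Ask the planner how it would answer a query on the non-filename key
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT filename FROM pages WHERE author = ?", ("author3",)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

matches = conn.execute(
    "SELECT COUNT(*) FROM pages WHERE author = ?", ("author3",)
).fetchone()[0]
```

Here `plan_text` should mention `idx_pages_author`, meaning the query is answered through the index and not by scanning every row.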

If it's speed over everything, you don't need per-user customisations, and you can ignore the savings of a general solution over developer time costs, then yes, pre-rendering is better. Those are big ifs.

mathematical optimization – FindMaximum works on desktop but not on laptop

I have some simple code to find maxima, as follows.

myfunc = {-1 + 2/(1 + d), 1/2 (-1 + 1/d), 1 - d, 1 + 1/(-2 + d), 
   1 - d, 1 - d, -1 + 1/d, -1 + 1/d, 1/(1 + d), 1/(2 d), 1/d, 1/d, d/(
   1 - d)};
FindMaximum[{##, 0 <= d <= 1}, d] & /@ myfunc

It works well on my desktop and the result is:

{{1., {d -> 0.}}, {Infinity, {d -> 0.}}, {1., {d -> 
    0.}}, {0.5, {d -> 0.}}, {1., {d -> 0.}}, {1., {d -> 
    0.}}, {Infinity, {d -> 0.}}, {Infinity, {d -> 
    0.}}, {1., {d -> 0.}}, {Infinity, {d -> 
    0.}}, {Infinity, {d -> 0.}}, {Infinity, {d -> 
    0.}}, {Infinity, {d -> 1.}}}

However, I get errors with $Failed when I run it on my laptop:

{{1., {d -> 3.54538*10^-8}}, {-$Failed, {d -> 0.}}, {1., {d -> 
    0.}}, {0.5, {d -> 0.}}, {1., {d -> 0.}}, {1., {d -> 
    0.}}, {-$Failed, {d -> 0.}}, {-$Failed, {d -> 0.}}, {1., {d -> 
    0.}}, {Infinity, {d -> Indeterminate}}, {Infinity, {d -> 
    0.}}, {Infinity, {d -> 0.}}, {Infinity, {d -> 

Why does this happen? How can I solve this?

algorithms – Measure divergence in Particle Swarm Optimization

I’d like to monitor divergence/diversity in my swarm during the particle swarm optimization algorithm to measure when the swarm search space is converging.

This would be used as one metric to be recorded during the run and potentially to terminate the PSO process when not much further progress is to be expected.

Is there a common metric for measuring PSO swarm divergence?
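For concreteness, one diversity measure that could be logged per iteration is the mean distance of the particles from the swarm centroid, divided by the length of the search-space diagonal so that the number is scale-free. The sketch below is only an illustration of that idea: it assumes box bounds, and the function names and the threshold are made up.

```python
import math


def swarm_diversity(positions, lower, upper):
    """Mean Euclidean distance of the particles to the swarm centroid,
    divided by the length of the search-box diagonal."""
    n = len(positions)
    dim = len(lower)
    centroid = [sum(p[j] for p in positions) / n for j in range(dim)]
    diagonal = math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))
    mean_dist = sum(
        math.sqrt(sum((p[j] - centroid[j]) ** 2 for j in range(dim)))
        for p in positions
    ) / n
    return mean_dist / diagonal


def looks_converged(positions, lower, upper, threshold=1e-6):
    # threshold is an arbitrary illustrative value and would need tuning
    return swarm_diversity(positions, lower, upper) < threshold


# Four particles on the corners of the box [0, 2] x [0, 2]
corners = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
diversity = swarm_diversity(corners, [0.0, 0.0], [2.0, 2.0])
```

A value near zero means the particles have collapsed onto one point. Whether that point is a good optimum is a separate question, so I would record this next to the best cost and not use it as the only stopping rule.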

reference request – Optimization approaches to solving PDEs

In modern numerical methods, a PDE is often recast into the form of a variational problem, which is sometimes equivalent to a minimization problem.
However, in my courses on numerical analysis (say, finite element methods) the focus is apparently not on developing optimization techniques to minimize the resulting energy functional, but rather on approximating the variational problem on a smaller subspace.

Are there interesting approaches that focus on minimizing the energy directly? Is research being done in this field, and could you maybe provide some references?

cache – WP database optimization for data usage

While developing themes, I noticed that I use the same function to get some data from the DB (e.g. the post featured image URL) in several places in my theme. It turns out that this sends several identical queries for the same data, so I started thinking about optimizing database calls.

At first I thought of using a static variable, which I would initialize once in functions.php with a special hook at theme initialization and then refer to later. However, this article recommends using WP_Cache for best performance.

OK, but the main drawback of this approach is that its use is implicit. I have to keep in mind all the variables that I use for optimization, and it also complicates things for the development team, because I have to tell other developers where and how I optimized the calls.

My main question: is there any point in such optimization, or has the WP engine already taken care of this, so that all calls to the database are somehow cached (through object caching, for example)?

Or is it easier to use a caching plugin? For example, W3 Total Cache has an option to cache requests to the database.

postgresql – Ltree query performance optimization of Postgres RDS DB

I have an AWS RDS m5.large Postgres 10.13 database that performs a lot of the following queries:

SELECT "bundles".* FROM "bundles" WHERE "bundles"."version_id" = $1 AND (tree_path ~ ?) LIMIT $2

Here is the table structure (roughly 1M rows):

CREATE TABLE "public"."bundles" (
    "id" int8 NOT NULL DEFAULT nextval('bundles_id_seq'::regclass),
    "cached_name" varchar NOT NULL,
    "uuid" uuid NOT NULL,
    "typology" int4 NOT NULL,
    "created_at" timestamp NOT NULL,
    "updated_at" timestamp NOT NULL,
    "resource_template_id" int8,
    "state" varchar,
    "resource_id" int8,
    "version_id" int4,
    "tree_path" ltree NOT NULL,
    "tree_ordinal" int4 NOT NULL DEFAULT 1,
    CONSTRAINT "fk_rails_a0e6c8e3c8" FOREIGN KEY ("resource_template_id") REFERENCES "public"."resource_templates"("id"),
    CONSTRAINT "fk_rails_02b50dac11" FOREIGN KEY ("resource_id") REFERENCES "public"."bundles"("id"),
    PRIMARY KEY ("id")
);

The problem is the poor performance of the overall system. Advanced monitoring shows a very high value for current activity:

[screenshot: current activity]

The aforementioned query also seems to have some impact on the load by waits:

[screenshot: load by waits]

What do you suggest I check? I’m not a DBA, so I can’t judge whether those queries are efficient.

mathematical optimization – Minimization of constrained variable

I am trying to minimize a variable, but NMinimize[] does not seem to be what I need (it minimizes a function, but my variable is inside a function).

I want to minimize “h” with respect to “P0” and “yp” with the following constraints:

 c1 <= F[h,P0,yp] <= c2
 c1 <= G[h,P0,yp] <= c2
 c1 <= H[h,P0,yp] <= c2
 c1 <= J[h,P0,yp] <= c2

I tried:


but it does not work.

h,P0,yp are all variables and not functions.

optimization – $\Omega(10n)$ algorithm to find a sequence of 3 or n tennis players with ratings in decreasing/increasing order

This problem is a follow up from this.

At a certain (unrealistic) event, we have infinitely many tennis players lined up who are conveniently numbered $(P_1, P_2, \cdots)$. Each player has a certain rating which is unknown to us. A player with a higher rating ALWAYS beats a lower-rated player when the two face off in a game. The only way for us to see if $P_i$ is better than $P_j$ is to have them play a match; based on who wins, we know which player has the higher rating.

In the previous question, we saw that it would take 5 matches to get a sequence of three players, $P_a, P_b, P_c$, where $a<b<c$ and either $P_a$ is worse than $P_b$ who is worse than $P_c$, or $P_a$ is better than $P_b$ who is better than $P_c$.

This time, we need to generalize to $n$ players. For example, let’s say we wanted to get either $3$ players, $P_a, P_b, P_c$, where $a < b < c$ and $P_a$ is better than $P_b$ who is better than $P_c$, or $n$ players, $P_{a_1}, P_{a_2}, \cdots, P_{a_n}$, where $a_i < a_j$ when $i < j$ and $P_{a_1}$ is worse than $P_{a_2}$ who is worse than $P_{a_3}$, $\cdots$, $P_{a_{n-1}}$ is worse than $P_{a_n}$.

The handout asks to prove that a bound on the number of matches that need to be scheduled is $10n$. What might such an algorithm look like?

c++ – Strlen function optimization

Using pcmpistri seems like the obvious choice for searching within a string. However, while pcmpistri is very general/powerful, it is also not very fast. On typical Intel processors it consists of 3 µops that all go to execution port p0 (therefore limiting this loop to at best one iteration every 3 cycles); on AMD Zen(1/2) it’s slightly less bad, coming in at 2 µops and executing once every 2 cycles.

There is a way more primitive way (just using SSE2) based on pcmpeqb and pmovmskb. That leaves you with a mask instead of an index, but for most of the loop that doesn’t matter (all that matters is whether the mask is zero or not), and in the final iteration you can use tzcnt (or similar) to find the actual index of the zero byte within the vector.

That technique also scales to AVX2, which pcmpistri does not. Additionally, you could use some unrolling: pminub some successive blocks of 16 bytes together to go through the string quicker at first, at the cost of a more tricky final iteration and a more complex pre-loop setup (see the next point).

While an aligned load that contains at least one byte of the string is safe even if the load pulls in some data that is outside the bounds of the string (an aligned load cannot cross a page boundary), that trick is unsafe for unaligned loads. A string that ends near a page boundary could cause this function to fetch into the next page, and possibly trigger an access violation.

There are different ways to fix it. The obvious one is using the usual byte-by-byte loop until a sufficiently aligned address is reached. A more advanced trick is rounding the address down to a multiple of 16 (32 for AVX2) and doing an aligned load. There are bytes in it that aren’t from the string, maybe including a zero. Therefore those bytes must be explicitly ignored, for example by shifting the mask that pmovmskb returned to the right by data & 15. If you decide to add unrolling, then the address for the main loop should be even more aligned, to guarantee that all the loads in the main loop body are safe.
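To make the mask arithmetic concrete, here is a small Python model of the round-down-and-shift idea. It is not SIMD code and says nothing about speed; it only mirrors the logic. A `bytes` object stands in for memory, `zero_byte_mask` plays the role of pcmpeqb against zero followed by pmovmskb, and `trailing_zeros` plays the role of tzcnt. All function names are invented for this sketch.

```python
BLOCK = 16  # width of one SSE2 vector in bytes


def zero_byte_mask(block):
    """Model of pcmpeqb-with-zero + pmovmskb: bit i is set when block[i] == 0."""
    mask = 0
    for i, b in enumerate(block):
        if b == 0:
            mask |= 1 << i
    return mask


def trailing_zeros(mask):
    """Model of tzcnt for a non-zero mask: index of the lowest set bit."""
    return (mask & -mask).bit_length() - 1


def strlen_model(memory, start):
    """Length of the zero-terminated string that begins at memory[start]."""
    misalign = start & (BLOCK - 1)     # data & 15
    base = start - misalign            # address rounded down to a multiple of 16
    mask = zero_byte_mask(memory[base:base + BLOCK])
    mask >>= misalign                  # ignore the bytes in front of the string
    if mask:
        return trailing_zeros(mask)
    offset = base + BLOCK
    while offset < len(memory):        # main loop: aligned blocks only
        mask = zero_byte_mask(memory[offset:offset + BLOCK])
        if mask:
            return offset - start + trailing_zeros(mask)
        offset += BLOCK
    raise ValueError("no terminator found in the modelled memory")


# Five zero bytes sit in front of the string, so the first aligned block
# contains zeros that do not belong to it and must be shifted out.
text = b"hello world, this is a test"
memory = bytes(5) + text + bytes(16)
length = strlen_model(memory, 5)
```

Without the `mask >>= misalign` step the model would report a zero at index 0 of the first block, which is exactly the false positive the shift is there to remove.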

Different optimal results of SCE-UA optimization algorithm in Julia and MATLAB

I posted this question on Stack Overflow, but some people suggested that I post it here.

I rewrote the SCE-UA optimization algorithm in Julia.

However, when I run my script in Julia, the optimized value is quite different from the result in MATLAB. The global best cost in MATLAB is 2.4598e-63, while the global best cost in Julia is 8.264629809290885e-8. This means something went wrong. The objective function is z=sum(x.^2) for both scripts, and the search range and the algorithm parameters are also the same. I checked again and again to make sure the algorithms are the same, but I could not find out why the results are so different.

Could someone please check whether the two scripts differ? Any suggestions are welcome, thanks a lot.

The SCE-UA code in MATLAB is written by Yarpiz, which can be found in

The code in Julia is as follows

using UnPack
using Plots
using StatsBase

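# Field names below are inferred from the constructor calls and @unpack usage further down
mutable struct Pop
    position
    cost
end

# CCE parameters
mutable struct CCE_params
    q
    alpha
    beta
    lb
    ub
    obj_func
end

# SCE parameters
mutable struct SCE_params
    max_iter
    n_complex
    n_complex_pop
    dim
    lb
    ub
    obj_func
end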

import Base.isless
isless(a::Pop, b::Pop) = isless(a.cost, b.cost)

function cost_func(x)
    return sum(x.^2)
end

function uniform_rand(lb::Array{Float64, 1}, ub::Array{Float64, 1})
    dim = length(lb)
    arr = rand(dim) .* (ub .- lb) .+ lb
    return arr
end

# SCE parameters
max_iter = 500
n_complex = 5
n_complex_pop = 10
dim = 10
lb = ones(dim) * -10
ub = ones(dim) * 10
obj_func = cost_func
n_complex_pop = max(n_complex_pop, dim+1) # Nelder-Mead Standard
sce_params = SCE_params(max_iter, n_complex, n_complex_pop, dim, lb, ub, obj_func)

# CCE parameters
cce_q = max(round(Int64, 0.5*n_complex_pop), 2)
cce_alpha = 3
cce_beta = 5

cce_params = CCE_params(cce_q, cce_alpha, cce_beta, lb, ub, obj_func)

function SCE(sce_params, cce_params)
    @unpack max_iter, n_complex, n_complex_pop, dim, lb, ub, obj_func = sce_params

    n_pop = n_complex * n_complex_pop
    I = reshape(1:n_pop, n_complex, :)

    # Step 1. Generate rand_sample
    best_costs = Vector{Float64}(undef, max_iter)

    pops = Pop[]
    for i in 1:n_pop
        pop_position = uniform_rand(lb, ub)
        pop_cost = obj_func(pop_position)
        pop = Pop(pop_position, pop_cost)
        push!(pops, pop)
    end
    complex = Array{Pop}(undef, n_complex_pop, n_complex)

    # Step 2. Rank Points
    sort!(pops)  # ranks by cost via isless(::Pop, ::Pop); call inferred from the step label
    best_pop = pops[1]

    # Main loop
    for iter in 1:max_iter

        # Step 3. Partition into complexes
        for j in 1:n_complex
            complex[:,j] = deepcopy(pops[I[j,:]])
            # Step 4. Evolve complex, run CCE
            complex[:,j] = CCE(complex[:,j], cce_params)
            pops[I[j,:]] = deepcopy(complex[:,j])
        end
        # Step 5. Shuffle Complexes
        sort!(pops)  # call inferred from the step label, as in Step 2

        best_pop = pops[1]

        best_costs[iter] = best_pop.cost

        # Show Iteration Information
        println("Iter = ", iter)
        println("The Best Cost is: ", best_costs[iter])
    end

    return best_costs
end

function rand_sample(P, q)
    L = Vector{Int64}(undef, q)
    for i in 1:q
        L[i] = sample(1:length(P), Weights(P))
        # L[i] = sample(1:sizeof(P), weights(P), 1, replace=true)
    end
    return L
end

function not_in_search_space(position, lb, ub)
    return any(position .<= lb) || any(position .>= ub)
end

function CCE(complex_pops, cce_params)
    # Step 1. Initialize
    @unpack q, alpha, beta, lb, ub, obj_func = cce_params
    n_pop = length(complex_pops)

    # Step 2. Assign weights
    P = [2*(n_pop+1-i) / (n_pop*(n_pop+1)) for i in 1:n_pop]

    # Calculate Population Range (Smallest Hypercube)
    new_lb = complex_pops[1].position
    new_ub = complex_pops[1].position
    for i in 2:n_pop
        new_lb = min.(new_lb, complex_pops[i].position)
        new_ub = max.(new_ub, complex_pops[i].position)
    end

    # CCE main loop
    for it in 1:beta
        # Step 3. Select parents
        L = rand_sample(P, q)
        B = complex_pops[L]

        # Step 4. Generate Offspring
        for k in 1:alpha
            # a) Sort population
            sorted_indexs = sortperm(B)
            L[:] = L[sorted_indexs]

            # Calculate the centroid
            g = zeros(length(lb))
            for i in 1:q-1
                g .= g .+ B[i].position
            end
            g .= g ./ (q-1)

            # b) Reflection step
            reflection = deepcopy(B[end])
            reflection.position = 2 .* g .- B[end].position # newly generated point using reflection
            if not_in_search_space(reflection.position, lb, ub)
                reflection.position = uniform_rand(new_lb, new_ub)
            end
            reflection.cost = obj_func(reflection.position)

            if reflection.cost < B[end].cost
                B[end] = deepcopy(reflection)
            else # Contraction
                contraction = deepcopy(B[end])
                contraction.position = (g .+ B[end].position) ./ 2
                contraction.cost = obj_func(contraction.position)

                if contraction.cost < B[end].cost
                    B[end] = deepcopy(contraction)
                else
                    B[end].position = uniform_rand(new_lb, new_ub)
                    B[end].cost = obj_func(B[end].position)
                end
            end
        end

        complex_pops[L] = B
    end
    return complex_pops
end

best_costs = SCE(sce_params, cce_params)

plot(best_costs, yaxis=:log, label = "cost")

# savefig("Julia.png")

Both scripts could run normally with just one click.

The result of the SCE-UA algorithm from MATLAB is as follows:

[figure: convergence curve from MATLAB]

The result from Julia is as follows:

[figure: convergence curve from Julia]

Obviously, the convergence curves are quite different. The result from Julia is not accurate enough, but I don’t know why.