performance tuning – Replacing a part in a dataset

I have a nested dataset, created at the correct size using Table, and I wish to replace parts of it with my data. My dataset looks like this (with placeholder values for the number of repeats etc., so that you can run the code yourself):

NumberOfRepeats = 3;
AllParameters = {58422.1`, 58427.6959583`, 58433.2919159`, 
   58438.8878726`, 58444.4838286`, 58450.0797838`, 58455.6757383`, 
   58461.2716919`, 58466.8676448`, 58472.4635969`, 58477.1595489`, 
   58480.3555014`, 58483.5514534`, 58486.7474049`, 58489.9433561`, 
   58493.1393068`, 58498.135255`, 58503.7312019`, 58509.3271479`, 
   58514.9230932`};
NumberOfExperiments = 5;
NumberOfWaveforms = 2;

NumberOfBlocks = 20;

(*Create dataset for data*)
Blocks = Table[<|"Block Number" -> i, 
    "Parameters" -> <|"Freq" -> AllParameters[[i]]|>, 
    "Waveforms" -> 
     Table[<|"Waveform Type" -> j, 
       "Repeats" -> 
        Table[<|"Repeat number" -> k, "Spectrum" -> {}, 
          "Fitting Parameters" -> {}|>, {k, NumberOfRepeats}], 
       "Fitting Parameter" -> {}, "Spectral Splitting" -> {}|>, {j, 
       NumberOfWaveforms}]|>, {i, NumberOfBlocks}];

DatasetRawDataAnomaly = 
 Table[<|"Experiment" -> i, "Data" -> Blocks|>, {i, 
    NumberOfExperiments}] // Dataset

My issue is replacing elements of that dataset. I have tried ReplacePart, but I need to make so many changes that it is too slow, taking up to half an hour in the code I am running. An example of what I would do is:

DatasetRawDataAnomaly = ReplacePart[DatasetRawDataAnomaly,
   {ExperimentType, "Data", BlockType, "Waveforms", WaveformType, 
     "Repeats", RepeatCounter, "Spectrum"} -> IntensityType];

Where IntensityType is a list of lists of values of the form {{1,2},{3,4},{5,6}…}

I think running something like:

IntensityType = {{1, 2}, {2, 3}, {3, 4}};

DatasetRawDataAnomaly[1, "Data", 1, "Waveforms", 1, "Repeats", 1, 
 {(<|#, "Spectrum" -> {IntensityType}|> &)}] (*ones for: experiment number, 
block 1, waveform 1, repeat 1*);

would be quicker, but is there a way to change the whole dataset, rather than extracting the part and changing it?
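
For concreteness, here is a sketch of the whole-dataset approach I have in mind (assuming it is safe to drop to the Normal form and rewrap once at the end; the Part path mirrors the structure above):

raw = Normal[DatasetRawDataAnomaly]; (*plain list of associations*)
raw[[1, "Data", 1, "Waveforms", 1, "Repeats", 1, "Spectrum"]] = IntensityType;
(*...many more such assignments...*)
DatasetRawDataAnomaly = Dataset[raw];

Part assignment on the unwrapped data would avoid rebuilding the Dataset on every change, but I don't know whether this is idiomatic.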

Thank you for any help you can provide.

performance tuning – speed up symbolic summation

I have the following summation

 L=24;
 sind=Range[-Pi,Pi,2*Pi/L];
 Sum[f, {x, sind}, {y, sind}, {x1, sind}, {y1, sind}, {x2, sind}, {y2, sind}]

where f has been evaluated at a previous step and is a function of x, y, x1, y1, x2, y2.
However, f contains symbols, e.g. f = Cos[x]*Sin[y]*Sin[x + x1]*Cos[y + y1]*a[x, y] + g[x, y]*Cos[x2 - y2]*Sin[x + x1 + x2]*Cos[y + y1 + y2], where a and g are symbols that don't take explicit real values.

Is there a way to speed up such sums? They are very slow…
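
For reference, here is a staged version I have been experimenting with (a sketch only; it assumes f is a sum of products like the example above, so that Expand can collapse the purely numeric trigonometric factors before the symbolic a and g terms multiply out):

L = 24;
sind = Range[-Pi, Pi, 2*Pi/L];
step1 = Expand[Sum[f, {x2, sind}, {y2, sind}]]; (*innermost pair first*)
step2 = Expand[Sum[step1, {x1, sind}, {y1, sind}]];
result = Sum[step2, {x, sind}, {y, sind}]

Summing two variables at a time keeps the intermediate expressions smaller than a single six-fold Sum, but I am not sure it is the best approach.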

linux – Application specific system tuning

I’m running an application under wine (Navico insight map creator) on multiple machines and am in the process of benchmarking and optimizing the performance of this process.

I find the following interesting.
Windows:
With the application set to unlimited cache, processing takes 25% of the time that using no cache / decache unused does.
While using unlimited cache, it seems to use a bit more CPU, about 30% load on an 8-core AMD FX-8150 with RAM at 2133 MHz.

But under Linux, the CPU is maxed out (according to System Monitor) on dual E5-2697 v2s in a Dell R720, with no change in processing time between cache settings.

So, I'm trying to figure out how I can find the bottlenecks in the Linux system, and whether System Monitor is actually reporting CPU usage properly or there is an I/O bottleneck somewhere. Is the RAM too slow? Or maybe it's just an issue with running under Wine?

I'm currently running the exact same data through this application and timing it on the various systems, to get an equal comparison of performance.

Any information regarding identifying bottlenecks would be greatly appreciated.

performance tuning – Why does the Frontend slow down in long sessions?

I'm running Mathematica 12.1.1 on my Windows 10 laptop (Dell Latitude E5550 with an Intel Core i5 and 8 GB RAM). My MMA sessions tend to be very long, sometimes extending over days and weeks, interrupted by hibernation and wakeup. I develop software that uses a lot of graphical output, such as ArrayPlot[] of matrices of size 300×300 or so.

The stability is good, but the MMA Frontend always slows down over time. In heavy use, the slowdown becomes noticeable within a single working day, i.e., before I hibernate the laptop at the end of the day. The slowdown is most obvious when editing or typing commands, and I sometimes reach the point where it takes the Frontend a second or so to display each character when quickly typing a new command or line in a Module[]. Quitting and restarting MMA is then the only option (ClearAll["`*"] or even Quit[] and a kernel restart doesn't help), but it takes me about 10 minutes to reload all notebooks and definitions and resume work, hence I'm hesitant to do so too often.

I usually have about 10 to 15 notebooks open, but normally I either save and close notebooks with lots of figure output, or else overwrite them on each rerun of the code I’m testing. I monitor the Frontend and Kernel memory which both keep climbing steadily, but not to the point that they consume more than 1 GB or so, hence just a small fraction of my total RAM. I already set $HistoryLength=150 to prevent the command history from accumulating.

Can anybody tell me the reason for the slowdown (memory leaks, an excessive number of variables for the automatic suggestion/completion function to look up, stale file handles, memory page faults, or a bug in the Frontend...)? Any suggestions on how to better monitor my session or find/remove the root cause of the slowdown?
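
For what it's worth, this is how I plan to monitor the session in the meantime (a sketch; I am not even sure front end memory is the right quantity to watch):

(*log kernel and front end memory every 10 minutes*)
log = {};
task = SessionSubmit[ScheduledTask[
    AppendTo[log, {AbsoluteTime[], MemoryInUse[], MemoryInUse[$FrontEnd]}], 600]];
(*later: ListLinePlot[log[[All, {1, 2}]]] etc.; TaskRemove[task] to stop logging*)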

Thanks,
Ron

performance tuning – Can’t get this code to run

The code shows a memory error for higher values of the variable Numofbits; I can only go up to 4.

In[1]:= Numofbits = 4;

In[2]:= NumOfFunc = 15;

In[3]:= AllFunc = BitXor @@@ Subsets[Transpose@Tuples[{0, 1}, Numofbits], {1, Numofbits}];

In[4]:= AllStr = Tuples[{0, 1}, 2^Numofbits];

In[5]:= BigTable = Table[If[AllFunc[[k, i]] == AllStr[[j, i]], 1, 0], {i, 1, 2^Numofbits}, {j, 1, 2^(2^Numofbits)}, {k, 1, NumOfFunc}];

In[6]:= SumOnEach = Total[BigTable, {1}];

In[7]:= Gretst = Table[If[SumOnEach[[i, j]] > 2^(Numofbits - 1), SumOnEach[[i, j]], 
2^Numofbits - SumOnEach[[i, j]]], {i, 1, 2^(2^Numofbits)}, {j, 1, NumOfFunc}];

In[8]:= AllSuccProb = Total[Gretst, {2}]/(2^Numofbits*NumOfFunc);

In[9]:= MaxSuccProb = Max[AllSuccProb]

Out[9]= 5/8

In[10]:= MaxMemoryUsed[]

Out[10]= 421781872
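
For reference, a possible memory-saving rewrite (a sketch under the same definitions, not tested beyond Numofbits = 4): count the agreements with dot products instead of materializing the three-dimensional BigTable, which is the largest object above. The 2^(2^Numofbits) rows of AllStr still grow doubly exponentially, so Numofbits = 5 stays out of reach in this formulation.

(*agreements[[j, k]] = number of positions where AllStr[[j]] and AllFunc[[k]] agree; for 0/1 vectors this is a.b + (1 - a).(1 - b)*)
agreements = AllStr . Transpose[AllFunc] + (1 - AllStr) . (1 - Transpose[AllFunc]);
(*elementwise Max[s, 2^Numofbits - s] via Listable arithmetic*)
Gretst = 2^(Numofbits - 1) + Abs[agreements - 2^(Numofbits - 1)];
MaxSuccProb = Max[Total[Gretst, {2}]/(2^Numofbits*NumOfFunc)]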

performance tuning – Will running a command line on a supercomputer compute faster than running it on a domestic PC?

I have a Mathematica .nb file that has been executing some commands on my personal computer for several days now. The commands are not written for parallel computing and they don't seem to work with Parallelize. The time they take to compute has been too long for my progress and schedule, so I'm considering running the .nb via a private supercomputer service provider. My question is: will commands that don't support Parallelize nevertheless compute significantly faster, or am I naive in believing that rewriting the code for supercomputing won't be necessary?

performance tuning – Vectorization of multifold summation for speedup

I searched this website but didn't find a suitable answer describing how one can speed up summation in Mathematica using vectorization or other techniques.

I often have to numerically sum over a multi-fold series of the hypergeometric type in my research work. One toy example is

lim = 150;
Sum[
  Gamma[1 + n1 + n2 + n3]/(n1! n2! n3!) (0.1)^n1 (0.1)^n2 (0.1)^n3, 
  {n1, 0, lim}, {n2, 0, lim}, {n3, 0, lim}] // AbsoluteTiming

which takes about 42 sec on my laptop. The only way I know to speed it up is to use ParallelSum instead of Sum, which takes 9 sec thanks to my 8-core processor. I want to know if there are any other tricks or techniques to speed this up.
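
For concreteness, here is one vectorized reformulation (a sketch specific to summands like this one, which factor into per-index weights times a function of s = n1 + n2 + n3 alone): grouping terms by s turns the triple sum into a double convolution of the weight vector with itself, dotted with the Gamma values. Exact rationals sidestep machine overflow in the large factorials.

lim = 150;
w = (1/10)^Range[0, lim]/Range[0, lim]!; (*(1/10)^n/n!, exact*)
c2 = ListConvolve[w, w, {1, -1}, 0]; (*weights grouped by n1 + n2*)
c3 = ListConvolve[w, c2, {1, -1}, 0]; (*weights grouped by n1 + n2 + n3*)
N[Range[0, 3 lim]! . c3] (*Gamma[1 + s] = s! for integer s*)

This reproduces the truncated triple sum exactly (each index capped at lim) and should run in well under a second.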

performance tuning – Reading product of same ByteArrays

I am handling a huge set of data and trying to optimize my code. I have very large expressions to expand and simplify, and I've seen that converting the expressions to ByteArrays greatly reduces the time Mathematica spends expanding and simplifying them.

However, I have the following problem. I want to convert the following expressions

expr = ByteArray[2] ByteArray[2]
expr2 = BinarySerialize[3] BinarySerialize[2]

To do that, I run the following commands

BinaryDeserialize /@ expr
BinaryDeserialize /@ expr2

In the first case I get the error BinaryDeserialize: 2 is not a valid ByteArray, while the second case works perfectly. This is because expr automatically takes the form Power[a_, b_], so Map reaches the exponent as well as the base. How can I overcome this difficulty in a smart way?
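
One workaround sketch (assuming the real expressions contain genuine ByteArray objects produced by BinarySerialize): instead of mapping over the top-level factors, replace every ByteArray wherever it occurs, so the bases of Power are reached as well as the flat factors of Times, while plain integer exponents are left alone.

expr = BinarySerialize[2] BinarySerialize[2]; (*identical factors collapse to Power[ba, 2]*)
expr /. b_ByteArray :> BinaryDeserialize[b] (*gives 2^2 = 4*)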

performance tuning – Investigating RDS MySQL 8.0.16 stalls on write-heavy workload

I'm trying to track down some sporadic and difficult-to-reproduce issues with a MySQL 8.0.16 RDS instance connected to a web application with a write-heavy workload on a couple of tables with large JSON columns. One table has ~3M rows with an average row length of ~120 KB, around 350 GB of data total, almost all of it on the clustered index. The other has ~12M rows with an average row length of ~20 KB. The workload is roughly 80% INSERT and INSERT ON DUPLICATE KEY UPDATE, with some SELECTs and DELETEs. We don't run any JOINs or complicated SELECTs that could be bogging everything else down.

The issue is:

For brief periods, like 1-20 seconds, once or twice every couple weeks, all INSERT, IODKU, and DELETE queries appear to be totally unresponsive. Write latency spikes, dirty pages spike, network transmit (but not receive) spikes, everything shows up in the slow query log, and then the issue resolves itself and everything returns to normal. We have a heavily seasonal traffic pattern and this issue first appeared on day 2 of our busy season. I’m hoping to track it down before the last week of the season which will have a few days of very high, sustained traffic.

I have pt-stalk watching production and caught what was maybe a smaller version of one of these when Threads_running spiked up to ~150 (usually it's less than 10). Of course, since it's RDS we don't have the diskstat/iostat/netstat output or anything, and the CloudWatch metric intervals are too wide to tell me anything, but I have a lot of information from SHOW ENGINE INNODB STATUS and SHOW GLOBAL STATUS which I'm struggling to sift through. So far, the thing that stood out most was the FILE I/O section of the INNODB STATUS output during this Threads_running spike:

Pending normal aio reads: (0, 0, 0, 0) , aio writes: (27, 25, 21, 20) ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 6251
138942220 OS file reads, 8230899625 OS file writes, 1653263701 OS fsyncs
5.60 reads/s, 16384 avg bytes/read, 1975.05 writes/s, 215.99 fsyncs/s

Most of the time we have 0 pending aio writes so that stood out to me.

Most of our InnoDB parameters are at default levels. I've tried bumping innodb_log_file_size to 10GB (we write about 287 MB/min according to polling changes in Innodb_os_log_written, and the common advice is to keep an hour of redo), but that doesn't seem to have any effect when I run performance tests in a non-production environment. Production is still at the default for our version of MySQL (128MB).

Other important settings:

innodb_buffer_pool_size 48318382080
innodb_flush_log_at_trx_commit  1
innodb_flush_method O_DIRECT
innodb_flush_neighbors  0
innodb_io_capacity  200
innodb_io_capacity_max  2000
innodb_log_buffer_size  8388608
innodb_max_dirty_pages_pct  90.000000
innodb_max_dirty_pages_pct_lwm  10.000000

RDS is specced out like so:

64 GB RAM
16 vCPU
4096 GiB General Purpose (SSD)

To try and come up with a concrete question:

  1. Is it normal/expected to see pending aio writes like in that INNODB STATUS output?
  2. We got up to like 8000 write IOPS according to CloudWatch when the incidents occurred — does that mean innodb_io_capacity_max is too low at 2000?
  3. Do RDS specs look appropriate?
  4. What can I look at to pin this down further as either a configuration issue, a hardware issue with the RDS SSD, or a spec issue with the instance?

I appreciate your time and patience, as a humble application developer it’s hard to know if I’m barking up the right tree or even in the right forest.