How to handle a large queue process

I’m using a QueueWorker plugin to update/create nodes in the background. Locally there isn’t an issue; it completes the whole queue. However, on the AWS server it usually stops at some point.
I’m assuming that’s because of resource consumption on the server. What’s the ideal way to optimize my QueueWorker?

Here is my code:

use Drupal\Core\Queue\RequeueException;
use Drupal\Core\Queue\SuspendQueueException;

// Batch size for the entity query.
$limit = 100;
$offset = 0;

while (TRUE) {
  $nodes = \Drupal::entityQuery('node')
    ->condition('type', 'article')
    ->range($offset, $limit)
    ->execute();

  $offset = $offset + $limit;
  if (empty($nodes)) {
    break;
  }

  // Initialize the queue.
  $queue_manager = \Drupal::service('plugin.manager.queue_worker');
  $queue_worker = $queue_manager->createInstance('ex_queue');
  $queue = $this->queueFactory->get('ex_queue');
  // Create queue items.
  foreach ($nodes as $node) {
    $item = new \stdClass();
    $item->content = $node;
    $queue->createItem($item);
  }
  // Process queue items.
  while ($item = $queue->claimItem()) {
    try {
      $queue_worker->processItem($item->data);
      $queue->deleteItem($item);
    }
    catch (RequeueException $e) {
      $queue->releaseItem($item);
      \Drupal::logger('system')->warning('RequeueException');
    }
    catch (SuspendQueueException $e) {
      $queue->releaseItem($item);
      \Drupal::logger('system')->error('SuspendQueueException');
    }
    catch (\Exception $e) {
      $queue->releaseItem($item);
      \Drupal::logger('system')->error('Exception');
    }
  }
}

And here is my QueueWorker:

/**
 * @QueueWorker(
 *   id = "ex_queue",
 *   title = @Translation("Ex Processor"),
 *   cron = {"time" = 3600}
 * )
 */
class ExQueueProcessor extends QueueWorkerBase implements ContainerFactoryPluginInterface {

  protected $configuration;

  /**
   * {@inheritdoc}
   */
  public function __construct(array $configuration) {
    $this->configuration = $configuration;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    return new static(
      $configuration
    );
  }
  /**
   * {@inheritdoc}
   */
  public function processItem($item) {
    // Do things.
  }

}

Let’s say the total count of $nodes is 17k items, and it stops at around 15k. Is there any way to optimize this so it can handle large data sets?

nt.number theory – Iterating Diophantine equations over Q to quickly get a large interval with just integer solutions

Hilbert’s Tenth Problem asked whether there is an algorithm which decides whether an arbitrary Diophantine equation has solutions (where we want integer solutions). Hilbert’s Tenth has a negative solution by Matiyasevich’s theorem, which showed that the question is equivalent to the Halting problem.

The equivalent problem is now known to also have a negative solution for many rings other than Z, by essentially generalizing Matiyasevich’s theorem to other rings. Early work allowed one to replace Z by any quadratic extension of Z, and later work showed the same result for the ring of integers of any algebraic number field whose Galois group over the rationals is abelian.

However, the case of the rationals is still open. The naive thing to do would be to try to find a Diophantine equation over Q whose solutions are exactly the integers. To see why this would suffice, note that given two Diophantine equations over Q one can make a single Diophantine equation whose solution set is their intersection, so one could then combine this equation with Matiyasevich’s equations. However, attempts to construct such a Diophantine equation failed. There is a conjecture of Barry Mazur that implies that in fact no such equation exists. Mazur conjectured the following:

Conjecture (Mazur): Given a variety over the rationals, the topological closure over the reals of the set of solutions has only finitely many connected components.

This question concerns what happens if one tries to make such an equation in the most naive way possible. Say, for example, that one knows that every positive integer is the sum of four perfect squares. So one could try looking at the Diophantine equation $$x - a_1^2 - a_2^2 - a_3^2 - a_4^2 = 0. \,\,\, (1)$$

One might hope that when one looks at Equation 1, only some positive rational values of $x$ allow a solution in the $a_i$. But alas, the homogeneous nature of $a_1^2 + a_2^2 + a_3^2 + a_4^2$ implies that every positive rational number leads to solutions. And the same would hold true if one tried to use any homogeneous way of representing any positive integer (such as a sum of cubes, or anything else arising from Waring’s problem). But we can, without too much work, modify this equation a little bit and show that the set $A_0 = \{x \mid x \text{ is rational and } x = 0, 1 \text{ or } x \geq 2\}$ is Diophantine over Q. So now, we can define for $i \geq 1$
$$A_i = \{x \mid x \in A_{i-1}, \exists\, a_1, a_2, a_3, a_4 \in A_{i-1} \text{ s.t. } x - a_1^2 - a_2^2 - a_3^2 - a_4^2 = 0\}.$$

Notice that if $r \in A_i$ and $r$ is not an integer, then $r > 2^{2^i}$. But for any $i$, $A_i$ will contain non-integer values. One could try the same sort of approach using, instead of Equation 1, some other quadratic form which represents all positive integers, and one would get roughly the same growth. If one instead used, for example, that every positive integer is expressible as the sum of 16 perfect fourth powers, and called the resulting sets $B_i$, then one could still construct an explicit constant $C$ such that for $i > 1$ the smallest non-integer rational in $B_i$ is no more than $2^{C^i}$, and one could do the same for any other Waring-type relation.
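
To spell out where the bound comes from (a quick induction): every element of $A_0$ is nonnegative, and every non-integer element of $A_0$ is greater than $2 = 2^{2^0}$. If $r \in A_i$ is not an integer, write $r = a_1^2 + a_2^2 + a_3^2 + a_4^2$ with each $a_j \in A_{i-1}$; at least one $a_j$ must be a non-integer (otherwise $r$ would be an integer), so by induction
$$r \geq a_j^2 > \left(2^{2^{i-1}}\right)^2 = 2^{2^i}.$$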

What we would like to do is essentially the same approach as with the $A_i$ or $B_i$, but where we iterate the construction of more and more complicated Diophantine equations, corresponding to a sequence of Diophantine sets $S_i$ defined recursively by insisting that all variables in the corresponding equation live in $S_{i-1}$. More explicitly:

Given a set of rational numbers $R$, let us write $f(R)$ to be the infimum of all non-integer elements of $R$.

Let $P(x, y_1, y_2, \cdots, y_k)$ be a polynomial with integer coefficients such that for any sufficiently large integer $x \geq M$ (for some integer $M$), the Diophantine equation $P(x, y_1, y_2, \cdots, y_k) = 0$ has a solution in non-negative integers $y_i$. Then we can define $S_0(P)$ to be the set of numbers which are $1, 2, 3, \cdots, \max\{2, M\}$, or are rationals $r$ such that $P(r, y_1, y_2, \cdots, y_k) = 0$ has a solution in rational $y_j$. We can then define $S_i$ recursively, by at each stage insisting that the $y_j$ be in $S_{i-1}$.
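
In symbols, mirroring the definition of the $A_i$ above (this is one natural reading of the recursion), for $i \geq 1$
$$S_i(P) = \{x \in S_{i-1}(P) \mid \exists\, y_1, \ldots, y_k \in S_{i-1}(P) \text{ s.t. } P(x, y_1, \ldots, y_k) = 0\}.$$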

The question then is:

Question 1: Can we construct a $P$ such that for any positive real numbers $\beta$ and $\alpha$, for sufficiently large $i$, $f(S_i(P)) > \beta^{\alpha^i}$?

Essentially this is asking whether, by repeatedly applying the same Diophantine relations that we know are true for integers, we can very quickly build up Diophantine relations over Q which avoid many small non-integer values.

If $P$ is restricted to being just a set of homogeneous relations like those from Waring’s problem, we cannot get the fast growth we want for Question 1. We also cannot do so with a non-homogeneous form if it is allowed to have isolated powers of a single variable. For example, a form like $x = a^2 + b^2c + bc^2 + bcd^4 + e^3$ would not give what we want, because of the $a^2$ and $e^3$ terms.

Note also that if Z or N is Diophantine over Q, then we can trivially get a solution to Question 1 by using the relevant relation, since then we can construct a sequence of $S_i$ such that $f(S_i) = \infty$ for all $i$. So if the answer to Question 1 is “No”, this would likely be very difficult to prove. However, my suspicion is that the answer is in fact yes, although it may take more than simply a series of relations of the form $x =$ blah, and may require more sophisticated interaction.

MiniZinc – understanding why my model runs forever without any result for large data


searching – Intelligent text search on large directories of notebooks?

I have a (deeply nested) directory containing tens of thousands of Wolfram files (mostly notebooks (.nb), but also scripts (.wls), packages (.m, .wl), and .mx and .wxf data) and many subfolders, each with a few hundred notebooks on average. Each notebook contains 10-100 pages. I want to search on text in the files and within the relevant notebook cell types, e.g. Code, Text, Title, Subtitle, Item, Program, ExternalLanguage, etc.

I’d like to have a live search interface to quickly search through the contents of all files and visually show the matches highlighted within their context.

Is there an existing project or best practices for doing this?

The TextSearch-related Wolfram symbols are old and seem slow/weak; can Mathematica even do this? Mathematica is obviously needed to preview the content of matches within notebooks, but what other tools could be used to build such a live deep search?

calendar – Color keys for large data sets?

I have a task to create a day-by-day calendar list within a SaaS platform.

The calendar will encompass many different types of events and roles within the software. As part of this calendar, the Client feels it is important to show a color key for each type of event to clear up user confusion when quickly browsing events. For reference, a common use case would be that a user has 450+ events happening on a single day. There will be a total of 6 event types amongst the hundreds a user will see daily.

I’ve done a lot of digging for UI inspiration and noticed something: most software does not include a color key in big or small event lists. I cannot find any particular reason why most mainstream software does NOT use a color key in their calendars. Perhaps this is because their data is not quite as large as what I am working with. Is there any reason why people more seasoned than myself think I should or shouldn’t include a color key when defining event types?

web applications – How to optimize the request time for large data response?

I have created a dashboard for rendering a list of clients into a DataTable. Shown below is the data model structure:
[data model structure diagram]

When I had a few records in the clients schema, let’s say a thousand rows, the request time was fairly okay. It would take around 4-5 seconds for the whole round trip: from the request, to processing on the backend, to sending the response with the data and rendering it on the frontend. Once the data reached 10,000+ rows, the time it takes is too much; now it takes somewhere near 17 seconds, or sometimes even more. I’m using Laravel, and the Eloquent ORM brings in the data from the related tables (which is highly useful), but as the data grows it increases the request time. My question is: what could be a better approach to minimize the request time? How can I decrease the time the request takes?

Move from MFT to a REST-style API to get or post large data (200MB), is it best practice?

In your diagram it’s somewhat unclear in which direction the data flows. Your arrows are bidirectional; does that mean each participant sends as well as receives files? It is also not clear which party initiates the transfer, which party runs a server, and whether data is pulled or pushed.

HTTP, upon which REST services are built, generally has pretty good performance in pull scenarios because that’s the primary usage pattern on the web. If it is possible to make files available to be fetched by a normal HTTP request, that might be the best solution regarding performance and simplicity. Pagination most likely wouldn’t be necessary, and if you use content compression your 200MB CSV files are probably less than 100MB during transfer, which is somewhat big but not impossible for a single HTTP request.

If the partner can’t provide the files on a server, you need to go for a push model, which is likely more complicated and a little slower, unless you can transfer the file in a single POST request with compression. This is used for image uploads on the web all the time, so it’s generally well supported, too.
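
A rough sketch of that single compressed POST, using Java’s built-in HttpClient purely as an illustration (the URL is made up, and the receiving server must be configured to accept a gzip Content-Encoding on request bodies, which is not a given):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class CsvPush {

  public static void main(String[] args) throws IOException, InterruptedException {
    // Compress the CSV in memory; ~200MB of CSV typically shrinks well below 100MB.
    byte[] gzipped;
    try (InputStream in = Files.newInputStream(Path.of("export.csv"));
         ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
      try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
        in.transferTo(gz);
      }
      gzipped = bos.toByteArray();
    }

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://partner.example.com/uploads")) // hypothetical endpoint
        .header("Content-Type", "text/csv")
        .header("Content-Encoding", "gzip") // server-side support required
        .POST(HttpRequest.BodyPublishers.ofByteArray(gzipped))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println("Upload status: " + response.statusCode());
  }
}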

Note that a RESTful endpoint does not need to be restricted to JSON data; it is perfectly OK to use a text/csv content type. The actual implementation of this endpoint might even live directly in the web server (if it can handle the authorization correctly), so you don’t need to go through an application REST framework, although that would still be possible.

lighting – Cross-polarization with multiple lights, multiple cameras on large objects in 360

I’m trying to utilize cross-polarization to eliminate reflections on objects. I use linear polarizing filters on multiple light sources (polarized lights are my only light sources, no bounce lights) and a circular polarizer filter (CPL) on the camera lens to achieve cross-polarization. The object is on a turntable so I can shoot 360 photos without moving the camera around.

I’m using a Canon EOS 5D Mark IV with 24-105 mm & 16-35 mm zoom lenses, and 4+ 1000W tungsten lights. I’m currently in the testing phase; I will eventually scale the whole setup for objects as large as 10 feet wide.

I’ve lined everything up perfectly so there are no reflections even on somewhat glossy objects. After that I start to move the camera on the vertical plane (up and down), tilting it to aim at the object; I don’t move the lights, only the camera.

Here are the issues I run into when I do so:

  1. When I change the camera height, I lose the cross-polarization. If I rotate the CPL, it only brings some of the lights back into cross-polarization, not all of them. I thought the polarization would be offset by the same amount when viewed from a different angle, but it seems each light is offset by a different amount.
  2. When I start to rotate the object on the turntable (it is not perfectly centered, so it rotates off the vertical axis), I lose cross-polarization on some of the lights; which light loses cross-polarization depends on the rotation of the subject.

I would like to know why the above happens and how to solve these issues. Also, are there any practical suggestions on how to do a larger version (10 feet wide) of the setup while still maintaining cross-polarization across the subject, preferably with multiple light sources & multiple cameras (angles)?

Thanks in advance!

java – Architectural design for sending large amounts of analytics data from production servers to S3 without impacting request performance

Let’s say we have a server getting up to 1000 requests per second, serving them at a p99 of 20ms (there is a strong business case for not increasing this latency). The server GC parameters have been carefully tuned for this performance, and current latency is already bottlenecked by GC. We want to log structured data related to requests and responses, ideally 100% of it without dropping anything, to S3 in, for example, gzipped JSON Lines format (analytics will be done on this data; each file should ideally be 100MB-500MB in size). Analytics does not have to be realtime; a few hours of delay, for example, is fine. Also, I/O utilization already approaches 100%, so writing this data to disk at any point is likely not an option. All code is in Java.

Solution 1:
Use the threads receiving and serving requests as producers and have them enqueue each request/response into blocking buffer(s), with error/edge-case handling for the buffer being full, exceptions, etc. This way the producer threads don’t get blocked no matter what. Then have a consumer thread pool consume from these buffer(s) in a batched way, compress, and send to S3. The upside is that it is a simple(ish) solution. The main downside is that all this is done in the same JVM and might increase the allocation rate and degrade performance for the main requests. I suspect the main source of new object creation might be during serialization to string (is this true?). Putting objects into a fixed-size queue or draining them (using the drainTo method on BlockingQueue) to an existing collection should not allocate anything new, I think. A minimal sketch of what I mean is below.
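
To make Solution 1 concrete, here is a rough sketch of the kind of thing I have in mind (class and method names are made up, the queue/batch sizes are placeholders, and the actual S3 upload is stubbed out):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.zip.GZIPOutputStream;

public final class RequestLogBuffer {

  // Bounded buffer shared by all request threads; sized so it rarely fills up.
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100_000);
  private final AtomicLong dropped = new AtomicLong();
  private final ScheduledExecutorService consumer =
      Executors.newSingleThreadScheduledExecutor();

  public RequestLogBuffer() {
    // Drain on a fixed schedule; a real version would also flush by size and
    // accumulate batches until an S3 object reaches the target size.
    consumer.scheduleWithFixedDelay(this::drainAndShip, 5, 5, TimeUnit.SECONDS);
  }

  /** Called from request threads: never blocks, drops the record if the buffer is full. */
  public void record(String jsonLine) {
    if (!queue.offer(jsonLine)) {
      dropped.incrementAndGet(); // worth exporting as a metric
    }
  }

  private void drainAndShip() {
    List<String> batch = new ArrayList<>(10_000);
    queue.drainTo(batch, 10_000); // bulk move, no per-element contention with producers
    if (batch.isEmpty()) {
      return;
    }
    try {
      upload(gzipJsonLines(batch));
    }
    catch (IOException e) {
      // Log and move on; losing one batch is preferable to backing up request threads.
    }
  }

  private static byte[] gzipJsonLines(List<String> lines) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
      for (String line : lines) {
        gzip.write(line.getBytes(StandardCharsets.UTF_8));
        gzip.write('\n');
      }
    }
    return bos.toByteArray();
  }

  private void upload(byte[] gzippedJsonLines) {
    // Hand off to an S3 client here (e.g. AWS SDK PutObject), accumulating parts
    // until the 100MB-500MB target object size is reached.
  }
}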

Solution 2:
Set up a separate service running on the same host (so a separate JVM with its own tuned GC if necessary) that exposes endpoints like localhost:8080/request, for example. Producers send data to these endpoints and all consumer logic lives in this service (mostly the same as before). The downside is that this might be more complex. Also, sending data, even to localhost, might block the producer thread (whose main job is to serve requests) and decrease throughput per host.

For Solution 1 or 2, are there any Java-compatible libraries (producer/consumer libraries or high-performance TCP-based messaging libraries) that might be appropriate to use instead of rolling my own?

I know these questions can be answered by benchmarking and making a PoC, but I’m looking for some direction in case someone has suggestions or maybe a third way I haven’t thought of.

overflow – How should large table columns be handled on a responsive design?

There are a couple of implied assumptions in the scenario you’ve presented that, if thoroughly examined, will make your decision easier.

1. The data is tabular.

Just because you are returning records from a query and by default displaying them in a table, does not make the data “tabular” from a UI perspective. The key is that the primary function of tabular display is to compare multiple set items by one or more item characteristics. Think analysis of data. Grouping and sorting in tables are convenience options that make the analysis of the data easier to achieve using a single table, but are still secondary to the primary function of comparison.

Lists are usually better suited to displaying a set of data for the purpose of locating one or more items of interest based on a few key characteristics. Sorting and grouping a list of data facilitates bubbling items of interest to the top of the list for improved “findability”. Ordered lists imply sorting by one or more characteristics, and grouping is achieved by multiple sequential lists or sub-lists.

In my opinion, based on a cursory look at the data you linked, you may want to question whether the occasion of use for your data is more comparing or finding.

2. The mobile use case is fundamentally the same as the desktop use case.

This is the big question that underlies the decision to “go responsive” only. Responsive design primarily addresses displaying (mostly) the same information on different screen sizes. In some situations, the mobile use case (context, user goals, info needs) is different enough from the desktop need that a responsive solution forces an unacceptable level of compromise for one (or more) sets of users. When this happens, a distinct mobile solution is a better option.

In your case, I would decide first if the use cases are the same, or whether designing distinct experiences is a better fit. If they are the same, then examine the use case of the “tabular” data. If it’s primarily finding item(s), then go with a list-based solution, employ sorting/grouping options, and potentially collapse/expand list items (perhaps only on mobile) if it helps focus attention on the characteristics of primary interest.

If the use cases are the same, and comparison of multiple items by characteristic(s) is the primary purpose of the data display, then collapsing the table down to fit 320px by removing or hiding data is actually worse for the user than zooming and/or scrolling horizontally over the complete table. In this scenario, fully responsive design is not appropriate.

Sorry for the long-winded response – hope this helps you make a choice.