performance – mariadb: Aborted connection .. Got timeout reading communication packets

What is the typical cause of warnings like this? They appear periodically, sometimes multiple times per day, then not at all for a day or so.

2021-01-08 13:20:46 203939 [Warning] Aborted connection 203939 to db: 'lsv' user: 'finder' host: '23.227.111.186' (Got timeout reading communication packets)

This database server is only queried by a few hosts, and the warning appears for all of them and for all databases on the server. The server is connected by a 1 Gbit link to the Internet as well as a 10 Gbit local link to a web server.

This is a MariaDB 10.4.17 server on Fedora 33 with a 5.9.16 kernel and 128 GB of RAM; running the database is the box's only function. This has been happening for quite some time, and it doesn't seem to matter which host or database is involved. How do I troubleshoot this? Could this be a networking problem?

I would appreciate any ideas you might have. Here are the contents of my.cnf.

# cat my.cnf |grep -Ev '^$|^#'
[client]
port            = 3306
socket          = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
max_connections=600
replicate_do_db='txrepdb'
replicate_do_db='sqlgrey'
replicate_do_db='sbclient'
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 512M
join_buffer_size = 2M 
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
query_cache_size = 0
query_cache_type = 0
relay_log_space_limit = 500M
relay_log_purge = 1
log-slave-updates = 1
local_infile = OFF
binlog_format = ROW
max_heap_table_size = 1024M 
tmp_table_size = 1024M 
performance_schema=ON
performance-schema-instrument='stage/%=ON'
performance-schema-consumer-events-stages-current=ON
performance-schema-consumer-events-stages-history=ON
performance-schema-consumer-events-stages-history-long=ON
relay-log=havoc-relay-bin
log_bin                 = /var/log/mariadb/mysql-bin.log
expire_logs_days        = 2
max_binlog_size         = 500M
plugin_load=server_audit=server_audit.so
plugin_load_add = query_response_time
server_audit_events=connect,query
server_audit_file_path                  = /var/log/mariadb/server_audit.log
server_audit_file_rotate_size           = 1G
server_audit_file_rotations             = 1
slow-query-log = 1
slow-query-log-file = /var/log/mariadb/mariadb-slow.log
long_query_time = 1
log_error = /var/log/mariadb/mariadb-error.log
binlog_format=mixed
server-id       = 590
report-host=havoc.example.com
innodb_data_home_dir = /var/lib/mysql
innodb_defragment=1
innodb_file_per_table
innodb_data_file_path = ibdata1:10M:autoextend:max:500M
innodb_buffer_pool_size=60G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout = 50
innodb_buffer_pool_instances = 40
open_files_limit=30000  # from 1222 for ~ 50% of planned ulimit -a Open Files of 65536
innodb_open_files=10000  # from 512 to match table_open_cache
innodb_log_buffer_size=64M  # from 8M for ~ 30 minutes log buffered in RAM
innodb_page_cleaners=15  # from 4 to expedite page cleaning
innodb_purge_threads=15  # from 4 to expedite purge processing
innodb_write_io_threads=64  # from 4 to expedite multi core write processing SE5666 Rolando
innodb_read_io_threads=64  # from 4 to expedite multi core read processing SE5666 9/12/11
read_rnd_buffer_size=262144  # from 4M to reduce handler_read_rnd_next of 124,386 RPS
innodb_io_capacity=2100  # from 1100 to allow higher SSD iops
innodb_lru_scan_depth=100  # from 1024 to conserve CPU cycles every SECOND
max_connect_errors=10
table_open_cache=10000  # from 512 to reduce opened_tables RPS of 1
read_buffer_size=1572864 # from 1M to reduce handler_read_next of 32,317 RPS
table_definition_cache=10000  # from 400 to reduce opened table_definitions RPS of 1
log_slow_verbosity=explain  # from nothing or ADD ,explain to enhance SLOW QUERY log
query_prealloc_size=32768 # from 24K to reduce CPU malloc frequency
query_alloc_block_size=32768 # from 16K to reduce CPU malloc frequency
transaction_prealloc_size=32768 # from 4K to reduce CPU malloc frequency
transaction_alloc_block_size=32768 # from 8K to reduce CPU malloc frequency
innodb_fast_shutdown=0
aria_pagecache_division_limit=50  # from 100 for WARM blocks percentage
aria_pagecache_age_threshold=900
innodb_adaptive_max_sleep_delay=20000  # from 150000 ms (15 sec to 2 sec) delay when busy
innodb_flushing_avg_loops=5  # from 30 to minimize innodb_buffer_pool_pages_dirty count
max_seeks_for_key=64  # from ~ 4 Billion to conserve CPU
max_write_lock_count=16  # from ~ 4 Billion to allow RD after nn lck requests
optimizer_search_depth=0  # from 62 to allow OPTIMIZER autocalc of reasonable limit
innodb_print_all_deadlocks=ON  # from OFF to log event in error log for DAILY awareness
wait_timeout=7200
innodb_flush_neighbors=0 # from ON to conserve CPU cycles when you have SSD/NVME
interactive_timeout=7200
innodb_buffer_pool_dump_pct=90  # from 25 to minimize WARM time on STOP / START or RESTART
innodb_fill_factor=93
innodb_read_ahead_threshold=8  # from 56 to reduce delays by ReaDing next EXTENT earlier
sort_buffer_size=1572864 # from 1M to reduce sort_merge_passes RPS of 1
innodb_stats_sample_pages=32  # from 8 for optimizer to use more accurate cardinality
min_examined_row_limit=1  # from 0 to reduce clutter in slow query log
query_cache_limit=0  # from 2M to conserve RAM because your QC is OFF, as it should be.
query_cache_min_res_unit=512  # from 4096 to increase QC capacity, if EVER used

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
default-character-set = utf8mb4

[myisamchk]
key_buffer_size = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
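
A first troubleshooting step is to compare when the warnings appear against the server's timeout settings and aborted-connection counters. Below is a minimal sketch, assuming the Node.js mysql2 driver; the host, user, and password are placeholders:

// Sketch: check the settings and counters that usually explain
// "Got timeout reading communication packets". Assumes mysql2.
const mysql = require('mysql2/promise');

async function main() {
  const conn = await mysql.createConnection({
    host: 'db.example.com', user: 'finder', password: 'secret',
  });

  // Clients idle longer than these timeouts are dropped by the server,
  // which is then logged as an aborted connection.
  const [timeouts] = await conn.query(
    "SHOW GLOBAL VARIABLES WHERE Variable_name IN " +
    "('wait_timeout', 'interactive_timeout', 'net_read_timeout', 'net_write_timeout')");
  console.table(timeouts);

  // Aborted_clients counts sessions dropped mid-connection;
  // Aborted_connects counts failed connection attempts.
  const [counters] = await conn.query("SHOW GLOBAL STATUS LIKE 'Aborted%'");
  console.table(counters);

  await conn.end();
}

main().catch(console.error);

If Aborted_clients climbs in step with the warnings while the NIC counters stay clean, idle clients hitting wait_timeout, or applications dropping connections without a clean close, are more likely culprits than the links themselves.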

javascript – advice for web communication protocol for “streaming” multiple JSON objects to multiple clients

As a hobby / to learn, I am building an app in JavaScript using Node.js where a component of it takes input from a client, sends it to a server, and then broadcasts it to other clients. For simplicity, let's say that the data looks like {"x_pos":0.4, "y_pos":0.2} and specifies an avatar's (x,y) position on a map in a game. I want each user to have an avatar, with each avatar's (x,y) position shared.

Currently I am using WebSocket (socket.io) to do this. I figured WebSocket would be ideal because it runs over TCP and includes an identifier for each user. However, the fact that communication is bidirectional seems sub-optimal. Additionally, I am emitting position data from every client 30 times a second to the server, which then broadcasts it to all users. This works well for one user, but I do not know how it would scale.

I have also heard that UDP is ideal for games, but I understand that UDP is connectionless and doesn't track user connections. Would this mean that I would not be able to keep track of who incoming (x,y) data belongs to? (I suppose I could change the data to something like {"user":"id", "x_pos":0.4, "y_pos":0.2} and handle updates on the client side that way.) There is also WebRTC, which uses UDP, but I doubt peer-to-peer connections would scale well.

So I am curious what people think is the best protocol here. Am I on the right track using WebSocket to broadcast player positions, or should I be using something else?

I would like to note that I am not building a commercial app in any way, and I anticipate a load of no more than 6 people at once. But 6 people * 30 emits a second to the server, plus 6 * 30 emits back out to the clients, means at least 360 socket.io emit() events a second (more if each broadcast counts once per recipient), which seems like maybe not what socket.io was built for? That said, I hear that WebSockets establish a data stream where UDP does not, so maybe that means UDP carries more overhead? I honestly do not know and cannot find this information readily online.
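
As a point of reference, here is a minimal sketch of the relay described above, assuming socket.io v4 (the 'position' event name is illustrative). The volatile flag drops an update instead of queueing it when a client falls behind, which suits lossy 30-per-second position data:

// Sketch: rebroadcast each client's position to everyone else (socket.io v4).
const { Server } = require('socket.io');
const io = new Server(3000);

io.on('connection', (socket) => {
  socket.on('position', (pos) => {
    // socket.id identifies the sender, so receivers know whose avatar moved
    // without trusting a client-supplied id.
    socket.broadcast.volatile.emit('position', { user: socket.id, ...pos });
  });
});

Since socket.io multiplexes all emits over a single WebSocket per client, a few hundred small messages a second should be well within what it is designed to handle.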

networking – Kubernetes node-to-node communication not working as expected

Hello to all,

I have a problem with my Kubernetes cluster.

Specification

Cluster

NAME                 STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k8s-w02-prod   Ready    <none>   40d   v1.19.2   192.168.25.20   <none>        Ubuntu 20.04.1 LTS   5.4.0-54-generic   docker://19.3.8
k8s-m01-prod   Ready    master   40d   v1.19.2   10.60.17.15    <none>        Ubuntu 20.04.1 LTS   5.4.0-58-generic   docker://19.3.8
k8s-m02-prod   Ready    master   40d   v1.19.2   10.60.17.16    <none>        Ubuntu 20.04.1 LTS   5.4.0-54-generic   docker://19.3.8
k8s-m03-prod   Ready    master   40d   v1.19.2   10.60.17.17    <none>        Ubuntu 20.04.1 LTS   5.4.0-54-generic   docker://19.3.8
k8s-w01-prod   Ready    <none>   40d   v1.19.2   192.168.29.20    <none>        Ubuntu 20.04.1 LTS   5.4.0-54-generic   docker://19.3.8

Cluster Network Plugin
Using Calico

Pod network:

networking:
  podSubnet: 10.65.0.0/16

Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+------------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+---------------+-------------------+-------+------------+-------------+
| 192.168.25.20 | node-to-node mesh | up    | 23:37:55   | Established |
| 10.60.11.156  | node-to-node mesh | up    | 2021-01-04 | Established |
| 10.60.11.157  | node-to-node mesh | up    | 2021-01-04 | Established |
| 192.168.29.20 | node-to-node mesh | up    | 2021-01-04 | Established |
+---------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.

It uses the node-to-node mesh.

Problem

When I run a simple application, for example ArgoCD:

argo-cd-argocd-application-controller-74dd8b79f5-vldhb   1/1     Running   0          14h   10.65.102.48   k8s-w02-prod   <none>           <none>
argo-cd-argocd-dex-server-5c656d6c6c-shb69               1/1     Running   0          14h   10.65.102.52   k8s-w02-prod   <none>           <none>
argo-cd-argocd-redis-9757589c5-6w2p6                     1/1     Running   0          14h   10.65.102.60   k8s-w02-prod   <none>           <none>
argo-cd-argocd-repo-server-774c6856f9-vgmq8              1/1     Running   0          14h   10.65.102.4    k8s-w02-prod   <none>           <none>
argo-cd-argocd-server-669fc6db5c-x5w4k                   1/1     Running   0          13h   10.65.72.159   k8s-w01-prod   <none>           <none>

Q) I cannot access the ArgoCD web UI because, as I can see, the pods are running on different nodes.

On worker01 (k8s-w01-prod, 192.168.25.20):

ip route | grep tun

10.65.69.192/26 via 10.60.17.17 dev tunl0 proto bird onlink 
10.65.102.0/26 via 192.168.25.20 dev tunl0 proto bird onlink 
10.65.187.64/26 via 10.60.17.15 dev tunl0 proto bird onlink 
10.65.233.192/26 via 10.60.17.16 dev tunl0 proto bird onlink 


On worker02 (k8s-w02-prod, 192.168.29.20):

10.65.69.192/26 via 10.60.17.17 dev tunl0 proto bird onlink 
10.65.72.128/26 via 192.168.29.20 dev tunl0 proto bird onlink 
10.65.187.64/26 via 10.60.17.15 dev tunl0 proto bird onlink 
10.65.233.192/26 via 10.60.17.16 dev tunl0 proto bird onlink 

The nodes are on different subnets. Ping works completely fine on both sides.

When I use nodeSelector labels to run the pods on one selected node (worker01 or worker02), the issue is solved.

Q) How can I route the traffic so that the ArgoCD web UI works without any node-to-node communication problem (a pod can run on any node, and pods can communicate with each other)?

Q) Is it a good idea to use BGP on Calico, without the node-to-node mesh?

https://docs.projectcalico.org/networking/bgp
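
If you do try BGP without the full mesh, the linked Calico docs turn the mesh off with a BGPConfiguration resource plus explicit BGPPeer resources. A minimal sketch based on those docs (the AS number is only an example), applied with calicoctl apply -f:

# Disables the node-to-node mesh; explicit BGPPeer resources (e.g. to a
# route reflector or your ToR switches) must then provide the routes.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 63400

That said, since the workers sit on different subnets, it is also worth checking that IP-in-IP traffic (the tunl0 device in the routes above implies IPIP encapsulation, IP protocol 4) is allowed between those subnets.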

Please advise me on how I can fix this issue.

mysql – mariadb Error Reading Communication Packets

Hi, I have a problem. When I split my MariaDB off the main server onto a separate server (my DB server runs the MySQL Docker image from the latest tag), I get this error:

Got an error writing communication packets

The port is open, and my ping is less than 1 ms.

I tried with a basic WordPress site DB, and the connection is OK with no problems. But my DB is about 1 GB, and I guess that is what causes the problem.

I also tried connecting over the private network (192.168.100.25) instead of the public IP, but the problem is the same.

Here is my MySQL log:

Aborted connection 3 to db: 'wpdb' user: 'root' host: 'myip' (Got an error reading communication packets)

Aborted connection 5 to db: 'wpdb' user: 'root' host: 'myip' (Got an error writing communication packets)

I also edited the MySQL config, increasing max_allowed_packet to 1 GB and net_buffer_length to 1000000, but nothing changes!
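
One thing worth double-checking in a Docker setup is that the server inside the container actually received those settings, since editing a my.cnf on the host does not reach the container. The official MySQL image passes extra command-line flags straight through to mysqld; a hypothetical invocation (values illustrative, not a confirmed fix for this case):

# Flags after the image name are handed to mysqld as server options.
docker run -d --name db \
  -e MYSQL_ROOT_PASSWORD=secret \
  mysql:latest \
  --max-allowed-packet=1G \
  --net-read-timeout=120 \
  --net-write-timeout=240

Afterwards, running SHOW VARIABLES LIKE 'max_allowed_packet'; from a client confirms what the running server is actually using.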

In microservice architecture, when to prefer synchronous over asynchronous communication and vice versa?

In our microservice architecture, this is how requests flow.

[diagram: request flow through the layers]

Service Layer: Requests hit this layer from the public LB. Response composition is also performed here.

API Gateway: The core job of the API gateway is to make parallel calls to the respective microservices.

Please note: usually response composition is done at the API gateway. However, we intentionally moved composition to the service layer due to frequent changes in the compositions.

Being in the automotive domain, we deal with vehicles, dealers, leads, etc.
Now, there is confusion about when to prefer synchronous over asynchronous communication and vice versa. Let me take the example of the two microservices below.

DealersService: This service holds all the information about dealers.

LXMSService: This service is responsible for processing the leads we capture on our platforms and sending them to the respective clients (dealers).

While a lead is being captured, we want to show the list of dealers the user can choose from. The logic for showing dealers may vary from client to client. These rules are configured in LXMS, and we store only dealer ids in this service (LXMS).

Now there can be two possible ways to show dealers!

Possibility 1: Synchronous way

Step 1: Make a call to the LXMS microservice.

Step 2: Get a list of dealer ids.

Step 3: Pass this list of dealer ids to the Dealer microservice to resolve.

Step 4: The Dealer microservice returns the id and name of each dealer, which we show to the user.

[diagram: synchronous flow]

Possibility 2: Asynchronous way

We remove the dependency by storing the dealer name in the LXMS microservice and maintain consistency using pub/sub. This way we isolate LXMS and remove run-time dependencies from the service layer.

[diagram: asynchronous flow with pub/sub]
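
A minimal sketch of this pub/sub option, with hypothetical event and service names; Node's EventEmitter stands in for a real broker such as Kafka or RabbitMQ:

// LXMS keeps a local read model of dealer names, kept eventually
// consistent by a "dealer.updated" event from DealersService.
const { EventEmitter } = require('events');
const bus = new EventEmitter();

// LXMS side: dealer id -> name.
const dealerNames = new Map();
bus.on('dealer.updated', ({ id, name }) => dealerNames.set(id, name));

// DealersService side: publish whenever a dealer changes.
bus.emit('dealer.updated', { id: 42, name: 'Example Motors' });

// Showing dealers during lead capture no longer needs a runtime call to
// DealersService; the trade-off is staleness until the next event arrives.
console.log(dealerNames.get(42)); // "Example Motors"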

Here are the questions

  1. Which approach should we go with, and why?

  2. What is the problem with making sequential calls?

  3. In the first approach, where the service layer makes sequential calls, can we still call the microservices independent?

  4. Isolation comes at the additional overhead of data syncing using pub/sub. Is the isolation worth this overhead?

  5. When to prefer synchronous over asynchronous and vice versa?

Thank you!

c++ – Bidirectional thread communication with multiple condition_variable has rare hang / race condition

I have a rather strange example, and so I will just put the context out here briefly, and we can hopefully just pretend it’s a good idea.

I'm using a profiler that requires regular calls to its FRAME() macro so that it knows where the CPU frames of a game start and end (the object the macro builds is RAII/scope based). I'm using fibers for my threading (the main 'thread' is also a fiber worker), and this profiling macro only supports being called from a thread not registered with the profiler as a fiber worker thread. Ergo, I have this awful short-term solution where I communicate with a separate thread just for this macro. The goal is to get the timing of construction/destruction of the RAII object as accurate as possible on this separate thread without disrupting the calling thread's timing. But sometimes the entire application hangs, and I don't understand how that's possible here.

Main ‘thread’ (actually on a fiber but that doesn’t matter) / game loop:

FrameProfile frameProfile("Client Update");
while (!bShouldQuit)
{
    frameProfile.StartFrame();
    
    /* Do the game client's work for this frame */

    frameProfile.EndFrame();
}

And then this FrameProfile object is responsible for spinning up a separate thread, and will let that thread enter the FRAME macro scope when StartFrame is called from the above, and that thread will sleep in that scope until EndFrame is called, at which point it will wake up and exit the scope, destroying the profiler’s frame-measuring object, and giving us a hopefully-accurate frame time.

struct FrameProfile
{
    FrameProfile(const char* tag)
    {
        pthread_ = std::make_unique<std::thread>(
            [tag, this](std::atomic_bool& killFlag) {
                while (!killFlag)
                {
                    assert(!endThreadFrame.WasSignalled());
                    startThreadFrame.WaitConsume();
                    {
                        assert(!startThreadFrame.WasSignalled());
                        assert(!endedThreadFrame.WasSignalled());

                        // Construct the frame-measuring object using this macro
                        OPTICK_FRAME(tag);

                        startedThreadFrame.Signal();

                        endThreadFrame.WaitConsume();
                        // endThreadFrame has been signalled - we need to exit scope
                        // to finish measuring ASAP
                    }
                    assert(!endThreadFrame.WasSignalled());
                    endedThreadFrame.Signal();
                }
            },
            std::ref(bKill_)
        );
    }

    ~FrameProfile()
    {
        bKill_ = true;
        if (pthread_)
        {
            if (pthread_->joinable())
            {
                pthread_->join();
            }
        }
    }

    void StartFrame()
    {
        assert(!startThreadFrame.WasSignalled());
        assert(!startedThreadFrame.WasSignalled());

        // Tell thread to start measuring the frame
        startThreadFrame.Signal();

        // Wait for thread to have started frame measurement
        startedThreadFrame.WaitConsume();
    }
    void EndFrame()
    {
        assert(!endThreadFrame.WasSignalled());
        assert(!endedThreadFrame.WasSignalled());

        // Tell thread to end frame measurement
        endThreadFrame.Signal();

        // Wait for thread to have ended frame measurement
        endedThreadFrame.WaitConsume();
    }


private:
    std::unique_ptr<std::thread> pthread_;
    std::atomic_bool bKill_ = false;

    struct ThreadSignal
    {
        std::atomic_bool bSignalled;
        std::mutex mutex;
        std::condition_variable cv;

        void Signal()
        {
            assert(!bSignalled);
            {
                std::unique_lock<std::mutex> _(mutex);
                bSignalled = true;
            }
            cv.notify_all();
        }

        bool WasSignalled()
        {
            return bSignalled;
        }

        void WaitConsume()
        {
            std::unique_lock unique(mutex);
            cv.wait(unique, [this]() { return bSignalled == true; });
            unique.unlock();
            bSignalled = false;
        }
    };

    ThreadSignal startThreadFrame;
    ThreadSignal endThreadFrame;

    ThreadSignal startedThreadFrame;
    ThreadSignal endedThreadFrame;
};

Can you spot what I'm doing wrong here? Or if there's a much better solution, I'd be open to it! It's rare, but it hangs sometimes: one of the 'ThreadSignal' objects will have its bool set to true but will still be stuck waiting, so I guess there's a rare timing issue here.

Many thanks! Been tearing my hair out.

iOS data communication between devices (iPhones) over cellphone internet

I am looking to share data live between devices: GPS position tracking for a workout app.

I know I could do cloud updates using Firestore, but that could end up costly.

I saw I could also communicate between devices using WebSockets, but it's a little more complicated, and I would still need to pay for bandwidth / a web server, which could add up.

What I'm looking for is a way to do a handshake, maybe using a cloud platform or a WebSocket, but then communicate directly between devices, so that only the iPhones' own data bandwidth is used and nothing goes via a cloud or web server.

Is there a way to do that? TCP hole punching?

And no, I won't be in range for Bluetooth or on the same Wi-Fi network. Assume I'm using the cellphone network for internet.

Any idea can help!
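
For reference, the handshake described here usually amounts to a small rendezvous service that relays each device's public address to the other; a minimal sketch using the ws library on Node.js (the port and message format are hypothetical):

// Sketch: relay signaling messages between connected peers so they can
// exchange public address candidates, then attempt a direct connection.
const { WebSocketServer } = require('ws');
const wss = new WebSocketServer({ port: 8080 });
const peers = new Set();

wss.on('connection', (socket) => {
  peers.add(socket);
  socket.on('message', (msg) => {
    // Forward each message to every other peer; once both sides know each
    // other's address, data can flow device-to-device.
    for (const other of peers) if (other !== socket) other.send(msg.toString());
  });
  socket.on('close', () => peers.delete(socket));
});

Whether the direct connection then succeeds depends on the carriers' NATs; carrier-grade NAT frequently defeats hole punching, which is why production apps usually keep a relay fallback.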

What is the neatest software design for this system with multiple bi-directional communication streams?

The responses I’ve received thus far have helped me formulate this TLDR:

Currently the Observer Pattern is implemented to get information from the sensor-actuator modules to the individual Bluetooth services and LwM2M objects. The actuators should also be controllable from Bluetooth/LwM2M. This can be solved by just having all individual Bluetooth services / LwM2M objects have direct connections to the sensor-actuator modules.

This leads to very many links between modules, even though many of these services/objects require the same data, which seems unscalable and ugly.

So the problem (which is perhaps just a design flaw in the services/objects design):
When trying to reduce the number of links between the modules and the services/objects by moving the data acquisition up a layer and dispatching that data to the dependent services/objects, the actuator part becomes harder, because the services/objects lose their direct link to those sensor-actuator modules.

Original (long) problem description:

I have a system with two interfaces to the outside world: LwM2M and Bluetooth.

The system is Zephyr RTOS based, and both Bluetooth and LwM2M should send notifications matching with what they receive from the data-providing threads that obtain data from the (hardware) modules they control. Also, the Bluetooth and LwM2M interfaces can trigger actuators on the system. (All communication is bi-directional)

In particular:
Bluetooth has to read a battery level from the battery management thread. The system also has another sensor-actuator service, from which the latest readings should be obtained and the actuator should be controllable from Bluetooth. No problems here, the services could just include the header file containing the API for the required modules. The services do not overlap in which threads they access.

The problem comes with LwM2M. Some of the objects require information and should be able to trigger actions from the same threads. I could let all objects directly interface from/to the required threads, but perhaps it is more elegant to let a single LwM2M module talk with the other threads, and act as a bridge between the objects and the threads. However, this would require much of the internals of the objects to be exposed, and extra communication because of the middle-man module.

I have made a simplified schematic overview of my situation to help illustrate my struggle (Sensor-actuator 2 goes to two (actually more) different LwM2M objects):

[diagram: simplified illustration of the problem]

What would be the most elegant way to solve this? I would like to have a similar solution for both Bluetooth and LwM2M (perhaps this is my mistake).

I have thought about, and partially implemented, the following ideas, but each time I end up doubting that it's the best approach:

  • The different Bluetooth services and LwM2M objects all have direct access to the required threads. Programming seems relatively easy, but this introduces many direct dependencies / accesses to overlapping data from the target modules/threads.
  • Have the main Bluetooth module and the main LwM2M module control all the data flow between the services/objects and the target modules/threads. This would reduce the data-read dependencies between the services/objects and the target modules/threads, but it increases complexity quite severely when data is received from Bluetooth/LwM2M and something has to go through the additional main Bluetooth/LwM2M module.
  • Have data reads go through the main Bluetooth and LwM2M modules, such that those two modules listen for important data updates from the modules/threads and then update the services/objects. But, as opposed to the first option, also add direct access in the services/objects so that they can directly trigger actions on the target modules/threads (see the sketch after this list). This also feels odd. Programming is likely quite easy this way, but there are more direct dependencies to/from the modules/threads.
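
A structural sketch of that third option, in JavaScript purely for brevity (the real system is C on Zephyr); all names are hypothetical. Reads fan out through one dispatcher, while actuation stays a direct call:

// Tiny dispatcher: topic -> list of subscriber callbacks.
const subscribers = new Map();
const dispatcher = {
  subscribe(topic, cb) {
    if (!subscribers.has(topic)) subscribers.set(topic, []);
    subscribers.get(topic).push(cb);
  },
  publish(topic, data) {
    (subscribers.get(topic) || []).forEach((cb) => cb(data));
  },
};

// Sensor-actuator module: publishes readings, exposes actuation directly.
const sensorActuator2 = {
  reportReading(value) { dispatcher.publish('sensor2/reading', value); },
  setActuator(level) { /* drive the hardware */ },
};

// Several LwM2M objects and a Bluetooth service share one reading stream...
dispatcher.subscribe('sensor2/reading', (v) => { /* update LwM2M object A */ });
dispatcher.subscribe('sensor2/reading', (v) => { /* update BT service */ });

// ...while actuation keeps a single direct link, with no middle-man module.
sensorActuator2.setActuator(0.5);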

I'm sure I'm overlooking a better and more elegant solution; I would very much appreciate your insight!