python – Selection of independent variables in K means clustering among a vast dataset

As I understand it, k-means clustering takes a set of sample points and k arbitrary centroids, and uses Euclidean distance to assign each point to the group of the centroid closest to it.

What I am unable to understand is this: a point in the Cartesian plane has only an x and a y coordinate, so among a given dataset we can only choose 2 independent variables, plot the points, and proceed with the algorithm. However, there might be many more independent variables that could influence the classification. For example, if we are trying to classify dogs by breed, we might use physical attributes such as ear size, eye radius, body weight, leg length, lifespan, and so on. I'm not sure how this problem is resolved in k-means clustering.

Are the two variables with maximum information gain considered, or are the points plotted in an n-dimensional space where each axis represents one attribute?
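For reference, here is a minimal sketch of what I imagine the n-dimensional version looks like (assuming scikit-learn's KMeans; the attribute matrix is made up):

# Hypothetical example: k-means on 5 physical attributes at once,
# so each dog is a point in 5-dimensional space rather than the plane.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# columns: ear size, eye radius, body weight, leg length, lifespan (made-up units)
X = rng.random((100, 5))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])             # cluster index assigned to each dog
print(kmeans.cluster_centers_.shape)   # (3, 5): each centroid has 5 coordinates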

Could someone provide clarity on this issue? Thanks for any help.

clustering – Find Smooth Functions from Discrete Data

I’m trying to write an algorithm to solve the following problem, for which I could not find any related papers:

Given a set of discrete data points generated by $n$ unknown smooth functions $f_1(x), f_2(x), \cdots, f_n(x)$ at $x_1, x_2, \cdots, x_m$, how can we group these data points into $n$ arrays $a^{(k)}_i$ so that the data points in each array come from a single smooth function, i.e. $a^{(k)}_i = f_k(x_i) \;\forall i$?

For example, if I have a straight line and a quadratic function plotted together in a single figure, I can easily distinguish between the two functions. However, it is very difficult for a computer to distinguish these patterns.
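To make this concrete, here is a sketch of the kind of greedy heuristic I have been experimenting with (the linear-extrapolation step and the example functions are my own assumptions, not taken from any paper): walk through the $x$ grid in order and extend each partial curve with the unclaimed sample closest to its extrapolated prediction.

# Greedy curve-tracking sketch: at each x step, extend every partial curve
# with the remaining sample closest to a linear extrapolation of its tail.
import numpy as np

def group_curves(xs, ys):
    """xs: sorted grid of m x-values; ys: (m, n) array whose row i holds the
    n unlabeled samples f_1(x_i), ..., f_n(x_i) in arbitrary order."""
    m, n = ys.shape
    curves = [[ys[0, j]] for j in range(n)]   # seed one curve per sample at x_0
    for i in range(1, m):
        remaining = list(ys[i])
        for curve in curves:
            if len(curve) >= 2:                # predict by extending the last segment
                slope = (curve[-1] - curve[-2]) / (xs[i - 1] - xs[i - 2])
                pred = curve[-1] + slope * (xs[i] - xs[i - 1])
            else:
                pred = curve[-1]
            k = int(np.argmin([abs(y - pred) for y in remaining]))
            curve.append(remaining.pop(k))     # claim the closest sample
    return np.array(curves)                    # shape (n, m): one trace per function

xs = np.linspace(0, 1, 50)
ys = np.sort(np.stack([xs, 1 - xs], axis=1), axis=1)  # two crossing lines, labels discarded
print(group_curves(xs, ys)[:, -1])             # recovers the two endpoints, 1.0 and 0.0

This obviously assumes the grid is dense relative to how fast the functions vary, and it can fail at tangential crossings, which is exactly where I would hope published work improves on it.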

This problem seems related to cluster analysis. However, I still haven’t found any similar research results about it.

availability groups – The Windows Server Failover Clustering (WSFC) resource control API returned error code 19

Here are the details of that error code (19):

ERROR_WRITE_PROTECT

19 (0x13)

The media is write protected.

You need to consult the cluster log (use Get-ClusterLog to get that) for additional details about what writes failed within the cluster operation being performed. Check that out and update your question with any errors you see.

That being said, combined with this symptom:

…if you reboot second node, database wont up.

You might be experiencing disk problems. Check the Windows system event log and SQL Server error log for messages related to failed writes or corruption.

algorithms – Given a unit vector $x \in \mathbb{R}^d$, what is the worst possible within-cluster sum of squares for 2-means clustering?

This is a question I originally posted to math.stackexchange.com but it didn’t attract any answers, and I was wondering if someone here can help.


Consider a unit vector $x \in \mathbb{R}^d$ ($\|x\|_2 = 1$), and a $k$-means clustering of it for $k=2$.

How big can the within-cluster sum of squares get?

Formally:

How to upper bound
$$
\max_{\|x\| = 1}\,\min_{\mu_1,\mu_2\in\mathbb{R}} \left(\sum_{i=1}^d \min\left\{\left(x_i-\mu_1\right)^2,\left(x_i-\mu_2\right)^2\right\}\right)\quad ?
$$


A trivial bound is $1$, but I suspect a much tighter bound exists.


It’s easy to show that this quantity is at least $1/4$, by considering an $x$ for which a third of the coordinates are proportional to $1$, a third to $-1$, and a third are zero (e.g., $x = (1/\sqrt{2}, -1/\sqrt{2}, 0)$ if $d = 3$).
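As a sanity check (not a proof), the $1/4$ value of that example can be verified by brute force over all coordinate partitions, since for a fixed partition the optimal centers are the cluster means. A small Python script (my own, variable names arbitrary):

# Brute-force the optimal 2-means cost of x = (1/sqrt(2), -1/sqrt(2), 0)
# by enumerating all 2^d assignments of coordinates to the two clusters.
import itertools, math

x = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
d = len(x)

best = float("inf")
for mask in itertools.product([0, 1], repeat=d):
    clusters = [[x[i] for i in range(d) if mask[i] == c] for c in (0, 1)]
    cost = 0.0
    for cl in clusters:
        if cl:
            mu = sum(cl) / len(cl)  # the optimal center of a fixed cluster is its mean
            cost += sum((v - mu) ** 2 for v in cl)
    best = min(best, cost)

print(best)  # 0.25, attained by the partition {1/sqrt(2), 0} vs {-1/sqrt(2)}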

clustering – Is the n-dimensional assignment problem for points NP-hard?

We have $n$ sets of $k$ points in $\mathbb{R}^d$ and we are trying to partition them into $k$ clusters of $n$ points each, such that every point from a given set is mapped to a different cluster and the sum of the variances of the clusters is minimized. That is, we are trying to solve the $k$-means problem, but with the constraint that points from the same set can’t go to the same cluster.
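To make the constraint concrete, here is a brute-force formalization I use as a reference on tiny instances (naming and setup are my own): each set contributes a permutation sending its $k$ points to the $k$ clusters.

# Brute force over one permutation per set: point j of set s goes to
# cluster perm[j], so each cluster receives exactly one point per set.
import itertools
import numpy as np

def constrained_kmeans_bruteforce(sets):
    """sets: array of shape (n, k, d); returns the minimal total cost."""
    n, k, d = sets.shape
    best = float("inf")
    for perms in itertools.product(itertools.permutations(range(k)), repeat=n):
        clusters = [[] for _ in range(k)]
        for s, perm in enumerate(perms):
            for j, c in enumerate(perm):
                clusters[c].append(sets[s, j])
        # within-cluster sum of squared deviations from the cluster mean
        cost = sum(((np.array(cl) - np.mean(cl, axis=0)) ** 2).sum() for cl in clusters)
        best = min(best, cost)
    return best

rng = np.random.default_rng(1)
print(constrained_kmeans_bruteforce(rng.random((3, 2, 2))))  # n=3 sets, k=2, d=2

The search space is $(k!)^n$, which is exactly why I am asking about the hardness in $n$ for constant $k$.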

I know that even for $n=3$ this problem is NP-hard in $k$, but I’m trying to figure out whether, for constant $k$, this problem is NP-hard in $n$.

I’ve looked at the Wikipedia page of NP-hard problems and tried to find a reduction from one of them, but I couldn’t think of any way to do so.

optimization – How do I assign homes to hospitals based on locality? (clustering, kmeans?)

I have a large set of $X$ hospitals and $Y$ homes, where $Y$ is much larger than $X$, together with their respective coordinates. Each hospital can handle any home within a 50-mile radius, up to 10,000 homes. Each home can be assigned to at most one hospital. How do I create assignments of homes to hospitals so that as many homes as possible are assigned a hospital? Performance doesn’t matter that much.

I was thinking of potentially having each hospital do a breadth-first search to reach as many homes as possible near it. For this, I would calculate the distance from each hospital to all homes, then go through the homes and match each with its nearest hospital until all homes are assigned or no more can be (sketched below).
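Here is a minimal sketch of that greedy matching (my own variable names; I assume plain Euclidean distance on the coordinates, treated as miles, and the 50-mile / 10,000-home limits from above):

# Greedy matching: sort every (home, hospital) pair within the radius by
# distance, then give each home to the nearest hospital with spare capacity.
import numpy as np

def assign_homes(hospitals, homes, radius=50.0, capacity=10_000):
    """hospitals: (X, 2) and homes: (Y, 2) coordinate arrays, in miles."""
    dists = np.linalg.norm(homes[:, None, :] - hospitals[None, :, :], axis=2)  # (Y, X)
    pairs = sorted(
        (dists[h, p], h, p)
        for h in range(len(homes))
        for p in range(len(hospitals))
        if dists[h, p] <= radius
    )
    assignment = {}                 # home index -> hospital index
    load = [0] * len(hospitals)
    for _, h, p in pairs:           # closest pairs claimed first
        if h not in assignment and load[p] < capacity:
            assignment[h] = p
            load[p] += 1
    return assignment

rng = np.random.default_rng(0)
print(len(assign_homes(rng.random((5, 2)) * 100, rng.random((200, 2)) * 100)))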

Would this be a good approach? What would a better approach be? Are there clustering algorithms, such as k-means, that could help here?

clustering – SQL Failover Cluster

First, sorry for my English, I am French-speaking 🙂

I’ve searched and did not find any information on how to do this.

Let me explain (see the attached file).

I know how to set up an SQL failover cluster; I did it and it works perfectly.

Here is my situation: Node 1 and Storage A are in one location (city), and Node 2 and Storage B are in a second location (a different city).

Node 1 and Node 2 are connected to Storage A, and failover works perfectly.

NOW here is the catch.

How do I set things up so that if Storage A fails, I can fail over to Storage B?

I am wondering if it’s even possible, since I have not found a single example on the net.

Thank you all.


hierarchical clustering – Getting the gravity-center (centroid) coordinates of each label from scikit-learn’s AgglomerativeClustering in Python

I’m a newbie in ML / unsupervised clustering, and especially in hierarchical clustering. I was wondering how to get / compute the gravity-center coordinates of each label (labels_) found as the result of scikit-learn’s AgglomerativeClustering:

cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')  # compute_distances=True
cah = cluster.fit(centroids)
cah_labels = cah.labels_

From this function with n = 2 clusters I found 50 labels in cah.labels_ (and np.unique(cah_labels) = 14).

I have investigated a lot of SciPy methods that might solve this problem, but I was not able to find one that helps.

I will need these gravity centers later with the k-means function, as a mixed classification for an unsupervised ML problem.
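In case it clarifies what I’m after, this is how I imagine the computation (my own sketch, reusing the centroids array and cah_labels from the snippet above): the gravity center of a label is just the mean of the points carrying that label.

# Sketch: one gravity center per agglomerative label, as the mean of the
# rows of `centroids` (the data passed to fit) that received that label.
import numpy as np

gravity_centers = np.array([
    centroids[cah_labels == label].mean(axis=0)
    for label in np.unique(cah_labels)
])
print(gravity_centers.shape)  # (number of labels, number of features)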

Thanks in advance for any advice on this!

node.js – My approach to Next.js clustering and socket handling

I’m interested in feedback on my approach to handling clustering in a Next.js SSG app with Express, hosted on Heroku.

The app is working; however, please let me know if this is the wrong approach or if you see any potential mistakes.

Project specs:
sticky-session: "^1.1.2",
express: "^4.17.1",
next: "^10.0.5",
socket.io: "^2.4.0",
socket.io-client: "^2.4.0",
socket.io-redis: "^5.4.2"

Server code:
const sticky = require("sticky-session");

const app = require("express")();
const server = require("http").Server(app);
const io = require("socket.io").listen(server);
io.set("transports", ["websocket"]);
const redis = require("socket.io-redis");

const PORT = process.env.PORT || 3000;
const next = require("next");
const dev = process.env.NODE_ENV !== "production";
const nextApp = next({ dev });
const handle = nextApp.getRequestHandler();

const keys = require("./config/keys");
const mongoose = require("mongoose");
const cookieSession = require("cookie-session");
const passport = require("passport");
const bodyParser = require("body-parser");
const cookieParser = require("cookie-parser");
const sslRedirect = require("heroku-ssl-redirect").default;

// Redis adapter so socket.io events are shared across all workers/dynos
io.adapter(redis(keys.REDIS_URL));

// Runs only in worker processes: prepare Next.js, then wire up middleware, sockets, and routes
const worker = () => {
    nextApp
        .prepare()
        .then(() => {
            app.use(cookieParser());

            app.use(bodyParser.json());

            app.use(
                cookieSession({
                    maxAge: 30 * 24 * 60 * 60 * 1000,
                    keys: [keys.cookieKey],
                })
            );

            app.use(passport.initialize());
            app.use(passport.session()); // persistent login sessions

            // IMPORT MODELS
            mongoose.connect(keys.mongoURI, {
                useNewUrlParser: true,
                useUnifiedTopology: true,
            });

            const db = mongoose.connection;

            db.once("open", function () {
                console.log("MongoDB database connection established successfully");
            });
            db.on("error", console.error.bind(console, "MongoDB connection error:"));



            require("./sockets/index")(io);
            require("./routes/api")(app, io);

            app.get("*", (req, res) => {
                return handle(req, res);
            });
        })
        .catch((ex) => {
            console.error(ex.stack);
            process.exit(1);
        });
};

// sticky.listen() returns false if this is the master process
if (!sticky.listen(server, PORT)) {
    // Master: sticky-session forks the workers, so nothing else to do here
} else {
    worker();
}

On the client I simply use:

const socket = io({ transports: ["websocket"] });

Heroku settings:

heroku features:enable http-session-affinity

clustering – Why does setting up a new Galera cluster with mariabackup as the SST method start on the first node but fail on all the other nodes with the same error?

I did a fresh reinstallation of mariadb-server on all the nodes (I removed it using sudo apt purge mariadb-*).

I started the first node using sudo galera_new_cluster and it went fine and is still running, but the other nodes threw this error:

● mariadb.service - MariaDB 10.3.27 database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2020-12-19 20:23:19 IST; 2min 9s ago
     Docs: man:mysqld(8)
  Process: 7089 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
  Process: 7090 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
  Process: 7092 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && s…
  Process: 7330 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
 Main PID: 7330 (code=exited, status=1/FAILURE)
   Status: "MariaDB server is down"

Dec 19 20:22:53 phl-pi-3 systemd[1]: Starting MariaDB 10.3.27 database server...
Dec 19 20:22:59 phl-pi-3 sh[7092]: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
Dec 19 20:22:59 phl-pi-3 mysqld[7330]: 2020-12-19 20:22:59 0 [Note] /usr/sbin/mysqld (mysqld 10.3.27-MariaDB-0+deb10u1-log) starting as process 7330 ...
Dec 19 20:22:59 phl-pi-3 mysqld[7330]: 2020-12-19 20:22:59 0 [Warning] Could not increase number of max_open_files to more than 16384 (request: 32186)
Dec 19 20:23:19 phl-pi-3 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Dec 19 20:23:19 phl-pi-3 systemd[1]: mariadb.service: Failed with result 'exit-code'.
Dec 19 20:23:19 phl-pi-3 systemd[1]: Failed to start MariaDB 10.3.27 database server.

This is my Galera config:

[mysqld]
# mysql settings
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
#galera settings
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="test_cluster"
wsrep_cluster_address=gcomm://192.168.0.15,192.168.0.16,192.168.0.12,10.8.0.6
wsrep_node_address="192.168.0.15"
wsrep_sst_method=mariabackup
wsrep_sst_donor=192.168.0.16

All the other nodes have the same Galera config, except with a different wsrep_node_address, and they don’t have wsrep_sst_donor set.

And the rest of the server config is as below:

$ cat 50-server.cnf
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
## See the examples of server my.cnf files in /usr/share/mysql/
## this is read by the standalone daemon and embedded servers
[server]
skip_name_resolve = 1
# this is only for the mysqld standalone daemon
[mysqld]
transaction_isolation = READ-COMMITTED
binlog_format = ROW
innodb_large_prefix=on
innodb_file_format=barracuda
innodb_file_per_table=1
innodb_io_capacity=4000
# * Basic Settings
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
bind-address = 0.0.0.0
# * Fine Tuning
key_buffer_size = 16M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 8
myisam_recover_options = BACKUP
#max_connections = 100
#table_cache = 64
#thread_concurrency = 10
## * Query Cache Configuration
#query_cache_limit = 1M
query_cache_type = 1
query_cache_limit = 2M
query_cache_min_res_unit = 2k
query_cache_size = 64M
## * Logging and Replication
## Error log - should be very few entries.
#log_error = /var/log/mysql/error.log
server-id = 16
log_bin = mariadb_bin
expire_logs_days = 10
max_binlog_size = 100M
#binlog_do_db = include_database_name
#binlog_ignore_db = exclude_database_name
innodb_buffer_pool_size = 128M
innodb_buffer_pool_instances = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 32M
innodb_max_dirty_pages_pct = 90
# For generating SSL certificates you can use for example the GUI tool "tinyca".
## ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem
## Accept only connections using the latest and most secure TLS protocol version.
# ..when MariaDB is compiled with OpenSSL:
# ssl-cipher=TLSv1.2
# ..when MariaDB is compiled with YaSSL (default in Debian):
# ssl=on
# * Character sets
# MySQL/MariaDB default is Latin1, but in Debian we rather default to the full
# utf8 4-byte character set. See also client.cnf
#character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
tmp_table_size = 64M
max_heap_table_size = 64M
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1

All the other nodes have the same config as above, except with a different server-id.