postgresql – Does daily pg_dump mess up postgres cache?

I migrated my geospatial Postgres 12.5 database to another cloud provider. I use postgis and I have around 35GB of data and 8GB of memory.

Performance is much worse than on my previous provider, and the new provider claims this is because the Postgres cache has to be “warmed up” every day after the automatic pg_dump backup operations that occur during the night.

Geospatial queries that would normally take 50ms sometimes take 5-10s on first request, and some that would run in 800ms take minutes.

Is there something else at play, or is the technical support right?

If so, should I disable daily backups? Or can I somehow use a utility function to restore the cache (pg_prewarm, perhaps)?
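For reference, a minimal sketch of the pg_prewarm route (the table and index names are placeholders, not from my schema):

-- Load the extension once per database, then pull a relation's blocks
-- into shared buffers; pg_prewarm returns the number of blocks read.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

SELECT pg_prewarm('my_geo_table');           -- the heap
SELECT pg_prewarm('my_geo_table_geom_idx');  -- e.g. its GiST index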

postgresql – Update Null columns to Zero dynamically in Redshift

Here is the code in SAS. It finds the numeric columns with missing values and replaces them with 0s:

DATA dummy_table;
SET dummy_table;
ARRAY DUMMY _NUMERIC_;        /* every numeric column in the table */
DO OVER DUMMY;
  IF DUMMY=. THEN DUMMY=0;    /* replace missing (.) with 0 */
END;
RUN;

I am trying to replicate this in Redshift; here is what I tried:

create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
Begin

Execute 'declare ' ||
            'tot_cnt int := (select count(*)  from information_schema.columns where table_name = ' || tbl_nm || ');' ||
            'init_loop int := 0; ' ||
            'cn_nm varchar; ' 
  
Begin
  While init_loop <= tot_cnt    
  Loop
  Raise info 'init_loop = %', Init_loop; 
  Raise info 'tot_cnt = %', tot_cnt; 
  
  Execute 'Select column_name into cn_nm from information_schema.columns ' ||
  'where table_name ='|| tbl_nm || ' and ordinal_position = init_loop ' ||
  'and data_type not in (''character varying'',''date'',''text''); '
  
  Raise info 'cn_nm = %', cn_nm;  
  
    if cn_nm is not null then
      Execute 'Update ' || tbl_nm ||
              'Set ' || cn_nm = 0 ||
              'Where ' || cn_nm is null or cn_nm =' ';
    end if;
 init_loop = init_loop + 1;
 end loop;             
End;  
End;
$$ language plpgsql;

Issues I am facing

  1. When I pass the input parameter here, I get a count of 0:

    tot_cnt int := (select count(*) from information_schema.columns where table_name = ' || tbl_nm || ');'

For testing purposes I tried hardcoding the table name inside the proc, and I get the error: Amazon invalid operation: value for domain information_schema.cardinal_number violates check constraint "cardinal_number_domain_check"

Is this even possible in Redshift? How can I implement this logic, or is there another workaround?

Need expert advice here!
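For comparison, here is a hedged, untested sketch of how the dynamic SQL would need to be structured. Two structural changes versus the procedure above: the DECLARE block belongs to the procedure itself (variables are never visible inside an EXECUTEd string), and the SET/WHERE fragments must be concatenated into the quoted statement. It also reads column names from svv_columns rather than information_schema.columns, which appears to be what raises the cardinal_number domain error inside Redshift procedures:

create or replace procedure sp_replace_null_to_zero(IN tbl_nm varchar) as $$
declare
  col record;
begin
  -- Loop over the table's non-character, non-date columns.
  for col in
    select column_name
    from svv_columns
    where table_name = tbl_nm
      and data_type not in ('character varying', 'date', 'text')
  loop
    -- Build the whole UPDATE as one string; quote_ident guards the
    -- column name.
    execute 'update ' || tbl_nm
         || ' set '   || quote_ident(col.column_name) || ' = 0'
         || ' where ' || quote_ident(col.column_name) || ' is null';
  end loop;
end;
$$ language plpgsql;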

postgresql – Install postgres in VeraCrypt container

I would like to install postgres in a VeraCrypt container, to prevent saving data in plain text. During the installation process on the mounted container I got the error:

"Problem running post-install step. Installation may not correctly. The database cluster initialisation failed."

After finishing the installation I couldn't connect to the database via the psql command line ("could not connect to server: Connection refused").

I am using the windows installer on Windows 10.

How could I handle this problem?
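In case it helps, a hedged sketch of doing the cluster initialisation manually on the mounted container instead of relying on the installer's post-install step (the drive letter, version and paths are assumptions):

REM V: is the mounted VeraCrypt volume; run from an elevated prompt.
REM Initialise a fresh cluster on the encrypted volume, then start it.
"C:\Program Files\PostgreSQL\<version>\bin\initdb.exe" -D "V:\pgdata" -U postgres -W -E UTF8
"C:\Program Files\PostgreSQL\<version>\bin\pg_ctl.exe" -D "V:\pgdata" -l "V:\pgdata\log.txt" start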

postgresql – Suppressing SELECT output before \gexec?

This answer was great when I needed to reown 150 schemas, each with 40 tables, but it displays the 6000 rows from the SELECT before running the 6000 ALTER statements.

Thus, as stated in the subject: is there any way to suppress the SELECT output?

sides=> SELECT format(
sides(>   'ALTER TABLE %I.%I.%I OWNER TO sides_owner;',
sides(>   table_catalog,
sides(>   table_schema,
sides(>   table_name
sides(> )
sides-> FROM information_schema.tables
sides-> WHERE table_schema = 'strans';
                                       format                                        
---------------------------------------------------------------------
 ALTER TABLE sides.strans.foo OWNER TO sides_owner;
 ALTER TABLE sides.strans.blarg_p2020_02 OWNER TO sides_owner;
 ALTER TABLE sides.strans.blarg_p2020_03 OWNER TO sides_owner;
 ALTER TABLE sides.strans.blarg_p2020_04 OWNER TO sides_owner;
 ALTER TABLE sides.strans.blarg_error_p2019_01 OWNER TO sides_owner;
 etc
 etc
 etc
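For what it's worth, a sketch of one workaround: terminate the generating query with \gexec itself instead of a semicolon, since \gexec executes each returned value without rendering the result set of the generating query (names as in the session above):

SELECT format(
  'ALTER TABLE %I.%I.%I OWNER TO sides_owner;',
  table_catalog,
  table_schema,
  table_name
)
FROM information_schema.tables
WHERE table_schema = 'strans'
\gexec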

postgresql – Explaining what the OR operator does in the following code

I ran into the following SQL code which (successfully) outputs a list of parents and the age of their youngest child:

SELECT parents.name AS name, MIN(children.age) AS age FROM people children
INNER JOIN people parents ON (parents.id = children.fatherId OR parents.id = children.motherId)
GROUP BY parents.id

The code self-joins a table named “people”. I just wanted to ask: how does the OR operator work here? I know OR as a logical operator, but here it seems to do something else: it takes two comparisons and joins on both of them. What does it have to do with logical OR?
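One way to see it (a sketch, using the same tables as above): an inner join is logically a cross join filtered by the ON predicate, so the OR here is the ordinary logical OR, applied to each candidate (child, parent) row pair:

-- Logically equivalent: pair every child row with every parent row,
-- then keep only the pairs where either parent-id comparison is true.
SELECT parents.name AS name, MIN(children.age) AS age
FROM people children
CROSS JOIN people parents
WHERE parents.id = children.fatherId
   OR parents.id = children.motherId
GROUP BY parents.id, parents.name;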

PostgreSQL: How to back up only one schema from a database and restore it to another schema in the same database?

I have a database named "A" which has three schemas: "B", "C", and "D". I want to take a backup of schema "C" and restore its data into schema "B". I'm not sure how to do this as I am new to Postgres; please help with the commands.
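A hedged sketch of one common workaround, using the placeholder names from the question (pg_dump cannot rename a schema on restore, so the schemas are renamed around it; note the DROP is destructive):

pg_dump -d A -n C -f schema_c.sql      # dump schema C only (DDL + data)

psql -d A -c 'ALTER SCHEMA C RENAME TO c_orig'   # park the original
psql -d A -c 'DROP SCHEMA B CASCADE'             # WARNING: destroys B
psql -d A -f schema_c.sql                        # restore recreates C
psql -d A -c 'ALTER SCHEMA C RENAME TO B'        # restored copy becomes B
psql -d A -c 'ALTER SCHEMA c_orig RENAME TO C'   # put the original back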

Add auto increment to already existing primary key column PostgreSQL

I have a database schema of the following table:

database=# \d person
              Table "public.person"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 person_id   | smallint              | not null
 fname       | character varying(20) |
 lname       | character varying(20) |
 eye_color   | color_enum            |
 birth_date  | date                  |
 street      | character varying(30) |
 city        | character varying(20) |
 state       | character varying(20) |
 country     | character varying(20) |
 postal_code | character varying(20) |

I want to add AUTO_INCREMENT in one ALTER statement, the way we can in MySQL:

ALTER TABLE person MODIFY person_id SMALLINT UNSIGNED AUTO_INCREMENT;

I have tried this in Postgres but I am getting this error:

ALTER TABLE person ALTER COLUMN person_id SERIAL;
ERROR:  syntax error at or near "SERIAL"

I have seen that we can create a sequence in the following fashion:

CREATE SEQUENCE test1_id_seq OWNED BY test1.id;
ALTER TABLE test1 ALTER COLUMN id SET DEFAULT nextval('test1_id_seq');
UPDATE test1 SET id = nextval('test1_id_seq');

But this is too much boilerplate code. Is there a one-line statement to add AUTO_INCREMENT to an existing column in Postgres?

Postgres Version: 9.6.16
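For what it's worth, PostgreSQL 10 and later do have a single-statement form via identity columns, though it is not available on 9.6:

-- Requires PostgreSQL 10+: attaches an implicit sequence in one step.
-- With pre-existing rows you may also need to restart the sequence
-- past the current maximum person_id.
ALTER TABLE person
  ALTER COLUMN person_id ADD GENERATED BY DEFAULT AS IDENTITY;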

After running the boilerplate code, I try to INSERT using the following query:

 INSERT INTO person
 (person_id, fname, lname, eye_color, birth_date)
 VALUES (null, 'William','Turner', 'BR', '1972-05-27');

ERROR:  null value in column "person_id" violates not-null constraint
DETAIL:  Failing row contains (null, William, Turner, BR, 1972-05-27, null, null, null, null, null).

Is there a workaround by which I can pass NULL for the primary key so that the column's value comes from the sequence?
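For reference, a sketch of the two standard ways to let the sequence default fire instead of passing NULL:

-- Either omit the column entirely...
INSERT INTO person (fname, lname, eye_color, birth_date)
VALUES ('William', 'Turner', 'BR', '1972-05-27');

-- ...or name it and write DEFAULT explicitly.
INSERT INTO person (person_id, fname, lname, eye_color, birth_date)
VALUES (DEFAULT, 'William', 'Turner', 'BR', '1972-05-27');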

recovery – Restore partially broken PostgreSQL Database

I have managed to break my own server, and undo months of work.

The thing is, I hosted a Nextcloud instance on the server and had it configured to use Wasabi S3 as primary storage, which means the data is still intact.

However, the files are stored in a urn:oid format, which means without the database, the data is pretty much useless.

I need to recover the database that contains the file table, but I cannot connect with psql.

After restoring the corrupted folder to /var/lib/postgresql/12/main, I have successfully managed to start the postgresql server.

When I try to log in using the psql command, however, I get this error:

psql: error: FATAL:  "base/16408" is not a valid data directory
DETAIL:  File "base/16408/PG_VERSION" is missing.

And, the directory /var/lib/postgresql/12/main/base/16408 is indeed empty.

However, I would at least like to restore part of the database. Is that possible?
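For anyone assessing the same kind of damage, a sketch of a first step (assuming at least one database, e.g. postgres, still accepts connections): each base/<oid> directory holds exactly one database, so the rest of the cluster may still be intact.

-- Run from a surviving database, e.g.: psql -d postgres
-- Map the directory OIDs under base/ to database names:
SELECT oid, datname FROM pg_database ORDER BY oid;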

Thanks in advance.

postgresql – Why is an unindexed range operator (<@) faster than an indexed BETWEEN?

NB: This is the same setup as this question, but here I’m asking specifically about something I was not asking about over there.

I’ve got a table with a column utc timestamptz, with a “btree” index on the utc column:

CREATE TABLE foo(utc timestamptz);

CREATE INDEX ix_foo_utc ON foo (utc);

This table contains about 500 million rows of data.

When I filter utc using BETWEEN, the query planner uses the index as expected:

> EXPLAIN ANALYZE
SELECT
   utc
FROM foo
WHERE
    utc BETWEEN '2020-12-01' AND '2031-02-15'
;

QUERY PLAN
Bitmap Heap Scan on foo  (cost=3048368.34..11836322.22 rows=143671392 width=8) (actual time=12447.905..165576.664 rows=150225530 loops=1)
Recheck Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Rows Removed by Index Recheck: 543231
Heap Blocks: exact=43537 lossy=1818365
->  Bitmap Index Scan on ix_foo_utc  (cost=0.00..3012450.49 rows=143671392 width=0) (actual time=12436.236..12436.236 rows=150225530 loops=1)
Index Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Planning time: 0.127 ms
Execution time: 172335.517 ms

I could write the same query (ignoring RHS-exclusivity) using a range operator without an index at all:

> EXPLAIN ANALYZE
SELECT
   utc
FROM foo
WHERE
    utc <@ tstzrange('2020-12-01', '2031-02-15')
;

QUERY PLAN
Gather  (cost=1000.00..9552135.30 rows=2556133 width=8) (actual time=0.179..145303.094 rows=150225530 loops=1)
Workers Planned: 2
Workers Launched: 2
->  Parallel Seq Scan on foo  (cost=0.00..9295522.00 rows=1065055 width=8) (actual time=5.321..117837.452 rows=50075177 loops=3)
"Filter: (utc <@ '(""2020-12-01 00:00:00+00"",""2031-02-15 00:00:00+00"")'::tstzrange)
Rows Removed by Filter: 120333718
Planning time: 0.069 ms
Execution time: 153384.494 ms

I would have expected the query planner to realise that these are doing the same operation (albeit that <@ is right-hand exclusive and BETWEEN is inclusive).

How can the unindexed query with <@ be faster than the indexed query with BETWEEN?

Surely if ignoring the index is faster, the query planner should know that in advance?

Or is this something specific to do with the amount of memory my PG instance has, and the size of the query (big!)?
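For comparison, a sketch of the sargable rewrite: a btree index can serve plain comparisons but not the <@ containment operator, so spelling out the bounds (matching tstzrange's default '[)' bounds) keeps the same semantics while remaining index-friendly:

EXPLAIN ANALYZE
SELECT utc
FROM foo
WHERE utc >= '2020-12-01'
  AND utc <  '2031-02-15';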


My Postgres version:

"PostgreSQL 10.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit"

performance – PostgreSQL: Simple IN vs IN VALUES performance

We have queries with IN filters containing long lists of INT values.

Sometimes they run extremely slowly, and I have found a suggestion to use the syntax

Field IN (VALUES(1465), (1478), ...

Instead of

Field IN (1465, 1478, ...

In some cases it helps, but in others it makes the query run 1000+ times slower.

So the issue is:

  • some queries much faster with simple IN
  • some queries much faster with IN + VALUES

Here is the EXPLAIN output for a case where it is extremely slow:

->  HashAggregate  (cost=5.78..9.62 rows=385 width=4)
      Group Key: "*VALUES*".column1
      ->  Values Scan on "*VALUES*"  (cost=0.00..4.81 rows=385 width=4)

What is the right way to pass a long list of INT values to an IN filter?

I am using PostgreSQL 13.1 on Ubuntu
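For completeness, a sketch of a third form that is often suggested for long literal lists (table and column names are hypothetical): passing the values as a single array, which the planner handles differently from both plain IN and IN (VALUES ...):

-- As a literal array...
SELECT *
FROM my_table
WHERE field = ANY (ARRAY[1465, 1478, 1483]);

-- ...or as one bound int[] parameter from the application:
-- WHERE field = ANY ($1)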