Ext. Links
- Manuals: (High quality)

- PostgreSQL internals (Hironobu SUZUKI) [TODO]

- DB engines Rank:

- Percona Blog:

- PSQL Hackers list (Patches, dev. discussions, ...)

- P. A. Alsberg and J. D. Day. A principle for resilient
  sharing of distributed resources. In Proceedings of the 2Nd International Conference on Software Engineering
  ICSE ’76, pages 562–570, Los Alamitos, CA, USA, 1976. IEEE Computer Society Press.
-  E. Cecchet, G. Candea, and A. Ailamaki.
  Middleware-based database replication: The gaps between theory and practice. In Proceedings of the
  2008 ACM SIGMOD International Conference on Management of Data , SIGMOD ’08, pages 739–752,
  New York, NY, USA, 2008. ACM
- M. Stonebraker. Concurrency control and consistency of multiple copies of data in distributed ingres.
  IEEE Transactions on Software Engineering,
  SE-5(3):188–194, May 1979
- M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Understanding replication in databases
  and distributed systems. In Proceedings 20th IEEE International Conference on Distributed Computing
  Systems, pages 464–474, April 2000.
- . H. Thomas. A majority consensus approach to concurrency control for multiple copy databases.
  ACM Trans. Database Syst.  , 4(2):180–209, June 1979. H. Thomas. A majority consensus approach to
  concurrency control for multiple copy databases.  ACM Trans. Database Syst.  , 4(2):180–209, June 1979
- M. Wiesmann, F. Pedone, A. Schiper, B. Kemme, and G. Alonso. Understanding replication in databases
  and distributed systems. In Proceedings 20th IEEE International Conference on Distributed Computing
  Systems , pages 464–474, April 2000.

Who is Who @[]
PostgreSQL 101
BºCore ENV.VARSº              BºShow DB infoº
  PGPORT                        mydb=˃ SELECT version();
  PGUSER  (alt. -U option)      mydb=˃ SELECT current_date;
                                mydb=˃ SELECT current_database();
                                mydb=˃ SELECT rolname  FROM pg_roles; ← List users
                                - A ROLE is an entity that can own database objects and have database 
                                  privileges. It can be considered a "user", a "group", or both depending on 
                                - New roles can be created with CREATE ROLE

Bºpsql commands:º
  \l            list databases
  \c db_name    connect to ddbb
  \q            quit
  \dt           show all tables in ddbb
  \dt º.º.      show all tables globally
  \d  table     show table schema
  \d+ table
  \du           list current user's permissions

BºPostgresSQL instance entities "layout"º:
 Server  == DDBB Cluster  1 ←→ N GºCatalogº  1 ←→ N Schema
 └──────────┬──────────┘         Gº(Database)º      └──┬─┘
            │                                 NAMESPACE FOR TABLES
            │                                 and security boundary
  ┌─────────┘                                 └─────────┬─────────┘
  ├GºDatabasesº                        ┌────────────────┘
  │  │                         ┌───────┴───────┐
  │  ├─ postgres               ºSCHEMA COMMANDSº
  │  │  │
  │  │  ├─ Casts               \dn  ← list existing schemas
  │  │  │                      SELECT schema_name FROM information_schema.schemata;
  │  │  ├─ Catalogsº*1º        SELECT nspname     FROM pg_catalog.pg_namespace;
  │  │  │
  │  │  ├─ Event Triggers      @[]
  │  │  │                     ºCREATE SCHEMAºIF NOT EXISTS accountancy;
  │  │  ├─ Extensions         
  │  │  │                      CREATE TABLE accountancy.employee (....); ← employees table scoped 
  │  │  ├─ Foreing            
  │  │  │  Data Wrap           @[]
  │  │  │                     ºDROP SCHEMAºIF EXISTS accountancy CASCADE;
  │  │  ├─ Languages          └───┬────────────...
  │  │  │                         │
  │  │  └─ Schemas  ←─────────────┘
  │  │     └─ schema01
  │  │        └─ table01, table02, ...
  │  │
  │  ├─ myDDBB01
  │  ...
  ├─ Login/Group @[]
  └─ Tablespaces   ←  Allow storage-admins define mapping "objects ←→ file system"

º*1º: regular tables used internally by cluster instances to store
      schema metadata (tables⅋columns info, internal bookkeeping, ...)
      They are special in the sense that they must not be changed by hand, but
      just let the cluster instance manage them.

BºBootstrap new Postgresql Server (Cluster)º

  Bootstrap Cluster == Init storage area 
                       (data directory in FS terms) 
  Server 1 → N Databases

 $º$ sudo su postgres                     º ← Switch to postgres OS user
 $º$ initdb -D /usr/local/pgsql/data       º  ← alt.1
 $º$ pg_ctl -D /usr/local/pgsql/data initdbº  ← alt.2
    'postgres' and 'template1' ddbbs will be created automatically
BºStarting the Serverº
 $º$ sudo su postgres                     º ← Switch to postgres OS user
 $º$ postgres -D /usr/local/pgsql/data \  º ← Alt.1
 $º$ 1˃/var/log/postgresql/logfile 2˃⅋1 ⅋ º
 $º$ pg_ctrl start \                      º ← Alt.2. Using pg_ctl "easy" wrapper
 $º$   -l /var/log/postgresql/logfile'    º

 $º$ sudo su root                         º ← Alt.3. SystemD (recomended in SystemD enabled OSs)
 $º$ systemctl enable postgresql.service  º ← Enable at OS reboot
 $º$ systemctl start postgresql.service   º ← Start now without rebooting
 $º$ journalctl --unit postgresql.service º ← Check SystemD unit logs 

BºCRUD Usersº
 $º$ sudo su postgres                     º ← Switch to postgres OS user
 $º$ psql                                 º
 # ºCREATE USER IF NOT EXISTSºmyUser01ºWITH PASSWORDº'my_user_password';
 # ºALTER USERºmyUser01ºWITH PASSWORDº'my_new_password';
 # ºDROP USER IF EXISTSºmyUser01 ;

BºGranting privileges to usersº
 #ºGRANT ALL PRIVILEGES ONºtableºTOºmyUser01;           ← Grant all permissions to table
 #ºGRANT ALL PRIVILEGES ON DATABASEºmyDB01ºTOºmyUser01; ← Grant all permissions in DB 
                                                         (tables, connection, ...)
 #ºGRANT CONNECT ON DATABASEºmyDB01ºTOºmyUser01;        ← Grant connection permissions to DB
 #ºGRANT USAGE ON SCHEMAºpublicºTOºmyUser01;            ← Grant permissions on schema
 #ºGRANT EXECUTE ON ALL FUNCTIONS IN SCHEMAºpublic      ← Grant permissions to functions

 #ºGRANT SELECT, UPDATE, INSERT ON ALL TABLES IN SCHEMAº← Grant permissions select/... on all tables

 #ºGRANT SELECT, INSERT       ONºmyTable01ºTOºmyUser01; ← Grant permissions on a single table

BºCreate/Remove new DDBBº
  (PRE-SETUP: Create user account for the new DDBB, ex: department01, ...)
$º$ createdb mydbº ← create new
$º$ psql mydb    º ← access it to check everything is OK
$º$ dropdb   mydbº ← Remove. RºWARNº: can NOT be undone
  NOTE:  Many tools assume ddbb names -U flag or PGUSER Env.Var as username 
         by default

BºCreate tableº
  -- DO $$
  -- BEGIN
  --   EXECUTE 'ALTER DATABASE ' || current_database() || ' SET TIMEZONE TO UTC';
  -- END; $$;
    ID         VARCHAR(40)             NOT NULL
                                       MYTABLE01_PKºPRIMARY KEYº,
    NAME       VARCHAR(255)            NOT NULL,
    PUB_KEY    VARCHAR(88)             NOT NULL UNIQUE,
    PARENT_ID  VARCHAR(40)           RºNULLº
                                          myTable01_FK REFERENCES myTable01(ID),
 ºCREATE IF NOT EXISTS INDEXº        ← Create new index
           myTable01 (PUB_KEY);
  ALTER TABLE myTable01ºADD PRIMARY KEYº(ID); ← Alternative to create PRIMARY KEY

BºROLESº @[]
  ALTER ROLE name [ [ WITH ] option [ ... ] ]

  where option can be:
      | LOGIN      | NOLOGIN
      | CONNECTION LIMIT connlimit
      | VALID UNTIL 'timestamp'
    ALTER ROLE name RENAME TO new_name

    ALTER ROLE name [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT }
    ALTER ROLE name [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT
    ALTER ROLE name [ IN DATABASE database_name ] RESET configuration_parameter
    ALTER ROLE name [ IN DATABASE database_name ] RESET ALL

BºBACKUP/RESTOREº   [disaster_recovery]
  ºBACKUPº                                ºRESTOREº
$º$ pg_dump ${dbName} ˃ dbName.sqlº      $º$ pg_restore -d ddbb_name  -a backup.sql º
    └──────┬────────┘                        └──────┬────────┘
    backup full ddbb (schema + data).        restore full ddbb (schema + data).
    (or --data-only | --schema-only )        (or --data-only | --schema-only )
$º$ pg_dumpall ˃ pgbackup.sql     º
    backup all ddbbs

EXPORT/IMPORT (COPY) file \copy myTable01 TO '/home/user/weather.csv' CSV \copy myTable01(col1,col2) TO '/home/user/weather.csv' CSV \copy myTable01 FROM '/home/user/weather.csv' CSV \copy myTable01(col1,col2) FROM '/home/user/weather.csv' CSV
Table Maintenance VACUUM ANALYZE table; ← VACUUM: (Compact table after many deletion holes) REINDEX DATABASE dbName; ← Reindex a database, table or index [performance] @[] EXPLAIN SELECT * FROM table; ← Show query plan: [performance]
Rotate logs [TODO]
AAA "Filters"
  Note: "Filter" is an invented but intuitive nomenclature based on IP filters)

- RºWARNº: initial AAA setup allows any local user connect and become ddbb superuser.
       1 - use one initdb -W, --pwprompt or --pwfile options
           to assign a password to the database superuser.
       2 - set -A md5 | -A password to disable  default trust Authen. mode.
       3 - modify auto-generated ºpg_hba.confº after running initdb and
           before starting server for the first time.
Bº│ 1st Filter: host based (pg_hba.conf) to authorize incoming connections.│º
    official doc: @[]

  BºIMPORTANT:º order of the entries matters.

    host    · ddbb1 · pguser · ip1/32 · md5 Rº*1º           ← connections from 'ip1' only allowed 
            ·       ·        ·        ·                      from user pguser and only to ddbb percona.
            ·       ·        ·        ·                      using  md5 password authentication. º*2º

    hostssl · all   · all    · ip2/32 · md5               ← SSL connections allowed 
            ·       ·        ·        ·                      to any user@ip2 to any ddbb 
    hostssl · all   · all    · ip3/32 · cert clientcert=1 ← SSL connections allowed  
            ·       ·        ·        ·                      to any user@ to any ddbb 
   └──┬───┘  └─┬──┘   └─┬──┘   └─┬──┘   └──────┬────────┘    presenting valid client cert. º*2º
                                         (MD5, SCRAM,SSL certs,
                                          PAM/LDAP, Kerberos,...)º*3º
  Rº*1º MD5 hasing problem: always return same hash for same password.
        - in ver.10+ prefer SCRAM (SHA-256 with salts)
          for extra security against password-dictionary attacks. 
   º*2º @[]
   º*3º @[]

Bºdata at reset encryp. Optionsº [TODO]
  - The pgcrypto module allows certain fields to be stored encrypted. This
    is useful if only some of the data is sensitive. The client supplies the
    decryption key and the data is decrypted on the server and then sent to the client.

Bº│ 2nd Filter: roles and privileges │º
    user 1 ←-·······→ N Roles 1←···········→ M privileges
    developer01 ←─┐*1 dev_read_only              ... 
    developer02 ←─┴─→ dev_read_write             
                      admin_read_only        ┌→ SELECT on hhrr.employee
                      admin_read_write       ├→ INSERT on hhrr.employee
    manager01 ←─*2 ─→ managers ←─┐           ├→ UPDATE on hhrr.employee
    manager02 ←─*2 ─→ managers ←─┴───────────┴→ DELETE on hhrr.employee
    ^                                  ^                  ^
    ·     *1 GRANT dev_read_wite to \  ·     CREATE TABLE hhrr.employee (
    ·        developer01, developer02; ·       id INT,
    ·                                  ·       first_name VARCHAR(20), ...,
    ·     *2 CREATE ROLE managers;     ·       manager VARCHAR(20) );
    ·     *2 GRANT managers      to \  ·
    ·        manager01, manager02;     ·
    ·        ┌·························┘
    ·        ON hhrr.employee TO managers;
    └······· CREATE USER manager01 WITH ENCRYPTED PASSWORD 'manager01'; 
             CREATE USER manager02 WITH ENCRYPTED PASSWORD 'manager02';

Bº│ 3rd Filter, Row level Security (9.5+) │º
    INSERT INTO hhrr.employee VALUES \
      (1, 'user01','...','manager01'),
      (2, 'user02','...','manager02'),
      (3, 'user03','...','manager01');

    GRANT USAGE ON SCHEMA hhrr TO managers;

    CREATEºPOLICYºemployee_managers  ← allows the managers role to 
      ON hhrr.employee                 only view/modify their own
      TO managers                      subordinates’ records:
     ºUSING (manager = current_user);º

  $º$ psql -d ddbb01 -U manager01 º          $º$ psql -d ddbb01 -U manager02 º
    ...                                        ...
    =→  select * from hhrr.employee ;          =→  select * from hhrr.employee ;
    id │ first_name │ last_name │ manager      id │ first_name │ last_name │ manager
    ───┼────────────┼───────────┼─────────    ────┼────────────┼───────────┼─────────
     1 │ user01     │ ...       │ manager01     1 │ user02     │ ...       │ manager02
     3 │ user03     │ ...       │ manager01                                           
     (2 rows)                                   (1 rows)

postgresql.conf (Resource tunning)

- Set config Settings for logs, buffers, cahce, ...
  - Re-read changes with ºSIGHUPºsignal or $ºpg_ctl reloadº

BºPGConfigº: Online postgresql.conf tool: [low_code][perfomance]

 Example INPUT:            →  OUTPUT

 DB Version: 13               max_connections = 20
 OS Type: linux               shared_buffers = 3840MB
 DB Type: web                 effective_cache_size = 11520MB
 Total Memory (RAM): 15 GB    maintenance_work_mem = 960MB
 CPUs num: 4                  checkpoint_completion_target = 0.7
 Connections num: 20          wal_buffers = 16MB
 Data Storage: ssd            default_statistics_target = 100
                              random_page_cost = 1.1
                              effective_io_concurrency = 200
                              work_mem = 96MB
                              min_wal_size = 1GB
                              max_wal_size = 4GB
                              max_worker_processes = 4
                              max_parallel_workers_per_gather = 2
                              max_parallel_workers = 4
                              max_parallel_maintenance_workers = 2

BºPgTune Config.Wizardº [low_code][perfomance]
- Python script taking the wimpy default postgresql.conf and 
expanding the cluster server to be as powerful as the hardware it's 
being deployed on.
- Server instance configuration
- can be used to debug or pre-test changes in conf.
- @[]
- log analyzer with detailed reports from PSQL log files
  (syslog, stderr or csvlog) with in-browser zoomable graph
- designed to parse huge log files as well as gzip compressed file
Upgrading the server
- @["]
  (pg_dumpall, pg_upgrade, replication)
Network latency does BIG difference

- Measuring the performance impact caused by latency

- "tc": linux command to control network settings in Linux kernel.
  It allows you to do all kinds of trickery such as adding latency,
  bandwidth limitations and so on.
- tcconfig: python wrapper around tc.
  $ sudo pip3 install tcconfig

- compare performance vs network latency:
  $ createdb test
  $ pgbench -i test
  dropping old tables...
  generating data...
  100000 of 100000 tuples (100%) done (elapsed 0.11 s, remaining 0.00 s)
  creating primary keys...

   NO Network Latency                           10 ms Network latency
                                                (# tcset --device lo --delay=10)
   $ pgbench -S -c 10 -h localhost \            $ pgbench -S -c 10 -h localhost \            
     -T 20 test                                   -T 20 test                                   
   starting vacuum...end.                       starting vacuum...end.
   transaction type:                            transaction type:
   scaling factor: 1                            scaling factor: 1
   query mode: simple                           query mode: simple
   number of clients: 10                        number of clients: 10
   number of threads: 1                         number of threads: 1
   duration: 20 s                               duration: 20 s
   number of TX processed: 176300               number of TX processed: 9239
   latency average = 1.135 ms                   latency average = 21.660 ms
 Bºtps = 8813.741733º...                      Rºtps = 461.687712º(...)
                                                      20 times slower!!!.
                                                      Specailly painful on OLTP application
                                                      (sort queries)
                                                With 50ms performance drops 100xtimes!!
Performance Optimization
- Enable autovacuum. The working memory for autovacuum should be no more than 2% of
    the total available memory.
- Enable database caching with an effective cache size between 6% and 8% of the total
    available memory.
- To increase performance, the working memory should be at least 10% of the total
    server_encoding = UTF8
    lc_collate = en_US.UTF-8
    lc_ctype = en_US.UTF-8

Tunning OS
(Shared Memory/Semaphores/...):
- Show all runtime parameters:
 º#- SHOW ALL;º
PSQL vs MSQL comparative for data analytics
PSQL: PostgreSQL's CSV support is very good with RFC4180 support
     (which is the closest thing there is to an official CSV standard)
      COPY TO
      Helpul error message in case of error with fail-fast approach:
      - Abort import on problem
      (vs silently corrupt, misunderstand or alter data)

PostgreSQL                     | MS SQL Server:
                               | IF OBJECT_ID (N'dbo.my_table', N'U') IS NOT NULL
                               | DROP TABLE dbo.my_table;

 º  This is very, very important for a robust analytics delivery methodology,º
 ºwhere tear-down-and-rebuild is the underlying principle of repeatable,     º
 ºauditable, collaborative analytics work.                                   º
  - drop  schema and all the database objects inside it.

PostgreSQL                  |  MS SQL Server
SELECT                      |    *
  *                         |  INTO
FROM                        |    good_films
  all_films                 |  FROM
WHERE                       |    all_films
  imdb_rating ˃= 8;         |  WHERE

  - In PostgreSQL, you can execute as many SQL statements as you like in one
  batch; as long as you've ended each statement with a semicolon, you can
  execute whatever combination of statements you like. For executing
  automated batch processes or repeatable data builds or output tasks, this
  is critically important functionality.
  - PostgreSQL supports the RETURNING clause, allowing UPDATE, INSERT and
  DELETE statements to return values from affected rows. This is elegant and
  useful. MS SQL Server has the OUTPUT clause, which requires a separate
  table variable definition to function. This is clunky and inconvenient and
  forces a programmer to create and maintain unnecessary boilerplate code.
  - PostgreSQL supports $$ string quoting, like so:
SELECT $$Hello, World$$ AS greeting;
    This is extremely useful for generating dynamic SQL because (a) it allows
  the user to avoid tedious and unreliable manual quoting and escaping when
  literal strings are nested and (b) since text editors and IDEs tend not to
  recogniise $$ as a string delimiter, syntax highlighting remains functional
  even in dynamic SQL code.
  - PostgreSQL lets you use procedural languages simply by submitting code to
  the database engine; you write procedural code in Python or Perl or R or
  JavaScript or any of the other supported languages (see below) right next
  to your SQL, in the same script. This is convenient, quick, maintainable,
  easy to review, easy to reuse and so on.
  - "Pure" declarative SQL is good at what it was designed for – relational
  data manipulation and querying. You quickly reach its limits if you try to
  use it for more involved analytical processes, such as complex interest
  calculations, time series analysis and general algorithm design. SQL
  database providers know this, so almost all SQL databases implement some
  kind of procedural language. This allows a database user to write imperative
  - style code for more complex or fiddly tasks.
  - PostgreSQL's procedural language support is exceptional:
    - PL/PGSQL: this is PostgreSQL's native procedural language. It's like Oracle's
        PL/SQL, but more modern and feature-complete.
    - PL/V8: the V8 JavaScript engine from Google Chrome is available in PostgreSQL.
      Even better, PL/V8 supports global (i.e. cross-function call) state,
      allowing the user to selectively cache data in RAM for fast random access.
      Suppose you need to use 100,000 rows of data from table A on each of 1,000,000
      rows of data from table B. In traditional SQL, you either need to join
      these tables (resulting in a 100bn row intermediate table, which will
      kill any but the most immense server) or do something akin to a scalar
      subquery (or, worse, cursor-based nested loops), resulting in crippling
      I/O load if the query planner doesn't read your intentions properly.
     º In PL/V8 you simply cache table A in memory and run a function on each        º
     ºof the rows of table B – in effect giving you RAM-quality access (             º
     ºnegligible latency and random access penalty; no non-volatile I/O load)        º
     ºto the 100k-row table. I did this on a real piece of work recently – my        º
     ºPostgreSQL/PLV8 code was about 80 times faster than the MS T-SQL solution      º
     ºand the code was much smaller and more maintainable. Because it took about     º
     º23 seconds instead of half an hour to run, I was able to run 20 run-test-modifyº
     ºcycles in an hour, resulting in feature-complete, properly tested, bug-free    º
     ºcode.                                                                          º
      (All those run-test-modify cycles were only possible because of DROP SCHEMA CASCADE
       and freedom to execute CREATE FUNCTION statements in the middle of a statement
       batch, as explained above. See how nicely it all fits together?)
    - PL/Python: Fancy running a SVM from scikit-learn or some
      arbitrary-precision arithmetic provided by gmpy2 in the middle of a SQL query?
      No problem!
    - PL/R
    - C: doesn't quite belong in this list because you have to compile it
      separately, but it's worth a mention.  In PostgreSQL it is trivially easy
      to create functions which execute compiled, optimised C (or C++ or assembler)
      in the database backend.
  - In PostgreSQL, custom aggregates are convenient and simple to use,
     resulting in fast problem-solving and maintainable code:
CREATE FUNCTION interest_sfunc(state JSON, movement FLOAT, rate FLOAT, dt DATE) RETURNS JSON AS
state.balance += movement;  //payments into/withdrawals from account
if (0 === dt.getUTCDate()) //compound interest on 1st of every month
  state.balance += state.accrual;
  state.accrual = 0;
state.accrual += state.balance * rate;
return state;
$$ LANGUAGE plv8;

  INITCOND='{"balance": 0, "accrual": 0}'

--assume accounts table has customer ID, date, interest rate and account movement for each day
CREATE TABLE cust_balances AS
  (interest(movement, rate, dt ORDER BY dt)-˃˃'balance')::FLOAT AS balance
Elegant, eh? A custom aggregate is specified in terms of an internal state
  and a way to modify that state when we push new values into the aggregate
  function. In this case we start each customer off with zero balance and no
  interest accrued, and on each day we accrue interest appropriately and
  account for payments and withdrawals. We compound the interest on the 1st
  of every month. Notice that the aggregate accepts an ORDER BY clause (since
  , unlike SUM, MAX and MIN, this aggregate is order-dependent) and
  PostgreSQL provides operators for extracting values from JSON objects. So,
  in 28 lines of code we've created the framework for monthly compounding
  interest on bank accounts and used it to calculate final balances. If
  features are to be added to the methodology (e.g. interest rate
  modifications depending on debit/credit balance, detection of exceptional
  circumstances), it's all right there in the transition function and is
  written in an appropriate language for implementing complex logic. (Tragic
  side-note: I have seen large organisations spend tens of thousands of
  pounds over weeks of work trying to achieve the same thing using poorer tools.)
- Date/Time
  all of which do exactly what you would expect. They also have fantastic
  range and precision, supporting microsecond resolution from the 5th
  millennium BC to almost 300 millennia in the future. They accept input in a
  wide variety of formats and the last one has full support for time zones
    - They can be converted to and from Unix time, which is very important
      for interoperability with other systems.
    - They also support the INTERVAL type, which is so useful it has its own
      section right after this one.
      SELECT to_char('2001-02-03'::DATE, 'FMDay DD Mon YYYY');  ← "Saturday 03 Feb 2001"

      SELECT to_timestamp('Saturday 03 Feb 2001', 'FMDay DD Mon YYYY');  ←TS 2001-02-03 00:00:00+00

    - PostgreSQL: the INTERVAL type represents a period of time, such as "30
  microseconds" or "50 years". It can also be negative, which may seem
  counterintuitive until you remember that the word "ago" exists. PostgreSQL
  also knows about "ago", in fact, and will accept strings like '1 day ago'
  as interval values (this will be internally represented as an interval of -
  1 days). Interval values let you do intuitive date arithmetic and store
  time durations as first-class data values. They work exactly as you expect
  and can be freely casted and converted to and from anything which makes sense
  - PostgreSQL arrays are supported as a first-class data type
    - eaning fields in tables, variables in PL/PGSQL, parameters to functions
      and so on can be arrays. Arrays can contain any data type you like,
      including other arrays. This is very, very useful. Here are some of the
      things you can do with arrays:

    - Store the results of function calls with arbitrarily-many return values, such as regex matches
    - Represent a string as integer word IDs, for use in fast text matching algorithms
    - Aggregation of multiple data values across groups, for efficient cross-tabulation
    - Perform row operations using multiple data values without the expense of a join
    - Accurately and semantically represent array data from other applications in your tool stack
    - Feed array data to other applications in your tool stack
  - PostgreSQL: full support for JSON, including a large set of utility functions for
     transforming between JSON types and tables (in both directions)
  - PostgreSQL: HSTORE is a PostgreSQL extension which implements a fast key-value store as a data type.
     Like arrays, this is very useful because virtually every high-level programming language has such
     a concept (associative arrays, dicts, std::map ...)
     There are also some fun unexpected uses of such a data type. A colleague
     recently asked me if there was a good way to deduplicate a text array. Here's
     what I came up with:
        SELECT akeys(hstore(my_array, my_array)) FROM my_table;
     i.e. put the array into both the keys and values of an HSTORE, forcing a
     dedupe to take place (since key values are unique) then retrieve the keys
     from the HSTORE. There's that PostgreSQL versatility again.

  - PostgreSQL:range types.
    Every database programmer has seen fields called start_date and end_date,
    and most of them have had to implement logic to detect overlaps. Some have even found, the hard way,
    that joins to ranges using BETWEEN can go horribly wrong, for a number of reasons.
    PostgreSQL's approach is to treat time ranges as first-class data types. Not only can you put a
    range of time (or INTs or NUMERICs or whatever) into a single data value, you can use a host of
    built-in operators to manipulate and query ranges safely and quickly. You
    can even apply specially-developed indices to them to massively accelerate
    queries that use these operators.
  - PostgreSQL: NUMERIC (and DECIMAL - they're symonyms) is near-as-dammit arbitrary
    precision: it supports 131,072 digits before the decimal point and 16,383 digits after the decimal point.
  - PostgreSQL: XML/Xpath querying is supported
  - PostgreSQL's logs, by default, are all in one place. By changing a couple of settings in a text file,
    you can get it to log to CSV (and since we're talking about PostgreSQL, it's proper CSV, not broken CSV).
    You can easily set the logging level anywhere from "don't bother logging
    anything" to "full profiling and debugging output".
    The documentation even contains DDL for a table into which the CSV-format
    logs can be conveniently imported.
    You can also log to stderr or the system log or to the Windows event log
    (provided you're running PostgreSQL in Windows, of course).
      The logs themselves are human-readable and machine-readable and contain data
    likely to be of great value to a sysadmin.
      Who logged in and out, at what times, and from where? Which queries are being
    run and by whom?
      How long are they taking? How many queries are submitted in each batch?
    Because the data is well-formatted CSV,
      it is trivially easy to visualise or analyse it in R or PostgreSQL itself or
    Python's matplotlib or whatever you like.
  - PostgreSQL comes with a set of extensions called contrib modules. There are libraries of functions,
    types and utilities for doing certain useful things which don't quite fall
    into the core feature set of the server. There are libraries for fuzzy
    string matching, fast integer array handling, external database connectivity,
    cryptography, UUID generation, tree data types and loads, loads more. A few
    of the modules don't even do anything except provide templates to allow
    developers and advanced users to develop their own extensions and custom functionality.
PostgreSQL for Analytics Apps
Intelligent Analytics with PostgreSQL
TODO: (Video) InfoQ: Developing an Intelligent Analytics App with PostgreSQL


- Azure Database for PostgreSQL brings together the community edition 
  database engine and capabilities of a fully managed service - so you 
  can focus on your apps instead of having to manage a database.
Non classified
- builds a powerful, extensible and performant GraphQL API from a
  PostgreSQL schema in seconds; saving you weeks if not months of
  development time.

- If you already use PostgreSQL then you understand the value that a
  strongly typed and well defined schema can bring to application
  development; GraphQL is the perfect match for this technology when it
  comes to making your data layer accessible to your frontend
  application developers (or even API clients). Why duplicate your
  authorization and business logic in a custom API when you can
  leverage the tried and tested capabilities built into the worlds most
  advanced open source database?

- If you are new to GraphQL then we recommend you read through the
  official introduction to GraphQL here before continuing through the
  PostGraphile documentation.

- By combining powerful features such as PostgreSQL's role-based
  grant system and row-level security policies with Graphile Engine's
  advanced GraphQL look-ahead and plugin expansion technologies,
  PostGraphile ensures your generated schema is secure, performant and

Some of the features we offer:
- Incredible performance - no N+1 query issues
- Extensibility via schema and server plugins
- Auto-discovered relations e.g. userByAuthorId
- Computed columns allowing easy expansion of your API
- Custom query procedures enabling arbitrary SQL queries
- Automatic CRUD mutations e.g. updatePost
- Custom mutation procedures enabling complex changes to be exposed simply
- Real-time features powered by LISTEN/NOTIFY and/or logical decoding

- The easiest way to get started is with the CLI interface;
 $º$ npx postgraphile -c postgres://user:pass@localhost/mydb \ º
 $º  --watch --enhance-graphiql --dynamic-json                 º

 (replacing user, pass and mydb with your PostgreSQL username,
  password and the name of your database)
Source control for the DDBB schema.
- Eliminate errors and delays when releasing databases.
- Deploys and Rollback changes for specific versions without needing
  to know what has already been deployed.
- Deploy database and application changes together so they always
  stay in sync.
- Supports code branching and merging
- Supports multiple developers
- Supports multiple database types
- Supports XML, YAML, JSON and SQL formats
- Supports context-dependent logic
- Cluster-safe database upgrades
- Generate Database change documentation
- Generate Database "diffs"
- Run through your build process, embedded in your application or on demand
- Automatically generate SQL scripts for DBA code review
- Does not require a live database connection

Java Maven Plugin:
AWS Serverless PostgreSQL

- "serverless" relational database service (RDS) in AWS Aurora.
- Automatically starts, scales, and shuts down database capacity
- per-second billing for applications with less predictable usage patterns.

- It's a *different implementation of the standard versions of these open-source databases.

From the RDS console:
- select the Amazon Aurora database engine PostgreSQL
- set new DB cluster identifier, specification of credentials,
- set capacity:
  - minimum and maximum capacity units for their database, in terms of Aurora Capacity Units (ACUs)
    – a combination of processing and memory capacity. Besides defining the ACUs, users can also
      determine when compute power should stop after a certain amount of idle time.

ºhow capacity settings will work once the database is availableº
- client apps transparently connect to a proxy fleet
  that routes the workload to a pool of resources that
  are automatically scaled.
- Scaling is very fast because resources are "warm" and
  ready to be added to serve your requests.

- Minimum storage: 10GB, will automatically grow up to 64 TB
 (based on the database usage) in 10GB increments
 ºwith no impact to database performanceº

ºpricing modelsº
- On-Demand Instance Pricing: pay by hour, no long-term commitments
- Reserved  Instance Pricing: steady-state database workloads


Reactive java Client

High performance reactive PostgreSQL client written in Java
(By Julien Viet, core developer of VertX and Java Crash shell)
Patroni: A Template for PostgreSQL HA with ZooKeeper, etcd or Consul

You can find a version of this documentation that is searchable and also easier
to navigate at

There are many ways to run high availability with PostgreSQL; for a list, see
the PostgreSQL Documentation.

Patroni is a template for you to create your own customized, high-availability
solution using Python and - for maximum accessibility - a distributed
configuration store like ZooKeeper, etcd, Consul or Kubernetes. Database
engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy
HA PostgreSQL in the datacenter-or anywhere else-will hopefully find it useful.

We call Patroni a "template" because it is far from being a one-size-fits-all
or plug-and-play replication system. It will have its own caveats. Use wisely.

Note to Kubernetes users: Patroni can run natively on top of Kubernetes. Take a
look at the Kubernetes chapter of the Patroni documentation.
Error Reporting+Logging
Regex support
- fundamental in analytics work involving text processing tasks.
  """  A data analytics tool without regex support is like a bicycle without a
      saddle – you can still use it, but it's painful. """

- Great support in PostgreSQL
RºWARN:º Non-portable to other DDBBs.

- Exs:

  > SELECT * FROM my_table
       WHERE my_field ~ E'^([0-9])\\1+[aeiou]';   ← Get all lines starting with a repeated digit
                                                    followed by a vowel

  > SELECT SUBSTRING(my_field FROM E'\\y[A-Fa-f0-9]+\\y') ← Get first isolated hex-string
    FROM my_table;                                          occurring in a field:

  > SELECT REGEXP_SPLIT_TO_TABLE('The quick brown fox', E'\\s+'); ← split string based on regex
                                                                    return each fragment in a row
    │ column │
    │ The    │
    │ quick  │
    │ brown  │
    │ fox    │

      REGEXP_MATCHES(my_string, E'\\y[a-z]{10,}\\y', 'gi')   ← 'gi' flags:
    FROM my_table;                └──────┬───────┘              g: All matches (vs just first match)
                                         │                      i: Case insensitive
                                         └───────────────────── word with 10+ letters
                                            └────────────────── find all words (case─insensitive) in my_string
                                                                with at least 10 letters:
Citus Sharding Extension

Distributed PostgreSQL extension for multi-tenant and
real-time analytics workloads

- cluster of databases with Citus extension (sharding for PostgreSQL).
  Configured High-Availability for the coordinator node.
Wide Column Extension
- cstore_fdw extension is a columnar store extension for analytics
  use cases where data is loaded in batches.
- Cstore_fdw’s columnar nature delivers performance by only reading relevant
  data from disk.
- It may compress data by 6 to 10 times to reduce space requirements 
  for data archive.
TimescaleDB (Time Series) Extension
"Supercharged PostgreSQL"
- Reuse PostgreSQL massive ecosystem.

- 10-100x faster queries than PostgreSQL, InfluxDB, and MongoDB.
  Native optimizations for time-series.
- scale to millions of data points per second per node.
  Horizontally scale to petabytes. Don’t worry about cardinality.

- ask complex questions thanks to Relational+time series integration

- Optionally SaaS on AWS, Azure, or GCP in 75+ regions. 

- 94 - 97% compression rates from best-in-class algorithms and 
  other performance improvements.

- European Space Agency Science Data Center
  switched to PostgreSQL with the TimescaleDB extension for their
  data storage.
  ESDC’s diverse data includes structured, unstructured and time 
  series metrics runningºto hundred of terabytes (per day), andº
 ºquerying requirements across datasets with open source tools.º
  - Cross referencing of datasets was a requirement while choosing 
    a data storage solution, as was the need to be able to use readily
    available, open source tools to analyze the data.
  - PostgreSQL selected for its maturity and support for various data
    types, unstructured data, geo-spatial and time series data,
    native support for JSON and full text search.
  - TS Data analaysis required low write speed requirements but
    queries had to support structured data types, ad-hoc matching 
    between datasets and large datasets of up to hundreds of TBs.
    ... It's unclear which alternative time series databases were evaluated,
    but the team did not opt for any of them as they had standardized on
    SQL as the query language of choice,
    - PostgreSQL 10+ partitioning support attempts to solve the problem of
      keeping large table indexes in memory and writing them to disk on every
      update by splitting tables into smaller partitions.
      Partitioning can also be used to store time series data when the 
      partitioning is done by time, followed by indices on those partitions.
      ESDC's efforts to store time series data ran into performance issues, 
      and they switched to an extension called TimescaleDB.

    - TimescaleDB uses an abstraction called a Hypertable
      to hide partitioning across multiple dimensions like time and space.
      Each hypertable is split into "chunks", and each chunk corresponds
      to a specific time interval. Chunks are sized so that all of the B-tree
      structures for a table's indices can reside in memory during inserts,
      similar to how PostgreSQL does partitioning. 
      Indices are auto-created on time and the partitioning key. 
      Queries can be run against arbitrary dimmensions
      @[] (just like TS ddbb alternatives) 
      - One  differences between TimescaleDB and other partitioning tools like
      pg_partman is support for auto-sizing of partitions.
      - TimescaleDB has reported higher performance benchmarks compared to 
      PostgreSQL 10 partitioning based solutions and InfluxDB
      - there have been concerns about maintainability
        Clustered deployments of TimescaleDB still under development at the time of writing.

PostgreSQL vs MSQL doc. comparative

- MS SQL Server's documentation is all ... unfriendly, sprawling mess,  [humor]
  conservative, humourless, "business appropriate" – i.e. officious, 
  boring and dry.
  Not only does it lack amusing references to the historical role of Catholicism 
  in the development of date arithmetic, it is impenetrably stuffy and 
  hidden behind layers of unnecessary categorisation and ostentatiously 
  capitalised official terms. Try this: go to the product documentation 
  page for MS SQL Server 2012 and try to get from there to something 
  useful. Or try reading this gem (not cherry-picked, I promise):
- A  report part definition is an XML fragment of a report definition 
  file. You create report parts by creating a report definition, and 
  then selecting report items in the report to publish separately as 
  report parts.
- Has the word "report" started to lose its meaning yet?
What's new

BºPostgreSQL 14 (2021-??-??)º
  - SEARCH and CYCLE clauses
    On 1st of February 2021, Peter Eisentraut committed patch:
    SQL standard for recursive queries to be able to do produce 
    "breadth" or "depth-first" search orders and detect cycles.
    These clauses can be rewritten into queries using existing syntax,
    and that is what this patch does in the rewriter.
    Reviewed-by: Vik Fearing ˂˃
    Reviewed-by: Pavel Stehule ˂˃

BºPostgreSQL 12 (2019-10-03)º
  - SQL/JSON (standard) query support allows to perform inequality
    comparisons, arithmetic operations, search over values using
    regular expressions and perform mathematical operations such
    as finding an absolute value ... 
  - improved authentication and administration options. 
  - improved performance by improving its indexing requiring 
    less space and better optimization to deliver faster access.
  - Just-in-time compilation using the LLVM compiler for increased
    query performance.

BºPostgreSQL 10º
  (specifically focused on effectively distribute data across many nodes)
  - Logical Replication: Instead of used adhoc WALs replications, a 
    network protocol is used to broadcast changes. The old method just
    allowed for replication among same ddbb version, while the new
    one is version independent.
    (host 1) CREATE PUBLICATION publication01 FOR TABLE table1, table2; 
    (host 2) CREATE SUBSCRIPTION mysub CONNECTION 'host=mydb user=myuser'
             PUBLICATION publication01;
  - Declarative Partitioning for tables [scalability]
    postgres=# CREATE TABLE Invoices (
      id SERIAL,
      issued_at TIMESTAMP NOT NULL,
      customer_name TEXT NOT NULL,
      amount INTEGER NOT NULL,
      product_bought TEXT NOT NULL
    )Bºpartition by range (issued_at);º 
    postgres=# CREATE TABLE issued_at_y2018m01 PARTITION OF Invoices
               FOR VALUES FROM ('2018-jan-01') to ('2018-jan-31');
    postgres=# CREATE TABLE issued_at_y2018m02 PARTITION OF Invoices
                FOR VALUES FROM ('2018-feb-01') to ('2018-feb-28');
    postgres=# create index on issued_at_y2018m01(issued_at); ← Optional
    postgres=# create index on issued_at_y2018m02(issued_at);  (recommended)

    See also pg_partman: 
    "creating new partitions and maintaining existing ones, including purging
     unwanted partitions, requires a good dose of manual effort, even in 10+"
     pg_partman extension allows to automate this maintenance.
  - improved Query Parallelism.
  - Quorum Commit for Synchronous Replication, 

- pldebugger: ¡¡must-have!! extension for developers who work on stored functions
  written in PL/pgSQL. This extension is well integrated with GUI tools like pgadmin,
  which allows developers to step through their code and debug it.

- plprofiler: profiler for pl. Specially useful during complex migrations from
              proprietary databases,
Hot Stand-By
Crunchy low-code deployments
The Crunchy Container Suite provides Docker containers that enable 
rapid deployment of PostgreSQL, including administration and 
monitoring tools. Multiple styles of deploying PostgreSQL clusters 
are supported.

- Tools to create graphs in PostGIS:

  - osm2pgrouting: (Preinstaled with PostGIS). Creates graphs from OSM cartography.
    Only OpenStreetMap supported (2018).
  - pgr_nodeNetwork: (Beta) creates a graphs from table's data.
  - Combining PostGIS functions:
    - St_Union() : Allows for "dissolve" operation, joining N geometries in one MULTI
    - St_Dump () : Allows to fetch all geometries in a MULTI one. in a data-composed type
                   formed by 'path' attribute (alphanumeric), and  'geom' attribute (geometry).
   CREATE TABLE  street_guide ( id SERIAL PRIMARY KEY, name VARCHAR(25), geom GEOMETRY );
   INSERT INTO street_guide (name, geom) 
     VALUES (‘A’, St_GeomFromText( ‘LINESTRING(
                  -3.6967738 40.4227273, -3.6966589 40.4231664,
                  -3.6965874 40.4234147, -3.6965117 40.4236891,
                  -3.6965125 40.4237212)’
             ) );
   INSERT INTO street_guide (name, geom) 
     VALUES (‘B’, St_GeomFromText( ‘LINESTRING(
                  -3.6955342 40.4236784, -3.6955697 40.4231059,
                  -3.6956075 40.4225342)’
   CREATE TABLE ºstreet_guide_graphº AS
     SELECT (ST_DUMP(St_Union(geom))).geom   ← Use ST_DUMP(ST_UNION(...)) to create graph º*1º
     FROM street_guide                          
   º*1º to group ALL geometries we must ignore any other attributes in ST_UNION (aggregate function)
        or grouping will fail. 
        To recover other attributes in each arc and espacial relationship of type ST_CONTAINS() among 
        streets and arcs must be established.
   ALTER TABLE street_guide_graph ADD COLUMN source INTEGER;
   ALTER TABLE street_guide_graph ADD COLUMN target INTEGER;
    SELECT pgr_createTopology
           (‘street_guide_graph’, 0.00001, ‘geom’,’id’); ← - Create street_guide_graph_vertices_pgr  table
                                                            - updates 'source', 'target' in street_guide_graph.
- This module integrates with SELinux to provide an additional layer of 
  security checking above and beyond what is normally provided by 
  PostgreSQL. From the perspective of SELinux, this module allows 
  PostgreSQL to function as a user-space object manager. Each table or 
  function access initiated by a DML query will be checked against the 
  system security policy. This check is in addition to the usual SQL 
  permissions checking performed by PostgreSQL.

  PostgreSQL supports row-level access, but sepgsql does not.
Logging in PostgreSQL
- log either:
  - all of the statements
  - a few statements based on parameter settings.

  You can log all the DDLs or 
DMLs or any statement running for more than a certain duration to the 
log file when logging_collector is enabled. To avoid write overload 
to the data directory, you may also move your log_directory to a 
different location. Here’s a few important parameters you should 
review when logging activities in your PostgreSQL server:

  - log_connections
  - log_disconnections
  - log_lock_waits
  - log_statement
  - log_min_duration_statement

- Please note that detailed logging takes additional disk space and may 
  impose an important overhead in terms of write IO depending on the 
  activity in your PostgreSQL server. You should be careful when 
  enabling logging and should only do so after understanding the 
  overhead and performance degradation it may cause to your workload.
User Auditing

Bºpgaudit and set_userº:
- Some essential auditing features in PostgreSQL are implemented as 
  extensions, which can be enabled at will on highly secured 
  environments with regulatory requirements.
- pgaudit helps to audit activities like 
  - unauthorized user intentionally obfuscated the DDL or DML, 
    statement passed and sub-statement actually executed 
    will be logged in the PostgreSQL log file.
- set_user  provides a method of privilege escalations. If properly 
  implemented, it provides the highest level of auditing, which allows 
  the monitoring of even SUPERUSER actions.
Percona "TODO"s
Configure HAProxy with PostgreSQL Using Built-in pgsql-check 
(What's new, breaking changes)
Expanding PostgreSQL with Extensions
- Extension Mechanism: 

  =˃ SELECT * FROM pg_available_extensions;     ← check extensions enabled in current ddbb
BºEx. Install postgis extension:º
  $ sudo add-apt-repository ppa:ubuntugis/ppa    ← STEP 1:
  $ sudo apt-get update
  $ sudo apt-get install postgis

  =˃ CREATE EXTENSION postgis;                   ← STEP 2:
  =˃ CREATE EXTENSION postgis_topology;

- mysql_fdw , postgres_fdw : allow PostgreSQL databases to talk to remote
  homogeneous/heterogeneous databases like PostgreSQL and MySQL, MongoDB, etc.

- pg_stat_statements: allows for tracking execution statistics of all
  SQL statements executed by a server. gathered statistics are made
  available via a view "pg_stat_statements".

- pg_repack: Address table fragmentation problems [scalability] [throubleshooting]

- pgaudit: caters with major compliance requirement for many security standards.
       providing detailed session and/or object audit logging via the standard PostgreSQL
       logging facility.

- PostGIS: "arguably the most versatile implementation of the specifications
           of the Open Geospatial Consortium."

- HypoPG: extension for adding support for hypothetical indexes
  that is, without actually adding the index. This helps us to answer questions
  such as “how will the execution plan change if there is an index on column X?”.

- tds_fdw: Another important FDW (foreign data wrapper) extension
  Both Microsoft SQL Server and Sybase uses TDS (Tabular Data Stream) format.

- orafce: there are lot of migrations underway from Oracle to PostgreSQL.
  Incompatible functions in PostgreSQL are often painful
   The “orafce” project implements some of the functions from the Oracle database.
  The functionality was verified on Oracle 10g and the module is useful for
   production work.

- pg_bulkload: Is loading a large volume of data into database in a very efficient
  and faster way a challenge for you?

- wal2json: PostgreSQL has feature related to logical replication built-in.
  Extra information is recorded in WALs which will facilitate logical decoding.
  wal2json is a popular output plugin for logical decoding.
  This can be utilized for different purposes including change data capture.

MongoDB integration
- mongo_fdw extension presents collections from mongodb as tables in PostgreSQL.
  This is a case where the NoSQL world meets the SQL world and features combine.
  systemctl status output:
  /system.slice/postgresql.service CGroup
    ├─7117 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
    ├─7118 postgres: logger process
    ├─7120 postgres: checkpointer process
    ├─7121 postgres: writer process
    ├─7122 postgres: wal writer process
    ├─7123 postgres: autovacuum launcher process
    └─7124 postgres: stats collector process
  $ ps hf -u postgresq -o cmd
  /usr/pgsql-9.4/bin/postgres -D /var/lib/pgsql/9.4/data
  \_ postgres: logger       process
  \_ postgres: checkpointer process
  \_ postgres: writer       process
  \_ postgres: wal writer   process
  \_ postgres: autovacuum launcher  process
  \_ postgres: stats collector  process
  \_ postgres: postgres pgbench [local] idle in transaction
  \_ postgres: postgres pgbench [local] idle
  \_ postgres: postgres pgbench [local] UPDATE
  \_ postgres: postgres pgbench [local] UPDATE waiting
  \_ postgres: postgres pgbench [local] UPDATE
Deep Dive into psql Statistics

- The Internals of PostgreSQL!!!  @[]

   Extracted from @[]
       Client Backends              |                                                  |Postmaster
      [pg_stat_activity]            |                                                  |[pg_stat_database]
    -----------------------         |                                                  |----------------
       Query Planning               |  Shared Buffers                                  |Background Workers
                                    |  [pg_buffercache]                                |
    -----------------------         |                                                  |------------------
    Query Execution                 |                                                  |Autovacuum Launcher
                                    |                                                  |
    =======================         |                                                  |-------------------
    Indexes IO  | Tables IO         |                                                  |Autovacuum Workers
                |                   |                                                  |[pg_stat_activity]
                |                   |                                                  |[pg_stat_user_tables]
    -----------------------         |                                                  |
        Buffers IO                  |                                                  |
    [pg_stat_database]              |                                                  |
    [pg_statio_all_indexes]         |                                                  |
    [pg_statio_all_tables]          |                                                  |
             Write Ahead Log
    Logger Process                                    |                        Stats Collector
    Logical               | WAL Sender           |Archiver           | Background         |  Checkpointer
    Replication           |   Process            | Process           |  Writer            |   Process
    [pr_replication_slots]| [pg_stat_replication]|[pg_stat_archiver] | [pg_stat_bgwriter] |  [pg_stat_database]
                          |                      |                   |                    |  [pg_stat_bgwriter]
                  NETWORK                            |               STORAGE
                                                     |         [pg_stat_kcache]
        (nicstat, iostat, ...)                       |         (iostat, ...)
         WAL Receiver Process     |                           |  Tables/Indexes Data Files
                                  |                           |  [pg_index_size]     [pgstattupple]
    ------------------------------|                           |  [pg_table_size]
            Recovery Process      |                           |  [pg_database_size]
     [pg_stat_database_conflicts] |                           |
    ==============================+                           +=================================================
pg Utils
pgbench (Benchmarks)
pgbench : Postgresql benchmark (Useful for tunning also)
Explaining PG lantency/ies

YugabyteDB is a high-performance, cloud-native distributed SQL 
database that aims to support all PostgreSQL features. It is best to 
fit for cloud-native OLTP (i.e. real-time, business-critical) 
applications that need absolute data correctness and require at least 
one of the following: scalability, high tolerance to failures, 
globally-distributed deployments.

Introducing YSQL: A PostgreSQL Compatible Distributed SQL API for YugaByte DB
Events Notify/Listen  [TODO]  [TODO]                 [TODO]

Elixir Server Better alternative to standard PostgreSQL's notify: @[] - Elixir server (Phoenix) allows to listen for ddbb changes via websockets. - It works like this: → listens ddbb replication functionality (using Postgres' logical decoding) → convert byte-stream into JSON → broadcasts over websockets. - Advantages over PostgreSQL's NOTIFY? - No need to set up triggers on every table - NOTIFY payload limit. 8000 bytes failing above it. patchy fix: send an ID, then fetch record, consuming extra resources. - Phoenix consumes one connection to the ddbb, even for many clients. - benefits - listening to replication allows to make changes in the DDBB from anywhere (custom API, directly in DB, via console etc,...) - and changes will still be available via websockets. - Decoupling. Ex: send a new slack message every time someone makes a new purchase. - built with Phoenix, an extremely scalable Elixir framework.
Tuning linux for performance

general guidelines :

- Choose correct RAID strategy:
  RAID0 :  more disks, more performance, more risk of data loss.
  RAID1 :  good for medium workloads redundancy.  Good performance for read applications
  RAID5 :  should be used for large workloads with more than 3 disks.

- File systems:
  - ext4: fast and stable.
  - xfs : good performance for many parallel processes.
  - when mounting your disks, be sure to always use the option “noatime”. 
    This makes the file system not update information of files that have been updated,
    information that we do not care about for a database server.
  - Be sure to always put your data folder on a separate disk. 
    Separating certain tables highly read or written on dedicated disks, as well as
    having pg_xlog on another disk can improve things as well.

- I/O scheduler (/sys/block/$DISK_NAME/queue/scheduler)
  - deadline: can offer good response times with mix of read/write loads and maintains a good throughput.
              Choose this wisely depending on your hardware though!
  - noop    : 
  - "anything wanted": 

- Linux readahead:
  Allowing to put a file's content into page cache instead of disk, making reading this file faster when
  it is subsequently accessed. 
  - Can be set with:
  $ blockdev -setra (-getra to read it)
  - On most of distributions it is set on a low value, 256~1024 512blocks /(128kB~512kB).
  - A higher value can really improve sequential scans for large tables,
    as well as index scans on a large number of rows. 
    - Start up with 4096 and see the results! Besides.
    - Be aware that values higher than 16MB (32768 for 512 byte sectors) do not show
     that much performance gain (thanks Dimitri Fontaine for this info!).

- Creating Events from Databases Using Change Data Capture: 
  Gunnar Morling at MicroXchg Berlin

- Objective:
  data → database → sync with : cache
                                search engine.

- Debezium solution:
  - use change data capture (CDC) tool that captures and
    publishes changes in a database.
- Based on Apache Kafka to publish changes as event streams.
  reads the TX logs (append-only) in a database and creates streams of events.
  - Different ddbbs have their own APIs for reading the log files.
    Debezium comes with connectors for several of them, producing
    one generic and abstract event representation for Kafka.
Blockchain on Postgresql
"Hacking" and Morphing Postgresql into a blockchain
Serializable Snapshot Isolation
PostgreSQL [8] is the first open source database to implement the 
abort during commit SSI variant [36]
PSQL "vs" Redis/ElasticSearch/...
sponsored article (

  - PSQL beautifully designed caching system with pages, usage counts,
    and transaction logs avoids disk access for repeated reads.

  -ºshared_buffer configuration parameterºin Postgres configuration file determines
    how much memory it will use for caching data. 
   ºTypically ~ 25% to 40% of  total memoryº.
    (PSQL also uses the OS cache). Change it like: 

  BºALTER SYSTEM SET shared_buffer TO = $valueº

  - Advanced caching tools include:
    - pg_buffercache view: see what's occupying the shared buffer cache.

    -ºpg_prewarmºfunction (base install), enables DBAs to load table data into
      either the OS cache or the PSQL buffer cache. (manual or automated).
    BºIf you know the nature of your queries, it will greatly improve performanceº.

  REF: In-depth guide of PSQL Cache:

BºText searchingº
  -ºtsvector data-typeº plus a set of functions (to_tsvector, to_tsquery, to search, ..)
  Bºtsvector represents a document optimized for text search by sorting terms and º
  Bºnormalizing variants.º Ex:

    SELECT to_tsquery('english', 'The ⅋ Boys ⅋ Girls');
     'boy' ⅋ 'girl'

  - results can be sorted by relevance depending on how often and which fields your query
    appeared in the results. For example, making title more relevant than body,...
    (Check official PSQL doc)

BºData preprocessing with functionsº
  - Try pre-process as much data as possible with server-side functions.
    cutting down latency from passing too much data back and forth between apps and ddbbs.
    - particularly useful for large aggregations and joins.

      CREATE FUNCTION                      ← Ex. PL/Python checking string lengths:
         (string1 string, string2 string)
        RETURNS integer
      AS $$
        if a ˃ b:
          return a
        return b
      $$ LANGUAGE plpythonu;

BºKey-Value Data Typeº
  -ºhstore extensionº allows store/search simple key-value pairs.

BºSemi-structured Data Typesº
  -ºJSON data-typeº: support native JSON binary (JSONB).
    JSONB significantly improves query performance.
    As you can see below, it can :

    SELECT '{"product1": ["blue""], "tags": {"price": 10}}'::json; ← convert JSON strings 
                                                                          to native JSON objects
     {"product1": ["blue"], "tags": {"price": 10}}

BºTips for Scalingº
  - Slow queries: See for missing important indexes.
  - Don't over-index
  - Use tools like EXPLAIN ANALYZE might surprise you by how often
    the query planer actually chooses sequential table scans. 
    Since much of your table's row data is already cached, oftentimes
    these elaborate indexes aren't even used.
  -BºPartial indexes save spaceº:
    CREATE INDEX user_signup_date 
      ON users(signup_date)
        WHERE is_signed_up;      ← We need to order, but only for users signed up

  - Choose correct index type:

    -ºB-tree indexº: used toºsortºdata efficiently. (default one for INDEX).
      - As you scale, inconsistencies can be a larger problem: 
        use 'amcheck' extension periodically.

    -ºBRIN indexesº: Block Range INdex (BRIN) can be used whenºtable is naturallyº
      ºalready sorted by a column where sort is neededº. Ex, log table written
       - setting BRIN index on timestamp column lets the server know that the data
         is already sorted.

    -ºBloom filter indexº: perfect for multi-column queries on big tables whereºyouº
     ºonly need to test for 'sparse' equality against some query valueº. Ex:

      CREATE INDEX i ON t USING bloom(col1, col2, col3);
      SELECT * from t WHERE col1 = 5 AND col2 = 9 AND col3 = ‘x’;

    -ºGIN and GiST indexesº: efficient indexes based onºcomposite values like text,º
     ºarrays, and JSONº.

BºLegitimate needs for another Data Storeº
  - Special data types not supported by PSQL:
    linked list, bitmaps, and HyperLogLog functions in Redis.
    Ex:  Frequency capping means displaying each web visitant (millions of them daily) 
         a given ad just once per day.
         Redis HyperLogLog data type is perfect. It approximates set membership with
         a very small error rate, in exchange for O(1) time and a very small memory
         PFADD adds an element to a HyperLogLog set. It returns 1 if your element is
         not in the set already, and 0 if it is in the set.

         PFADD user_ids uid1
         (integer) 1
         PFADD user_ids uid2
         (integer) 1
         PFADD user_ids uid1
         (integer) 0
  - Heavy real-time processing:
    - many pub-sub events, jobs, and dozens of workers to coordinate.
      Apache Kafka is preferred.

  - Instant full-text searching:
    real-time application under heavy load withºmore than ten searches going on at a timeº,
    needed features like autocomplete, ... Elasticsearch can be a better option.
Global Index Advisor
pg qualstats 2: Global index advisor

- Coming up with good index suggestion can be a complex task. 
  It requires knowledge of both apps queries and ddbb specificities.
-  pg_qualstats extension can provide pretty good index suggestion.
  Pre-setup: - install and configure PoWA.
- pg_qualstats version 2  removes the presetup.
Transparent Huge Pages (THP) problems
From  @[]
TODO: Does the same problem applies to PSQL?
Transparent Huge Pages (THP) is a Linux memory management system that reduces
the overhead of Translation Lookaside Buffer (TLB) lookups on machines with
large amounts of memory by using larger memory pages.

However, database workloads often perform poorly with THP enabled, because they
tend to have sparse rather than contiguous memory access patterns. When running
MongoDB on Linux, THP should be disabled for best performance.
scheduled backups in k8s

When I've given various talks on PostgreSQL and ask the question "do you take
regular backups of your production systems," I don't see as many hands raised
as I would like (and I'll also use this as an opportunity to say that having a
replica is not a backup). However, if you are running PostgreSQL on Kubernetes
ºusing the PostgreSQL Operatorº, with a few commands, the answer to this question
is "Yes!"
This is a postgres client that does auto-completion and syntax highlighting.
Home Page:
MySQL Equivalent:
- DALIBO is the leading PostgreSQL company in France, providing services, training and support since 2005.

- PEV2: Visualize your Explain plans
- temBoard: Monitor, optimize and configure multiple PostgreSQL instances
- Anomyzation and Data Masking for PostgreSQL
- Dramatiq-pg: PSQL Broker for Dramatiq task queue
- E-Maj: Track updates on table sets with rollback capabilities
- ldap2pg: Manage PostgreSQL roles and privileges from YAML or LDAP.
- pg_activity: "top like" application for PSQL server activity.
- Pitrery: PSQL Point-in-Time Recovery made easy
- sqlserver2pgsql: MS SQL Server to PostgreSQL converter.
BLOB cleanup 
by Hans-Jürgen Schönig

- bytea : simplest form to use binary data. up to Bº1 GB per fieldº!!.
          The binary field is seen as part of a row. Ex:
          # CREATE TABLE t_image (id int, name text,ºimage byteaº);
          # SHOW bytea_output;
            hex            ← encoding. Send the data in hex format.(vs "escape" == octal string).
            (1 row)

- BLOBs : More versatile. Ex:
          # SELECT lo_import('/etc/hosts');  ← /etc/hosts imported (copy -vs link- of data)
          80343        ← new OID (object ID) of the new entry. 
          (1 row)
          - To keep trace of new OIDs we can do something like:
          # CREATE TABLE t_file ( id int, name text, object_id oid);

          # INSERT INTO t_file VALUES
            (1, 'some_name', lo_import('/etc/hosts'))

          # DELETE FROM t_file WHERE id = 1;
          RºPROBLEMº: object id has been "forgotten".
            but it is still there.
           º'pg_largeobject'ºis theºsystem tableºin charge
            of storing the binary data inside PostgreSQL. 
            All lo_* functions "talk" to this system table
          # \x ← Expanded display is on
          test=# SELECT * FROM pg_largeobject WHERE loid = 80350;
          ─[ RECORD 1 ]──────────────────────────────────────────
          loid   | 80350       ← RºDead objectº
          pageno | 0
          data   | ##\012# Host Database\012#\012# localhost ...

          # SELECT lo_unlink(80350); BºCorrect way to clean BLOBº
          # SELECT * FROM pg_largeobject  ← Recheck
            WHERE loid = 80350;
          (0 rows)                        ← OK

   -BºAutomated cleaning of dead large objectsº
    $º$ vacuumlo -h localhost -v test          º
    $º...                                      º
    $ºSuccessfully removed 2 large objects ... º

    Other functions provided by PSQL for large objects include:
   # \df lo_*
                                   List of functions
      Schema   |   Name        | Result data type | Argument data types       | Type
    pg_catalog | lo_close      | integer          | integer                   | func
    pg_catalog | lo_creat      | oid              | integer                   | func
    pg_catalog | lo_create     | oid              | oid                       | func
    pg_catalog | lo_export     | integer          | oid, text                 | func
    pg_catalog | lo_from_bytea | oid              | oid, bytea                | func
    pg_catalog | lo_get        | bytea            | oid                       | func
    pg_catalog | lo_get        | bytea            | oid, bigint, integer      | func
    pg_catalog | lo_import     | oid              | text                      | func
    pg_catalog | lo_import     | oid              | text, oid                 | func
    pg_catalog | lo_lseek      | integer          | integer, integer, integer | func
    pg_catalog | lo_lseek64    | bigint           | integer, bigint, integer  | func
    pg_catalog | lo_open       | integer          | oid, integer              | func
    pg_catalog | lo_put        | void             | oid, bigint, bytea        | func
    pg_catalog | lo_tell       | integer          | integer                   | func
    pg_catalog | lo_tell64     | bigint           | integer                   | func
    pg_catalog | lo_truncate   | integer          | integer, integer          | func
    pg_catalog | lo_truncate64 | integer          | integer, bigint           | func
    pg_catalog | lo_unlink     | integer          | oid                       | func
   (18 rows)
    pg_catalog | loread        | bytea            | integer, integer          | func
    pg_catalog | lowrite       | integer          | integer, bytea            | func

  - BLOB interface is fully transactional: binary content and metadata cannot go 
    out of sync anymore.
Performance metrics
- Database Connection Parameters

  - If number of active connections running concurrently is high
    it can be convinient to terminate sessions in idle state slowing down the server.

  - monitoring replicas to check they are not out of sync.

- Throughput Parameters:
  - It is a composite of I/O speed, CPU speed, parallel capabilities of the machine,
    and efficiency of the OS.

- Locks Parameters:
  pg_locks displays locks granted and processes waiting for locks.

- Resource Parameters:
  - Disk and Index Usage
  - Memory Parameters
  - WAL Parameters

Items       Parameters              Table/Query 

Connections Max. number             SELECT count(*)
            of connections           FROM pg_stat_activity;
            DB Conflict count       pg_stat_database_conflicts

            commits number          xact_commit in pg_stat_database View

            sessions number         count of session from
                                    pg_stat_user_functions View

            Vacuums                 last_vacuum 
                                    autoanalyze_count from 
                                      pg_stat_all_tables View

            Checkpoints and         pg_stat_bgwriter shows 
            bgwriter statistics     metrics to flush data in 
                                    memory (buffers) to 
                                    disk. It can do this in 
                                    3 different ways

                                    buffers_checkpoint: # buffers written during 
                                    buffers_clean : # buffers written by background
                                    buffers_backend: # scheduled checkpoints performed
                                    buffers_alloc : Total  allocated buffer #

Replication    Hosts with           SELECT 
               replication delay      write_location - sent_location AS write_lag,
                                     flush_location - write_location AS flush_lag,
                                     replay_location - flush_location AS replay_lag
                                     FROM pg_stat_replication;

               Replication lag       SELECT 
               in bytes                pg_current_wal_lsn() - confirmed_flush_lsn
                                     FROM pg_replication_slots;

               Lag in seconds        SELECT 
                                        FROM (
                                          now() - pg_last_xact_replay_timestamp())
                                       ) as lag

               Checkpoints           checkpoints_req  : checkpoint # requested 
                                     checkpoints_timed: checkpoint # scheduled

               Status of physical    pg_stat_replication

               Inactive replication  SELECT count(*)
               slots                  FROM pg_replication_slots
                                      WHERE NOT active;

               Replica info          SELECT 
                                     FROM pg_stat_replication

Throughput     Sequential scans vs   seq_scan 
               index scans           seq_tup_read 
                                     idx_tup_fetch from
                                       pg_stat_all_tables View

               Top Function calls    SELECT backend_xid
                                      FROM pg_stat_activity


               running backend #     SELECT count(*)
                                      FROM pg_stat_activity;

Locks          Locks by lock mode    lock from the pg_locks view

               Deadlocks/database    Deadlocks from
                                      pg_stat_database View

               Backend waiting       SELECT count(*)
                on locks              FROM pg_stat_activity
                                      WHERE wait_event = 'Lock';

               Backend idle in       SELECT count(*)
                transactions          FROM pg_stat_activity
                                      WHERE state = 'idle in transaction';

               Session holding or   SELECT *
                awaiting each lock  FROM pg_locks pl
                                      LEFT JOIN pg_stat_activity psa
                                      ON =;

Resource       Tables with most     heap_blks_read from 
Utilization    disk usage            pg_statio_all_tables View
               Tables with most     n_live_tup from
                live rows            pg_stat_all_tables View

               Most frequent        Idx_scan from
                scanned index        pg_stat_all_tables?

               dead rows            higher number of dead rows
                                     (n_dead_tup in the
                                     pg_stat_user_tables view)

               Temp bytes           temp_bytes from 
                                      pg_stat_database View


               Active user count or pg_stat_activity view will
               current activity per have one row per server process

               DB commits           xact_commit from 
                                      pg_stat_database View

               Live and Dead        SELECT schemaname, relname, 
               tuples               n_live_tup, n_dead_tup, 
                                    FROM pg_stat_all_tables 
                                    ORDER BY 
                                      n_dead_tup/(n_live_tup * 
                                      cuum_scale_factor')::float8 + 

               Local block info       SELECT t2.rolname, 
                                      t3.datname, queryid, calls, 
                                      total_time / 1000 as 
                                      total_time_seconds, min_time 
                                      / 1000 as min_time_seconds, 
                                      max_time / 1000 as 
                                      max_time_seconds, mean_time / 
                                      1000 as mean_time_seconds, 
                                      stddev_time / 1000 as 
                                      stddev_time_seconds, rows, 
                                      blk_read_time / 1000 as 
                                      blk_write_time / 1000 as 
                                      blk_write_time_seconds FROM 
                                      pg_stat_statements t1 JOIN 
                                      pg_roles t2 ON 
                                      (t1.userid=t2.oid) JOIN 
                                      pg_database t3 ON 
                                      (t1.dbid=t3.oid) WHERE 
                                      t2.rolname != 'rdsadmin'

WAL Buffers    Shared/WAL/CLOG        SELECT current_database() 
               Checkpoint buffers       datname, schemaname, relname, 
                                        heap_blks_read, heap_blks_hit,
                                        idx_blks_read, idx_blks_hit, 
                                        toast_blks_read, toast_blks_hit,
                                        tidx_blks_read, tidx_blks_hit 
                                      FROM pg_statio_user_tables

               Database cache         blks_hit from 
               usage ratio:             pg_stat_database View
               SUM (blks_hit)/
                  SUM (blks_read).

               WAL count ready        SELECT count(*)
                to be archieved        FROM pg_ls_dir('pg_xlog/archive_status')
                                       WHERE pg_ls_dir ~ '^[0-9A-F]{24}.ready$';

               Disk Space by DDBB     SELECT 
                                         as size_bytes
                                      FROM pg_database

Bºworking python script  @ Github:º
- framework for developing PostgreSQL extensions in Rust and 
  strives to be as idiomatic and safe as possible.
- performance dashboard for Postgres