Intro
Ext. Links 
Manuals
PostgreSQL internals (Hironobu SUZUKI)
ENV.VARS
PGPORT
PGUSER  (alt. -U option)
Show ddbbs, tables,
schemas, permissions,..
mydb=˃ SELECT version();
mydb=˃ SELECT current_date;
psql commands:
\?
\h
\l            list databases
\c db_name    connect to ddbb
\q            quit
\dt           show all tables in ddbb
\dt *.*       show all tables in all schemas
\d  table     show table schema
\d+ table                      
\du           list roles (users) and their attributes
SELECT current_user;        show current user
SELECT current_database();  show current database

SELECT rolname       list users
   FROM pg_roles;
Instance "layout"
 PostgreSQL instance:  Server 1←→1 DDBB Cluster  1←→N Catalog(Database) 1←→N Schemas
  ├─ Databases
  │  │ 
  │  ├─ postgres                │ SCHEMA COMMANDS
  │  │   │                      │ ºlist schemasº
  │  │   ├─ Casts               │ \dn
  │  │   │                      │ SELECT schema_name FROM information_schema.schemata;
  │  │   ├─ Catalogs *1         │ SELECT nspname     FROM pg_catalog.pg_namespace;
  │  │   │                      │ 
  │  │   ├─ Event Triggers      │ ºcreate schemaº
  │  │   │                      │ @[http://www.postgresql.org/docs/current/static/sql-createschema.html]
  │  │   ├─ Extensions          │ CREATE SCHEMA IF NOT EXISTS schema_name;
  │  │   │                      │ 
  │  │   ├─ Foreign Data Wrap   │ ºdrop schemaº
  │  │   │                      │ @[http://www.postgresql.org/docs/current/static/sql-dropschema.html]
  │  │   ├─ Languages           │ DROP SCHEMA IF EXISTS schema_name CASCADE;
  │  │   │
  │  │   └─ Schemas
  │  │
  │  ├─ myDDBB01
  │
  ├─ Login/Group @[https://www.postgresql.org/docs/10/static/user-manag.html]
  │
  └─ Tablespaces

Difference between catalog and schema
@[https://stackoverflow.com/questions/7022755/whats-the-difference-between-a-catalog-and-a-schema-in-a-relational-database]
*1 So in both Postgres and the SQL Standard we have this containment hierarchy:
 - A computer may have one cluster or multiple.
 - A database server is a cluster.
 - A cluster has catalogs. ( Catalog = Database )
 - Catalogs have schemas. (Schema = namespace of tables, and security boundary)
 - Schemas have tables.
 - Tables have rows.
 - Rows have values, defined by columns.
 - Those values are the business data your apps and users care about, such as
   a person's name, an invoice due date, a product price, or a gamer's high score.
   The column defines the data type of the values (text, date, number, and so on).
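 Inspecting that hierarchy from psql (a minimal sketch; the WHERE filter just hides system schemas):
   SELECT current_catalog;        -- catalog == current database
   SELECT table_schema, table_name
     FROM information_schema.tables
    WHERE table_schema NOT IN ('pg_catalog', 'information_schema');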
Create DB Cluster (initdb)
Init storage area 
(data directory in FS terms) 
Server 1 → N Databases 
OS admin → shell:º$ sudo su postgres º
                 º$ initdb -D /usr/local/pgsql/dataº
                  alternatively
                  º# pg_ctl -D /usr/local/pgsql/data initdbº
                   ('postgres', 'template0' and 'template1' ddbbs will be created inside)

                  This is the database/catalog cluster:
                  Collection of DDBB managed by a single instance

  Starting the Server
OS admin → shell: º# su postgres -c 'postgres -D /usr/local/pgsql/data 1˃/var/log/postgresql/logfile 2˃⅋1 ⅋'º
                   alternatively (using pg_ctl "easy" wrapper)
                    º# su postgres -c 'pg_ctl start -D /usr/local/pgsql/data -l /var/log/postgresql/logfile'º
                   alternatively (PostgreSQL systemd enabled)
                    º$ sudo systemctl enable postgresql.serviceº
                    º$ sudo systemctl start postgresql.serviceº
                    º$ sudo journalctl --unit postgresql.serviceº

CRUD Users
$ sudo su postgres # change to postgres user
$ psql
#- CREATE USER my_user_login WITH PASSWORD 'my_user_password'; -- (no IF NOT EXISTS for CREATE USER)
#- ALTER USER my_user_login WITH PASSWORD 'my_new_password';
#- DROP USER IF EXISTS my_user_login;
#- \c my_db_name   -- switch to the given ddbb
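-- PostgreSQL has no IF NOT EXISTS for CREATE USER/ROLE. A minimal idempotent
-- alternative (sketch) using a DO block ('my_user_login' is a placeholder):
DO $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = 'my_user_login') THEN
    CREATE ROLE my_user_login LOGIN PASSWORD 'my_user_password';
  END IF;
END $$;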

Granting privileges to users
GRANT ALL PRIVILEGES ON table_name TO my_user_name;

-- grant all permissions to ddbb
GRANT ALL PRIVILEGES ON DATABASE my_db_name TO my_user_name;

-- grant connection permissions on database
GRANT CONNECT ON DATABASE my_db_name TO my_user_name;

-- grant permissions on schema
GRANT USAGE ON SCHEMA public TO my_user_name;

-- grant permissions to functions
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO my_user_name;

-- grant select, update, insert, delete on all tables in schema public
GRANT SELECT, UPDATE, INSERT, DELETE ON ALL TABLES IN SCHEMA public TO my_user_name;

-- grant permissions on a single table
GRANT SELECT, UPDATE, INSERT ON table_name TO my_user_name;

-- grant SELECT on all tables in schema public
GRANT SELECT ON ALL TABLES IN SCHEMA public TO my_user_name;
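
-- NOTE: GRANT ... ON ALL TABLES only affects tables that already exist. For tables
-- created later, default privileges are usually set too (sketch, same placeholders):
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT, UPDATE, INSERT, DELETE ON TABLES TO my_user_name;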

Create new DDBB
"admin" → PostgreSQL: create user account
("admin" ussually match postgres OS user)
...
user → shell: $ createdb mydb

Many tools assume the ddbb name == the (-U / PGUSER) username by default
Drop existing DDBB:
user → shell: $ dropdb mydb   # WARN: can NOT be undone

Console access:
user → shell: $ psql mydb
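
Equivalent SQL from an existing (superuser) psql session; names are placeholders:
#- CREATE DATABASE mydb OWNER my_user_name;
#- DROP DATABASE IF EXISTS mydb;   -- like dropdb: can NOT be undone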

CREATE TABLE IF NOT EXISTS example
-- DO $$
-- BEGIN
--   EXECUTE 'ALTER DATABASE ' || current_database() || ' SET TIMEZONE TO UTC';
-- END; $$;

CREATE TABLE IF NOT EXISTS devices (
  ID         VARCHAR(40)                NOT NULL CONSTRAINT DEVICE_PK PRIMARY KEY,
  NAME       VARCHAR(255)               NOT NULL,
  CREATED_AT TIMESTAMP    DEFAULT NOW() NOT NULL,
  UPDATED_AT TIMESTAMP    DEFAULT NOW() NOT NULL,
  OWNER_ID   VARCHAR(40)                NOT NULL,
  -- PUB_KEY NOT YET USED. Could be used to cipher messages, deciphered with the device private key
  PUB_KEY    VARCHAR(88)                NOT NULL UNIQUE,
  PARENT_ID  VARCHAR(40)                NULL CONSTRAINT DEVICE_FK3 REFERENCES devices (ID)
);
CREATE INDEX IF NOT EXISTS DEVICE_PUB_KEY_IDX ON devices   (PUB_KEY);
CREATE INDEX IF NOT EXISTS OWNER_IDX          ON devices   (OWNER_ID);

-- (Alt) ALTER TABLE tableName ADD PRIMARY KEY (id);
-- (Alt) CREATE UNIQUE INDEX indexName ON tableName (columnNames);
Client Authentication
@[https://www.postgresql.org/docs/10/static/client-authentication.html]
pg_hba.conf
(AAA+Encryption)
@[https://www.postgresql.org/docs/devel/static/auth-pg-hba-conf.html]
- the default client authentication setup allows any local user
   to connect to the database and even become the database superuser.
   If you do not trust them:
   1 - use one of the initdb -W/--pwprompt or --pwfile options
       to assign a password to the database superuser.
   2 - also, specify -A md5 or -A password so that the
       default trust authentication mode is not used;
       or modify the generated ºpg_hba.confº file after running initdb,
       but before you start the server for the first time.

- TLS
   allows both the client and server to provide SSL certificates to each other. 

- Encryp. Options 
  - The pg_hba.conf file allows administrators to specify which hosts can use
    non-encrypted connections (host) and which require SSL-encrypted 
    connections (hostssl). Also, clients can specify that they connect to 
    servers only via SSL.
  
  - The pgcrypto module allows certain fields to be stored encrypted. This 
    is useful if only some of the data is sensitive. The client supplies the 
    decryption key and the data is decrypted on the server and then sent to the client.
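
  A minimal ºpg_hba.confº sketch combining the host/hostssl idea above
  (addresses and auth methods are illustrative, not a recommendation):
  # TYPE   DATABASE  USER      ADDRESS         METHOD
  local    all       postgres                  peer
  host     all       all       127.0.0.1/32    md5
  hostssl  all       all       0.0.0.0/0       scram-sha-256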
Backup/Restore
ºBACKUPº                                         ºRESTOREº
# backup ddbb (plain SQL dump)                   # plain SQL dumps are restored with psql:
$ pg_dump ${dbName} ˃ dbName.sql                 $ psql -d ddbb_name -f dbName.sql
# backup ddbb, only data                         # custom-format dumps (pg_dump -Fc) use pg_restore:
$ pg_dump --data-only ${dbName} ˃ dbName.sql     $ pg_restore -d ddbb_name               backup.dump
# backup ddbb, only schema                       $ pg_restore -d ddbb_name --data-only   backup.dump
$ pg_dump --schema-only ${dbName} ˃ dbName.sql   $ pg_restore -d ddbb_name --schema-only backup.dump

# backup all ddbb (incl. roles and tablespaces)
$ pg_dumpall ˃ pgbackup.sql
  EXPORT/Import (COPY) file
\copy table_name            TO   '/home/user/weather.csv' CSV
\copy table_name(col1,col2) TO   '/home/user/weather.csv' CSV

\copy table_name            FROM '/home/user/weather.csv' CSV
\copy table_name(col1,col2) FROM   '/home/user/weather.csv' CSV
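
Server-side COPY variant with a header row (runs as the server process;
the path '/tmp/weather.csv' is illustrative):
COPY table_name TO   '/tmp/weather.csv' WITH (FORMAT csv, HEADER);
COPY table_name FROM '/tmp/weather.csv' WITH (FORMAT csv, HEADER);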
Table Maintenance
-- VACUUM
VACUUM ANALYZE table_name;

-- Reindex a database, table or index
REINDEX DATABASE dbName;
REINDEX TABLE    tableName;
REINDEX INDEX    indexName;

-- Show query plan
EXPLAIN SELECT * FROM table_name;
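
-- A common variant that also executes the query, reporting actual timings
-- and buffer usage (a sketch):
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM table_name;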
  Rotate logs
postgresql.conf
- Config Settings for logs, buffers, ...
  - Re-read changes with ºSIGHUPº signal or $ºpg_ctl reloadº
pg_file_settings
@[https://www.postgresql.org/docs/10/static/config-setting.html]
- view exposing the current contents of the server's configuration files
- can be used to debug or pre-test configuration changes before reloading
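  Quick check for config-file entries that would not apply cleanly
  (a sketch using the pg_file_settings view, PostgreSQL 9.5+):
  SELECT sourcefile, sourceline, name, setting, applied, error
    FROM pg_file_settings
   WHERE error IS NOT NULL OR NOT applied;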
pgbadger
- @[https://www.dalibo.org/_media/pgconf.eu.2013.conferences-pgbadger_v.4.pdf]
- log analyzer with detailed reports from PSQL log files
  (syslog, stderr or csvlog) with in-browser zoomable graphs
- designed to parse huge log files as well as gzip-compressed files
Upgrading the server
- @[https://www.postgresql.org/docs/10/static/upgrading.html"]
  (pg_dumpall, pg_upgrade, replication)
Tuning OS
(Shared Memory/Semaphores/...):
- Show all runtime parameters: 
 º#- SHOW ALL;º
Performance Optimization
- Enable autovacuum. The working memory for autovacuum should be no more than 2% of
  the total available memory.
- Enable database caching with an effective cache size between 6% and 8% of the total
  available memory.
- To increase performance, the working memory should be at least 10% of the total
  available memory.
  Set SERVER_ENCODING, LC_COLLATE and LC_CTYPE as:
    server_encoding = UTF8
    lc_collate = en_US.UTF-8
    lc_ctype = en_US.UTF-8
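  Current values can be checked from psql (standard parameter names; a quick sketch):
    SHOW autovacuum;
    SHOW work_mem;
    SHOW effective_cache_size;
    SHOW server_encoding;
    SHOW lc_collate;
    SHOW lc_ctype;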
PSQL vs MS SQL Server comparative
(for data analytics)
  - A data analytics platform which cannot handle CSV robustly is a broken, useless liability
  - PostgreSQL's CSV support is top notch. The COPY TO and COPY FROM commands support the spec outlined in
     RFC4180 (which is the closest thing there is to an official CSV standard) as well as a multitude of common
    and not-so-common variants and dialects. These commands are fast and robust. When an error occurs, they give
    helpful error messages. Importantly, they will not silently corrupt, misunderstand or alter data. If PostgreSQL 
    says your import worked, then it worked properly. The slightest whiff of a problem and it abandons the import
    and throws a helpful error message.
  
  - PostgreSQL supports DROP TABLE IF EXISTS; MS SQL Server needs a workaround:
PostgreSQL                     | MS SQL Server:
DROP TABLE IF EXISTS my_table; | 
                               | IF OBJECT_ID (N'dbo.my_table', N'U') IS NOT NULL
                               | DROP TABLE dbo.my_table;
  
  - PostgreSQL supports DROP SCHEMA CASCADE, which drops a schema and all the database
    objects inside it. This is very, very important for a robust analytics delivery
    methodology, where tear-down-and-rebuild is the underlying principle of repeatable,
    auditable, collaborative analytics work.
  - Creating a table from the results of a query:
PostgreSQL                  |  MS SQL Server
CREATE TABLE good_films AS  |  SELECT
SELECT                      |    *
  *                         |  INTO
FROM                        |    good_films
  all_films                 |  FROM
WHERE                       |    all_films
  imdb_rating ˃= 8;         |  WHERE
                            |    imdb_rating ˃= 8;
  
  - In PostgreSQL, you can execute as many SQL statements as you like in one 
  batch; as long as you've ended each statement with a semicolon, you can 
  execute whatever combination of statements you like. For executing 
  automated batch processes or repeatable data builds or output tasks, this 
  is critically important functionality.
  - PostgreSQL supports the RETURNING clause, allowing UPDATE, INSERT and 
  DELETE statements to return values from affected rows. This is elegant and 
  useful. MS SQL Server has the OUTPUT clause, which requires a separate 
  table variable definition to function. This is clunky and inconvenient and 
  forces a programmer to create and maintain unnecessary boilerplate code.
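  A minimal RETURNING sketch, reusing the devices table defined earlier (values are illustrative):
INSERT INTO devices (id, name, owner_id, pub_key)
VALUES ('dev-01', 'thermostat', 'user-01', 'MFkw...')
RETURNING id, created_at;

UPDATE devices SET updated_at = now()
 WHERE owner_id = 'user-01'
RETURNING id, updated_at;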
  - PostgreSQL supports $$ string quoting, like so:
SELECT $$Hello, World$$ AS greeting;
    This is extremely useful for generating dynamic SQL because (a) it allows 
  the user to avoid tedious and unreliable manual quoting and escaping when 
  literal strings are nested and (b) since text editors and IDEs tend not to 
  recognise $$ as a string delimiter, syntax highlighting remains functional
  even in dynamic SQL code.
  - PostgreSQL lets you use procedural languages simply by submitting code to 
  the database engine; you write procedural code in Python or Perl or R or 
  JavaScript or any of the other supported languages (see below) right next 
  to your SQL, in the same script. This is convenient, quick, maintainable, 
  easy to review, easy to reuse and so on.
  - "Pure" declarative SQL is good at what it was designed for – relational 
  data manipulation and querying. You quickly reach its limits if you try to 
  use it for more involved analytical processes, such as complex interest 
  calculations, time series analysis and general algorithm design. SQL 
  database providers know this, so almost all SQL databases implement some 
  kind of procedural language. This allows a database user to write
  imperative-style code for more complex or fiddly tasks.
  - PostgreSQL's procedural language support is exceptional:
    - PL/PGSQL: this is PostgreSQL's native procedural language. It's like Oracle's PL/SQL, but more modern and feature-complete.
    - PL/V8: the V8 JavaScript engine from Google Chrome is available in PostgreSQL.
      Even better, PL/V8 supports global (i.e. cross-function call) state, 
      allowing the user to selectively cache data in RAM for fast random access.
      Suppose you need to use 100,000 rows of data from table A on each of 1,000,000 
      rows of data from table B. In traditional SQL, you either need to join 
      these tables (resulting in a 100bn row intermediate table, which will 
      kill any but the most immense server) or do something akin to a scalar 
      subquery (or, worse, cursor-based nested loops), resulting in crippling 
      I/O load if the query planner doesn't read your intentions properly. 
     º In PL/V8 you simply cache table A in memory and run a function on each        º
     ºof the rows of table B – in effect giving you RAM-quality access (             º
     ºnegligible latency and random access penalty; no non-volatile I/O load)        º
     ºto the 100k-row table. I did this on a real piece of work recently – my        º
     ºPostgreSQL/PLV8 code was about 80 times faster than the MS T-SQL solution      º
     ºand the code was much smaller and more maintainable. Because it took about     º
     º23 seconds instead of half an hour to run, I was able to run 20 run-test-modifyº
     ºcycles in an hour, resulting in feature-complete, properly tested, bug-free    º
     ºcode.                                                                          º
      (All those run-test-modify cycles were only possible because of DROP SCHEMA CASCADE
       and freedom to execute CREATE FUNCTION statements in the middle of a statement
       batch, as explained above. See how nicely it all fits together?)
    - PL/Python: Fancy running an SVM from scikit-learn or some arbitrary-precision
      arithmetic provided by gmpy2 in the middle of a SQL query? No problem!
      (see the sketch after this list)
    - PL/R
    - C: doesn't quite belong in this list because you have to compile it 
      separately, but it's worth a mention.  In PostgreSQL it is trivially easy 
      to create functions which execute compiled, optimised C (or C++ or assembler)
      in the database backend.
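    A minimal PL/Python sketch of the point above (assumes the plpython3u
    extension is available; 'pymax' is an illustrative name):
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION pymax(a INTEGER, b INTEGER) RETURNS INTEGER AS $$
  return max(a, b)            # plain Python body; arguments arrive by name
$$ LANGUAGE plpython3u;

SELECT pymax(3, 7);           -- 7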
  - In PostgreSQL, custom aggregates are convenient and simple to use, resulting in fast problem-solving and maintainable code:
CREATE FUNCTION interest_sfunc(state JSON, movement FLOAT, rate FLOAT, dt DATE) RETURNS JSON AS
$$
state.balance += movement;  //payments into/withdrawals from account
if (1 === dt.getUTCDate()) //compound interest on 1st of every month
{
  state.balance += state.accrual;
  state.accrual = 0;
}
state.accrual += state.balance * rate;
return state;
$$ LANGUAGE plv8;

CREATE AGGREGATE interest(FLOAT, FLOAT, DATE)
(
  SFUNC=interest_sfunc,
  STYPE=JSON,
  INITCOND='{"balance": 0, "accrual": 0}'
);

--assume accounts table has customer ID, date, interest rate and account movement for each day
CREATE TABLE cust_balances AS
SELECT
  cust_id,
  (interest(movement, rate, dt ORDER BY dt)-˃˃'balance')::FLOAT AS balance
FROM
  accounts
GROUP BY
  cust_id;
Elegant, eh? A custom aggregate is specified in terms of an internal state 
  and a way to modify that state when we push new values into the aggregate 
  function. In this case we start each customer off with zero balance and no 
  interest accrued, and on each day we accrue interest appropriately and 
  account for payments and withdrawals. We compound the interest on the 1st 
  of every month. Notice that the aggregate accepts an ORDER BY clause (since
  , unlike SUM, MAX and MIN, this aggregate is order-dependent) and 
  PostgreSQL provides operators for extracting values from JSON objects. So, 
  in 28 lines of code we've created the framework for monthly compounding 
  interest on bank accounts and used it to calculate final balances. If 
  features are to be added to the methodology (e.g. interest rate 
  modifications depending on debit/credit balance, detection of exceptional 
  circumstances), it's all right there in the transition function and is 
  written in an appropriate language for implementing complex logic. (Tragic 
  side-note: I have seen large organisations spend tens of thousands of 
  pounds over weeks of work trying to achieve the same thing using poorer tools.)
- Date/Time
    - PostgreSQL: you get DATE, TIME, TIMESTAMP and TIMESTAMP WITH TIME ZONE, 
  all of which do exactly what you would expect. They also have fantastic 
  range and precision, supporting microsecond resolution from the 5th 
  millennium BC to almost 300 millennia in the future. They accept input in a 
  wide variety of formats and the last one has full support for time zones
    - They can be converted to and from Unix time, which is very important 
      for interoperability with other systems.
    - They also support the INTERVAL type, which is so useful it has its own 
      section right after this one.
        SELECT to_char('2001-02-03'::DATE, 'FMDay DD Mon YYYY');  --this produces the string "Saturday 03 Feb 2001"
        and, going in the other direction,
        SELECT to_timestamp('Saturday 03 Feb 2001', 'FMDay DD Mon YYYY');  --this produces the timestamp value 2001-02-03 00:00:00+00
    - PostgreSQL: the INTERVAL type represents a period of time, such as "30 
  microseconds" or "50 years". It can also be negative, which may seem 
  counterintuitive until you remember that the word "ago" exists. PostgreSQL 
  also knows about "ago", in fact, and will accept strings like '1 day ago' 
  as interval values (this will be internally represented as an interval of
  -1 days). Interval values let you do intuitive date arithmetic and store
  time durations as first-class data values. They work exactly as you expect 
  and can be freely cast and converted to and from anything which makes sense.
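  A few INTERVAL sketches (results shown as comments):
SELECT INTERVAL '1 day ago';                                  -- -1 days
SELECT DATE '2001-02-03' + INTERVAL '1 month';                -- 2001-03-03 00:00:00
SELECT age(TIMESTAMP '2001-02-03', TIMESTAMP '2000-01-01');   -- 1 year 1 mon 2 days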
  - PostgreSQL arrays are supported as a first-class data type
    - meaning fields in tables, variables in PL/PGSQL, parameters to functions
      and so on can be arrays. Arrays can contain any data type you like, 
      including other arrays. This is very, very useful. Here are some of the 
      things you can do with arrays:
            
    - Store the results of function calls with arbitrarily-many return values, such as regex matches
    - Represent a string as integer word IDs, for use in fast text matching algorithms
    - Aggregation of multiple data values across groups, for efficient cross-tabulation
    - Perform row operations using multiple data values without the expense of a join
    - Accurately and semantically represent array data from other applications in your tool stack
    - Feed array data to other applications in your tool stack
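    Two small array sketches (devices table from earlier; unnest is the reverse of array_agg):
SELECT owner_id, array_agg(name) AS device_names   -- one row per owner, names packed into an array
  FROM devices
 GROUP BY owner_id;

SELECT unnest(ARRAY[1, 2, 3]) AS n;                 -- expands an array back into rows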
  - PostgreSQL: full support for JSON, including a large set of utility functions for
     transforming between JSON types and tables (in both directions)
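     A few of those utility functions in action (a sketch):
SELECT row_to_json(d) FROM devices d;                             -- table row    → JSON
SELECT json_extract_path_text('{"a":{"b":2}}'::json, 'a', 'b');   -- nested value → '2'
SELECT * FROM json_each('{"a":1,"b":2}'::json);                   -- JSON object  → key/value rows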
  - PostgreSQL: HSTORE is a PostgreSQL extension which implements a fast key-value store as a data type.
     Like arrays, this is very useful because virtually every high-level programming language has such
     a concept (associative arrays, dicts, std::map ...)
     There are also some fun unexpected uses of such a data type. A colleague recently asked me if there was a good way to deduplicate a text array. Here's what I came up with:
        SELECT akeys(hstore(my_array, my_array)) FROM my_table;
     i.e. put the array into both the keys and values of an HSTORE, forcing a 
     dedupe to take place (since key values are unique) then retrieve the keys 
     from the HSTORE. There's that PostgreSQL versatility again.
  
  - PostgreSQL:range types.
    Every database programmer has seen fields called start_date and end_date, 
    and most of them have had to implement logic to detect overlaps. Some have even found, the hard way,
    that joins to ranges using BETWEEN can go horribly wrong, for a number of reasons.
    PostgreSQL's approach is to treat time ranges as first-class data types. Not only can you put a
    range of time (or INTs or NUMERICs or whatever) into a single data value, you can use a host of
    built-in operators to manipulate and query ranges safely and quickly. You can even apply specially-developed
    indices to them to massively accelerate queries that use these operators. 
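    A small range sketch using the intersection operator (dates are illustrative):
SELECT daterange('2001-01-01','2001-03-01') * daterange('2001-02-01','2001-04-01');
       -- [2001-02-01,2001-03-01)   (the overlapping period, itself a range)
SELECT isempty(int4range(10,20) * int4range(30,40));   -- true: the ranges do not overlap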
  - PostgreSQL: NUMERIC (and DECIMAL - they're synonyms) is near-as-dammit arbitrary
    precision: it supports 131,072 digits before the decimal point and 16,383 digits after the decimal point.
  - PostgreSQL: XML/Xpath querying is supported
  - PostgreSQL's logs, by default, are all in one place. By changing a couple of settings in a text file,
  you can get it to log to CSV (and since we're talking about PostgreSQL, it's proper CSV, not broken CSV).
  You can easily set the logging level anywhere from "don't bother logging anything" to "full profiling and debugging output".
  The documentation even contains DDL for a table into which the CSV-format logs can be conveniently imported.
   You can also log to stderr or the system log or to the Windows event log (provided you're running PostgreSQL in Windows, of course).  
  The logs themselves are human-readable and machine-readable and contain data likely to be of great value to a sysadmin.
  Who logged in and out, at what times, and from where? Which queries are being run and by whom?
  How long are they taking? How many queries are submitted in each batch? Because the data is well-formatted CSV,
  it is trivially easy to visualise or analyse it in R or PostgreSQL itself or Python's matplotlib or whatever you like.
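  The "couple of settings" are ordinary postgresql.conf parameters; an illustrative
  sketch (values are examples, not a recommendation):
logging_collector          = on
log_destination            = 'csvlog'
log_directory              = 'pg_log'
log_min_duration_statement = 250        # log statements slower than 250 ms
log_connections            = on
log_disconnections         = on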
  - PostgreSQL comes with a set of extensions called contrib modules. These are libraries of functions,
    types and utilities for doing certain useful things which don't quite fall 
    into the core feature set of the server. There are libraries for fuzzy 
    string matching, fast integer array handling, external database connectivity,
    cryptography, UUID generation, tree data types and loads, loads more. A few
    of the modules don't even do anything except provide templates to allow 
    developers and advanced users to develop their own extensions and custom functionality.
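    Contrib modules are enabled per database with CREATE EXTENSION; a sketch
    with two of the modules mentioned above:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

SELECT crypt('secret', gen_salt('bf'));   -- pgcrypto: salted blowfish hash
SELECT uuid_generate_v4();                -- uuid-ossp: random UUID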
Non classified
Liquibase
@[http://www.liquibase.org/]
Source control for the DDBB schema.
- Supports code branching and merging
- Supports multiple developers
- Supports multiple database types
- Supports XML, YAML, JSON and SQL formats
- Supports context-dependent logic
- Cluster-safe database upgrades
- Generate Database change documentation
- Generate Database "diffs"
- Run through your build process, embedded in your application or on demand
- Automatically generate SQL scripts for DBA code review
- Does not require a live database connection
AWS Serverless PostgreSQL
@[https://aws.amazon.com/blogs/aws/amazon-aurora-postgresql-serverless-now-generally-available/]

- "serverless" relational database service (RDS) in AWS Aurora.
- Automatically starts, scales, and shuts down database capacity 
- per-second billing for applications with less predictable usage patterns.

- It's a different implementation from the standard versions of these open-source databases.

From the RDS console:
- select the Amazon Aurora database engine with PostgreSQL compatibility
- set a new DB cluster identifier and credentials
- set capacity: 
  - minimum and maximum capacity units for the database, in terms of Aurora Capacity Units (ACUs)
    – a combination of processing and memory capacity. Besides defining the ACUs, users can also
    decide whether compute capacity should pause after a certain amount of idle time.

ºhow capacity settings will work once the database is availableº
- client apps transparently connect to a proxy fleet
  that routes the workload to a pool of resources that 
  are automatically scaled.
- Scaling is very fast because resources are "warm" and
  ready to be added to serve your requests.

- Minimum storage: 10GB, will automatically grow up to 64 TB
 (based on the database usage) in 10GB increments 
 ºwith no impact to database performanceº

ºpricing modelsº
- On-Demand Instance Pricing: pay by hour, no long-term commitments
- Reserved  Instance Pricing: steady-state database workloads 

ºCostº
@[https://aws.amazon.com/rds/aurora/pricing/]

Reactive java Client
@[https://github.com/vietj/reactive-pg-client]
@[https://github.com/eclipse-vertx/vertx-sql-client/tree/3.8/vertx-pg-client]

High performance reactive PostgreSQL client written in Java 
(by Julien Viet, core developer of Vert.x and author of the CRaSH Java shell)