The Sandbox framework : 3. The kernel

The kernel is a database used by the system administrators to set :
• the databases and the IP address of the servers where they reside
• the name of the users and the databases they are allowed to connect to
• the tables to be mapped in function of the application

The first two pieces of information are given by the view permissions_matrix and the third one by the table sdb_tables_2_attach.

The kernel must be connected at the sandbox at login time with the instruction

select public.connect_kernel(main_server, user_name, password)

The main server is the server where the sandboxes and the kernel reside. The corporate databases can reside on other servers.

create or replace function public.connect_kernel(
	this_host_address text,
	this_user_name text,
	this_password text,
	this_port_number text default '5432'
    )
    returns boolean
    language 'plpgsql'
    volatile security definer
AS $$
declare
	sql_stmt text;
   	fdw_server_exists boolean;
begin

    sql_stmt =     'select public.is_database_connected('
                || ''''
                || 'kernel'
                || ''''
                || ');';

    execute sql_stmt into fdw_server_exists;

    if fdw_server_exists is true then
        return true;
    end if;

    sql_stmt = 	   'create server '
    		|| 'fdw_kernel'
		|| ' foreign data wrapper postgres_fdw '
		|| ' options (host '
                || ''''
                || this_host_address
                || ''''
                || ', dbname '
		|| ''''
		|| 'kernel'
                || ''''
                || ', port '
                || ''''
                || this_port_number
		|| ''''
                || ');';
                
    execute sql_stmt;
    
    
    sql_stmt = 	   'create user mapping for public '
    		|| ' server '
                || 'fdw_kernel'
                || ' options (user '
                || ''''
                || this_user_name
               	|| ''''
                || ', password '
                || ''''
                || this_password
                || ''''
                || ');';
                
    execute sql_stmt;
    
    sql_stmt =     'grant usage on foreign server fdw_kernel to '
                || this_user_name;
                
    execute sql_stmt;
     
    create foreign table permissions_matrix (
	pg_user_name text,
        pg_permission text,
    	pg_database_name text,
    	pg_server_name text,
    	pg_server_address text,
        pg_port_number text
    ) server fdw_kernel
        options (
            schema_name 'public',
            table_name 'permissions_matrix'
    );
 	

    create foreign table sdbx_tables_2_attach (
        database_type text, 
	front_end_table_name text,
	pg_table_name text, 
	pg_pk text
    ) server fdw_kernel
        options (
            schema_name 'public',
            table_name 'sdbx_tables_2_attach'
    );
    
    perform public.write2log(
           'connect_database : '  
        || ' kernel connected'
    	);
	return true;
    
    exception when others then 
  	perform public.write2log(
              'ERROR : connect_kernel : ' 
           || sql_stmt
           || ', '
           || sqlerrm
    	 );
	 return false;
    
end 
$BODY$;

The table sdbx_tables_2_attach is specific to the application I develop for my customer.

The sandbox framework : 2. A fdw table definitions server

To create a foreign table, we have to write :

create foreign_table_name (
 col1 data_type1,
 col2 data_type2,
 col3 data_type3,
 ...
 )
 server fdw_server_name
 options (schema_nameschema_name’, table_nametable_name’)

If you have a lot of tables to map with fdw, it is tedious to write for each such a statement.

Furthermore, if you modify tables on the source database, you must modify the create foreign table statements.

Our solution solves both problems.

On each database from which we want take tables, we have a view fdw_table_definitions with the create foreign table statement for all tables and views.

This string contains placeholders for variables that will be given when we map the table in the destination database :
• to_table_name
• from_schema_name
• to_schema_name

In the function where we create the fdw server (that will be explained later), we create a foreign table that maps this view.

We give to this view the name database_name_fdw_table_definitions where database_name is the name of the source database.

Each time that the application needs to map a table, a function attach_table is issued that substitutes placeholders with actual values and executes the statement create foreign table.

This function reads the view database_name_fdw_table_definitions, creates dynamically the create foreign table statement and executes it.

The query to create the fdw table definitions comes from the excellent site Postgres OnLine Journal. This query has been customised for the sandbox framework.

Let’s start with the view fdw_table_definitions :

select 
  tables.table_schema as schema_name
 ,tables.table_name
 ,get_fdw_table_definition(tables.table_schema, tables.table_name) 
                       as fdw_table_definition
from 
  information_schema.tables
where 
  tables.table_schema not in ('pg_catalog', 'information_schema');

This view calls the function get_fdw_table_definition :

create or replace function public.get_fdw_table_definition(
    this_schema_name text, 
    this_table_name text
    )
    returns text
    language 'sql'
as $body$

  with cols as (
    select
         cl.relname as table_name
        ,na.nspname as table_schema
        ,att.attname as column_name
        ,format_type(ty.oid,att.atttypmod) as column_type
        ,attnum as ordinal_position
    from
        pg_attribute as att
    join pg_type as ty
        on ty.oid = atttypid
    join pg_namespace as tn
        on tn.oid = ty.typnamespace
    join pg_class as cl
        on cl.oid = att.attrelid
    join pg_namespace as na 
        on na.oid = cl.relnamespace
    left outer join pg_type as et 
        on et.oid = ty.typelem
    left outer join pg_attrdef as def 
        on adrelid = att.attrelid and adnum = att.attnum
    where 
        cl.relkind in ('v','r')   -- take only tables and views
    and
        na.nspname = this_schema_name 
    and 
        cl.relname = this_table_name 
    and 
        att.attnum > 0
    order by
        att.attnum
)
select
     'create foreign table THIS_TO_TABLE_SCHEMA.THIS_TO_TABLE_NAME ('
  || string_agg(quote_ident(column_name)
  || ' '
  || column_type,  ', ' order by ordinal_position)
  || ')  server this_fdw_server_name options (schema_name '
  || ''''
  || 'THIS_FROM_SCHEMA_NAME'
  || ''''
  || ','
  || 'table_name '
  || ''''
  || table_name
  || ''''
  || '); '
from
  cols
group by
  table_schema, table_name

$body$;

THIS_TO_TABLE_SCHEMA, THIS_TO_TABLE_NAME and THIS_FROM_SCHEMA_NAME are the placeholders.

The sandbox framework : 1. Architecture

To meet the requirements of a customer who wants that each user of his system has its own private database connected with several corporate databases, I build a framework with the Foreign Data Wrapper protocol.

This framework is a set of functions that will be described in several posts.

Specifications

Each user has its own database called herafter the sandbox or simply the sdbx.

There are several databases on several servers that contain corporate data. Some databases have the same structure but not the same content.

The front end application is connected with the sdbx only.

The end user chooses the type of application he wants and the framework connects the necessary databases and, within those databases,  attachs the necessary tables.

All information about users (name and permissions) and databases (name and location) are stored in a database called kernel that is connected with the sdbx at the login.

Those tables are attached to the sdbx via the Foreign Data Wrapper protocol.

Usage

The framework is a set of five primitives that are called from the front_end applications.

Those functions are PL/pgSQL functions stored in the sdbx :

connect_kernel(host_name, user_name, password)

connect_database(database_name, user_name, password)

attach_table(database_name, from_table_name, to_table_name, 
                 from_schema_name, to_schema_name)

attach_schema(database_name, from_schema_name, to_schema_name)

disconnect_database(db_name)

All these functions return the boolean true if the job is done or false otherwise.

In case of error, all these functions write a record in a log table with the last SQL statement executed, the error message and a timestamp. This table log belongs to the sdbx.

Structure

The FDW protocol is used as follows :

create server fdw_server_name
 foreign data wrapper postgres_fdw
 options (host ‘host_address’, 
              port ‘port_number’, dbname ‘database_name’);

create user mapping for public server fdw_server_name
 options (user ‘user_name’, password ‘password’)

create foreign_table_name (
 col1 data_type1,
 col2 data_type2,
 col3 data_type3,
 ...
 )
 server fdw_server_name
 options (schema_name ‘schema_name’, table_name ‘table_name’)

drop server if exists fdw_server_name cascade;

Those colored strings are variables sent by the framework that receives them from the front end application and those colored strings are parameters given by the view list_fdw_definitions described below.

All those functions are build dynamically by the framework

The create foreign table is dynamically generated as follows :

Each database has a view list_fdw_definitions that contains for all tables and views the current values of :
schema_name
table_name
create foreign table statement

This view is mapped in the sdbx when the connect_database is issued with the name database_name_list_fdw_definitions.

The create foreign table has placeholders that are substituted by actual values by the framework :
to_table_name
from_schema_name
to_schema_name

The kernel

The kernel is connected to the sdbx at login time but only the management has write access to the data of the kernel.

At this stage, the kernel contains two tables :

• the view permission_matrix that gives for each user the list of databases he can connect and for each database, the IP host address.

• the table sdbx_tables_2_attach that gives for each type of application the tables that must be mapped in the sdbx and linked to the front end.

Notes

Each database name on the network is unique. The framework can be enhanced by defining the databases with the triplet (host_name, port_number, database_name) but it is a good practice to keep distinct names for distinct databases.

When a fdw server is dropped, all dependant objects are dropped. The sdbx is thus cleaned of fdw stuff.

 

Alter a column data type in all tables

We must change the timestamp format for all columns of type timestamp.

The format is like that :

2017-06-14 17:47:23.295343

and it must be like that :

2017-06-14 17:47:23

The SQL command to be done is :

alter table my_table 
   alter column my_timestamp_column type timestamp(0);

And we embed this command into a loop that takes relevant information from the catalog :

create or replace function public.alter_all_timestamps() 
 returns void as $$
declare
  r       record;
  stmt    varchar;
begin
  for r in 
    select 
        c.table_name
       ,c.column_name
    from 
        information_schema.columns as c
    inner join 
        information_schema.tables as t 
    on 
        c.table_name = t.table_name
    where 
        c.table_schema = 'public'
    and 
        c.data_type = 'timestamp without time zone'
    and 
        t.table_type = 'BASE TABLE'
  loop
    
        stmt = 'alter table ' 
                || quote_ident(r.table_name )
                || ' alter column '
                || quote_ident(r.column_name)
                || ' type timestamp(0)';
                
        execute stmt;

  end loop;

end $$ language 'plpgsql' volatile;

That’s simple !

If there are views on those tables that select a column of type timestamp, those views must be dropped before the run of the function and recreated after.

 

Calculated columns with a trigger

 

Some say that calculated columns is bad design because it does not follow the relational model. It is true but performance can be greatly enhanced. When a function is applied in a query on a indexed column, the query optimizer will not use the index. It is thus sometimes better to have an index on the calculated column because queries from the front end applications will run faster.

In this simple example, we have a table with 3 columns. The first and the second are given by the user and the third is calculated by a trigger. We add a condition : x * 0 = 0 * x = x (This fantasy comes from the requirements of a customer).  It is done with the case expression.

Some say that calculated columns is bad design because it does not follow the relational model. It is true but performance can be greatly enhanced.

The condition : x * 0 = 0 * x = x is implemented with a case construct.

drop table if exists tup;

create table tup (
     col1   numeric(10,2)
    ,col2   numeric(10,2)
    ,col3   numeric(10,2)
);

insert into tup(col1,col2) values
     (2.87      ,3.77)
    ,(4         ,5.11)
    ,(2.12      ,0)
    ,(0.0       ,3);

update tup
    set col3 =     (case col1 when 0 then 1 else col1 end)
                *  (case col2 when 0 then 1 else col2 end);

select * from tup;

will give

col1     col2   col3
---------------------
2.87     3.77   10.82
4.00     5.11   20.44
2.12     0.00    2.12
0.00     3.00    3.00

Let now add the trigger and the trigger function :

create or replace function calculate_columns() returns trigger as $$
begin

  new.col3 =   (case new.col1 when 0 then 1 else new.col1 end)
             * (case new.col2 when 0 then 1 else new.col2 end);

  return new;

end $$ language plpgsql;

create trigger calculated_columns  
    before insert or update 
    on tup
    for each row
    execute procedure calculate_columns();

Note that we do a before insert or update trigger and that we update the columns in the row new.

insert into tup(col1, col2) values
     (6.23      ,2)
    ,(0         ,55.11);

select * from tup;

will give :

col1     col2   col3
---------------------
2.87     3.77   10.82
4.00     5.11   20.44
2.12     0.00    2.12
0.00     3.00    3.00
6.23     2.00   12.46
0.00    55.11   55.11

The deal is in the bag!

To list all the pk’s and fk’s

The system catalogs of PostgreSQL are very well designed. They are not easy to understand at first glance but they allow to write short and efficient queries.

As usual, if the data model is sound, the queries are simple and natural.

Just an example : the comma separated list of columns for composite keys is automatically created in this query :

select
     conrelid::regclass as table_name
    ,conname
    ,pg_get_constraintdef(c.oid)
from
    pg_constraint c
join
    pg_namespace n on n.oid = c.connamespace
where
    contype in ('f', 'p')
and
    n.nspname = 'public'
order by
    conrelid::regclass::text
   ,contype desc;

The output is like that :

...
"SLM_TYPE"           SLM_TYPE_pkey                   PRIMARY KEY ("SLM_TYPE_NR")
"SLM_TYPE_CONTACT"   SLM_TYPE_CONTACT_CONTACT_fkey   FOREIGN KEY ("CONTACT") 
                                                        REFERENCES "CONTACTS"("CONTACT_NR") 
                                                        ON DELETE CASCADE
...

By the way, it is always better to use serials for pk’s and fk’s. But I am working with database I migrate from Access to PostgreSQL. I would like to write a script that will add serials for those pk’s and fk’s and that will transform the initial pk’s and fk’s to unique constraints. Not so easy…

To list all the indexes

When we create a pk in PostgreSQL, an index named table_name_pkey is automatically created.

The following script gives all indexes, those created by the pk’s and the others.

Don’t forget that PostgreSQL does not create automatically an index when you create a fk. You have to do it yourself !

By the way, this script works for composite indexes thanks to the handy function array_to_string that creates of comma separated list of the column names. It works because the two first columns in the select clause are given in the order by.

select 
     t.relname as table_name
    ,i.relname as index_name
    ,array_to_string(array_agg(a.attname), ', ') as column_names
from
     pg_class      t
    ,pg_class      i
    ,pg_index      ix
    ,pg_attribute  a
where
    t.oid = ix.indrelid
and 
    i.oid = ix.indexrelid
and 
    a.attrelid = t.oid
and 
    a.attnum = any(ix.indkey)
and 
    t.relkind = 'r' -- takes regular tables
and 
    t.relname not like 'pg_%' -- excludes system catalogs
group by
    t.relname
   ,i.relname
order by
    t.relname
   ,i.relname;