Adjancency

We begin with a description of recursive CTEs, a SQL pattern  that will be used afterwards.

What is a CTE ?
A CTE (Common table expression) is a view local to a query : a substitute for a view when the general use of a view is not required. There is no definition stored in metadata.

The structure of CTE is :

with name_of_CTE as (
 Body of CTE
)
SQL statement using the CTE (called the outside query)

An example with the classical table employees  mentioning salary and department : in the body of CTE, we select the highest salary of the department and in the calling query, we select all the columns of those employees.

with find_max_salary(max_salary) as (
 select department, max(salary) as max_salary
 from employees
 group by department
)
select e.*
from employees e
inner join find_max_salary f on e.department = f.department
where e.salary = f.max_salary;

What is recursion ?
Recursion is a method where the solution to a problem depends on solutions to smaller instances of the same problem.
An example :

Factorial(n) = 1                     if n = 1
             = n * Factorial(n - 1)  if n > 1

In the definition of Factorial, we find Factorial again but with a smaller argument. The first value (for n = 1) is called the base case.

Let’s apply the definition for n = 4 :

Factorial(4) = 4 * Factorial(3)
             = 4 * 3 * Factorial(2) 
             = 4 * 3 * 2 * Factorial(1)  (Base case)
             = 4 * 3 * 2 * 1 
             = 24                                   

What is a recursive CTE ?
A recursive CTE is one in which an initial CTE is repeatedly executed to return subsets of data until the complete result set is obtained.

--
-- structure of recursive CTE
--
with recursive t(n) as (
 select 1 -- non-recursive term : the base case
 union all
 select n + 1 from t -- recursive term
 where t < x -- halting condition
)
select n from t;

Here some examples :

--
-- a counter
--
with recursive source (counter) as (
 select 1
 union all
 select counter + 1
 from source
 where counter < 5
)
select * from source;

gives

counter
-------
1
2
3
4
5
--
-- sum of integers until 64
--
with recursive sigma(n) as (
 values 1
 union all
 select n + 1 from sigma
 where n < 64
)
select sum(n) from sigma;

gives

sigma
-----
2080
--
-- factorials until 10
--
with recursive factorial(counter, product) AS (
 select 1, 1
 union all
 select counter + 1, product * (counter + 1)
 from factorial
 where counter < 10
)
select counter, product from factorial;

gives

counter     product
-------     -------
1           1
2           2
3           6
4           24
5           120
6           720
7           5040
8           40320
9           362880
10          3628800

The tree, a recursive data structure
A tree t consists of a value v and a list of other trees :

t[Root] : v [t[1], ..., t[k]] where t[i] can be empty

We want to implement the following tree giving the logical connection between some concepts of relational databases :

Root
            Relational theory
                      First-order predicate logic
                                Syntax
                                Rules of inference
                                Deductive systems
                                Semantics
                      Three-valued logic  
                      Normal forms
                                Functional dependencies
                                Multivalued dependencies
                                1NF
                                2NF
                                3NF
                                BCNF
                                4NF
                                5NF
                                DKNF
           SQL
                      Types and domains
                      PKs & FKs
                      Relations
                      Operators
                            Restriction
                            Projection
                            Joins
                               Cross join
                               Inner join
                               Left outer join
                               Right outer join
                               Full outer join
                            Union
                            Intersection
                            Difference
                            Divide
                      Aggregations
                      Constraints
                      Views

We create a table topics with a reference to itself in order to allow recursion. This implementation is called the Adjacency List Model. We will explore other models like the Nested Sets model, the Nested Intervals Model and the Materialized Path Model like the PostgreSQL extension ltree in other posts.

id_topic_up is a foreign key to id_topic. It means that the topic having this id_topic_up is a subtopic of the topic having this id_topic.

drop table if exists topics;

create table topics (
 id_topic integer not null primary key
 ,id_topic_up integer references topics (id_topic)
 ,topic_name varchar(32)
 ,check(id_topic != id_topic_up) -- avoid cycle
);

insert into topics values ( 1, NULL, 'Root');

-- subtopics of root
insert into topics values ( 2, 1, 'Relational theory');
insert into topics values ( 3, 1, 'SQL');

--
-- level 2
--

-- subtopics of Relational theory
insert into topics values ( 4, 2, 'First-order predicate logic');
insert into topics values ( 5, 2, 'Three-valued logic');
insert into topics values ( 6, 2, 'Normal forms');

-- subtopics of SQL
insert into topics values ( 7, 3, 'Types and domains');
insert into topics values ( 8, 3, 'PKs & FKs');
insert into topics values ( 9, 3, 'Relations');
insert into topics values (10, 3, 'Operators');
insert into topics values (11, 3, 'Constraints');
insert into topics values (12, 3, 'Views');

--
-- level 3
--

-- subtopics of First-order predicate logic
insert into topics values (13, 4, 'Syntax');
insert into topics values (14, 4, 'Rules of inference');
insert into topics values (15, 4, 'Deductive systems');
insert into topics values (16, 4, 'Semantics');

-- subtopics of Normal forms
insert into topics values (17, 6, 'Functional dependencies');
insert into topics values (18, 6, 'Multivalued dependencies');
insert into topics values (19, 6, '1NF');
insert into topics values (20, 6, '2NF');
insert into topics values (21, 6, '3NF');
insert into topics values (22, 6, 'BCNF');
insert into topics values (23, 6, '4NF');
insert into topics values (24, 6, '5NF');
insert into topics values (25, 6, 'DKNF');

-- subtopics of Operators
insert into topics values (26, 10, 'Restriction');
insert into topics values (27, 10, 'Projection');
insert into topics values (28, 10, 'Joins');
insert into topics values (29, 10, 'Union');
insert into topics values (30, 10, 'Intersection');
insert into topics values (31, 10, 'Difference');
insert into topics values (32, 10, 'Divide');
insert into topics values (33, 10, 'Aggregations');

--
-- level 4
--

-- subtopics of of Joins
insert into topics values (34, 28, 'Cross join');
insert into topics values (35, 28, 'Inner join');
insert into topics values (36, 28, 'Left outer join');
insert into topics values (37, 28, 'Right outer join');
insert into topics values (38, 28, 'Full outer join');

Now, some recursive queries on this tree. As usual in recursive CTEs, the first SELECT of the body gives the base case and the second SELECT populates the result set.

The following query implements a breadth-first search :

--
-- give the list breadth-first /-separated 
--
with recursive tree(data, id_topic, level, full_path) as (
 select 
  topic_name
  ,id_topic
  ,0
  ,''
 from topics
 where id_topic_up is null
 union all
 select 
  topic_name
  ,topics.id_topic
  ,tree.level + 1
  ,tree.full_path || ' / ' || topics.topic_name
 from topics inner join tree on tree.id_topic = topics.id_topic_up
)
select full_path
from tree
where level != 0
order by level, id_topic;

The base case begins with level = 0 and an empty full path of topics. The recursive term (the select after union all) increments the level and adds a topic_name in the full path of topics. The result of this query is :

/ Relational theory
/ SQL
/ Relational theory / First-order predicate logic
/ Relational theory / Three-valued logic  
/ Relational theory / Normal forms
/ SQL / Types and domains
/ SQL / PKs & FKs
/ SQL / Relations
/ SQL / Operators
/ SQL / Constraints
/ SQL / Views
/ Relational theory / First-order predicate logic / Syntax  
/ Relational theory / First-order predicate logic / Rules of inference  
/ Relational theory / First-order predicate logic / Deductive systems  
/ Relational theory / First-order predicate logic / Semantics  
/ Relational theory / Normal forms / Functional dependencies
/ Relational theory / Normal forms / Multivalued dependencies
/ Relational theory / Normal forms / 1NF
/ Relational theory / Normal forms / 2NF
/ Relational theory / Normal forms / 3NF
/ Relational theory / Normal forms / BCNF
/ Relational theory / Normal forms / 4NF
/ Relational theory / Normal forms / 5NF
/ Relational theory / Normal forms / DKNF
/ SQL / Operators / Restriction
/ SQL / Operators / Projection
/ SQL / Operators / Joins
/ SQL / Operators / Union
/ SQL / Operators / Intersection
/ SQL / Operators / Difference
/ SQL / Operators / Divide
/ SQL / Operators / Aggregations
/ SQL / Operators / Joins / Cross join
/ SQL / Operators / Joins / Inner join
/ SQL / Operators / Joins / Left outer join
/ SQL / Operators / Joins / Right outer join
/ SQL / Operators / Joins / Full outer join

Generation of a Table of contents
We need an implementation of the depth-first search. We have to modify slightly the breadth-first query by adding a vector of the descendants which will be used as a sort key. We initialize this vector with

ARRAY[]::INTEGER[]

and we recursively populate it with

sort_key || topics.id_topic
--
-- give the list depth-first /-separated 
--
with recursive tree(data, id_topic, level, full_path, sort_key) as (
 select 
  topic_name 
  ,id_topic
  ,0
  ,''
  ,array[]::integer[] 
 from topics 
 where id_topic_up is null
 union all
 select 
   topics.topic_name 
  ,topics.id_topic
  ,level + 1 
  ,tree.full_path || ' / ' || topics.topic_name
  ,sort_key || topics.id_topic
 from tree inner join topics on topics.id_topic_up = tree.id_topic
)
select level, repeat('   ', level - 1) || data 
from tree 
where level != 0
order by sort_key;

The sort_key has the following values :

{1,2}
{1,2,4}
{1,2,4,13}
{1,2,4,14}
{1,2,4,15}
{1,2,4,16}
{1,2,5}
{1,2,6}
{1,2,6,17}
{1,2,6,18}
{1,2,6,19}
{1,2,6,20}
{1,2,6,21}
{1,2,6,22}
{1,2,6,23}
{1,2,6,24}
{1,2,6,25}
{1,3}
{1,3,7}
{1,3,8}
{1,3,9}
{1,3,10}
{1,3,10,26}
{1,3,10,27}
{1,3,10,28}
{1,3,10,28,34}
{1,3,10,28,35}
{1,3,10,28,36}
{1,3,10,28,37}
{1,3,10,28,38}
{1,3,10,29}
{1,3,10,30}
{1,3,10,31}
{1,3,10,32}
{1,3,10,33}

The query gives :

Relational theory
   First-order predicate logic
      Syntax
      Rules of inference
      Deductive systems
      Semantics
   Three-valued logic  
   Normal forms
      Functional dependencies
      Multivalued dependencies
      1NF
      2NF
      3NF
      BCNF
      4NF
      5NF
      DKNF
SQL
   Types and domains
   PKs & FKs
   Relations
   Operators
      Restriction
      Projection
      Joins
         Cross join
         Inner join
         Left outer join
         Right outer join
         Full outer join
      Union
      Intersection
      Difference
      Divide
      Aggregations
   Constraints
   Views

It will be nicer with hierarchical numbering of the topics.

--
-- generate Table of contents with hierarchical numbering of topics
-- 
with recursive numbered_topics(
      id_topic, id_topic_up, topic_name, topics_sequence) 
as (
 select
  id_topic
  ,id_topic_up
  ,topic_name
  ,row_number() over 
    (partition by id_topic_up order by id_topic) as topics_sequence
 from topics
), hierarchy(id_topic, numbering) as (
 select 
  id_topic
  ,cast(topics_sequence as varchar(128))
 from numbered_topics
 where id_topic_up = 1    
union all
 select 
  numbered_topics.id_topic
  ,cast(case when hierarchy.numbering = ''
        then(cast(numbered_topics.topics_sequence as varchar(128)))
        else(hierarchy.numbering 
         || '.' 
         || cast(numbered_topics.topics_sequence as varchar(128)))
        end as varchar(128))
 from numbered_topics 
 inner join hierarchy 
    on numbered_topics.id_topic_up = hierarchy.id_topic    
)
select numbering , repeat(' ', length(numbering)) || topic_name
from hierarchy 
join topics p, topics.id_topic = hierarchy.id_topic
order by hierarchy.numbering;

We see in this query that we can write several CTEs in one WITH statement by separating them with a “,”.

The CTE numbered_topics, which is not recursive, gives the number of the topics belonging to the same category which is equal to row_number(), the position of a topic in the category, because we do a partition on a category order by a topic.

The CTE hierarchy does a breadth-first scan of the tree and populates the string numbering.

The outside select makes a join with the table topics to get the names of the topics.

We order the result of the select with this string numbering to produce a depth-first list of topics.

1         Relational theory
1.1           First-order predicate logic
1.1.1             Syntax
1.1.2             Rules of inference
1.1.3             Deductive systems
1.1.4             Semantics
1.2           Three-valued logic
1.3           Normal forms
1.3.1             Functional dependencies
1.3.2             Multivalued dependencies
1.3.3             1NF
1.3.4             2NF
1.3.5             3NF
1.3.6             BCNF
1.3.7             4NF
1.3.8             5NF
1.3.9             DKNF
2         SQL
2.1           Types and domains
2.2           PKs & FKs
2.3           Relations
2.4           Operators
2.4.1             Restriction
2.4.2             Projection
2.4.3             Joins
2.4.3.1               Cross join
2.4.3.2               Inner join
2.4.3.3               Left outer join
2.4.3.4               Right outer join
2.4.3.5               Full outer join
2.4.4             Union
2.4.5             Intersection
2.4.6             Difference
2.4.7             Divide
2.4.8             Aggregations
2.5           Constraints
2.6           Views

Delete of a subtree
As usual, we use a recursive CTE. We encapsulate this CTE in a function in order to call it with the ID of the root of the subtree. We use a vector of the ancestots that will be used for the delete in the outside query. We initialize this vector with

array[]::integer[]

and we recursively populate it with

tree.sequence_topics_id || topics.id_topic_up
 create or replace function delete_subtree(subtree_root integer) 
 returns void as $$
 declare
  subtree_root alias for $1;
 begin
  with recursive tree as (
   select -- initialization
    id_topic
    ,array[]::integer[] as sequence_topics_up
   from topics 
   where id_topic_up is null
   union all
   select -- generation of the ancestors list
    topics.id_topic
    ,tree.sequence_topics_up || topics.id_topic_up
   from topics, tree
   where topics.id_topic_up = tree.id_topic
 )
 delete from topics 
 where id_topic in (
  select id_topic 
  from tree 
  where subtree_root = 
   any(tree.sequence_topics_up)) -- delete ancestors
 or id_topic = subtree_root; -- delete root of the subtree
end; 
$$ language 'plpgsql'; 

and we call this function for the id of the topic ‘Joins’ for example :

select delete_subtree(28);
select * from topics;
id_topic    id_topic    topic_name
--------    --------    ----------
1           0           Root
2           1           Relational theory
3           1           SQL
4           2           First-order predicate logic
5           2           Three-valued logic
6           2           Normal forms
7           3           Types and domains
8           3           PKs & FKs
9           3           Relations
10          3           Operators
11          3           Constraints
12          3           Views
13          4           Syntax
14          4           Rules of inference
15          4           Deductive systems
16          4           Semantics
17          6           Functional dependencies
18          6           Multivalued dependencies
19          6           1NF
20          6           2NF
21          6           3NF
22          6           BCNF
23          6           4NF
24          6           5NF
25          6           DKNF
26          10          Restriction
27          10          Projection
29          10          Union
30          10          Intersection
31          10          Difference
32          10          Divide
33          10          Aggregations

To add a new node or a new subtree is easy. The values of the PK do not have a semantic value. Only the PK/FK relations make sense.

Some utilities to inspect the tree

--
-- find leaves (topics without subtopics)
--
select id_topic, topic_name
from topics
where id_topic not in (select 
 id_topic_up from topics where id_topic_up is not null)
order by id_topic;
id_topic  topic_name
--------  ----------
5         Three-valued logic
7         Types and domains
8         PKs & FKs
9         Relations
11        Constraints
12        Views
13        Syntax
14        Rules of inference
15        Deductive systems
16        Semantics
17        Restriction
18        Projection
20        Union
21        Intersection
22        Difference
23        Divide
24        Aggregations
25        Operators
26        Functional dependencies
27        Multivalued dependencies
28        1NF
29        2NF
30        3NF
31        BCNF
32        4NF
33        5NF
34        DKNF
35        Cross join
36        Inner join
37        Left outer join
38        Right outer join
39        Full outer join

To find the nodes of the tree, the elements having children, we remove the NOT of the preceding query :

--
-- find nodes (topics having subtopics)
--
select id_topic, topic_name
from topics
where id_topic in (select 
 id_topic_up from topics where id_topic_up is not null)
order by id_topic;
id_topic    topic_name
--------    ----------
1           ALL
2           Relational theory
3           SQL
4           First-order predicate logic
6           Normal forms
10          Operators
19          Joins
--
-- isolated nodes ?
--
select id_topic
from topics
where id_topic_up is null
and id_topic not in (
   select id_topic_up from topics where id_topic is not null);

returns no value.

It is just an exercise. If you want to build trees, it is better to use the ltree package.