Handling Postgres custom data types

Published: 2022-12-29

Hi, welcome to the second article of the series about using Postgres. The first article, where we explored a bit the JSON fields, is available at this link. Go read it to have an introduction to the series.

Here is a index of them all.

In this article we will see a quick summary on how to use Postgres custom data types. The basic idea is that with Postgres we are not confined to the standard internal types (INT, REAL, TEXT, JSON, ...) but we can also define complex and structured data types, just like we would do with any programming language. Documentation for custom data types is here, turns out that there are even different type of custom data type you can create:

Composite Types
Enumerated Types
Ranges Types

The names should give a rough idea of their purpose. After an initial investigation on this peculiar feature of Postgres, I then decided to not use Composite types for reasons that will be explained later in this series of articles (summary: I don't really need it and the Rust middleware I use has a broken support for Postgres custom data types). So instead of opening another can of worms I've opted for working with JSON fields.

On the other hand I find Enumerated types very useful! Let's start with them.

§ Enumerated Types

When you have a piece of application data that is represented by a fixed number of values, you sometime use an enumeration. Example, the status of a user:

struct UserStatus {
    INVITED,
    ACTIVE,
    DELETED
}

Postgres can help enforcing the same constraint, mapping a field to an enumerated custom type:

CREATE TYPE USER_STATUS AS ENUM (
    'INVITED',
    'ACTIVE',
    'DELETED'
);

Let's review this new type:

> \dT+ user_status
+--------+---------------+---------------+------+----------+-------------------+------------>
| Schema | Name          | Internal name | Size | Elements | Access privileges | Description>
|--------+---------------+---------------+------+----------+-------------------+------------>
| public | user_status   | user_status   | 4    | INVITED  | <null>            | <null>     >
|        |               |               |      | ACTIVE   |                   |            >
|        |               |               |      | DELETED  |                   |            >
+--------+---------------+---------------+------+----------+-------------------+------------>

Let's use it in our schema and insert a new record, Postgres will protect me against typos:

CREATE TABLE users (
    name TEXT,
    surname TEXT,
    status USER_STATUS DEFAULT 'ACTIVE'
);

> INSERT INTO users  (name, surname, status) VALUES ('Max','Mustermann','active');

invalid input value for enum status: "active"
LINE 1: ...urname, status) VALUES ('Max','Mustermann','active')
                                                      ^

In the schema definition the field status can also be set to a default value of our enumeration, just like any other DB field.

Mind the fact that the values are case sensitive!

> INSERT INTO users  (name, surname, status) VALUES ('Max','Mustermann','active');
invalid input value for enum signup_status: "active"
LINE 1: ...surname, signup_status) VALUES ('Max','Mustermann','active')
                                                              ^
> INSERT INTO users  (name, surname, status) VALUES ('Max','Mustermann','ACTIVE');
INSERT 0 1
Time: 0.052s

§ Composite types

Composite types are extremely powerful but in the end for my use case they just added unnecessary cognitive overhead, so I stopped using them. Also, being a peculiar feature of Postgres, I am wary of using them extensively, if not really necessary.

Let's create a schema and define a composite type field. I create a type t_pet and a field pet of that type. Of course the name of the custom type can be anything, I'm just incapable of finding a better name.

CREATE TYPE t_pet AS (name TEXT, age INT);
CREATE TABLE tbl (
    owner varchar(64)
);
ALTER TABLE tbl ADD COLUMN pet t_pet;

Let's review the custom type and the table layout:

> \dT+ t_pet
+--------+--------+---------------+-------+----------+-------------------+-------------+
| Schema | Name   | Internal name | Size  | Elements | Access privileges | Description |
|--------+--------+---------------+-------+----------+-------------------+-------------|
| public | t_pet  | t_pet         | tuple |          | <null>            | <null>      |
+--------+--------+---------------+-------+----------+-------------------+-------------+

> \d tbl
+--------------+-----------------------+-----------------------+
| Column       | Type                  | Modifiers             |
|--------------+-----------------------+-----------------------|
| owner        | character varying(64) |                       |
| pet          | t_pet                 |                       |
+--------------+-----------------------+-----------------------+

Let's add some data:

> INSERT INTO tbl (owner, pet) VALUES ('Max', ('hund', 2) );
> INSERT INTO tbl (owner, pet) VALUES ('Otto', ('katze', 3) );

> select * from tbl;
+-------+-----------+
| owner | pet       |
|-------+-----------|
| Max   | (hund,2)  |
| Otto  | (katze,3) |
+-------+-----------+

let's see how the fields of the custom data type can be accessed:

> SELECT (pet).name FROM tbl;
+-------+
| name  |
|-------|
| hund  |
| katze |
+-------+

The custom type can be used also as an array: let's change the table and add our custom data type, this time as array:

CREATE TYPE t_pet AS (name TEXT, age INT);
CREATE TABLE tbl (
    owner varchar(64)
);
ALTER TABLE tbl ADD COLUMN pets t_pet[];

To add a new record, I now need to cast the type for each element of the array (see documentation):

INSERT INTO tbl (owner, pets) VALUES ('Max',
    array[
        ('hund', 2)::t_pet,
        ('katze', 3)::t_pet
    ]
);

Now let's retrieve them back. As any array type in Postgres, you can access fields singularly:

> SELECT pets FROM tbl;
+--------------------------+
| pets                     |
|--------------------------|
| {"(hund,2)","(katze,3)"} |
+--------------------------+

> SELECT pets[0] FROM tbl;
+--------+
| pets   |
|--------|
| <null> |
+--------+

(LOL, 1-index arrays...)

> SELECT pets[1] FROM tbl;
+----------+
| pets     |
|----------|
| (hund,2) |
+----------+

> SELECT pets[1].name FROM tbl;
+------+
| name |
|------|
| hund |
+------+

There's a lot more about this topic, but in the end I had found this approach too complicated with no great advantage for my use case, so I ended up storing straight JSON objects in Postgres and leverage the strong typing of Rust when retrieving the records from the DB.

§ Conclusions

And this is a wrap! We have touched another interesting but specific feature of Postgres and written down some knowledge so it won't be lost.

Next up, we will move up the ladder of the stack and see how to talk to Postgres with a Rust middleware: How to use Postgres with Rust.