Using Postgres JSONB data types

12 minute read Published: 2022-12-22

With this article I'd like to start a series of technical posts about things I've learned in (more or less) the last year while working on a project. It's a kind of braindump of technical documentation, for things that I need to know and tend to forget.

Here is a index of them all.

I often find myself referencing my own code to pick up again how to do something. I'm quite sure many developers have that familiar feeling of being sure to have solved a specific problem but not remembering when or how. That's the curse of keeping the context of many things at once.

Anyway, I won't go into deep detail of these topics because a) I just want to pinpoint some basics for my reference and b) I have limited mental bandwidth :D but I want to finalize in some form these drafts I have in the drawer since months. I hope my future self will find this useful. If not else, like Richard Feynman once said (paraphrasing): making the effort to explain something in simple terms it's the best way to learn that something.

§ Preamble

Although I worked on these articles when Postgres 13 was out, I will refer to the latest Postgres documentation at the time of publishing: Postgres 15. Unfortunately I'm still blocked on Debian buster and until I'll have time to upgrade to bullseye I must live with Postgres 11 anyway.

One more thing: the Postgres documentation is nothing short of a masterpiece of work and we all should learn from them about how to write documentation. However, I find it sometimes obscure and completely lacking clear examples ...

Let's write clear and simple examples, please!

... so in case I don't understand something, first I try to get the gist from a straightforward Stack Overflow answer (which sometimes has a helpful link to the relevant documentation) or one of the those terrible Postgres tutorial websites (LOL), then read more from the official documentation. It's important to understand well something before putting your life ar risk by using someone else's code :)

Ok, enough blahblah, let's start with the first topic: JSONB Postgres fields.

§ JSON and JSONB data types

I regularly use JSONB fields because they are very flexible. I prefer the JSONB instead of the plain JSON because they support indexing and have specific operators to run targeted queries: this means that you can store any structure and (with a bit of elbow grease ...) you can retrieve a specific part of a JSON structure or even perform aggregate operations without suffering performace penalties.

My main use case is when I have to store some data coming from an external source and I'm not yet sure what to do with it (example a feed, statistics, things that don't need specific reasoning). Destructure it in separated fields? Spread it on different tables? In general I treat it like a blob until I figure out if something else is needed.

Note: I've just realized when writing this article that starting from Postgres 12 we now have a third JSON data type: the jsonpath, which is interesting but I didn't yet investigate. Postgres is seriously one of the most impressive FLOSS project I know of.

The documentation for the JSON data types is at this link. Let's see some tricks I can play with this field.

§ Filtering content in a JSONB field

Let's say we have this schema:

CREATE TABLE test (
    id int,
    resources JSONB
)

The resources field is defined as a JSONB object and could be anything, for example:

{
  "items": [
    {
      "id": 3,
      "title": "The Narconomicon",
      "date_publish": 1967
    },
    {
      "id": 4,
      "title": "Wild Things",
      "date_publish": 1994
    }
  ]
}

Let's add that junk to the table:

INSERT INTO test (id, resources) VALUES (1, '{{"items":[{"id:"3",...}'::jsonb);

The cast to ::jsonb in this case is not really necessary but I use it for hygiene, there are some cases where Postgres will complain without it.

Now let's retrieve the content of the field resources:

> SELECT resources FROM test;

+------------------------------------------------------------------------------------------------------------------------------------+
| resources                                                                                                                          |
|------------------------------------------------------------------------------------------------------------------------------------|
| {"items": [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}]} |
+------------------------------------------------------------------------------------------------------------------------------------+

we have the whole JSON back. Now let's start with the basics, how to extract a partial json:

> SELECT resources->>'items' FROM test;

+-------------------------------------------------------------------------------------------------------------------------+
| ?column?                                                                                                                |
|-------------------------------------------------------------------------------------------------------------------------|
| [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}] |
+-------------------------------------------------------------------------------------------------------------------------+

The ->> operator says "extract the items JSON object and return it as plain text".

Pay close attention to not confuse the operator ->> (result is plain text) with the other -> operator (result is a json/jsob object)! The difference is subtle but fundamental: if the query returns a json/jsonb object you can perform further operations on the returned value:

> SELECT resources->'items'->id->'title' FROM test;

+---------------+
| ?column?      |
|---------------|
| "Wild Things" |
+---------------+

If it's text ... you can't:

> SELECT resources->>'items'->id FROM test;
operator does not exist: text -> integer
LINE 1: SELECT resources->>'items'->id FROM test limit 1
                                  ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Moving on, Let's filter just the record matching a json object? I can use the @> operator which means "given a JSON object, check if is contained in any record":

> SELECT resources->'items' FROM test WHERE resources->'items' @> '[{"id": 4}]';

+-------------------------------------------------------------------------------------------------------------------------+
| ?column?                                                                                                                |
|-------------------------------------------------------------------------------------------------------------------------|
| [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}] |
+-------------------------------------------------------------------------------------------------------------------------+

You can throw anything at the @> operator. For example let's populate the table as follows:

> INSERT INTO test (id,resources) VALUES (1, '[{"id": "1"},{"id":"2"}]'::jsonb);
> INSERT INTO test (id,resources) VALUES (2, '[{"id": "2"},{"id":"4"}]'::jsonb);

> SELECT * FROM test;
+----+----------------------------+
| id | resources                  |
|----+----------------------------|
| 1  | [{"id": "1"}, {"id": "2"}] |
| 2  | [{"id": "2"}, {"id": "4"}] |
+----+----------------------------+
SELECT 2

Now let's invoke Cthulhu to help with this syntax and run some queries filtering by an arbitrary json array (source):

> SELECT * FROM test WHERE resources @> ANY (ARRAY ['[{"id":"4"}]']::jsonb[]);
+----+----------------------------+
| id | resources                  |
|----+----------------------------|
| 2  | [{"id": "2"}, {"id": "4"}] |
+----+----------------------------+
SELECT 1

> SELECT * FROM test WHERE resources @> ANY (ARRAY ['[{"id":"2"}]', '[{"id":"1"}]']::jsonb[]);
+----+----------------------------+
| id | resources                  |
|----+----------------------------|
| 1  | [{"id": "1"}, {"id": "2"}] |
| 2  | [{"id": "2"}, {"id": "4"}] |
+----+----------------------------+
SELECT 2

The query can be read as: "return any record where there are ANY (zero or more) number of elements of the JSONB array I'm giving you.

Being able to handle JSON objects this way, allows aggregating data like any other data type. Let's say we want to calculate the average for a list of values, using the AVG() Postgres function. We can also do that with a list of JSONB values (ensuring to cast them to a type that AVG can handle):

-- We have some JSONB data
> SELECT data FROM tbl;

+-----------------------------+
| data                        |
|-----------------------------|
| {"key": "1", "value": "57"} |
| {"key": "2", "value": "64"} |
| {"key": "3", "value": "32"} |
+-----------------------------+

-- Filter out only the values
> SELECT data->'values' FROM tbl;

+----------+
| ?column? |
|----------|
| 57       |
| 64       |
| 32       |
+----------+

-- calculate the average of the values 
> SELECT AVG((data->'values')::REAL) FROM tbl;

+-------+
| avg   |
|-------|
| 131.6 |
+-------+

There are other json operators but I'll stop here.

§ JSON/JSONB functions

There are a bunch of JSON functions, most of them are duplicated to work either on a JSON or a JSONB, example:

It's important to match json_ and jsonb_ functions with the correct data type! If I use a json_ function on a jsonb field, Postgres will complain:

> SELECT from test t WHERE EXISTS (SELECT json_object_keys(t.resources));
function json_object_keys(jsonb) does not exist
LINE 1: SELECT t.* from test t WHERE EXISTS (SELECT json_object_keys...
                                                    ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Let's see a couple of them at work.

§ jsonb_object_keys: Return the set of keys in the top-level JSON object

> SELECT t.* from test t WHERE EXISTS (SELECT jsonb_object_keys(t.resources));

+----+------------------------------------------------------------------------------------------------------------------------------------+
| id | resources                                                                                                                          |
|----+------------------------------------------------------------------------------------------------------------------------------------|
| 1  | {"items": [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}]} |
+----+------------------------------------------------------------------------------------------------------------------------------------+

and since I am returning a jsonb I can further filter the results like we did before:

> SELECT t.resources->'items'->0->'title' from test t WHERE EXISTS (SELECT jsonb_object_keys(t.resources));

+--------------------+
| ?column?           |
|--------------------|
| "The Narconomicon" |
+--------------------+

§ jsonb_array_elements: Retrieve the array content of items in the field test.resources, the result in this case looks the same as before

> SELECT t.* from test t WHERE EXISTS (SELECT jsonb_array_elements(t.resources->'items'));

+----+------------------------------------------------------------------------------------------------------------------------------------+
| id | resources                                                                                                                          |
|----+------------------------------------------------------------------------------------------------------------------------------------|
| 1  | {"items": [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}]} |
+----+------------------------------------------------------------------------------------------------------------------------------------+

Oh, but with some weird dance we can also say that we want to filter records having a specific id inside resources->'items':

> SELECT t.* from test t WHERE EXISTS (SELECT * FROM jsonb_array_elements(t.resources->'items') AS foo(bar) WHERE foo.bar->'id'='3');

+----+------------------------------------------------------------------------------------------------------------------------------------+
| id | resources                                                                                                                          |
|----+------------------------------------------------------------------------------------------------------------------------------------|
| 1  | {"items": [{"id": 3, "title": "The Narconomicon", "date_publish": 1967}, {"id": 4, "title": "Wild Things", "date_publish": 1994}]} |
+----+------------------------------------------------------------------------------------------------------------------------------------+

wow, is that really working? Let's try with a wrong id:

> SELECT t.* from test t WHERE EXISTS (SELECT * FROM jsonb_array_elements(t.resources->'items') as foo(bar) where foo.bar->'id'='42');

+----+-----------+
| id | resources |
|----+-----------|
+----+-----------+

§ json_array_length: count how many elements in a json array

(or the equivalent jsonb_array_length for JSONB)

> SELECT json_array_length('[{"id":"1"},{"id":"2"},{"id":"3"}]');
+-------------------+
| json_array_length |
|-------------------|
| 3                 |
+-------------------+
SELECT 1

§ jsonb_pretty: prettify a jsonb output

> select jsonb_pretty('{"array":[1,2,3],"boolean":true,"color":"gold","null":null,"number":123,"object":{"a":"b","c":"d"},"string":"
 Hello World"}')
+------------------------------+
| jsonb_pretty                 |
|------------------------------|
| {                            |
|     "null": null,            |
|     "array": [               |
|         1,                   |
|         2,                   |
|         3                    |
|     ],                       |
|     "color": "gold",         |
|     "number": 123,           |
|     "object": {              |
|         "a": "b",            |
|         "c": "d"             |
|     },                       |
|     "string": "Hello World", |
|     "boolean": true          |
| }                            |
+------------------------------+

§ jsonb_set: replace a portion in a jsonb field

Just learned the other day! The syntax is a bit confusing, so let's see the definition:

jsonb_set (
  target jsonb,
  path text[],
  new_value jsonb,
  (optional) create_if_missing boolean
) → jsonb

Reads as follows: take a jsonb target, walk it down to path and replace that part with new_value, optionally creating the entry if missing. The updated jsonb is returned.

A relatively simple example:

-- path is the 'a' json key, replace its value with 3
>  SELECT jsonb_set('{"a":1,"b":2}', '{a}', '"3"'::jsonb)
+--------------------+
| jsonb_set          |
|--------------------|
| {"a": "3", "b": 2} |
+--------------------+

-- path is the first element of the array 'tags', replace its value with 'carrot'
> SELECT jsonb_set('{"id":1,"tags":["apple","pear","banana"]}', '{tags,0}', '"carrot"'::jsonb)
+-------------------------------------------------+
| jsonb_set                                       |
|-------------------------------------------------|
| {"id": 1, "tags": ["carrot", "pear", "banana"]} |
+-------------------------------------------------+

new_value must be a valid jsonb value so sometimes it's tricky. Example I needed to update date_update in this jsonb object:

> select jsonb_pretty(data) from tbl;
+------------------------------------------------------+
| jsonb_pretty                                         |
|------------------------------------------------------|
| {                                                    |
|     "id": "8caf7c27-79ce-4a53-9b42-20f5958b3795",    |
|     "date_update": "2021-01-01T12:00:00.0000000000Z" |
| }                                                    |
+------------------------------------------------------+

I am using the NOW() Postgres function to update the timestamp (returns a timestamp with time zone). After some help from Stack Overflow and the Postgres documentation:

> SELECT jsonb_pretty(
  jsonb_set('{"id": "8caf7c27-79ce-4a53-9b42-20f5958b3795", "date_update": "2021-01-01T12:00:00.0000000000Z"}',
 '{date_update}',
 to_char(now(),'\"YYYY-MM-DD HH24:MI:SS.US\"')::jsonb
 )
)
 
+--------------------------------------------------+
| jsonb_pretty                                     |
|--------------------------------------------------|
| {                                                |
|     "id": "8caf7c27-79ce-4a53-9b42-20f5958b3795",|
|     "date_update": "2022-01-10 22:46:16.071481"  |
| }                                                |
+--------------------------------------------------+

which translates to the following UPDATE query:

> select data from tbl where id=42;
+--------------------------------------------------------|
| data                                                   |
|--------------------------------------------------------|
| {"id": 42, "date_update": "1970-01-01 12:00:00.00000"} |
+--------------------------------------------------------|

> UPDATE tbl SET data=jsonb_set(data, '{date_update}', to_char(now(),'\"YYYY-MM-DD HH24:MI:SS.US\"')::jsonb) where id=42
+---------------------------------------------------------|
| data                                                    |
|---------------------------------------------------------|
| {"id": 42, "date_update": "2022-01-10 23:16:39.813946"} |
+---------------------------------------------------------|

§ jsonb_agg: aggregate all input values into a single jsonb object

Another interesting function discovered recently. Let's say I have a table like this:

CREATE TABLE library (
  shelf VARCHAR(2) NOT NULL,
  book JSONB
);

INSERT INTO library (shelf, book) VALUES ('AA', '{"title": "A deepness in the sky", "author": "Vernor Vinge"}');
INSERT INTO library (shelf, book) VALUES ('AA', '{"title": "Tranny", "author": "Laura Jane Grace"}');
INSERT INTO library (shelf, book) VALUES ('AB', '{"title": "It Doesn''t Have to Be Crazy at Work", "author": "Jason Fried and David Heinemeier Hansson"}');
INSERT INTO library (shelf, book) VALUES ('AB', '{"title": "My family and other animals", "author": "Durrell Gerald"}');
INSERT INTO library (shelf) VALUES ('AC');

> SELECT * FROM library
+-------+--------------------------------------------------------------------------------------------------------+
| shelf | book                                                                                                  |
|-------+--------------------------------------------------------------------------------------------------------|
| AA    | {"title": "A deepness in the sky", "author": "Vernor Vinge"}                                           |
| AA    | {"title": "Tranny", "author": "Laura Jane Grace"}                                                      |
| AB    | {"title": "It Doesn't Have to Be Crazy at Work", "author": "Jason Fried and David Heinemeier Hansson"} |
| AB    | {"title": "My family and other animals", "author": "Durrell Gerald"}                                   |
| AC    | <null>                                                                                                 |
+-------+--------------------------------------------------------------------------------------------------------+

That's probably bad database modelling but it's only for demo purposes :) Let's group all books by shelf:

> SELECT shelf,jsonb_agg(book) FROM library GROUP BY shelf;
+-------+----------------------------------------------------------------------------------------------------------------------------------->
| shelf | jsonb_agg                                                                                                                         >
|-------+----------------------------------------------------------------------------------------------------------------------------------->
| AA    | [{"title": "A deepness in the sky", "author": "Vernor Vinge"}, {"title": "Tranny", "author": "Laura Jane Grace"}]                 >
| AB    | [{"title": "It Doesn't Have to Be Crazy at Work", "author": "Jason Fried and David Heinemeier Hansson"}, {"title": "My family and >
| AC    | [null]                                                                                                                            >
+-------+----------------------------------------------------------------------------------------------------------------------------------->

and there you have all books grouped by shelf. That [null] is not nice though, we just want an empty list because in that shelf there are no books. Let's fix it:

> SELECT shelf, COALESCE(jsonb_agg(book) FILTER (WHERE library.book IS NOT NULL), '[]'::jsonb) FROM library GROUP BY shelf;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------->
| shelf | coalesce                                                                                                                                      >
|-------+----------------------------------------------------------------------------------------------------------------------------------------------->
| AA    | [{"title": "A deepness in the sky", "author": "Vernor Vinge"}, {"title": "Tranny", "author": "Laura Jane Grace"}]                             >
| AB    | [{"title": "It Doesn't Have to Be Crazy at Work", "author": "Jason Fried and David Heinemeier Hansson"}, {"title": "My family and other animal>
| AC    | []                                                                                                                                            >
+-------+----------------------------------------------------------------------------------------------------------------------------------------------->

Documentation is found at the aggregate functions page

§ Conclusions

This was a brief but intense first encounter with the JSON Postgres data types and associated functions: they're impressive, not really easy to master but I think I now have set in stone the general syntax and logic that can be applied to the rest.

There are many more but I'll leave it to the curious reader to see what's more.

Oh, one last thing. The standard CLI Postgres client (psql on Linux) is not so great: I have problems when I resize the terminal window and the autocompletion leave something to desire. To replace that, I'm experimenting with another CLI written in Python: pgsql, check it out!

See you at the next article: working with Postgres custom data types.