Access all block attributes structurally with the Gutenberg block editor

Can we read all block attributes as JSON data rather than as a mixture of JSON and HTML-sourced data? In short, yes, and it’s surprisingly easy.

Oftentimes I hear the refrain “I don’t want to use Gutenberg (the block editor) because its posts are HTML.” As I’ve already written about, Gutenberg posts aren’t HTML; but in this post I want to explore a little aspect of the design that many people may not realize is available. I was inspired to explore this after discussing the use of Gutenberg as a CMS editor for data exchanged between multiple parties, each wanting to perform a different kind of operation on the content.

Before we begin I want to take a moment and mention that I’ve only grown in my confidence over the years that serializing Gutenberg posts to HTML was the right call, and this comes from having worked with a few JSON-based systems since last writing on this blog. The plain text/HTML format has remained durable and interchangeable and flexible in a way that so many JSON-backed formats haven’t been. The fact that the HTML serialization is open for expansion means that we don’t have to add support to old servers in order to support new block types, and the fact that everything can read and process and render HTML means that those posts keep on serving their purpose even after the supporting JavaScript or server-side support code is forgotten and unmaintained or lost.

How block attributes are serialized

Gutenberg serializes its posts to HTML and surrounds each block with an HTML comment delimiter optionally containing attributes for the block. Many blocks also source their attributes from the HTML inside the block itself. This makes it possible to eliminate a source of data duplication that someone might not notice when manually editing a post outside of the editor (an environment that’s technically not supported).

<!-- wp:image -->
<img src="https://my-photo.localhost/image.png">
<!-- /wp:image -->

In this example the url attribute for the image is pulled from the src HTML attribute of the first img tag inside the block. This comes from the attributes property given when the block is registered (typically directly through registerBlockType() or indirectly through block.json).

{
    attributes: {
        url: {
            type: 'string',
            source: 'attribute',
            attribute: 'src',
            selector: 'img'
        }
    }
}
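To make the sourcing rule concrete, here is a deliberately naive sketch of what `source: 'attribute'` does with the definition above. The helpers `sourceAttribute()` and `readSourcedAttributes()` are hypothetical, and their regular expressions only understand a bare tag-name selector; the editor itself parses the HTML properly and accepts real CSS selectors.

```javascript
// Hypothetical, naive model of `source: 'attribute'`: find the first element
// matching a bare tag-name selector and read one of its HTML attributes.
// Not the editor's implementation, which uses a real HTML parser.
function sourceAttribute(html, selector, attribute) {
  // Match the opening tag, e.g. <img ...>, capturing everything after the name.
  const tagPattern = new RegExp(`<${selector}\\b([^>]*)>`, 'i');
  const tag = html.match(tagPattern);
  if (!tag) {
    return undefined;
  }
  // Look for attribute="value" or attribute='value' inside that tag.
  const attrPattern = new RegExp(`\\b${attribute}\\s*=\\s*("([^"]*)"|'([^']*)')`, 'i');
  const value = tag[1].match(attrPattern);
  return value ? (value[2] ?? value[3]) : undefined;
}

// Apply every `source: 'attribute'` definition of a block type to its inner HTML.
function readSourcedAttributes(definitions, innerHTML) {
  const attributes = {};
  for (const [name, def] of Object.entries(definitions)) {
    if (def.source === 'attribute') {
      attributes[name] = sourceAttribute(innerHTML, def.selector, def.attribute);
    }
  }
  return attributes;
}

const imageAttributes = {
  url: { type: 'string', source: 'attribute', attribute: 'src', selector: 'img' },
};

console.log(readSourcedAttributes(imageAttributes, '<img src="https://my-photo.localhost/image.png">'));
// → { url: 'https://my-photo.localhost/image.png' }
```

The point of the sketch is the dependency it exposes: without the definition and some way to parse HTML, the url simply isn’t available.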

All in all, it’s not that complicated. When the block saves and re-serializes, the editor looks at which of the attributes are “sourced” and omits them from the JSON in the comment delimiters. This leaves one copy of the data in the serialized block, but it can make the url attribute hard to read when you don’t have a definition of the block’s attributes, when you don’t have a convenient way to parse HTML, or when you have neither.
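The serialization rule just described can be modeled in a few lines. This is a simplified sketch with hypothetical helper names, not Gutenberg’s serializer: it drops attributes that declare a `source`, and it also drops values equal to their declared default, which Gutenberg does as well. The real serializer additionally escapes characters such as `<` and `>` inside the JSON as Unicode escapes, and it strips the `core/` namespace from block names, so this sketch simply takes the short name.

```javascript
// Hypothetical model of which attributes end up in the comment delimiter.
function commentAttributes(definitions, values) {
  const kept = {};
  for (const [name, def] of Object.entries(definitions)) {
    const value = values[name];
    if (value === undefined || def.source !== undefined) {
      continue; // sourced attributes are recovered from the inner HTML instead
    }
    if ('default' in def && JSON.stringify(def.default) === JSON.stringify(value)) {
      continue; // defaults are implied by the definition, so they aren't stored
    }
    kept[name] = value;
  }
  return kept;
}

// Wrap the static fallback render in block comment delimiters.
function serializeBlock(name, definitions, values, innerHTML) {
  const attrs = commentAttributes(definitions, values);
  const json = Object.keys(attrs).length > 0 ? ` ${JSON.stringify(attrs)}` : '';
  return `<!-- wp:${name}${json} -->\n${innerHTML}\n<!-- /wp:${name} -->`;
}

const url = 'https://my-photo.localhost/image.png';
const sourced = { url: { type: 'string', source: 'attribute', attribute: 'src', selector: 'img' } };

console.log(serializeBlock('image', sourced, { url }, `<img src="${url}">`));
// <!-- wp:image -->
// <img src="https://my-photo.localhost/image.png">
// <!-- /wp:image -->
```

With the sourced definition this reproduces the first example exactly; drop `source`, `attribute`, and `selector` from the definition and the url moves into the delimiter as `{"url":"https://my-photo.localhost/image.png"}`.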

So people might come and say, “I want to index all of the posts on the site and get a list of all the images that are linked.” They think it isn’t possible or viable because these attributes aren’t readily accessible: they’re read from the “static fallback render.” This static fallback render is the HTML inside the block comment delimiters, and it exists primarily to make sure that blocks continue to work, at least a little bit, when taken out of their supported environments. We may lack the code necessary to render a block, but at least we can pass along the inner HTML as-is and get something out of it.

Write-only static fallback renders

So here’s where an old idea resurfaces. I’ve long wished that more blocks only ever wrote to the static fallback render and never read from it. Sourced attributes mean we’re reading from that safety net, and as block implementation code changes, or as WordPress or other plugins modify the stored HTML, blocks “invalidate” and the editor can’t deal with them. If we could simply ignore everything inside the HTML, we wouldn’t have this problem, and our block’s attributes could be fully specified by the JSON with no external definitions.

So let’s experiment and try doing this.

Remember that block definition with the source property in its attributes? We’re free to remove it.

{
    attributes: {
        url: {
            type: 'string',
-            source: 'attribute',
-            attribute: 'src',
-            selector: 'img'
        }
    }
}

Now our image url is just a string. If we rebuild the project, create a new post, and save an image, this is what it looks like:

<!-- wp:image {"url":"https://my-photo.localhost/image.png"} -->
<img src="https://my-photo.localhost/image.png">
<!-- /wp:image -->

Well, that was quite easy, actually! We have the duplicated data again, but when processing this block in its serialized form we can ignore the HTML within entirely.
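Because every attribute now lives in the comment delimiter, a consumer such as the image indexer mentioned earlier can work from the JSON alone. Here is a rough sketch with hypothetical helper names; its regular expression assumes well-formed delimiters and is no substitute for a real block parser such as `parse_blocks()` in PHP or the `@wordpress/block-serialization-default-parser` package in JavaScript.

```javascript
// Hypothetical helpers: collect image URLs from serialized post content by
// reading only the JSON in each opening block delimiter. The inner HTML (the
// static fallback render) is never inspected.
// Naive pattern: assumes well-formed delimiters and that "-->" never appears
// inside the JSON, which holds for Gutenberg's output because it escapes "--".
const OPENER = /<!--\s+wp:([a-z][a-z0-9_-]*\/)?([a-z][a-z0-9_-]*)\s+(\{[\s\S]*?\})?\s*\/?-->/g;

function blockOpeners(content) {
  const blocks = [];
  for (const match of content.matchAll(OPENER)) {
    // Blocks without a namespace in the delimiter belong to "core".
    const namespace = match[1] ? match[1].slice(0, -1) : 'core';
    blocks.push({
      name: `${namespace}/${match[2]}`,
      attrs: match[3] ? JSON.parse(match[3]) : {},
    });
  }
  return blocks;
}

function imageUrls(content) {
  return blockOpeners(content)
    .filter(block => block.name === 'core/image' && typeof block.attrs.url === 'string')
    .map(block => block.attrs.url);
}

const post = [
  '<!-- wp:paragraph -->',
  '<p>Hello</p>',
  '<!-- /wp:paragraph -->',
  '<!-- wp:image {"url":"https://my-photo.localhost/image.png"} -->',
  '<img src="https://my-photo.localhost/image.png">',
  '<!-- /wp:image -->',
].join('\n');

console.log(imageUrls(post)); // → [ 'https://my-photo.localhost/image.png' ]
```

This only finds urls in blocks saved after sourcing was removed; an older `<!-- wp:image -->` block with no JSON yields nothing, which is exactly the gap the sourced format leaves.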

This works as long as we can rebuild all of the blocks we want to change; it won’t help with blocks we can’t recompile. However, WordPress is extensible, so I think we can do better.

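// Read from and write to the block-types store of the running editor.
const select = wp.data.select('core/blocks');
const dispatch = wp.data.dispatch('core/blocks');

const blocks = select.getBlockTypes();

// Drop the properties that tell the editor to read an attribute from the HTML.
// (The parameter is named `def` so it doesn't collide with the destructured `attribute`.)
const withoutSourcing = def => {
    const {
        attribute,
        selector,
        source,
        ...definition
    } = def;

    return definition;
}

// Apply withoutSourcing() to every attribute definition of one block type.
const removeSourcing = attributes => {
    const without = {};

    for ( const [ name, def ] of Object.entries( attributes ) ) {
        without[ name ] = withoutSourcing( def );
    }

    return without;
}

// Re-register every block type with its sourcing information removed.
dispatch.addBlockTypes(
    blocks.map( block => ({
        ...block,
        attributes: removeSourcing( block.attributes )
    }) )
);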

Run this code after the editor initializes and every registered block will lose its sourced attributes, so every attribute will be stored in the JSON attributes. Beware that block-validation issues will appear: posts saved before the change have no JSON copy of their sourced attributes, so the editor re-renders those blocks without them and the output no longer matches the stored HTML. If you run your own custom block editor, though, you can run this code on startup and prevent those issues from appearing (your blocks should never serialize without the attributes we’re looking for).
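Before and after running that code, it can help to see which attributes are actually affected. Here is a small sketch with a hypothetical `listSourcedAttributes()` helper that reports the sourced attributes of each block type in an array shaped like the output of `getBlockTypes()`; the sample data is made up and trimmed to the fields the function reads.

```javascript
// Hypothetical audit helper: for each block type, list the attributes that
// declare a `source`, i.e. the ones read from the static fallback render.
function listSourcedAttributes(blockTypes) {
  const report = {};
  for (const { name, attributes = {} } of blockTypes) {
    const sourced = Object.entries(attributes)
      .filter(([, def]) => def.source !== undefined)
      .map(([attrName, def]) => `${attrName} (${def.source})`);
    if (sourced.length > 0) {
      report[name] = sourced;
    }
  }
  return report;
}

// Made-up, trimmed-down block types for illustration only.
const sampleBlockTypes = [
  {
    name: 'core/image',
    attributes: {
      url: { type: 'string', source: 'attribute', selector: 'img', attribute: 'src' },
      width: { type: 'number' },
    },
  },
  { name: 'core/spacer', attributes: { height: { type: 'string', default: '100px' } } },
];

console.log(listSourcedAttributes(sampleBlockTypes));
// → { 'core/image': [ 'url (attribute)' ] }
```

In the editor you could call `listSourcedAttributes( wp.data.select('core/blocks').getBlockTypes() )` before stripping the sources and again afterwards; the second call should return an empty object.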

If you find that too verbose you can paste this one-liner into your browser console and give it a try. Paste it in when you have a new post open but before you add any blocks. Once you start adding content to the post you can jump to the code view and see that everything exists structurally, and the static fallback render is now a write-only artifact.

wp.data.dispatch('core/blocks').addBlockTypes(wp.data.select('core/blocks').getBlockTypes().map(t => ({...t, attributes: Object.keys(t.attributes).reduce((as, a) => {const {source, selector, attribute, ...rest} = t.attributes[a]; as[a] = rest; return as}, {})})))
Here is a post created this way:
<!-- wp:paragraph {"content":"This paragraph has \u003cem\u003e\u003cstrong\u003eno sourced attributes\u003c/strong\u003e\u003c/em\u003e. Its content is fully stored inside the JSON attributes \u003cem\u003eand\u003c/em\u003e it still contains the rendered HTML useful for rendering it when no server is available."} -->
<p>This paragraph has <em><strong>no sourced attributes</strong></em>. Its content is fully stored inside the JSON attributes <em>and</em> it still contains the rendered HTML useful for rendering it when no server is available.</p>
<!-- /wp:paragraph -->
<!-- wp:image {"url":"https://wordpress.org/files/2022/08/theme-styles.png","caption":"Image captions are usually sourced from the \u003ccode\u003efigcaption\u003c/code\u003e element.","width":650,"height":406,"sizeSlug":"large","linkDestination":"none"} -->
<figure class="wp-block-image size-large is-resized"><img src="https://wordpress.org/files/2022/08/theme-styles.png" alt="" width="650" height="406"/><figcaption class="wp-element-caption">Image captions are usually sourced from the <code>figcaption</code> element.</figcaption></figure>
<!-- /wp:image -->
<!-- wp:heading {"content":"Even headings usually source their level as the \u003ccode\u003eh1/2/3/4/5/6\u003c/code\u003e tag.","level":3} -->
<h3 class="wp-block-heading">Even headings usually source their level as the <code>h1/2/3/4/5/6</code> tag.</h3>
<!-- /wp:heading -->
<!-- wp:paragraph {"content":"But the default value is \u003ccode\u003e2\u003c/code\u003e, so the default value won't be present."} -->
<p>But the default value is <code>2</code>, so the default value won't be present.</p>
<!-- /wp:paragraph -->
Please note that some HTML encoding issues may come across in this embed preview; to see the actual contents of the post, follow the link to the gist on GitHub’s website.
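The last paragraph of that sample points at one remaining dependency: values equal to their default are never written, so a consumer reading the JSON still needs the defaults from the block definition. Here is a minimal sketch with a hypothetical `withDefaults()` helper and an abbreviated, made-up heading definition.

```javascript
// Hypothetical helper: fill in attributes omitted from the JSON because they
// matched the default declared in the block definition.
function withDefaults(definitions, stored) {
  const attributes = { ...stored };
  for (const [name, def] of Object.entries(definitions)) {
    if (attributes[name] === undefined && 'default' in def) {
      attributes[name] = def.default;
    }
  }
  return attributes;
}

// Abbreviated heading definition, assumed for illustration.
const headingDefinitions = {
  content: { type: 'string' },
  level: { type: 'integer', default: 2 },
};

console.log(withDefaults(headingDefinitions, { content: 'A level-two heading' }).level); // → 2
console.log(withDefaults(headingDefinitions, { content: 'Smaller', level: 3 }).level); // → 3
```

Defaults are small and stable compared to selectors into the HTML, so this is a much lighter dependency than sourcing.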
