Can we read all block attributes as JSON data instead of as a mixture of JSON and HTML-sourced data? In short, the answer is yes, and it’s surprisingly easy.
Oftentimes I hear the refrain “I don’t want to use Gutenberg (the block editor) because its posts are HTML.” As I’ve already written, Gutenberg posts aren’t HTML, but in this post I want to explore a small aspect of the design that many people may not realize is available. I was inspired to explore this after a discussion about using Gutenberg as a CMS editor for data exchanged between multiple parties, each wanting to perform a different kind of operation on the content.
Before we begin, I want to take a moment to mention that my confidence has only grown over the years that serializing Gutenberg posts to HTML was the right call, and this comes from having worked with a few JSON-based systems since I last wrote on this blog. The plain text/HTML format has remained durable, interchangeable, and flexible in a way that so many JSON-backed formats haven’t. Because the HTML serialization is open for extension, we don’t have to update old servers to support new block types; and because everything can read, process, and render HTML, those posts keep serving their purpose even after the supporting JavaScript or server-side code is forgotten, unmaintained, or lost.
How block attributes are serialized
Gutenberg serializes its posts to HTML and surrounds each block with HTML comment delimiters, the opening one optionally containing the block’s attributes as JSON. Many blocks also source their attributes from the HTML inside the block itself. This eliminates a duplicate copy of the data, one that someone manually editing a post outside of the editor (an environment that’s technically not supported) might change in one place while leaving the other stale.
<!-- wp:image -->
<img src="https://my-photo.localhost/image.png">
<!-- /wp:image -->
In this example the `url` attribute for the image is pulled from the `src` HTML attribute of the first `img` tag inside the block. This comes from the `attributes` property when the block is registered (typically directly through `registerBlockType()` or indirectly through `block.json`).
{
  attributes: {
    url: {
      type: 'string',
      source: 'attribute',
      attribute: 'src',
      selector: 'img'
    }
  }
}
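To make the `source: 'attribute'` definition concrete, here’s a deliberately naive sketch of what the editor does with it. `sourceAttribute` is a hypothetical helper, not a WordPress API: the real editor runs the selector against parsed HTML, while this toy only understands a bare tag-name selector and double-quoted HTML attributes.

```javascript
// Naive stand-in for the editor's sourcing step (hypothetical helper).
// Only handles `source: 'attribute'`, a bare tag-name selector such as 'img',
// and double-quoted HTML attributes; entities like &amp; are not decoded.
function sourceAttribute(innerHTML, { source, selector, attribute }) {
  if (source !== 'attribute') {
    return undefined; // other source types are out of scope for this sketch
  }
  // First opening tag matching the selector, e.g. <img ...>
  const tag = innerHTML.match(new RegExp(`<${selector}\\b[^>]*>`, 'i'));
  if (!tag) {
    return undefined;
  }
  // The requested HTML attribute inside that tag, e.g. src="..."
  const value = tag[0].match(new RegExp(`\\s${attribute}="([^"]*)"`, 'i'));
  return value ? value[1] : undefined;
}

const urlDefinition = { type: 'string', source: 'attribute', attribute: 'src', selector: 'img' };
const url = sourceAttribute('<img src="https://my-photo.localhost/image.png">', urlDefinition);
// url → 'https://my-photo.localhost/image.png'
```

Note that both inputs are required: without the definition you don’t know where to look, and without an HTML parser you can’t look.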
All in all it’s not that complicated. When the block saves and re-serializes, the editor looks at which of the attributes are “sourced” and omits them from the JSON in the comment delimiters. This leaves a single copy of the data in the serialized block, but it means it can be hard to read the `url` attribute when you don’t have a definition for the block’s attributes, when you don’t have a convenient way to parse HTML, and especially when you have neither.
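The save-time rule can be sketched as follows. `serializeBlock` is hypothetical and much simpler than the editor’s real serializer, which among other things also skips attributes left at their default values and escapes characters that could break the comment; this sketch only models the part about sourced attributes.

```javascript
// Sketch of the save-time rule (hypothetical helper): only attributes without
// a `source` are written into the comment delimiter; sourced ones live only
// in the HTML.
function serializeBlock(name, definitions, values, innerHTML) {
  const delimited = {};
  for (const [key, definition] of Object.entries(definitions)) {
    if (definition.source === undefined && values[key] !== undefined) {
      delimited[key] = values[key];
    }
  }
  // With nothing left to store, the JSON is left out of the delimiter entirely.
  const json = Object.keys(delimited).length > 0 ? ` ${JSON.stringify(delimited)}` : '';
  return `<!-- wp:${name}${json} -->\n${innerHTML}\n<!-- /wp:${name} -->`;
}
```

With the sourced `url` definition this produces the bare `<!-- wp:image -->` form; drop the `source` key from the definition and the same value shows up as `{"url":"…"}` in the opening delimiter.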
So people might come and say, “I want to index all of the posts on the site and get a list of all the images that are linked.” They think it’s not possible or viable because they lack these attributes in a readily accessible form. The attributes are read from the “static fallback render,” which is the HTML inside the block comment delimiters. It exists primarily to make sure that blocks continue to work, at least a little, when taken out of their supported environments. We may lack the code necessary to render a block, but at least we can pass along the inner HTML as-is and get something out of it.
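To see the problem concretely, here’s a minimal sketch of a reader that looks only at the JSON in the comment delimiters. `readBlockDelimiters` is hypothetical and handles only the common cases; real code would use `parse_blocks()` in PHP or the `@wordpress/block-serialization-default-parser` package, which implement the full grammar.

```javascript
// Simplified reader for block comment delimiters such as
//   <!-- wp:image -->  <!-- wp:image {"id":5} -->  <!-- wp:spacer /-->  <!-- /wp:image -->
// Assumes the JSON sits on one line with no "}" followed by whitespace inside
// a string value, which holds for typical editor output but not in general.
const DELIMITER =
  /<!--\s+(\/)?wp:([a-z][a-z0-9_-]*(?:\/[a-z][a-z0-9_-]*)?)\s+(?:(\{.*?\})\s+)?(\/)?-->/g;

function readBlockDelimiters(post) {
  const blocks = [];
  for (const [, closer, name, json, voidSlash] of post.matchAll(DELIMITER)) {
    if (closer) continue; // closing delimiters never carry attributes
    blocks.push({
      // Block names without a namespace belong to core.
      name: name.includes('/') ? name : `core/${name}`,
      attrs: json ? JSON.parse(json) : {},
      isVoid: voidSlash === '/',
    });
  }
  return blocks;
}
```

Fed the image block from earlier, it reports `core/image` with empty `attrs`: the `url` exists only in the HTML, so a JSON-only consumer never sees it.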
Write-only static fallback renders
So here’s where an old idea resurfaces. I’ve long wished more blocks only ever wrote to the static fallback render and never read from it. Sourced attributes mean we’re reading from that safety net, and what ends up happening is that as block implementation code changes, or as WordPress or other plugins modify the stored HTML, blocks “invalidate” and the editor can’t deal with them. If we could just ignore everything inside the HTML we wouldn’t have this problem, and our block’s attributes could be fully specified by the JSON with no external definitions.
So let’s experiment and try doing this.
Remember that block definition with the `source` property in the `attributes`? We are completely free to remove it.
{
attributes: {
url: {
type: 'string',
- source: 'attribute',
- attribute: 'src',
- selector: 'img'
}
}
}
Now our image `url` is just a string attribute. If we rebuild the project, create a new post, and save an image, this is what it looks like.
<!-- wp:image {"url":"https://my-photo.localhost/image.png"} -->
<img src="https://my-photo.localhost/image.png">
<!-- /wp:image -->
Well that was quite easy, actually! We have the duplication of the data again, but when processing this block in its serialized form we can entirely ignore the HTML within.
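For the indexing use case mentioned earlier, collecting image URLs now needs nothing but the delimiters. This sketch assumes the editor’s usual output (JSON on a single line) and a hypothetical `collectImageUrls` helper; image blocks saved before the change, whose `url` still lives only in the HTML, are simply skipped.

```javascript
// Collects image URLs from core/image blocks by reading only the JSON in
// their opening delimiters; the HTML between delimiters is never inspected.
const IMAGE_OPENER = /<!--\s+wp:(?:core\/)?image\s+(\{.*?\})\s+\/?-->/g;

function collectImageUrls(post) {
  const urls = [];
  for (const [, json] of post.matchAll(IMAGE_OPENER)) {
    const { url } = JSON.parse(json);
    if (typeof url === 'string') {
      urls.push(url);
    }
  }
  return urls;
}
```

Run over every post on a site, this gives the list of linked images without a block registry or an HTML parser.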
This works as long as we can rebuild all of the blocks we want to change; it won’t work for blocks we can’t recompile. However, WordPress is extensible, so I think we can do better.
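// Run inside the editor, where the `wp` global exists.
const select = wp.data.select('core/blocks');
const dispatch = wp.data.dispatch('core/blocks');
const blocks = select.getBlockTypes();
// Drop the keys that tell the editor to read an attribute from the HTML.
// The parameter is named `definition` so it doesn't clash with the destructured `attribute`.
const withoutSourcing = definition => {
  const {
    attribute,
    selector,
    source,
    ...rest
  } = definition;
  return rest;
};
const removeSourcing = ( attributes = {} ) => {
  const without = {};
  for ( const [ name, def ] of Object.entries( attributes ) ) {
    without[ name ] = withoutSourcing( def );
  }
  return without;
};
// Re-register every block type with its attributes de-sourced.
dispatch.addBlockTypes(
  blocks.map( block => ( {
    ...block,
    attributes: removeSourcing( block.attributes ),
  } ) )
);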
Run this code after the editor initializes and every registered block will lose its sourced attributes, so every attribute will be written into the JSON in the comment delimiter. Beware: block-validation issues will appear in existing posts, because blocks saved before the change don’t carry these attributes in their JSON. If you run your own custom block editor, though, you can run this code on startup and prevent those issues from appearing (your blocks should never be serialized without the attributes we’re looking for).
If you find that too verbose, you can paste this one-liner into your browser console and give it a try. Paste it in when you have a new post open but before you add any blocks. Once you start adding content to the post, you can switch to the code view and see that everything exists structurally in the JSON, and the static fallback render is now a write-only artifact.
wp.data.dispatch('core/blocks').addBlockTypes(wp.data.select('core/blocks').getBlockTypes().map(t => ({...t, attributes: Object.keys(t.attributes).reduce((as, a) => {const {source, selector, attribute, ...rest} = t.attributes[a]; as[a] = rest; return as}, {})})))
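Since `getBlockTypes()` only exists inside a running editor, here’s a sketch of the same transformation applied to plain objects so it can be tried in Node. The helper names and `exampleBlockTypes` are hypothetical; the sample is a cut-down stand-in, not the real `core/image` registration.

```javascript
// Same stripping as the editor snippet, on plain objects (hypothetical names).
const withoutSourcing = ({ source, selector, attribute, ...definition }) => definition;

const removeSourcing = (attributes = {}) =>
  Object.fromEntries(
    Object.entries(attributes).map(([name, definition]) => [name, withoutSourcing(definition)])
  );

const stripBlockTypes = (blockTypes) =>
  blockTypes.map((blockType) => ({ ...blockType, attributes: removeSourcing(blockType.attributes) }));

// Cut-down stand-in for a registered block type, not the real core/image.
const exampleBlockTypes = [
  {
    name: 'core/image',
    attributes: {
      url: { type: 'string', source: 'attribute', attribute: 'src', selector: 'img' },
      id: { type: 'number' },
    },
  },
];

const [image] = stripBlockTypes(exampleBlockTypes);
// image.attributes → { url: { type: 'string' }, id: { type: 'number' } }
```

The stripped types are new objects, so the input definitions keep their `source` keys; in the editor it’s `addBlockTypes` that swaps the de-sourced versions in.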
When a block is registered, either through `registerBlockType` or `block.json`, it defines its attributes by listing their names and types. By default, those attributes are serialized as JSON into the “block comment delimiter.” It’s also possible to mark them as sourced attributes by specifying where in the block’s HTML to extract them from. If they are sourced, they won’t be saved in the block comment delimiter, as doing so would be redundant and invite data-synchronization issues.
So if a block defines no attributes, or defines only sourced attributes, there will be no JSON inside the block comment delimiter; this is what makes the JSON optional.
In this post we’re discussing a mechanism to open the door so that even sourced attributes are serialized inside the block comment delimiter.