David’s Model, Second take.

By: David Nuescheler

Historical Context

A long time ago, in a galaxy far, far away, I got to a point where I realized that people had a lot of different options on how they could model content in a content repository specification that I was involved in.

So in 2007 I started writing up my controversial opinions around content modeling in JCR, and to make sure that people considered it “just my opinion” and nothing that would claim universal applicability, I called it “David’s Model”.

It seems that this has helped people in the context of modeling content on an infrastructure level.

Today, I find myself very often in situations that feel the same when arguing “How to model or structure content in Adobe Experience Manager”.

The parameters and stakeholders feel very different today. I am no longer primarily worried about the underlying infrastructure, but about the user experience of a person authoring content and about reaching a great authoring experience across different content sources. To a lesser extent, I also care about the ease of developing against those structures and being able to have portable (block) code across projects and content sources.

Introduction

This document should serve as a collection of “Content Modeling” or “Content Structure” best practices as they relate to Adobe Experience Manager and, more importantly, to an intuitive authoring experience across different authoring platforms. A good way of testing a content model is to imagine an author working in different environments (e.g. Word, Google Docs, AEM, Custom Authoring etc.) and making sure that the content model can easily be constructed in an intuitive manner across all possible content sources.

These “rules” are a reflection of lessons learned from first-hand authoring and authoring support. They are rooted in experience with the real-world limitations of commonly used authoring environments like Microsoft Word Online or Google Docs, but also in the average author’s knowledge of said tooling.

Making the authoring experience intuitive, simple and fast is paramount for the long-term success of any project, as it lays the foundation for authors enjoying making updates to websites or other digital experiences.

These rules are evolving, and I would like to invite discussion and commentary on all of them.

Rule #1: Blocks are not great for authoring

Generally blocks are not great as they are surfaced as tables on the authoring side. They provide a necessary framework for an author to indicate some special functionality or design for a certain component. For authors it is often easier to work in “Default Content” wherever possible.
For developers blocks are a great way to componentize their work, so there is tension where the developer feels that having something in a block makes their lives easier, sacrificing authoring ease-of-use.

A good way to get the best of both worlds is to add Auto Blocking, which means that you infer the existence of a Block based on sequence, template or link information. A good example of this is the hero block in the boilerplate, which is created automatically when there is an image before the h1 of a page.
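The hero inference can be sketched roughly as follows. This is a simplified Python model and not the boilerplate's actual implementation: the page is represented as a flat list of hypothetical `(tag, content)` tuples, `auto_block_hero` is an illustrative name, and "before" is narrowed to "directly before" to keep the sketch short.

```python
from typing import List, Tuple

# Hypothetical page model: (tag name, text or image source).
Element = Tuple[str, str]


def auto_block_hero(elements: List[Element]) -> List[Element]:
    """Wrap a leading image + h1 pair into an inferred "hero" block.

    If the first h1 on the page is directly preceded by an image, both are
    replaced by a single ("hero", ...) entry. Otherwise the page is
    returned unchanged.
    """
    for i, (tag, _) in enumerate(elements):
        if tag == "h1":
            if i > 0 and elements[i - 1][0] == "img":
                image, heading = elements[i - 1], elements[i]
                hero = ("hero", f"{image[1]} | {heading[1]}")
                return elements[: i - 1] + [hero] + elements[i + 1:]
            break  # only the first h1 is considered
    return list(elements)
```

The author only places an image and a heading as default content; the table that would otherwise be needed for a hero block never shows up in the document.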

It is definitely an anti-pattern to take things that are represented natively as default content and put them into a block, so something like a Text, Heading or Image block yields a bad authoring experience.

Rule #2: No nested blocks for authors

To a developer it might very often be tempting to nest complex structures, which in a Word document would lead to a table inside a table. As Rule #1 states that blocks are not desirable, nested blocks are definitely a lot worse.

Consider fragments (referencing other documents) or links (with auto blocking) to reduce the authoring complexity.

Rule #3: Limit Row and Column spans

Generally we use a Column Span (merged cells) to denote the header with a block name in it. This is relatively straightforward and works well in Word and Google Docs.
There are definitely situations where more complex table structures make sense (e.g. a portion of the block content being in two columns and another portion of the block content being in three columns), but it is important to understand that creating and managing these structures can be extremely difficult, especially in Word Online, which has very enigmatic support for complex tables.

If you find yourself in a place where you have a non-trivial row and column setup with spans / merged cells, it is probably a good idea to consider a different structure.

Rule #4: Fully qualified URLs only

When referencing content, developers sometimes think in references that are relative to the host, the content repository or to their Sharepoint / Google Drive. Authors (and most humans) often think of a URL as an opaque token that they copy/paste from their browser without deciphering it into protocol, hostname, pathname, etc.
It is always advisable to just let authors work with fully qualified URLs and let either AEM or a developer do the work of extracting e.g. a pathname where needed. As a bonus, the URL can and should link to something that is easily accessible for an author from their document.
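On the developer side, extracting a pathname from whatever the author pasted is a few lines of work. A minimal sketch in Python, where `to_pathname` is a hypothetical helper name and the example URLs are made up:

```python
from urllib.parse import urlsplit


def to_pathname(authored_url: str) -> str:
    """Return the path of a fully qualified URL pasted by an author.

    Query string and fragment are dropped; a bare hostname maps to "/".
    """
    parts = urlsplit(authored_url.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected a fully qualified URL, got: {authored_url!r}")
    return parts.path or "/"


# The author pastes the address straight from the browser bar.
print(to_pathname("https://www.example.com/blog/my-post?ref=nav#top"))
```

The author never has to know which part of the URL the code actually needs.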

Rule #5: Lists?

I often find myself in a situation where a block has a list of references, something like a list of related articles, or a list of cards. In HTML a lot of those semantically should be considered lists (mostly <ul><li> combinations). For simple lists, something like some text (possibly with a link) or just regular links, a list in Word or Google Docs may be ideal.

It turns out that more complex list items are somewhere between “hard” and “practically impossible” for an author to keep in a list in Word or Google Docs.
In that case it is much easier to have the list items be rows of a block table. A good example of that is the cards block in the boilerplate project.

For simple lists where it is intuitive to have inferred semantics, e.g. a related articles block in a blog post that just contains links to the articles that should be referenced, it may be easiest to just have a single table cell inside the block containing all the links and dropping the actual list in the word processor. From a code standpoint it is usually easy to just pull all the links from a block and not specifically worry about the details of a structure in that particular block.
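To illustrate how little the code needs to care about the structure, here is a small sketch that pulls every link out of a block's HTML using Python's standard `html.parser`. The `LinkCollector` and `links_in_block` names and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, ignoring the surrounding structure."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def links_in_block(block_html: str) -> List[str]:
    """Return all link targets found in the block, in document order."""
    collector = LinkCollector()
    collector.feed(block_html)
    collector.close()
    return collector.links


# Whether the author used a real list or just one link per line does not matter.
as_list = '<div><ul><li><a href="https://example.com/a">A</a></li></ul></div>'
as_lines = '<div><p><a href="https://example.com/a">A</a></p></div>'
print(links_in_block(as_list) == links_in_block(as_lines))
```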

Rule #6: Buttons need to inherit from context

In many design guides we find buttons as a common element across many blocks and default content. In many cases they are outlined in all their variations (e.g. primary vs. secondary, sizes, colors, etc.) at the beginning of a design specification together with the specified colors and fonts.

In projects we found that it is intuitive for authors to treat links that are on a line by themselves as buttons. In many cases it is important to inherit from the block and section context that a particular button is in to make the authoring experience easy.

As an example, if a button (read, “link by itself on a line”) is a part of a hero block, it might assume a certain bigger size, or if a button is in a section that has an inverted background color it might need to automatically switch to a different foreground / background color combination.

There are cases where within a given section / block context the author needs to be offered a set of explicit choices (e.g. primary vs. secondary button), and in those cases we use combinations of bold and italic, usually bold for an explicit primary button and italic for an explicit secondary button.

It is conceivable that within a given block / section context there are more than four options for an author to choose from, in which case other formatting options like underline, strikethrough etc. could be used. However, this is extremely rare and usually an indication that a decision that should be made within the design system is delegated to the author, leading to a less intuitive authoring experience.
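The button conventions from this rule can be summarized in a small decision function. This is only a sketch: the `Link` model, the `button_classes` name and the class names (`large`, `inverted`) are hypothetical, and a link that is both bold and italic is deliberately left without an explicit variant since that combination is project specific.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical model of an authored link and its formatting."""
    href: str
    bold: bool = False
    italic: bool = False


def button_classes(
    link: Link,
    alone_on_line: bool,
    block: Optional[str] = None,
    section_style: Optional[str] = None,
) -> List[str]:
    """Return CSS-like class names for a link; an empty list means "plain link"."""
    if not alone_on_line:
        return []  # links inside running text stay regular links
    classes = ["button"]
    if link.bold and not link.italic:
        classes.append("primary")  # explicit author choice
    elif link.italic and not link.bold:
        classes.append("secondary")  # explicit author choice
    # Everything below is inherited from context, not chosen by the author.
    if block == "hero":
        classes.append("large")
    if section_style == "inverted":
        classes.append("inverted")
    return classes
```

The author's only decisions are "link on its own line" and, where needed, bold or italic; size and color follow from where the button sits.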

Rule #7: Filenames matter to authors

There are a few content management systems that append trailing slashes to all their URLs, and when migrating from websites that are powered by those systems an intuitive approach could be to map every single URL to an index (.docx or gdoc) inside a folder. The downside of this approach is that the filenames are not really useful anymore for authors when they are searching for files in Google Drive and Sharepoint.

A better approach is to remove the trailing slashes from the URLs and redirect with a 301 (usually from the redirects spreadsheet) from the existing URLs with a trailing slash to the URLs without a trailing slash.

(Related: the same approach should also be used for other undesirable URLs, for example URLs that end in .html.)
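Producing the source and destination pairs for such redirects is easy to automate during a migration. The following sketch assumes a plain list of existing URL paths as input; `clean_path` and `redirect_rows` are hypothetical helper names, and how the resulting pairs end up in the redirects spreadsheet is left out.

```python
from typing import Dict, Iterable


def clean_path(path: str) -> str:
    """Strip a .html suffix or trailing slashes from a URL path."""
    if path.endswith(".html"):
        path = path[: -len(".html")]
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return path or "/"


def redirect_rows(existing_paths: Iterable[str]) -> Dict[str, str]:
    """Map each legacy path to its clean destination.

    Paths that are already clean are skipped, so they do not produce
    a redirect onto themselves.
    """
    rows: Dict[str, str] = {}
    for source in existing_paths:
        destination = clean_path(source)
        if destination != source:
            rows[source] = destination
    return rows


for source, destination in redirect_rows(["/blog/", "/about.html", "/pricing"]).items():
    print(source, "->", destination)
```

Special cases such as index.html files are intentionally not handled here and would need their own mapping.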

There are situations where this change results in too much of a temporary SEO impact, in which case rewriting the URLs on the CDN may be the appropriate option. However, this should only be done if there is a quantifiable business impact. Long term it is more desirable to have a clean URL that maps directly to the corresponding file in Sharepoint or Google Drive.

Rule #8: Access Controls and Content Grouping

It often makes sense to group content similar to how authoring teams are organized. A good way of thinking about this is that if you have a team that looks after the blog section on your website, technical documentation, support content, a particular country / language or a product, it makes sense to keep that content together and make sure that the corresponding team has access to the content.

In organizations this often happens naturally, and it is intuitive for organizations to break up their content teams similar to the structure of their URL space. In some cases the URL structure doesn’t align with authoring teams (and the corresponding access control groups). In such a case, combining multiple AEM projects on the CDN tier, tying together content from different sources, may be the right approach if the complexity and size of a site (single domain) gets overwhelming.
While your content source (Sharepoint and Google Drive) possibly supports complex access control models, it is desirable to keep Access Control as simple as possible from a management standpoint.

Both Sharepoint and Google Drive have a concept for grouping content that helps to manage access control in a simple manner. In Sharepoint they are called “Sites” or “Libraries” and in Google Drive they are called “Shared Drives”. Both of those have predefined access control roles that are advantageous to use for simple group membership based access. Unless there are specific access control requirements, it is recommended to keep access control to these out-of-the-box (OOTB) groups.

Sites/Libraries and Shared Drives are built to work well with a certain team and content size and complexity.

Sharepoint in particular has best practices on access control complexity within a library, and creating access control complexity beyond that may yield undesirable results. In case you get to a place where it becomes unnatural to manage either the size of the content or the size and complexity in a single Site/Library or Shared Drive, it is likely that it makes sense to break things up into multiple different Sites/Libraries or Shared Drives and blend the content together from different projects on the CDN.

Rule #9: Number of Blocks and Variants

Over the lifecycle of a website it is common that new blocks and variants are added. Especially for developers that are not very familiar with the existing block library of the project it is usually the easiest path to just add net new blocks or variants to make sure that there is no regression with existing content.

While it is probably not easy to avoid the sprawl of blocks and variants on projects that have a lot of functionality or justified requirements for a lot of visually diverse content within a single project, it is important to make sure that the core set of blocks and variant combinations that authors need to use commonly is limited. There are situations where special blocks are used infrequently on sites and are often placed by developers initially. Those blocks probably don’t need to be exposed to authors at the same level in documentation or a block library as the commonly used blocks.

More generally, large block libraries and a lot of variant / section metadata combinations are less desirable. Maintaining a “minimum use” criterion for blocks and variants based on a content report is a good practice to deprecate and remove superfluous code from the block library that’s exposed to authors.
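A “minimum use” check can be as simple as counting on how many pages each block appears. The sketch below assumes a hypothetical content report in the form of a mapping from page path to the block names found on that page; the real shape of such a report depends on the project's tooling.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rarely_used_blocks(content_report: Dict[str, Iterable[str]], minimum_use: int) -> List[str]:
    """Return block names that appear on fewer than `minimum_use` pages, sorted by name."""
    pages_per_block: Counter = Counter()
    for blocks_on_page in content_report.values():
        # A block used several times on one page still counts as one page.
        pages_per_block.update(set(blocks_on_page))
    return sorted(name for name, pages in pages_per_block.items() if pages < minimum_use)


# Made-up report: page path -> blocks (with variants) found on that page.
report = {
    "/": ["hero", "cards", "cards"],
    "/blog/a": ["cards", "quote"],
    "/blog/b": ["cards"],
    "/legacy": ["carousel (wide)"],
}
print(rarely_used_blocks(report, minimum_use=2))
```

Blocks that never appear in the report are not listed by this function, so a comparison against the full block library would be a natural follow-up step.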

Rule #10: Limit number of Columns

A large number of columns is not a good idea for authoring as there are practical horizontal screen / document size limitations. More importantly, this is usually a symptom that content is split into small values that do not reflect a proper use of default content and the implied HTML semantics.
There are some exceptions to this rule in cases where data is represented in a table, as opposed to content that should be a part of document semantics, and in those cases it is often useful to go the name-value pair route via a spreadsheet instead.

Rule #11: Use the block collection content models

The AEM Block Collection is a great source for well designed content models. If your block is producing a similar feature set as one of the blocks in the Block Collection, the content model should be similar.

Rule #12: Fragments may be harmful

Fragments are very useful when the same content is used across a lot of different pages. Obvious good examples are header (navigation) and footer information that is identical throughout a site. These are great examples especially since authors of individual pages do not have to worry about that content showing up on their page, and there is no authoring impact.
Using Fragments may also be useful in situations where there is an explicit selection of content that is used across many pages of a site, such as a sign up form, a legal disclaimer, etc., and the content appears on a page but is not really a part of the canonical content of the page.

It is important to note that using a fragment comes at a cost of complexity and indirection for authors. Instead of seeing the actual content that is on a page, an author only sees a reference to a fragment, which makes it much less intuitive for authors to make changes and gauge the impact of their changes across pages. This is amplified even further in cases of nested fragments.

Along those lines, from an SEO standpoint it is only advisable to use fragments at times where having duplicate content is acceptable (meaning that the content inside a fragment doesn't carry significant SEO weight for that page), hence content that is relevant from an SEO standpoint should always be placed on the page directly.

Rule #13: Don't overload image alt-text semantics

It may be convenient at times to put extra information hidden away into image alt-texts, but this is only recommended in exceptional cases.
Alt-texts often cannot be easily discovered by authors, as there is very little indication of their existence in common document authoring environments (e.g. Word or Google Docs).
Depending on the type of copy/paste operation the alt-text may be lost without the author noticing, and if the alt-text contains special semantics, authors will have to be familiar with specific semantics within the value of an alt-text of individual images on a per block basis.