Performance improvements in Drupal 11.3 and Paragraphs

Besides a host of other improvements, such as full OOP support for hooks and preprocess in themes, Drupal 11.3 includes a large number of performance improvements. Some of them enabled and inspired further improvements in Entity Reference Revisions, which result in considerable performance gains for Paragraphs.

Combined, these improvements save hundreds of queries on complex pages built with Paragraphs. See the detailed results at the end of this blog post.

Bootstrap and cached page response improvements

Middlewares are a mechanism for executing code very early in the Drupal 8+ bootstrap phase. Examples of this are the page cache and the IP ban module. They allow very fast responses with a minimal memory footprint. This concept also made it possible to move these features into modules in Drupal 8, which makes them optional and allows for alternative implementations.

Middlewares can be compared to layers of an onion, each layer capable of returning a response early (on a page cache hit, or in case of a blocked IP) before the center (the full Drupal bootstrap) is reached. Or at least, so we thought.

Only a few months ago, @znerol noticed that this hadn't worked as intended since before Drupal 8.0 was even released: every layer was initialized up front, including all of its dependencies and, specifically, the event dispatcher, which is injected into the center of the onion.
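The difference between building every layer up front and building inner layers only on demand can be sketched as follows. This is a minimal Python illustration, not Drupal's PHP implementation; `Kernel`, `PageCache`, `eager_stack` and `lazy_stack` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# Records which (hypothetical) services were constructed, in order.
constructed: List[str] = []


class Kernel:
    """Stand-in for the expensive center of the onion (full bootstrap)."""

    def __init__(self) -> None:
        constructed.append("kernel+event_dispatcher")

    def handle(self, path: str) -> str:
        return f"rendered {path}"


class PageCache:
    """Outer layer: answers from cache and only reaches inward on a miss."""

    def __init__(self, inner_factory: Callable[[], Kernel], cache: Dict[str, str]) -> None:
        constructed.append("page_cache")
        self._inner_factory = inner_factory  # stored, not called yet
        self._cache = cache

    def handle(self, path: str) -> str:
        if path in self._cache:
            return self._cache[path]  # early return, inner layer untouched
        response = self._inner_factory().handle(path)
        self._cache[path] = response
        return response


def eager_stack(cache: Dict[str, str]) -> PageCache:
    # Every layer is built up front, even if the outer layer answers alone.
    kernel = Kernel()
    return PageCache(lambda: kernel, cache)


def lazy_stack(cache: Dict[str, str]) -> PageCache:
    # The inner layer is only built when a request actually reaches it.
    return PageCache(Kernel, cache)
```

On a cache hit, `eager_stack` has already paid for constructing the kernel and its dependencies, while `lazy_stack` only constructs the page cache layer.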

With Drupal 11.1, this suddenly had a much bigger impact. While publicly a different API and system, the new OOP hook system was internally built on top of the event dispatcher in Drupal 11.1/11.2. Suddenly, the typical Drupal site didn't have a few dozen event subscribers but several hundred. And every single one was injected as a reference into the event dispatcher through a separate method call, on every request. That also considerably increased the container size (which has to be fetched and unserialized from cache on every request), leading to much higher memory usage.

One important fix in this release corrects the loading of all layers at once. Additionally, the internal data and information on hooks is now stored in key value and cached in the bootstrap bin. This reduced the size of the container considerably, by around 30-50% on tested example sites. The relative improvements to page cache response times and memory usage due to these two changes are massive. And while the absolute savings are not that significant, many Drupal sites serve a large number of page cache hit responses. This makes it possible to serve fresh content to a larger number of users without more complex infrastructure (such as a CDN with support for tag-based invalidation).

Metric	Drupal 10.4	Drupal 11.2	Drupal 11.3
Response time	6.36ms	12.3ms	5.29ms
Peak Memory	3.18MB	9.67MB	2.35MB
Network	106kB	180kB	122kB

Entity revision caching and bulk loading

Until Drupal 11.3, loading a specific revision of an entity had no caching at all, neither an in-memory cache for multiple load calls in the same request nor a persistent cache for repeated loads over multiple requests. Regular entity load usually has both.
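As a rough illustration of what an in-memory revision cache combined with bulk loading buys, here is a minimal Python sketch. It is not Drupal's storage code: the class, its method names and the dictionary standing in for the database are assumptions made for this example, and the persistent cache layer is left out entirely.

```python
from typing import Dict, Iterable, List, Optional


class RevisionStorage:
    def __init__(self, table: Dict[int, dict]) -> None:
        self._table = table                 # stand-in for the database
        self._static: Dict[int, dict] = {}  # in-memory cache for this request
        self.queries = 0                    # counts simulated database queries

    def _query(self, revision_ids: List[int]) -> Dict[int, dict]:
        # One simulated query, no matter how many revisions are requested.
        self.queries += 1
        return {rid: self._table[rid] for rid in revision_ids if rid in self._table}

    def load_multiple_revisions(self, revision_ids: Iterable[int]) -> Dict[int, dict]:
        wanted = list(dict.fromkeys(revision_ids))  # de-duplicate, keep order
        missing = [rid for rid in wanted if rid not in self._static]
        if missing:
            self._static.update(self._query(missing))
        return {rid: self._static[rid] for rid in wanted if rid in self._static}

    def load_revision(self, revision_id: int) -> Optional[dict]:
        return self.load_multiple_revisions([revision_id]).get(revision_id)
```

Loading the same revision four times now costs a single query, and loading several revisions together costs one query for all cache misses instead of one query each.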

This might not seem like a big issue at first, since loading non-default revisions does not seem like something that's done on regular requests. However, changes to how access is checked for various links, such as the Edit and Delete local tasks, resulted in loading entities by revision, even for anonymous users. Additionally, Paragraphs and Layout Builder inline blocks rely on loading entities that are referenced through a specific revision. Paragraphs/Entity Reference Revisions had some workarounds in place, but did not load entities in bulk.

Performance tests using the Umami demo profile, which were recently added to Drupal core, exposed about 80 additional queries, mostly because the same node revision was loaded around 4 times. The total number of queries varies per scenario; in one especially impactful scenario there are 170 queries in total, so almost 50% of the queries could be avoided.

There had been an open issue to add caching to entity revisions since 2015, but it tried to solve many different scenarios at once, which resulted in considerable complexity. While working on this, the new performance tests not only confirmed that those extra queries were eliminated, but also exposed a considerable number of additional cache tag lookups. We were able to simplify the proposed changes and focus on the most important aspects, which still resulted in the same performance gains in the performance tests with minimal extra cost.

Based on this, I updated and have already merged an existing issue in Entity Reference Revisions to fully benefit from bulk loading and revision caching when using Drupal 11.3, with partial improvements on earlier versions as well. In our own performance test scenario with 25 paragraphs, including multiple nested groups, this saved an additional 68 queries on a cold cache bootstrap.

Those performance tests, however, exposed that some paragraphs were being loaded again by their ID when they had already been loaded by their revision. Paragraphs load their parents for access checks through their IDs, and that was no longer hitting the cache: the load optimization in Entity Reference Revisions mentioned above led to paragraphs being stored in the cache only by revision ID and not by ID. We created an issue to optimize that case specifically for Paragraphs (https://www.drupal.org/project/paragraphs/issues/3559239).

These changes, in combination with other performance improvements that happened in core, inspired another idea that saves another 100-200 queries on our example pages: an extended preloading mechanism in entity formatters, implemented in the Entity Reference Revisions project. It first loads all nested referenced paragraphs in as few grouped steps as possible, then identifies other entities referenced by those paragraphs and loads them together as well. On our example page, those 25 paragraphs reference a total of 14 media entities: card images, icons and so on. Before this change, they were all loaded one by one, each resulting in around 10 extra queries due to all their fields, or 140 queries in total. Now they are all loaded at once, in 10 queries total. This feature is still in development and will only be supported on Drupal 11.3 and later.
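The two-phase idea described here (walk nested paragraphs level by level, then load everything else grouped by type) can be sketched in Python as follows. This is an illustration under assumptions, not the Entity Reference Revisions code: `Loader`, `load_one_by_one` and `preload_grouped` are hypothetical names, and each `load_multiple` call stands in for one batch of field queries regardless of how many entities it covers.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, int]  # (entity type, entity id)


class Loader:
    """Hypothetical storage: maps each entity to the entities it references."""

    def __init__(self, db: Dict[Key, List[Key]]) -> None:
        self.db = db
        self.load_calls = 0  # each call stands in for one batch of field queries

    def load_multiple(self, entity_type: str, ids: List[int]) -> Dict[Key, List[Key]]:
        self.load_calls += 1
        return {(entity_type, i): self.db[(entity_type, i)] for i in ids}


def load_one_by_one(loader: Loader, root_ids: List[int]) -> int:
    """Follow references depth-first, loading each entity separately."""
    seen: Set[Key] = set()
    stack: List[Key] = [("paragraph", i) for i in root_ids]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        stack.extend(loader.load_multiple(key[0], [key[1]])[key])
    return loader.load_calls


def preload_grouped(loader: Loader, root_ids: List[int]) -> int:
    """Load paragraphs level by level, then all other entities grouped by type."""
    loaded: Set[int] = set()
    others: Dict[str, Set[int]] = defaultdict(set)
    frontier = list(dict.fromkeys(root_ids))
    while frontier:
        loaded.update(frontier)
        next_level: List[int] = []
        for refs in loader.load_multiple("paragraph", frontier).values():
            for ref_type, ref_id in refs:
                if ref_type != "paragraph":
                    others[ref_type].add(ref_id)
                elif ref_id not in loaded and ref_id not in next_level:
                    next_level.append(ref_id)
        frontier = next_level
    for entity_type, ids in others.items():
        loader.load_multiple(entity_type, sorted(ids))
    return loader.load_calls
```

With two root paragraphs, two nested paragraphs and three media entities, the one-by-one variant needs one load per entity, while the grouped variant needs one load per nesting level plus one for all media.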

And that's not the end. We're exploring additional ways to load entities more efficiently, for example by rendering at least some entities through lazy builders, to benefit further from additions that I'll dive into in the next chapter.

Fibers, render and other caching improvements

Fibers are a new PHP feature that allows execution to be suspended, typically while waiting on external activity such as an HTTP request or a database query. They do not run code in parallel like multithreading does, but they allow other tasks to be executed while waiting on something, using explicit suspend and resume management.

Drupal 10.2 started using fibers for rendering certain parts, such as blocks. Not much happened in practice, because few if any places actually suspended a fiber to give others a chance to do something while a slow operation such as an HTTP request happened.

Drupal 11.3 introduces a different idea of how fibers can help: using them to group multiple load operations/queries together. That is now done for path aliases and entities. In a nutshell, when code that loads an entity or alias is running inside a fiber, such as within a block, it suspends the fiber, hoping that other fibers will also load an entity of the same type. When the first fiber of a given group is resumed, it loads all collected entities. The other fibers then find their requested entities already in the in-memory cache when they resume. The current performance tests only see limited gains, as regular blocks are currently the only thing that actively uses fibers/lazy builders like this. There are plans to expand this to rendered entities, specifically in views, menu links, alternative page builders such as Layout Builder and Canvas, and more.
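The grouping idea can be approximated with Python generators standing in for PHP fibers. This is a simplified sketch under assumptions: `Storage`, `block` and `render_all` are hypothetical names, and here a small scheduler performs the bulk load, whereas in the mechanism described above the first resumed fiber does it.

```python
from typing import Dict, Generator, List

# A "fiber" yields the entity id it needs, receives the entity, returns markup.
Fiber = Generator[int, dict, str]


class Storage:
    def __init__(self, rows: Dict[int, dict]) -> None:
        self.rows = rows
        self.queries = 0  # counts simulated database queries

    def load_multiple(self, ids: List[int]) -> Dict[int, dict]:
        self.queries += 1
        return {i: self.rows[i] for i in ids}


def block(entity_id: int) -> Fiber:
    # "Suspend" by yielding the id; execution resumes with the loaded entity.
    entity = yield entity_id
    return f"<div>{entity['label']}</div>"


def render_all(fibers: List[Fiber], storage: Storage) -> List[str]:
    # Start every fiber; each one runs until it asks for an entity.
    pending = [(fiber, next(fiber)) for fiber in fibers]
    # All requests are known now, so one bulk load serves the whole group.
    entities = storage.load_multiple(sorted({eid for _, eid in pending}))
    output: List[str] = []
    for fiber, eid in pending:
        try:
            fiber.send(entities[eid])  # resume with the requested entity
        except StopIteration as done:
            output.append(done.value)  # the generator's return value
    return output
```

Four blocks requesting three distinct entities result in a single simulated query instead of four.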

Lazy-building rendered entities is a good example of something that could further expand on the optimizations gained from bulk loading of entities. Currently, while those 14 media entities are loaded in bulk, each of them contains a file entity that's still loaded one by one.

Some blocks have a large number of variations and are at the same time very quick to build. One such example is the language switcher, which is different on every page. Until now, it was set to max-age 0, but this can bubble up and was a blocker for respecting max-age in the internal page cache. The combination of "CacheOptionalInterface is now also supported for blocks" and "Use #placeholder_strategy_denylist for CacheOptionalInterface blocks to prevent them being rendered by BigPipe" allows such blocks to skip caching and also avoid BigPipe rendering, which can require around 60kB of JavaScript.

Performance tests make it much easier to identify repeated or unnecessary queries or caching. It's often fairly easy to just add some caches, but sometimes they were either not really needed in the first place or became obsolete due to other changes. Several examples of that landed in Drupal 11.3:

"Remove persistent views:unpack_options caching" removed around 40 cache writes during a route rebuild in Umami (the exact number depends on the number of views; in our project it's around 80). This cache was added to Views all the way back in 2009, but changes to Drupal since then mean the problematic code paths are called far less frequently, and profiling showed that, thanks to improvements in PHP, just running the code is now faster than loading that information from the cache.
Performance tests exposed that CKEditor on multilingual sites fetched the same information from the cache many times in a single request, so an in-memory cache was added on top of it. Since it is only a small cache with a single variation, it was also moved to the fast-chained discovery bin ("Refactor _ckeditor5_get_langcode_mapping()").
"Config overrides are loaded for English even when translate_english is false" removes a large number of queries in some use cases.

Performance tests were also instrumental in identifying various performance issues in contributed projects and verifying their resolution, for example:

Scheduler 2.2.2 includes considerable improvements to its caching implementation
A small, single-variation cache in the Token module, which was already a significant performance improvement, was moved to the fast-chained bootstrap bin
Redirect runs a unique query on every unique URL. A new prefix cache similar to path aliases was added that can avoid this query on many requests that do not have any redirects that may apply. This change was verified with a specialized performance test in the redirect project as well as our distribution performance tests.
"Workaround for bootstrap causes expensive rebuild" removes an extra and now unnecessary rebuild of theme data in Webform on Drupal 11.3+
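The prefix cache idea mentioned for Redirect can be illustrated with a short Python sketch. The exact data structure the module uses is not described here, so this assumes the simplest variant: remembering the first path segment of every redirect source and skipping the query whenever a request's first segment is not in that set. `RedirectLookup` and its methods are hypothetical names.

```python
from typing import Dict, Optional, Set


class RedirectLookup:
    def __init__(self, redirects: Dict[str, str]) -> None:
        self._redirects = redirects  # stand-in for the redirect table
        self.queries = 0             # counts simulated database queries
        # Built once and cached: first path segment of every redirect source.
        self._prefixes: Set[str] = {self._prefix(src) for src in redirects}

    @staticmethod
    def _prefix(path: str) -> str:
        return path.strip("/").split("/", 1)[0]

    def find(self, path: str) -> Optional[str]:
        if self._prefix(path) not in self._prefixes:
            return None  # no redirect can apply, so the query is skipped
        self.queries += 1
        return self._redirects.get(path)
```

Requests under a prefix that has no redirects at all never reach the database, while requests under a known prefix still run the lookup query.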

A very recent improvement for cold cache scenarios is a change to the fast chained backend that avoids unnecessary writes to the fast backend and so-called cache stampedes, which can happen when many requests start the bootstrap at the same time and write the same caches in parallel.

Further cold cache improvements are "Optimize field module's hook_entity_bundle_info() implementation" and "Optimize EntityFieldManager::buildBundleFieldDefinitions()". Together, they eliminate a large number of config lookup queries, especially on sites with many bundles such as paragraph types or webforms.

Combined impact on Primer and measuring this using performance tests and profiling

We recently wrote our first performance test for our internal Primer distribution, to evaluate the impact of the Drupal 11.3 improvements and to identify additional issues in the contrib projects that we use. The test assesses the behavior and executed queries on a page cache hit, on a dynamic page cache hit and miss, and on completely cold caches.

While the measured changes to the warm cache scenarios are fairly minimal between 11.2 and 11.3 (the performance test does not assert/compare the memory usage and execution time directly), the differences on cold caches are massive and almost certainly bigger than in any previous Drupal core release.

Cold Cache

Figure: Graph that highlights the query and cache optimizations in Drupal 11.3

This simulates the first request on completely cold caches by manually emptying all cache tables before the request. Notable impacts are the massive reduction in data cache bin lookups due to the removed views cache, various smaller changes that together reduce the default cache bin lookups by half, and the likewise halved entity cache bin lookups, which, unlike the other two, do not mean that fewer entities are loaded but that the same entities are loaded in bulk. Config cache bin reads and writes are significantly reduced due to the field and translation improvements. Finally, cache tag lookups are also reduced because the views cache also used per-view cache tags.

Metric	11.2	11.3	Reduction
Total Query count	1097	420	62%
Total Cache lookups	991	522	47%
Total Cache writes	893	462	48%
Highlighted cache bins:
data lookup	95	5	95%
default lookup	80	36	55%
config lookup	498	197	60%
entity lookup	69	29	58%
config writes	451	150	67%
Cache tag lookups	69	35	49%
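For reference, the Reduction column is the relative change between the two releases, (before - after) / before. A small Python check using a few rows from the table above (the function name is just for this example):

```python
def reduction_percent(before: int, after: int) -> int:
    """Relative reduction from `before` to `after`, rounded to a whole percent."""
    return round((before - after) / before * 100)


# Values copied from the cold cache table above (11.2, 11.3).
cold_cache = {
    "Total Query count": (1097, 420),
    "Total Cache lookups": (991, 522),
    "Total Cache writes": (893, 462),
    "config writes": (451, 150),
}

for label, (before, after) in cold_cache.items():
    print(f"{label}: {reduction_percent(before, after)}%")
```

A negative result means the value increased between the two releases.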

Second page request

This profile is the second request, to a different page. Many global caches have already been built. This page contains 50 paragraphs in multiple nested groups, more menu links and a view. There are significantly fewer cache writes than cache reads because many lookups, especially for config, are already cache hits. There is a slight increase in cache tag lookups because the additional revision load cache tags are no longer masked by the removed views cache tags.

Metric	11.2	11.3	Reduction
Total Query count	696	274	61%
Total Cache lookups	562	373	34%
Total Cache writes	179	105	41%
Highlighted cache bins:
entity lookup	109	43	61%
config lookup	245	145	41%
config writes	17	9	47%
Cache tag lookups	35	39	-11%
Cache deletes	3	0	100%

Related: The remaining cache lookups are also considerably faster with Redis module 2.x compared to Redis 1.x

Reducing the query count on those pages by around 60% is great, but it also needs to be said that executing ~400 queries to render a page is still a huge amount. The cold cache request in particular includes a large number of config, routing and menu queries that will mostly be served from cache on subsequent requests. There's still a lot of potential in loading more entities in bulk and in other optimizations.

The performance test has been instrumental in identifying issues and asserting gradual improvements, but such tests can be challenging to run fast and reliably and maintain over time. We'll write about how to write such a test, for whom it is useful and our learnings around them in a separate blog post.

If you are interested in a similar assessment of your project to identify and improve bottlenecks or are otherwise struggling with a slow Drupal website, reach out to us!