Performance Problems in Old Features

Unraveling past development work is sometimes the opposite of a quick win

As certain parts of the codebase age, features that were developed years ago no longer fit current needs, have low usage, or can even have detrimental effects.

Too often, people treat features as "done" once they hit production and move on to the next new thing. This is especially true in an industry like publishing and media, where an organization that started in (or still produces) print is used to the idea that once an issue goes to press, it's a done deal. One thing I'm working on culturally is to be better about raising these issues so that improvements to existing features get the same emphasis as new feature development.

I've mentioned quick wins in previous posts. This is about the opposite; I have been going through a process of documenting, convincing, and planning in an attempt to improve a problematic feature with low usage. Sometimes, there was even an intentional step backwards that allowed me to move the performance needle forward.

Reviewing a legacy feature

For context, I work in media, and my company's sites are several years old. The sites are traditional (i.e. not headless) WordPress, and a lot has changed since they were developed: features have been added, there has been staff turnover and new hires, and browsers have become increasingly capable.

As I was looking at a waterfall chart, I noticed images being loaded late in the page.

Partial waterfall chart
Resources #54, #57, #58, and #59 in the waterfall chart are content images from an image gallery in an article. Resources #55 and #56 are for ads.

These images are outside the viewport. In fact, they are part of a modal carousel of images that the user can view as part of an article. They are initially hidden until the user opens the gallery.

Screenshot of the image gallery that appears in an article. Large image on the left with an optional caption on the right

Since these images are hidden until a user opens the gallery, they aren't candidates for the Largest Contentful Paint, so the fact that they are not among the first resources loaded was perfectly fine. In fact, one thing I wanted to change was to push them even later in the waterfall, or maybe not load them at all.

What are the problems?

The code that runs the gallery hasn't seen many changes over the past few years. I personally hadn't worked on it other than small bug fixes, but the waterfall pointed out some issues.

Everything is too eager

When the page is loaded, the images for the article gallery sit in a hidden <div> that contains the contents of the gallery. Since they are regular <img> tags with a src attribute, the browser loads them even though they are hidden and might never be displayed to the user.

Although the images for the gallery are loaded with lower priority than the LCP, they still are loaded when the page is requested. In an application that's constrained by bandwidth (as opposed to CPU) like these sites typically are, this means the images for the gallery compete for bandwidth with ads.

Bad, blocking JavaScript

Since the images were loaded too eagerly, all the JS and CSS for the gallery was shipped in the initial page load as well. The default, out-of-the-box behavior for WordPress is to load scripts in a way that blocks rendering, so the image gallery had an effect on the initial render, even though it isn't initially visible.

All the newer JS is loaded with a deferred script tag, but this feature is old enough that it lives in the render-blocking bundle from the days when there was only one bundle on the sites. I have tried to move it to the deferred bundle before, but it's deceptively tangled and unnecessarily complicated.
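For code that only matters once a reader opens the gallery, one option beyond the deferred bundle is to fetch it on first interaction. Below is a minimal TypeScript sketch of that idea, not what the sites currently do. The helper name loadOnce, the ./gallery module path, and its open() export are all hypothetical.

```typescript
// Sketch only: wraps any async loader so it runs at most once.
// Repeated calls share the same promise, so a second click does not
// trigger a second download.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      // If the load fails, forget the promise so a later click can retry.
      pending = load().catch((error: unknown) => {
        pending = undefined;
        throw error;
      });
    }
    return pending;
  };
}

// Hypothetical usage in the browser ("./gallery" and open() are assumptions):
//
//   const getGallery = loadOnce(() => import("./gallery"));
//   openButton.addEventListener("click", () => {
//     getGallery().then((gallery) => gallery.open());
//   });
```

With something like this, the gallery code would cost nothing on the page views where nobody opens it, at the price of a short delay the first time someone does.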

The carousel itself could also be improved. It uses Owl Carousel, a defunct, unmaintained plugin for the now obsolete jQuery. Unfortunately, jQuery itself is a hard dependency to shake if you're using a WordPress multipage app because so many plugins in the ecosystem rely upon it, but something could be done about the carousel.

The modal part of the image gallery is one of several implementations of a modal on these sites. Given that the modals lack consistency in their behavior, it is also something to consider as issues are addressed.

Problems with the images

The LCP image in the article uses AWS Serverless Image Resizing and is served from an Amazon CloudFront distribution. It's a proper CDN and serves the images in WebP to browsers that support it.

Unfortunately, that wasn't the case with the images for the gallery. In the waterfall chart above, the content images called out in the caption are served directly from AWS S3 in JPEG format.

This is bad for a few reasons.

  • S3 is not a CDN. Users far away from our AWS region experience latency with anything served from there.
  • S3 uses HTTP/1.1. Browsers cap the number of parallel connections per host, so image requests queue behind each other, and every connection that gets opened carries its own setup cost.
  • The images were not resized any smaller than 700px wide. For image-heavy articles, this could mean multiple MBs of wasted bandwidth.
  • S3 is merely storage and does not transform the images to modern formats like AVIF or WebP.

Remediation plan

When I looked at the site analytics, less than 4% of article page views included a view of the modal image gallery. This usage data can help inform decisions about how to address the performance issues. I also keep this quote in mind:

[Make the browser] do less work. Avoid work completely, if at all possible.

Stoyan Stefanov

With such low usage, it doesn't make sense to load the gallery so eagerly, and from a product perspective, it might not even make sense to have this feature at all.

That, of course, is something that requires further discussion, but I raised enough commotion that these discussions are happening.

As the fate of this feature hangs in the balance, I've effectively drawn attention to the performance problems. At a bare minimum, I've done enough to show that this feature has a sneaky revenue impact because it competes with ads, and we'll at least address the performance issues if we don't scrap it entirely.

Assuming that we rewrite the gallery, I do have a plan to address the issues.

Removing render blocking

At my company, it's perfectly fine if I change things that don't result in visual differences, and I do this often in other areas. A while back, I was surprised how many carousels were used on the sites in other areas. These carousels also used Owl, but I replaced them with a much lighter, maintained, and dependency-free one that I found on Bundlephobia.

List of articles displayed in a 3 item grid on wider screens that is paginated with a carousel
Carousels were used in cross-promotional areas on multiple templates throughout the site.

When I did this, I was able to remove Owl Carousel from the global dependencies file. That took the on-disk JS for the carousel from 43 KB down to 15.5 KB on all page templates other than the article template. More importantly, the new carousel dependency on these pages was now deferred.

The old carousel was added as a separate dependency to the article page template only, and this is the intentional step backwards I was talking about. There were now 2 carousel libraries on the article pages: the old one for the modal carousel and the new one for the cross-promotional areas.

This effectively put us into a transitional period, but I knew this was OK as long as it didn't stay this way for too long. It gave the image gallery work a sense of urgency, since we needed to get back to 1 carousel dependency, and it also meant that my changes to the other carousels were easier to QA because they weren't wrapped up with other development work.

Correcting the images

One thing I did a while ago was to make sure that the images for the gallery had srcset and sizes attributes, and this improved the overall image payload for the images that are rendered server-side.

However, in some cases, images without responsive properties on them are pushed into the article gallery with JavaScript, which is especially true for older posts that were migrated from another CMS.

The plan to address this is to use an edge worker to generate the srcset and sizes attributes if they're not already present.
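As a rough illustration of the attribute logic such a worker might run, here is a minimal TypeScript sketch. It is not tied to any particular edge platform, and several details are assumptions: the width query parameter is a stand-in for whatever URL scheme the image resizer actually expects, and the default widths and the 100vw sizes value are placeholders.

```typescript
// Sketch only: the attributes of an <img> that matter here.
interface ImgAttrs {
  src: string;
  srcset?: string;
  sizes?: string;
}

// Fills in srcset and sizes when they are missing and leaves existing
// values untouched. The "?width=" parameter is hypothetical.
function withResponsiveAttrs(
  img: ImgAttrs,
  widths: number[] = [320, 480, 700], // assumed candidate widths
  sizes: string = "100vw"             // assumed layout width
): ImgAttrs {
  if (img.srcset && img.sizes) {
    return img; // already responsive, nothing to do
  }
  const separator = img.src.indexOf("?") === -1 ? "?" : "&";
  const generated = widths
    .map((width) => `${img.src}${separator}width=${width} ${width}w`)
    .join(", ");
  return {
    ...img,
    srcset: img.srcset ?? generated,
    sizes: img.sizes ?? sizes,
  };
}
```

Keeping this as a pure function over attributes means the same logic could be reused wherever the markup ends up being rewritten, whether at the edge or in the script that pushes migrated images into the gallery.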

Browser support and rendering strategy

The image gallery was written when IE10 was a viable browser. We dropped IE10 years ago, but still kept IE11 around until this year. As many headaches as supporting IE caused me for years, it did make it easy to set a support policy because you could essentially use it as the lowest common denominator.

I recently set a new browser support policy, also using analytics to get relevant information about our site users and especially the paying subscribers. Here is what that looks like currently, and I plan on reviewing and updating it every 6 months.

If this feature is rewritten instead of scrapped, I can use more modern features to make the gallery more performant. For example, I can now use the <template> tag to defer the gallery creation until it's absolutely necessary, which I couldn't do with the old support matrix. I will also use a lazy loading strategy for the images in the gallery to make sure they aren't all loaded when the gallery is initialized.

This will lighten the initial load of the article and also prevent too much from being loaded once the user opens the gallery, since some galleries contain 15 images or more.
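To make the lazy loading part concrete, here is a minimal TypeScript sketch of one possible strategy: only the current slide and its immediate neighbours get a real src. It assumes the gallery markup (for example, cloned out of the <template>) stores each image URL in a data-src attribute, that the carousel does not wrap around, and that a lookahead of 1 is enough. The names slidesToLoad and loadNearbyImages are hypothetical.

```typescript
// Sketch only: the few element methods used below. Real <img> elements
// satisfy this shape, and so do simple test doubles.
interface LazyImage {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}

// Indices of the slides worth loading for the current position
// (no wrap-around is assumed).
function slidesToLoad(current: number, total: number, lookahead: number = 1): number[] {
  const indices: number[] = [];
  for (let offset = -lookahead; offset <= lookahead; offset++) {
    const index = current + offset;
    if (index >= 0 && index < total) {
      indices.push(index);
    }
  }
  return indices;
}

// Promotes data-src to src for the nearby slides only; images that were
// already promoted have no data-src left and are skipped.
function loadNearbyImages(
  images: ArrayLike<LazyImage>,
  current: number,
  lookahead: number = 1
): void {
  for (const index of slidesToLoad(current, images.length, lookahead)) {
    const image = images[index];
    const pendingUrl = image.getAttribute("data-src");
    if (pendingUrl !== null) {
      image.setAttribute("src", pendingUrl);
      image.removeAttribute("data-src");
    }
  }
}
```

The idea would be to call loadNearbyImages once when the gallery opens and again each time the slide changes, so a 15-image gallery starts with two requests instead of fifteen.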

Conclusion

It can be difficult at times to get people to address issues on existing features that already "work." I'd love to say that this was my first attempt at getting an image gallery rewrite on the docket, but it took a lot of time and care on my part in order to sell it because I believed it was important.