Swimming team, Savitar 1956

Need for speed 2: Newspaper data diving, metrics and methodologies

Welcome to the weeds, fellow bit-twisters and data divers. We can chat here without worrying about the numeracy nonbelievers. This post details the methodologies used in “Need for speed 1: Newspaper load times give ‘slow news days’ new meaning.”

Pingdom Website Speed Test

First, you and I both know “load time” is a fickle metric, completely dependent on the user’s connection speed and location. These screenshots of Pingdom Website Speed Test results show load times for The Washington Post: first, 3.76 seconds from Dallas; a minute later, 16.46 seconds from San Jose, just a few states away.

I hope the number of results in this report (1,455 newspapers) smooths out these variances enough to approach an overall average of actual load times. But maybe not. So, as with the Tools We Use reports, consider the data herein more generally informative than statistically precise.

Methodology

The steps to improve the accuracy of my results include:

  • Rerunning tests for the 100-plus sites with load times greater than 50 seconds (more than two standard deviations above the mean), which returned more reasonable results for 95 percent of those sites.
  • Visually inspecting all sites with load times under four seconds and deleting any that weren't really news sites (loosely defined as regularly updated, linked lists of news articles and sections).
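The two screening rules above can be sketched in a few lines of Python. The 50-second and four-second cutoffs come from this post; treating "two standard deviations" as the mean plus two sample standard deviations is an assumption, and the function names and sample numbers are hypothetical, not survey data.

```python
from statistics import mean, stdev


def two_sigma_cutoff(values):
    """Return mean + 2 * sample standard deviation (assumed reading of '2 SD')."""
    return mean(values) + 2 * stdev(values)


def flag_for_review(load_times, retest_above=50.0, inspect_below=4.0):
    """Split sites into a rerun list (very slow) and a visual-check list (very fast).

    load_times maps a hypothetical site name to its load time in seconds.
    """
    retest = sorted(site for site, t in load_times.items() if t > retest_above)
    inspect = sorted(site for site, t in load_times.items() if t < inspect_below)
    return retest, inspect


if __name__ == "__main__":
    # Hypothetical load times (seconds); not results from the survey.
    sample = {
        "a.example": 3.2,
        "b.example": 12.5,
        "c.example": 14.0,
        "d.example": 61.0,
        "e.example": 9.8,
    }
    retest, inspect = flag_for_review(sample)
    print("rerun:", retest)      # sites slower than 50 s
    print("inspect:", inspect)   # sites faster than 4 s
    print("2-sigma cutoff:", round(two_sigma_cutoff(list(sample.values())), 1))
```

Fixed cutoffs keep the rules easy to apply by hand; the computed two-sigma value is there only to check that a fixed cutoff sits about where the statistical rule would put it.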

Perhaps I can divert you from my data deficiencies with some eye candy:

Good correlation to load time: requests and page weight; less correlated: Google score and Alexa rank

Note the nice funnels of correlation for requests and page weight (at the top). I expected PageSpeed scores, Alexa rank and other factors to also correlate well. None did.

So I grabbed a bunch of other data, including bounce rates, DOM elements and monthly visits. But requests and page weight remained the only factors I found in lock-step with load times:

                     Correlation with             Averages
                     Load time   Speed index      Mean       Median
Load time            n/a         0.663            16.7 s     13.0 s
Requests             0.740       0.540            281        228
Page weight          0.683       0.461            4.7 MB     3.9 MB
Speed index          0.663       n/a              5.9 s      6.4 s
Widgets              0.345       0.250            256        259
Desktop score        -0.274      -0.171           52/100     54/100
Mobile score         -0.224      -0.188           45/100     45/100
Mobile UX            0.199       0.164            94/100     99/100
DOM elements         0.074       0.125            1,594      1,577
Rank                 -0.071      -0.069           733,473    311,108
Bounce rate          -0.014      -0.063           56.6%      57.4%
Circulation          0.004       0.040            90,605     23,592
Pages per session    0.004       0.115            2.9        2.5
Visits/mo.           -0.006      0.030            627,872    45,000
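The post doesn't name the correlation coefficient behind the table; assuming it is Pearson's r, a minimal Python sketch of computing it for one factor against load time looks like this. The sample arrays are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-site values, one entry per newspaper homepage.
    load_times = [8.2, 11.5, 13.0, 19.4, 27.1]  # seconds
    requests = [140, 190, 260, 300, 410]        # HTTP requests per page
    print(round(pearson(requests, load_times), 3))
```

Pearson's r is sensitive to extreme values, so rerunning the 50-second outliers before computing it would noticeably affect coefficients like these.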

I now have a sea of data I'm just starting to wade through. I'll be looking at other load-time correlations, such as with CMS and servers. If you have suggestions, please comment below.

Sources

The newspaper data came mostly from API calls to:

The "Tools We Use 1" methodologies section details how we compiled the list of newspapers and identified their CMS and servers. The URL tested for all results was the newspaper's homepage. The WebPagetest settings used Chrome from Dulles, Virginia, simulating cable bandwidth.
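As a rough illustration of those settings, the Python sketch below builds (but does not send) a test request. The endpoint and the parameter names (`url`, `k`, `location`, `connectivity`, `runs`, `f`) are assumptions based on the classic public WebPagetest `runtest.php` interface; the service has changed since this test was run, so check the current documentation before relying on them. The function name and API key are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: classic public WebPagetest endpoint; verify against current docs.
RUNTEST_ENDPOINT = "https://www.webpagetest.org/runtest.php"


def build_runtest_url(page_url, api_key, location="Dulles:Chrome",
                      connectivity="Cable", runs=1):
    """Build (but do not send) a WebPagetest request for one homepage.

    Defaults mirror the settings described in the post: Chrome from
    Dulles, Virginia, on a simulated cable connection.
    """
    params = {
        "url": page_url,
        "k": api_key,             # assumed name of the API-key parameter
        "location": location,     # assumed "location:browser" format
        "connectivity": connectivity,
        "runs": runs,
        "f": "json",              # ask for a JSON response
    }
    return RUNTEST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_runtest_url("https://www.example.com/", "YOUR_API_KEY"))
```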

Averages

In the top table of "Need for speed 1," the "U.S. newspapers" averages are of all results for load times, requests and page weights (from WebPagetest.org) and of desktop scores (from Google PageSpeed Insights).
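A minimal Python sketch of averaging per-site results into a mean and a median for each metric. The field names and sample records are hypothetical; only the idea of averaging every tested homepage's result comes from the post.

```python
from statistics import mean, median

# Hypothetical field names for one tested homepage's results.
METRICS = ("load_time_s", "requests", "page_weight_mb", "desktop_score")


def summarize(records, metrics=METRICS):
    """Return {metric: (mean, median)} over the records that report each metric."""
    summary = {}
    for metric in metrics:
        values = [r[metric] for r in records if r.get(metric) is not None]
        if values:  # skip metrics that no record reported
            summary[metric] = (mean(values), median(values))
    return summary


if __name__ == "__main__":
    sites = [
        {"load_time_s": 9.0, "requests": 150, "page_weight_mb": 2.8, "desktop_score": 61},
        {"load_time_s": 12.0, "requests": 230, "page_weight_mb": 3.9},
        {"load_time_s": 35.0, "requests": 520, "page_weight_mb": 8.1, "desktop_score": 40},
    ]
    for metric, (avg, mid) in summarize(sites).items():
        print(f"{metric}: mean {avg:.1f}, median {mid:.1f}")
```

Reporting both is useful here: a long tail of very slow sites pulls the mean above the median, which is consistent with the 16.7-second mean and 13.0-second median load times for newspapers.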

The averages of "All sites" for requests, page weights and desktop scores are from the HTTP Archive (September 2015). Average load time for all sites is a slippery statistic. To get close, I used the admittedly shaky method of finding the load time at which Pingdom Tools switches from reporting "Your site is slower than X% of sites" to "Your site is faster than …" That happened at about 5.3 seconds.
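One way to formalize that flip-point estimate: given load times paired with the verdict Pingdom returned, take the midpoint between the slowest "faster" result and the fastest "slower" result. This is a hypothetical Python sketch, not the procedure actually used; it assumes the verdict depends only on load time. The flip point presumably marks the 50th percentile of Pingdom's tested sites, so it approximates a median rather than a mean.

```python
def crossover(observations):
    """Estimate the load time at which a verdict flips from 'faster' to 'slower'.

    observations: iterable of (load_time_seconds, verdict) pairs, where verdict
    is 'faster' or 'slower'. Assumes the verdict is monotonic in load time.
    """
    observations = list(observations)
    faster = [t for t, v in observations if v == "faster"]
    slower = [t for t, v in observations if v == "slower"]
    if not faster or not slower:
        raise ValueError("need at least one observation of each verdict")
    lo, hi = max(faster), min(slower)
    if lo >= hi:
        raise ValueError("verdicts overlap; not monotonic in load time")
    # The true flip point lies somewhere in (lo, hi); report the midpoint.
    return (lo + hi) / 2
```

More observations near the boundary narrow the gap between `lo` and `hi`, and with it the uncertainty of the estimate.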

And I'll note way down here, where no one will notice: "Widgets" isn't really only widgets; it's everything BuiltWith calls a technology, from a plugin for WordPress or jQuery to WordPress and jQuery themselves (so the total is proportional to the number of requests). I included this factor only because it was one of the few with decent correlation to load time.

Thanks to BuiltWith for donating an account to RJI for this project. Thanks to Michael Jenner and Randy Picht for direction, and to Harsh Taneja for deviations (of the standard kind). The top image comes from the University of Missouri yearbook Savitar, 1956.

I’ll leave you with one last data dump:

                 Load time (s)    Requests         Page weight (MB)
                 Mean   Median    Mean   Median    Mean   Median
All sites        16.7   13.0      281    287       4.7    3.9
By CMS:
  Drupal         13.8   11.1      218    213       3.9    3.4
  WordPress      15.5   12.4      194    152       6.0    4.0
  BLOX CMS       17.6   13.9      348    295       5.5    4.7
By server:
  Apache         13.5   10.4      192    154       3.8    3.0
  Nginx          16.0   12.5      235    224       5.0    3.6
  IIS            22.2   20.0      391    317       5.2    4.9
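A breakdown like the one above is a group-by: collect each site's value under its CMS (or server), then average within each group. A minimal Python sketch, with hypothetical field names and records:

```python
from collections import defaultdict
from statistics import mean, median


def group_summary(records, key, metric):
    """Return {group: (mean, median)} of `metric`, grouped by each record's `key`."""
    groups = defaultdict(list)
    for r in records:
        # Skip sites missing either the grouping field or the metric.
        if r.get(key) is not None and r.get(metric) is not None:
            groups[r[key]].append(r[metric])
    return {g: (mean(v), median(v)) for g, v in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical records, not survey data.
    sites = [
        {"cms": "Drupal", "server": "Apache", "load_time_s": 10.2},
        {"cms": "Drupal", "server": "Nginx", "load_time_s": 14.9},
        {"cms": "WordPress", "server": "Nginx", "load_time_s": 12.1},
        {"cms": None, "server": "IIS", "load_time_s": 21.7},
    ]
    print(group_summary(sites, "cms", "load_time_s"))
    print(group_summary(sites, "server", "load_time_s"))
```

Groups can differ a lot in size, and averages for a CMS or server with only a handful of sites are less stable than those for a large group.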