Your blog deserves a great Content Delivery Network

While I do a lot of blogging, I suck at marketing my blog. Oh, I do look at who’s viewing my blog and check my statistics daily, and often more than once a day. Google Analytics provides a wealth of data on my web hits, and StatCounter is useful to see what was recently read. Aside from dressing up my blog’s sidebars with marketing stuff and making sure my content is easily accessible as a newsfeed, I can’t seem to be bothered to do much else.

Part of the problem is that my blog serves principally to keep me amused and to stave off boredom. If readers find an occasional post worthy of a Facebook Like or a Share, that’s nice, but I don’t lose sleep when they don’t. You would think that as a software engineer and someone who spent ten years directing the management of the largest web site in the U.S. Department of the Interior, I might find this web marketing business pretty easy. But one thing I learned early on is that if you have great content, the marketing kind of takes care of itself.

In that job I simply worked to make the content more readily accessible and to make sure the data was easily consumed. I spent much of my ten years there leading an effort to make the site’s data accessible as a set of web services. In this sense I do know marketing. When I left, these web services constituted the third most accessed site in my agency, despite not having existed just a few years earlier.

On this blog, though, my traffic is pretty anemic, particularly during the summer. There are things I could do to get more hits: shorter posts, more topical posts, turning it into more of a stream-of-consciousness blog that links ruthlessly to posts in other blogs, which seems to be the way blog aggregators like Tumblr work. Doing this, though, would ruin blogging for me. It might be successful, but I wouldn’t care. I’d be bored with my own blog.

During one of the recent Net Neutrality debates I mentioned that the Internet was already not net neutral. If you can afford little, you may (shudder) use an EarthLink dial-up account and watch web pages slowly draw themselves like they did in 1995. If you can afford $100 a month or more for Internet, or live in a place like Kansas City where you can get Google Fiber, you can cruise the Internet at 100 megabits per second or more. Some people have 1 Gbps connections.

If you have your own web site, other factors limit its speed as well. That’s the case with this blog. I host the site on hostgator.com, which is a really good shared web host. What’s not optimal about HostGator is that while it can reliably serve most content for $5 or so a month, getting the data between its servers and your computer can be like going through every traffic light in town to get home from work, as opposed to taking the expressway. It typically took eight or more “hops” to get my blog posts to my computer. A “hop” in this case means a router, which is effectively a traffic light as it routes parts of web pages from one place to another. According to Google Analytics, it took about ten seconds to load one of my web pages. Most of that was due to all those routers that had to be traversed.
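
If you are curious about your own site, you can measure both things yourself. Below is a minimal sketch, assuming a Unix-like system with the standard traceroute tool installed; example.com is a placeholder for your own domain:

```python
import subprocess
import time
import urllib.request

host = "example.com"  # placeholder: substitute your site's hostname

# Each numbered line of traceroute output represents one router "hop".
trace = subprocess.run(["traceroute", host], capture_output=True, text=True)
hops = [line for line in trace.stdout.splitlines()
        if line.strip()[:1].isdigit()]
print(f"{len(hops)} hops to {host}")

# Time a bare page fetch. This measures only the HTML document, not
# images, stylesheets or scripts, so full page loads take longer.
start = time.time()
urllib.request.urlopen(f"http://{host}/").read()
print(f"Fetched the page in {time.time() - start:.2f} seconds")
```

Run from home, this gives a rough feel for how many “traffic lights” sit between you and your host.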

So it finally dawned on me that this was probably a significant reason my traffic was declining. Google looks at the hassle factor of getting content from my site, and is probably lowering my search rankings because of it. Aware of the problem, for several years I have used CloudFlare to try to speed up the serving of my content. CloudFlare is a content delivery network, or CDN. It specializes in reducing the number of traffic lights and making sure that my content travels over crazily fast connections, usually from a server physically close to where you are. HostGator (like a lot of web hosts) offers CloudFlare for free to its customers. CloudFlare, like every CDN, sells a more expansive service for those with deeper pockets.
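
One easy sanity check, sketched below, is to look at the response headers: as I understand it, CloudFlare normally stamps the responses it serves with headers such as cf-ray and cf-cache-status (verify against your own site; the URL is a placeholder):

```python
import urllib.request

# Placeholder URL: substitute a page on your own site.
resp = urllib.request.urlopen("https://example.com/")
for name in ("server", "cf-ray", "cf-cache-status"):
    print(f"{name}: {resp.headers.get(name)}")

# "server: cloudflare" or any cf-ray value suggests the request went
# through CloudFlare's edge; a cf-cache-status of HIT suggests it was
# answered from cache rather than fetched from the origin host.
```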

I had outsourced my CDN to CloudFlare, but I never went back to see whether it was doing a good job. There are probably things I could do to cache more of my content on CloudFlare’s servers (probably for money), but mostly I stuck with its defaults and ignored it. However, when I looked at Google Analytics, my average page load time was still stuck at around ten seconds.

Ten seconds is a long time to wait for content these days. So I figured I was probably losing a lot of readers, particularly mobile users, who lose patience and go elsewhere. We want every web page to load like a Google web page: fully dressed for our eyes in a couple of seconds or less.

But not my blog. It was like a horse-drawn milk wagon compared with a racing car. Actually, this describes a lot of sites on the web, particularly Mom and Pop affairs where the owners know little or nothing about web architecture.

I decided to put on my software engineering hat and started researching CDNs some more. There’s a lot of competition in the market, mostly aimed at well-moneyed corporations. Mine is just a little blog, however. And this blog runs on WordPress. What options do I have for a swift CDN that won’t cost me an arm and a leg? CloudFlare was free, but it clearly wasn’t doing the job.

After some research I settled on MaxCDN.com. For about $9 a month it will serve my pages quickly. Of course, if traffic increases a whole lot it could get a lot more expensive. But if I am content to use principally their servers in Europe and the USA (which is where most of my readers are) and I expect a terabyte or less of bandwidth a month, then $9 a month should be fine. I can afford that. My pages seem to load in about three seconds now. A lot of the sidebar stuff comes from elsewhere, so that slows things down a bit. But the main content, if it is cached, takes about a second to load. That’s pretty impressive for $9 a month. And this faster speed might draw in new readers.

So far it’s looking good. Today’s traffic is roughly double what it was two days ago. Over time Google may take notice and rank my posts higher in their search engine. Here’s hoping.

Does your blog or website need a CDN too? It can’t hurt if you can afford it, and it can’t hurt to do your research and see which CDN is best optimized for your kind of content. MaxCDN has a plugin that works with WordPress to ease the integration. It was a little tedious to get it configured, but the instructions were clear enough. Some of it is kind of wonky (how many people know what minifying is, anyhow?) but the more technical you are, the more you can fine-tune things.
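
For the curious, minifying just means stripping comments and unneeded whitespace out of CSS and JavaScript so fewer bytes travel over the wire. A toy illustration of the idea (real minifiers, including whatever MaxCDN’s plugin uses, are far more careful than this):

```python
import re

def minify_css(css: str) -> str:
    """Crude CSS minifier: comments out, whitespace collapsed."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # strip comments
    css = re.sub(r"\s+", " ", css)                   # collapse whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)     # tighten punctuation
    return css.strip()

print(minify_css("""
    /* sidebar styles */
    .sidebar {
        color: #333;
        margin: 0 auto;
    }
"""))
# Prints: .sidebar{color:#333;margin:0 auto;}
```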

Please note that you don’t need a CDN if you are using a hosted blogging platform like Tumblr, BlogSpot or WordPress.com. They are already effectively CDN platforms as well as blogging sites. But if you host your own site and you want to increase traffic, integrating your site with the right CDN may be the most cost-effective way to go.

I’ll be watching my metrics and perhaps reporting success or failure in the months ahead. So far the signs look good.

It’s official: SiteMeter no longer gives a damn

Once upon a time, when you wanted to meter your site, SiteMeter was the only solution. I started metering my blog with SiteMeter around 2004 because that’s what all the cool blogs were doing. Not that my “impressions” (page views) were ever that impressive, at least according to SiteMeter. Their meter went up and down, but generally I saw somewhere between one hundred and two hundred page views a day.

As I documented elsewhere, their metrics were grossly inflated, as they counted obvious search engines, which are not human beings. Still, it was useful for getting a general snapshot of blog traffic. One click got you an up-to-the-minute report. Google Analytics makes you log in, and by default you are always a day behind. Despite its shortcomings, SiteMeter was useful. It excelled at simple reports that were always just one click away.

Around six a.m. on September 24, SiteMeter stopped metering my blog. The reports still come up, but they just show zero traffic. Of course, this blog’s web traffic had not stopped, as evidenced by the fact that you are reading this. Both Google Analytics and StatCounter showed the usual site traffic. I thought maybe my tracking code had expired, but when I was finally able to log in to the SiteMeter manager and review my tracking code, I found that it had not changed. So then I figured maybe they just weren’t aware that they weren’t capturing my blog’s statistics. So I sent them a support request. More than a week later, I have still heard nothing.

Granted, it is hard to give me much attention when I don’t pay them anything. Most of SiteMeter’s customers don’t pay them. This limits us webmasters to the last 100 page views or visits plus overall statistics, but they still have plenty of opportunities to make money from me. Every time I go to check a SiteMeter report, no fewer than two ads appear, one on the top and one on the side. And I typically checked the site a half dozen or so times during the day.

Go to SiteMeter’s web site today and it suggests that no one is minding the store. Their latest announcement was in February 2009. Their newest widget is for Windows Vista. They will still take your money quickly enough, if you want to pay for their service. It’s not worth paying for when there are so many superior and free alternatives. Why pay for a service when they cannot be bothered to maintain the site or troubleshoot problems? I imagine they hired some hacks to put the whole thing in the Amazon cloud and just forgot about it. To the extent they pay attention to it, it is to collect Google AdSense revenue. It probably pays for plenty of margaritas at the bar close to their deck chair along a beach in the Bahamas.

Not that all my metering with SiteMeter has been cut off. I also use SiteMeter on two other domains, and there it continues to run fine. Its statistics, of course, are bogus and inflated as well, but I can still look at SiteMeter reports for those domains. For more authoritative statistics, I go into Google Analytics.

However, Google Analytics tells you far more than you need to know. It’s an amazing product, just overkill for all but the most diehard web statisticians. SiteMeter’s user interface is simple, usable and clean. What I really need to do is emulate its reports and tie them directly into Google Analytics. Being lazy, however, I just haven’t gotten around to it. I’ve searched around to see if someone has taken the time to build SiteMeter-like reports for Google Analytics. If they have, I can’t find them, or they are afraid of a lawsuit from SiteMeter’s lawyers. However, if I roll my own, I figure they’ll never know. So when I find some free time for the project, I plan to do it. It looks straightforward if you can write some code to parse an XML file.
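
For flavor, here is roughly how simple the parsing side would be with Python’s built-in XML library. The feed layout below is invented for illustration; the real Google Analytics export format is a namespaced Atom feed, so the element names would differ:

```python
import xml.etree.ElementTree as ET

# A made-up, simplified stand-in for an exported Analytics report.
feed = """
<report>
  <entry><date>2011-10-05</date><visits>143</visits><pageviews>174</pageviews></entry>
  <entry><date>2011-10-06</date><visits>125</visits><pageviews>143</pageviews></entry>
</report>
"""

for entry in ET.fromstring(feed).findall("entry"):
    print(entry.findtext("date"),
          entry.findtext("visits"), "visits,",
          entry.findtext("pageviews"), "page views")
```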

Like Craigslist Casual Encounters, tracking your site with SiteMeter now appears to be simply a waste of your time. So I’ll be removing my tracking code. No reason that I should give them my business since they obviously don’t care about retaining it.

Web statistics are untrustworthy

Like many site owners, I monitor my web traffic. And every year I rediscover what Disraeli observed long ago: there are three kinds of lies: lies, damned lies and statistics. The problem with web statistics is that it is often hard to discern who is lying and by how much.

Most of us site owners care principally about one thing: how many eyeballs are looking at our site. And the answer turns out to be that no one really knows for sure. If you collect statistics using a hosted package like Awstats, it will accurately tell you how many overall hits and page requests you received, but it does a poor job of discerning which of those represent eyeballs on the other end rather than search engine robots and crawlers.
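
The usual workaround is filtering on the User-Agent string, which well-behaved robots announce. A rough sketch of the idea (the log lines and the bot list here are made up; a real list is long and needs constant updating):

```python
import re

# A tiny, illustrative list of crawler signatures.
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp|crawler|spider", re.I)

log_lines = [
    '1.2.3.4 - - [05/Oct/2011] "GET /post/ HTTP/1.1" 200 "Mozilla/5.0"',
    '66.249.0.1 - - [05/Oct/2011] "GET /post/ HTTP/1.1" 200 "Googlebot/2.1"',
]

human_hits = [line for line in log_lines if not BOT_PATTERN.search(line)]
print(f"{len(human_hits)} of {len(log_lines)} requests look human")
```

Of course, nothing stops a robot from lying about its User-Agent, which is exactly why every counter ends up with different numbers.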

Based on my research, not even the mighty Google really knows. Because Google has tons of resources to throw at the web statistics problem, I figured they should know best. But it turns out that even Google can be fooled. At least that’s what I infer, because around June 28, 2011 the number of page views per day that Google Analytics reported for my site dropped roughly in half and has stayed that way. The same was not true of SiteMeter and StatCounter, which were also tracking my site usage.

Date        Page views (Google Analytics)
06-26-2011  307
06-27-2011  337
06-28-2011  164
06-29-2011  165
06-30-2011  155
07-01-2011  135
07-02-2011  116

Was I upset that fewer people than I thought were hitting my blog? Not really. I had been thinking for months that Google Analytics was overstating my page views, since their numbers were higher than anyone else’s, including SiteMeter’s. Sure, a higher number is always more flattering than a lower number, but the average person arriving via a search engine is not reading three pages on my site, which is what Google Analytics was suggesting. Get real. No, the average human glances at some post found via a search engine, then quickly moves on. Anyhow, as you can see, around June 28, 2011 Google Analytics started applying a new algorithm, filtering out about half the page requests it used to count. What I suspect happened is that they realized they were counting a whole mess of automated requests as humans.

At least Google eventually realized their mistake. As I noted some time ago, SiteMeter simply does not care. For years it has included the Google search engine robot, among other search engines and robots, among my visitors and page views. Yes, it’s technically true they visited, but clearly no human was looking at my site. I guess if the agent can fire off the embedded JavaScript that pings SiteMeter, that’s good enough for SiteMeter. What’s clear is that SiteMeter has basically given up bothering to care. They were one of the first to market in this business, developed a huge market share, and now are apparently only interested in the revenue from selling ad space when you go to their site to check on your statistics.

To get an idea of what’s wrong with web statistics these days, let’s look at visits and page views for this last week. Which statistics provider would you trust? Google Analytics, SiteMeter or StatCounter?

Date       Visits                             Page views
           Google   SiteMeter   StatCounter   Google   SiteMeter   StatCounter
10/5/11    143      149         161           174      209         188
10/6/11    125      152         153           143      195         176
10/7/11    105      130         116           124      211         139
10/8/11    80       113         97            100      150         114
10/9/11    104      131         111           119      227         135
10/10/11   149      183         175           179      259         209
10/11/11   116      164         146           123      298         159
Total      822      1022        959           962      1549        1120

Increase over Google Analytics:
           --       +24%        +17%          --       +61%        +16%

Granted, each may use different criteria for when a day begins and ends. The good news is that since Google now does a better job of filtering requests, it consistently shows the lowest number of visits and page requests, hence I am more likely to trust it. But since they made a major change to their counting algorithm in June, all of their 2011 statistics for this site are thrown off, which makes the overall numbers pretty useless.

SiteMeter obviously does not care, since you only have to look at the visit details to see that many of them come from googlebot.com. It would be a simple matter to filter these out, but SiteMeter would rather sell ads than improve its filters. Overall, SiteMeter counts 24% more visits and 61% more page views than Google Analytics.
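
The comparison row in the table is simple arithmetic; a few lines reproduce it from the weekly totals:

```python
# Weekly totals from the table above.
totals = {
    "Google":      {"visits": 822,  "page views": 962},
    "SiteMeter":   {"visits": 1022, "page views": 1549},
    "StatCounter": {"visits": 959,  "page views": 1120},
}

# Each service's totals relative to Google Analytics' totals.
for service in ("SiteMeter", "StatCounter"):
    for metric in ("visits", "page views"):
        excess = totals[service][metric] / totals["Google"][metric] - 1
        print(f"{service} reports {excess:+.0%} {metric} vs. Google Analytics")
```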

StatCounter appears to be doing a pretty good job. Its numbers are about 15-20% higher than Google Analytics’, but at least they track proportionately with Google Analytics. Moreover, StatCounter clearly actively maintains its product, so it has some integrity. I have some sympathy for those in this business. It must be very hard to provide reliable statistics, because there is never any way of knowing for sure whether a human is at the other end. Then there are all sorts of “in the background” web hits enabled by Web 2.0 technologies, such as redrawing Google Maps. On the web, anyone with the right technical knowledge can pretend to be a human. All a statistics service can really do is make reasonable inferences and continuously update its filters as new trends emerge.

Products like Google Analytics do a great job of slicing and dicing the data they decide to count. I particularly like some of the newer features, like the ability to see on a map the states and cities that provide most of your hits, and statistics that show how many mobile users you have and what kind of devices they are using. Given their inability to wholly discern human traffic from automated traffic, even those statistics are suspect. Still, Google is one of the few providers capable of producing statistics like this, and it does so for free. So even numbers that are probably somewhat off are still valuable.

The ultimate lesson for site owners: take your web statistics with a grain of salt. In particular, realize that SiteMeter is a tainted product, useless for meaningful statistics, and useful only for getting some idea of what pages were most recently viewed. In fact, you might as well get rid of your SiteMeter tracking code altogether.

More problems with SiteMeter

I first started metering my blog using SiteMeter back in 2004 because it was free and it did not have much competition. It solved the general problem of knowing who was accessing my web site in a simple way that still seems quite elegant: create a SiteMeter account, slap some code into your site’s templates and you were done. The only alternative we had back then was to hope our web hosts had installed a package like Awstats. In many ways, SiteMeter was better than Awstats because it filtered out a lot of the noise. Awstats was not that good at separating “real” visits and page views from “fake” ones. “Fake” visits and page views are accesses by search engines; they don’t represent an actual human being reading your site. SiteMeter and similar services can tell real viewers from fake ones because they depend on the browser to read and execute some embedded JavaScript. The JavaScript in your browser essentially “pings” a remote server, passing on information about your access. Search engine robots generally cannot be bothered.
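
To make the mechanism concrete, here is a conceptual sketch of the receiving end of such a ping, written as a tiny Python server. Everything here (the port, the fake pixel) is invented for illustration; it is not SiteMeter’s actual implementation:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A crawler that never executes the page's JavaScript never
        # makes this request, which is how script-based counters
        # screen out most (but not all) robots.
        print("visit from", self.client_address[0],
              "agent:", self.headers.get("User-Agent"))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.end_headers()
        self.wfile.write(b"GIF89a")  # stub bytes standing in for a 1x1 pixel

# The metered page would include a script tag that requests something
# like http://localhost:8000/ping, passing details in the query string.
HTTPServer(("", 8000), PingHandler).serve_forever()
```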

Over the years, I have developed my suspicions about how accurate SiteMeter was. However, it was at least a common benchmark, since SiteMeter was also metering most other prominent websites. At least it gave you some idea of your relevant web traffic. Back in 2006, Google became a serious player in the site analytics business. In 2007 I began also monitoring my site with Google Analytics.

Over the last few months, it became clear to me that SiteMeter was missing many page requests. While it came close to matching the number of visits to my site, it was way off on the number of page views. For example, here are statistics from the last six days for my site:

Date      Google Analytics      SiteMeter
          Visits   Page views   Visits   Page views
3/22/10   174      286          147      212
3/23/10   167      418          152      201
3/24/10   150      319          165      190
3/25/10   127      296          143      176
3/26/10   143      369          146      189
3/27/10   87       289          91       144
Total     848      1977         844      1112

What explains the difference? One small factor is that Google Analytics tracks days in Pacific time, while SiteMeter tracks in Eastern time. But that cannot account for much: Google Analytics reported 865 more page views than SiteMeter over the same six-day period, meaning SiteMeter missed more than forty percent of them!

I really don’t think Google Analytics is creating artificial page views. As best I can figure, SiteMeter is either not getting notified of these additional page views or, more likely, some of the pings are getting lost on the Internet and never actually arriving at SiteMeter. Why would this be? This is speculation, of course, but Google has much deeper pockets than SiteMeter. I suspect they have more servers listening on the edge of the cloud than SiteMeter does. If correct, this means a Google Analytics “ping” has fewer routers to hop through on its journey, so it is more likely to be recorded. Part of the problem may be SiteMeter’s more precarious revenue stream. I don’t pay SiteMeter for monitoring my site, which means the only money they make from me is from serving ads when I (or others) visit SiteMeter to see my statistics.

There are other issues with SiteMeter that show they are getting sloppy. SiteMeter is also counting the Google search engine as a visitor, which artificially inflates my page views. If you are being metered by SiteMeter, you may be affected as well. Look at Recent Visitors by Details. If, for example, you see “googlebot.com” as the domain with a large number of page views, it’s pretty obvious that these are not human beings reading your site but the Google search engine indexing it. This problem has persisted for months, and I have brought it to SiteMeter’s attention. They clearly don’t consider fixing it a priority, which implies they are not very concerned about the accuracy of their statistics.

I can understand that keeping track of the myriad search engines out there is a large challenge. I am sure Google has the same issue, but I am also confident that Google has the resources to make sure my statistics are clean. It sure appears that SiteMeter does not, or gives much lower priority to us non-paying customers.

SiteMeter is still useful to me as a quick way to check usage on my site. It gives me an idea of whether a certain post has gained in popularity and who has visited the site recently. However, it is clearly, at best, a rough record of your site’s actual usage, and it is probably underreporting your page views. You would be wise not to read too much into its statistics. If you have not added Google Analytics tracking code, you might want to do so.