Web Page Load Time can Positively Influence Rankings

November 13th, 2009

Long Post Alert

This turned into a rather long post, so it’s probably a good thing I did it on a Friday.
However, this is a very important issue, so set aside some time to read through the whole thing.

I was reading a summary of topics that were presented at PubCon, written by Rand Fishkin over at SEOmoz. One in particular caught my eye, as I have been giving this very advice to clients for most of 2009: “Web Page Load Time can Positively Influence Rankings.”

It makes sense, doesn’t it? Google’s success is based on focusing on three key areas: user experience, monetization, and branding. Often, these principles may seem to be at odds, but it is the struggle that produces innovation. A tripod couldn’t stand if the legs weren’t pushing against each other right?

In this case, Google is focusing on user experience to feed branding, and may be sacrificing short-term revenue. They may be giving less focus to websites showing their content network ads and greater focus to established businesses that can afford to concentrate on optimizing load times.

The Most Important Issue in Load Times

One of my favorite anecdotes from my web career comes from a few years back when I was interviewing for a graphic designer position. A well-meaning fellow came to us with experience in ColdFusion and PaintShopPro (we were a .Net and Illustrator department at the time).

After running through our list of general questions we started asking the more existential ones, like “What is the most challenging thing you’ve ever had to do for a client?” The answer we got is hilarious to me, even today.

After a short explanation that “the web” ran on connections between machines, he launched into a short dissertation on the hex-code color system. Eventually we learned that, in his view, optimizing load times was best accomplished by taking a color which was used throughout the website (say “purple”) and gently dialing the hex codes closer to white, each time saving himself a bit of bandwidth usage (in theory).

Not only does this make no sense from a design standpoint, it’s also fundamentally flawed from a technical standpoint. But it does illustrate the desperation which plagues some designers in reducing their memory footprint.

Unfortunately, in the modern search landscape, kb-size means less and less. When you consider the cost of bandwidth (especially from the perspective of Google, which is amassing one of the largest fiber-optic collections in the world), the difference between downloading a 500kb page and a 5kb page is so small that it doesn’t matter, let alone the difference between 20kb and 18kb.

The most notorious factor now affecting this aspect of search rankings is one level deeper: it’s your server.

An Example of Server Response vs. Rankings

In 2008 I acquired a domain which had been abandoned by its previous owner. I set up a blog on that domain which stuck with the topics of the previous owner, and put in place a system which gave me 100s of pages of new content each week. By the end of 2008 the site was approximately 6,000 pages large, but with only a couple hundred pages actually indexed by Google.

During that time, the site was hosted on a shared platform. With approximately 3,000 other websites on the server, the load would sometimes spike to levels where the site was inaccessible, at least by the standards of the modern user.

As You Can See…

It’s almost a misnomer to say that your search rankings are positively influenced by load times. If anything, your rankings are negatively influenced by poor ones, and anyone who doesn’t optimize their webserver is leaving money on the table.

I don’t have the exact numbers, but I would guess from experience that the site probably only had an 85% uptime (determined by pinging the site in 15 minute intervals). In March of 2009, I moved the site to a dedicated server.

The Not-so-Instantaneous Effects

When the site was re-launched on the dedicated server, I noticed an uptick in the number of pages indexed for the rest of March. Looking back, I would guess that these pages were considered “questionable” by the GoogleBot due to their unpredictable load times, and were suddenly being served very close to 100% of the time.

For the next month or so, nothing much happened. I believe Google was giving the site some time; the change in IP address probably tripped something on their end to wait and see what else happened.

Around May the site picked up momentum in leaps and bounds. In the space of a month I watched the number of pages indexed at Google jump from just under 1,000 all the way to the low 6,000s by the end of June. The site now stands at just over 14,000 pages indexed, which is roughly 75% of the total pages available.

Traffic similarly skyrocketed, from around 12,000 pageviews early in the year to a record 182,000 pageviews in August.

Caveats

Of course, it’s difficult to isolate the compounding variables in any SEO experiment, especially when you don’t know you’re experimenting in the first place.

There are a number of other factors that could have influenced the growth of this particular site: the fact that it now has a dedicated IP, or, since it is a blog, additional links in to popular posts. But I believe the predominant factor is the increased uptime.

Checking Server Response, and Why Load Time isn’t the Whole Game

There are many services out there that can help you monitor server performance; one of the most famous is Pingdom. I’ve also used BasicState, which is a free service that sends you alerts and summaries when your server is unreachable.

It’s important to see how often your server is unreachable, not just how long it takes to load a page. You have to remember that for all the collective intelligence at Google, the menial tasks are still performed by robots.
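A minimal sketch of that kind of check, assuming Python's standard library and a 10-second cutoff. The function names `probe` and `uptime_percent` are hypothetical and not part of any monitoring service's API. Each probe is recorded as reachable or not, and the uptime figure is simply the share of probes that succeeded.

```python
import urllib.error
import urllib.request


def probe(url, timeout_seconds=10):
    """Return True if the URL answers successfully within the timeout (assumed cutoff)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Timeouts, refused connections, DNS failures, HTTP error statuses
        # and malformed URLs all count as "unreachable" here.
        return False


def uptime_percent(results):
    """Share of probes that succeeded, as a percentage."""
    if not results:
        raise ValueError("no probes recorded")
    return 100.0 * sum(results) / len(results)
```

Calling `probe` from a scheduler at a fixed interval and passing the collected results to `uptime_percent` gives a rough uptime number that counts both outright failures and responses that were too slow.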

Let’s say your site has 1,000 links coming in to it from other sites. And let’s pretend Googlebots follow those links at a rate of 10 per day, and that any given Googlebot will wait 10 seconds for your page to load before reporting that the site is gone and moving on. For simplicity’s sake, let’s also assume that all Googlebots communicate back to an Aggregator of some kind which stores data about your website.

On Day One, 10 Googlebots come by (or the same Googlebot 10 times, however you prefer to think of it). Your server is only capable of responding 8/10 times. Maybe one time it didn’t respond within 10 seconds, and one time it was actually down.

Remember, Googlebot is stupid. In the first case, Googlebot calls Aggregator and says, “The site is slow.” In the second case, where the site is giving him a 404 error, he reads it as “Not Found” and decides the page is gone. He calls Aggregator and says, “I got to this page via a link from someothersite.com and the page is gone.”

Aggregator sits back and watches this happen for 10 days. At the end of that cycle, he finds that out of 100 visits Googlebot has reported 10 broken links (because the site was down when it got there), and 10% of the time the page did not load within the allotted time.

Aggregator compares this information to the limits set by Google Engineers, and data collected from other websites of this type. He then removes those 10 links from your site’s profile (they go to “Not found” pages, right?), and then places the site further down in the results, beneath websites that are better at responding to users.
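Tallying the Day One pattern over the 10-day cycle is simple arithmetic. Here is a minimal sketch, assuming every day looks exactly like Day One (8 good responses, 1 timeout, 1 outage). The `tally` function and the outcome labels are made up for illustration and say nothing about how Google actually stores crawl data.

```python
from collections import Counter

DAYS = 10  # length of the Aggregator's cycle in the example
# Assumed daily pattern, taken from Day One: 10 visits per day.
DAILY_OUTCOMES = ["ok"] * 8 + ["slow"] + ["down"]


def tally(days=DAYS, daily_outcomes=DAILY_OUTCOMES):
    """Count what the hypothetical Aggregator hears over a full cycle."""
    report = Counter()
    for _ in range(days):
        report.update(daily_outcomes)
    return report


report = tally()
total = sum(report.values())
print(f"visits: {total}")
print(f"links reported gone: {report['down']}")
print(f"slow responses: {report['slow']} ({100 * report['slow'] / total:.0f}%)")
```

Changing `DAILY_OUTCOMES` shows how quickly a flaky server adds up: every extra outage per day means another 10 links written off per cycle.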

The Bottom Line

The bottom line is that your website is an investment. If it’s worth it for you to spend time and money on an SEO strategy, then it’s worth your time and money to make sure you’re running on a strong server.

All too often, I see people get started by having their nephew build them a website. Eventually they realize they need to upgrade and hire a competent web-dev. Next they know they want more people on the page so they hire an SEO, who tweaks titles and builds links.

Nowhere in that entire chain did anyone think to upgrade the server. Servers are thought to be an issue for IT and Security, not Marketing. But in the world of Web Marketing, the underlying structure is just as important as the message.

To put it another way: would you print your next direct mail piece on the inkjet in the office? Or have it printed in full color on glossy cardstock?

Tags for This Post: Alert, seo, server, google results, response time, Post, search engine robots, google, Long

The Magic of Replication

November 2nd, 2009

We all know the golden rule of business: “it’s {10x cheaper|3 times easier|90% more profitable} to sell another unit to a current customer than to acquire a new one.” What we’re really talking about here is “replication” and it has much wider applications than customer acquisition.

A “Black Hat” Example
3-5 years ago, the name of the game in SEO was the “blog and ping” method. Essentially, this involved writing a blog post, and then “pinging”, or sending that URL to aggregation websites (Technorati is probably the most notable surviving member) where the links would be posted.

These aggregation sites were great for search engines, as it gave them an inside track to breaking news. They began to crawl the “New Posts” pages frequently, leading to rapid indexation for bloggers who participated in pinging. Eventually, the size of your ping list was just as important as the content of your post.

Of course, like anything with the potential to generate money, methods were invented to automate and scale. One method was to hide a 1-pixel iFrame on your site which loaded the submission page for a ping website. You could load 100s of these pixels on each page of your website. Because the page was loaded by your visitor’s browser, these 100s of pings seemed to come from all over the country, from different IP addresses.

What does this have to do with replication?

Let’s say Google crawled the Technorati page showing the most-pinged posts and saw John’s Blog post on “How I Repaired my Credit” at the top of the list. The spider makes a note to crawl John’s site and moves on. The blog post gets indexed, and John gets 5 hits on his site. Each of those 5 people loads his iFrames, which ping Technorati 5 more times.

Google comes back to the Technorati page an hour later and sees John’s post is still at the top of the list. The spider tells Google to put a rush on indexing John’s post. The post gets indexed and John gets 25 visitors on his site. Once again, the iFrame is fired 25 times from different IP addresses.

This could go on for days. Each time John gets a visitor, Technorati thinks that John’s site is growing in popularity, and Google sees John’s blog climbing the ranks. Google begins to think that John’s post is the authority on “how to repair credit” and sends him visitors. Those visitors inadvertently ping Technorati (and others) and Google keeps finding links to John’s site.

This is a great example of replication: 1 visitor leads to 2 visitors, leads to 5… on and on until you max out the machine.

A “White Hat” Example
If you haven’t signed up for LinkedIn you have probably at least gotten “invitations” to connect from friends and colleagues. That’s because, in the middle of the registration process, LinkedIn requests permission to look through your contact list to see who else is on LinkedIn. They also give you the option to spam invite people you know who aren’t yet on LinkedIn to join.

Of course, not everyone lets LinkedIn use their contact list, but there is incentive. A larger network on LinkedIn gives you certain benefits. So we know that some percentage of new users will expose their contacts to this system.

Let’s say that 1 in 10 take advantage of this feature, and that the average person has 100 contacts to send to, and that LinkedIn has a 10% signup rate, on average.

All LinkedIn has to do is get 10 signups before one person hands over their contacts, which leads to a further 10 signups (10% * 100), which leads to another 100 contacts mailed, and 10 more signups, and 100 more emails…

Optimizing the Replication
In the LinkedIn example, what would it be worth to increase their rate of “contact-allowance” to 20%? Or to increase their invitation response rate to 11%?

They would be increasing their net gain for each iteration. Improving either of those ratios leads to exponential gains down the line. If they increased both ratios simultaneously the gains are multiplied. By the third “generation” they’ve more than doubled their user population over the total if they had done nothing.

Increase Response Rate to 11%:

                 Userbase   Iteration 1   Iteration 2   Iteration 3
Current rates    10         20            40            80
11% response     10         21            44            92

Increase Permission to Invite Colleagues to 20%:

                 Userbase   Iteration 1   Iteration 2   Iteration 3
Current rates    10         20            40            80
20% permission   10         30            90            270

Both Increases, Compounded:

                 Userbase   Iteration 1   Iteration 2   Iteration 3
Current rates    10         20            40            80
Both increases   10         32            102           326

By improving their process, LinkedIn would quadruple their user base after just 3 iterations!
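The figures in these tables can be reproduced with a few lines of arithmetic. The sketch below assumes a model that matches the numbers shown: every current user goes through the invite step once per iteration, and fractions of a user are dropped at each step. The function name `project` and the scenario labels are invented for illustration.

```python
def project(userbase, share_pct, contacts, signup_pct, iterations=3):
    """Project the user base over several invite iterations.

    share_pct  -- percent of users who hand over their contact list
    contacts   -- average number of contacts per sharing user
    signup_pct -- percent of invited contacts who sign up
    Integer arithmetic drops fractional users at each step.
    """
    sizes = [userbase]
    for _ in range(iterations):
        new_users = sizes[-1] * share_pct * contacts * signup_pct // (100 * 100)
        sizes.append(sizes[-1] + new_users)
    return sizes


# (share_pct, signup_pct) for each scenario; 100 contacts per user throughout.
scenarios = {
    "current rates": (10, 10),
    "11% response": (10, 11),
    "20% permission": (20, 10),
    "both increases": (20, 11),
}
for label, (share_pct, signup_pct) in scenarios.items():
    print(label, project(10, share_pct, 100, signup_pct))
```

Because each iteration multiplies the user base by a fixed factor, a small change in either ratio compounds, which is why the combined scenario pulls so far ahead by the third iteration.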

The Take Home
How can you use your website, your emails, your customer service processes, your products or your business itself to replicate something valuable? Once you have a replication action, measure the ratio. Then work on improving that ratio.

Tags for This Post: technorati, customer acquisition, google, money methods, name of the game

Directory Submission: An Inadvertent Case Study

April 18th, 2008

The arguments against bulk submission of your website to hundreds of general directories can be summarized in the following way:

  1. Search engines will frown upon you getting 1,000 new links in one day.
  2. The value of 1,000 directory links from general-topic sites is next to nothing.
  3. Topical directories that are hand-edited will provide much more link-value, so you should spend resources on that instead.

I’d like to use an opportunity (or “crisi-tunity” if you’re a Simpsons fan) that’s recently come up for me to refute some of these claims. But first, let me refute #1 right away…

Rate of Change
When search engine spiders visit your webpage, there is a certain process that occurs. The spider reads the HTML of the page, probably makes a few statistical notes (# of words, # of links (internal and external), URL parameters used, dates on the page, etc.) and then puts it in the queue for analysis by a heavier-duty piece of software.

Your site gets put in the queue to be spidered in a few different ways: someone with a search-engine toolbar visits your site, someone links to your site, or someone does a search for your specific URL. Each of these gives the site a different priority, depending on whether or not it’s in the index. Statistically speaking, this means that your site and my site are almost never being spidered simultaneously.

When you add in the fact that your page can’t be indexed until it goes through the indexing algorithm (the heavy-duty software I referenced above), which can take a couple of days, there is just no way for your new pages and mine to be added to the index at the same time.

Because links are an element of the page, they are “counted” in the same manner. When you spread this example across 1,000 sites, all with different indexing periods and rates, there is just no way for a search engine to “see” you receiving hundreds of links at the same time. They will appear to trickle in over the course of several weeks or months, especially when you consider the fact that not all the approvals go through at once either. I run several directories, and I only approve submissions once or twice a week.
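To make the trickle concrete, here is a small simulation. Every number in it is an assumption chosen for illustration (approval within 0-7 days, crawl and indexing 1-30 days after that), not a measurement of any real directory or search engine, and `discovery_days` is a hypothetical helper.

```python
import random


def discovery_days(num_directories=1000, seed=42):
    """Simulate the day on which a search engine first sees each directory link.

    Illustrative assumptions: each directory approves the submission after
    0-7 days, and its page is crawled and indexed 1-30 days after that.
    """
    rng = random.Random(seed)
    days = []
    for _ in range(num_directories):
        approval_delay = rng.randint(0, 7)
        crawl_delay = rng.randint(1, 30)
        days.append(approval_delay + crawl_delay)
    return days


days = discovery_days()
per_week = {}
for day in days:
    week = (day - 1) // 7 + 1
    per_week[week] = per_week.get(week, 0) + 1
for week in sorted(per_week):
    print(f"week {week}: {per_week[week]} links first seen")
```

Even though all 1,000 submissions go out on the same day, the links in this toy model are first seen across several different weeks.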

There, #1 is officially dead, no?

The meat and potatoes
Last year a friend of mine was designing a brand new website for a company that sold promotional items. They were starting off with a brand new domain and wanted to get search engine traffic right away. Although my buddy asked me if I would help out, at the time I was confined by an agreement to only perform SEO for one company. So instead I hooked him up with a few tips on on-page optimization and a vendor who does directory submission.

Long story short, the company ended up bailing on my friend after he delivered the site. They never paid him, so all that happened was the new site was built and live with on-page SEO basics, and it was submitted to 1,000 general-topic directories. One year later, the company has basically dissolved, but their site still stands (along with some analytics code I told my buddy to put on the site). Here’s what has materialized:

  • The site now has a pagerank of 4 (from N/A at the start).
  • The site shows 553 backlinks in Yahoo.
  • The site is receiving about 20 visitors a day from Google.

So it seems that those links are neither worthless nor ignored. They have produced tangible results in 2 major search engines. Granted, this won’t make you a million dollars, but for a $50 “fire-and-forget” submission package, why wouldn’t you?

What I Advocate and Response to #3
With regard to topical web directories and finding the really strong ones to submit to, I fully encourage you to take advantage. There will never, ever be anything wrong with achieving a high-quality link, from a site that is topically-related to yours.

However, if you have a brand new site where you’re just trying to get some “air under the wings” (or you’re running an Advanced Domaining Strategy) why not spend a little cash to get things moving? You can (and should) always supplement this with topical link-building, but that kind of strategy means you need to know your business, which is something you (conceivably) can’t outsource.

So spend your time building quality links, and outsource the foundational stuff.

Tags for This Post: seo, directory, link building, directories

iSnare for the long term?

November 21st, 2006

The Article Distribution Service iSnare.com has been billed as one of the best tools around to increase a website’s presence. And I’ve been a big proponent of it since I first came across the service.

The idea is simple enough: submit an article to this service, it is reviewed by humans for quality and then gets auto-distributed to 1000s of article-aggregation websites, many on general topics, and a few on whatever topic you choose for your article.

After using it a few times, I began to notice that pages I promoted with the service would tend to rise in Google’s SERPs for my targeted terms, and then slowly fall back down. They would usually settle at higher positions than where they started, but I wondered why the Rome effect was so strong (that was a subtle reference to a rise/fall timeline).

So, I decided to study the Google results on fresh articles, and their mentions in search engines. I used the old trick of searching a unique phrase. On August 4th I used a unique phrase from each article on Google’s engine: 0 results. I then submitted both articles to iSnare for distribution. On August 8th I got an email that both articles had been approved and syndicated; a second Google search revealed 0 results for both.

0 results again on Aug. 9th. Then on Aug. 10th I saw the first signs of life: 7 results for Article 1 and 8 results for Article 2. By Aug. 15th, Article 1 had 437 results, and Article 2 had 458 results. There are two points of note here:

Point 1: I submitted both articles under the same category. They were approximately the same length (around 450 words). I submitted them on the same day within minutes of each other, and yet Article 1 lagged behind Article 2 for some reason.

Point 2: At this point (Aug. 15th) there were no supplemental results for either article. All 400+ results were fully viewable in the main index.

On the 16th of August the dupe filter must have kicked in on Article 1, because supplementals appeared and total results dropped to 361. Article 2 continued to thrive with 556 results on the 16th, with still no supplementals showing.

Eventually the dupe filter must’ve kicked in on Article 2 as well, and by August 30th, both result counts were below 50 (39 and 34 for 1 & 2, respectively).

As of today, Big G shows 11 results, of a total of 16 for Article 1 (so, approx. 4 supplementals). Article 2 fared better in the end, today displaying 16 results of 22 total (so, approx. 6 supplementals).
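For a quick view of the rise and fall, the counts reported above can be put into a short script. This is only a sketch over the handful of dates mentioned in this post, so the "peak" is the highest count I happened to record, not necessarily the true maximum. The helper name `observed_peak` is made up for illustration.

```python
# Result counts reported in this post: (date, Article 1, Article 2).
observations = [
    ("Aug 4", 0, 0),
    ("Aug 8", 0, 0),
    ("Aug 9", 0, 0),
    ("Aug 10", 7, 8),
    ("Aug 15", 437, 458),
    ("Aug 16", 361, 556),
    ("Aug 30", 39, 34),
]
# Total results (including supplementals) as of the post date.
final_totals = {"Article 1": 16, "Article 2": 22}


def observed_peak(column):
    """Return (date, count) for the highest recorded count in the given column."""
    return max(((row[0], row[column]) for row in observations), key=lambda pair: pair[1])


for column, name in enumerate(final_totals, start=1):
    peak_date, peak = observed_peak(column)
    retained = 100 * final_totals[name] / peak
    print(f"{name}: observed peak {peak} on {peak_date}, {retained:.1f}% still listed")
```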

The [recently exported] PageRank for the top 10 results on each article ranges from 0-2, with the majority being 0 (and 2 N/As!).

So now some theories:

1. Article 1’s target phrase was more competitive than Article 2’s. My theory is that the more competitive an area, the greater number of filters (or in some cases, reviews) a page must pass to become part of the index. This is explained best in the theory of long-tail keywords, where phrases that don’t mean much in a marketing sense have a lot of impact on John Q. Searcher.

2. To compete with social bookmarking, Google needs to be buzz-aware. When a site creates a certain amount of buzz (linking, textual-references, etc.) Google needs to get in there and evaluate it for ranking. It will weight these sites with additional trustrank to get on top of the coming wave. A second (and potentially third) filter will later decide if the page is worth keeping in the index, possibly by analyzing search volume for a phrase vs. the amount of “buzz”.

What might a takeaway be from this experiment? In my case, the combination of the “buzz” created with the article distro, plus the already-established authority (or Trustrank) of the site was enough to put the [brand new] pages I was targeting into the top 10 for their intended keyphrase.

As with most SEO activities, it is recommended to use this tool appropriately, and in combination with other tools.
Any thoughts?

Update: Looks like Aaron Wall and I may have been thinking along some similar lines. He just posted about new domains getting ranked in Google over old sites, and mentioned the following:

“Also think of the search business model as though you are a search engine. To them, being the first person to do something is a sign of quality because to be the first person in a market requires some market timing / knowledge / investment / luck.”

Tags for This Post: signs of life, serps, subtle reference, google results, google search, first signs