Search Engine Optimisation

Making Sense of the Search Engines

We wrote this (fairly lengthy) article to give a broad overview of how the search engines work. We have no objection to its re-use by bona-fide publications, provided we are contacted beforehand and credited in the by-line. Like all the information on this site, it is the copyrighted property of XSEO Ltd, and all rights are reserved.

At some point you've probably wondered why it is that so many searches on the Web lead you to hobby Websites. You search for a new BMW and you find a site that someone created in his bedroom to show off pictures of his 1982 rattle trap. It's a frustrating experience, and it's brought about by an imbalance of technology.

Web designers are becoming increasingly creative, using frames, ASP, Flash and many other sophisticated tools. But Search Engine spiders are simple creatures. Some of them can now cope to an extent with framed sites and ASP pages (created on the fly from a database), but the results can be unpredictable. As for Flash, as far as the search engine is concerned it's an applet, and it doesn't know how to deal with it at all.

Search engines read text. They have little or no concept of what your site looks like; they simply read the HTML source that generates the page. As they do so they build up a weighting for certain phrases within the page. It follows that they best understand sites constructed using proper body text, header text, and relevant text links. The names of graphics on the page are also read as text - the spider doesn't know what the graphic actually shows. The domain name that brought the spider to the page is also taken into account.

In the early days of optimisation, it was relatively easy to fool the search engines by multiple repetitions of keywords, and by tricks like placing a large amount of keyword-rich text on a background of the same colour. This practice has come to be known as "Spamming", and the search engine operators have become increasingly adept at spotting it.

So the bedroom site tends to score high because it's easy for the spider to understand. For the best chance of a good listing we need to emulate this simplicity, but use a professional presentation style.

Understanding Search Engines
The term search engine is usually used to cover all of the methods of finding topics on the Web. In fact there are two distinctly different methods used: the crawler-based search engine and the human-operated directory.

Crawler-Based Search Engines
Crawler-based search engines, such as HotBot, "crawl" the Web using a computer program called a Web spider. The spider gathers information and stores it in a huge database, which later becomes available for searching. Eventually, changes to your web pages are found by the spider, and will have an impact on how the site is listed. Page titles, body copy and other elements all play a role in how the page is given relevancy.
Crawler-based search engines have three major elements:

The Spider
The spider visits a web page, reads it, and then follows links to other pages within the site. It usually returns to the site on a regular basis to identify site changes. The spider stores all of the information that it has collected, but this can't yet be searched by users of the search engine. This confuses many people, as it's possible for a site to be registered with a search engine, and to have been visited by the spider, but still be invisible to any search.

The Index
The index is like a huge electronic book containing a copy of every web page that the spider finds. If a web site changes, this book is updated with the new information. Most search engine operators update their index monthly. Unfortunately, the delay between spidering and indexing can be several months. Until a site has been indexed it can't be found.

The Search Program
This is the part of the search engine with which we're all familiar. It's the retrieval system that allows us to key in a search and find hundreds of sites about 1982 BMWs.
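The three elements can be sketched in a few lines of Python. This is a toy illustration only: the miniature "web" below, its URLs and its page text are all invented for the example, and real search engines are vastly more elaborate. It does show why a page that has been spidered still can't be found until the index has been built.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "/": ("BMW dealer home page", ["/new", "/used"]),
    "/new": ("new BMW cars with warranty", ["/"]),
    "/used": ("used BMW cars from 1982", ["/", "/new"]),
}

def spider(start):
    """Visit a page, read it, then follow its links to other pages."""
    seen, queue, collected = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        collected[url] = text          # stored, but not yet searchable
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

def build_index(collected):
    """Turn the spider's store into a word -> set-of-URLs lookup."""
    index = {}
    for url, text in collected.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

collected = spider("/")
index = build_index(collected)
results = search(index, "used bmw")
```

Searching an empty index (the state of affairs before indexing has run) returns nothing, even though the spider has already collected every page.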

Human-Powered Directories
The human-powered directories such as DMOZ and Yahoo depend on human editors for their listings. During submission you supply a short description of your entire site to the directory. At some point your site and your description will be examined and evaluated by a human editor. The editor may use your description or write one of his or her own. A search within a directory looks for matches only in the descriptions submitted.

Changing your web pages has no effect on your listing, as the site is unlikely to be revisited once it has been listed. Techniques that are useful for improving a listing with a crawler-based search engine have no impact on improving a listing in a directory. That said, a site that impresses an editor who scans it for a few seconds is likely to score well with its target audience, so good presentation and clear content serve both functions.

Hybrid Search Engines
It's now common for both of the above techniques to be used for Website listing. A hybrid search engine will usually favour one type of listing over the other. For example, Yahoo is a human-powered directory, but it also presents crawler-based results provided by Google.

Keywords
Without the right keywords, nothing else matters. In fact the term "keyword" is misleading. Think rather of "key phrase", a collection of keywords. Given the vast size of the Internet, optimising for a single keyword is becoming effectively impossible unless the word is very uncommon. Single words also tend to return poor relevancy for the searcher, so more and more people enter a short phrase.

So optimising for strings of two or three words is more likely to be successful.

The trick with key phrase selection is to find out what people are actually searching for. Asking your colleagues and a few customers what they'd type into a search engine just isn't accurate enough to produce the result you need.

Here's an example to illustrate the point:

A recent client had been employing another Search Engine Optimising company (SEO) to optimise their car sales Website. Words selected like "cars" and "vehicles" were proving just too popular to pursue. A recent search at Lycos resulted in the following:

 Cars: 6,764,785 pages
 Vehicles: 5,504,970 pages

It would be very difficult to get to the top using these keywords, although given time not impossible. But do we want to? Is everyone who searches for these terms looking to buy a car?

We advised this client to go for less popular, more specific phrases. "Used car", for example (3,259,428 listings), would attract people looking for something more specific and relevant, and there's less competition. Being more targeted, such phrases are more closely related to the actual product or service and are much more likely to be searched by people who want to buy.

It's important, of course, not to make the keywords too obscure. This client requested the phrase "car buying advice". When we investigated, we found that the phrase was little used by would-be clients.

 car buying advice: searched 356 times; 560,754 listings in Lycos
 car buying guide: searched 69,900 times; 530,400 listings in Lycos

The competition for "car buying advice" is almost the same as that for "car buying guide". But only 356 people searched for it. The latter phrase was searched almost 70,000 times, making it a far better prospect.
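One rough way to compare candidate phrases is to divide the number of searches by the number of competing listings. The short Python sketch below does this for the two phrases above. The ratio and the function name are our own illustration, not an established metric, and the figures are simply those quoted in this article.

```python
def searches_per_listing(searches, listings):
    """Crude demand-versus-competition ratio (an illustrative heuristic)."""
    return searches / listings

# phrase -> (searches in the period, competing listings), from the text above
phrases = {
    "car buying advice": (356, 560_754),
    "car buying guide": (69_900, 530_400),
}

# Rank the phrases, best prospect first.
ranked = sorted(
    phrases,
    key=lambda phrase: searches_per_listing(*phrases[phrase]),
    reverse=True,
)

for phrase in ranked:
    print(f"{phrase}: {searches_per_listing(*phrases[phrase]):.4f}")
```

A higher ratio suggests more searchers for each competing page, which is why "car buying guide" comes out well ahead here.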

We recommend that most optimisation projects begin with a Keyword Relevancy Report. We research search behaviours over the previous month so that we can advise on the best target phrases.

Meta Tags
Meta tags were once the most important factor in search engine optimising. Unfortunately, the fact that they're invisible to the user means that they're easy to abuse, and so search engine operators turned their attention to visible elements. But they remain an essential part of a well-designed Website promotion program.

The Title tag is probably the most important, along with the Meta description. Most search engines give additional weight to words found in the Title tag if those words are also found in the body text.

The Meta description is the text that is presented by the search engine following the site's listing. It tells the humans reading the search results that your site is relevant to their search and that they should click on it, not the one above. Because this is regarded as visible text, the spiders may give it attention. It follows that if this text includes your key phrases, and they're repeated in the main body text, your page will be regarded as more relevant.
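To check that a key phrase appears in all three places, a short script using Python's standard html.parser module will do. This is a minimal sketch: the sample page, the class name and the function name are hypothetical, and it says nothing about how any particular search engine weights these elements.

```python
from html.parser import HTMLParser

class PageText(HTMLParser):
    """Collect the title, Meta description and body text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.body = ""
        self._inside = None  # "title", "body" or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "body"):
            self._inside = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_data(self, data):
        if self._inside == "title":
            self.title += data
        elif self._inside == "body":
            self.body += data

def phrase_placement(html, phrase):
    """Report where the key phrase occurs (case-insensitive)."""
    page = PageText()
    page.feed(html)
    phrase = phrase.lower()
    return {
        "title": phrase in page.title.lower(),
        "description": phrase in page.description.lower(),
        "body": phrase in page.body.lower(),
    }

# Hypothetical page used only for this example.
SAMPLE = """<html><head><title>Car Buying Guide - Example Motors</title>
<meta name="description" content="A plain-English car buying guide for first-time buyers.">
</head><body><h1>Car buying guide</h1>
<p>Our car buying guide explains finance and warranties.</p></body></html>"""

placement = phrase_placement(SAMPLE, "car buying guide")
```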

Flash, Frames and JavaScript
To humans they look great, but to search engines Flash, frames and JavaScript are meaningless. In fact their very presence on a page can cause the page to be ignored. Some operators have made progress with making sense of frames, but JavaScript, and especially Flash, remain a total mystery. A text-rich page is sadly the only answer. This doesn't mean you can't use them - the Web would be a much more boring place if you didn't - but you must keep them in their place and make sure you're putting out food for the spiders.

Keyword Density
One way the spiders can tell how closely your page matches the needs of the searcher is to examine how many times your page contains the key phrase entered by the searcher.
One trick used in the past was a technique known as keyword packing. Fill a page full of keywords over and over again, and a simple spider will be fooled.

Because keyword packing on this scale made the page unreadable, many web masters resorted to hidden text. Typically this hides text by making it white on a white background, or hiding it in the no-frames section or in layers. This worked for a while until the search engine operators got wise to it. These techniques are classed as Spam by the engines. The spiders became increasingly intelligent and able to spot them, and guilty parties were de-listed.

But the most telling measure is keyword density. In normal text, it's not possible to over-use a key phrase and maintain readability. By simply comparing the ratio of certain phrases to the total word count, the spider can establish a page's relevancy and its honesty. Too high a density and it's Spam; too low and the phrase will be ignored. Keyword density is therefore a delicate balance.
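The ratio described above is easy to compute. The Python sketch below counts how many of a page's words belong to occurrences of the key phrase. The two thresholds are purely hypothetical values chosen for the example; the search engines don't publish the figures they use, and the exact formula they apply is not known to us.

```python
import re

def keyword_density(text, phrase):
    """Fraction of the page's words taken up by occurrences of the phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    size = len(target)
    hits = sum(
        1
        for i in range(len(words) - size + 1)
        if words[i:i + size] == target
    )
    return hits * size / len(words)

def classify(density, too_low=0.01, too_high=0.08):
    """Label a density using hypothetical thresholds (assumptions only)."""
    if density > too_high:
        return "possible spam"
    if density < too_low:
        return "phrase likely ignored"
    return "balanced"

sample = "Our used car guide lists every used car in stock"
density = keyword_density(sample, "used car")
label = classify(density)
```

The ten-word sample is deliberately over-packed: four of its ten words belong to the phrase, so it lands well above the upper threshold.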

New Content Algorithms
The latest spiders are becoming intelligent enough to understand syntax and page construction. This means that they can detect whether pages are linguistically correct, and they can also identify pages that are almost identical, making automatic page generation largely obsolete.
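We don't know exactly how the engines spot almost-identical pages, but one well-known textbook approach is to compare overlapping runs of words ("shingles") and measure how many the two pages share. The Python sketch below uses that approach purely as an illustration; the page texts and function names are made up for the example.

```python
def shingles(text, size=3):
    """Set of overlapping word runs of the given length."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(page_a, page_b, size=3):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    a, b = shingles(page_a, size), shingles(page_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two machine-generated pages that differ by one word, and one genuine page.
page_a = "cheap used cars for sale in Stafford today"
page_b = "cheap used cars for sale in Derby today"
page_c = "our warranty covers parts and labour for two years"

generated_pair = similarity(page_a, page_b)
genuine_pair = similarity(page_a, page_c)
```

Pages churned out from a template with only a place name swapped score highly against each other, while genuinely different pages share few or no shingles.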

Most recently, the introduction of a Theme Bias algorithm is set to turn the listings on their head. Theme bias implies that a site will be assessed for content as a whole; the spider views the entire site and establishes themes that follow through all of the pages. This can cause problems for sites with genuine content. In our car sales example, the site may rarely mention "new cars", but is likely to use terms like "warranty" repeatedly across all makes. This could result in the theme bias scoring the site highly for warranties, but lower for its core business. It may also mean that Websites offering many different types of products will lose out to single product sites.

Page Rank
There is a continual process of measure and countermeasure between less ethical members of the SEO community and the Search Engines. In an effort to reward popular, highly relevant sites there now exists something called PR (page rank). PR is a method by which a site can be measured for its importance on a subject. The theory is simple: if a lot of other sites link to this site, it must be important.

Not surprisingly, as soon as this criterion was understood it was abused. Suddenly we saw a proliferation of Link Farms. As the name suggests, a link farm is a site that shares links to others for no other reason than increasing link popularity. Link farms became so successful that the search engines responded by specifically looking for link farm structures. Any site found to have such a link is at best ignored and at worst de-listed.

Good page ranking comes from sites that already have a good PR themselves. A genuine incoming link from a popular, high PR site boosts your ranking by effectively vouching for your content.
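The idea that a link from a high PR site is worth more than a link from an obscure one can be shown with a simplified version of the calculation. The Python sketch below follows the commonly published description of the method, with a damping factor of 0.85. The four site names and their links are invented for the example, and the search engines' real calculation takes far more into account.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Simplified rank calculation over a dict of site -> sites it links to."""
    pages = list(links)
    count = len(pages)
    rank = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / count for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its rank with everyone.
            targets = outgoing or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link structure: three sites link to "dealer",
# and "dealer" links only to "portal".
links = {
    "portal": ["dealer"],
    "blog": ["dealer"],
    "hobby": ["dealer"],
    "dealer": ["portal"],
}

rank = page_rank(links)
top_site = max(rank, key=rank.get)
```

Here "dealer" comes out on top because three sites link to it, and "portal" outranks "blog" and "hobby" because its single incoming link comes from the highly ranked "dealer".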

Cloaking
Cloaking was very popular until a few months ago when, once again, the operators caught on. It refers to a technique whereby the page contains code that can detect the presence of a Web spider. When a human visitor comes to the site, he or she sees the public presentation. The spider, however, is redirected to a specially constructed "spider food" page.

The practice is still fairly widely used, but it is detectable by the spider and may result in de-listing. Google specifically states in its guidelines: "Don't employ cloaking or sneaky redirects."
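The Python sketch below shows, in toy form, what cloaking amounts to and one obvious way it could be caught: request the same page once announcing yourself as a spider and once as an ordinary browser, then compare the results. Everything here is an assumption for illustration, including the crawler name "ExampleBot" and the two pretend servers; we don't know the engines' actual detection methods.

```python
SPIDER_NAMES = ("examplebot",)  # hypothetical crawler name

def cloaked_server(user_agent):
    """Pretend server that serves 'spider food' to anything that looks like a spider."""
    if any(name in user_agent.lower() for name in SPIDER_NAMES):
        return "used cars cheap used cars best used cars"
    return "Welcome to our showroom"

def honest_server(user_agent):
    """Pretend server that serves everyone the same page."""
    return "Welcome to our showroom"

def looks_cloaked(fetch):
    """Fetch the page under two identities and compare what comes back."""
    as_spider = fetch("ExampleBot/1.0")
    as_browser = fetch("Mozilla/5.0")
    return as_spider != as_browser

cloaked_result = looks_cloaked(cloaked_server)
honest_result = looks_cloaked(honest_server)
```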

In Conclusion
There are no shortcuts and few tricks in search engine optimising. Much of it is common sense, and a well laid out page that appeals to humans is often the best temptation for the Web spider too. The following rules are worth keeping in mind:

  • Give a lot of thought to keywords; they are the most important element of any search engine optimising project. Get into the minds of your would-be clients.
  • Avoid over-using frames, Flash and JavaScript.
  • Don't try to trick a search engine.
  • Be very wary of any Search Engine Optimising (SEO) company that guarantees results. How can anyone guarantee something they have no control over? Google's search engine guidelines clearly state that guarantees are impossible. The guarantee is often for a paid listing or very obscure key phrases. Or, of course, it's simply meaningless and you're free to try and get your money back!
  • The bottom line is if your site has content which is invisible to the human user, there's a good chance it could be interpreted as spam by the Search Engines. Try a sneaky trick and you may well shoot to number one for your chosen phrase - for a week or so. Once you're caught and de-listed it may be months or even years before you can claw your way back. The only route to the top is hard work without tricks.

Good SEO is about creating proper content. It's about staying within the rules, analysing results and making regular content, structure and link alterations. It's labour intensive and time consuming, and it's therefore not cheap. But used correctly it's the most cost effective marketing tool in your armoury.

 
 
XSEO Limited Stafford Staffordshire UK

 
 

Last Update 28 Apr 06

© XSEO Ltd 2002-2006 All Rights Reserved