Making Sense of the Search Engines

We wrote this (fairly lengthy) article to give a broad overview of how the search engines work. We have no objection to its re-use by bona fide publications, provided we are contacted beforehand and credited in the by-line. Like all of our material, it is the copyrighted property of XSEO Ltd, and all rights are reserved.

At some point you've probably wondered why so many searches on the Web lead you to hobby Websites. You search for a new BMW and you find a site that someone created in his bedroom to show off pictures of his 1982 rattletrap. It's a frustrating experience, and it's brought about by an imbalance of technology. Web designers are becoming increasingly creative, using frames, ASP, Flash and many other sophisticated tools. But search engine spiders are simple creatures. Some of them can now cope to an extent with framed sites and ASP pages (created on the fly from a database), but the results can be unpredictable. As for Flash, as far as the search engine is concerned it's just an embedded applet, and it doesn't know how to deal with it at all.

Search engines read text. They have little or no concept of what your site looks like; they simply read the HTML source that generates the page. As they do so, they build up a weighting for certain phrases within the page. It follows that they best understand sites constructed using proper body text, header text and relevant text links. The names of graphics on the page are also read as text; the spider doesn't know what the graphic actually shows. The domain name that brought the spider to the page is also taken into account.

In the early days of optimisation, it was relatively easy to fool the search engines by repeating keywords many times, and by tricks like placing a large amount of keyword-rich text on a background of the same colour. This practice has come to be known as "spamming", and the search engine operators have become increasingly adept at spotting it.

So the bedroom site tends to score highly because it's easy for the spider to understand. For the best chance of a good listing, we need to emulate this simplicity while using a professional presentation style.
Understanding Search Engines

The term search engine is usually used to cover all of the methods of finding topics on the Web. In fact, there are two distinctly different methods: the crawler-based search engine and the human-operated directory.

Crawler-Based Search Engines

Crawler-based search engines, such as HotBot, "crawl" the Web using a computer program called a Web spider. The spider gathers information and stores it in a huge database, which later becomes available for searching. Eventually, changes to your web pages are found by the spider, and these will have an impact on how the site is listed. Page titles, body copy and other elements all play a role in how relevant the page is judged to be. Crawler-based search engines have three major elements:

The Spider

The spider visits a web page, reads it, and then follows links to other pages within the site. It usually returns to the site on a regular basis to identify changes. The spider stores all of the information that it has collected, but this can't yet be searched by users of the search engine. This confuses many people, as it's possible for a site to be registered with a search engine, and to have been visited by the spider, but still be invisible to any search.

The Index

Most search engine operators update their index monthly. It's a huge electronic list containing a copy of every web page that the spider finds. If a web site changes, the index is updated with the new information. Unfortunately, the delay between spidering and indexing can be several months. Until a site has been indexed, it can't be found.

The Search Program

This is the part of the search engine with which we're all familiar. It's the retrieval system that allows us to key in a search and find hundreds of sites about 1982 BMWs.

Human-Powered Directories

Human-powered directories such as DMOZ and Yahoo depend on human editors for their listings. When you submit your site, you provide the directory with a short description of the entire site. At some point your site and your description will be examined and evaluated by a human editor, who may use your description or write one of his or her own. A search within a directory looks for matches only in these descriptions.

Changing your web pages has no effect on your listing, as the site is unlikely to be revisited once it has been listed. Techniques that are useful for improving a listing with a crawler-based search engine have no impact on improving a listing in a directory. That said, a site that impresses an editor who scans it for a few seconds is likely to score well with its target audience, so good presentation and clear content serve both purposes.

Hybrid Search Engines

It's now common for both of the above techniques to be used for Website listing. A hybrid search engine will usually favour one type of listing over the other. For example, Yahoo is a human-powered directory, but it also presents crawler-based results provided by Google.
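The three elements of a crawler-based engine described above can be sketched in a few lines of Python. This is a toy model built on stated assumptions, not how HotBot or any real engine works: `crawl`, `build_index` and `search` are hypothetical names, pages are "fetched" from an in-memory dictionary rather than the network, and the index is a simple inverted index (word → pages) rather than the stored page copies described above. The search program returns only the pages that contain every query word, with no ranking at all.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse


class PageReader(HTMLParser):
    """Collects the text and the link targets found in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> dict[str, str]:
    """The spider: reads pages and follows links that stay on the same site."""
    site = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    store: dict[str, str] = {}  # crawled text, not yet searchable
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable
            continue
        reader = PageReader()
        reader.feed(html)
        reader.close()
        store[url] = " ".join(reader.text)
        for href in reader.links:
            link = urljoin(url, href)
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return store


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(store: dict[str, str]) -> dict[str, set[str]]:
    """The index: maps each word to the set of pages that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in store.items():
        for word in words(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """The search program: pages containing every word of the query."""
    terms = words(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


# Hypothetical two-page site served from memory instead of the Web.
pages = {
    "http://example.com/": '<h1>Used cars</h1><a href="/bmw.html">BMW</a> '
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.com/bmw.html": "<p>A 1982 BMW in good condition.</p>"
                                   '<a href="/">Home</a>',
}
store = crawl("http://example.com/", pages.get)
index = build_index(store)
print(search(index, "1982 BMW"))
```

Keeping `crawl` and `build_index` as separate steps mirrors the point made above: a page can sit in the spider's store and still be invisible to searches until the index has been rebuilt.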
Keywords

Without the right keywords, nothing else matters. In fact, the term "keyword" is misleading. Think rather of a "key phrase", a collection of keywords. Given the vast size of the Internet, optimising for a single keyword is becoming effectively impossible unless the word is very uncommon. Single words also tend to return poorly relevant results for the searcher, so more and more people enter a short phrase. Optimising for strings of two or three words is therefore more likely to be successful.

The trick with key phrase selection is to find out what people are actually searching for. Asking your colleagues and a few customers what they'd type into a search engine just isn't accurate enough to produce the result you need. Here's an example to illustrate the point.

A recent client had been employing another search engine optimisation (SEO) company to optimise their car sales Website. The words it had selected, such as "cars" and "vehicles", were proving just too popular to pursue. A recent search at Lycos resulted in the following:
| Search term | Results in Lycos |
|---|---|
| Cars | 6,764,785 pages |
| Vehicles | 5,504,970 pages |

It would be very difficult to get to the top using these keywords, although given time it's not impossible. But do we want to? Is everyone who searches for these terms looking to buy a car?

We advised this client to go for less popular, more specific phrases. "Used car", for example (3,259,428 listings), would attract people looking for something more specific and relevant, and there's less competition. Because such phrases are more targeted, they are more closely related to the actual product or service and are much more likely to be searched by people who want to buy.

It's important, of course, not to make the key phrases too obscure. This client requested the phrase "car buying advice". When we investigated, we found that the phrase was little used by would-be clients.
| Key phrase | Searches | Listings in Lycos |
|---|---|---|
| car buying advice | 356 | 560,754 |
| car buying guide | 69,900 | 530,400 |

The competition for "car buying advice" is almost the same as that for "car buying guide", but only 356 people searched for the former. The latter phrase was searched almost 70,000 times, making it a far better prospect.

We recommend that most optimisation projects begin with a Keyword Relevancy Report. We research search behaviour over the previous month so that we can advise on the best target phrases.
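The comparison above boils down to weighing demand (searches) against competition (listings). As a hypothetical illustration, and not the method behind our Keyword Relevancy Report, the sketch below scores each phrase as searches per 1,000 competing listings, using the Lycos figures quoted above. `PhraseStats`, `opportunity` and `rank_phrases` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class PhraseStats:
    phrase: str
    searches: int   # searches recorded during the research period
    listings: int   # competing pages returned by the engine


def opportunity(stats: PhraseStats) -> float:
    """Searches per 1,000 competing listings: a crude, hypothetical score."""
    return stats.searches / stats.listings * 1000


def rank_phrases(candidates: list[PhraseStats]) -> list[tuple[str, float]]:
    """Candidate phrases, best prospect first."""
    scored = [(s.phrase, round(opportunity(s), 2)) for s in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Figures from the Lycos comparison above.
candidates = [
    PhraseStats("car buying advice", searches=356, listings=560_754),
    PhraseStats("car buying guide", searches=69_900, listings=530_400),
]
for phrase, score in rank_phrases(candidates):
    print(f"{phrase}: {score} searches per 1,000 listings")
```

On these figures "car buying guide" scores roughly 200 times higher than "car buying advice". A real assessment would also weigh how closely each phrase matches what the client actually sells.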
Meta Tags

Meta tags were once the most important factor in search engine optimisation. Unfortunately, because they're invisible to the user, they're easy to abuse, and so search engine operators turned their attention to visible elements. But they remain an essential part of a well-designed Website promotion programme. The Title tag is probably the most important, along with the Meta description. Most search engines give additional weight to words found in the Title tag if those words are also found in the body text.

The Meta description is the text that the search engine presents beneath the site's listing. It tells the humans reading the search results that the page is relevant and that they should click on your site, not the one above. Because this is regarded as visible text, the spiders may give it attention. It follows that if this text includes your key phrases, and they're repeated in the main body text, your page will be regarded as more relevant.
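The advice above (key phrase in the Title tag and the Meta description, and repeated in the body text) is easy to check mechanically. This sketch uses only Python's standard `html.parser` to report where a phrase appears; `phrase_placement` and the sample page are hypothetical, and a plain substring match is an assumed simplification, not a model of how any engine actually weights these elements.

```python
from html.parser import HTMLParser


def normalise(text: str) -> str:
    """Lower-cases text and collapses runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


class MetaTagReader(HTMLParser):
    """Collects the <title>, the meta description and the body text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.body_parts: list[str] = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.body_parts.append(data)


def phrase_placement(html: str, phrase: str) -> dict[str, bool]:
    """Reports whether a key phrase appears in the title, description and body."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    target = normalise(phrase)
    return {
        "in_title": target in normalise(reader.title),
        "in_description": target in normalise(reader.description),
        "in_body": target in normalise(" ".join(reader.body_parts)),
    }


page = """<html><head><title>Used Cars for Sale</title>
<meta name="description" content="Quality used cars with a twelve month warranty.">
</head><body><h1>Used cars</h1><p>Browse our <b>used</b> cars today.</p></body></html>"""
print(phrase_placement(page, "used cars"))
```

Whitespace is normalised before matching so that a phrase split by inline markup, as in `<b>used</b> cars`, is still found in the body text.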
Flash, Frames and JavaScript

To humans they look great, but to search engines, Flash, frames and JavaScript are meaningless. In fact, their very presence on a page can cause the page to be ignored. Some operators have made progress with making sense of frames, but JavaScript, and especially Flash, remain a total mystery. Sadly, a text-rich page is the only answer. This doesn't mean you can't use them (the Web would be a much duller place without them), but you must keep them in their place and make sure you're putting out food for the spiders.
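To make "food for the spiders" concrete, here is a rough sketch of what a simple, text-only spider might extract from a page. The model is an assumption rather than any engine's documented behaviour: script and style content is discarded, an embedded Flash movie contributes nothing, and a graphic is represented only by its alt text and file name, in line with the earlier point that the names of graphics are read as text. `SpiderView` and `spider_text` are hypothetical names.

```python
from html.parser import HTMLParser
from posixpath import basename


class SpiderView(HTMLParser):
    """Approximates the words a simple, text-only spider reads from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # A graphic is only "seen" through its alt text and file name.
            attributes = dict(attrs)
            self.words.extend((attributes.get("alt") or "").split())
            src = attributes.get("src") or ""
            if src:
                self.words.append(basename(src))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def spider_text(html: str) -> str:
    view = SpiderView()
    view.feed(html)
    view.close()
    return " ".join(view.words)


page = """<html><body>
<object data="intro.swf"></object>
<script>document.write("Great deals on new BMWs");</script>
<img src="images/bmw-3-series.jpg" alt="BMW 3 Series saloon">
<p>Approved used BMWs with full service history.</p>
</body></html>"""
print(spider_text(page))
```

The phrase "new BMWs" exists only inside the script, so in this model it never reaches the spider. Only the image description and the ordinary paragraph survive.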
Keyword Density

One way the spiders can tell how well your page matches the needs of the searcher is to examine how many times your page contains the key phrase entered by the searcher. One trick used in the past was a technique known as keyword packing: fill a page full of keywords over and over again, and a simple spider will be fooled. Because keyword packing on this scale made the page unreadable, many webmasters resorted to hidden text. Typically this means making the text white on a white background, or hiding it in the no-frames section or in layers. This worked for a while, until the search engine operators got wise to it. These techniques are classed as spam by the engines. The spiders became increasingly intelligent and able to spot them, and guilty parties were de-listed.

But the most telling measure is keyword density. In normal text, it's not possible to over-use a key phrase and maintain readability. By simply comparing the occurrences of a phrase with the total word count, the spider can establish a page's relevancy and its honesty. Too high a density and it's spam; too low and the phrase will be ignored. Keyword density is therefore a delicate balance.
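Keyword density is simple arithmetic. One common formulation, assumed here because the text does not give a formula, expresses the words taken up by occurrences of the phrase as a percentage of all words on the page: density = occurrences × words in phrase ÷ total words × 100. `keyword_density` is a hypothetical helper, and because the article gives no numeric thresholds for "too high" or "too low", none are hard-coded.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-cases text and splits it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, phrase: str) -> float:
    """Percentage of the page's words taken up by occurrences of `phrase`."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if not words or n == 0:
        return 0.0
    # Slide a window of n words along the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words) * 100


natural = ("Our showroom in the town centre stocks a wide range of used cars. "
           "Every vehicle is inspected by our technicians, comes with a twelve "
           "month warranty and can be test driven by appointment.")
packed = "used cars used cars used cars cheap used cars"

print(round(keyword_density(natural, "used cars"), 1))
print(round(keyword_density(packed, "used cars"), 1))
```

The readable copy comes out at roughly 6%, while the packed line is close to 89%, the kind of ratio that no honestly written page produces.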
New Content Algorithms

The latest spiders are becoming intelligent enough to understand syntax and page construction. This means that they can detect whether pages are linguistically correct, and they can also identify pages that are almost identical, making automatic page generation largely obsolete.

Most recently, the introduction of a Theme Bias algorithm is set to turn the listings on their head. Theme bias means that a site is assessed for content as a whole; the spider views the entire site and establishes themes that run through all of the pages. This can cause problems for sites with genuine content. In our car sales example, the site may rarely mention "new cars", but is likely to use terms like "warranty" repeatedly across all makes. This could result in the theme bias scoring the site highly for warranties, but less highly for its core business. It may also mean that Websites offering many different types of products will lose out to single-product sites.
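Neither the near-duplicate detection nor the Theme Bias algorithm mentioned above is publicly documented, so the following is only an illustrative sketch built on assumptions. Near-duplicates are found with word shingles (runs of three consecutive words) compared by Jaccard similarity, a standard technique swapped in for whatever the engines actually use. A site's "theme" is approximated by counting how many of its pages use each term. All function names, the stop-word list, the 0.4 threshold and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "all", "an", "and", "for", "in", "is", "now",
              "of", "on", "our", "plus", "the", "with"}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive words in the text."""
    w = words(text)
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the overlap divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str],
                    threshold: float = 0.4) -> list[tuple[str, str, float]]:
    """Pairs of pages whose shingle sets overlap by at least `threshold`."""
    names = sorted(pages)
    sets = {name: shingles(pages[name]) for name in names}
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


def site_themes(pages: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Terms ranked by how many pages of the site use them."""
    page_counts = Counter()
    for text in pages.values():
        # Count each term once per page, ignoring stop words.
        page_counts.update({w for w in words(text) if w not in STOP_WORDS})
    return page_counts.most_common(top)


site = {
    "ford.html": "Used Ford Focus with a full warranty on all parts.",
    "vauxhall.html": "Used Vauxhall Astra with a full warranty on all parts.",
    "bmw.html": "New BMW 3 Series now in stock, plus warranty options.",
}
print(near_duplicates(site))
print(site_themes(site, top=1))
```

On this toy site the Ford and Vauxhall pages share five of their eleven distinct shingles (about 0.45), and "warranty" is the only term found on every page, echoing the warranty example above.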
Page Rank

There is a continual process of measure and countermeasure between less ethical members of the SEO community and the search engines. In an effort to reward popular, highly relevant sites, there now exists something called PR (PageRank). PR is a method by which a site's importance can be measured. The theory is simple: if a lot of other sites link to this site, it must be important.

Not surprisingly, as soon as this criterion was understood, it was abused. Suddenly we saw a proliferation of link farms. As the name suggests, a link farm is a site that shares links with others for no other reason than increasing link popularity. Link farms became so successful that the search engines responded by specifically looking for link farm structures. Any site found to have such links is at best ignored, and may even be de-listed.

Good page ranking comes from sites that already have a good PR themselves. A genuine incoming link from a popular, high-PR site boosts your ranking by effectively vouching for your content.
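The simple theory above, and the point that a link from a high-PR site counts for more, can be illustrated with the textbook PageRank formula computed by power iteration. This is a sketch of the published formulation with the commonly cited damping factor of 0.85, not any search engine's current algorithm; `pagerank` and the link graph below are hypothetical. Each page's score is split evenly among the pages it links to, so a link from a well-linked page passes on more than a link from an obscure one.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified textbook PageRank computed by power iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base score (the "random jump").
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over every page.
                for target in pages:
                    new_scores[target] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical link graph: three blogs link to a portal, which links to
# site-a; site-b's only inbound link comes from an obscure hobby page.
web = {
    "blog-1": ["portal"],
    "blog-2": ["portal"],
    "blog-3": ["portal"],
    "portal": ["site-a"],
    "hobby-page": ["site-b"],
    "site-a": [],
    "site-b": [],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

Both sites have exactly one inbound link, yet site-a finishes well ahead of site-b because its link comes from the portal, which has itself collected several links.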
Cloaking

Cloaking was very popular until a few months ago when, once again, the operators caught on. It refers to a technique whereby the page contains code that can detect the presence of a Web spider. When a human visitor views the site, he or she sees the public presentation. The spider, however, is redirected to a specially constructed "spider food" page. The practice is still fairly widely used, but it is detectable by the spider and may result in de-listing. Google's guidelines state specifically: "Don't employ cloaking or sneaky redirects."
In Conclusion

There are no shortcuts and few tricks in search engine optimisation. Much of it is common sense, and a well laid-out page that appeals to humans is often the best temptation for the Web spider too. The following rules are worth keeping in mind:

- Give a lot of thought to keywords; they are the most important element of any search engine optimisation project. Get into the minds of your would-be clients.
- Avoid over-using frames,
Flash and JavaScript.
- Don't try to trick a search engine.
- Be very wary of any search engine optimisation (SEO) company that guarantees results. How can anyone guarantee something they have no control over? Google's search engine guidelines clearly state that guarantees are impossible. The guarantee is often for a paid listing or very obscure key phrases. Or, of course, it's simply meaningless and you're free to try to get your money back!
- The bottom line is that if your site has content which is invisible to the human user, there's a good chance it will be interpreted as spam by the search engines. Try a sneaky trick and you may well shoot to number one for your chosen phrase, for a week or so. Once you're caught and de-listed, it may be months or even years before you can claw your way back. The only route to the top is hard work without tricks.
Good SEO is about creating proper content. It's about staying within the rules, analysing results and making regular content, structure and link alterations. It's labour-intensive and time-consuming, and it's therefore not cheap. But used correctly, it's the most cost-effective marketing tool in your armoury.