SE Robots: Uncontrollable Corsairs or Obedient Info Collectors?
Search engines’ crawlers do not scurry about the web at random. Their navigation paths obey the rules and schedules given by certain control centers. In the past, you could have been a control center – the main method of giving instructions to search engines’ bots was Meta tags and robots.txt usage.
Meta elements provide web page information that helps search engines to categorize or ignore it.
The major search engines of the mid-1990s relied heavily on Meta tags. Smart webmasters realized that the results pages could easily be manipulated for commercial purposes. That is why today’s major search engines do not pay as much attention to Meta information as they used to. However, there are still some important directives that should be used to control search engines’ crawlers.
Description Meta Tag
Example: <meta name="description" content="Here goes the description of the page">
The Description Meta tag is supported by most search engines. It provides a concise explanation of a web page’s content and lets the author of the page write a proper description of it. This text is often used in search engine results pages, so good descriptive text may increase the page’s click-through rate. The W3C does not set a length for the description Meta tag; however, search engines advise using no more than 200 characters of plain text.
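As a small illustration of the 200-character advice, here is a hedged Python sketch that builds a description Meta tag and refuses text that is too long. The function name description_meta_tag and the choice to raise an error (rather than truncate) are assumptions made for this example only.

```python
from html import escape

# Upper bound advised by search engines, per the text above.
MAX_DESCRIPTION_LENGTH = 200


def description_meta_tag(text):
    """Return a description Meta tag for plain text of at most 200 characters."""
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(text.split())
    if len(text) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            "description is %d characters; the advised maximum is %d"
            % (len(text), MAX_DESCRIPTION_LENGTH)
        )
    # Escape quotes and angle brackets so the attribute value stays valid HTML.
    return '<meta name="description" content="%s">' % escape(text, quote=True)


print(description_meta_tag("Here goes the description of the page"))
```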
Robots Meta Tag
Example: <meta name="robots" content="noindex, nofollow">
This tag is used to tell search engines’ bots that a page shouldn’t be indexed and that its links shouldn’t be followed. Robotstxt.org reminds us that:
Besides “noindex” and “nofollow” values, “noarchive” and “nosnippet” are used to tell bots that a page should not be cached and there should be no description in the search engine results page.
However, the robots Meta tag should not be confused with the rel="nofollow" link attribute (which is set on an HTML <a> link tag). This attribute was introduced by Google and is supported by other search engines. It tells bots that PageRank should not be passed to the linked page. Thus it only affects ranking, and does not stop bots from following the link and indexing pages.
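To see how a crawler might read the robots Meta tag directives described above, here is a minimal Python sketch built on the standard html.parser module. The class name RobotsMetaParser and the helper robots_directives are hypothetical names chosen for this illustration; real crawlers apply their own, more elaborate rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated; compare them case-insensitively.
            for value in content.split(","):
                if value.strip():
                    self.directives.add(value.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
directives = robots_directives(page)
print("index allowed:", "noindex" not in directives)
print("follow allowed:", "nofollow" not in directives)
```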
Speaking of link attributes, the new rel="canonical" should be mentioned. At the beginning of February, Google, Yahoo, and Microsoft announced support for this new link attribute. It was created to give webmasters more control over pages that have the same content.
Google gives the following example of the attribute usage on the Webmaster Central blog:
<link rel="canonical" href="http://www.example.com/product.php?item=fish"/>
All this was done to reduce the amount of duplicate content. To use the tag, simply place it in the head section of the duplicate content URLs. The tag can only be used on pages within a single site. The search engines recommend using absolute links, though relative links are also acceptable.
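Since absolute links are recommended, a relative canonical reference can be resolved against the page address before the tag is written. The Python sketch below shows one way to do that with the standard urllib.parse module; the function name canonical_link_tag and the sample duplicate URL are assumptions for illustration.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link_tag(page_url, canonical_href):
    """Build a rel="canonical" tag with an absolute URL.

    page_url is the address of the duplicate page; canonical_href may be
    absolute or relative to that page.
    """
    absolute = urljoin(page_url, canonical_href)
    return '<link rel="canonical" href="%s"/>' % escape(absolute, quote=True)


# Hypothetical duplicate URL that differs only by an extra sorting parameter.
duplicate = "http://www.example.com/product.php?item=fish&sort=price"
print(canonical_link_tag(duplicate, "product.php?item=fish"))
```

The resulting tag would then be placed in the head section of the duplicate page.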
Sitemaps are one of the easiest ways to inform search engines about website pages available for crawling. A Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Using the Sitemap protocol does not guarantee that web pages are included in search engines, but it provides crawlers with directions to follow. You can find detailed information on sitemaps at Sitemaps.org. Google encourages webmasters to submit their sitemaps, especially if their sites are dynamic, feature rich AJAX or Flash content, are relatively new and have almost no links pointing to them, or have a lot of orphaned content pages.
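For readers who prefer to generate the file programmatically, here is a hedged Python sketch that writes a small Sitemap with the metadata fields mentioned above (last update, change frequency, priority). The function name build_sitemap and the sample entry are assumptions; check Sitemaps.org for the authoritative list of tags and allowed values.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol at Sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return Sitemap XML for a list of dicts with 'loc' and optional metadata."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        # 'loc' is the page URL; the other three tags are optional hints.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, "{%s}%s" % (SITEMAP_NS, tag))
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2009-02-01",
        "changefreq": "weekly",
        "priority": 0.8,
    },
])
print(sitemap_xml)
```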
There is one more way to guide search engines’ bots: the Robots Exclusion Protocol, also known as the /robots.txt file.
All the major crawlers check what is written in www.example.com/robots.txt and if it says
this means the page should be indexed by each and every bot. You can read more about robots.txt at Robotstxt.org.
With respect to SEO, the robots.txt file is a must-have, because it helps to guide bots to the pages that should be indexed. First, you should exclude pages that are available only to registered users; it is also a good SEO idea to exclude pages with duplicate content (for example, your articles archive) to prevent them from outranking the original pages.
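Before publishing a robots.txt file, it can help to test how a well-behaved bot would interpret it. The Python sketch below uses the standard urllib.robotparser module; the /members/ and /archive/ paths and the example URLs are hypothetical and stand in for your own registered-users and duplicate-content sections.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all bots out of the members area and the archive.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


for url in (
    "http://www.example.com/articles/seo-basics.html",
    "http://www.example.com/members/profile.html",
    "http://www.example.com/archive/2008/seo-basics.html",
):
    print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```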
You can use Web CEO’s Editor to create or edit your Meta Description, Meta Robots, Site maps and robots.txt files. Make sure to check them all after any changes you make within your website. After all, you do not want to mislead the search engines’ bots, do you?
SEO Companies’ Visibility Rate
Are SEO companies as good as they claim to be on their sites? Will they deliver the efficiency they promise? Are they really qualified? The only way to find out is to check how they optimize and promote their own sites.
Here we share Top 10 SEO Companies according to their search visibility rate for February 2009.
Web CEO analysts use objective evidence to rate SEO firms according to their search engine visibility. SEO companies’ visibility rate is calculated using a special formula that considers the positions of SEO companies’ sites in search engines results pages for the keywords their potential clients use, popularity of these keywords and number of competitors. Learn more about the formula.
“In a few months, we will discontinue support for uploads to Google Video. Don’t worry, we’re not removing any content hosted on Google Video – this just means you will no longer be able to upload new content to the service.”
Michael Cohen, Product Manager – Google Video
What does this mean? It means you can either use other video search engines (including YouTube), or host video files on your own web server. Both options have pros and cons, but the ability to create a video sitemap is an obvious advantage of hosting video files on your own web server.
Video sitemaps: share your videos with Googlebot and the whole world
In December 2007, Google offered a video sitemap – an extension of the Sitemap Protocol that helps provide Google with more information about your video content. Google promised this would help videos get picked up by the Googlebot and be fully searchable on Google Video.
If you type “Madonna” or “Matt Cutts” into Google Universal Search right now, you will see that 2-4 of 10 results on the 1st results page are powered by YouTube or Google Video, which means your video can also be found in Google Web results. In many cases, the video results can catch many more eyes and result in many more clicks than plain text listings, so take your time deciding if they can be helpful for your online business.
For now, Google is the only search engine that provides support for video sitemaps; but as often happens in the industry, other search engines might be on the way.
How to create a video sitemap
The concept of the video sitemap is the same as that of an HTML sitemap: you list the content you want to be indexed, and submit the sitemap to the search engine (Google in our case).
All you need to do to create a video sitemap is take an example and replace the sample URLs with your real files’ data.
Here is a sample of a Video Sitemap entry using Video-specific tags and the information they should report:
©Google Webmaster Center Help http://www.google.com/support/webmasters
The landing page and the location of the video file tags are obligatory, so your shortest sitemap will consist of 3-4 lines. Although other tags are optional, our advice is to enter all this information, because it will help your video files rank higher for the target keywords.
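As a rough sketch of what such an entry looks like, the Python code below generates a video sitemap with the two obligatory pieces of data (landing page and video file location) plus a few optional tags. Treat the details as assumptions to verify against Google's current help pages: the video namespace URL, the tag names thumbnail_loc, title and description, and all sample URLs are illustrative, and build_video_sitemap is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: namespace of Google's video extension; verify before use.
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(videos):
    """Return video sitemap XML for a list of dicts describing videos."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for video in videos:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        # Obligatory: the landing page that hosts the video.
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = video["landing_page"]
        node = ET.SubElement(url, "{%s}video" % VIDEO_NS)
        # Obligatory: the location of the video file itself.
        ET.SubElement(node, "{%s}content_loc" % VIDEO_NS).text = video["content_loc"]
        # Optional tags (names assumed from Google's video extension).
        for tag in ("thumbnail_loc", "title", "description"):
            if tag in video:
                ET.SubElement(node, "{%s}%s" % (VIDEO_NS, tag)).text = video[tag]
    return ET.tostring(urlset, encoding="unicode")


video_sitemap_xml = build_video_sitemap([
    {
        "landing_page": "http://www.example.com/videos/demo.html",
        "content_loc": "http://www.example.com/videos/demo.flv",
        "thumbnail_loc": "http://www.example.com/thumbs/demo.jpg",
        "title": "Product demo",
        "description": "A short demonstration of the product.",
    },
])
print(video_sitemap_xml)
```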
There can be up to 50,000 video URLs in a sitemap and the file must be no larger than 10MB uncompressed. Compatible video types are .mpg, .mpeg, .mp4, .mov, .wmv, .asf, .avi, .ra, .ram, .rm, and .flv. The files must be available directly via HTTP.
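These limits are easy to check before submission. Here is a hedged Python sketch that tests a list of video URLs and the sitemap text against the figures above. The function name check_video_sitemap is hypothetical, and two interpretations are assumptions: 10MB is read as 10 * 1024 * 1024 bytes, and both http and https addresses are treated as "available via HTTP".

```python
import posixpath
from urllib.parse import urlparse

MAX_VIDEO_URLS = 50000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # assumption: 10MB read as 10 MiB
COMPATIBLE_EXTENSIONS = {
    ".mpg", ".mpeg", ".mp4", ".mov", ".wmv", ".asf",
    ".avi", ".ra", ".ram", ".rm", ".flv",
}


def check_video_sitemap(video_urls, sitemap_xml):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(video_urls) > MAX_VIDEO_URLS:
        problems.append("too many video URLs: %d" % len(video_urls))
    if len(sitemap_xml.encode("utf-8")) > MAX_SITEMAP_BYTES:
        problems.append("sitemap is larger than 10MB uncompressed")
    for url in video_urls:
        parts = urlparse(url)
        extension = posixpath.splitext(parts.path)[1].lower()
        # Assumption: https counts as HTTP access for this check.
        if parts.scheme not in ("http", "https"):
            problems.append("not available via HTTP: %s" % url)
        if extension not in COMPATIBLE_EXTENSIONS:
            problems.append("incompatible video type: %s" % url)
    return problems


print(check_video_sitemap(
    ["http://www.example.com/videos/demo.flv", "ftp://www.example.com/videos/demo.swf"],
    "<urlset></urlset>",
))
```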
Submit the video sitemap to Google
Once you have created your sitemap file, let Google know about it: submit it directly from your Google Webmaster Tools account (you may need to register for one if you don’t have it yet).
After your submission, check the sitemap’s status. (Google needs about 5-50 min to check it.) If you receive an error, take a look at the details – Google will explain the mistakes that you made.
After you fix your sitemap and Google accepts it, you finally have something to show the world.
Tips & Tricks
1) There’s no Google-approved video sitemap generator
Google doesn’t recommend any special video sitemap generator. What should you do if you have a lot of video content to upload? Creating a sitemap by hand is a pain. Google lists third-party sitemap generators in its help file, though it’s hard to understand which of them are capable of creating video sitemaps.
The good news is that you don’t need to submit the sitemap each time you change it – Google regularly picks up changes in the sitemaps it knows about.
2) Make your own thumbnail
Upload your own thumbnail for the video and specify its location in the sitemap. If you don’t, Google will choose a fragment from your video file at its discretion.
A thumbnail is valuable: it announces your video and plays a major role in a visitor’s decision whether or not to click your file. So it’s up to you: give your visitors an idea with an 80x60-pixel picture, or entrust the choice of a video fragment to Google.
3) Specify the sitemap location in your robots.txt file
To do this, simply add the following line to your robots.txt file:
Sitemap: <sitemap_location>
The <sitemap_location> is the complete URL to the Sitemap e.g. http://www.example.com/sitemap.xml.
It doesn’t matter where you place this line in your robots.txt file, since this directive is independent of the user-agent.
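The position independence is easy to confirm with Python's standard urllib.robotparser module, whose site_maps() method (available in Python 3.8 and later) returns the sitemap locations declared in a robots.txt file. In this sketch, the helper name sitemaps_in and both sample files are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# The same directive placed first and placed last.
SITEMAP_FIRST = """\
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Disallow: /members/
"""

SITEMAP_LAST = """\
User-agent: *
Disallow: /members/
Sitemap: http://www.example.com/sitemap.xml
"""

print(sitemaps_in(SITEMAP_FIRST))
print(sitemaps_in(SITEMAP_LAST))
```

Both calls report the same sitemap location, whichever user-agent section the line sits next to.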
Web CEO Metrics
Here we are sharing the generalized numbers from our HitLens Web Analytics service. It covers 300,000+ websites from all over the world.
Visitor Referrers (%)
This chart gives an idea of the market share of each of the three major search engines.
Visitor Referrers (%)
You can see how visitors are being referred to websites. The social media share is constantly growing, while search engines still keep the crown and remain a top referrer.
Online Video Facts