Thursday, December 21, 2006

Better understanding of your site

SES Chicago was wonderful. Meeting so many of you made the trip absolutely perfect. It was as special as if (Chicago local) Oprah had joined us!

While hanging out at the Google booth, I was often asked about how to take advantage of our webmaster tools. For example, here's one tip on Common Words.

Common Words: Our prioritized listing of your site's content
The common words feature lists, in order of prevalence (from highest to lowest), the words we've found in your site's content and in links to your site. (This information isn't available for subdirectories or subdomains.) Here are the steps to leveraging common words:

1. Determine your website's key concepts. If it offers getaways to a cattle ranch in Wyoming, the key concepts may be "cattle ranch," "horseback riding," and "Wyoming."

2. Verify that Google detected the same phrases you believe are of high importance. Log in to webmaster tools, select your verified site, and choose Page analysis from the Statistics tab. Here, under "Common words in your site's content," we list the phrases detected from your site's content in order of prevalence. Do the common words lack any concepts you believe are important? Does the list include phrases that have little direct relevance to your site?

2a. If you're missing important phrases, first review your content. Do you have solid, textual information that explains and relates to the key concepts of your site? If, in the cattle-ranch example, "horseback riding" were absent from the common words, you'd want to review the site's "activities" page. Does it consist mostly of images, or only list a schedule of riding lessons, rather than conceptually relevant information?

It may sound obvious, but if you want to rank for a certain set of keywords and we don't even see those keyword phrases on your website, then ranking for those phrases will be difficult.

2b. When you see general, non-illustrative common words that don't relate helpfully to your site's content (e.g. a top listing of "driving directions" or "contact us"), it may be beneficial to increase the ratio of relevant content on your site. (Don't be too worried if you see a few of these common words, as long as you also see words that are relevant to your main topics.) In the cattle-ranch example, you would naturally give visitors "driving directions" and "contact us" information. However, if these general, non-illustrative terms surface as the highest-rated common words, or the entire list of common words consists only of these types of terms, it's a sign that Google (and likely other search engines) couldn't find enough "meaty" content.

2c. If you find that many of the common words still don't relate to your site, check out our blog post on unexpected common words.

3. Here are a few of our favorite posts on improving your site's content:
Target visitors or search engines?

Improving your site's indexing and ranking

NEW! SES Chicago - Using Images

4. Should you decide to update your content, please keep in mind that we will need to recrawl your site in order to recognize changes, and that this may take time. Of course, you can notify us of modifications by submitting a Sitemap.

Happy holidays from all of us on the Webmaster Central team!

SES Chicago: Googlers Trevor Foucher, Adam Lasnik and Jonathan Simon

Monday, December 18, 2006

Deftly dealing with duplicate content

At the recent Search Engine Strategies conference in freezing Chicago, many of us Googlers were asked questions about duplicate content. We recognize that there are many nuances and a bit of confusion on the topic, so we'd like to help set the record straight.

What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

What isn't duplicate content?
Though we do offer a handy translation utility, our algorithms won't view the same article written in English and Spanish as duplicate content. Similarly, you shouldn't worry about occasional snippets (quotes and otherwise) being flagged as duplicate content.

Why does Google care about duplicate content?
Our users typically want to see a diverse cross-section of unique content when they do searches. In contrast, they're understandably annoyed when they see substantially the same content within a set of search results. Also, webmasters become sad when we show a complex URL (example.com/contentredir?value=shorty-george&lang=en) instead of the pretty URL they prefer (example.com/en/shorty-george.htm).

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in "regular" and "printer" versions and neither set is blocked in robots.txt or via a noindex meta tag, we'll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments ... so in the vast majority of cases, the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index.
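As a concrete illustration of the second option mentioned above, a printer-friendly page that you'd rather keep out of the index could carry a robots meta tag in its <head> (a minimal sketch; the tag applies to whichever page it appears on):

<meta name="robots" content="noindex">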

How can Webmasters proactively address duplicate content issues?
  • Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or use pattern matching in your robots.txt file (see the sketch after this list).
  • Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.
  • Be consistent: Endeavor to keep your internal linking consistent; don't link to /page/ and /page and /page/index.htm.
  • Use TLDs: To help us serve the most appropriate version of a document, use top level domains whenever possible to handle country-specific content. We're more likely to know that .de indicates Germany-focused content, for instance, than /de or de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we'll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer.
  • Use the preferred domain feature of webmaster tools: If other sites link to yours using both the www and non-www version of your URLs, you can let us know which way you prefer your site to be indexed.
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
  • Avoid publishing stubs: Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
  • Understand your CMS: Make sure you're familiar with how content is displayed on your Web site, particularly if it includes a blog, a forum, or related system that often shows the same content in multiple formats.
  • Don't worry, be happy: Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it's highly unlikely that such sites can negatively impact your site's presence in Google. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.
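To make the first two suggestions concrete, here's a minimal sketch, assuming the printer versions live under a /print/ directory and a page has moved from /old-page.html to /new-page.html (both paths are purely illustrative). In robots.txt:

User-agent: *
Disallow: /print/

And in .htaccess (with Apache's mod_alias enabled):

RedirectPermanent /old-page.html http://www.example.com/new-page.html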

In short, a general awareness of duplicate content issues and a few minutes of thoughtful preventative maintenance should help you to help us provide users with unique and relevant content.

Friday, December 15, 2006

Building link-based popularity

Late in November we were at SES in Paris, where we had the opportunity to meet some of the most prominent figures in the French SEO and SEM market. One of the issues that came up in sessions and in conversations was a certain confusion about how to most effectively increase the link-based popularity of a website. As a result, we thought it might be helpful to clarify how search engines treat link spamming intended to increase a site's popularity.

The confusion lies in the common belief that there are two ways to build the link-based popularity of your website: the meritocratic, long-term option of earning natural links, or the risky, short-term option of acquiring non-earned backlinks through link-spamming tactics such as buying links. We've always taken a clear stance on manipulating the PageRank algorithm in our Quality Guidelines. Despite those policies, participating in link schemes may have paid off in the past. But Google has since refined its link-weighting algorithms considerably, and we have more people working on link-weighting quality control and correcting the issues we find. Nowadays, attempting to undermine the PageRank algorithm is likely to cost link-selling sites the ability to pass on reputation through their links.

As search engines discount non-earned links, a wide field of tactics for building link-based popularity has opened up. The classic approach is to optimize your content so that thematically related or trusted websites link to you by choice. A more recent method is link baiting, which typically takes advantage of Web 2.0 social content websites: for example, submitting a handcrafted article to a service such as http://digg.com, or earning a reputation in a certain field by building authority through services such as http://answers.yahoo.com. Our general advice is to always focus on users, not search engines, when developing your optimization strategy. Ask yourself what creates value for your users. Investing in the quality of your content and thereby earning natural backlinks benefits your users and drives more qualified traffic to your site.

To sum up, even though improved algorithms have promoted a shift away from paid or exchanged links toward earned organic links, there still seems to be some confusion in the market about the most effective link strategy. So when taking advice from your SEO consultant, keep in mind that search engines now reward sweat-of-the-brow work on content that attracts natural links given by choice.

Thursday, December 14, 2006

SES Chicago - Using Images

We all had a great time at SES Chicago last week, answering questions and getting feedback.

One of the sessions I participated in was Images and Search Engines, and the panelists had great information about using images on your site, as well as on optimizing for Google Image search.

Ensuring visitors and search engines know what your content is about
Images on a site are great -- but search engines can't read them, and not all visitors can. Make sure your site is accessible and can be understood by visitors viewing your site with images turned off in their browsers, on mobile devices, and with screen readers. If you do that, search engines won't have any trouble. Some things that you can do to ensure this:

  • Don't put the bulk of your text in images. It may sound simple, but the best thing you can do is to put your text into, well, text. Reserve images for graphical elements. If all of the text on your page is in an image, it becomes inaccessible.
  • Take advantage of alt attributes for all of your images. Make sure the alt text is descriptive and unique. For instance, alt text such as "picture1" or "logo" doesn't provide much information about the image. "Charting the path of stock x" and "Company Y" give more details (see the sketch after this list).
  • Don't overload your alt text. Be descriptive, but don't stuff it with extra keywords.
  • It's important to use alt text for any image on your pages, but if your company name, navigation, or other major elements of your pages are in images, alt text becomes especially important. Consider moving vital details to text to ensure all visitors can view them.
  • Look at the image-to-text ratio on your page. How much text do you have? One way of looking at this is to look at your site with images turned off in your browser. What content can you see? Is the intent of your site obvious? Do the pages convey your message effectively?
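As a quick reference for the alt text advice above, a descriptive image tag might look like this (the filename and wording are just illustrations):

<img src="stock-x-chart.png" alt="Charting the path of stock x over the past year">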

Taking advantage of Image search
The panelists pointed out that shoppers often use Image search to see the things they want to buy. If you have a retail site, make sure that you have images of your products (and that they can be easily identified with alt text, headings, and textual descriptions). Searchers can then find your images and get to your site.

One thing that can help your images be returned for results in Google Image search is opting in to enhanced image search in webmaster tools. This enables us to use your images in the Google Image Labeler, which harnesses the power of the community for adding metadata to your images.

Someone asked if we have a maximum number of images per site that we accept for the Image Labeler. We don't. You can opt in no matter how many, or how few, images your site has.

Update: More information on using images can be found in our Help Center. 

Saturday, December 2, 2006

Come and see us at Search Engine Strategies Chicago

If you're planning to be at SES Chicago this week, be sure to stop by and say hi to the many Googlers who are coming out to brave the cold and snow. Many of us will be on hand at the booth, speaking at sessions, and wandering the halls. Check out Search Engine Land for tips on how to spot some of us and be sure to catch our sessions:

Monday, December 4th

Drive traffic to your site with Google
Jessica Ewing, Product Manager, Google Gadgets
Vanessa Fox, Product Manager, Webmaster Central
Shashi Seth, Lead Product Manager, Custom Search Engine

Lunch with Google Webmaster Central
Vanessa Fox, Product Manager, Webmaster Central
Amanda Camp, Software Engineer, Webmaster Tools
Trevor Foucher, Software Engineer, Webmaster Tools
Adam Lasnik, Search Evangelist
Evan Roseman, Software Engineer
Maile Ohye, Developer Support Engineer

Tuesday, December 5th

Bulk Submit 2.0
Amanda Camp, Software Engineer, Webmaster Tools

Domaining and Address Bar-Driven Traffic
Hal Bailey, Strategic Partner Manager

Duplicate Content and Multiple Site Issues
Adam Lasnik, Search Evangelist

Bot Obedience Course
Vanessa Fox, Product Manager, Webmaster Central

Meet the Search Ad Networks
Gretchen Howard, Online Sales and Operations Manager

Meet the Mobile Search Engines
Sumit Agarwal, Product Manager, Mobile

Wednesday, December 6th

Social Search Overview
Shashi Seth, Product Manager, Custom Search Engine

Images and Search Engines
Vanessa Fox, Product Manager, Webmaster Central

Vendor Chat on Measuring Success
Paul Botto, Google Analytics

Flash and Search Engines
Dan Crow, Product Manager

CSS, AJAX, Web 2.0, and Search Engines
Dan Crow, Product Manager

Auditing Paid Listings and Click Fraud Issues
Shuman Ghosemajumder, Business Product Manager, Trust and Safety

Thursday, December 7th

Meet the Crawlers
Evan Roseman, Software Engineer

Search Engine Q&A on Links
Adam Lasnik, Search Evangelist

Tuesday, November 28, 2006

Viva, Webmasters in Vegas

Thanks for visiting us at WebmasterWorld PubCon in Las Vegas a couple of weeks ago. Whether it was at the panel sessions, the exhibitor hall, or the Safe Bets event, we had a blast meeting you and sharing the many Google products that can help webmasters enhance their sites and drive traffic to them. For those who weren't able to join us, here are answers to some of the top questions that we heard:

Q: How do I increase the visibility of my site in search results?
A: Many factors can affect the visibility of your site in search results. We outlined just a few tips that can make a big difference in increasing your site's visibility in Google search results. First, ensure your site has quality content that is unique. Second, have quality sites link to your site. Third, submit a Sitemap to let us know about all the URLs on your site. Fourth, sign up for a webmaster tools account to get information about how Google sees your site, such as crawl errors, indexing details, and top queries for your site. Lastly, you can visit Webmaster Central and the Webmaster Help Center for more webmaster-related questions and resources.

Q: How much do I have to pay to create a Google Custom Search Engine?

A: Nothing -- it's free. In addition to being able to create your own custom search engine on your site, you can make money on your site using AdSense for Search.

Q: Why is it better to create gadgets rather than create feeds?
A: First, gadgets are much more flexible. As a publisher, you control the format of your content. Second, gadgets are by nature more interactive. They can be built with Flash, HTML, or AJAX, and are generally much more interesting than feeds. Finally, your users can customize a gadget to their liking, making your content a lot more targeted.

Q: What is this new ad placement feature for AdSense and how come I don't see it in my account?
A: Ad placements are publisher-defined groups of ad units that advertisers will see when searching for places where they can target their ads. If you don't yet see it in your AdSense account, it's because we've been slowly rolling out this feature to everyone. This exciting feature will be available to all publishers in the next few weeks, so be sure to keep an eye out.

Q: What's the easiest way to put a searchable Google Map on my web page?
A: Use the Map Search Wizard to design a Google Map for your page. The wizard will write all of the code for you; all you need to do is copy and paste the code into your web page, and your users will see your location on a map.

For more information about Google products for webmasters, you can check them out here:
We also wanted to share some photos from PubCon. If you look closely enough, you may be able to see yourself.


Thanks for stopping by, on behalf of the 25 Googlers in attendance!

Monday, November 20, 2006

Introducing Sitemaps for Google News

Good news for webmasters of English-language news sites: If your site is currently included in Google News, you can now create News Sitemaps that tell us exactly which articles to crawl for inclusion in Google News. In addition, you can access crawl errors, which tell you if there were any problems crawling the articles in your News Sitemaps, or, for that matter, any articles on your site that Google News reaches through its normal crawl.

Freshness is important for news, so we recrawl all News Sitemaps frequently. The News Sitemaps XML definition lets you specify a publication date and time for each article to help us process fresh articles in a timely fashion. You can also specify keywords for each article to inform the placement of the articles into sections on Google News.
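For illustration, a News Sitemap entry with a publication date and keywords might look something like this (a minimal sketch using the sitemap-news 0.9 namespace and a made-up article URL; see the News Sitemap documentation for the authoritative list of tags):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.com/business/article123.html</loc>
    <news:news>
      <news:publication_date>2006-11-20T08:00:00Z</news:publication_date>
      <news:keywords>business, mergers, acquisitions</news:keywords>
    </news:news>
  </url>
</urlset>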

If your English-language news site is currently included in Google News, the news features are automatically enabled in webmaster tools; just add the site to your account. Here's how the new summary page will look:

The presence of the News crawl link on the left indicates that the news features are enabled. A few things to note:
  • You will only have the news features enabled if your site is currently included in Google News. If it's not, you can request inclusion.

  • In most cases, you should add the site for the hostname under which you publish your articles. For example, if you publish your articles at URLs such as http://www.example.com/business/article123.html, you should add the site http://www.example.com/. Exception: If your site is within a hosting site, you should add the site for your homepage, e.g., http://members.tripod.com/mynewssite/. If you publish articles under multiple hostnames, you should add a site for each of them.

  • You must verify your site to enable the news features.

We'll be working to make the news features available to publishers in more languages as soon as possible.

Wednesday, November 15, 2006

Joint support for the Sitemap Protocol

We're thrilled to tell you that Yahoo! and Microsoft are joining us in supporting the Sitemap protocol.

As part of this development, we're moving the protocol to a new namespace, www.sitemaps.org, and raising the version number to 0.9. The sponsoring companies will continue to collaborate on the protocol and publish enhancements on the jointly-maintained site sitemaps.org.

If you've already submitted a Sitemap to Google using the previous namespace and version number, we'll continue to accept it. If you haven't submitted a Sitemap before, check out the documentation on www.sitemaps.org for information on creating one. You can submit your Sitemap file to Google using Google webmaster tools. See the documentation that Yahoo! and Microsoft provide for information about submitting to them.
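If you're creating a Sitemap by hand, a minimal file under the new namespace looks like this (the URL and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-11-15</lastmod>
  </url>
</urlset>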

If any website owners, tool writers, or web server developers haven't gotten around to implementing Sitemaps yet, thinking this was just a crazy Google experiment, we hope this joint announcement shows that the industry is heading in this direction. The more of the web Sitemaps eventually cover, the more we can revolutionize the way web crawlers interact with websites. In our view, the experiment is still underway.

Tuesday, November 14, 2006

Badware alerts for your sites

As part of our efforts to protect users, we have been warning people using Google search before they visit sites that have been determined to distribute badware under the guidelines published by StopBadware. Warning users is only part of the solution, though; the real win comes from helping webmasters protect their own users by alerting them when their sites have been flagged for badware -- and working with them to remove the threats.

It's my pleasure to introduce badware alerts in Google webmaster tools. You can see on the Diagnostic Summary tab if your site has been determined to distribute badware and can access information to help you correct this.

If your site has been flagged and you believe you've since removed the threats, go to http://stopbadware.org/home/review to request a review. If that's successful, your site will no longer be flagged -- and your users will be safer as a result of your diligence.

This version is only the beginning: we plan to continue to provide more data to help webmasters diagnose issues on their sites. We realize that in many cases, badware distribution is unintentional and the result of being hacked or running ads which lead directly to pages with browser exploits. Stay tuned for improvements to this feature and others on webmaster tools.

Update: this post has been updated to provide a link to the new form for requesting a review.


Update: More information is available in our Help Center article on malware and hacked sites.

Monday, November 13, 2006

Las Vegas Pubcon 2006

As if working at Google isn't already a party, today I'm traveling to Las Vegas for WebmasterWorld PubCon 2006! But instead of talking bets and odds, I'll be talking about how Google can help webmasters improve their sites. I love chatting with webmasters about all the work that goes into creating a great website. Several other Googlers will be there too, so if you have a burning question or just wanna talk about random stuff feel free to stop us and say hi. Besides the sessions, we'll be at the Google booth on Wednesday and Thursday, so come by and introduce yourself.

Here's the list of Google events at PubCon:

Tuesday 14

10:15 - 11:30 SEO and Big Search Adam Lasnik, Search Evangelist

1:30 - 2:45 PPC Search Advertising Programs Frederick Vallaeys, Senior Product Specialist, AdWords

2:45 - 4:00 PPC Tracking and Reconciliation Brett Crosby, Senior Manager, Google Analytics

Wednesday 15

10:15 - 11:30 Contextual Advertising Optimization Tom Pickett, Online Sales and Operations

11:35 - 12:50 Site Structure for Crawlability Vanessa Fox, Product Manager, Google Webmaster Central

1:30 - 3:10 Duplicate Content Issues Vanessa Fox, Product Manager, Google Webmaster Central

5:30 - 7:30 Safe Bets From Google Cocktail party!

Thursday 16

11:35 - 12:50 Spider and DOS Defense Vanessa Fox, Product Manager, Google Webmaster Central

1:30 - 3:10 Interactive Site Reviews Matt Cutts, Software Engineer

3:30 - 5:00 Super Session Matt Cutts, Software Engineer

You can view this schedule on Google Calendar here:

Come to "Safe Bets From Google" on Wednesday 5:30-7:30pm -- it's a cocktail party where you can mingle with other webmasters and Googlers, learn about other Google products for webmasters, and in typical Google style enjoy some great food and drinks. I'll be there with some other engineers from our Seattle office. Don't miss it!

Friday, November 10, 2006

New third-party Sitemaps tools

Hello, webmasters. I'm Maile, and I recently joined the team here at Google webmaster central. I already have good news to report: we've updated our information on third-party programs and websites. These third-party tools provide lots of options for easily generating a Sitemap -- from plugins for content management systems to online generators.

Many thanks to this community for continuing to innovate and improve the Sitemap tools. Since most of my work focuses on the Sitemaps protocol, I hope to meet you on our Sitemaps protocol discussion group.

Thursday, November 9, 2006

The number of pages Googlebot crawls

The Googlebot activity reports in webmaster tools show you the number of pages of your site Googlebot has crawled over the last 90 days. We've seen some of you asking why this number might be higher than the total number of pages on your sites.


Googlebot crawls pages of your site based on a number of things including:
  • pages it already knows about
  • links from other web pages (within your site and on other sites)
  • pages listed in your Sitemap file
More specifically, Googlebot doesn't access pages, it accesses URLs. And the same page can often be accessed via several URLs. Consider the home page of a site that can be accessed from the following four URLs:
  • http://www.example.com/
  • http://www.example.com/index.html
  • http://example.com
  • http://example.com/index.html
Although all URLs lead to the same page, all four URLs may be used in links to the page. When Googlebot follows these links, a count of four is added to the activity report.

Many other scenarios can lead to multiple URLs for the same page. For instance, a page may have several named anchors, such as:
  • http://www.example.com/mypage.html#heading1
  • http://www.example.com/mypage.html#heading2
  • http://www.example.com/mypage.html#heading3
And dynamically generated pages often can be reached by multiple URLs, such as:
  • http://www.example.com/furniture?type=chair&brand=123
  • http://www.example.com/hotbuys?type=chair&brand=123
As you can see, when you consider that each page on your site might have multiple URLs that lead to it, the number of URLs that Googlebot crawls can be considerably higher than the number of total pages for your site.

Of course, you (and we) only want one version of the URL to be returned in the search results. Not to worry -- this is exactly what happens. Our algorithms select a version to include, and you can provide input on this selection process.

Redirect to the preferred version of the URL
You can do this using a 301 (permanent) redirect. In the first example that shows four URLs that point to a site's home page, you may want to redirect index.html to www.example.com/. And you may want to redirect example.com to www.example.com so that any URLs that begin with one version are redirected to the other version. Note that you can do this latter redirect with the Preferred Domain feature in webmaster tools. (If you also use a 301 redirect, make sure that this redirect matches what you set for the preferred domain.)
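On Apache servers, one common way to express that second redirect is with mod_rewrite rules in .htaccess (a minimal sketch, assuming mod_rewrite is enabled; substitute your own domain):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]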

Block the non-preferred versions of a URL with a robots.txt file
For dynamically generated pages, you may want to block the non-preferred version using pattern matching in your robots.txt file. (Note that not all search engines support pattern matching, so check the guidelines for each search engine bot you're interested in.) For instance, in the third example that shows two URLs that point to a page about the chairs available from brand 123, the "hotbuys" section rotates periodically and the content is always available from a primary and permanent location. In that case, you may want to index the first version and block the "hotbuys" version. To do this, add the following to your robots.txt file:

User-agent: Googlebot
Disallow: /hotbuys?*

To ensure that this directive will actually block and allow what you intend, use the robots.txt analysis tool in webmaster tools. Just add this directive to the robots.txt section on that page, list the URLs you want to check in the "Test URLs" section and click the Check button. For this example, you'd see a result like this:

Don't worry about links to anchors, because while Googlebot will crawl each link, our algorithms will index the URL without the anchor.

And if you don't provide input such as that described above, our algorithms do a really good job of picking a version to show in the search results.

Tuesday, October 31, 2006

Target visitors or search engines?

Last Friday afternoon, I was able to catch the end of the Blog Business Summit in Seattle. At the session called "Blogging and SEO Strategies," John Battelle brought up a good point. He said that as a writer, he doesn't want to have to think about all of this search engine optimization stuff. Dave Taylor had just been talking about order of words in title tags and keyword density and using hyphens rather than underscores in URLs.

We agree, which is why you'll find that the main point in our webmaster guidelines is to make sites for visitors, not for search engines. Visitor-friendly design makes for search-engine-friendly design as well. The team at Google webmaster central talks with many site owners who care deeply about the details of how Google crawls and indexes sites (hyphens and underscores included), but many site owners out there are just concerned with building great sites. The good news is that the guidelines and tips about how Google crawls and indexes sites come down to wanting great content for our search results.

In the spirit of John Battelle's point, here's a recap of some quick tips for ensuring your site is friendly for visitors.

Make good use of page titles
This is true of the main heading on the page itself, but is also true of the title that appears in the browser's title bar.


Whenever possible, ensure each page has a unique title that describes the page well. For instance, if your site is for your store "Buffy's House of Sofas", a visitor may want to bookmark your home page and the order page for your red, fluffy sofa. If all of your pages have the same title, "Welcome to my site!", then a visitor will have trouble finding your site again in their bookmarks. However, if your home page has the title "Buffy's House of Sofas" and your red sofa page has the title "Buffy's red fluffy sofa", then visitors can glance at the title to see what each page is about and can easily find it in their bookmarks later. And if your visitors are anything like me, they may have several browser tabs open and appreciate descriptive titles for easier navigation.

This simple tip for visitors helps search engines too. Search engines index pages based on the words contained in them, and including descriptive titles helps search engines know what the pages are about. And search engines often use a page's title in the search results. "Welcome to my site" may not entice searchers to click on your site in the results quite so much as "Buffy's House of Sofas".
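In HTML, that simply means giving each page its own descriptive title element, for example (using the hypothetical sofa store above):

<title>Buffy's red fluffy sofa - Buffy's House of Sofas</title>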

Write with words
Images, Flash, and other multimedia make for pretty web pages, but make sure your core messages are in text, or use alt text to provide textual descriptions of your multimedia. This is great for search engines, which are based on text: searchers enter search queries as words, after all. But it's also great for visitors, who may have images or Flash turned off in their browsers or might be using screen readers or mobile devices. You can also provide HTML versions of your multimedia-based pages (if you do that, be sure to block the multimedia versions from being indexed using a robots.txt file).

Make sure the text you're talking about is in your content
Visitors may not read your web site linearly like they would a newspaper article or book. Visitors may follow links from elsewhere on the web to any of your pages. Make sure that they have context for any page they're on. On your order page, don't just write "order now!" Write something like "Order your fluffy red sofa now!" But write it for people who will be reading your site. Don't try to cram as many words in as possible, thinking search engines can index more words that way. Think of your visitors. What are they going to be searching for? Is your site full of industry jargon when they'll be searching for you with more informal words?

As I wrote in that guest post on Matt Cutts' blog when I talked about hyphens and underscores:

You know what your site’s about, so it may seem completely obvious to you when you look at your home page. But ask someone else to take a look and don’t tell them anything about the site. What do they think your site is about?

Consider this text:

“We have hundreds of workshops and classes available. You can choose the workshop that is right for you. Spend an hour or a week in our relaxing facility.”

Will this site show up for searches for [cooking classes] or [wine tasting workshops] or even [classes in Seattle]? It may not be as obvious to visitors (and search engine bots) what your page is about as you think.

Along those same lines, does your content use words that people are searching for? Does your site text say “check out our homes for sale” when people are searching for [real estate in Boston]?

Make sure your pages are accessible
I know -- this post was supposed to be about writing content, not technical details. But visitors can't read your site if they can't access it. If the network is down or your server returns errors when someone tries to access the pages of your site, it's not just search engines who will have trouble. Fortunately, webmaster tools makes it easy. We'll let you know if we've had any trouble accessing any of the pages. We tell you the specific page we couldn't access and the exact error we got. These problems aren't always easy to fix, but we try to make them easy to find.