AboutUsSiteMap
Revision as of 10:40, 11 February 2008
What (summary)
Index our site and present search engines with the resulting sitemap.
Why this is important
Traffic from search engines is our bread and butter. If there are pages that search engines would index if only they knew about them, we should tell them. This has a 5% chance of doubling our traffic for 3 days' worth of work.
DoneDone
- We have a validated sitemap index and sitemap files that include all of our page titles.
- We are serving them from the proper location so that Googlebot and other search engine crawlers can find them.
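What counts as the "proper location" follows from the sitemaps.org protocol: a sitemap may only list URLs that use the same scheme and host as the sitemap file and sit in its directory or below it, so a sitemap served from the site root can cover every page. The Ruby sketch below checks that rule for one sitemap URL and one page URL; the method name and the example.com URLs are hypothetical.

```ruby
require "uri"

# Returns true if +page_url+ may be listed in a sitemap served from
# +sitemap_url+: same scheme, host and port, and a path at or below the
# directory that contains the sitemap file.
def url_allowed_in_sitemap?(sitemap_url, page_url)
  sitemap = URI.parse(sitemap_url)
  page = URI.parse(page_url)
  return false unless sitemap.scheme == page.scheme &&
                      sitemap.host.to_s.downcase == page.host.to_s.downcase &&
                      sitemap.port == page.port

  # Directory of the sitemap, e.g. "/catalog/" for "/catalog/sitemap.xml".
  dir = sitemap.path.sub(%r{[^/]*\z}, "")
  dir = "/" if dir.empty?
  page_path = page.path.empty? ? "/" : page.path
  page_path.start_with?(dir)
end
```

For example, a sitemap at `http://example.com/catalog/sitemap.xml` may list `http://example.com/catalog/item` but not `http://example.com/images/logo.png`.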
Steps to get to DoneDone
- Create a branch and stage it locally (sitemap).
- Read through http://sitemaps.org/
- Read through http://sitemaps.org/protocol.php
- Read through https://www.google.com/webmasters/tools/docs/en/protocol.html
- Read through https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html
- Understand this task.
- Get a list of all of our pages. Got hold of 100k pages from mist.
- Load these pages into our branch database.
- Create a script that takes limit and offset parameters to generate the sitemap for these pages.
- Refactor the code.
- Convert time into W3C Datetime format
- Find a function in Ruby to encode the URLs in UTF-8
- Generate sitemap_index.
- Write a runner script that walks over the pages table and generates the sitemap XML.
- Break the output into sitemaps that have no more than 50,000 URLs and are smaller than 10MB each.
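The "Create a script that takes a limit and offset parameter" step could look roughly like the Ruby sketch below. It assumes each page has already been turned into an absolute URL plus an optional lastmod string (the `SitemapEntry` struct and `urlset_xml` method are hypothetical names), and it slices an in-memory array where the real script would run a `LIMIT`/`OFFSET` query. The `<loc>` value is entity-escaped because the protocol requires characters such as `&` to be escaped inside the XML.

```ruby
require "cgi"

# One <url> entry: an absolute, already percent-encoded URL plus an optional
# W3C Datetime string for <lastmod>.
SitemapEntry = Struct.new(:loc, :lastmod)

# Renders a <urlset> document for entries[offset, limit]. The slice stands in
# for a "LIMIT limit OFFSET offset" query against the pages table.
def urlset_xml(entries, limit:, offset: 0)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  # entries[offset, limit] is nil when offset is past the end; Array(nil) is [].
  Array(entries[offset, limit]).each do |entry|
    lines << "  <url>"
    lines << "    <loc>#{CGI.escapeHTML(entry.loc)}</loc>"
    lines << "    <lastmod>#{entry.lastmod}</lastmod>" if entry.lastmod
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n") + "\n"
end
```

For example, `urlset_xml(entries, limit: 50_000, offset: 100_000)` would render the third 50,000-page slice.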
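For "Convert time into W3C Datetime format", Ruby's standard `time` library already provides `Time#xmlschema`, which emits an ISO 8601 string that fits the W3C Datetime profile. The sketch converts to UTC first (with `getutc`, which does not modify the caller's object) so the output ends in `Z`. The 14-digit `YYYYMMDDHHMMSS` input handled by the hypothetical `parse_mw_timestamp` is an assumption modelled on how MediaWiki stores page timestamps; adjust it to whatever the pages table actually holds.

```ruby
require "time"

# Formats a Time for <lastmod>, e.g. "2008-02-11T10:40:00Z".
def w3c_datetime(time)
  time.getutc.xmlschema
end

# Hypothetical helper: parses a 14-digit UTC timestamp such as
# "20080211104000" (assumed MediaWiki-style storage format).
def parse_mw_timestamp(ts)
  raise ArgumentError, "bad timestamp: #{ts}" unless ts =~ /\A\d{14}\z/
  Time.utc(*ts.unpack("a4a2a2a2a2a2").map(&:to_i))
end
```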
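For "Find a function in Ruby to encode the URLs in UTF-8", one option is to percent-encode the title's UTF-8 bytes directly, as below. This is a sketch under assumptions: `page_url` is a hypothetical helper, spaces become underscores as in MediaWiki-style titles, and `/` and `:` are left unescaped so subpage and namespace titles stay readable. `ERB::Util.url_encode` from the standard library is an alternative, but it also escapes `/`.

```ruby
# Builds an absolute page URL from a base such as "http://example.com/" and a
# page title, percent-encoding every UTF-8 byte outside the unreserved set
# (letters, digits, "_", "-", ".", "~") plus "/" and ":".
def page_url(base, title)
  path = title.tr(" ", "_").encode("UTF-8").b
              .gsub(%r{[^A-Za-z0-9_\-.~/:]}) { |byte| format("%%%02X", byte.ord) }
  # The encoded path is pure ASCII, so relabelling it as UTF-8 is safe.
  base + path.force_encoding(Encoding::UTF_8)
end
```

If the base URL can contain `&`, the result still needs XML entity escaping when it is written into `<loc>`.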
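Once the individual sitemap files exist, "Generate sitemap_index" amounts to a small XML document listing each file. The sketch assumes the caller passes `[url, Time]` pairs, with `nil` when the modification time is unknown; the `sitemap_index_xml` name and the example.com file names are hypothetical.

```ruby
require "cgi"
require "time"

# Renders a <sitemapindex> listing each sitemap file's URL and, when known,
# the time it was last written.
def sitemap_index_xml(sitemaps)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  sitemaps.each do |loc, lastmod|
    lines << "  <sitemap>"
    lines << "    <loc>#{CGI.escapeHTML(loc)}</loc>"
    lines << "    <lastmod>#{lastmod.getutc.xmlschema}</lastmod>" if lastmod
    lines << "  </sitemap>"
  end
  lines << "</sitemapindex>"
  lines.join("\n") + "\n"
end
```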
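The runner script that walks over the pages table can page through it by advancing the offset until a query comes back short or empty. In this sketch `fetch` is any callable taking `(limit, offset)`; in the real runner it would wrap a SQL query, and the `page` table and column names in the comment are assumptions borrowed from MediaWiki's schema. The `ORDER BY` on a unique column matters: without it the database may return rows in a different order on each query, so pages could be skipped or repeated across batches.

```ruby
# Yields the pages table in batches of +batch_size+ rows. +fetch+ is any
# callable returning an Array of rows for (limit, offset), for example a
# wrapper around (hypothetical table and columns):
#   SELECT page_title, page_touched FROM page ORDER BY page_id LIMIT ? OFFSET ?
def each_page_batch(fetch, batch_size)
  offset = 0
  loop do
    rows = fetch.call(batch_size, offset)
    break if rows.empty?
    yield rows
    break if rows.size < batch_size # a short batch means we reached the end
    offset += batch_size
  end
end
```

Choosing `batch_size = 50_000` makes each batch a candidate sitemap file, as long as it also stays under the 10MB limit.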
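The last step, keeping every file at no more than 50,000 URLs and under 10MB, can be handled by packing pre-rendered `<url>` fragments greedily and starting a new file whenever the next fragment would break either limit. This is a sketch: the constant and method names are hypothetical, sizes are counted in bytes of the uncompressed XML, and 10MB is taken as 10 * 1024 * 1024 bytes.

```ruby
MAX_URLS  = 50_000
MAX_BYTES = 10 * 1024 * 1024 # 10,485,760 bytes, uncompressed

SITEMAP_HEADER = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
                 %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n)
SITEMAP_FOOTER = "</urlset>\n"

# Groups rendered "<url>...</url>\n" fragments into complete sitemap
# documents, flushing the current file before a fragment that would push it
# past +max_urls+ entries or +max_bytes+ bytes.
def split_into_sitemaps(fragments, max_urls: MAX_URLS, max_bytes: MAX_BYTES)
  overhead = SITEMAP_HEADER.bytesize + SITEMAP_FOOTER.bytesize
  files = []
  current = []
  size = overhead
  fragments.each do |frag|
    if !current.empty? && (current.size >= max_urls || size + frag.bytesize > max_bytes)
      files << SITEMAP_HEADER + current.join + SITEMAP_FOOTER
      current = []
      size = overhead
    end
    current << frag
    size += frag.bytesize
  end
  files << SITEMAP_HEADER + current.join + SITEMAP_FOOTER unless current.empty?
  files
end
```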
Notes
- http://sitemaps.org/
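- Google's link to the sitemap protocol: https://www.google.com/webmasters/tools/docs/en/protocol.html
- Google's link to a Python sitemap generator: https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html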