Difference between revisions of "Rewrite PageCreationBot"

(Added stuff from RewritePageCreationScraper so we can nuke RewritePageCreationScraper)

Revision as of 07:34, 16 August 2007



What (summary)

  • New page-building bot
  • Still relies on Java/Tomcat to do crawling (for now)
  • Carefully tested
  • This is essentially a one-to-one rewrite of the bot's scraping components in Ruby instead of Java.

Why this is important

  • We need to have control over the pages that are created on our site.
  • The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
  • Gaining mastery over the code so that we can add new features easily.


Done

  • Creates new pages based on a template
  • Monitoring and logging have been added (to check whether or not the bot succeeds)
  • Hooked into all the points the old bot was hooked into
  • Checks robots.txt before spidering the website.
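This page does not describe how the robots.txt check is implemented. The following is only a rough Ruby sketch of such a check. It makes several assumptions: the robots.txt text has already been downloaded (fetching is left out), Allow/Disallow rules use plain prefix matching with no wildcard support, and the longest matching rule wins. The `RobotsRules` class and the `PageCreationBot` user-agent string are hypothetical names and do not come from the bot's code.

```ruby
# Hypothetical sketch of a robots.txt check (not the bot's actual code).
# Assumes plain prefix matching of Allow/Disallow rules, no wildcards.
class RobotsRules
  def initialize(robots_txt, user_agent)
    @user_agent = user_agent.downcase
    @rules = parse(robots_txt)
  end

  # Returns true if our user agent may crawl the given path.
  # The longest matching rule wins; Allow beats Disallow on a tie.
  def allowed?(path)
    best = nil
    @rules.each do |type, prefix|
      next if prefix.empty? # an empty Disallow means "allow everything"
      next unless path.start_with?(prefix)
      if best.nil? || prefix.length > best[1].length ||
         (prefix.length == best[1].length && type == :allow)
        best = [type, prefix]
      end
    end
    best.nil? || best[0] == :allow
  end

  private

  # Collects the rules of the group whose User-agent token appears in our
  # user-agent string, falling back to the "*" group otherwise.
  def parse(text)
    groups = Hash.new { |h, k| h[k] = [] }
    agents = []
    in_rules = false
    text.each_line do |raw|
      line = raw.sub(/#.*/, "").strip # drop comments and whitespace
      next if line.empty?
      key, value = line.split(":", 2).map(&:strip)
      next if value.nil?
      case key.downcase
      when "user-agent"
        # A User-agent line that follows rules starts a new group.
        agents = [] if in_rules
        in_rules = false
        agents << value.downcase
      when "disallow", "allow"
        in_rules = true
        agents.each { |a| groups[a] << [key.downcase.to_sym, value] }
      end
    end
    specific = groups.keys.find { |a| a != "*" && @user_agent.include?(a) }
    specific ? groups[specific] : groups["*"]
  end
end

# Usage example with a made-up robots.txt:
robots = <<~TXT
  User-agent: *
  Disallow: /private/
  Allow: /private/public-page

  User-agent: BadBot
  Disallow: /
TXT

rules = RobotsRules.new(robots, "PageCreationBot")
p rules.allowed?("/index.html")                               # => true
p rules.allowed?("/private/data")                             # => false
p rules.allowed?("/private/public-page")                      # => true
p RobotsRules.new(robots, "BadBot/1.0").allowed?("/index.html") # => false
```

A real crawler would also have to fetch and cache each site's robots.txt, and would need to handle wildcard patterns and a missing or unreachable file. This sketch covers none of those cases.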

Retrieved from "http://aboutus.com/index.php?title=Rewrite_PageCreationBot&oldid=8931321"