Rewrite PageCreationBot
OurWork < DevelopmentTeam < Priorities < (2) Rewrite PageCreationBot (Ghufran, Umar Sheikh)
What (summary)
- New page-building bot
- Still relies on Java/Tomcat to do crawling (for now)
- Carefully tested
Current Status
- Creates new pages based on a template
- Monitoring and logging have been added
- Test cases added
- We have created a sample page, which is a rough sketch of what a page looks like after being created by the bot. Here...
- The current version of the PageCreationBot does not use the thumbnail extracted from Alexa. It uses the thumbnail tag from the Domain_Page template instead.
- This can be changed by using the get_thumbnail function that is already in place.
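As a rough illustration of the thumbnail choice described above, here is a minimal Python sketch. It is not the bot's actual code: the template text, the `build_page` helper, the `TEMPLATE_THUMBNAIL` placeholder, and the signature and return value of `get_thumbnail` are all assumptions made for the example. Only the names `Domain_Page` and `get_thumbnail` come from this page.

```python
from string import Template

# Hypothetical stand-in for the Domain_Page template call; the real
# template parameters are not documented on this page.
DOMAIN_PAGE = Template(
    "{{Domain_Page\n"
    "|domain=$domain\n"
    "|thumbnail=$thumbnail\n"
    "}}\n"
)

# Placeholder for whatever thumbnail tag the Domain_Page template uses today.
TEMPLATE_THUMBNAIL = "<template thumbnail tag>"


def get_thumbnail(domain):
    """Stub for the bot's existing get_thumbnail function (signature assumed).

    Returns a thumbnail reference for the domain, or None if nothing
    was extracted for it.
    """
    extracted = {"example.com": "Example.com_thumb.png"}  # fake data
    return extracted.get(domain)


def build_page(domain, use_extracted_thumbnail=False,
               default_thumbnail=TEMPLATE_THUMBNAIL):
    """Fill the page template, optionally preferring the extracted thumbnail."""
    thumb = get_thumbnail(domain) if use_extracted_thumbnail else None
    if thumb is None:
        # Fall back to the template's own thumbnail tag (current behaviour).
        thumb = default_thumbnail
    return DOMAIN_PAGE.substitute(domain=domain, thumbnail=thumb)
```

With a switch like this, moving to the extracted thumbnail would be a one-flag change, and domains without an extracted thumbnail would keep the template's tag.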
Why this is important
- We need to have control over the pages that are created on our site.
- The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
- Gaining mastery over the code so that we can add new features easily.
DoneDone
- Creates new pages based on a template
- Monitoring and logging have been added (tests whether or not the bot succeeds)
- Output goes to a log file, either on each squal box (with aggregation) or on an NFS volume. Have emailed Ethan and Michael about this.
- Hooked in at all the points where the old bot was
- Not exactly the same points, but the same end-user functionality.
- Projects:BotTest problems fixed
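The monitoring and logging item above (recording whether the bot succeeds, with output to a log file) could look roughly like the following Python sketch. The logger name, the log format, and the `make_logger` and `create_with_logging` helpers are hypothetical. The log path is left to the caller, since the choice between a per-box file and an NFS volume is still open.

```python
import logging


def make_logger(log_path):
    """Return a logger that appends to log_path (path chosen by the caller)."""
    logger = logging.getLogger("page_creation_bot")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        # Attach the file handler only once, even if called repeatedly.
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def create_with_logging(domain, create_page, logger):
    """Run create_page(domain) and log whether it succeeded.

    create_page is any callable that builds the page and raises on failure.
    Returns True on success, False on failure.
    """
    try:
        create_page(domain)
    except Exception:
        # logger.exception records the message plus the traceback.
        logger.exception("page creation failed for %s", domain)
        return False
    logger.info("page created for %s", domain)
    return True
```

Returning a boolean keeps the success check usable by a monitoring script, while the log file keeps the detail needed to debug failures.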
Bot insertion points into MediaWiki
- /wiki/skins/common/generatePage.js (and some other JavaScript that we should remove)
- /wiki/extensions/AboutUsDomainRedirect/SpecialRedirectToDomain.php (deprecate and point to CaseSpace)
- /wiki/extensions/CaseSpace/CaseSpace.php (Ultimately, here is where the magic will happen.)
- /wiki/extensions/AboutUsBuildDomain/AboutUsBuildDomain.php should be the best place to keep it.
Schema
- New schema location: http://images.aboutus.org/images/b/be/Aboutusbot_new.zip. Despite the .zip extension, it's a plain SQL file, not a compressed one.
Discussion
- I heard a rumor of a possible change in format for new pages. Is this true? Where is the discussion about the new format possibilities happening? TedErnst | talk 13:50, 25 October 2007 (PDT)
- I think that the bot is still using the old tag instead of the tag with the new name. Please correct me if I'm wrong. :) Vartan 17:21, 25 October 2007 (PDT)

