Rewrite PageCreationBot
Revision as of 23:43, 23 August 2007
What (summary)
- New page-building bot
- Still relies on Java/Tomcat to do crawling (for now)
- Carefully tested
Why this is important
- We need to have control over the pages that are created on our site.
- The old bot was known to pollute the database; we need control over all the access points that could screw up our data.
- We are gaining mastery over the code so that we can add new features easily.
DoneDone
- Creates new pages based on a template
- Monitoring and logging have been added (to test whether the bot succeeds)
- Hooked into all the points the old Bot was hooked into
- Checks robots.txt before spidering the website.
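The robots.txt check above can be sketched as follows. This is a minimal illustration, not the bot's actual Java implementation; the user-agent name and the sample rules are assumptions.

```python
from urllib import robotparser

# Sample robots.txt rules for illustration; the real site's rules are not in the source.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

def may_fetch(url, agent="PageCreationBot"):
    """Return True if the parsed robots.txt rules permit fetching the URL."""
    return rp.can_fetch(agent, url)
```

In the real bot, the rules would be fetched from the target site (e.g. via `rp.set_url(...)` and `rp.read()`) rather than parsed from a hard-coded list.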
Bot insertion points into MediaWiki
- /wiki/skins/common/generatePage.js (and some other JavaScript that we should remove)
- /wiki/extensions/AboutUsDomainRedirect/SpecialRedirectToDomain.php
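The "creates new pages based on a template" step could look like the sketch below. The template text, field names, and default category are hypothetical; the bot's real template is not shown in the source.

```python
from string import Template

# Hypothetical wiki-text template; placeholders are filled per page.
PAGE_TEMPLATE = Template(
    "== $title ==\n"
    "$body\n\n"
    "[[Category:$category]]\n"
)

def build_page(title, body, category="DevelopmentTeamTask"):
    """Fill the template to produce the wiki-text for one new page."""
    return PAGE_TEMPLATE.substitute(title=title, body=body, category=category)
```

The generated wiki-text would then be submitted through whatever save path the bot uses (for example, one of the insertion points listed above).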