IncreasePagesWithAdvertisements

Revision as of 13:16, 21 November 2007 by Mohammad Ghufran (talk | contribs)




What (summary)

Improve the AdSidebar adult content filter so that fewer non-adult pages are flagged and advertisements show up on more pages. Even more importantly, catch more of the pages that actually do contain adult content, so that Google doesn't get angry at us for showing ads on adult pages.

Why this is important

Revenue determines how many resources we have to spend on developing cool features and tools for our community. We need to improve our revenue.

DoneDone

  • Advertisements show up on at least 95% of the pages that are not adult content
  • Advertisements show up on at most 0.5% of the pages that are adult content

Steps to get to DoneDone

  • Collect a sample of pages
    • hand-audit for adult pages in sample
  • Partition the adult content keywords into levels of suggestiveness
    • Most suggestive level: if 1 of these words is detected ... label as adult
    • Middle level: if 2 or more of these words show up at least once each in the page ... label as adult
    • Least suggestive level: if 3 or more of these words show up at least once each in the page ... label as adult
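
The threshold rules above can be sketched roughly as follows. This is only an illustration: it assumes the three thresholds (1, 2 and 3 distinct words) belong to three keyword levels, and the names `LEVELS`, `TOKEN_RE` and `is_adult`, as well as the placeholder words, are hypothetical. The real keyword lists are not given on this page.

```python
import re

# Hypothetical placeholder levels: (keywords, minimum number of DISTINCT
# keywords from that level that must appear in the page).
LEVELS = [
    ({"explicitword"}, 1),                                   # most suggestive
    ({"strongword", "strongterm", "strongphrase"}, 2),       # middle level
    ({"mildword", "mildterm", "mildphrase", "mildtoken"}, 3),  # least suggestive
]

# Split page text into lowercase alphanumeric tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def is_adult(text):
    """Return True if any level reaches its distinct-keyword threshold."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    for keywords, threshold in LEVELS:
        # "At least once each": count distinct keywords, not repetitions.
        if len(keywords & tokens) >= threshold:
            return True
    return False


print(is_adult("a page with strongword and strongterm"))  # two distinct hits
print(is_adult("mildword mildword mildword"))             # one word repeated
```

Counting distinct keywords rather than total occurrences means a single mild word repeated many times does not trip the filter on its own.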

Current Status

Today we worked on two different approaches. The first was a scheme that applies different regular expressions grouped by suggestiveness. We used these REs to try to improve the numbers; by simple counts the method was an improvement, but not a significant one.

The second approach we tried was a statistical one. We wrote code to train and then calculate probabilities of specific features from documents.

As a next step, we will improve how the probabilities are estimated and implement a basic structure that decides whether a given document is adult based on the probabilities of the words it contains.
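
One possible shape for the statistical approach is a naive Bayes classifier over word counts. The page does not say which model was actually used, so treat this as a sketch under that assumption; `tokenize`, `train`, `log_score` and `classify` are hypothetical names, and the training texts are neutral placeholders.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def train(labeled_docs):
    """labeled_docs: iterable of (text, is_adult) pairs, e.g. the hand-audited
    sample. Both labels must occur at least once."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_score(text, label, model):
    """Log prior of the label plus log likelihood of the known words."""
    word_counts, doc_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing so an unseen (word, label) pair is not probability 0.
        score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    """Return True when the adult label scores strictly higher."""
    return log_score(text, True, model) > log_score(text, False, model)


sample = [
    ("alpha beta beta", True),
    ("alpha beta gamma", True),
    ("delta epsilon", False),
    ("delta gamma zeta", False),
]
model = train(sample)
print(classify("beta alpha", model), classify("delta zeta", model))
```

Working in log space avoids numeric underflow on long documents, since many small probabilities are added rather than multiplied.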

Current Numbers

  • Total: 37,526
  • Correctly Marked as Adult: 33,447
  • Correctly Marked as Not Adult: 1,097
  • Incorrectly Marked as Adult: 321
  • Incorrectly Marked as Not Adult: 2,661
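
To compare these counts with the DoneDone targets, the two rates can be worked out as below. This assumes advertisements are shown exactly on the pages the filter marks as not adult; the variable names and the helper `meets_done_done` are hypothetical.

```python
# Counts copied from the list above.
correct_adult = 33_447      # adult pages flagged as adult
correct_not_adult = 1_097   # non-adult pages left unflagged (ads shown)
wrong_adult = 321           # non-adult pages flagged as adult (ads lost)
wrong_not_adult = 2_661     # adult pages left unflagged (ads shown)

total = correct_adult + correct_not_adult + wrong_adult + wrong_not_adult
assert total == 37_526  # matches the reported total

non_adult_pages = correct_not_adult + wrong_adult
adult_pages = correct_adult + wrong_not_adult

# Assumption: an ad appears on every page marked "not adult".
ads_on_non_adult = correct_not_adult / non_adult_pages
ads_on_adult = wrong_not_adult / adult_pages


def meets_done_done(non_adult_rate, adult_rate):
    """DoneDone: ads on at least 95% of non-adult and at most 0.5% of adult pages."""
    return non_adult_rate >= 0.95 and adult_rate <= 0.005


print(f"Ads on non-adult pages: {ads_on_non_adult:.1%}")
print(f"Ads on adult pages:     {ads_on_adult:.1%}")
print("DoneDone met:", meets_done_done(ads_on_non_adult, ads_on_adult))
```

Under that assumption the sample gives about 77.4% ad coverage on non-adult pages and about 7.4% on adult pages, so neither target is met yet.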

Interesting Cases

  • NiceRoundAsses.com ... because all of the naughty words are buried in domain names ... but there are a lot of them
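
One way to handle a case like this might be to pull the hostnames out of a page's links and search inside them for keywords, since a word-based tokenizer sees a run-together domain name as a single token. This is a sketch only: `HREF_RE`, `KEYWORDS`, `hostnames_in` and `keywords_in_hostnames` are hypothetical names, and the keywords and domain are placeholders.

```python
import re
from urllib.parse import urlparse

# Rough href extractor; a real HTML parser would be more robust.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

# Hypothetical placeholder keywords.
KEYWORDS = ("naughtyword", "badterm")


def hostnames_in(html):
    """Return the lowercase hostnames of all absolute links in the page."""
    hosts = []
    for url in HREF_RE.findall(html):
        host = urlparse(url).hostname  # None for relative links
        if host:
            hosts.append(host.lower())
    return hosts


def keywords_in_hostnames(html):
    """Return the set of keywords that occur as substrings of any hostname."""
    found = set()
    for host in hostnames_in(html):
        for keyword in KEYWORDS:
            if keyword in host:
                found.add(keyword)
    return found


page = '<a href="http://BigNaughtywordShop.example/">x</a> <a href="/local">y</a>'
print(keywords_in_hostnames(page))
```

Substring matching can also hit innocent names that happen to contain a keyword, so hits from hostnames would probably need their own threshold rather than counting like ordinary page words.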



