WhoisRefresh
This project brings several smaller projects together into one. Essentially, it is a rewrite of the page creation bot so that it uses several new data sources.
- (10) WhoisParsing (Arif Iqbal) (7-10)
- (2) Rewrite PageCreationBot (Ghufran, Umar Sheikh)
- MediaWiki:InitialDomainPage
- (4) Rewrite PageScrapeBot (Laiq)
- (4) WhoisRefreshRunRefresh (Jason and Ali)
Steps to DoneDone
- Branch whois-refresh
- Get it running on our local machines
- Assess where we currently are
- Ruthlessly prune ... spin off any tasks that aren't absolutely essential onto their own pages that will be considered later
Synopsis
A new domain page is scraped and populated:
- When a red-link is clicked
- When a search for a domain doesn't return a page
AcceptanceTest
Look at 50 pages that we know don't have contact info and verify that the contact info is coming in.
Background
- RFC 3912, the current WHOIS protocol specification
- Domain Statistics by TLD
- 100 oldest dot com domains
- Registrar Stats
The ranking and percentages come from http://populicio.us/toptlds.html and date from November 2006 or earlier.
| Rank | TLD | Purpose | Percentage | Whois Server |
|---|---|---|---|---|
| 1 | .com | commercial organizations | 58.3973 | whois.verisign-grs.com |
| 2 | .org | organizations | 12.8734 | whois.pir.org |
| 3 | .net | network infrastructures | 7.3600 | whois.verisign-grs.com |
| 4 | .uk | United Kingdom | 3.2604 | whois.nic.uk |
| 5 | .edu | educational establishments accredited in the United States | 2.7008 | whois.educause.edu |
| 6 | .jp | Japan | 2.6159 | whois.jprs.jp |
| 7 | .de | Germany | 2.1484 | whois.denic.de |
| 8 | .br | Brazil | 0.8066 | whois.registro.br |
| 9 | .ca | Canada | 0.7208 | whois.cira.ca |
| 10 | .gov | governments and their agencies in the United States | 0.6832 | whois.dotgov.gov |
| 11 | .au | Australia | 0.6463 | whois.aunic.net |
| 12 | .info | informational sites | 0.5674 | whois.afilias.net |
| 13 | .nl | Netherlands | 0.5380 | whois.domain-registry.nl |
| 14 | .fr | France | 0.5108 | whois.nic.fr |
| 15 | .us | United States | 0.5030 | whois.nic.us |
| 16 | .ru | Russian Federation | 0.4610 | whois.ripn.net |
| 17 | .it | Italy | 0.3527 | whois.nic.it |
| 18 | .cn | China | 0.3480 | whois.cnnic.net.cn |
| 19 | .ch | Switzerland | 0.2761 | whois.nic.ch |
| 20 | .tw | Taiwan | 0.2727 | whois.twnic.net.tw |
| 21 | .es | Spain | 0.2699 | |
| 22 | .se | Sweden | 0.2493 | whois.iis.se |
| 23 | .dk | Denmark | 0.1957 | whois.dk-hostmaster.dk |
| 24 | .be | Belgium | 0.1956 | whois.dns.be |
| 25 | .pl | Poland | 0.1816 | whois.dns.pl |
| 26 | .at | Austria | 0.1659 | whois.nic.at |
| 27 | .il | Israel | 0.1559 | whois.isoc.org.il |
| 28 | .tv | Tuvalu | 0.1553 | |
| 29 | .nz | New Zealand | 0.1233 | whois.srs.net.nz |
| 30 | .biz | business use | 0.1188 | whois.biz |
| ?? | .eu | European Union | ??? | whois.eu |
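The table can drive a simple lookup. The following is a minimal sketch, not the bot's actual code: `WHOIS_SERVERS`, `whois_server_for`, and `whois_query` are hypothetical names, the hash holds only a few rows copied from the table (which may be out of date), and some registries expect extra query options that are not handled here. The exchange itself follows RFC 3912: connect to TCP port 43, send the query followed by CRLF, and read until the server closes the connection.

```ruby
require 'socket'

# A few rows copied from the table above; extend as needed.
WHOIS_SERVERS = {
  'com' => 'whois.verisign-grs.com',
  'net' => 'whois.verisign-grs.com',
  'org' => 'whois.pir.org',
  'uk'  => 'whois.nic.uk',
  'de'  => 'whois.denic.de'
}.freeze

# Hypothetical helper: choose the server from the last label of the domain.
# Returns nil when the TLD is not in the hash.
def whois_server_for(domain)
  tld = domain.downcase.split('.').last
  WHOIS_SERVERS[tld]
end

# RFC 3912 exchange: open TCP port 43, send the query terminated by CRLF,
# then read everything until the server closes the connection.
def whois_query(domain, server = whois_server_for(domain), port = 43)
  raise ArgumentError, "no whois server known for #{domain}" if server.nil?

  TCPSocket.open(server, port) do |socket|
    socket.write("#{domain}\r\n")
    socket.read
  end
end
```

Calling `whois_query("example.org")` would return the raw record text, which then still has to be parsed per registrar format.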
Information We Need
- Date of lookup
- Registrant Address
- Admin Address
- Phone
- Question: do we need both the registrant and admin addresses or is one enough? In the past we've only used one. - Ray | talk
Next
- Get gpMakeImage working so that tests pass
- Send email
- Given an address, get a latitude/longitude
- Obfuscate address
. . .
Latitude/Longitude
Need a table so that we can associate one or more lat/long pairs with a page.
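As a rough illustration of the one-page-to-many-coordinates relationship, here is an in-memory sketch. It is only a stand-in for a real database table; `Coordinate`, `CoordinateTable`, and the column names are assumptions, not an agreed schema.

```ruby
# Hypothetical row shape: one row per (page, latitude, longitude) triple,
# so a single page may own several rows.
Coordinate = Struct.new(:page_title, :latitude, :longitude)

class CoordinateTable
  def initialize
    @rows = []
  end

  # Adds one lat/long pair for a page, rejecting out-of-range values.
  def add(page_title, latitude, longitude)
    unless latitude.between?(-90, 90) && longitude.between?(-180, 180)
      raise ArgumentError, "coordinates out of range"
    end
    @rows << Coordinate.new(page_title, latitude, longitude)
  end

  # Returns every pair recorded for the given page (possibly empty).
  def for_page(page_title)
    @rows.select { |row| row.page_title == page_title }
  end
end
```

A page with two known addresses would simply get two calls to `add` with the same page title.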
Some neat regular expressions
These are from the book "The Ruby Way".
- The following regex matches a phone number in the NANP format (North American Numbering Plan). It allows three common ways of writing such a phone number:
phone = /^((\(\d{3}\) |\d{3}-)\d{3}-\d{4}|\d{3}\.\d{3}\.\d{4})$/
"(512) 555-1234" =~ phone   # => 0 (match)
"512.555.1234" =~ phone     # => 0 (match)
"512-555-1234" =~ phone     # => 0 (match)
"(512)-555-1234" =~ phone   # => nil (no match)
"512-555.1234" =~ phone     # => nil (no match)
- Here is a regex to match a U.S. ZIP Code (which may be five or nine digits):
zip = /^\d{5}(-\d{4})?$/
- The following regex matches all 51 usual codes (50 states and the District of Columbia). The alternatives are grouped so that the ^ and $ anchors apply to every code, not only to the first and last alternatives:
state = /^(?:A[LKZR] | C[AOT] | D[EC] | FL | GA | HI | I[DLNA] | K[SY] | LA | M[EDAINSOT] | N[EVHJMYCD] | O[HKR] | PA | RI | S[CD] | T[NX] | UT | V[TA] | W[AVIY])$/x
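A minimal sketch of how these patterns might be applied to a parsed contact record. Two details differ from the book versions and are choices made here: the patterns use \A and \z, because in Ruby ^ and $ match at line boundaries and would accept a valid value followed by a newline and extra text, and the state alternatives are wrapped in a group so the anchors cover every code. The constant names, the `invalid_contact_fields` helper, and the hash keys are hypothetical.

```ruby
# String-anchored variants of the patterns above (\A and \z instead of ^ and $).
PHONE = /\A(?:(?:\(\d{3}\) |\d{3}-)\d{3}-\d{4}|\d{3}\.\d{3}\.\d{4})\z/
ZIP   = /\A\d{5}(?:-\d{4})?\z/
STATE = /\A(?:A[LKZR]|C[AOT]|D[EC]|FL|GA|HI|I[DLNA]|K[SY]|LA|M[EDAINSOT]|
             N[EVHJMYCD]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[TA]|W[AVIY])\z/x

# Hypothetical helper: returns the names of the fields that fail validation.
# Missing fields are treated as empty strings and therefore as invalid.
def invalid_contact_fields(contact)
  checks = { phone: PHONE, zip: ZIP, state: STATE }
  checks.reject { |field, pattern| pattern.match?(contact[field].to_s) }.keys
end
```

For example, `invalid_contact_fields({ phone: "512-555.1234", zip: "78701", state: "TX" })` reports only the phone field, since the mixed separators are not one of the three accepted forms.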
Whois Records We Need
We need 50 more whois records covering the range of formats for each of the following registrars:
- public domain
- ONLINENIC
- STRATO
- BASICFUSION.COM
- DOMAINNAMESALES | domain name sales
- core
- METAPREDICT.COM
We have a few, but need 50 whois records covering the range of formats for each of the following registrars:
- ascio
- beijing Innovative
- belgium Domains
- capitol_domain
- discount_domain
- domaindiscover
- domain_doorman
- domainsite
- dotregistrar
- dotster
- fabulous.com
- gandi
- hichina
- innerwise
- joker.com
- Key systems
- Mark Monitor
- Melbourne IT
- Moniker
- NameKing
- Name.Net
- Names4ever
- namesdirect
- nameview
- nicline
- ovh
- psi-usa
- registerfly
- schlund
- srsplus
- wild west domains
