AddressParsing


What (summary)

Why this is important

DoneDone

Steps to get to DoneDone

Finding Addresses

We've demonstrated that simple regular expression detectors are easy to create, and that the regular expressions themselves can be quite large without causing problems. We've also demonstrated that combining them in a sequence isn't much more complicated, and that the speed of the sequence is determined by the speed of the first recognizer run in the sequence.

Adding word boundaries ('\b') to regular expressions can speed them up dramatically. In fact, moving the '\b' from inside the alternation to outside of it in country.rb resulted in a 4x speedup for it and for all recognizers that start by recognizing a country.
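To make the two forms concrete, here is a minimal sketch of a word boundary placed inside versus outside an alternation. The country list and variable names are hypothetical stand-ins, since the contents of country.rb are not shown here. Both patterns accept the same strings; only the placement of '\b' differs.

```ruby
# Hypothetical country list; the real list in country.rb is not shown.
countries = ["United States", "United Kingdom", "Canada", "Mexico"]
escaped = countries.map { |name| Regexp.escape(name) }

# '\b' repeated inside every alternative.
boundary_inside = Regexp.new(escaped.map { |name| "\\b#{name}\\b" }.join("|"))

# '\b' stated once, outside a non-capturing group around the alternation.
boundary_outside = Regexp.new("\\b(?:#{escaped.join('|')})\\b")

text = "Portland, OR 97214 United States"
inside_match = boundary_inside.match(text)[0]
outside_match = boundary_outside.match(text)[0]

# Both forms reject a country name embedded in a longer word.
embedded = boundary_outside.match("Canadarm")
```

Any speed difference will depend on the size of the list and the Ruby version, so it is worth measuring both forms (for example with the standard Benchmark module) before relying on it.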

How quickly we find complete addresses depends on what we lead with. If we lead with an organization or a street, it will be slow. If, on the other hand, we lead with the postal_code, it will be fast.


What we seem to want is a way to build a complex pattern from a sequence of simpler patterns. The key is being able to select which pattern in the sequence runs first, so that it pins the match down. We then move outward and recognize the immediately preceding and immediately following patterns, repeating this recursively.

The StringScanner class already allows us to set the position from which to begin searching for a match, and our recognizers already output the location they matched. Now we just need a simple mechanism for specifying which recognizer to run first, and then for adding something onto the start or end of the string that we've already matched.

recognizer.match_first       postal_code
recognizer.then_skip_after   whitespace
recognizer.then_match_after  country
recognizer.then_skip_before  whitespace
recognizer.then_match_before state_or_province
recognizer.then_skip_before  whitespace
recognizer.then_match_before street
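The sequence above could be realized along the following lines. This is only a sketch under stated assumptions, not the recognizer implementation: the method names (extend_after, extend_before, recognize), the simplified patterns, and the sample text are all hypothetical. It treats "skip" and "match" steps alike, simply growing the matched span, and it assumes ASCII input so that StringScanner byte positions and string character offsets agree. StringScanner only scans forward, so the "before" steps instead anchor the pattern to the end of the text preceding the current span.

```ruby
require "strscan"

# Extend the span [start, stop) to the right: the pattern must match
# exactly at the current end of the span.
def extend_after(text, span, pattern)
  scanner = StringScanner.new(text)
  scanner.pos = span[1]
  scanner.scan(pattern) ? [span[0], scanner.pos] : nil
end

# Extend the span to the left: the pattern must match the text that
# ends exactly where the span currently begins.
def extend_before(text, span, pattern)
  m = /#{pattern}\z/.match(text[0...span[0]])
  m ? [m.begin(0), span[1]] : nil
end

# Run the first pattern anywhere in the text, then grow the span
# outward one step at a time. Returns nil if any step fails.
def recognize(text, first_pattern, steps)
  m = first_pattern.match(text)
  return nil unless m
  span = [m.begin(0), m.end(0)]
  steps.each do |direction, pattern|
    span = if direction == :after
             extend_after(text, span, pattern)
           else
             extend_before(text, span, pattern)
           end
    return nil unless span
  end
  text[span[0]...span[1]]
end

# Hypothetical, deliberately simplified patterns.
whitespace  = /\s+/
postal_code = /\b\d{5}\b/
country     = /USA|Canada/
state       = /[A-Z]{2}/
street      = /\d+ [A-Z][a-z]+ (?:St|Ave|Rd)/

steps = [
  [:after,  whitespace], [:after,  country],
  [:before, whitespace], [:before, state],
  [:before, whitespace], [:before, street]
]

address = recognize("Send mail to 123 Main St OR 97214 USA today",
                    postal_code, steps)
# address is "123 Main St OR 97214 USA"
```

Because only the cheap postal_code pattern is searched across the whole text, the slower patterns run only at positions already pinned down by an earlier match.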

Refresh Specification

When a domain page is viewed:

  • Do we have a contact info section that has been human edited? If so, do nothing
  • Else
    • Pull new contact information and dump it into the contact info section
    • Overwrite the contact info section if it already exists
    • Protect with the "address" tag
    • Look up latitude/longitude via the Google API
    • Mash up latitude/longitude with Google Maps
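The decision flow in this specification might be sketched as below. Everything here is an assumption for illustration: the page is modeled as a plain hash with hypothetical keys, "protect with the address tag" is represented as adding a tag to a list, and fetching contact information and geocoding are passed in as callables rather than tied to any particular API. The map mashup step is left out.

```ruby
# page is a hash with hypothetical keys :domain, :contact_info,
# :human_edited and :tags. fetch_contact_info and geocode are callables
# supplied by the caller; no real API is assumed.
def refresh_contact_info(page, fetch_contact_info, geocode)
  # A human-edited contact info section is left untouched.
  return page if page[:contact_info] && page[:human_edited]

  info = fetch_contact_info.call(page[:domain])
  latitude, longitude = geocode.call(info)

  # Overwrite any existing section and mark it with the "address" tag.
  page.merge(
    contact_info: info,
    tags: page.fetch(:tags, []) | ["address"],
    latitude: latitude,
    longitude: longitude
  )
end

# Stub callables with placeholder values, for illustration only.
fetch = ->(domain) { "123 Main St OR 97214 USA (#{domain})" }
locate = ->(_info) { [45.5, -122.6] }

edited = { domain: "example.com", contact_info: "By hand", human_edited: true }
stale  = { domain: "example.com", contact_info: "Old", human_edited: false }

kept      = refresh_contact_info(edited, fetch, locate)
refreshed = refresh_contact_info(stale, fetch, locate)
```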

