Dec 27

Requirement: Develop a Regular Expression that parses a single line of text and treats any of the following combinations of City, State, and Zip Code as valid:

  • City State
  • City<space>Name State
  • City, State
  • City<space>Name, State
  • City State Zip
  • City<space>Name State Zip
  • City, State Zip
  • City<space>Name, State Zip
  • Zip

Regular Expressions can be very powerful, but they can also be hard to develop and read. I recommend using a good RegEx editor to help you build and test your expressions. Also, make sure to comment your expressions to assist in future maintenance and debugging.

After trying out a few RegEx tools, I recommend Expresso, which is available on the Ultrapico web site. The tool includes a 60-day trial period, and at the time of writing this article, registration for the tool is also free.

To meet the above requirement, I found that named groups combined with multiple OR conditions (‘|’) were the best way to handle all the different combinations of City, State, and Zip Code I needed to support. This also made it easy to allow for commas or the lack of commas, and for different spacing in the input text.
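As a minimal sketch of that idea, the C# snippet below uses a deliberately simplified pattern (not the final expression) with two alternatives that reuse the same group name. .NET allows a group name to appear in more than one alternative, so whichever branch matches fills the named group. The `GroupDemo` class and the `Describe` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Simplified illustration: either "State Zip" or "Zip" alone.
    // The Zip group name is reused in both alternatives.
    static readonly Regex Rgx = new Regex(
        @"^(?:(?<State>[A-Za-z]{2})\s(?<Zip>\d{5})|(?<Zip>\d{5}))$");

    public static string Describe(string input)
    {
        Match m = Rgx.Match(input);
        if (!m.Success) return "no match";

        // A group that took no part in the match reports Success == false.
        string state = m.Groups["State"].Success ? m.Groups["State"].Value : "(none)";
        return "State=" + state + ", Zip=" + m.Groups["Zip"].Value;
    }

    static void Main()
    {
        Console.WriteLine(Describe("TX 75201"));
        Console.WriteLine(Describe("75201"));
        Console.WriteLine(Describe("Texas"));
    }
}
```

Because both branches write to the same Zip group, the calling code does not need to know which alternative matched.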

Using Expresso, I came up with the following Regular Expression:

(updated 2/4/2010 to allow for ‘.’ (period) and ‘-’ (dash) in City names)

#Parse address line into named groups (City, State, Zip)
^                         #Beginning of string
(                         #Start OR condition
(                         #Begin first condition (City, State, Zip)
(?<City>[A-Za-z\.\-\s]+)  #City
( (?:,\s?) | (?:\s?) )\b  #Comma, comma space, or space
(?<State>[A-Za-z]{2})     #State
(?:\s?)                   #Space
(?<Zip>\d{5}(-\d{4})?)    #Zip
) |                       #End first condition
(                         #Begin second condition (City, State)
(?<City>[A-Za-z\.\-\s]+)  #City
( (?:,\s?) | (?:\s?) )\b  #Comma, comma space, or space
(?<State>[A-Za-z]{2})     #State
(?:\s?)                   #Space
) |                       #End second condition
(                         #Begin third condition (Zip)
(?<Zip>\d{5}(-\d{4})?)    #Zip
)                         #End third condition
)                         #End OR condition
$                         #End of string

 

I then incorporated the new Regular Expression into a simple .NET console application to test. The application prompts for an input and then outputs any valid combinations that are matched. The complete listing for the console application is below.

using System;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Sample RegEx application to parse combinations of City, State, and Zip Code.");
      Console.WriteLine();
      Console.Write("Enter address or <Enter> to Quit: ");

      while (true)
      {
        // ReadLine returns null at end of input, so check for null as well as empty.
        string addressToParse = Console.ReadLine();
        if (String.IsNullOrEmpty(addressToParse))
        {
          break;
        }

        ParseAddressSegments(addressToParse);
        Console.WriteLine();
        Console.Write("Enter address or <Enter> to Quit: ");
      }
    }

    private static void ParseAddressSegments(string addressToParse)
    {
      // Build the commented pattern line by line; the comments are ignored
      // because RegexOptions.IgnorePatternWhitespace is set below.
      StringBuilder pattern = new StringBuilder();
      pattern.Append(@"#Parse address line into named groups (City, State, Zip)" + Environment.NewLine);
      pattern.Append(@"^                         #Beginning of string" + Environment.NewLine);
      pattern.Append(@"(                         #Start OR condition" + Environment.NewLine);
      pattern.Append(@"(                         #Begin first condition (City, State, Zip)" + Environment.NewLine);
      pattern.Append(@"(?<City>[A-Za-z\.\-\s]+)  #City" + Environment.NewLine);
      pattern.Append(@"( (?:,\s?) | (?:\s?) )\b  #Comma, comma space, or space" + Environment.NewLine);
      pattern.Append(@"(?<State>[A-Za-z]{2})     #State" + Environment.NewLine);
      pattern.Append(@"(?:\s?)                   #Space" + Environment.NewLine);
      pattern.Append(@"(?<Zip>\d{5}(-\d{4})?)    #Zip" + Environment.NewLine);
      pattern.Append(@") |                       #End first condition" + Environment.NewLine);
      pattern.Append(@"(                         #Begin second condition (City, State)" + Environment.NewLine);
      pattern.Append(@"(?<City>[A-Za-z\.\-\s]+)  #City" + Environment.NewLine);
      pattern.Append(@"( (?:,\s?) | (?:\s?) )\b  #Comma, comma space, or space" + Environment.NewLine);
      pattern.Append(@"(?<State>[A-Za-z]{2})     #State" + Environment.NewLine);
      pattern.Append(@"(?:\s?)                   #Space" + Environment.NewLine);
      pattern.Append(@") |                       #End second condition" + Environment.NewLine);
      pattern.Append(@"(                         #Begin third condition (Zip)" + Environment.NewLine);
      pattern.Append(@"(?<Zip>\d{5}(-\d{4})?)    #Zip" + Environment.NewLine);
      pattern.Append(@")                         #End third condition" + Environment.NewLine);
      pattern.Append(@")                         #End OR condition" + Environment.NewLine);
      pattern.Append(@"$                         #End of string" + Environment.NewLine);

      Regex rgx = new Regex(pattern.ToString(), RegexOptions.IgnoreCase
                                                | RegexOptions.CultureInvariant
                                                | RegexOptions.IgnorePatternWhitespace
                                                | RegexOptions.Compiled);

      Match match = rgx.Match(addressToParse);
      if (match.Success)
      {
        // GetGroupNames also returns the numbered groups, so keep only the named ones we care about.
        foreach (string name in rgx.GetGroupNames())
        {
          if ((match.Groups[name].Value != String.Empty) && (name == "City" || name == "State" || name == "Zip"))
          {
            // The City capture can include a trailing space, so trim before printing.
            Console.WriteLine(@"{0} = ""{1}""", name, match.Groups[name].Value.Trim());
          }
        }
      }
    }
  }
}
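One way to check the expression against the requirement list is a small table-driven run over one sample per combination. The sketch below is only an illustration: the `RequirementCheck` class, the `Parse` method, and the sample cities and Zip codes are assumptions, and the pattern is a condensed copy of the three alternatives that assumes the city character class allows ‘.’ and ‘-’ in both City alternatives.

```csharp
using System;
using System.Text.RegularExpressions;

static class RequirementCheck
{
    // Condensed copy of the three alternatives (City State Zip | City State | Zip).
    // Assumption: '.' and '-' are allowed in City in both alternatives.
    const string Pattern = @"
        ^(?:
            (?<City>[A-Za-z.\-\s]+) (?:,\s?|\s?)\b (?<State>[A-Za-z]{2}) \s? (?<Zip>\d{5}(?:-\d{4})?)
          | (?<City>[A-Za-z.\-\s]+) (?:,\s?|\s?)\b (?<State>[A-Za-z]{2}) \s?
          | (?<Zip>\d{5}(?:-\d{4})?)
        )$";

    static readonly Regex Rgx = new Regex(Pattern,
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace);

    // Returns the three named groups as one line of text, or "no match".
    public static string Parse(string input)
    {
        Match m = Rgx.Match(input);
        if (!m.Success) return "no match";

        return string.Format("City=\"{0}\" State=\"{1}\" Zip=\"{2}\"",
            m.Groups["City"].Value.Trim(),   // City may carry a trailing space
            m.Groups["State"].Value,
            m.Groups["Zip"].Value);
    }

    static void Main()
    {
        // One hypothetical sample per combination in the requirement list.
        string[] samples =
        {
            "Dallas TX", "Fort Worth TX", "Dallas, TX", "Fort Worth, TX",
            "Dallas TX 75201", "Fort Worth TX 76102", "Dallas, TX 75201",
            "Fort Worth, TX 76102", "75201"
        };

        foreach (string s in samples)
        {
            Console.WriteLine("{0,-22} -> {1}", s, Parse(s));
        }
    }
}
```

Groups that did not take part in the match come back as empty strings, so a Zip-only input prints empty City and State values rather than failing.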

Download the sample Visual Studio 2008 solution.

Happy RegEx coding!
