In the United States, ZIP Codes are used to facilitate postal deliveries. They are created and maintained by the United States Postal Service (USPS), but are also required by other shippers and appear in a variety of administrative data contexts. ZIP stands for “Zone Improvement Plan,” which was introduced in 1963.
ZIP Code Basics
A complete ZIP Code consists of two elements: a five-digit number and a four-digit number. Together, these nine digits represent a specific delivery area (sometimes referred to as a carrier route). The first five digits, which we’ll refer to as ZIP Codes for shorthand, represent one of the following:
- a general delivery area,
- a Post Office, to facilitate deliveries to PO Boxes,
- or a single high-volume customer, such as a university, federal agency, or large company.
The most common form of a ZIP Code is the general delivery area. While we imagine these to be distinct, non-overlapping regions on a map, they are actually collections of carrier routes that sometimes overlap with each other.
Those individual carrier routes are represented by the second, four-digit number. These may correspond to a particular city block, apartment building, or other delivery area within a ZIP Code. For PO Boxes, individual boxes may be assigned their own four-digit number. Generally speaking, the four-digit add-on is not especially useful for RWE.
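Since analyses usually rely on the five-digit portion, it can help to separate the two elements before doing anything else. The following is a minimal base R sketch, assuming inputs are character strings in either five-digit or hyphenated ZIP+4 form; the example values are made up for illustration and are not real addresses.

```r
# Made-up example values in five-digit or ZIP+4 form (illustration only)
zip9 <- c("63108-1234", "10001", "06390-0001")

# The five-digit ZIP Code is always the first five characters
zip5 <- substr(zip9, 1, 5)

# The four-digit add-on, when present, follows a hyphen; otherwise record NA
has_addon <- grepl("^[0-9]{5}-[0-9]{4}$", zip9)
plus4 <- ifelse(has_addon, substr(zip9, 7, 10), NA_character_)

data.frame(zip9, zip5, plus4)
```

Keeping both pieces as character vectors preserves leading zeros, which numeric storage would drop.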
ZIP Codes are assigned to four types of jurisdictions:
- states,
- the District of Columbia,
- insular areas, which include:
  - five inhabited territories (American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and United States Virgin Islands)
  - as well as three independent nations (the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau) that are part of the Compact of Free Association (COFA), and
- federal facilities in other countries, most notably Department of Defense installations.
The Compact of Free Association (COFA) is a wide-ranging legal agreement that gives the three nations access to many U.S. federal services usually considered domestic programs. This includes USPS deliveries, and so all three nations are assigned ZIP Codes as well.
ZIP Codes generally are patterned regionally, with ZIP Codes beginning with 0 being located in the Northeast. Values increase westward, with the highest ZIP Codes in Alaska, Hawaii, and islands in the Pacific. These ZIP Codes all begin with a 9.
States are assigned one or more initial digits. For example, New York State primarily has ZIP Codes beginning with values between 10 and 14 for their first two digits. However, there are routine exceptions. 06390 (Fishers Island) is also a valid ZIP Code in New York even though it begins with 0. Likewise, federal agencies located in Maryland and Virginia have ZIP Codes assigned to Washington, D.C. The first digit, therefore, does not correspond to Census Region or Division, and it is not possible to aggregate up to those geographies based on the first digit alone. ZIP Codes also cannot be aggregated to counties or states with complete accuracy.
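To see why prefix rules are unreliable, consider a toy check built only from the New York example above. The function name is hypothetical and the rule is deliberately naive; it is meant to show the failure mode, not to suggest a method for assigning states.

```r
# Naive rule (hypothetical helper): treat a ZIP Code as "New York" when its
# first two digits fall between 10 and 14
in_ny_prefix_range <- function(zip) {
  prefix <- as.integer(substr(zip, 1, 2))
  prefix >= 10 & prefix <= 14
}

# Two values inside the 10-14 range, plus 06390 (Fishers Island, New York)
zips <- c("10001", "14201", "06390")
in_ny_prefix_range(zips)
```

The rule returns FALSE for 06390 even though it is a New York ZIP Code, which is exactly the kind of exception described above.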
Since ZIP Codes are designed to facilitate the delivery of mail, and not other uses, they do not neatly nest into other jurisdictional boundaries. They regularly cross county boundaries, and sometimes cross state boundaries as well, especially in rural areas where postal delivery is best facilitated from a neighboring state.
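If address-level records with both a ZIP Code and a county are available, boundary-crossing ZIP Codes can be flagged directly. The sketch below uses an entirely hypothetical table with placeholder ZIP Codes and county labels; only the counting logic is the point.

```r
# Hypothetical records: placeholder ZIP Codes and county labels, not real data
records <- data.frame(
  zip    = c("00001", "00001", "00002", "00002", "00002"),
  county = c("County A", "County B", "County C", "County C", "County C")
)

# Count the distinct counties observed for each ZIP Code
counties_per_zip <- tapply(
  records$county, records$zip,
  function(x) length(unique(x))
)

# ZIP Codes that appear in more than one county
crossing <- names(counties_per_zip)[counties_per_zip > 1]
crossing
```

Any ZIP Code flagged this way cannot be assigned to a single county without some allocation rule.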
Finally, it is important to know that ZIP Codes are not permanent. Carrier routes are updated frequently, and even the ZIP Codes themselves are subject to revision occasionally. For example, some ZIP Codes have been split to accommodate population growth and housing construction.
Three-Digit ZIP Codes
Three-digit ZIP Codes have a specific meaning for the USPS, and are sometimes also used in patient-level data to preserve confidentiality. More details on working with three-digit ZIPs can be found in a separate vignette.
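A three-digit ZIP is simply the first three characters of the five-digit value, so a base R sketch is short. The example values are illustrative; the key assumption is that the input is stored as character so that leading zeros survive.

```r
# Illustrative five-digit values stored as character
zip5 <- c("63108", "63110", "06390")

# Truncate to the first three characters
zip3 <- substr(zip5, 1, 3)
zip3
```

Several five-digit ZIP Codes collapse into the same three-digit value, which is what makes the truncated form useful for preserving confidentiality.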
ZIP Code Tabulation Areas
The USPS does not publish a map of ZIP Codes because carrier routes are constantly changing and being revised. Moreover, since ZIP Codes are not areas on maps as we imagine them, they are not suitable for aggregation or analysis. The Census Bureau has created ZIP Code Tabulation Areas (ZCTAs) to approximate ZIP Code areas for the purposes of data analysis and mapping. ZCTAs are created by aggregating Census blocks that share the same predominant ZIP Code. This means that ZCTAs are not the same as ZIP Codes, and they are not the same as carrier routes. They are a useful approximation for many purposes, but they are not perfect. ZCTAs are updated every year, though the most significant updates occur every decade with the release of the Decennial Census.
ZCTAs are created by identifying the most common ZIP Code within each Census Block, and then dissolving those individual Census Blocks together to create a single polygon. This results in misclassification at the address level, where some address points within a given Census Block will be assigned a ZCTA that differs from their individual ZIP Code. Correcting this form of misclassification requires point-level address data, which may be available in some areas but are not available for the entire United States. Addressing this problem is beyond the scope of zippeR.
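The block-level assignment described above, and the misclassification it produces, can be illustrated with a toy example in base R. The address table is hypothetical, and this is only a sketch of the "most common ZIP Code per block" idea, not the Census Bureau's actual procedure.

```r
# Hypothetical address points, each with a Census block ID and a ZIP Code
addresses <- data.frame(
  block = c("A", "A", "A", "B", "B"),
  zip   = c("63108", "63108", "63110", "63110", "63110"),
  stringsAsFactors = FALSE
)

# Most common ZIP Code within each block
# (ties go to the value that sorts first)
modal_zip <- tapply(addresses$zip, addresses$block, function(z) {
  names(which.max(table(z)))
})

# Assign every address the modal ZIP Code of its block, standing in for a ZCTA
addresses$zcta <- as.vector(modal_zip[addresses$block])

# Flag addresses whose own ZIP Code differs from the block-level assignment
addresses$misclassified <- addresses$zip != addresses$zcta
addresses
```

In this toy data, the third address in block A has ZIP Code 63110 but inherits 63108 from its block, so it is the one misclassified record.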
ZCTAs are not created for areas that have no population, or areas that have very sparse populations. This affects areas in the American West, for example, that have very few residents.
Finally, it is important to note that not all ZIP Codes have an analogous ZCTA. Some ZIP Codes are used for Post Office Boxes, and these are not included in the ZCTA data. Using ZIP to ZCTA crosswalk files can help address this form of misclassification, and is the subject of a separate vignette.
ZIP Code and ZCTA Formatting
One of the core features of zippeR is validating inputs of ZIP Codes or ZCTA codes. For example, here is a set of ZCTAs that lie on the Missouri/Iowa border:
zcta5 <- c("51640", "52542", "52573", "5262x")
Notice how the last element contains a non-numeric character. When zcta5 is passed to zi_validate(), it will catch the formatting issue. There are two options, one of which returns a single logical value (TRUE or FALSE):
> library(zippeR)
> zi_validate(zcta5)
[1] FALSE
The other option, with verbose = TRUE, provides additional data about where formatting issues may exist:
> zi_validate(zcta5, verbose = TRUE)
# A tibble: 4 × 2
  condition                                   result
  <chr>                                       <lgl>
1 Input is a character vector?                TRUE
2 All input values have 5 characters?         TRUE
3 No input values are over 5 characters long? TRUE
4 All input values are numeric?               FALSE
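For readers who want to see what checks of this kind involve, the four conditions can be approximated in base R. This is a sketch under the assumption that each condition means what its label says; the function name is hypothetical, and it is not zippeR's implementation.

```r
# Hypothetical base R approximation of the four reported conditions
check_zcta5_sketch <- function(x) {
  n_chars <- nchar(as.character(x))
  data.frame(
    condition = c(
      "Input is a character vector?",
      "All input values have 5 characters?",
      "No input values are over 5 characters long?",
      "All input values are numeric?"
    ),
    result = c(
      is.character(x),           # stored as character, not numeric
      all(n_chars == 5),         # exactly five characters each
      all(n_chars <= 5),         # nothing longer than five characters
      all(grepl("^[0-9]+$", x))  # digits only
    )
  )
}

zcta5 <- c("51640", "52542", "52573", "5262x")
checks <- check_zcta5_sketch(zcta5)
checks
```

For this input only the last condition fails, since "5262x" contains a letter.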
For the third and fourth tests, users are strongly encouraged to attempt to manually correct problems. However, zi_repair() can be used to address the first and second tests, and will return NA values for ZIPs or ZCTAs that do not pass the third and fourth tests:
> zi_repair(zcta5)
[1] "51640" "52542" "52573" NA
Warning message:
In zi_repair(zcta5) : NAs introduced by coercion
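A repair step of this general kind can be sketched in base R: coerce to character, restore leading zeros on short values, and set anything over-length or non-numeric to NA. The function below is hypothetical and only illustrates the idea; it makes no claim about how zi_repair() works internally, and it does not emit a warning.

```r
# Hypothetical repair sketch (not zippeR's implementation)
repair_sketch <- function(x) {
  x <- as.character(x)

  # Left-pad values shorter than five characters with zeros
  short <- !is.na(x) & nchar(x) > 0 & nchar(x) < 5
  x[short] <- paste0(strrep("0", 5 - nchar(x[short])), x[short])

  # Values that are too long or contain non-digits cannot be repaired
  bad <- is.na(x) | nchar(x) > 5 | !grepl("^[0-9]+$", x)
  x[bad] <- NA_character_
  x
}

repaired <- repair_sketch(c("6390", "51640", "5262x", "123456"))
repaired
```

Padding addresses a common cause of short values, namely ZIP Codes that were stored as numbers and lost their leading zeros.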
When malformed ZIPs or ZCTAs are replaced with NA values, zi_repair() will return a warning. Note that zi_validate() also works with three-digit ZCTAs:
> zcta3 <- c("516", "525", "526")
> zi_validate(zcta3, style = "zcta3")
[1] TRUE
Note that, at this time, the validation process does not ensure that inputs correspond to valid ZCTAs.
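If existence matters for an analysis, a separate membership check against a reference list is one option. In the sketch below, known_zctas is a hypothetical stand-in for a full reference vector that would have to come from an authoritative source; here it holds only the three well-formed values from the example above.

```r
# Hypothetical reference vector; a real one would come from an authoritative list
known_zctas <- c("51640", "52542", "52573")

# Candidate values that are all well formatted
candidates <- c("51640", "52573", "99999")

# Well formatted is not the same as present in the reference list
in_reference <- candidates %in% known_zctas
in_reference
```

A value can pass every formatting test and still be absent from the reference list, so the two checks answer different questions.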