12 Geocoding

In Chapter 10 we got the address of every officer-involved shooting in Philadelphia. To be able to graph where each shooting occurred we need to convert the addresses into geographic coordinates, a process called geocoding.

This type of activity is common in research studying the effect of place. For example, several recent studies have looked at the effect of marijuana dispensaries on crime around the dispensary. For these analyses they find the coordinates of each crime in the city and see if it occurred in a certain distance from the dispensary. Many crime data sets provide the coordinates of where each occurred, however sometimes the coordinates are missing - and other data such as marijuana dispensary locations give only the address - meaning that we need a way to find the coordinates of these locations.

12.1 Geocoding a single address

In this chapter we will cover using the free geocoder from ArcGIS, a software that people frequently use when dealing primarily with mapping projects. Google Maps used to be easily usable in R but since 2018 requires an account to use it’s geocoder so we will not be using it.

The URL for geocoding using ArcGIS is the following:

https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=ADDRESS&outFields=Match_addr,Addr_type

where instead of “ADDRESS” we put in the address whose coordinates we want. As an example, let’s look at Penn’s McNeil Building.

https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type

Including spaces in the address causes errors so all spaces need to be replaced with %20. Let’s see what data we get back from this URL. Enter the URL above in your browser and you’ll see these results.

It gives us a page with several important values. For our purposes we want the “lat” and “lon” sections which are the latitude and longitude parts of a location’s coordinates.

This data is stored on the page in a JSON format which is a convenient (for computers to read) way data is stored online. We can concert it to a data.frame that we’re more familiar with using the package jsonlite.

install.packages("jsonlite")

We will use the fromJSON() function and enter in the URL right in the ().

library(jsonlite)
fromJSON("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type")
#> $spatialReference
#> $spatialReference$wkid
#> [1] 4326
#> 
#> $spatialReference$latestWkid
#> [1] 4326
#> 
#> 
#> $candidates
#>                                                           address
#> 1        S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104
#> 2                    Walnut St, Philadelphia, Pennsylvania, 19102
#> 3                    Walnut St, Philadelphia, Pennsylvania, 19107
#> 4                    Walnut St, Philadelphia, Pennsylvania, 19139
#> 5                    Walnut St, Philadelphia, Pennsylvania, 19104
#> 6                    Walnut St, Philadelphia, Pennsylvania, 19106
#> 7                    Walnut St, Philadelphia, Pennsylvania, 19103
#> 8   E State St & N Walnut St, Kennett Square, Pennsylvania, 19348
#> 9   E State St & S Walnut St, Kennett Square, Pennsylvania, 19348
#> 10             State Rd & Walnut Ln, Telford, Pennsylvania, 18969
#> 11         State Rd & Walnut Ave E, Bensalem, Pennsylvania, 19020
#> 12                     Walnut St, Middleport, Pennsylvania, 17953
#> 13 E Street Rd & N Walnut Rd, Kennett Square, Pennsylvania, 19348
#>    location.x location.y score
#> 1   -75.19868   39.95362 99.61
#> 2   -75.16610   39.94957 89.84
#> 3   -75.15921   39.94872 89.84
#> 4   -75.22944   39.95745 89.84
#> 5   -75.19633   39.95332 89.84
#> 6   -75.14852   39.94737 89.84
#> 7   -75.17449   39.95063 89.84
#> 8   -75.70560   39.84894 88.18
#> 9   -75.70500   39.84917 88.18
#> 10  -75.32272   40.33865 87.85
#> 11  -74.97302   40.06046 87.54
#> 12  -76.08395   40.72661 87.25
#> 13  -75.70581   39.87566 87.23
#>                                             attributes.Match_addr
#> 1        S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104
#> 2                    Walnut St, Philadelphia, Pennsylvania, 19102
#> 3                    Walnut St, Philadelphia, Pennsylvania, 19107
#> 4                    Walnut St, Philadelphia, Pennsylvania, 19139
#> 5                    Walnut St, Philadelphia, Pennsylvania, 19104
#> 6                    Walnut St, Philadelphia, Pennsylvania, 19106
#> 7                    Walnut St, Philadelphia, Pennsylvania, 19103
#> 8   E State St & N Walnut St, Kennett Square, Pennsylvania, 19348
#> 9   E State St & S Walnut St, Kennett Square, Pennsylvania, 19348
#> 10             State Rd & Walnut Ln, Telford, Pennsylvania, 18969
#> 11         State Rd & Walnut Ave E, Bensalem, Pennsylvania, 19020
#> 12                     Walnut St, Middleport, Pennsylvania, 17953
#> 13 E Street Rd & N Walnut Rd, Kennett Square, Pennsylvania, 19348
#>    attributes.Addr_type extent.xmin extent.ymin extent.xmax extent.ymax
#> 1             StreetInt   -75.19968    39.95262   -75.19768    39.95462
#> 2            StreetName   -75.16710    39.94857   -75.16510    39.95057
#> 3            StreetName   -75.16021    39.94772   -75.15821    39.94972
#> 4            StreetName   -75.23044    39.95645   -75.22844    39.95845
#> 5            StreetName   -75.19733    39.95232   -75.19533    39.95432
#> 6            StreetName   -75.14952    39.94637   -75.14752    39.94837
#> 7            StreetName   -75.17549    39.94963   -75.17349    39.95163
#> 8             StreetInt   -75.70660    39.84794   -75.70460    39.84994
#> 9             StreetInt   -75.70600    39.84817   -75.70400    39.85017
#> 10            StreetInt   -75.32372    40.33765   -75.32172    40.33965
#> 11            StreetInt   -74.97402    40.05946   -74.97202    40.06146
#> 12           StreetName   -76.08495    40.72561   -76.08295    40.72761
#> 13            StreetInt   -75.70681    39.87466   -75.70481    39.87666

It returns a list of objects. This is a named list meaning that we can grab the part of the list we want using dollar sign notation as if it were a column in a data.frame. In this case we want the part of the object called candidates. To avoid having a very long line of code, let’s call the list fromJSON() returns “address_coordinate” and grab the candidates object from that list.

address_coordinates <- fromJSON("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type")
address_coordinates$candidates
#>                                                           address
#> 1        S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104
#> 2                    Walnut St, Philadelphia, Pennsylvania, 19102
#> 3                    Walnut St, Philadelphia, Pennsylvania, 19107
#> 4                    Walnut St, Philadelphia, Pennsylvania, 19139
#> 5                    Walnut St, Philadelphia, Pennsylvania, 19104
#> 6                    Walnut St, Philadelphia, Pennsylvania, 19106
#> 7                    Walnut St, Philadelphia, Pennsylvania, 19103
#> 8   E State St & N Walnut St, Kennett Square, Pennsylvania, 19348
#> 9   E State St & S Walnut St, Kennett Square, Pennsylvania, 19348
#> 10             State Rd & Walnut Ln, Telford, Pennsylvania, 18969
#> 11         State Rd & Walnut Ave E, Bensalem, Pennsylvania, 19020
#> 12                     Walnut St, Middleport, Pennsylvania, 17953
#> 13 E Street Rd & N Walnut Rd, Kennett Square, Pennsylvania, 19348
#>    location.x location.y score
#> 1   -75.19868   39.95362 99.61
#> 2   -75.16610   39.94957 89.84
#> 3   -75.15921   39.94872 89.84
#> 4   -75.22944   39.95745 89.84
#> 5   -75.19633   39.95332 89.84
#> 6   -75.14852   39.94737 89.84
#> 7   -75.17449   39.95063 89.84
#> 8   -75.70560   39.84894 88.18
#> 9   -75.70500   39.84917 88.18
#> 10  -75.32272   40.33865 87.85
#> 11  -74.97302   40.06046 87.54
#> 12  -76.08395   40.72661 87.25
#> 13  -75.70581   39.87566 87.23
#>                                             attributes.Match_addr
#> 1        S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104
#> 2                    Walnut St, Philadelphia, Pennsylvania, 19102
#> 3                    Walnut St, Philadelphia, Pennsylvania, 19107
#> 4                    Walnut St, Philadelphia, Pennsylvania, 19139
#> 5                    Walnut St, Philadelphia, Pennsylvania, 19104
#> 6                    Walnut St, Philadelphia, Pennsylvania, 19106
#> 7                    Walnut St, Philadelphia, Pennsylvania, 19103
#> 8   E State St & N Walnut St, Kennett Square, Pennsylvania, 19348
#> 9   E State St & S Walnut St, Kennett Square, Pennsylvania, 19348
#> 10             State Rd & Walnut Ln, Telford, Pennsylvania, 18969
#> 11         State Rd & Walnut Ave E, Bensalem, Pennsylvania, 19020
#> 12                     Walnut St, Middleport, Pennsylvania, 17953
#> 13 E Street Rd & N Walnut Rd, Kennett Square, Pennsylvania, 19348
#>    attributes.Addr_type extent.xmin extent.ymin extent.xmax extent.ymax
#> 1             StreetInt   -75.19968    39.95262   -75.19768    39.95462
#> 2            StreetName   -75.16710    39.94857   -75.16510    39.95057
#> 3            StreetName   -75.16021    39.94772   -75.15821    39.94972
#> 4            StreetName   -75.23044    39.95645   -75.22844    39.95845
#> 5            StreetName   -75.19733    39.95232   -75.19533    39.95432
#> 6            StreetName   -75.14952    39.94637   -75.14752    39.94837
#> 7            StreetName   -75.17549    39.94963   -75.17349    39.95163
#> 8             StreetInt   -75.70660    39.84794   -75.70460    39.84994
#> 9             StreetInt   -75.70600    39.84817   -75.70400    39.85017
#> 10            StreetInt   -75.32372    40.33765   -75.32172    40.33965
#> 11            StreetInt   -74.97402    40.05946   -74.97202    40.06146
#> 12           StreetName   -76.08495    40.72561   -76.08295    40.72761
#> 13            StreetInt   -75.70681    39.87466   -75.70481    39.87666

The candidates is a data.frame which includes 12 (slightly) different coordinates for our address. The first one is the one we want and if you look at the “score” column you can see it has the highest score of those 12. The ArcGIS geocoder provides a number of potential coordinates for an inputted address and ranks them in order of how confident it is that this is the address you want. Since we just want the top address - the “most confident” one - so we will just keep the first row.

Since we are grabbing the first row of a data.frame, our square bracket notation must be [row, column]. For row we put 1 since we want the first row. Since we want every column we can leave it blank but make sure to keep the comma.

address_coordinates <- fromJSON("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type")
address_coordinates <- address_coordinates$candidates
address_coordinates <- address_coordinates[1, ]
address_coordinates
#>                                                    address location.x
#> 1 S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104  -75.19868
#>   location.y score
#> 1   39.95362 99.61
#>                                      attributes.Match_addr
#> 1 S 38th St & Walnut St, Philadelphia, Pennsylvania, 19104
#>   attributes.Addr_type extent.xmin extent.ymin extent.xmax extent.ymax
#> 1            StreetInt   -75.19968    39.95262   -75.19768    39.95462

This data.frame has something we’ve never seen before. It has columns that are themselves data.frames. For example, the column “location” is a data.frame with the x- and y-coordinates that we want. We can select this exactly as we do with any column but instead of returning a vector of values it returns a data.frame.

address_coordinates$location
#>           x        y
#> 1 -75.19868 39.95362

Since our end goal was to get the coordinates of an address, the data.frame in the “location” column is exactly what we want. It took a few steps but now we have code that returns the coordinates of an address.

12.2 Making a function

We want to geocode every single address from the officer-involved shooting data. As with most things where we do the same thing many times except for one minor change - here, the address being geocoded - we will make a function to help us.

Let’s start by copying the code used to geocode a single address.

address_coordinates <- fromJSON("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type")
address_coordinates <- address_coordinates$candidates
address_coordinates <- address_coordinates[1, ]
address_coordinates$location
#>           x        y
#> 1 -75.19868 39.95362

Now we can make the skeleton of a function without including any code. What do we want to input to the function and what do we want it to return? We want it so we input an address and it returns the coordinates of that address.

We can call the function “geocode_address”, the input “address” and the returning value “address_coordinates” just to stay consistent with the code we already wrote.

geocode_address <- function(address) {
   
   return(address_coordinates)
}

Now we can add the code.

geocode_address <- function(address) {
   address_coordinates <- fromJSON("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA&outFields=Match_addr,Addr_type")
   address_coordinates <- address_coordinates$candidates
   address_coordinates <- address_coordinates[1, ]
   address_coordinates$location
   return(address_coordinates)
}

Finally we need to replace the value in fromJSON() which is for a specific address with something that works for any address we input.

Since the URL is in the form

https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=ADDRESS&outFields=Match_addr,Addr_type

we can use the paste() function to combine the address inputted with the URL format. There is one step necessary before that, however. Since spaces cause issues in the data, we need to replace every space in the address with %20. We can do that using gsub() which is perfect for replacing characters. Let’s try a simple example using gsub() before including it in our function. We just want to find every " " and replace it with “%20”.

We will use the address for the McNeil Building as the example.

gsub(" ", "%20", "38th St and Walnut St, Philadelphia, PA")
#> [1] "38th%20St%20and%20Walnut%20St,%20Philadelphia,%20PA"

It works so we can use the code to fix the address before putting it in the URL. To avoid having very long lines of code, we can break down the code into smaller pieces. We want to use paste() to combine the parts of the URL with the address and have that as the input in fromJSON(). Let’s do that in two steps. First we do the paste(), saving it in an object we can call “url”, and then use “url” as our input in fromJSON(). Since we do not want spaces in the URL, we need to set the sep parameter in paste() to "".

geocode_address <- function(address) {
   address <- gsub(" ", "%20", address)
   url <- paste("https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&singleLine=",
                address,
                "&outFields=Match_addr,Addr_type",
                sep = "")
   address_coordinates <- fromJSON(url)
   
   address_coordinates <- address_coordinates$candidates
   address_coordinates <- address_coordinates[1, ]
   address_coordinates <- address_coordinates$location
   
   return(address_coordinates)
}

We can try it using the same address we did earlier, “38th St and Walnut St, Philadelphia, PA”, the address of Penn’s McNeil Building.

geocode_address("38th St and Walnut St, Philadelphia, PA")
#>           x        y
#> 1 -75.19868 39.95362

It returns the same data.frame as earlier so our function works!

12.3 Geocoding officer shooting locations

We now have a function capable of returning the coordinates of every location in our officer-involved shooting data. We can write a for loop to go through every row of data and get the coordinates for that row’s location.

Let’s load in the officer shooting data we made earlier.

setwd(here::here("data"))
load("officer_shootings.rda")

Before we do that we need to fix some issues in the locations column. The main issue is that several of the shootings have the location labeled as “withheld” in the table we scraped but the address does exist inside the PDF itself.

Let’s check first how many times the word “withheld” (with the W capitalized or not) or an empty string "" exists in the “location” column.

table(officer_shootings$location %in% c("", "withheld", "Withheld"))
#> 
#> FALSE  TRUE 
#>   449    15

Above we are saying “how many times do the values”“,”withheld“, or”Withheld" appear in the column “location”. The answer is 15 times.

We can print out the shooting number of the rows with one of those values to make it easy to check the PDF. We can use square bracket [] subsetting to return the rows in the “shooting_number” column where the location is one of those values above.

officer_shootings$shooting_number[officer_shootings$location %in% c("", "withheld", "Withheld")]
#>  [1] "16-18" "16-26" "10-06" "09-25" "09-27" "09-76" "08-06" "08-18"
#>  [9] "08-30" "08-35" "08-40" "08-60" "08-70" "08-74" "07-27"

We will also use square bracket [] notation to replace the value in the “location” column with the correct address that we read manually in the PDF. Let’s start with the first shooting in the list, number “16-18”. Using square bracket notation we can see which value is in the “location” column for the row whose “shooting_number” is “16-18”.

officer_shootings$location[officer_shootings$shooting_number == "16-18"]
#> [1] "withheld"

Looking into the PDF we can see that the correct address is “3200 block of Wellington Street”. Let’s assign that address to the “location” column in the row for that shooting.

officer_shootings$location[officer_shootings$shooting_number == "16-18"] <- "3200 block of Wellington Street"

And we can check if it worked.

officer_shootings$location[officer_shootings$shooting_number == "16-18"]
#> [1] "3200 block of Wellington Street"

We need to do the same for the other 14 values with missing locations. Several of these text descriptions of the incidents contain the location information. Let’s fill those in.


officer_shootings$location[officer_shootings$shooting_number == "08-06"] <- "200 block of Clapier Street"
officer_shootings$location[officer_shootings$shooting_number == "08-18"] <- "900 block of E. Slocum Street"
officer_shootings$location[officer_shootings$shooting_number == "08-30"] <- "700 block of W. Rockland Street"
officer_shootings$location[officer_shootings$shooting_number == "08-40"] <- "5400 Jefferson Street"
officer_shootings$location[officer_shootings$shooting_number == "08-60"] <- "3000 Memphis Street"
officer_shootings$location[officer_shootings$shooting_number == "08-70"] <- "1300 block of S. 29th Street"
officer_shootings$location[officer_shootings$shooting_number == "08-74"] <- "5600 block of N. Mascher Street"
officer_shootings$location[officer_shootings$shooting_number == "10-06"] <- "Howard and Grange Street"

Six shootings (“07-27”, “08-35”, “09-25”, “09-27”, “09-76”, “16-26”) do not have an address in the PDF so we can’t fix them.

Shooting number “17-08” is a strange one. The description says that there were shootings at two locations because the suspect fled after the first shooting. Since the second shooting was in Delaware, we will use the address of the first shooting, “5600 Whitby Avenue”.

officer_shootings$location[officer_shootings$shooting_number == "17-08"] <- "5600 Whitby Avenue"

Shooting number “15-06” is unusual as it has quotes around the “A”. This will cause issues when geocoding so we need to remove those quotes. The “A” probably refers to A Street so we can replace it as such.

officer_shootings$location[officer_shootings$shooting_number == "15-06"] <- "A Street & Somerset Street"

There are a few more issues we can use gsub() to fix. Some of the addresses say “xxx block of”. We need to remove " block of" from the text. A few addresses also say “Rear Alley of” or “near” an address. We can delete those words as well. We can do this using three separate gsub()s or in a single gsub() using the | which stands for “or”. The | says look for things on the left or right of the |. For all our values we want to replace them with nothing, or in gsub() terms, an empty string "".

officer_shootings$location <- gsub(" block of|Rear Alley of |near ", "", officer_shootings$location, 
                                   ignore.case = TRUE)

We must paste the string “, Philadelphia, PA” to the end of each address so when we geocode it ArcGIS knows these addresses are in Philly.

officer_shootings$location <- paste(officer_shootings$location, ", Philadelphia, PA",
                                    sep = "")

There are still some addresses that require manual fixing, such as the one reading “5700 N. Park street/5700 N. Broad street”. Since this work requires reading through each address and seeing if it is accurate and can be rewritten, I won’t dwell on it any further. This kind of work, manually checking data, is important even when using a programming language like R. R can only do what we tell it to do and isn’t smart enough to recognize an issue that we can easily see.

Since I am not going to clean the remaining street issues, we will likely not be able to geocode those addresses. How big of an issue is this? In an example like this where we have fewer than 500 events, it can be important if we fail to geocode even a few dozen shootings. So this kind of data would mean manually inspecting and correction the data is important. If instead you look at crime data with millions of rows, it likely wouldn’t be worth it to manually inspect so many values.

We can now write a for loop to go through every row in our data and geocode that address. The function geocode_address() we made returns a data.frame with one column for the longitude and one for the latitude. To make it so we only work with the data.frame officer_shootings we can save the output of geocode_address() to a temporary file and add each of the columns it produces to a column in officer_shootings.

We need to make columns for the coordinates in officer_shootings now to be filled in during the for loop. We can call them “lon” and “lat” for the longitude and latitude values we get from the coordinates. When making a new column which you will fill through a for loop, it is a good assign to start by assigning the column NA. That way any row that you don’t fill in the loop (such as if there is no match for the address), will still be NA. NAs are easy to detect in your data for future subsetting or to ignore in a mathematical operation.

officer_shootings$lon <- NA
officer_shootings$lat <- NA

Let’s start with an example using the first row. Inputting the address from the first row gives a data.frame with the coordinates. Let’s now save that output to an object we call “temp”.

temp <- geocode_address(officer_shootings$location[1])
temp
#>           x        y
#> 1 -75.22087 39.95046

We can use square bracket [] notation to assign the value from the “x” column of “temp” to our “lon” column in “officers_shootings” and do the same for the “y” and “lat” columns. Since we got the address from the first row, we need to put the coordinates in the first row so they are with the right address.

officer_shootings$lon[1] <- temp$x
officer_shootings$lat[1] <- temp$y

And we can check the first 6 rows to make sure the first row is the only one with values in these new columns.

head(officer_shootings)
#>   shooting_number                                              location
#> 1           19-04                   4900 Hazel Avenue, Philadelphia, PA
#> 2           19-06                   1300 Kater Street, Philadelphia, PA
#> 3           19-09 Bridge Street & Roosevelt Boulevard, Philadelphia, PA
#> 4           19 11                  2100 Taney Terrace, Philadelphia, PA
#> 5           19-13                1800 N. Broad Street, Philadelphia, PA
#> 6           19 14                       3400 G Street, Philadelphia, PA
#>        dates       lon      lat
#> 1 2019-03-06 -75.22087 39.95046
#> 2 2019-03-28        NA       NA
#> 3 2019-04-20        NA       NA
#> 4 2019-04-25        NA       NA
#> 5 2019-05-11        NA       NA
#> 6 2019-05-20        NA       NA

Since we are geocoding a lot of addresses, this may take some time.

for (i in 1:nrow(officer_shootings)) {
   temp <- geocode_address(officer_shootings$location[i])
   officer_shootings$lon[i] <- temp$x
   officer_shootings$lat[i] <- temp$y
}

Now it appears that we have longitude and latitude for every incident. We should check that they all look sensible.

summary(officer_shootings$lat)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  -20.57   39.96   39.99   39.80   40.02   53.61
summary(officer_shootings$lon)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> -122.86  -75.19  -75.16  -73.62  -75.13  121.05

What is odd about these results? There are no NA values in either column! That does not make sense because we know some of the rows have non-addresses such as “withheld”. What happened is that when ArcGIS couldn’t find an address match it just gave us the generic coordinates for the city of Philly. Normally we would need to remove those rows but we will keep them in for now and look at the strange pattern caused by this in the section on mapping.

Another check is to make a simple scatterplot of the data. Since all the shootings occurred in Philly, they should be relatively close to each other. If there are dots far from the rest, that is probably a geocoding issue.

plot(officer_shootings$lon, officer_shootings$lat)

Most points are clustered around 39-40 degrees latitude and -75 degrees longitude with some exceptions. This is likely due to a geocoding issue with our geocoder finding the wrong address. For your own research, and considering the small number of values in this data, you should check the address to try to get them all geocoded properly. Here, we will simply remove all rows outside this -75 longitude and 39-40 latitude range.

Let’s keep only rows with a latitude lower than 45 and a longitude less than -70 and higher than -76

officer_shootings <- officer_shootings[officer_shootings$lat < 45, ]
officer_shootings <- officer_shootings[officer_shootings$lon < -70, ]
officer_shootings <- officer_shootings[officer_shootings$lon > -76, ]

Now we can check the summary() function again to see if all the values are in their normal ranges.

summary(officer_shootings$lat)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   39.89   39.97   39.99   39.99   40.02   40.10
summary(officer_shootings$lon)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  -75.30  -75.19  -75.16  -75.16  -75.13  -74.98

These values look correct. We can make another scatterplot as a second check.

plot(officer_shootings$lon, officer_shootings$lat)

Two shootings have inaccurate latitudes. Let’s drop any latitude that is greater than 39.

officer_shootings <- officer_shootings[officer_shootings$lat > 39, ]

To finish this lesson we want to save the officer_shootings data.frame to use in future lessons. I am going to make a new object called “officer_shootings_geocoded” that is a copy of officer_shootings just so I can rerun this lesson and it will work properly (as it should start without any geocoded values). If this was a real project you would likely just save the object as officer_shootings to have fewer objects to manage.

officer_shootings_geocoded <- officer_shootings
setwd(here::here("data"))
save(officer_shootings_geocoded, file = "officer_shootings_geocoded.rda")

In the next lesson we’ll start mapping these shootings. For now, here is a map of every shooting we managed to geocode.

library(ggmap)
philly_map <- ggmap(get_map(c(-75.288486, 39.868285, -74.950965, 40.138251), source = "stamen"))
philly_map +
  geom_point(aes(x = lon, y = lat),
             data  = officer_shootings,
             alpha = 0.5,
             color = "darkred",
             size  = 1)
#> Warning: Removed 2 rows containing missing values (geom_point).