ggplot/mapping US counties — problems with visualization shapes in R
ggplot us map
50 state map in r
r county map
maps package r
ggplot2 zip code map
shaded county map
choropleth map of us in r
So I have a data frame in R called obesity_map which basically gives me the state, county, and obesity rate per county. It looks more or less like this:
obesity_map = data.frame(state, county, obesity_rate)
I'm trying to visualize this on the map by showing various obesity rates per county throughout the US with this:
us.state.map <- map_data('state') head(us.state.map) states <- levels(as.factor(us.state.map$region)) df <- data.frame(region = states, value = runif(length(states), min=0, max=100),stringsAsFactors = FALSE) map.data <- merge(us.state.map, df, by='region', all=T) map.data <- map.data[order(map.data$order),] head(map.data) map.county <- map_data('county') county.obesity <- data.frame(region = obesity_map$state, subregion = obesity_map$county, value = obesity_map$obesity_rate) map.county <- merge(county.obesity, map.county, all=TRUE) ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) + geom_polygon(colour = "white", size = 0.1)
And it basically creates an image that looks like this:
As you can see, the US is divided into strange shapes, the colors aren't one consistent color in varying gradients, and you can't make much from it. But what I really want is something like this below but with each county filled in:
I'm fairly new to this so I'd appreciate any and all help!
Here's the output of the dput:
structure(list(X = 1:3141, FIPS = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L, 43L, 45L, 47L, 49L, 51L, 53L, 55L, 57L, 59L, 61L, 63L, 65L, 67L, 69L, 71L, 73L, 75L, 77L, 79L, 81L, 83L, 85L, 87L, 89L, 91L, 93L, 95L, 97L, 99L, 101L, 103L, 105L, 107L, 109L, 111L, 113L, 115L, 117L, 119L, 121L, 123L, 125L, 127L, 129L, 131L, 133L, 13L, 16L, 20L, 50L, 60L, 68L, 70L, 90L, 100L, 110L, 122L, 130L, 150L, 164L, 170L, 180L, 185L, 188L, 201L, 220L, 232L, 240L, 261L, 270L, 280L, 282L, 290L, 1L, 3L, 5L, 7L, 9L, 11L, 12L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 33L, 35L, 37L, 39L, 41L,
It's a huge amount of numbers because it's for every US county so I abbreviated the results and put in the first couple lines.
Basically, the data frame looks like this though:
print(head(obesity_map)) X FIPS state_names county_names obesity 1 1 1 Alabama Autauga 24.5 2 2 3 Alabama Baldwin 23.6 3 3 5 Alabama Barbour 25.6 4 4 7 Alabama Bibb 0.0 5 5 9 Alabama Blount 24.2 6 6 11 Alabama Bullock 0.0
I also tried using ggcounty by following the example put up but I keep getting an error. I'm not entirely sure what I've done wrong:
library(ggcounty) # breaks obesity_map$obese <- cut(obesity_map$obesity, breaks=c(0, 5, 10, 15, 20, 25, 30), labels=c("1", "2", "3", "4", "5", "6"), include.lowest=TRUE) # get the US counties map (lower 48) us <- ggcounty.us() # start the plot with our base map gg <- us$g # add a new geom with our population (choropleth) gg <- gg + geom_map(data=obesity_map, map=us$map, aes(map_id=FIPS, fill=obesity_map$obese), color="white", size=0.125)
But I always end up getting an error saying: "Error: Argument must be coercible to non-negative integer"
Any idea? Thanks again for all your help! I appreciate it so much.
Maybe a little late for another answer, but still worthwhile to share I think.
The reading and preprocessing of the data is similar to jlhoward's answer, with some differences:
library(tmap) # package for plotting library(readxl) # for reading Excel library(maptools) # for unionSpatialPolygons # download data download.file("http://www.ers.usda.gov/datafiles/Food_Environment_Atlas/Data_Access_and_Documentation_Downloads/Current_Version/DataDownload.xls", destfile = "DataDownload.xls", mode="wb") df <- read_excel("DataDownload.xls", sheet = "HEALTH") # download shape (a little less detail than in the other scripts) f <- tempfile() download.file("http://www2.census.gov/geo/tiger/GENZ2010/gz_2010_us_050_00_20m.zip", destfile = f) unzip(f, exdir = ".") US <- read_shape("gz_2010_us_050_00_20m.shp") # leave out AK, HI, and PR (state FIPS: 02, 15, and 72) US <- US[!(US$STATE %in% c("02","15","72")),] # append data to shape US$FIPS <- paste0(US$STATE, US$COUNTY) US <- append_data(US, df, key.shp = "FIPS", key.data = "FIPS")
When the correct data is attached to the shape object, a choropleth can be drawn with one line of code:
qtm(US, fill = "PCT_OBESE_ADULTS10")
This could be enhanced by adding state borders, a better projection, and a title:
# create shape object with state polygons US_states <- unionSpatialPolygons(US, IDs=US$STATE) tm_shape(US, projection="+init=epsg:2163") + tm_polygons("PCT_OBESE_ADULTS10", border.col = "grey30", title="") + tm_shape(US_states) + tm_borders(lwd=2, col = "black", alpha = .5) + tm_layout(title="2010 Adult Obesity by County, percent", title.position = c("center", "top"), legend.text.size=1)
ggplot/mapping US counties, Maybe a little late for another answer, but still worthwhile to share I think. The reading and preprocessing of the data is similar to jlhoward's the gg ggplot2 object a map object which is a "fortified" data frame a county.names object which is a list of all county names (or FIPS codes) in that county a geom_map object (geom) for the state county map
So this is a similar example but attempts to accommodate the format of your
obesity_map dataset. It also uses a data table join which is much faster than
merge(...), especially with large datasets like yours.
library(ggplot2) # this creates an example formatted as your obesity.map - you have this already... set.seed(1) # for reproducible example map.county <- map_data('county') counties <- unique(map.county[,5:6]) obesity_map <- data.frame(state_names=counties$region, county_names=counties$subregion, obesity= runif(nrow(counties), min=0, max=100)) # you start here... library(data.table) # use data table merge - it's *much* faster map.county <- data.table(map_data('county')) setkey(map.county,region,subregion) obesity_map <- data.table(obesity_map) setkey(obesity_map,state_names,county_names) map.df <- map.county[obesity_map] ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) + geom_polygon()+coord_map()
Also, if your dataset has the FIPS codes, which it seems to, I'd strongly recommend you use the US Census Bureau's TIGER/Line county shapefile (which also has these codes), and merge on that. This is much more reliable. For example, in your extract of the obesity_map data frame, the states and counties are capitalized, whereas in the built-in counties dataset in R, they are not, so you would have to deal with that. Also, the TIGER file is up to date, whereas the internal dataset is not.
So this is kind of an interesting question. Turns out the actual obesity data is on the USDA website and can be downloaded here as an MSExcel file. There's also a shapfile of US counties on the Census Bureau website, here. Both the Excel file and the shapefile have FIPS information. In R this can be put together relatively simply:
library(XLConnect) # for loadWorkbook(...) and readWorksheet(...) library(rgdal) # for readOGR(...) library(RcolorBrewer) # for brewer.pal(...) library(data.table) setwd(" < directory with all your files > ") wb <- loadWorkbook("DataDownload.xls") # from the USDA website df <- readWorksheet(wb,"HEALTH") # this sheet has the obesity data US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m") #leave out AK, HI, and PR (state FIPS: 02, 15, and 72) US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),] county.data <- US.counties@data county.data <- cbind(id=rownames(county.data),county.data) county.data <- data.table(county.data) county.data[,FIPS:=paste0(STATE,COUNTY)] # this is the state + county FIPS code setkey(county.data,FIPS) obesity.data <- data.table(df) setkey(obesity.data,FIPS) county.data[obesity.data,obesity:=PCT_OBESE_ADULTS10] map.df <- data.table(fortify(US.counties)) setkey(map.df,id) setkey(county.data,id) map.df[county.data,obesity:=obesity] ggplot(map.df, aes(x=long, y=lat, group=group, fill=obesity)) + scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+ geom_polygon()+coord_map()+ labs(title="2010 Adult Obesity by Country, percent",x="",y="")+ theme_bw()
to produce this:
Introduction to Creating Maps with ggplot2, Create a base map of United States counties. Make sure you use the map_data function as well as the ggplot function. Use the world file to create a base map of ggplot2 mapping county boundries in one color and state boundries in another on the same map. I am creating a choropleth county map with grey borders, and I also want to include the state boundries in black.
This is something I can get working managing the mapping variable. Renaming it to 'region'.
library(ggplot2) library(maps) m.usa <- map_data("county") m.usa$id <- m.usa$subregion m.usa <- m.usa[ ,-5] names(m.usa) <- 'region' df <- data.frame(region = unique(m.usa$region), obesity = rnorm(length(unique(m.usa$region)), 50, 10), stringsAsFactors = F) head(df) region obesity 1 autauga 44.54833 2 baldwin 68.61470 3 barbour 52.19718 4 bibb 50.88948 5 blount 42.73134 6 bullock 59.93515 ggplot(df, aes(map_id = region)) + geom_map(aes(fill = obesity), map = m.usa) + expand_limits(x = m.usa$long, y = m.usa$lat) + coord_map()
How to Create State and County Maps Easily in R, At Urban, we frequently create maps of the US to include in blog posts and reports. There are a I needed shapefiles I could use in R, specifically with ggplot2. A ggplot object that contains a basic US map with the described parameters. Since the result is a ggplot object, it can be extended with more geom layers, scales, labels, themes, etc. See Also
I think all you needed to do was reorder the map.county variable like you had for the map.data variable previously.
.... map.county <- merge(county.obesity, map.county, all=TRUE) ## reorder the map before plotting map.county <- map.county[order(map.data$county),] ## plot ggplot(map.county, aes(x = long, y = lat, group=group, fill=as.factor(value))) + geom_polygon(colour = "white", size = 0.1)
7 Draw maps, p <- ggplot(data = us_states, mapping = aes(x = long, y = lat, group = group)) p + County-level US maps can be aesthetically pleasing, because of the added A ggplot object that contains a basic US map with the described parameters. Since the result is a ggplot object, it can be extended with more geom layers, scales, labels, themes, etc.
Building on @jlhoward's answer: the code with
data.table fails for me in a mysterious way:
Error in `:=`(FIPS, paste0(STATE, COUNTY)) : Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
This error happened to me several times, but only when the code was inside a function, even just a minimal wrapper. It worked fine in a script. Although now I cannot reproduce the error, I adapted his/her code with
merge() instead of
data.table for completeness:
library(rgdal) # for readOGR(...) library(ggplot2) # for fortify() and plot() library(RColorBrewer) # for brewer.pal(...) US.counties <- readOGR(dsn=".",layer="gz_2010_us_050_00_5m") #leave out AK, HI, and PR (state FIPS: 02, 15, and 72) US.counties <- US.counties[!(US.counties$STATE %in% c("02","15","72")),] county.data <- US.counties@data county.data <- cbind(id=rownames(county.data),county.data) county.data$FIPS <- paste0(county.data$STATE, county.data$COUNTY) # this is the state + county FIPS code df <- data.frame(FIPS=county.data$FIPS, PCT_OBESE_ADULTS10= runif(nrow(county.data), min=0, max=100)) # Merge county.data to obesity county.data <- merge(county.data, df, by.x = "FIPS", by.y = "FIPS") map.df <- fortify(US.counties) # Merge the map to county.data map.df <- merge(map.df, county.data, by.x = "id", by.y = "id") ggplot(map.df, aes(x=long, y=lat, group=group, fill=PCT_OBESE_ADULTS10)) + scale_fill_gradientn("",colours=brewer.pal(9,"YlOrRd"))+ geom_polygon()+coord_map()+ labs(title="2010 Adult Obesity by Country, percent",x="",y="")+ theme_bw()
Drawing beautiful maps programmatically with R, sf and ggplot2 , The package maps (which is automatically installed and loaded with ggplot2 ) provides maps of the USA, with state and county borders, that It would be informative to add finer administrative information on top of the previous map, starting with state borders and names. The package maps (which is automatically installed and loaded with ggplot2) provides maps of the USA, with state and county borders, that can be retrieved and converted as sf objects:
Maps and Geographical Data, The census bureau provides estimates of populations of US counties. Estimates are For a county-level ggplot map, first get the polygon data frame: Code. Say you want to map every county in one state. Using the R library (dplyr), you can easily create a map of California: It’s also easy to use the full layering capabilities of ggplot2.
plot_usmap: Conveniently plot basic US map in usmap: US Maps , If states are provided in the county map, only counties in the included states A ggplot object that contains a basic US map with the described Blank US counties map in SVG format The important thing here, if you’re not familiar with SVG (which stands for scalable vector graphics), is that it’s actually an XML file. It’s text with tags, and you can edit it in a text editor like you would a HTML file.
How to Create a Map using GGPlot2: The Best Reference, This article provide many examples for creating a ggplot map. You will also learn World map colored by life expectancy; US map colored by violent crime rates Hadley has included a few functions that make it relatively easy to interact with the data in R’s maps package, and of course, once a map layer is laid down, you have all the power of ggplot at your fingertips to overlay whatever you may want to over the map. ggmap is a package that goes out to different map servers and grabs base maps
- I'm afraid you cannot map "various obesity rates" to a single county (geometry). You'll have to find a measure (mean?) or category (or interval) and map it to each administrative area. Would this suffice?
- I'm not entirely sure if I'm understanding you? The data in the data frame is an average for each US county so I'm trying to get each rate to show on the map. So for example, if LA County had a 40% obesity rate and Orange County had a 50% obesity rate, it'd show the two different counties shaded in a varying gradient for their percentage. I apologize if my original post lacked clarity!
- I'll put forth a shameless plug for my ggcounty package (heavy alpha/beta but there are examples there). If you can do a
dputof your obesity data frame, I'll be happy to toss up an example here.
- I see. You used '"various obesity rates per county". Could you add a sample of your data?
- perhaps paste output of a
dput(obesity)in the original post :-)
- Do you know why this error would occur? I didn't make any changes to your code: Error in readOGR(dir, base, verbose = FALSE, ...) : no features found
- I do not get that error. Was the download successful? You could try to use readOGR directly.
library(tmap)fails with error
Error: package or namespace load failed for ‘tmap’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘units’. @MartijnTennekes are you still able to run the code?
- Oh wow, thank you so much! I appreciate it! This really did the trick. One final question. Do you know how I might go around setting 0 values to be a separate color? I set all NA values to 0 and I'd like to have it show up as grey. Again I appreciate the help!
- Actually, if you leave them as NA, they should show up as grey by default. Since you're new to SO, please read this.
- +1 for this great example with data.table. I just still trying to understand how data.table is working in your first map, particularly with
map.county[obesity_map]. This is for merging? Thanks for sharing this.
- @jlhoward the second part of the code with FIPS works in a script and fails inside a function, even just a wrapper, with this error:
Error in :=(FIPS, paste0(STATE, COUNTY)) : Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").Do you have any idea why?
- Thanks so much! I really appreciate the help!
- This works! The problem is that
map_data(county)uses county and state names as identifiers, which is a problem for counties with different spellings, like "Ste Genevieve"/"Ste Genevieve" in Louisiana. This R-bloggers post suggests using a shapefile with FIPS codes provided by the Census Bureau.