In the Czech Republic, only a fraction of psychotherapeutic care is covered by health insurance companies. So, it was a welcome step when VZP, one of the largest such companies, offered its clients an allowance of 7 000 CZK (circa 280 EUR) for psychotherapy.
However, finding yourself a psychotherapist was not exactly convenient, as you had to scroll through a .pdf file. Luckily, Dominika Čechová, a board member of the Czech Association for Psychotherapy, negotiated additional information from VZP, such as which of the psychotherapists can take new clients.
My part was to combine these inputs into a more user-friendly way of finding a therapist and thus make the service more accessible. The focus was on speed of delivery and ease of deployment, so I decided to go with R and flexdashboard, a package for creating interactive yet serverless dashboards.
I had most of the building blocks covered from previous projects: R implementations of Leaflet for maps and DataTables for tables, plus the tidyverse for loading and wrangling the data.
However, the input table did not contain latitude and longitude, the two elements needed for displaying the therapists’ locations on a map. Previously, I had geocoded data using ggmap, but changes in the Google API requirements have made this somewhat inconvenient.
So, I decided to look for an alternative and came across the article Geocode with Python by Abdishakur, a great introduction to Python’s GeoPy package.
Now, although R is my special data friend, I also like to integrate other languages and tools, like Python and pandas, into my workflow where beneficial.
You may find the final dashboard here. In this post, I would like to walk through how I created it.
Let us begin by loading the required packages:
# Load required packages
library(flexdashboard) # dashboard wrapping
library(tidyverse) # data wrangling
library(crosstalk) # interactivity
library(broom) # output of built-in functions cleanup
library(DT) # Table formatting
library(htmltools) # widgets
library(reshape2) # data transformations
library(leaflet) # interactive maps
library(leaflet.extras) # interactive features
You may find the input dataset on GitHub.
# Initial dataset
vzp_data = read.csv("vzp_data_geo.csv") %>%
  select(name, surname, alias, website, address,
         city, region, psc, phone, email, Kapacita, remote_therapy)
First, you need to turn on the Python interface in R. I work with reticulate.
# Python interoperability in R
library(reticulate)
# Specifying which version of python to use.
use_python("/home/vg/anaconda3/bin/python3.7",
required=T) # Locate and run Python
Even though it is possible to run all of the following chunks of code at once, let us follow the “do one thing” principle and separate them according to their function.
Python packages first:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import pandas as pd
Activate the geocoder:
locator = Nominatim(user_agent="myGeocoder")
Combine the parts of the address into one variable, which will be used for geocoding:
df = r.vzp_data

df['state'] = "CZ"

df["ADDRESS"] = df["address"] + "," + df["city"] + "," + df["state"]

df["Adresa"] = df["address"] + ", " + df["city"]
To mitigate the Too Many Requests error, use RateLimiter. It adds a delay “between geocoding calls to reduce the load on the Geocoding service”, as the GeoPy documentation puts it.
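The idea behind RateLimiter can be sketched in plain Python. This is a simplified stand-in built only on the standard library, not GeoPy’s actual implementation: remember when the previous call happened and sleep until at least min_delay_seconds have passed.

```python
import time

def rate_limited(func, min_delay_seconds=1.0):
    """Wrap func so consecutive calls are at least min_delay_seconds apart.

    A simplified illustration of what geopy's RateLimiter does for us.
    """
    last_call = [None]  # mutable cell remembering the time of the previous call

    def wrapper(*args, **kwargs):
        now = time.monotonic()
        if last_call[0] is not None:
            wait = min_delay_seconds - (now - last_call[0])
            if wait > 0:
                time.sleep(wait)  # throttle: pause until the delay has elapsed
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

# Throttle a dummy "geocoder" (a plain lambda) to one call per 0.1 s
slow_geocode = rate_limited(lambda addr: f"geocoded: {addr}", min_delay_seconds=0.1)

start = time.monotonic()
results = [slow_geocode(a) for a in ["Praha", "Brno", "Ostrava"]]
elapsed = time.monotonic() - start  # three calls take at least ~0.2 s
```

In the real workflow, the wrapped callable is locator.geocode and the delay is a full second, as Nominatim’s usage policy asks for.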
# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

# 2 - create location column
df['location'] = df['ADDRESS'].apply(geocode)

# 3 - create point column from the location column (a tuple of latitude, longitude and altitude)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# 4 - split point column into latitude, longitude and altitude columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
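The splitting in step 4 is easy to check on a toy frame (the coordinates below are only illustrative):

```python
import pandas as pd

# Toy 'point' column of (latitude, longitude, altitude) tuples
toy = pd.DataFrame({'point': [(49.7559, 15.4129, 0.0), (50.0755, 14.4378, 0.0)]})

# Turning the list of tuples into a DataFrame yields one column per tuple element,
# which can then be assigned to named columns in one step
toy[['latitude', 'longitude', 'altitude']] = pd.DataFrame(toy['point'].tolist(), index=toy.index)
```

Passing index=toy.index keeps the new columns aligned with the original rows.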
Now, you might be asking: why am I exporting the table? Am I not going to work with the data further as a Python/R object?
Unfortunately, flexdashboard cannot yet be knitted while containing Python code. So, when working on the dashboard, I had to create two separate workflows - one for the data geocoding and the second for compiling the dashboard. However, I wanted to make the dashboard coding more reproducible while keeping all of the important components in one workflow.
Fortunately, other R Markdown outputs like the R Notebook are happy to wrap Python code. That is why you see this step in the code. Surely, it is not the most efficient workflow, but you know what they say: done (and working) is better than perfect.
Of course, if the goal were an R Markdown document containing some Python chunks, you could simply add

vzp_data = py$df

in an R chunk to convert the pandas DataFrame into an R data frame and proceed smoothly to the next part.
# Convert pandas DataFrame into R Data Frame (tibble)
vzp_data_geocoded = py$df

vzp_data_geo <- vzp_data_geocoded %>%
  select(name, surname, alias, website, address,
         Kapacita, remote_therapy,
         city, region, psc, phone, email,
         state, Adresa, latitude, longitude)

vzp_data_geo <- apply(vzp_data_geo, 2, as.character)
write.csv(vzp_data_geo, file = "vzp_data_geo.csv")
Select only the columns that will be used in the output and format them:
# Data import and wrangling
vzp_data = read.csv("vzp_data_geo.csv") %>%
  select(name,
         surname,
         alias,
         website,
         address,
         city,
         phone,
         email,
         Kapacita,
         remote_therapy,
         latitude,
         longitude) %>%
  mutate_all(list(~str_trim(., side = "both"))) %>%
  mutate(`Jméno a příjmení` = paste(name, surname),
         latitude = as.numeric(latitude),
         longitude = as.numeric(longitude)) %>%
  rename("Email" = email,
         "Telefon" = phone,
         "Web" = website,
         "Online nebo telefonické konzultace?" = remote_therapy,
         "Kapacita?" = Kapacita,
         "Město" = city,
         "Ulice" = address) %>%
  mutate(`Kapacita?` = case_when(`Kapacita?` == "volno" ~ "Volno",
                                 `Kapacita?` == "naplněno" ~ "Naplněno",
                                 `Kapacita?` == "Naplněno" ~ "Naplněno",
                                 TRUE ~ `Kapacita?`),
         `Online nebo telefonické konzultace?` = case_when(`Online nebo telefonické konzultace?` == "Ano" ~ "Ano",
                                                           `Online nebo telefonické konzultace?` == "Ano/ možnost konzultací v anglickém jazyce" ~ "Ano",
                                                           `Online nebo telefonické konzultace?` == "Ne" ~ "Ne",
                                                           `Online nebo telefonické konzultace?` == "ne" ~ "Ne",
                                                           TRUE ~ `Online nebo telefonické konzultace?`))

# Replace "NaN" with "Neuvedeno"
vzp_data[vzp_data == "NaN"] <- "Neuvedeno"
The following table is a snapshot of the final look-up table, built with the DT (DataTables) package. It puts a number of interactive elements at your disposal: you can add search and filtering across columns, sorting, pagination, and so forth. In addition, it can be easily linked with filters using crosstalk.
As the output is in Czech, I suggest using your browser’s Translate to English feature to get more value out of it.
vzp_data_table <- vzp_data %>%
  select(`Kapacita?`, `Online nebo telefonické konzultace?`,
         Město, Ulice, `Jméno a příjmení`,
         Email, Telefon, Web)

test_shared <- crosstalk::SharedData$new(vzp_data_table)

DT::datatable(test_shared,
              extensions = c("Responsive"),
              rownames = FALSE, # remove rownames
              style = "bootstrap",
              class = 'cell-border display',
              options = list(
                pageLength = 10,
                dom = 't',
                deferRender = TRUE,
                scroller = TRUE,
                columnDefs = list(list(className = 'dt-center', targets = "_all"))
              )) %>%
  formatStyle("Kapacita?",
              target = 'row',
              backgroundColor = styleEqual(c("Naplněno", "Volno"), c('#ffcccb', '#d2e9af')))
As searching might be more convenient on a map, I decided to add one using leaflet. Similar to DataTables, leaflet allows for multiple interactive elements, including search based on a string or switching between map layers.
Again, as the output is in Czech, I suggest using your browser’s Translate to English feature to get more value out of it.
# prepare a palette - manual colors according to the capacity column
pal <- leaflet::colorFactor(palette = c("Naplněno" = "#8b0000",
                                        "Neuvedeno" = "#A9A9A9",
                                        "Volno" = "#006400"),
                            domain = vzp_data$`Kapacita?`)

points_fin <- SharedData$new(vzp_data)

map1 <- leaflet(data = points_fin, width = '100%', height = 800) %>%
  addProviderTiles("CartoDB.Positron", group = 'Základní') %>%
  addProviderTiles("Esri.WorldImagery", group = 'Letecká') %>%
  addProviderTiles("OpenStreetMap.Mapnik", group = 'Uliční') %>%
  addProviderTiles("OpenTopoMap", group = 'Zeměpisná') %>%
  addScaleBar('bottomright') %>%
  setView(15.4129318, 49.7559455, zoom = 8.2) %>%
  addCircleMarkers(
    group = 'Obor',
    stroke = FALSE,
    opacity = 0.9,
    fillOpacity = 0.9,
    fillColor = ~sapply(`Kapacita?`, switch, USE.NAMES = FALSE,
                        "Volno" = "#006400",
                        "Naplněno" = "#8b0000",
                        "Neuvedeno" = "#A9A9A9"),
    popup = ~paste0('<h2>Detail</h2> <br>',
                    '<b>Město</b>: ', Město, '<br>',
                    '<b>Ulice</b>: ', Ulice, '<br>',
                    '<b>Jméno a příjmení</b>: ', `Jméno a příjmení`, '<br>',
                    '<b>Online nebo telefonické konzultace</b>: ', `Online nebo telefonické konzultace?`, '<br>',
                    '<b>Telefon</b>: ', `Telefon`, '<br>',
                    '<b>Email</b>: ', Email, '<br>',
                    '<b>Web</b>: ', Web, '<br>',
                    '<b>Kapacita</b>: ', `Kapacita?`, '<br>'),
    clusterOptions = markerClusterOptions(showCoverageOnHover = FALSE,
      iconCreateFunction = JS("function (cluster) {
        var childCount = cluster.getChildCount();
        var c = ' marker-cluster-';
        if (childCount < 100) {
          c += 'small';
        } else if (childCount < 1000) {
          c += 'medium';
        } else {
          c += 'large';
        }
        return new L.DivIcon({ html: '<div><span>' + childCount + '</span></div>', className: 'marker-cluster' + c, iconSize: new L.Point(40, 40) });
      }"))) %>%
  addLegend(position = "topright",
            values = ~`Kapacita?`,
            opacity = .7,
            pal = pal,
            title = "Kapacita?") %>%
  leaflet.extras::addResetMapButton() %>%
  addLayersControl(
    baseGroups = c("Základní", "Letecká", "Uliční", "Zeměpisná"),
    options = layersControlOptions(collapsed = TRUE)) %>%
  addSearchOSM()

map1
I believe some will raise their eyebrows at integrating R and Python the way presented above. “Why not choose just one tool and stick with it?” you may ask. In my case, I needed to add a piece to my template, and although its parts had been written in R, I did not want to limit myself to one tool only.
At the same time, by probing the current limits, we can promote future ease of integration. Here, I have to acknowledge that RStudio, in my view one of the best tools for working with data, has taken several significant steps towards integrating Python. However, in particular use-cases, it is still a somewhat bumpy ride.
Ultimately, I wanted to experiment and test the limits. I believe that both Python and R have their pros and cons when working with data, so why not explore where you can go if you combine them?