If data manipulation is carried out in R, why not creating the hyperlinks in R as well? Excel files use hyperlinks to navigate to external content, such as, urls or file paths to some other files. Excel uses HYPERLINK()
function for this purpose. Below I present
- how to create hyperlinks
- how to update an excel file with hyperlinks in R
Part 1: Create excel reports with hyperlinks
Problem
How to create hyperlinks to external files in an excel workbook using R?
Solution
Here I present a simple scenario, where the hyperlinks are created next to the filename column of a worksheet using the writeFormula()
in openxlsx
package. For details and other scenarios of creating hyperlinks, visit rdrr.io/cran/openxlsx/man/makeHyperlinkStri...
Example
I have generated a set of PDF files containing data on each country using the gapminder
dataset. In the following code snippet, I first create a master table of country name and its PDF file name.
# Create Master table
country_lst <- unique(gapminder$country)
filename_lst <- paste0(country_lst, ".pdf")
df_master <- data.frame(country_lst, stringsAsFactors = F)
df_master <- cbind(df_master, filename_lst)
names(df_master) <- c("Country", "File_Name")
head(df_master)
## Country File_Name
## 1 Afghanistan Afghanistan.pdf
## 2 Albania Albania.pdf
## 3 Algeria Algeria.pdf
## 4 Angola Angola.pdf
## 5 Argentina Argentina.pdf
## 6 Australia Australia.pdf
Now I create a workbook, write the master table and add hyperlinks using the writeFormular()
. This function takes the HYPERLINK([link location], [friendly name])
excel formula as a string in the x argument. Thus, I generate this string dynamically for each row.
# Create an excel workbook and write data
wb <- createWorkbook()
addWorksheet(wb, "Countries")
writeData(wb,sheet = "Countries", x = df_master)
# Add hyperlinks to filenames
for(i in 2:length(country_lst)) {
formula <- paste0('HYPERLINK(B',i, ', "Link to File")')
writeFormula(wb, sheet ="Countries", startRow = i, startCol = 3
, x = formula)
}
# Save the workbook
saveWorkbook(wb, "Gapminder_Countries.xlsx", overwrite = T)
Part 2: Update excel file with hyperlinks without touching the existing data
Problem
Forget about the above section, where I created hyperlinks. Now I have an excel file with hyperlinks to external files. I need to do some data manipulation and add a new column to this file. If I do the data manipulation in R and write the entire dataframe to a new file without configuring the hyperlinks as mentioned above, I will loose the hyperlinks. Hence, how can I write only the new columns to the existing file, such that the existing data are not touched?
Solution
I can do the data manipulation in R and write only the new columns to the existing file by specifying the range.
Example
Add the average change in life expectancy and population over 50 years (1957 - 2007) to the masterfile Gapminder_Countries.xlsx
that I created in Part 1 above.
library(dplyr)
dat <- gapminder %>% group_by(country) %>% summarise(avg_change_LE = round((max(lifeExp) - min(lifeExp))/50,1), avg_change_Pop = (max(pop) - min(pop))/50)
head(dat)
## # A tibble: 6 x 3
## country avg_change_LE avg_change_Pop
## <fct> <dbl> <dbl>
## 1 Afghanistan 0.3 469292.
## 2 Albania 0.4 46357.
## 3 Algeria 0.6 481074.
## 4 Angola 0.3 163768.
## 5 Argentina 0.3 448499.
## 6 Australia 0.2 234859.
Now I write only the avg_change_LE
and avg_change_Pop
columns to the existing Counties
worksheet of the workbook. First I create a new dataframe selecting only the new columns. Data is NOT joined/merged using a common field when writing to the worksheet. Therefore, data in our dataframe and the worksheet need to follow the same order without gaps. Also make sure to specify the correct start column and row where the new columns need to be dumped.
All the material of this example are at github.com/geethika01/data-manipulation-wit.. .