In this tidyverse tutorial, part of the tidyverse 101 series, we will learn how to use dplyr’s mutate() function. With mutate(), one can create a new variable (column) in a data frame. Here we will use mutate() first to create a single variable and then to create multiple variables at the same time.
library("tidyverse")
We will use the fantastic Penguins dataset to illustrate how mutate() works. Let us load the data from cmdlinetips.com’s GitHub page.
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
We can see that our data frame contains multiple variables that are measured in millimeters (mm) and one variable measured in grams (g).
## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )
How To Create A New Variable with mutate() in dplyr?
Let us create a single new column using dplyr’s mutate(). We will use an existing column to create the new column or variable.
Our new variable is body mass in kg, and we will compute it from the existing variable body_mass_g. To create the new variable, we start with the data frame, pass it along with the pipe operator, and call the mutate() function. Inside mutate(), we specify the name of the new variable and the expression that computes it. In this example, we create the new variable body_mass_kg by dividing the existing variable body_mass_g by 1000.
penguins %>% mutate(body_mass_kg = body_mass_g/1000)
We get a data frame with the new column as the result. The new variable that we created is added as the last column.
## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <chr>, body_mass_kg <dbl>
Note that creating a new column with mutate() does not change the original dataframe. We get a new dataframe as a tibble.
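Because mutate() returns a new data frame instead of modifying the original, the result has to be assigned to a name if we want to keep it. Here is a minimal sketch on a small stand-in tibble. The names penguins_small and penguins_kg are hypothetical and used only for illustration; the values are copied from the first three rows of the output above.

```r
library(dplyr)

# Small stand-in for the penguins data (hypothetical name, values from the first three rows).
penguins_small <- tibble::tibble(
  species = c("Adelie", "Adelie", "Adelie"),
  flipper_length_mm = c(181, 186, 195),
  body_mass_g = c(3750, 3800, 3250)
)

# Without assignment the result is only printed; penguins_small is left untouched.
penguins_small %>% mutate(body_mass_kg = body_mass_g / 1000)
names(penguins_small)

# Assign the result to a name to keep the new column.
penguins_kg <- penguins_small %>% mutate(body_mass_kg = body_mass_g / 1000)
names(penguins_kg)
```

Assigning back to the same name (penguins_small <- penguins_small %>% mutate(...)) would work as well, at the cost of losing the original version.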
How To Create Two Variables with mutate()?
We can create two or more new variables using a single mutate() call. For example, to create two new columns, we use the mutate() function with the new variables separated by a comma.
In this example below we create two new variables using existing variables.
penguins %>% mutate(body_mass_kg= body_mass_g/1000, flipper_length_m = flipper_length_mm/1000)
## # A tibble: 344 x 9
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 3 more variables: sex <chr>, body_mass_kg <dbl>,
## #   flipper_length_m <dbl>
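Within a single mutate() call, the expressions are evaluated in order, so a later expression can use a column created earlier in the same call. The sketch below shows this on a small stand-in tibble. The names penguins_small, two_units and flipper_length_cm are hypothetical and exist only for this example.

```r
library(dplyr)

# Small stand-in for the penguins data (hypothetical name).
penguins_small <- tibble::tibble(
  flipper_length_mm = c(181, 186, 195),
  body_mass_g = c(3750, 3800, 3250)
)

two_units <- penguins_small %>%
  mutate(
    flipper_length_m = flipper_length_mm / 1000,
    # flipper_length_m was created just above and is already available here
    flipper_length_cm = flipper_length_m * 100
  )
two_units
```

If the order of the two expressions were swapped, the call would fail, because flipper_length_m would not exist yet when flipper_length_cm is computed.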
How To Create a Fresh New Column with dplyr’s mutate
In the above examples, we created one or more new columns from existing columns. We can also use the mutate() function to create a column without using any existing column.
In this example, we use dplyr’s mutate() function to create a new column using the row number.
penguins %>% mutate(ID=row_number())
This creates an ID column at the end of the data frame.
## # A tibble: 344 x 8
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Torge…           39.1          18.7              181        3750
##  2 Adelie  Torge…           39.5          17.4              186        3800
##  3 Adelie  Torge…           40.3          18                195        3250
##  4 Adelie  Torge…           NA            NA                 NA          NA
##  5 Adelie  Torge…           36.7          19.3              193        3450
##  6 Adelie  Torge…           39.3          20.6              190        3650
##  7 Adelie  Torge…           38.9          17.8              181        3625
##  8 Adelie  Torge…           39.2          19.6              195        4675
##  9 Adelie  Torge…           34.1          18.1              193        3475
## 10 Adelie  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 2 more variables: sex <chr>, ID <int>
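An ID column is often easier to read at the front of the data frame than at the end. As a sketch, assuming dplyr 1.0.0 or later is installed, mutate() accepts a .before argument that controls where the new column is placed. The names penguins_small and with_id are hypothetical.

```r
library(dplyr)

# Small stand-in for the penguins data (hypothetical name).
penguins_small <- tibble::tibble(
  species = c("Adelie", "Adelie", "Adelie"),
  body_mass_g = c(3750, 3800, 3250)
)

# .before = 1 places the new column ahead of the current first column
# (assumes dplyr >= 1.0.0).
with_id <- penguins_small %>% mutate(ID = row_number(), .before = 1)
with_id
```

Without .before, the ID column would be appended as the last column, exactly as in the example above.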
How To Overwrite an Existing Column with dplyr’s mutate
We can also use dplyr’s mutate() function to overwrite an existing column. In the example below, we use mutate() function to overwrite the existing “species” variable.
penguins %>% mutate(species= stringr::str_to_upper(species))
We use the str_to_upper() function from the stringr package to convert the character variable to uppercase. Note that the values of the first column, species, are all in upper case now.
## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 ADELIE  Torge…           39.1          18.7              181        3750
##  2 ADELIE  Torge…           39.5          17.4              186        3800
##  3 ADELIE  Torge…           40.3          18                195        3250
##  4 ADELIE  Torge…           NA            NA                 NA          NA
##  5 ADELIE  Torge…           36.7          19.3              193        3450
##  6 ADELIE  Torge…           39.3          20.6              190        3650
##  7 ADELIE  Torge…           38.9          17.8              181        3625
##  8 ADELIE  Torge…           39.2          19.6              195        4675
##  9 ADELIE  Torge…           34.1          18.1              193        3475
## 10 ADELIE  Torge…           42            20.2              190        4250
## # … with 334 more rows, and 1 more variable: sex <chr>
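When mutate() is given the name of an existing column, the column keeps its position and the number of columns stays the same; only the values change. The sketch below checks this on a small stand-in tibble. The names penguins_small and upper are hypothetical.

```r
library(dplyr)

# Small stand-in for the penguins data (hypothetical name).
penguins_small <- tibble::tibble(
  species = c("Adelie", "Adelie", "Adelie"),
  body_mass_g = c(3750, 3800, 3250)
)

# Reusing the name species overwrites that column in the result.
upper <- penguins_small %>% mutate(species = stringr::str_to_upper(species))

# Same column names in the same order; only the values of species differ.
identical(names(upper), names(penguins_small))
upper$species

# The original tibble still holds the mixed-case values.
penguins_small$species
```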
The post dplyr mutate(): Create New Variables with mutate appeared first on Python and R Tips.