dplyr, an R package that is part of the tidyverse suite of packages, provides a great set of tools for manipulating datasets in tabular form. dplyr has a set of core functions for “data munging”, including select(), mutate(), filter(), summarise(), and arrange().
In this tidyverse tutorial, we will learn how to use dplyr’s arrange() function to sort a data frame in multiple ways. First, we will sort a dataframe by the values of a single variable, and then we will sort a dataframe by more than one variable. By default, dplyr’s arrange() sorts in ascending order; we will also learn how to sort in descending order.
Let us get started by loading the tidyverse, a suite of R packages from RStudio.
library("tidyverse")
We will use the fantastic Palmer penguins dataset to illustrate how to sort a dataframe. Let us load the data from cmdlinetips.com’s GitHub page.
path2data <- "https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv"
penguins <- readr::read_csv(path2data)
## Parsed with column specification:
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character()
## )
head(penguins)
## # A tibble: 6 x 7
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
##   <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl> <chr>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA>
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male
How To Sort a Dataframe by a Single Variable with dplyr’s arrange()?
We can use dplyr’s arrange() function to sort a dataframe by one or more variables. Let us say we want to sort the penguins dataframe by body mass to quickly find the lightest penguins and see how they relate to the other variables.
We will use the pipe operator “%>%” to feed the data to dplyr’s arrange() function, and pass it the name of the variable we want to sort by. In this example, we sort by the variable “body_mass_g”.
penguins %>% arrange(body_mass_g)
dplyr’s arrange() sorts the rows of the dataframe by the given variable and returns a new dataframe (a tibble); the original penguins dataframe itself is not modified. Notice that the rows of the result are in a different order from the original: the body_mass_g column now runs from the smallest to the largest value.
## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Chinst… Dream            46.9          16.6              192        2700
##  2 Adelie  Biscoe           36.5          16.6              181        2850
##  3 Adelie  Biscoe           36.4          17.1              184        2850
##  4 Adelie  Biscoe           34.5          18.1              187        2900
##  5 Adelie  Dream            33.1          16.1              178        2900
##  6 Adelie  Torge…           38.6          17                188        2900
##  7 Chinst… Dream            43.2          16.6              187        2900
##  8 Adelie  Biscoe           37.9          18.6              193        2925
##  9 Adelie  Dream            37.5          18.9              179        2975
## 10 Adelie  Dream            37            16.9              185        3000
## # … with 334 more rows, and 1 more variable: sex <chr>
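To try arrange() without downloading the CSV, here is a minimal sketch on a tiny data frame. The name toy and its seven rows are hypothetical: six rows are copied from the printed penguins output in this post (shuffled on purpose), and one row has missing values, like row 4 of head(penguins). The sketch checks that the piped call and the direct call give the same result, and that the original data frame keeps its row order.

```r
library(dplyr)

# Hypothetical toy data: six rows copied from the printed penguins output
# (deliberately shuffled) plus one row with missing values.
toy <- data.frame(
  species = c("Gentoo", "Adelie", "Adelie", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  flipper_length_mm = c(221, 184, NA, 192, 188, 181, 230),
  body_mass_g = c(6300, 2850, NA, 2700, 2900, 2850, 6050)
)

# The piped call and the direct call are equivalent.
sorted_pipe <- toy %>% arrange(body_mass_g)
sorted_direct <- arrange(toy, body_mass_g)

# Ascending body mass; the row with a missing body mass ends up last.
print(sorted_pipe$body_mass_g)

# arrange() returns a new data frame, so `toy` keeps its original row order.
print(toy$body_mass_g)
```

Because arrange() does not sort in place, assign the result (for example, penguins_sorted <- penguins %>% arrange(body_mass_g)) if you want to keep the sorted version.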
How To Sort or Reorder Rows in Descending Order with dplyr’s arrange()?
By default, dplyr’s arrange() sorts in ascending order. To sort by a variable in descending order, we wrap that variable in the desc() function. For example, to sort the dataframe by body_mass_g in descending order, we use:
penguins %>% arrange(desc(body_mass_g))
## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Gentoo  Biscoe           49.2          15.2              221        6300
##  2 Gentoo  Biscoe           59.6          17                230        6050
##  3 Gentoo  Biscoe           51.1          16.3              220        6000
##  4 Gentoo  Biscoe           48.8          16.2              222        6000
##  5 Gentoo  Biscoe           45.2          16.4              223        5950
##  6 Gentoo  Biscoe           49.8          15.9              229        5950
##  7 Gentoo  Biscoe           48.4          14.6              213        5850
##  8 Gentoo  Biscoe           49.3          15.7              217        5850
##  9 Gentoo  Biscoe           55.1          16                230        5850
## 10 Gentoo  Biscoe           49.5          16.2              229        5800
## # … with 334 more rows, and 1 more variable: sex <chr>
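A quick sketch on the same hypothetical toy data frame (rows copied from the output above) shows desc() reversing the order. One detail worth knowing, per the dplyr documentation: for local data frames, missing values are sorted to the end even when the variable is wrapped in desc(). This matters here, since some penguins have NA body mass.

```r
library(dplyr)

# Same hypothetical toy data: rows copied from the printed penguins output,
# shuffled, plus one row with missing values.
toy <- data.frame(
  species = c("Gentoo", "Adelie", "Adelie", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  flipper_length_mm = c(221, 184, NA, 192, 188, 181, 230),
  body_mass_g = c(6300, 2850, NA, 2700, 2900, 2850, 6050)
)

# desc() flips the sort direction, so the heaviest rows come first.
heaviest_first <- toy %>% arrange(desc(body_mass_g))

# The NA body mass is still placed last, not first.
print(heaviest_first$body_mass_g)
```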
How To Sort a Dataframe by Two Variables?
With dplyr’s arrange() function, we can sort by more than one variable. To sort by two variables, we pass the names of both variables as arguments to arrange(), as shown below. Note that the order of the variables matters.
penguins %>% arrange(body_mass_g, flipper_length_mm)
In this example, body_mass_g comes first and flipper_length_mm second. arrange() sorts the rows by body_mass_g, and then, within each group of rows that share the same body_mass_g value, it sorts by flipper_length_mm. In other words, the second variable is only used to break ties in the first.
For example, rows 2 and 3 share the same body_mass_g (2850), as do rows 4 to 7 (2900), and within each of these groups flipper_length_mm is sorted in ascending order.
## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Chinst… Dream            46.9          16.6              192        2700
##  2 Adelie  Biscoe           36.5          16.6              181        2850
##  3 Adelie  Biscoe           36.4          17.1              184        2850
##  4 Adelie  Dream            33.1          16.1              178        2900
##  5 Adelie  Biscoe           34.5          18.1              187        2900
##  6 Chinst… Dream            43.2          16.6              187        2900
##  7 Adelie  Torge…           38.6          17                188        2900
##  8 Adelie  Biscoe           37.9          18.6              193        2925
##  9 Adelie  Dream            37.5          18.9              179        2975
## 10 Adelie  Dream            37            16.9              185        3000
## # … with 334 more rows, and 1 more variable: sex <chr>
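The tie-breaking rule can be checked on a small hypothetical subset. In the sketch below, the two 2850 g rows (copied from the output above) are deliberately stored with the larger flipper length first, so only the second sort key puts them in ascending flipper order. For comparison, base R’s order(), which is not part of dplyr, applies the same rule when given several vectors.

```r
library(dplyr)

# Hypothetical toy data: rows copied from the printed penguins output,
# shuffled, plus one row with missing values.
toy <- data.frame(
  species = c("Gentoo", "Adelie", "Adelie", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  flipper_length_mm = c(221, 184, NA, 192, 188, 181, 230),
  body_mass_g = c(6300, 2850, NA, 2700, 2900, 2850, 6050)
)

# Sort by body mass; rows that tie on body mass are ordered by flipper length.
by_mass_then_flipper <- toy %>% arrange(body_mass_g, flipper_length_mm)
print(by_mass_then_flipper)

# Base R comparison: order() breaks ties in its first argument with the second.
base_sorted <- toy[order(toy$body_mass_g, toy$flipper_length_mm), ]
print(base_sorted$flipper_length_mm)
```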
Notice how the result changes when we swap the order of the two variables. In the example below, flipper_length_mm comes first and body_mass_g second.
penguins %>% arrange(flipper_length_mm, body_mass_g)
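Now the dataframe is sorted first by flipper_length_mm and then by body_mass_g. For example, rows 4 to 7 all have a flipper length of 178 mm, and among them body_mass_g increases from 2900 to 3900.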
## # A tibble: 344 x 7
##    species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>            <dbl>       <dbl>
##  1 Adelie  Biscoe           37.9          18.6              172        3150
##  2 Adelie  Biscoe           37.8          18.3              174        3400
##  3 Adelie  Torge…           40.2          17                176        3450
##  4 Adelie  Dream            33.1          16.1              178        2900
##  5 Adelie  Dream            39.5          16.7              178        3250
##  6 Chinst… Dream            46.1          18.2              178        3250
##  7 Adelie  Dream            37.2          18.1              178        3900
##  8 Adelie  Dream            37.5          18.9              179        2975
##  9 Adelie  Dream            42.2          18.5              180        3550
## 10 Adelie  Biscoe           37.7          18.7              180        3600
## # … with 334 more rows, and 1 more variable: sex <chr>
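Swapping the sort keys can be sketched on the same hypothetical toy data frame. The sketch also combines the two ideas from this post by wrapping only the second key in desc(); this mixed-direction call is an illustrative extension, not one of the examples above.

```r
library(dplyr)

# Hypothetical toy data: rows copied from the printed penguins output,
# shuffled, plus one row with missing values.
toy <- data.frame(
  species = c("Gentoo", "Adelie", "Adelie", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  flipper_length_mm = c(221, 184, NA, 192, 188, 181, 230),
  body_mass_g = c(6300, 2850, NA, 2700, 2900, 2850, 6050)
)

# Same two variables, opposite priority.
by_mass_then_flipper <- toy %>% arrange(body_mass_g, flipper_length_mm)
by_flipper_then_mass <- toy %>% arrange(flipper_length_mm, body_mass_g)

# The first rows differ because a different column drives the sort.
print(head(by_mass_then_flipper, 3))
print(head(by_flipper_then_mass, 3))

# Mixed directions: ascending body mass, and within ties, longest flipper first.
mixed <- toy %>% arrange(body_mass_g, desc(flipper_length_mm))
print(mixed)
```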
The post dplyr arrange(): Sort/Reorder by One or More Variables appeared first on Python and R Tips.