The apply() family of functions is part of base R, so it is available in every R installation, including the r-essentials package you get when installing R with Anaconda. These functions accept many kinds of functions and apply them repeatedly over a collection of objects (data frame, list, vector, etc.). Their main purpose is to avoid explicit loop constructs: they take a list, matrix or array as input and apply a function to it. Any function can be passed into apply(). In this tutorial, you will learn:

apply() function in R with Example
lapply() function in R with Example
sapply() function in R with Example
Slice Vector
tapply() function in R with Example

The apply() function takes three arguments:

apply(X, MARGIN, FUN)

Here:
-X: an array or matrix
-MARGIN: 1, 2 or c(1,2), defining where to apply the function:
-MARGIN=1: the manipulation is performed on rows
-MARGIN=2: the manipulation is performed on columns
-MARGIN=c(1,2): the manipulation is performed on rows and columns
-FUN: the function to apply. Built-in functions like mean, median, sum, min, max and even user-defined functions can be applied.

m1 <- matrix(1:10, nrow = 5, ncol = 6)
m1
a_m1 <- apply(m1, 2, sum)
a_m1

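Output:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6    1    6    1    6
[2,]    2    7    2    7    2    7
[3,]    3    8    3    8    3    8
[4,]    4    9    4    9    4    9
[5,]    5   10    5   10    5   10

[1] 15 40 15 40 15 40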

Best practice: Store the values before printing them to the console.
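To see how MARGIN changes the result, the same matrix can be summarized by row instead of by column. The following is a minimal sketch; the names row_sums, col_sums and col_spread are illustrative and not part of the original example.

```r
# Same matrix as in the example above: 1:10 recycled into 5 rows and 6 columns
m1 <- matrix(1:10, nrow = 5, ncol = 6)

# MARGIN = 1: apply sum() to each row
row_sums <- apply(m1, 1, sum)

# MARGIN = 2: apply sum() to each column
col_sums <- apply(m1, 2, sum)

# A user-defined function works the same way:
# here, the spread (max - min) of each column
col_spread <- apply(m1, 2, function(x) max(x) - min(x))

row_sums
col_sums
col_spread
```

With MARGIN = 1 the result has one value per row (five values), and with MARGIN = 2 one value per column (six values).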

lapply(X, FUN)

Arguments:
-X: A vector or an object
-FUN: Function applied to each element of X

The l in lapply() stands for list. The difference between lapply() and apply() lies in the output they return: the output of lapply() is always a list. lapply() can also be used on other objects such as data frames and lists, and it does not need a MARGIN argument.

A very simple example is to convert the strings of a character vector to lower case with the tolower function. We construct a vector with the names of famous movies, written in upper case.

movies <- c("SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN")
movies_lower <- lapply(movies, tolower)
str(movies_lower)

Output:

List of 4
 $ : chr "spyderman"
 $ : chr "batman"
 $ : chr "vertigo"
 $ : chr "chinatown"

We can use unlist() to convert the list into a vector.

movies_lower <- unlist(lapply(movies, tolower))
str(movies_lower)

Output:

 chr [1:4] "spyderman" "batman" "vertigo" "chinatown"
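lapply() is not limited to character vectors. As a small sketch (the list scores and its contents are made up for illustration), it can apply a function to each element of a list and forward extra arguments to FUN:

```r
# Hypothetical data: a list of numeric vectors of different lengths
scores <- list(a = c(1, 2, 3), b = c(10, NA, 30), c = 5)

# Arguments given after FUN (here na.rm = TRUE) are passed on to mean()
means <- lapply(scores, mean, na.rm = TRUE)

# The result is a list with one element per input element
str(means)

# unlist() flattens it into a named numeric vector
unlist(means)
```

The names of the input list (a, b, c) are carried over to the result, which makes the output easier to read than positional indexes.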

sapply(X, FUN)

Arguments:
-X: A vector or an object
-FUN: Function applied to each element of X

sapply() does the same job as lapply() but tries to simplify the result to a vector or matrix.

We can measure the minimum speed and stopping distances of cars from the cars dataset.

dt <- cars
lmn_cars <- lapply(dt, min)
smn_cars <- sapply(dt, min)
lmn_cars

Output:

$speed
[1] 4

$dist
[1] 2

smn_cars

Output:

speed  dist
    4     2

lmxcars <- lapply(dt, max)
smxcars <- sapply(dt, max)
lmxcars

Output:

$speed
[1] 25

$dist
[1] 120

smxcars

Output:

speed  dist
   25   120

We can also pass a user-defined function to lapply() or sapply(). We create a function named avg to compute the average of the minimum and maximum of a vector.

avg <- function(x) {
  (min(x) + max(x)) / 2
}
fcars <- sapply(dt, avg)
fcars

Output:

speed  dist
 14.5  61.0

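sapply() is often more convenient than lapply() because it stores the values directly in a vector instead of a list. In the next example, we will see this is not always the case. The differences between apply(), lapply() and sapply() can be summarized as follows:

-apply(X, MARGIN, FUN): applies a function to the rows, columns or both of a matrix or array; returns a vector, array or list
-lapply(X, FUN): applies a function to every element of a list, vector or data frame; returns a list
-sapply(X, FUN): applies a function to every element of a list, vector or data frame; returns a vector or matrix when the results can be simplified, otherwise a list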
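The difference in return types can be checked directly. The sketch below uses the built-in cars dataset and also shows that sapply() returns a matrix when FUN returns more than one value per element; the variable names are illustrative.

```r
dt <- cars

# lapply() returns a list; sapply() simplifies to a named numeric vector
l_min <- lapply(dt, min)
s_min <- sapply(dt, min)
class(l_min)
class(s_min)

# range() returns two values (min and max) per column,
# so sapply() arranges the results as a matrix with one column per variable
s_range <- sapply(dt, range)
s_range
dim(s_range)
```

Each column of s_range corresponds to one column of cars, with the minimum in the first row and the maximum in the second.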

Slice vector

We can use lapply() or sapply() interchangeably to slice a data frame. We create a function, below_ave(), that takes a vector of numerical values and returns a vector containing only the values that are strictly above the average (despite its name). We compare both results with the identical() function.

below_ave <- function(x) {
  ave <- mean(x)
  return(x[x > ave])
}
dt_s <- sapply(dt, below_ave)
dt_l <- lapply(dt, below_ave)
identical(dt_s, dt_l)

Output:

[1] TRUE
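The result is TRUE because the two columns produce vectors of different lengths, so sapply() cannot simplify them into a vector or matrix and returns a list, just like lapply(). One way to confirm this, sketched here with the same function and the base helper lengths():

```r
below_ave <- function(x) {
  ave <- mean(x)
  x[x > ave]
}

dt_s <- sapply(cars, below_ave)
dt_l <- lapply(cars, below_ave)

# sapply() fell back to a list because the elements differ in length
class(dt_s)
lengths(dt_s)

# When every result has length 1, sapply() simplifies to a vector again
n_above <- sapply(cars, function(x) length(below_ave(x)))
n_above
```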

tapply(X, INDEX, FUN = NULL)

Arguments:
-X: An object, usually a vector
-INDEX: A list containing factors
-FUN: Function applied to each group of X defined by INDEX

Part of the job of a data scientist or researcher is to compute summaries of variables, for instance to measure the average or to group data by a characteristic. Most data are grouped by ID, city, country, and so on, and summarizing over groups often reveals more interesting patterns.

To understand how it works, let's use the iris dataset. This dataset is very famous in the world of machine learning. Its purpose is to predict the class of a flower among three species: Setosa, Versicolor, Virginica. The dataset records the length and width of the sepals and petals of each flower. As a first step, we can compute the median of the sepal width for each species. tapply() is a quick way to perform this computation.

data(iris)
tapply(iris$Sepal.Width, iris$Species, median)

Output:

    setosa versicolor  virginica
       3.4        2.8        3.0
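tapply() can be thought of as splitting X by INDEX and then applying FUN to each piece. The sketch below illustrates this idea with split() and sapply(); it is meant as an explanation of the result, not as a description of how tapply() is implemented internally. The names med_tapply, groups and med_split are illustrative.

```r
data(iris)

# Median sepal width per species with tapply()
med_tapply <- tapply(iris$Sepal.Width, iris$Species, median)

# The same idea in two explicit steps:
# 1. split the values into one vector per species
groups <- split(iris$Sepal.Width, iris$Species)
# 2. apply median() to each group
med_split <- sapply(groups, median)

med_tapply
med_split

# The values agree; tapply() returns a 1-d array, sapply() a named vector
all.equal(as.vector(med_tapply), as.vector(med_split))
```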