Control Flow - Looping
Looping in R
Looping is a fundamental concept in programming where a set of instructions is executed repeatedly, either while a condition holds or for a fixed number of times. R provides several mechanisms for looping.
1. for Loop
The for loop is used to iterate over a sequence (like a vector or list) and execute a block of code for each element in the sequence.
The basic syntax of a for loop in R is as follows:
for (variable in sequence) {
  # Code to be executed for each element
}
• variable: A variable that takes the value from the sequence in each iteration.
• sequence: A vector or list over which the loop iterates.
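One pitfall with the sequence argument is worth a quick illustration. The sketch below (the vector name empty_vec is purely illustrative) contrasts 1:length(x) with seq_along(x) on an empty vector: the former evaluates to c(1, 0) and runs the body twice, while the latter is empty and skips the loop.

```r
# Hypothetical empty input, e.g. the result of a filter that matched nothing
empty_vec <- numeric(0)

# 1:length(x) evaluates to c(1, 0) here, so the body runs twice
count_colon <- 0
for (i in 1:length(empty_vec)) {
  count_colon <- count_colon + 1
}

# seq_along(x) evaluates to integer(0), so the body never runs
count_seq <- 0
for (i in seq_along(empty_vec)) {
  count_seq <- count_seq + 1
}

print(count_colon)
print(count_seq)
```

When the input may be empty, seq_along() (or seq_len() for a count) is usually the safer way to build the sequence.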
Example 1: Printing Numbers
In this example, a for loop is used to print numbers from 1 to 5.
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Example 2: Calculating the Factorial of a Number
This example demonstrates the use of a for loop to calculate the factorial of a number.
number <- 5
factorial <- 1
for (i in 1:number) {
  factorial <- factorial * i
}
print(paste("The factorial of", number, "is", factorial))
[1] "The factorial of 5 is 120"
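The loop above works for positive integers, but with number set to 0 the sequence 1:number would be c(1, 0) and the result would be 0 rather than 1. The following sketch shows one way to handle that case with seq_len(); the helper name loop_factorial is an assumption made for illustration, and the result is compared against base R's factorial().

```r
# Hypothetical helper wrapping the loop; not part of the original example
loop_factorial <- function(n) {
  result <- 1
  # seq_len(0) is empty, so the loop body is skipped when n is 0
  for (i in seq_len(n)) {
    result <- result * i
  }
  result
}

print(loop_factorial(5))
print(loop_factorial(0))
# Compare against the built-in implementation
print(isTRUE(all.equal(loop_factorial(5), base::factorial(5))))
```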
Example 3: Summing the Elements of a Vector
Here, a for loop is used to calculate the sum of the elements of a vector.
numbers <- c(2, 4, 6, 8, 10)
sum_numbers <- 0
for (num in numbers) {
  sum_numbers <- sum_numbers + num
}
print(paste("The sum of the numbers is", sum_numbers))
[1] "The sum of the numbers is 30"
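For comparison, the same total can be obtained with the built-in sum() function. The short sketch below runs both versions side by side; the names sum_loop and sum_builtin are illustrative only.

```r
numbers <- c(2, 4, 6, 8, 10)

# Loop version, mirroring the example above
sum_loop <- 0
for (num in numbers) {
  sum_loop <- sum_loop + num
}

# Built-in vectorised equivalent
sum_builtin <- sum(numbers)

print(sum_loop == sum_builtin)
```

The loop is useful for learning how accumulation works; in everyday code the built-in function is shorter.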
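Example 4: Converting variables to factors
In this example, a for loop converts several mtcars columns to factors. Note that this modifies the mtcars copy in the workspace, so these columns are no longer numeric in the examples that follow.
# List of variables to convert to factor
variables_to_convert <- c("cyl", "am", "vs", "gear", "carb")
# Using a for loop to convert each variable
for (var in variables_to_convert) {
  mtcars[[var]] <- as.factor(mtcars[[var]])
}
# Checking the structure of the modified dataset
str(mtcars)
'data.frame': 32 obs. of 11 variables: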
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
$ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
$ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
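The same conversion can also be written without an explicit loop. The sketch below is one possible alternative, not the approach used above; it works on a fresh copy taken from datasets::mtcars (the name mtcars_copy is an assumption made for illustration) so that the workspace copy is left alone.

```r
# Work on a fresh copy so the built-in dataset is left untouched
mtcars_copy <- datasets::mtcars
variables_to_convert <- c("cyl", "am", "vs", "gear", "carb")

# lapply() applies as.factor to each selected column; assigning the
# resulting list back replaces those columns in one step
mtcars_copy[variables_to_convert] <- lapply(mtcars_copy[variables_to_convert], as.factor)

print(sapply(mtcars_copy[variables_to_convert], is.factor))
print(nlevels(mtcars_copy$cyl))
```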
Example 5: Calculating mean for all numerical variables
# Create an empty vector to store the means
means <- numeric()
# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
  }
}
# Print the means
print(means)
       mpg       disp         hp       drat         wt       qsec
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750
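As a point of comparison, the pattern "keep the numeric columns, then apply a function to each" can be expressed with Filter() and sapply(). This is only a sketch of an alternative; it uses the iris dataset so that one non-numeric column (Species) is actually filtered out, and the names numeric_cols and iris_means are illustrative.

```r
# Keep only the numeric columns, then apply mean() to each one
numeric_cols <- Filter(is.numeric, iris)
iris_means <- sapply(numeric_cols, mean, na.rm = TRUE)

print(round(iris_means, 2))
```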
Example 6: Calculating mean and standard deviation for all numerical variables
# Create empty vectors to store the means and standard deviations
means <- numeric()
sds <- numeric()
# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
    # Compute the standard deviation and store it in the sds vector
    sds[col_name] <- sd(mtcars[[col_name]], na.rm = TRUE)
  }
}
# Print the results
cat("Means:\n")
Means:
print(means)
       mpg       disp         hp       drat         wt       qsec
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750
cat("\nStandard Deviations:\n")
Standard Deviations:
print(sds)
        mpg        disp          hp        drat          wt        qsec
  6.0269481 123.9386938  68.5628685   0.5346787   0.9784574   1.7869432
2. Nested for loop
Why Use Nested Loops?
Imagine you’re handling multiple datasets, and within each dataset, you have several variables. If you need to compute specific statistics for each variable across all datasets, a nested loop becomes an efficient solution. The outer loop can iterate over datasets, while the inner loop handles each variable within the current dataset.
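Before moving to datasets, the outer/inner structure described above can be seen in a very small sketch. The example below fills a multiplication table; the matrix size and the names n_rows, n_cols and table_mat are arbitrary choices made for illustration.

```r
# Illustrative 3 x 4 multiplication table (sizes chosen arbitrarily)
n_rows <- 3
n_cols <- 4
table_mat <- matrix(0, nrow = n_rows, ncol = n_cols)

# The outer loop walks over rows; for each row, the inner loop
# walks over every column before the outer loop advances
for (r in seq_len(n_rows)) {
  for (k in seq_len(n_cols)) {
    table_mat[r, k] <- r * k
  }
}

print(table_mat)
```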
Example 7: Finding mean, sd and median for mtcars and iris dataset
# List of datasets to process
datasets_list <- list(mtcars = mtcars, iris = iris)
# Statistical measures to compute
measures <- c("mean", "sd", "median")
# Loop through each dataset
for (dataset_name in names(datasets_list)) {
  cat(paste("\nStatistics for dataset:", dataset_name, "\n"))
  cat("--------------------------------------------------\n")
  # Loop through each column in the dataset
  for (col_name in names(datasets_list[[dataset_name]])) {
    # Extract the current column once to avoid repeating the lookup
    column <- datasets_list[[dataset_name]][[col_name]]
    # Check if the column is numeric
    if (is.numeric(column)) {
      cat(paste("\nColumn:", col_name, "\n"))
      cat("--------------------------------------------------\n")
      # Loop through each statistical measure
      for (measure in measures) {
        if (measure == "mean") {
          value <- mean(column, na.rm = TRUE)
        } else if (measure == "sd") {
          value <- sd(column, na.rm = TRUE)
        } else if (measure == "median") {
          value <- median(column, na.rm = TRUE)
        }
        cat(paste(measure, ":", round(value, 2), "\n"))
      }
    }
  }
}
Statistics for dataset: mtcars
--------------------------------------------------
Column: mpg
--------------------------------------------------
mean : 20.09
sd : 6.03
median : 19.2
Column: disp
--------------------------------------------------
mean : 230.72
sd : 123.94
median : 196.3
Column: hp
--------------------------------------------------
mean : 146.69
sd : 68.56
median : 123
Column: drat
--------------------------------------------------
mean : 3.6
sd : 0.53
median : 3.7
Column: wt
--------------------------------------------------
mean : 3.22
sd : 0.98
median : 3.33
Column: qsec
--------------------------------------------------
mean : 17.85
sd : 1.79
median : 17.71
Statistics for dataset: iris
--------------------------------------------------
Column: Sepal.Length
--------------------------------------------------
mean : 5.84
sd : 0.83
median : 5.8
Column: Sepal.Width
--------------------------------------------------
mean : 3.06
sd : 0.44
median : 3
Column: Petal.Length
--------------------------------------------------
mean : 3.76
sd : 1.77
median : 4.35
Column: Petal.Width
--------------------------------------------------
mean : 1.2
sd : 0.76
median : 1.3
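The if / else if chain in the innermost loop can be avoided by storing the functions themselves in a named list. The sketch below is one possible variation rather than the method used above; it processes only iris, and the names measure_funs, stats_vec and iris_stats are assumptions introduced for illustration.

```r
# Named list mapping each label to the function that computes it
measure_funs <- list(mean = mean, sd = sd, median = median)

# Collect the results per numeric column of iris
iris_stats <- list()
for (col_name in names(iris)) {
  column <- iris[[col_name]]
  if (is.numeric(column)) {
    stats_vec <- numeric()
    # Inner loop: look up each function by name and call it
    for (measure_name in names(measure_funs)) {
      stats_vec[measure_name] <- measure_funs[[measure_name]](column, na.rm = TRUE)
    }
    iris_stats[[col_name]] <- round(stats_vec, 2)
  }
}

print(iris_stats$Sepal.Length)
```

Adding another measure then only requires a new entry in the list, not another branch.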
3. while loop
The while loop in R is used to execute a block of code repeatedly as long as a specified condition is TRUE. It is particularly useful when the number of iterations is not known beforehand.
The basic syntax of a while loop in R is as follows:
while (condition) {
  # code to be executed
}
• condition: A logical expression that is evaluated before each execution of the loop’s body. The loop runs as long as the condition is TRUE.
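To make the "number of iterations is not known beforehand" case concrete, here is a minimal sketch in which the loop ends when a value falls below a threshold. The starting value of 100 and the threshold of 1 are arbitrary choices for illustration.

```r
# Illustrative starting value (chosen arbitrarily)
value <- 100
steps <- 0

# Keep halving until the value drops below 1; how many iterations
# this takes depends on the starting value, not on a fixed count
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

print(steps)
print(value)
```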
Example 8: Printing Numbers
In this example, a while loop is used to print numbers from 1 to 5.
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Example 9: Calculating mean of the variables
data <- mtcars
i <- 1
column_names <- names(data)
while (i <= length(column_names)) {
  # Check if the column is numeric
  if (is.numeric(data[[i]])) {
    mean_val <- mean(data[[i]], na.rm = TRUE)
    cat("Mean of", column_names[i], "is:", round(mean_val, 2), "\n")
  }
  i <- i + 1
}
Mean of mpg is: 20.09
Mean of disp is: 230.72
Mean of hp is: 146.69
Mean of drat is: 3.6
Mean of wt is: 3.22
Mean of qsec is: 17.85
Example 10: Calculating mean and standard deviation in mtcars dataset
# Copy the mtcars dataset
data <- mtcars
col_num <- 1
# Vectors to store the computed means and standard deviations
means <- numeric()
std_devs <- numeric()
while (col_num <= ncol(data)) {
  # Check if the column is numeric (cyl, vs, am, gear and carb were converted to factors in Example 4)
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    # Calculate the standard deviation
    sd_val <- sd(data[, col_num], na.rm = TRUE)
    # Store the computed values
    means <- c(means, mean_val)
    std_devs <- c(std_devs, sd_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
    std_devs <- c(std_devs, NA)
  }
  col_num <- col_num + 1
}
# Print the means and standard deviations for each variable
cat("Means for each variable:\n")
Means for each variable:
names(means) <- names(data)
print(means)
       mpg        cyl       disp         hp       drat         wt       qsec
 20.090625         NA 230.721875 146.687500   3.596563   3.217250  17.848750
        vs         am       gear       carb
        NA         NA         NA         NA
cat("\nStandard Deviations for each variable:\n")
Standard Deviations for each variable:
names(std_devs) <- names(data)
print(std_devs)
        mpg         cyl        disp          hp        drat          wt
  6.0269481          NA 123.9386938  68.5628685   0.5346787   0.9784574
       qsec          vs          am        gear        carb
  1.7869432          NA          NA          NA          NA
Example 11: Detecting missing values
df <- data.frame(A = c(1, 2, NA, 4, 5),
                 B = c(NA, 2, 3, 4, NA))
row_num <- 1
col_num <- 1
while (row_num <= nrow(df)) {
  col_num <- 1
  while (col_num <= ncol(df)) {
    if (is.na(df[row_num, col_num])) {
      cat("Missing value detected at row", row_num, "and column", col_num, "\n")
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}
Missing value detected at row 1 and column 2
Missing value detected at row 3 and column 1
Missing value detected at row 5 and column 2
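The nested while loops above make the scanning order explicit. For reference, the same positions can be obtained in one call with which() and arr.ind = TRUE; the sketch below is an alternative, and note that it lists positions column by column rather than row by row. The name missing_positions is illustrative.

```r
df <- data.frame(A = c(1, 2, NA, 4, 5),
                 B = c(NA, 2, 3, 4, NA))

# is.na() on a data frame returns a logical matrix; which() with
# arr.ind = TRUE reports the row and column index of every TRUE cell
missing_positions <- which(is.na(df), arr.ind = TRUE)

print(missing_positions)
```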
Example 12: Detecting missing values from airquality dataset
# Load the airquality dataset
library(datasets)
data <- airquality
row_num <- 1
missing_data_positions <- list()
while (row_num <= nrow(data)) {
  col_num <- 1
  while (col_num <= ncol(data)) {
    if (is.na(data[row_num, col_num])) {
      missing_data_positions <- append(missing_data_positions, list(c(row_num, col_num)))
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}
# Print the positions of missing values
if (length(missing_data_positions) > 0) {
  cat("Missing values detected at the following positions (row, column):\n")
  for (position in missing_data_positions) {
    cat("Row", position[1], "Column", position[2], "\n")
  }
} else {
  cat("No missing values detected.\n")
}
Missing values detected at the following positions (row, column):
Row 5 Column 1
Row 5 Column 2
Row 6 Column 2
Row 10 Column 1
Row 11 Column 2
Row 25 Column 1
Row 26 Column 1
Row 27 Column 1
Row 27 Column 2
Row 32 Column 1
Row 33 Column 1
Row 34 Column 1
Row 35 Column 1
Row 36 Column 1
Row 37 Column 1
Row 39 Column 1
Row 42 Column 1
Row 43 Column 1
Row 45 Column 1
Row 46 Column 1
Row 52 Column 1
Row 53 Column 1
Row 54 Column 1
Row 55 Column 1
Row 56 Column 1
Row 57 Column 1
Row 58 Column 1
Row 59 Column 1
Row 60 Column 1
Row 61 Column 1
Row 65 Column 1
Row 72 Column 1
Row 75 Column 1
Row 83 Column 1
Row 84 Column 1
Row 96 Column 2
Row 97 Column 2
Row 98 Column 2
Row 102 Column 1
Row 103 Column 1
Row 107 Column 1
Row 115 Column 1
Row 119 Column 1
Row 150 Column 1
Example 13: Detecting missing values from airquality dataset by total
# Load the airquality dataset
library(datasets)
data <- airquality
col_num <- 1
missing_counts <- numeric()
while (col_num <= ncol(data)) {
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0
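The per-column counts computed by the loop can be cross-checked with colSums(), which sums the logical matrix returned by is.na() column by column. This is a brief sketch of that check; the name missing_check is illustrative.

```r
# is.na() yields a logical matrix; colSums() counts the TRUE values per column
missing_check <- colSums(is.na(datasets::airquality))

print(missing_check)
```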
4. repeat loop
The repeat loop in R is used to execute a block of code indefinitely until a break statement is encountered. It is useful in situations where the number of iterations is not known beforehand and the loop should continue until a specific condition is met.
The basic syntax of a repeat loop in R is as follows:
repeat {
  # code to be executed
  if (condition) {
    break
  }
}
• condition: A logical expression. If TRUE, the break statement is executed, and the loop is terminated.
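Because a repeat loop has no built-in stopping rule, it runs forever if the break condition never becomes TRUE. A common safeguard, sketched below, is a second exit based on an iteration cap. The cap of 1000, the threshold of 0.99, the seed, and all variable names are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed so the run is reproducible

max_iterations <- 1000  # hypothetical safety cap
iterations <- 0
found <- FALSE

repeat {
  iterations <- iterations + 1
  draw <- runif(1)
  # Primary exit: the condition of interest has been met
  if (draw > 0.99) {
    found <- TRUE
    break
  }
  # Second exit: stop even if the target condition never occurs
  if (iterations >= max_iterations) {
    break
  }
}

print(iterations <= max_iterations)
```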
Example 14: Generating Random Numbers
In this example, a repeat loop is used to generate random numbers until a number greater than 0.9 is generated.
set.seed(123456)
repeat {
  number <- runif(1) # Generate a random number between 0 and 1
  print(number)
  if (number > 0.9) {
    break
  }
}
[1] 0.7977843
[1] 0.7535651
[1] 0.3912557
[1] 0.3415567
[1] 0.3612941
[1] 0.1983447
[1] 0.534858
[1] 0.09652624
[1] 0.9878469
Example 15: Calculating mean for each numeric variable in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    # Store the mean value in the means vector
    means <- c(means, mean_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
  }
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
     Ozone    Solar.R       Wind       Temp      Month        Day
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
Example 16: Calculating mean for each numeric variable in mtcars dataset
# Select the numeric columns of mtcars
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # All selected columns are numeric, so the mean can be calculated directly
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
       mpg       disp         hp       drat         wt       qsec
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750
Example 17: Calculating mean and standard deviation for each numeric variable in mtcars dataset
# Select the numeric columns of mtcars
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
result <- data.frame(Variable = character(), Mean = numeric(), StdDev = numeric())
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  # Store the results
  result <- rbind(result, data.frame(Variable = names(data)[col_num], Mean = mean_val, StdDev = sd_val))
  col_num <- col_num + 1
}
# Print the results
print(result)
  Variable       Mean      StdDev
1      mpg  20.090625   6.0269481
2     disp 230.721875 123.9386938
3       hp 146.687500  68.5628685
4     drat   3.596563   0.5346787
5       wt   3.217250   0.9784574
6     qsec  17.848750   1.7869432
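Growing a data frame with rbind() inside a loop copies the accumulated rows on every iteration, which is fine for six columns but can become slow for large inputs. A possible variation, sketched below under the assumption that the number of columns is known up front, pre-allocates the result vectors and builds the data frame once at the end. It reads from datasets::mtcars directly, and the names mean_vals and sd_vals are illustrative.

```r
# Same six columns as in the example above
data <- datasets::mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

# Pre-allocate one slot per column instead of growing with rbind()
n <- ncol(data)
mean_vals <- numeric(n)
sd_vals <- numeric(n)

for (col_num in seq_len(n)) {
  mean_vals[col_num] <- mean(data[[col_num]], na.rm = TRUE)
  sd_vals[col_num] <- sd(data[[col_num]], na.rm = TRUE)
}

# Build the data frame once, after the loop
result <- data.frame(Variable = names(data), Mean = mean_vals, StdDev = sd_vals)
print(result)
```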
Example 18: Detecting missing values in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
missing_counts <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0
5. next statement
In R, the next statement is used within loop structures to skip the rest of the current iteration and proceed to the next iteration of the loop. It is useful for bypassing specific cases within a loop without exiting the entire loop.
The basic syntax of the next statement in R is as follows:
for (value in sequence) {
  if (condition) {
    next
  }
  # code to be executed
}
• condition: A logical expression. If TRUE, the next statement is executed, and the current iteration is skipped.
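Since next and break are easy to confuse, the following sketch runs the same loop twice, once with each statement, and records which values are reached. The variable names and the choice of 3 as the trigger value are illustrative.

```r
# With next: only the iteration where i == 3 is skipped
visited_with_next <- c()
for (i in 1:6) {
  if (i == 3) {
    next
  }
  visited_with_next <- c(visited_with_next, i)
}

# With break: the loop ends entirely when i == 3
visited_with_break <- c()
for (i in 1:6) {
  if (i == 3) {
    break
  }
  visited_with_break <- c(visited_with_break, i)
}

print(visited_with_next)
print(visited_with_break)
```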
Example 19: Skipping even numbers
In this example, a for loop and next statement are used to print only the odd numbers from a sequence.
for (i in 1:10) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
Example 20: Calculate mean for all numerical variables in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean under the column's name, so names stay aligned even when columns are skipped
  means[names(data)[col_num]] <- mean_val
  col_num <- col_num + 1
}
# Print the means for each variable
print(means)
     Ozone    Solar.R       Wind       Temp      Month        Day
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
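All columns of airquality are numeric, so the next branch above is never actually taken. To see a skip happen, the sketch below applies the same idea to iris, whose Species column is a factor. It uses a for loop, which also removes the need to increment a counter before calling next; the name iris_means is illustrative.

```r
iris_means <- numeric()

for (col_name in names(iris)) {
  # Species is a factor, so this branch is taken once
  if (!is.numeric(iris[[col_name]])) {
    next
  }
  # Only reached for numeric columns
  iris_means[col_name] <- mean(iris[[col_name]], na.rm = TRUE)
}

print(round(iris_means, 2))
```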
Example 21: Calculate mean for all numerical variables in mtcars dataset
# Select the numeric columns of mtcars
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean under the column's name, so names stay aligned even when columns are skipped
  means[names(data)[col_num]] <- mean_val
  col_num <- col_num + 1
}
# Print the means for each variable
print(means)
       mpg       disp         hp       drat         wt       qsec
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750
Example 22: Calculate mean and standard deviation for all numerical variables in mtcars dataset
# Select the numeric columns of mtcars
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
std_devs <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  # Store the computed values under the column's name
  means[names(data)[col_num]] <- mean_val
  std_devs[names(data)[col_num]] <- sd_val
  col_num <- col_num + 1
}
# Print the means and standard deviations for each variable
result <- data.frame(Variable = names(means), Mean = unname(means), StdDev = unname(std_devs))
print(result)
  Variable       Mean      StdDev
1      mpg  20.090625   6.0269481
2     disp 230.721875 123.9386938
3       hp 146.687500  68.5628685
4     drat   3.596563   0.5346787
5       wt   3.217250   0.9784574
6     qsec  17.848750   1.7869432
Example 23: Detecting missing values in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
missing_counts <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count under the column's name, so names stay aligned even when columns are skipped
  missing_counts[names(data)[col_num]] <- missing_count
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
print(missing_counts)
  Ozone Solar.R    Wind    Temp   Month     Day
     37       7       0       0       0       0
Exercise
Exercise 1: Basic loop operations with the iris dataset
a. Using a for loop, calculate the median of each numeric variable in the iris dataset.
b. With a while loop, find the range (minimum and maximum values) of each numeric variable in the iris dataset.
c. Employ a repeat loop to count the number of unique species in the iris dataset.
Exercise 2: Handling missing values with datasets::state.x77
Note: For this exercise, we first introduce some missing values randomly into the state.x77 dataset.
a. Convert the datasets::state.x77 matrix into a dataframe and introduce missing values.
b. Using a for loop, detect columns that have missing values and report the count.
c. Implement a while loop to replace missing values in the dataframe with the mean of their respective columns.
d. Utilize a repeat loop to compute the standard deviation for each numeric column in the dataframe, and use the next statement to skip over columns that have more than 10 missing values.
Exercise 3: Advanced loop exercises with datasets::trees
a. Utilize a for loop to compute the variance of each numeric column in the trees dataset.
b. Implement a while loop to normalize each numeric column in the trees dataset (subtract mean and divide by standard deviation).
c. Using a repeat loop, count the number of rows in the trees dataset where the Volume exceeds 1.5. Terminate the loop once you’ve scanned all rows.
Exercise 4: Loop control with datasets::USArrests
a. Employ a for loop to find the state with the highest Murder rate. Print the state name and its rate.
b. Utilize a while loop to compute the median Assault rate across states.
c. Implement a repeat loop to find the average UrbanPop value. If the average exceeds 65, break out of the loop and print a message indicating high urban population.
d. In a loop of your choice, iterate over each column and compute the sum. Use the next statement to skip over the Rape column.