Control Flow - Looping
Looping in R
Looping is a fundamental concept in programming in which a set of instructions is executed repeatedly, either while a condition holds or for a fixed number of iterations. R provides several mechanisms for looping.
1. for Loop
The for loop is used to iterate over a sequence (like a vector or list) and execute a block of code for each element in the sequence.
The basic syntax of a for loop in R is as follows:
for (variable in sequence) {
  # Code to be executed for each element
}
•variable: A variable that takes the value from the sequence in each iteration.
•sequence: A vector or list over which the loop iterates.
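As a minimal sketch of how the loop variable takes each value of the sequence in turn, the snippet below iterates over a character vector directly and then over its positions. The vector fruits and the helper variable lengths_seen are hypothetical names chosen for illustration only.

```r
# Hypothetical example data (not part of the original tutorial)
fruits <- c("apple", "banana", "cherry")

# Iterate directly over the elements: 'fruit' takes each value in turn
lengths_seen <- integer()
for (fruit in fruits) {
  lengths_seen <- c(lengths_seen, nchar(fruit))
}
print(lengths_seen)

# Iterate over positions instead, which also gives access to the index
for (idx in seq_along(fruits)) {
  cat(idx, ":", fruits[idx], "\n")
}
```

Looping over positions with seq_along() is one common choice when both the index and the element are needed inside the loop body.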
Example 1: Printing Numbers
In this example, a for loop is used to print numbers from 1 to 5.
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Example 2: Calculating the Factorial of a Number
This example demonstrates the use of a for loop to calculate the factorial of a number.
number <- 5
factorial <- 1
for (i in 1:number) {
  factorial <- factorial * i
}
print(paste("The factorial of", number, "is", factorial))
[1] "The factorial of 5 is 120"
Example 3: Summing the Elements of a Vector
Here, a for loop is used to calculate the sum of the elements of a vector.
numbers <- c(2, 4, 6, 8, 10)
sum_numbers <- 0
for (num in numbers) {
  sum_numbers <- sum_numbers + num
}
print(paste("The sum of the numbers is", sum_numbers))
[1] "The sum of the numbers is 30"
Example 4: Converting variables to factors
# Vector of variable names to convert to factor
variables_to_convert <- c("cyl", "am", "vs", "gear", "carb")
# Using a for loop to convert each variable
for (var in variables_to_convert) {
  mtcars[[var]] <- as.factor(mtcars[[var]])
}
# Checking the structure of the modified dataset
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
$ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
$ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
$ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
Example 5: Calculating mean for all numerical variables
# Create an empty vector to store the means
means <- numeric()
# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
  }
}
# Print the means
print(means)
mpg disp hp drat wt qsec
20.090625 230.721875 146.687500 3.596563 3.217250 17.848750
Example 6: Calculating mean and standard deviation for all numerical variables
# Create empty vectors to store the means and standard deviations
means <- numeric()
sds <- numeric()
# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
    # Compute the standard deviation and store it in the sds vector
    sds[col_name] <- sd(mtcars[[col_name]], na.rm = TRUE)
  }
}
# Print the results
cat("Means:\n")
Means:
print(means)
mpg disp hp drat wt qsec
20.090625 230.721875 146.687500 3.596563 3.217250 17.848750
cat("\nStandard Deviations:\n")
Standard Deviations:
print(sds)
mpg disp hp drat wt qsec
6.0269481 123.9386938 68.5628685 0.5346787 0.9784574 1.7869432
2. Nested for Loop
Why Use Nested Loops?
Imagine you’re handling multiple datasets, and within each dataset, you have several variables. If you need to compute specific statistics for each variable across all datasets, a nested loop becomes an efficient solution. The outer loop can iterate over datasets, while the inner loop handles each variable within the current dataset.
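Before applying this idea to whole datasets, a smaller sketch may help show the mechanics: the inner loop runs to completion for every single iteration of the outer loop. The names n_rows, n_cols and products are hypothetical and used only for this illustration.

```r
# Hypothetical dimensions for a small multiplication table
n_rows <- 3
n_cols <- 4
products <- matrix(0, nrow = n_rows, ncol = n_cols)

# Outer loop walks over rows, inner loop over columns
for (i in seq_len(n_rows)) {
  for (j in seq_len(n_cols)) {
    # The inner body runs n_rows * n_cols times in total
    products[i, j] <- i * j
  }
}
print(products)
```

The same pattern carries over when the outer sequence is a list of datasets and the inner sequence is the set of columns in the current dataset.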
Example 7: Finding mean, sd and median for the mtcars and iris datasets
# List of datasets to process
datasets_list <- list(mtcars = mtcars, iris = iris)
# Statistical measures to compute
measures <- c("mean", "sd", "median")
# Loop through each dataset
for (dataset_name in names(datasets_list)) {
  cat(paste("\nStatistics for dataset:", dataset_name, "\n"))
  cat("--------------------------------------------------\n")
  # Loop through each column in the dataset
  for (col_name in names(datasets_list[[dataset_name]])) {
    # Check if the column is numeric
    if (is.numeric(datasets_list[[dataset_name]][[col_name]])) {
      cat(paste("\nColumn:", col_name, "\n"))
      cat("--------------------------------------------------\n")
      # Loop through each statistical measure
      for (measure in measures) {
        if (measure == "mean") {
          value <- mean(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        } else if (measure == "sd") {
          value <- sd(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        } else if (measure == "median") {
          value <- median(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        }
        cat(paste(measure, ":", round(value, 2), "\n"))
      }
    }
  }
}
Statistics for dataset: mtcars
--------------------------------------------------
Column: mpg
--------------------------------------------------
mean : 20.09
sd : 6.03
median : 19.2
Column: disp
--------------------------------------------------
mean : 230.72
sd : 123.94
median : 196.3
Column: hp
--------------------------------------------------
mean : 146.69
sd : 68.56
median : 123
Column: drat
--------------------------------------------------
mean : 3.6
sd : 0.53
median : 3.7
Column: wt
--------------------------------------------------
mean : 3.22
sd : 0.98
median : 3.33
Column: qsec
--------------------------------------------------
mean : 17.85
sd : 1.79
median : 17.71
Statistics for dataset: iris
--------------------------------------------------
Column: Sepal.Length
--------------------------------------------------
mean : 5.84
sd : 0.83
median : 5.8
Column: Sepal.Width
--------------------------------------------------
mean : 3.06
sd : 0.44
median : 3
Column: Petal.Length
--------------------------------------------------
mean : 3.76
sd : 1.77
median : 4.35
Column: Petal.Width
--------------------------------------------------
mean : 1.2
sd : 0.76
median : 1.3
3. while Loop
The while loop in R is used to execute a block of code repeatedly as long as a specified condition is TRUE. It is particularly useful when the number of iterations is not known beforehand.
The basic syntax of a while loop in R is as follows:
while (condition) {
  # code to be executed
}
•condition: A logical expression that is evaluated before each execution of the loop's body. The loop runs as long as the condition is TRUE.
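To illustrate the case where the number of iterations is not fixed in advance, the sketch below keeps halving a value until it drops below 1 and counts how many steps that takes. The starting value and the variable names value and steps are assumptions made for this illustration.

```r
# Hypothetical starting value; the loop does not know the step count up front
value <- 100
steps <- 0

# The condition is checked before every pass through the body
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}
cat("Steps needed:", steps, "\n")
```

Because the condition is evaluated first, the body would not run at all if value started below 1.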
Example 8: Printing Numbers
In this example, a while loop is used to print numbers from 1 to 5.
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Example 9: Calculating mean of the variables
data <- mtcars
i <- 1
column_names <- names(data)
while (i <= length(column_names)) {
  # Check if the column is numeric
  if (is.numeric(data[[i]])) {
    mean_val <- mean(data[[i]], na.rm = TRUE)
    cat("Mean of", column_names[i], "is:", round(mean_val, 2), "\n")
  }
  i <- i + 1
}
Mean of mpg is: 20.09
Mean of disp is: 230.72
Mean of hp is: 146.69
Mean of drat is: 3.6
Mean of wt is: 3.22
Mean of qsec is: 17.85
Example 10: Calculating mean and standard deviation in mtcars dataset
# Load the mtcars dataset
data <- mtcars
col_num <- 1
# Vectors to store the computed means and standard deviations
means <- numeric()
std_devs <- numeric()
while (col_num <= ncol(data)) {
  # Check if the column is numeric. The original mtcars columns are all numeric,
  # but the columns converted to factors in Example 4 are not, so the check matters here.
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    # Calculate the standard deviation
    sd_val <- sd(data[, col_num], na.rm = TRUE)
    # Store the computed values
    means <- c(means, mean_val)
    std_devs <- c(std_devs, sd_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
    std_devs <- c(std_devs, NA)
  }
  col_num <- col_num + 1
}
# Print the means and standard deviations for each variable
cat("Means for each variable:\n")
Means for each variable:
names(means) <- names(data)
print(means)
mpg cyl disp hp drat wt qsec
20.090625 NA 230.721875 146.687500 3.596563 3.217250 17.848750
vs am gear carb
NA NA NA NA
cat("\nStandard Deviations for each variable:\n")
Standard Deviations for each variable:
names(std_devs) <- names(data)
print(std_devs)
mpg cyl disp hp drat wt
6.0269481 NA 123.9386938 68.5628685 0.5346787 0.9784574
qsec vs am gear carb
1.7869432 NA NA NA NA
Example 11: Detecting missing values
df <- data.frame(A = c(1, 2, NA, 4, 5),
                 B = c(NA, 2, 3, 4, NA))
row_num <- 1
col_num <- 1
while (row_num <= nrow(df)) {
  col_num <- 1
  while (col_num <= ncol(df)) {
    if (is.na(df[row_num, col_num])) {
      cat("Missing value detected at row", row_num, "and column", col_num, "\n")
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}
Missing value detected at row 1 and column 2
Missing value detected at row 3 and column 1
Missing value detected at row 5 and column 2
Example 12: Detecting missing values from the airquality dataset
# Load the airquality dataset
library(datasets)
data <- airquality
row_num <- 1
missing_data_positions <- list()
while (row_num <= nrow(data)) {
  col_num <- 1
  while (col_num <= ncol(data)) {
    if (is.na(data[row_num, col_num])) {
      missing_data_positions <- append(missing_data_positions, list(c(row_num, col_num)))
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}
# Print the positions of missing values
if (length(missing_data_positions) > 0) {
  cat("Missing values detected at the following positions (row, column):\n")
  for (position in missing_data_positions) {
    cat("Row", position[1], "Column", position[2], "\n")
  }
} else {
  cat("No missing values detected.\n")
}
Missing values detected at the following positions (row, column):
Row 5 Column 1
Row 5 Column 2
Row 6 Column 2
Row 10 Column 1
Row 11 Column 2
Row 25 Column 1
Row 26 Column 1
Row 27 Column 1
Row 27 Column 2
Row 32 Column 1
Row 33 Column 1
Row 34 Column 1
Row 35 Column 1
Row 36 Column 1
Row 37 Column 1
Row 39 Column 1
Row 42 Column 1
Row 43 Column 1
Row 45 Column 1
Row 46 Column 1
Row 52 Column 1
Row 53 Column 1
Row 54 Column 1
Row 55 Column 1
Row 56 Column 1
Row 57 Column 1
Row 58 Column 1
Row 59 Column 1
Row 60 Column 1
Row 61 Column 1
Row 65 Column 1
Row 72 Column 1
Row 75 Column 1
Row 83 Column 1
Row 84 Column 1
Row 96 Column 2
Row 97 Column 2
Row 98 Column 2
Row 102 Column 1
Row 103 Column 1
Row 107 Column 1
Row 115 Column 1
Row 119 Column 1
Row 150 Column 1
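As a cross-check on the nested while loop above, the following sketch finds the same positions without an explicit loop, using the base R functions is.na() and which() with arr.ind = TRUE. It is offered only as a way to verify the loop's result, not as the approach this section teaches; the name na_positions is hypothetical.

```r
# is.na() on a data frame returns a logical matrix of the same shape;
# which(..., arr.ind = TRUE) returns one (row, col) pair per TRUE entry
na_positions <- which(is.na(airquality), arr.ind = TRUE)

# Number of missing cells; this should match the number of rows printed by the loop
print(nrow(na_positions))

# First few positions (ordered column by column, unlike the row-by-row loop)
print(head(na_positions))
```

Note that the ordering differs from the loop's output because which() scans the matrix column by column.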
Example 13: Counting missing values per variable in the airquality dataset
# Load the airquality dataset
library(datasets)
data <- airquality
col_num <- 1
missing_counts <- numeric()
while (col_num <= ncol(data)) {
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)
Ozone Solar.R Wind Temp Month Day
37 7 0 0 0 0
4. repeat Loop
The repeat loop in R is used to execute a block of code indefinitely until a break statement is encountered. It is useful in situations where the number of iterations is not known beforehand and the loop should continue until a specific condition is met.
The basic syntax of a repeat loop in R is as follows:
repeat {
  # code to be executed
  if (condition) {
    break
  }
}
•condition: A logical expression. If TRUE, the break statement is executed, and the loop is terminated.
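A small deterministic sketch may make the control flow easier to follow: because the exit test sits inside the body, a repeat loop always runs its body at least once. The names total and iterations, and the threshold of 1000, are assumptions chosen for this illustration.

```r
# Hypothetical starting values
total <- 1
iterations <- 0

repeat {
  # The body runs first; the exit test comes afterwards
  total <- total * 2
  iterations <- iterations + 1
  if (total > 1000) {
    break  # leave the loop once the threshold is passed
  }
}
cat("Reached", total, "after", iterations, "iterations\n")
```

Without the break statement this loop would never terminate, so the exit condition needs to be reachable.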
Example 14: Generating Random Numbers
In this example, a repeat loop is used to generate random numbers until a number greater than 0.9 is generated.
set.seed(123456)
repeat {
  number <- runif(1) # Generate a random number between 0 and 1
  print(number)
  if (number > 0.9) {
    break
  }
}
[1] 0.7977843
[1] 0.7535651
[1] 0.3912557
[1] 0.3415567
[1] 0.3612941
[1] 0.1983447
[1] 0.534858
[1] 0.09652624
[1] 0.9878469
Example 15: Calculating mean for each numeric variable in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    # Store the mean value in the means vector
    means <- c(means, mean_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
  }
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
Ozone Solar.R Wind Temp Month Day
42.129310 185.931507 9.957516 77.882353 6.993464 15.803922
Example 16: Calculating mean for each numeric variable in mtcars dataset
# Load selected columns of the mtcars dataset
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Since all selected columns are numeric, we can directly calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
mpg disp hp drat wt qsec
20.090625 230.721875 146.687500 3.596563 3.217250 17.848750
Example 17: Calculating mean and standard deviation for each numeric variable in mtcars dataset
# Load the mtcars dataset
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
result <- data.frame(Variable = character(), Mean = numeric(), StdDev = numeric())
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  # Store the results
  result <- rbind(result, data.frame(Variable = names(data)[col_num], Mean = mean_val, StdDev = sd_val))
  col_num <- col_num + 1
}
# Print the results
print(result)
Variable Mean StdDev
1 mpg 20.090625 6.0269481
2 disp 230.721875 123.9386938
3 hp 146.687500 68.5628685
4 drat 3.596563 0.5346787
5 wt 3.217250 0.9784574
6 qsec 17.848750 1.7869432
Example 18: Detecting missing values in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
missing_counts <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)
Ozone Solar.R Wind Temp Month Day
37 7 0 0 0 0
5. next Statement
In R, the next statement is used within loop structures to skip the current iteration and proceed to the next iteration of the loop. It is useful for bypassing specific cases within a loop without exiting the entire loop.
The basic syntax of the next statement in R is as follows:
for (value in sequence) {
  if (condition) {
    next
  }
  # code to be executed
}
•condition: A logical expression. If TRUE, the next statement is executed, and the current iteration is skipped.
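The sketch below shows the skipping behaviour on a tiny vector: iterations that hit a missing value are skipped with next, while the remaining values are still accumulated. The vector values and the counters total and skipped are hypothetical names used only for illustration.

```r
# Hypothetical data containing missing values
values <- c(4, NA, 7, NA, 9)
total <- 0
skipped <- 0

for (v in values) {
  if (is.na(v)) {
    skipped <- skipped + 1
    next  # jump straight to the next element; the line below is not reached
  }
  total <- total + v
}
cat("Total:", total, "Skipped:", skipped, "\n")
```

Unlike break, next does not end the loop: the elements after a skipped one are still processed.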
Example 19: Skipping even numbers
In this example, a for loop and next statement are used to print only the odd numbers from a sequence.
for (i in 1:10) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
Example 20: Calculate mean for all numerical variables in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
Ozone Solar.R Wind Temp Month Day
42.129310 185.931507 9.957516 77.882353 6.993464 15.803922
Example 21: Calculate mean for all numerical variables in mtcars dataset
# Load the mtcars dataset
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  col_num <- col_num + 1
}
# Print the means for each variable
names(means) <- names(data)
print(means)
mpg disp hp drat wt qsec
20.090625 230.721875 146.687500 3.596563 3.217250 17.848750
Example 22: Calculate mean and standard deviation for all numerical variables in mtcars dataset
# Load the mtcars dataset
data <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
col_num <- 1
means <- numeric()
std_devs <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  # Store the computed values
  means <- c(means, mean_val)
  std_devs <- c(std_devs, sd_val)
  col_num <- col_num + 1
}
# Print the means and standard deviations for each variable
result <- data.frame(Variable = names(data), Mean = means, StdDev = std_devs)
print(result)
Variable Mean StdDev
1 mpg 20.090625 6.0269481
2 disp 230.721875 123.9386938
3 hp 146.687500 68.5628685
4 drat 3.596563 0.5346787
5 wt 3.217250 0.9784574
6 qsec 17.848750 1.7869432
Example 23: Detecting missing values in airquality dataset
# Load the airquality dataset
data <- airquality
col_num <- 1
missing_counts <- numeric()
repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  col_num <- col_num + 1
}
# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)
Ozone Solar.R Wind Temp Month Day
37 7 0 0 0 0
Exercises
Exercise 1: Basic loop operations with the iris dataset
a. Using a for loop, calculate the median of each numeric variable in the iris dataset.
b. With a while loop, find the range (minimum and maximum values) of each numeric variable in the iris dataset.
c. Employ a repeat loop to count the number of unique species in the iris dataset.
Exercise 2: Handling missing values with datasets::state.x77
Note: For this exercise, we first introduce some missing values randomly into the state.x77 dataset.
a. Convert the datasets::state.x77 matrix into a dataframe and introduce missing values.
b. Using a for loop, detect columns that have missing values and report the count.
c. Implement a while loop to replace missing values in the dataframe with the mean of their respective columns.
d. Utilize a repeat loop to compute the standard deviation for each numeric column in the dataframe, and use the next statement to skip over columns that have more than 10 missing values.
Exercise 3: Advanced loop exercises with datasets::trees
a. Utilize a for loop to compute the variance of each numeric column in the trees dataset.
b. Implement a while loop to normalize each numeric column in the trees dataset (subtract the mean and divide by the standard deviation).
c. Using a repeat loop, count the number of rows in the trees dataset where the Volume exceeds 1.5. Terminate the loop once you’ve scanned all rows.
Exercise 4: Loop control with datasets::USArrests
a. Employ a for loop to find the state with the highest Murder rate. Print the state name and its rate.
b. Utilize a while loop to compute the median Assault rate across states.
c. Implement a repeat loop to find the average UrbanPop value. If the average exceeds 65, break out of the loop and print a message indicating high urban population.
d. In a loop of your choice, iterate over each column and compute the sum. Use the next statement to skip over the Rape column.