Control Flow - Looping

Author

Dr. Mohammad Nasir Abdullah

Looping in R

Looping is a fundamental concept in programming where a set of instructions is executed repeatedly, either while a condition holds or a fixed number of times. R provides several mechanisms for looping.

1. `for` Loop

The for loop is used to iterate over a sequence (like a vector or list) and execute a block of code for each element in the sequence.

for (variable in sequence) {
  # Code to be executed for each element
}

• variable: A variable that takes the next value from the sequence in each iteration.

• sequence: A vector or list over which the loop iterates.
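As a minimal sketch of these two parts, the loop below walks over a list by position using `seq_along()`. The list `scores` and the vector `labels` are made-up names for illustration and do not appear in the examples that follow.

```r
# Hypothetical data for illustration only
scores <- list(math = 80, stats = 92, coding = 75)

labels <- character(length(scores))  # preallocate the result vector

# seq_along() yields 1, 2, 3 here, and an empty sequence for an empty list
for (i in seq_along(scores)) {
  labels[i] <- paste(names(scores)[i], "=", scores[[i]])
}

print(labels)
```

Iterating by position is one reasonable choice when both the element and its name are needed; iterating directly over the values, as in the examples below, is simpler when only the values matter.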

Example 1: Printing Numbers

In this example, a for loop is used to print numbers from 1 to 5.

for (i in 1:5) {
  print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Example 2: Calculating the Factorial of a Number

This example demonstrates the use of a for loop to calculate the factorial of a number.

number <- 5 
factorial <- 1 
for (i in 1:number) {
  factorial <- factorial * i
}
print(paste("The factorial of", number, "is", factorial))

[1] "The factorial of 5 is 120"
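One caveat worth sketching, under the assumption that the input might be 0: `1:0` counts down and produces `c(1, 0)`, so a loop over `1:number` would multiply by 0. `seq_len()` returns an empty sequence in that case. The names `n` and `fact` are illustrative only.

```r
n <- 0
fact <- 1

# seq_len(0) is empty, so the body never runs and fact stays 1 (0! = 1)
for (i in seq_len(n)) {
  fact <- fact * i
}

print(fact)

# Cross-check against R's built-in factorial() function
print(fact == factorial(n))
```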

Example 3: Summing the Elements of a Vector

Here, a for loop is used to calculate the sum of the elements of a vector.

numbers <- c(2, 4, 6, 8, 10) 
sum_numbers <- 0 
for (num in numbers) {
  sum_numbers <- sum_numbers + num
}
print(paste("The sum of the numbers is", sum_numbers))

[1] "The sum of the numbers is 30"
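The loop makes the accumulation explicit. For comparison, the short sketch below checks the same running total against base R's vectorised `sum()`; the variable `total` is an illustrative name.

```r
numbers <- c(2, 4, 6, 8, 10)

total <- 0
for (num in numbers) {
  total <- total + num   # add each element to the running total
}

# sum() is the built-in, vectorised equivalent of the loop above
print(total == sum(numbers))
```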

Example 4: Converting variables to factors

# Names of the variables to convert to factors
variables_to_convert <- c("cyl", "am", "vs", "gear", "carb")

# Using a for loop to convert each variable
for (var in variables_to_convert) {
  mtcars[[var]] <- as.factor(mtcars[[var]])
}

# Checking the structure of the modified dataset
str(mtcars)

'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
 $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
 $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
 $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

Example 5: Calculating mean for all numerical variables

# Create an empty vector to store the means
means <- numeric()

# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
  }
}

# Print the means
print(means)

       mpg       disp         hp       drat         wt       qsec 
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750
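For comparison, the same result can usually be obtained without an explicit loop. The sketch below uses `sapply()` on a fresh copy of the dataset, so it does not depend on earlier examples; the names `cars`, `numeric_cols` and `col_means` are assumptions made for illustration.

```r
# Work on a fresh copy so the sketch does not depend on earlier examples
cars <- datasets::mtcars
cars$cyl <- as.factor(cars$cyl)  # one factor column, to show the filter at work

# Logical vector with one entry per column: TRUE where the column is numeric
numeric_cols <- sapply(cars, is.numeric)

# Apply mean() to each numeric column; the result is a named numeric vector
col_means <- sapply(cars[numeric_cols], mean, na.rm = TRUE)

print(round(col_means, 2))
```

The loop version is easier to extend with extra steps per column, while the `sapply()` version is more compact when a single function is applied to every column.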

Example 6: Calculating mean and standard deviation for all numerical variables

# Create empty numeric vectors to store the means and standard deviations
means <- numeric()
sds <- numeric()

# Loop through each column in mtcars
for (col_name in names(mtcars)) {
  # Check if the column is numeric
  if (is.numeric(mtcars[[col_name]])) {
    # Compute the mean and store it in the means vector
    means[col_name] <- mean(mtcars[[col_name]], na.rm = TRUE)
    # Compute the standard deviation and store it in the sds vector
    sds[col_name] <- sd(mtcars[[col_name]], na.rm = TRUE)
  }
}

# Print the results
cat("Means:\n")

Means:

print(means)

       mpg       disp         hp       drat         wt       qsec 
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750

cat("\nStandard Deviations:\n")


Standard Deviations:

print(sds)

        mpg        disp          hp        drat          wt        qsec 
  6.0269481 123.9386938  68.5628685   0.5346787   0.9784574   1.7869432

2. Nested `for` loop

Why Use Nested Loops?

Imagine you’re handling multiple datasets, and within each dataset, you have several variables. If you need to compute specific statistics for each variable across all datasets, a nested loop becomes an efficient solution. The outer loop can iterate over datasets, while the inner loop handles each variable within the current dataset.
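Before the dataset example, a smaller sketch may help show how the inner loop runs in full for every pass of the outer loop. It fills a multiplication table; the names `n_rows`, `n_cols` and `table_mat` are illustrative only.

```r
# Fill a 3 x 4 multiplication table
n_rows <- 3
n_cols <- 4
table_mat <- matrix(0, nrow = n_rows, ncol = n_cols)

for (r in seq_len(n_rows)) {      # outer loop: one pass per row
  for (k in seq_len(n_cols)) {    # inner loop: runs fully for each row
    table_mat[r, k] <- r * k
  }
}

print(table_mat)
```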

Example 7: Finding the mean, standard deviation and median for the mtcars and iris datasets

# List of datasets to process
datasets_list <- list(mtcars=mtcars, iris=iris)

# Statistical measures to compute
measures <- c("mean", "sd", "median")

# Loop through each dataset
for (dataset_name in names(datasets_list)) {
  cat(paste("\nStatistics for dataset:", dataset_name, "\n"))
  cat("--------------------------------------------------\n")
  
  # Loop through each column in the dataset
  for (col_name in names(datasets_list[[dataset_name]])) {
    
    # Check if the column is numeric
    if (is.numeric(datasets_list[[dataset_name]][[col_name]])) {
      
      cat(paste("\nColumn:", col_name, "\n"))
      cat("--------------------------------------------------\n")
      
      # Loop through each statistical measure
      for (measure in measures) {
        
        if (measure == "mean") {
          value <- mean(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        } else if (measure == "sd") {
          value <- sd(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        } else if (measure == "median") {
          value <- median(datasets_list[[dataset_name]][[col_name]], na.rm = TRUE)
        }
        
        cat(paste(measure, ":", round(value, 2), "\n"))
      }
    }
  }
}


Statistics for dataset: mtcars 
--------------------------------------------------

Column: mpg 
--------------------------------------------------
mean : 20.09 
sd : 6.03 
median : 19.2 

Column: disp 
--------------------------------------------------
mean : 230.72 
sd : 123.94 
median : 196.3 

Column: hp 
--------------------------------------------------
mean : 146.69 
sd : 68.56 
median : 123 

Column: drat 
--------------------------------------------------
mean : 3.6 
sd : 0.53 
median : 3.7 

Column: wt 
--------------------------------------------------
mean : 3.22 
sd : 0.98 
median : 3.33 

Column: qsec 
--------------------------------------------------
mean : 17.85 
sd : 1.79 
median : 17.71 

Statistics for dataset: iris 
--------------------------------------------------

Column: Sepal.Length 
--------------------------------------------------
mean : 5.84 
sd : 0.83 
median : 5.8 

Column: Sepal.Width 
--------------------------------------------------
mean : 3.06 
sd : 0.44 
median : 3 

Column: Petal.Length 
--------------------------------------------------
mean : 3.76 
sd : 1.77 
median : 4.35 

Column: Petal.Width 
--------------------------------------------------
mean : 1.2 
sd : 0.76 
median : 1.3
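The `if ... else if` chain in Example 7 grows with every new measure. One possible alternative, sketched here for a single column, stores the functions themselves in a named list and looks them up by name. The list layout and the variable `values` are assumptions for illustration, not part of the example above.

```r
# Named list of functions: the name is the label, the element is the function
measures <- list(mean = mean, sd = sd, median = median)
values <- datasets::iris$Sepal.Length

for (measure in names(measures)) {
  # Look up the function by name and call it on the column
  value <- measures[[measure]](values, na.rm = TRUE)
  cat(measure, ":", round(value, 2), "\n")
}
```

Adding another statistic then only requires another entry in `measures`, for example `var = var`.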

3. `while` loop

The while loop in R is used to execute a block of code repeatedly as long as a specified condition is TRUE. It is particularly useful when the number of iterations is not known beforehand.

The basic syntax of a while loop in R is as follows:

while (condition) {
  # code to be executed
}

condition: A logical expression that is evaluated before the execution of the loop’s body. The loop runs as long as the condition is TRUE.
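A minimal sketch of a case where the number of iterations is not fixed in advance: the loop halves a value until it drops below 1 and counts the steps. The starting value of 100 and the names `value` and `steps` are arbitrary choices for illustration.

```r
value <- 100
steps <- 0

# The condition is checked before every pass; the loop ends once value < 1
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

cat("Steps needed:", steps, "\n")
```

Because the condition is evaluated before the body, the body would not run at all if `value` started below 1.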

Example 8: Printing Numbers

In this example, a while loop is used to print numbers from 1 to 5.

i <- 1 
while (i <= 5) {
  print(i)
  i <- i + 1
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Example 9: Calculating mean of the variables

data <- mtcars

i <- 1
column_names <- names(data)

while (i <= length(column_names)) {
  # Check if the column is numeric
  if (is.numeric(data[[i]])) {
    mean_val <- mean(data[[i]], na.rm = TRUE)
    cat("Mean of", column_names[i], "is:", round(mean_val, 2), "\n")
  }
  i <- i + 1
}

Mean of mpg is: 20.09 
Mean of disp is: 230.72 
Mean of hp is: 146.69 
Mean of drat is: 3.6 
Mean of wt is: 3.22 
Mean of qsec is: 17.85

Example 10: Calculating mean and standard deviation in mtcars dataset

# Load the mtcars dataset
data <- mtcars

col_num <- 1

# Vectors to store the computed means and standard deviations
means <- numeric()
std_devs <- numeric()

while (col_num <= ncol(data)) {
  # Check if the column is numeric (cyl, vs, am, gear and carb were converted to factors in Example 4, so they receive NA below)
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    # Calculate the standard deviation
    sd_val <- sd(data[, col_num], na.rm = TRUE)
    
    # Store the computed values
    means <- c(means, mean_val)
    std_devs <- c(std_devs, sd_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
    std_devs <- c(std_devs, NA)
  }
  
  col_num <- col_num + 1
}

# Print the means and standard deviations for each variable
cat("Means for each variable:\n")

Means for each variable:

names(means) <- names(data)
print(means)

       mpg        cyl       disp         hp       drat         wt       qsec 
 20.090625         NA 230.721875 146.687500   3.596563   3.217250  17.848750 
        vs         am       gear       carb 
        NA         NA         NA         NA

cat("\nStandard Deviations for each variable:\n")


Standard Deviations for each variable:

names(std_devs) <- names(data)
print(std_devs)

        mpg         cyl        disp          hp        drat          wt 
  6.0269481          NA 123.9386938  68.5628685   0.5346787   0.9784574 
       qsec          vs          am        gear        carb 
  1.7869432          NA          NA          NA          NA

Example 11: Detecting missing values

df <- data.frame(A = c(1, 2, NA, 4, 5),
                 B = c(NA, 2, 3, 4, NA))

row_num <- 1
col_num <- 1

while (row_num <= nrow(df)) {
  col_num <- 1
  while (col_num <= ncol(df)) {
    if (is.na(df[row_num, col_num])) {
      cat("Missing value detected at row", row_num, "and column", col_num, "\n")
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}

Missing value detected at row 1 and column 2 
Missing value detected at row 3 and column 1 
Missing value detected at row 5 and column 2
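The nested loop shows the mechanics cell by cell. As a cross-check, the sketch below finds the same positions with base R's `which()` and `arr.ind = TRUE`; the variable `positions` is an illustrative name.

```r
df <- data.frame(A = c(1, 2, NA, 4, 5),
                 B = c(NA, 2, 3, 4, NA))

# is.na() on a data frame returns a logical matrix;
# arr.ind = TRUE reports the row and column index of each TRUE
positions <- which(is.na(df), arr.ind = TRUE)

print(positions)
```

Note that `which()` scans column by column, so the positions are listed in a different order from the row-by-row loop above.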

Example 12: Detecting missing values from `airquality` dataset

# Load the airquality dataset
library(datasets)
data <- airquality

row_num <- 1
missing_data_positions <- list()

while (row_num <= nrow(data)) {
  col_num <- 1
  while (col_num <= ncol(data)) {
    if (is.na(data[row_num, col_num])) {
      missing_data_positions <- append(missing_data_positions, list(c(row_num, col_num)))
    }
    col_num <- col_num + 1
  }
  row_num <- row_num + 1
}

# Print the positions of missing values
if (length(missing_data_positions) > 0) {
  cat("Missing values detected at the following positions (row, column):\n")
  for (position in missing_data_positions) {
    cat("Row", position[1], "Column", position[2], "\n")
  }
} else {
  cat("No missing values detected.\n")
}

Missing values detected at the following positions (row, column):
Row 5 Column 1 
Row 5 Column 2 
Row 6 Column 2 
Row 10 Column 1 
Row 11 Column 2 
Row 25 Column 1 
Row 26 Column 1 
Row 27 Column 1 
Row 27 Column 2 
Row 32 Column 1 
Row 33 Column 1 
Row 34 Column 1 
Row 35 Column 1 
Row 36 Column 1 
Row 37 Column 1 
Row 39 Column 1 
Row 42 Column 1 
Row 43 Column 1 
Row 45 Column 1 
Row 46 Column 1 
Row 52 Column 1 
Row 53 Column 1 
Row 54 Column 1 
Row 55 Column 1 
Row 56 Column 1 
Row 57 Column 1 
Row 58 Column 1 
Row 59 Column 1 
Row 60 Column 1 
Row 61 Column 1 
Row 65 Column 1 
Row 72 Column 1 
Row 75 Column 1 
Row 83 Column 1 
Row 84 Column 1 
Row 96 Column 2 
Row 97 Column 2 
Row 98 Column 2 
Row 102 Column 1 
Row 103 Column 1 
Row 107 Column 1 
Row 115 Column 1 
Row 119 Column 1 
Row 150 Column 1

Example 13: Counting missing values per column in the `airquality` dataset

# Load the airquality dataset
library(datasets)
data <- airquality

col_num <- 1
missing_counts <- numeric()

while (col_num <= ncol(data)) {
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  
  col_num <- col_num + 1
}

# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)

  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0
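The same counts can be reproduced without a loop, which is a handy way to verify the result. This sketch relies on `colSums()` treating `TRUE` as 1 and `FALSE` as 0.

```r
# is.na() gives a logical matrix; colSums() adds up the TRUE values per column
missing_counts <- colSums(is.na(datasets::airquality))

print(missing_counts)
```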

4. `repeat` loop

The repeat loop in R is used to execute a block of code indefinitely until a break statement is encountered. It is useful in situations where the number of iterations is not known beforehand, and the loop should continue until a specific condition is met.

The basic syntax of a repeat loop in R is as follows:

repeat {
  # code to be executed
  if (condition) {
    break
  }
}

condition: A logical expression. If TRUE, the break statement is executed, and the loop is terminated.
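Because a repeat loop has no built-in stopping rule, a condition that never becomes TRUE would run forever. One common safeguard, sketched below, is to add an iteration cap to the break condition. The cap `max_iter` and the threshold of 50 are hypothetical values chosen for illustration.

```r
max_iter <- 1000   # hypothetical safety limit
iter <- 0
total <- 0

repeat {
  iter <- iter + 1
  total <- total + iter
  # Stop when the running total passes 50, or when the safety limit is hit
  if (total > 50 || iter >= max_iter) {
    break
  }
}

cat("Stopped after", iter, "iterations with total", total, "\n")
```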

Example 14: Generating Random Numbers

In this example, a repeat loop is used to generate random numbers until a number greater than 0.9 is generated.

set.seed(123456)
repeat {
  number <- runif(1) # Generate a random number between 0 and 1
  print(number)
  if (number > 0.9) {
    break
  }
}

[1] 0.7977843
[1] 0.7535651
[1] 0.3912557
[1] 0.3415567
[1] 0.3612941
[1] 0.1983447
[1] 0.534858
[1] 0.09652624
[1] 0.9878469

Example 15: Calculating mean for each numeric variable in airquality dataset

# Load the airquality dataset
data <- airquality

col_num <- 1
means <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Check if the column is numeric
  if (is.numeric(data[, col_num])) {
    # Calculate the mean
    mean_val <- mean(data[, col_num], na.rm = TRUE)
    
    # Store the mean value in the means vector
    means <- c(means, mean_val)
  } else {
    means <- c(means, NA) # If not numeric, store NA
  }
  
  col_num <- col_num + 1
}

# Print the means for each variable
names(means) <- names(data)
print(means)

     Ozone    Solar.R       Wind       Temp      Month        Day 
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922

Example 16: Calculating mean for each numeric variable in mtcars dataset

# Load the mtcars dataset
data <- mtcars[ , c("mpg","disp", "hp", "drat", "wt", "qsec") ]

col_num <- 1
means <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # All selected columns are numeric, so we can calculate the mean directly
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  
  col_num <- col_num + 1
}

# Print the means for each variable
names(means) <- names(data)
print(means)

       mpg       disp         hp       drat         wt       qsec 
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750

Example 17: Calculating mean and standard deviation for each numeric variable in mtcars dataset

# Load the mtcars dataset
data <- mtcars[ , c("mpg","disp", "hp", "drat", "wt", "qsec") ]
col_num <- 1
result <- data.frame(Variable = character(), Mean = numeric(), StdDev = numeric())

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  
  # Store the results
  result <- rbind(result, data.frame(Variable = names(data)[col_num], Mean = mean_val, StdDev = sd_val))
  
  col_num <- col_num + 1
}

# Print the results
print(result)

  Variable       Mean      StdDev
1      mpg  20.090625   6.0269481
2     disp 230.721875 123.9386938
3       hp 146.687500  68.5628685
4     drat   3.596563   0.5346787
5       wt   3.217250   0.9784574
6     qsec  17.848750   1.7869432
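Calling `rbind()` inside a loop copies the growing data frame on every pass, which is fine for six columns but can become slow for large inputs. A possible alternative, sketched here with a `for` loop, creates the result data frame at its final size first and then fills it in. The index name `j` and the variable `n` are illustrative.

```r
data <- datasets::mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]

n <- ncol(data)

# Preallocate the result with one row per column of data
result <- data.frame(Variable = names(data),
                     Mean = numeric(n),
                     StdDev = numeric(n))

for (j in seq_len(n)) {
  result$Mean[j] <- mean(data[[j]], na.rm = TRUE)
  result$StdDev[j] <- sd(data[[j]], na.rm = TRUE)
}

print(result)
```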

Example 18: Detecting missing values in airquality dataset

# Load the airquality dataset
data <- airquality

col_num <- 1
missing_counts <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  
  col_num <- col_num + 1
}

# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)

  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0

5. `next` statement

In R, the next statement is used within loop structures to skip the current iteration and proceed to the next iteration of the loop. It is useful for bypassing specific conditions within a loop without exiting the entire loop.

The basic syntax of the next statement in R is as follows:

for (value in sequence) {
  if (condition) {
    next
  }
  # code to be executed
}

condition: A logical expression. If TRUE, the next statement is executed, and the current iteration is skipped.
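To contrast `next` with `break`, the sketch below uses both in one loop: `next` skips a single iteration, while `break` ends the loop entirely. The vector `kept` and the two conditions are made up for illustration.

```r
kept <- c()

for (i in 1:10) {
  if (i %% 3 == 0) next   # skip multiples of 3, then continue with the next i
  if (i > 7) break        # leave the loop entirely
  kept <- c(kept, i)
}

print(kept)
```

The values 3 and 6 are skipped, and the loop stops at 8 before anything larger is stored.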

Example 19: Skipping even numbers

In this example, a for loop and next statement are used to print only the odd numbers from a sequence.

for (i in 1:10) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}

[1] 1
[1] 3
[1] 5
[1] 7
[1] 9

Example 20: Calculate mean for all numerical variables in airquality dataset

# Load the airquality dataset
data <- airquality

col_num <- 1
means <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  
  col_num <- col_num + 1
}

# Print the means for each variable
names(means) <- names(data)
print(means)

     Ozone    Solar.R       Wind       Temp      Month        Day 
 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922

Example 21: Calculate mean for all numerical variables in mtcars dataset

# Load the mtcars dataset
data <- mtcars[ , c("mpg","disp", "hp", "drat", "wt", "qsec") ]

col_num <- 1
means <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  
  # Calculate the mean
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  
  # Store the mean value in the means vector
  means <- c(means, mean_val)
  
  col_num <- col_num + 1
}

# Print the means for each variable
names(means) <- names(data)
print(means)

       mpg       disp         hp       drat         wt       qsec 
 20.090625 230.721875 146.687500   3.596563   3.217250  17.848750

Example 22: Calculate mean and standard deviation for all numerical variables in mtcars dataset

# Load the mtcars dataset
data <- mtcars[ , c("mpg","disp", "hp", "drat", "wt", "qsec") ]

col_num <- 1
means <- numeric()
std_devs <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  
  # Calculate the mean and standard deviation
  mean_val <- mean(data[, col_num], na.rm = TRUE)
  sd_val <- sd(data[, col_num], na.rm = TRUE)
  
  # Store the computed values
  means <- c(means, mean_val)
  std_devs <- c(std_devs, sd_val)
  
  col_num <- col_num + 1
}

# Print the means and standard deviations for each variable
result <- data.frame(Variable = names(data), Mean = means, StdDev = std_devs)
print(result)

  Variable       Mean      StdDev
1      mpg  20.090625   6.0269481
2     disp 230.721875 123.9386938
3       hp 146.687500  68.5628685
4     drat   3.596563   0.5346787
5       wt   3.217250   0.9784574
6     qsec  17.848750   1.7869432

Example 23: Detecting missing values in airquality dataset

# Load the airquality dataset
data <- airquality

col_num <- 1
missing_counts <- numeric()

repeat {
  # Check if all columns have been processed
  if (col_num > ncol(data)) {
    break
  }
  
  # Check if the column is numeric; if not, skip to the next iteration
  if (!is.numeric(data[, col_num])) {
    col_num <- col_num + 1
    next
  }
  
  # Count the missing values in the current column
  missing_count <- sum(is.na(data[, col_num]))
  
  # Store the count in the missing_counts vector
  missing_counts <- c(missing_counts, missing_count)
  
  col_num <- col_num + 1
}

# Print the counts of missing values for each variable
names(missing_counts) <- names(data)
print(missing_counts)

  Ozone Solar.R    Wind    Temp   Month     Day 
     37       7       0       0       0       0
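Several examples in this section label their results with `names(data)` after the loop. That works because every column in those datasets is numeric, so nothing is actually skipped. If `next` does skip a column, the result becomes shorter than `names(data)` and the labels no longer line up. A minimal sketch of one way around this, assuming a dataset with a non-numeric column (here `iris`), stores each value under its own column name:

```r
data <- datasets::iris   # Species is a factor, so one column is skipped
means <- numeric()

for (col_name in names(data)) {
  # Skip non-numeric columns without leaving a gap in the result
  if (!is.numeric(data[[col_name]])) {
    next
  }
  # Assigning by name keeps each mean attached to the right column
  means[col_name] <- mean(data[[col_name]], na.rm = TRUE)
}

print(round(means, 2))
```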

Exercise

Exercise 1: Basic loop operations with the `iris` dataset

a. Using a for loop, calculate the median of each numeric variable in the iris dataset.

b. With a while loop, find the range (minimum and maximum values) of each numeric variable in the iris dataset.

c. Employ a repeat loop to count the number of unique species in the iris dataset.

Exercise 2: Handling Missing values with `datasets::state.x77`

Note: For this exercise, we first introduce some missing values randomly into the state.x77 dataset.

a. Convert the datasets::state.x77 matrix into a dataframe and introduce missing values.

b. Using a for loop, detect columns that have missing values and report the count.

c. Implement a while loop to replace missing values in the dataframe with the mean of their respective columns.

d. Utilize a repeat loop to compute the standard deviation for each numeric column in the dataframe, and use the next statement to skip over columns that have more than 10 missing values.

Exercise 3: Advanced loop exercises with `datasets::trees`

a. Utilize a for loop to compute the variance of each numeric column in the trees dataset.

b. Implement a while loop to normalize each numeric column in the trees dataset (subtract mean and divide by standard deviation).

c. Using a repeat loop, count the number of rows in the trees dataset where the Volume exceeds 1.5. Terminate the loop once you’ve scanned all rows.

Exercise 4: Loop control with `datasets::USArrests`

a. Employ a for loop to find the state with the highest Murder rate. Print the state name and its rate.

b. Utilize a while loop to compute the median Assault rate across states.

c. Implement a repeat loop to find the average UrbanPop value. If the average exceeds 65, break out of the loop and print a message indicating high urban population.

d. In a loop of your choice, iterate over each column and compute the sum. Use the next statement to skip over the Rape column.