Noah Lindley

Software Developer


Disease Propagation Modeling

Introduction

With the current state of the world, disease propagation modeling is becoming more significant by the day. For the experiments below, a simulation written in C++ models a population of people using the following variables: population size, disease length, chance of infection, proportion of the population vaccinated, and the number of interactions between people.

Some key facts to note about the simulation:

  • There are 4 states for people: inoculated, susceptible, sick, and recovered.
  • Once a person has been sick, we assume they can’t get sick again.
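The states and rules above can be sketched as a small agent-based simulation. The original C++ code isn't shown in this write-up, so this is only a minimal sketch under assumptions of our own: the outbreak starts with a single susceptible "patient zero", each sick person makes `interactions` random contacts per day, a susceptible contact becomes sick with probability `infection_chance`, and a sick person recovers, with lasting immunity, after `infection_length` days. The names `Params`, `Result`, and `simulate` are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical sketch of the four-state model; not the original C++ code.
enum class State { Inoculated, Susceptible, Sick, Recovered };

struct Params {
    int population = 10000;
    int infection_length = 7;       // days a person stays sick
    double infection_chance = 0.10; // chance that one contact infects
    double prop_vaccinated = 0.0;   // fraction inoculated on day 0
    int interactions = 10;          // random contacts per sick person per day
};

struct Result {
    std::vector<int> sick_per_day;  // number of sick people after each day
    int never_infected = 0;         // inoculated or still susceptible at the end
};

static int count_sick(const std::vector<State>& state) {
    int n = 0;
    for (State s : state) n += (s == State::Sick);
    return n;
}

Result simulate(const Params& p, unsigned seed) {
    assert(p.population > 0 && p.infection_length > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, p.population - 1);

    std::vector<State> state(p.population, State::Susceptible);
    std::vector<int> days_sick(p.population, 0);
    for (State& s : state)
        if (coin(rng) < p.prop_vaccinated) s = State::Inoculated;

    // Assumption: the outbreak starts with the first susceptible person.
    for (State& s : state)
        if (s == State::Susceptible) { s = State::Sick; break; }

    Result r;
    // The simulation ends on the first day nobody is sick.
    while (count_sick(state) > 0) {
        std::vector<int> newly_sick;
        for (int i = 0; i < p.population; ++i) {
            if (state[i] != State::Sick) continue;
            for (int k = 0; k < p.interactions; ++k) {
                int j = pick(rng);
                if (state[j] == State::Susceptible && coin(rng) < p.infection_chance)
                    newly_sick.push_back(j);
            }
        }
        // People who were already sick advance one day and recover (staying
        // immune) once they have been sick for infection_length days.
        for (int i = 0; i < p.population; ++i)
            if (state[i] == State::Sick && ++days_sick[i] >= p.infection_length)
                state[i] = State::Recovered;
        // New infections start counting their sick days from tomorrow.
        for (int j : newly_sick) state[j] = State::Sick;
        r.sick_per_day.push_back(count_sick(state));
    }
    for (State s : state)
        r.never_infected += (s == State::Inoculated || s == State::Susceptible);
    return r;
}
```

With `Params{}` the defaults match the constants used in the first experiment (10,000 people, 10 interactions, 7 days, 10% infection chance), and `sick_per_day` is the kind of daily series plotted there. In this sketch only contacts made by sick people can transmit the disease, which is a simplifying assumption.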

We will visualize and interpret the results of the simulation to see how each of these variables affects the population.

Days vs. Number of People Infected

In this experiment, we’re going to visualize a population holding the following variables constant:

  • Population size: 10,000
  • Number of Random Interactions: 10
  • Infection Length: 7 days
  • Infection Chance: 10%

The proportion of the population that is inoculated is varied to visualize the number of people who are sick on each day of the simulation.

Visualization

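library(dplyr)       # data manipulation and %>%
library(ggplot2)     # plotting
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Label each inoculation level as a percentage, keeping the facets in numeric
# rather than alphabetical order
days_vs_infected <- read.delim("days_vs_infected.tsv", sep = "\t") %>%
    mutate(vaccinated = factor(paste0(vaccinated, "%"),
                               levels = paste0(sort(unique(vaccinated)), "%")))
days_vs_infected %>% head() %>% kable() %>% kable_styling(bootstrap_options = "striped")
days_vs_infected %>% ggplot(aes(x = day, y = infected, color = vaccinated)) +
    geom_line() + ggtitle("Number of Days vs. Number of People Infected") +
    ylab("Number of Infected People") + xlab("Days Since the Start of the Disease") +
    theme_minimal() + facet_wrap(~vaccinated)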

In Figure 1.1 the plots are grouped by the percentage of the population that is inoculated. As the inoculated percentage increases, the bell-shaped curve of infections flattens: the peak is lower and the infections are spread out over more days.

days_vs_infected %>% group_by(vaccinated) %>% summarize(max_infected = max(infected)) %>%
    t() %>% kable() %>% kable_styling(bootstrap_options = "striped")  # peak number infected per vaccination level

The table above shows the maximum number of people that are infected at the same time based on the vaccination percentage of the population.

Disease Length vs. Population Vaccination Percentage

The following simulation focuses on the average length of the disease, measured as the number of days it persists in the population, as a function of the proportion of the population that is inoculated. The simulation was run 1,000 times at each inoculation level, and the average number of days each outbreak lasted was recorded.
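The averaging step can be sketched as a sweep over inoculation levels. In this minimal sketch, `outbreak_days` is a hypothetical stand-in for one run of the simulation that returns how many days the disease lasted; that callback, the function name, and the use of the run index as the random seed are assumptions, not details of the original code.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Average outbreak length for each inoculation percentage 0, step, 2*step, ...
// up to 100. outbreak_days(percent_inoculated, seed) is a hypothetical callback
// that runs one simulation and returns the number of days until nobody was sick.
std::map<int, double> average_length_by_inoculation(
    const std::function<int(int, unsigned)>& outbreak_days, int runs, int step) {
    assert(runs > 0 && step > 0);
    std::map<int, double> average;
    for (int pct = 0; pct <= 100; pct += step) {
        long long total_days = 0;
        for (int run = 0; run < runs; ++run)
            total_days += outbreak_days(pct, static_cast<unsigned>(run));
        average[pct] = static_cast<double>(total_days) / runs;
    }
    return average;
}
```

Calling it with `runs = 1000` gives one averaged point per inoculation level, the quantity plotted against the proportion inoculated; the step size (for example 5) is a free choice here.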

Visualization

disease_length_data <- read.delim("disease_length.tsv", sep = "\t")  # read in dataset
disease_length_data %>% round(digits = 1) %>% head() %>% kable() %>%
    kable_styling(bootstrap_options = "striped")  # preview the data
disease_length_data %>% ggplot(aes(x = proportion_vaccinated, y = disease_length)) +
    geom_point() + geom_line() +
    ggtitle("Proportion of Population Inoculated vs. Average Length of the Disease") +
    ylab("Length of Disease (days)") + xlab("Proportion of Population Inoculated (%)") +
    theme_minimal() +
    scale_x_continuous(limits = c(0, 100), breaks = seq(0, 100, by = 5)) +
    geom_vline(xintercept = 80, lty = 3, color = "red")  # 80% inoculation mark

The plot above shows that the average length of the disease increases with the proportion of the population inoculated up to about 80%, and then drops rapidly past that mark. This is likely because our model assumes that a person who has been sick and recovered is immune to the disease.

Note that when the inoculation percentage is 0%, the disease rapidly makes its way through the whole population, so the simulation terminates quickly. As more people are inoculated, they slow down the spread, so the disease no longer burns through everyone like wildfire and instead lingers for more days on average.

This also shows a limitation of the simulation: because it assumes that patients are immune after they recover from the disease, it is less applicable to the real world than models that take more variables into account.

On the other hand, a significant result of this model is that it shows how quickly the disease spreads through the entire non-inoculated portion of the population. As more people become inoculated, the rate at which the disease spreads through the non-inoculated population slows down. This suggests that between 0% and 80% inoculation, vaccination mainly slows the spread of the disease to the non-inoculated people, while the most significant drop in the average length of the disease occurs once more than 80% of the population is inoculated. Beyond that point, the length of time the disease lasts in the population decreases dramatically, because so many people are inoculated that the disease cannot multiply rapidly when people interact with one another.

Herd Immunity

The herd immunity effect is investigated by varying the proportion of the population that is inoculated while holding our other variables (population size, random interactions, infection length, infection chance) constant. The goal of this section is to determine the minimum percentage of the population that needs to be inoculated in order for 95% of the population to never get sick.
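Finding that minimum percentage from a sweep of simulation results can be sketched as follows. Because a single run is random (an outbreak can fizzle out early even when few people are inoculated), this sketch assumes the threshold should be the lowest inoculation level from which every higher tested level also keeps at least 95% of the population infection-free. The function name `herd_immunity_threshold` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Each point is (percent inoculated, percent never infected) from one run.
// Returns the lowest inoculation level from which all higher tested levels
// also reach the target, or std::nullopt if even the highest level misses it.
std::optional<double> herd_immunity_threshold(
    std::vector<std::pair<double, double>> points, double target = 95.0) {
    assert(target >= 0.0 && target <= 100.0);
    std::sort(points.begin(), points.end());
    std::optional<double> threshold;
    // Walk down from the highest inoculation level until the target is missed.
    for (auto it = points.rbegin(); it != points.rend(); ++it) {
        if (it->second < target) break;
        threshold = it->first;
    }
    return threshold;
}
```

Scanning from the top rather than taking the first level that happens to pass keeps a lucky low-inoculation run from being reported as the threshold.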

Visualization

herd.immunity.1 <- read.delim("herd_immunity1.tsv", sep = "\t") %>%
    round(digits = 2)  # read in dataset
# convert the proportion never infected into a percentage
herd.immunity.1$percent.never.infected <- herd.immunity.1$percent.never.infected * 100
herd.immunity.1 %>% head() %>% kable() %>% kable_styling(bootstrap_options = "striped")  # sample output of the data

# grid of results: drop the row-index column X and convert proportions to percentages
herd.immunity.2 <- read.delim("herd_immunity2.tsv", sep = "\t") %>%
    dplyr::select(-X) %>% round(digits = 4) %>% mutate_all(function(x) x * 100)

# label rows "Y<value>" by reusing the column names without their first character
# (the "X" that read.delim prepends to numeric names); assumes a square grid
rownames(herd.immunity.2) <- paste0("Y", substring(colnames(herd.immunity.2), 2))

herd.immunity.2 %>% dplyr::select(1:6) %>% head() %>% kable() %>%
    kable_styling(bootstrap_options = "striped")  # sample output of the data

In the following graph, each data point represents the percentage of people who never got infected in one simulation, holding constant the same variables stated in section 1 while varying the proportion of the population that is inoculated.

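herd.immunity.1 %>% ggplot(aes(x = prob_vaccinated, y = percent.never.infected)) +
    geom_point() + geom_line() +
    ylab("Population Never Infected (%)") + xlab("Proportion of Population Inoculated (%)") +
    theme_minimal() +
    geom_vline(xintercept = 80, lty = 3, color = "red")  # 80% inoculation mark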

For the current population to have a herd immunity effect, 80% of the population needs to be vaccinated. Interestingly, this is the same percentage that appeared in Figure 2: the point at which herd immunity is reached (95% or more of the people never getting infected) is also the point where the length of the disease begins to decrease.

Figure 3.2

Figure 3.2 visualizes the proportion of the population that needs to be inoculated based on the chance of becoming infected with the disease. The goal is to keep the share of the population that gets infected at or below 5% by the end of the simulation.

In Figure 3.2, each tile represents a simulation with the same variables as the previous models. The difference is that we are varying:

  • Proportion of the population inoculated (x-axis)
  • Probability of disease contagiousness (y-axis)

When a tile is red, more than 5% of the population in that simulation was infected with the disease. Blue tiles represent simulations in which 5% or less of the population was infected.

For instance, in the case from Figure 3.1, where the disease contagiousness is 10%, we can see that 85% of the population needs to be inoculated for 95% or more of the population to never get sick.
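The red/blue rule for the tiles can be sketched as a small classification over a grid of results. This sketch assumes the grid stores the percentage of the population that was infected, with one row per infection chance and one column per inoculation level in increasing order; the names `tile_color`, `classify_tiles`, and `min_blue_column` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Rule from Figure 3.2: red if more than 5% of the population was infected,
// blue if 5% or less.
char tile_color(double percent_infected) {
    return percent_infected > 5.0 ? 'R' : 'B';
}

// grid[row][col]: percent infected for one infection chance (row) and one
// inoculation level (col). Returns one string of 'R'/'B' tiles per row.
std::vector<std::string> classify_tiles(const std::vector<std::vector<double>>& grid) {
    std::vector<std::string> tiles;
    for (const auto& row : grid) {
        // every row should cover the same inoculation levels
        assert(row.size() == grid.front().size());
        std::string colors;
        for (double pct : row) colors += tile_color(pct);
        tiles.push_back(colors);
    }
    return tiles;
}

// Index of the first blue tile in a row (the lowest inoculation level that
// kept infections at or below 5%), or -1 if the whole row is red.
int min_blue_column(const std::string& row_tiles) {
    std::size_t pos = row_tiles.find('B');
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

Reading `min_blue_column` for the row with a 10% infection chance corresponds to reading the 85% figure off the plot by eye.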

Social Distancing

This section investigates how social distancing affects the disease propagation model. Our population is meant to represent the UT population, using the following variables:

  • Population size: 40,000
  • Infection Length: 7 days
  • Infection Chance: 10%
  • Amount of Population Inoculated: 0%

These variables are held constant while the number of interactions is varied to visualize how the maximum number of people infected in a single day changes.
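Reading the single-day peak off each run, which is what the `max(num_sick)` summaries in Tables 4.1 and 4.2 report, can be sketched as a sweep over contact counts. Here `daily_sick(contacts)` is a hypothetical callback that runs one simulation with the given number of daily contacts and returns the number of sick people on each day; the names `Peak`, `peak_of`, and `peaks_by_contacts` are also assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <vector>

struct Peak {
    int day = 0;   // 1-based day on which the peak occurred
    int sick = 0;  // number of people sick on that day
};

// Highest single-day number of sick people in one run (earliest day if tied).
Peak peak_of(const std::vector<int>& sick_per_day) {
    assert(!sick_per_day.empty());
    auto it = std::max_element(sick_per_day.begin(), sick_per_day.end());
    return Peak{static_cast<int>(it - sick_per_day.begin()) + 1, *it};
}

// Runs the hypothetical daily_sick(contacts) callback once per contact level.
std::map<int, Peak> peaks_by_contacts(
    const std::function<std::vector<int>(int)>& daily_sick,
    const std::vector<int>& contact_levels) {
    std::map<int, Peak> peaks;
    for (int contacts : contact_levels)
        peaks[contacts] = peak_of(daily_sick(contacts));
    return peaks;
}
```

Passing `{10, 20, 40}` and then `1` through `9` as the contact levels mirrors the two filters used for Figures 4.1 and 4.2; the peak day is kept as well, since flattening the curve can also delay the peak.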

Visualization

social_distancing <- read.delim("social_distancing.tsv", sep = "\t") %>% 
    mutate(contacts = as.factor(contacts))  # read in dataset

Figure 4.1

soc.dis.1 <- social_distancing %>% dplyr::filter(contacts %in% 
    c(10, 20, 40))  # only rows with 40, 20, or 10 contacts
soc.dis.1 %>% ggplot(aes(x = day, y = num_sick, color = contacts)) + 
    geom_line()
soc.dis.1 %>% group_by(contacts) %>% summarise(max = max(num_sick)) %>% 
    kable() %>% kable_styling(bootstrap_options = "striped")

Figure 4.1 and Table 4.1 show hardly any decrease in the maximum number of sick people in a day when people come into contact with 40, 20, or 10 people per day. To investigate this further, the number of contacts is reduced even more to see whether it affects the maximum number of sick people in a day.

Figure 4.2

soc.dis.2 <- social_distancing %>% dplyr::filter(contacts %in% 
    seq(1, 9))  # only rows with 1 to 9 contacts
soc.dis.2 %>% ggplot(aes(x = day, y = num_sick, color = contacts)) + 
    geom_line()
soc.dis.2 %>% group_by(contacts) %>% summarise(max = max(num_sick)) %>% 
    kable() %>% kable_styling(bootstrap_options = "striped")

Figure 4.2 and Table 4.2 show that the most significant changes in the maximum number of sick people occur in this range, especially when interactions are limited to coming into contact with 1 person per day. This isn't a feasible option for everyone, but those who can limit their daily interactions should do so, as it keeps the disease from spreading rapidly.

Conclusion

Note that this simulation takes very few variables into account compared to much more sophisticated models. The experiment isn't meant to be accurate; rather, it shows the relationships between these variables and how they affect an arbitrary disease.

After visualizing the data from the experiments, we can see that these variables do play a role in the spread of a disease in the simulations. I hope the visualizations and experiments above helped give more perspective on managing the spread of a disease.