Research paper on Among Us Dumpy Bot

This was my final project for ICS 80 (Data Science). Enjoy this silly little analysis of a shitpost bot. The source code for each graph is included. As this was a group project, please note that a lot of the writing is not me speaking. Please also note that the data used is from July 27th to December 6th.


An analysis of the growth and resource usage of Among Us Dumpy Bot

Kainoa Kanter, Chun Yin Harris Wan, Brian Kim, José Valdivia

12/10/2021


What is Among Us Dumpy Bot?

Simply put, Among Us Dumpy Bot is an add-on (referred to as a “bot”) for the social media platform Discord that takes any image and converts it into a series of twerking crewmates from the hit game Among Us. The bot received a fair amount of recognition and notoriety online due to the sheer absurdity and hilarity of what it can produce, and it was eventually added to over 30,000 groups (or “servers”), accumulating over 1.6 million unique users over a period of around 6 months.

We are interested in analyzing two things: how user engagement affects the overall popularity and growth of an application, and how well the physical server hosting the bot scales to a higher workload.


Question 1

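library(tidyverse)  # dplyr, purrr, and ggplot2
library(jsonlite)   # fromJSON()

# Give each snapshot row an ID so parsed commands can be traced back to it
calc <-
  data %>%
  mutate(rowID = row_number())

# Each `popular` cell holds a JSON array of {name, count} records;
# parse every cell and stack the results into one data frame
popular <-
  map_dfr(
    set_names(calc$popular, calc$rowID),
    function(pop_var) fromJSON(pop_var),
    .id = "rowID"
  )

# Total the counts for each command, then plot one bar per command
popular %>%
  group_by(name) %>%
  summarise(count = sum(as.numeric(count))) %>%
  ggplot(aes(
    x = count,
    y = reorder(name, count)   # sort bars by total usage
  )) +
  geom_col(fill = "#c4a7e7") +  # bar colour
  theme_rose_pine() +           # custom theme, not part of ggplot2
  scale_x_continuous(
    limits = c(0, 265000),
    labels = scales::comma,
    breaks = seq(0, 265000, by = 50000)
  ) +
  labs(x = "# of times used",
       y = "Name of command",
       title = "Total command popularity") +
  geom_text(aes(label = count),
            hjust = -0.05)      # print each total just past the end of its bar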

This bar chart illustrates the popularity of each command available in the bot. The x-axis indicates the number of times each command was used, and the y-axis indicates the name of the command. The dumpy command is the most popular, most likely because generating dumpy images is what the bot is mostly used for. However, we can also see the usage of the other commands. For example, the help command (which lists the other commands the bot contains) has been used a fair number of times, as has the background command, which lets users set a custom background for all images generated.
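For readers who don't use the tidyverse, the aggregation behind the bar chart can be sketched in base R. This assumes that, after parsing, the `popular` data has one row per command per snapshot with `name` and `count` columns (as the plotting code implies) and that counts arrive as text (hence the `as.numeric()`); the numbers below are made up for illustration.

```r
# Hypothetical parsed `popular` data: one row per command per snapshot.
# Counts are stored as strings, which is why they are converted with as.numeric().
popular <- data.frame(
  rowID = c("1", "1", "1", "2", "2", "2"),
  name  = c("dumpy", "help", "background", "dumpy", "help", "background"),
  count = c("120", "15", "4", "95", "10", "6"),
  stringsAsFactors = FALSE
)

# Sum each command's counts across all snapshots
# (base-R equivalent of group_by(name) %>% summarise(count = sum(...)))
totals <- aggregate(as.numeric(popular$count),
                    by = list(name = popular$name),
                    FUN = sum)
names(totals)[2] <- "count"

# Sort from most to least used, matching the bar order in the chart
totals <- totals[order(totals$count, decreasing = TRUE), ]
print(totals)
```

Sorting the totals in decreasing order mirrors what `reorder(name, count)` does for the bar order in the chart.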


Question 2

# One point per snapshot: memory load vs. number of active users
ggplot(data,
       aes(x = memload,
           y = active)) +
  geom_point() +
  theme_rose_pine() +
  scale_y_continuous(
    labels = scales::comma,
    breaks = seq(0, 2000, by = 400)
  ) +
  labs(x = "Memory load (%)",
       y = "# of active users",
       title = "Relationship Between User Engagement and Resource Usage")

The scatter plot illustrates the relationship between user engagement and resource usage. The x-axis is the memory load and the y-axis is the number of active users. From this, we can see that there is no clear direct correlation between the number of active users and the amount of memory the bot uses: snapshots in the typical range of 200-800 active users show memory usage anywhere from ~30% to ~85%.
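The visual impression of "no direct correlation" can also be checked numerically. A minimal sketch, assuming the real data has the numeric columns `memload` and `active` used in the scatter plot; the made-up `sample_data` below stands in for it. Spearman's rank correlation is included alongside Pearson's because it does not assume the relationship is linear.

```r
# Made-up stand-in for the real data (same column names as the scatter plot)
sample_data <- data.frame(
  memload = c(32, 45, 85, 60, 38, 72, 55, 80),
  active  = c(610, 250, 480, 790, 300, 220, 650, 400)
)

# Pearson (linear) and Spearman (rank-based) correlation coefficients
pearson  <- cor(sample_data$memload, sample_data$active, method = "pearson")
spearman <- cor(sample_data$memload, sample_data$active, method = "spearman")

# cor.test() additionally tests whether the Pearson correlation differs from 0
test <- cor.test(sample_data$memload, sample_data$active)

round(c(pearson = pearson, spearman = spearman, p_value = test$p.value), 3)
```

A coefficient near 0 with a large p-value would be consistent with the lack of relationship seen in the plot; neither number says anything about causation.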


Question 3

# Users vs. servers; geom_line() joins the points in order of server count
ggplot(data,
       aes(x = servers,
           y = users)) +
  geom_line() +
  theme_rose_pine() +
  scale_x_continuous(
    limits = c(18000, 30100), 
    labels = scales::comma,
    breaks = seq(18000, 30100, by = 3000)
  ) +
  scale_y_continuous(
    limits = c(800000, 1600000), 
    labels = scales::comma,
    breaks = seq(800000, 1600000, by = 200000)
  ) +
  labs(x = "# of Servers",
       y = "# of Users",
       title = "Relationship Between Server and User Growth")

This line graph shows the relationship between the number of servers the bot is in and user growth. The x-axis represents the number of servers the bot is in, while the y-axis represents the number of users of the bot.

The number of users in this graph starts at around 800,000 and grows steadily from there, with occasional dips. Past ~26,000 servers, however, the line jumps sharply and then evens out into a straight line going diagonally upward.

This indicates a positive relationship: as the number of servers the bot is in increases, so does the number of users. However, this does not necessarily indicate a causal relationship, and it would have to be examined further.
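One way to make the jump past ~26,000 servers concrete is to look at how many users were gained per server added between consecutive snapshots. A minimal sketch, assuming `servers` and `users` are cumulative counts recorded in time order; the numbers below are made up for illustration.

```r
# Made-up snapshots in time order (cumulative counts)
growth <- data.frame(
  servers = c(18000, 20000, 22000, 24000, 26000, 27000, 28000),
  users   = c(800000, 880000, 950000, 1020000, 1100000, 1250000, 1400000)
)

# Users gained per server gained between consecutive snapshots
users_per_server <- diff(growth$users) / diff(growth$servers)

# Label each interval by the server count it ends at
data.frame(up_to_servers = growth$servers[-1],
           users_per_server = users_per_server)
```

A sudden rise in this ratio marks the kind of jump described above. If the server count does not change between two snapshots, the division produces `Inf` or `NaN`, so real data may need those intervals filtered out first.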


Question 4

# 2D density contours of memory load vs. bandwidth
# (bandwidth is divided by 1e9 to convert bytes to gigabytes)
ggplot(data,
       aes(x = memload,
           y = bandwidth / 1e9)) +
  geom_density_2d() +
  theme_rose_pine() +
  scale_x_continuous(
    limits = c(0, 100),
    labels = scales::comma,
    breaks = seq(0, 100, by = 20)
  ) +
  labs(x = "Memory Load (%)",
       y = "Bandwidth (GB)",
       title = "Relationship Between Memory Load and Bandwidth Usage")

This is a density graph representing the relationship between the memory load of the bot on the host server and the amount of bandwidth used as the bot sends data between the host server and Discord. The x-axis represents the memory load percentage, and the y-axis represents bandwidth in gigabytes.

In the graph, the contour lines are most tightly packed around the ~10% to ~50% and ~60% to ~90% memory-load ranges below 5 GB of bandwidth, suggesting that most observations are concentrated there. There are also outliers present, as the contours stretch out into higher bandwidths.

This graph shows that there isn't much of a relationship between the two variables. Most observations cluster in a few fixed regions, with some using far more bandwidth for no reason that is apparent from this graph.
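The outliers mentioned above can also be flagged with an explicit rule rather than by eye. This sketch uses the common 1.5 × IQR (Tukey) rule, which is a swapped-in technique and not something the original analysis used; it assumes `bandwidth` is recorded in bytes, as the plotting code implies, and the sample values are made up.

```r
# Made-up bandwidth readings in bytes
bandwidth_bytes <- c(1.2e9, 2.5e9, 3.1e9, 0.8e9, 4.4e9, 2.2e9, 15e9, 3.8e9)
bandwidth_gb <- bandwidth_bytes / 1e9   # same conversion as the plot

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles
q <- quantile(bandwidth_gb, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
upper <- q[[2]] + 1.5 * iqr
lower <- q[[1]] - 1.5 * iqr

outliers <- bandwidth_gb[bandwidth_gb > upper | bandwidth_gb < lower]
outliers
```

The rule only says which readings are unusual; explaining why they used so much bandwidth would need other data, such as the number or size of images processed at that time.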


Question 5

library(lubridate)  # as_datetime()

# Convert the raw `time` column to date-times inside the data frame,
# so each time stays paired with its `active` value
data %>%
  mutate(t = as_datetime(time)) %>%
  ggplot(aes(x = t,
             y = active)) +
  geom_line() +
  theme_rose_pine() +
  scale_y_continuous(
    labels = scales::comma,
    breaks = seq(0, 2000, by = 500)
  ) +
  labs(x = "Time",
       y = "# of active users",
       title = "Relationship between time and user engagement")

The line graph shows the relationship between time and user engagement. The x-axis is the time in months from July to December, and the y-axis is the number of active users.

As shown, the number of active users started at a bit over 2,000 in July, then declined overall, with some fluctuations along the way. Close to December, the number of active users dropped drastically to around 100, which could be explained by the bot's waning popularity.

More recently, the number of active users rapidly increased to almost 1,500 and then declined again. The reason for this sudden rise is that SomeOrdinaryGamers, a YouTuber with 3 million subscribers, made a video that directly mentioned the bot, which caused a significant portion of his viewers to check the bot out.

Overall, there seems to be a negative relationship between the two variables. However, we have to consider other factors that might change this relationship.
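The "overall negative relationship" can be summarized with the slope of a least-squares trend line. A minimal sketch with made-up weekly snapshots; with the real data, the `time` column would first be converted to date-times as in the plotting code. The slope comes out in active users per day.

```r
# Made-up weekly snapshots of active users, starting at the first day of data
dates  <- as.Date("2021-07-27") + seq(0, 63, by = 7)
active <- c(2050, 1800, 1650, 1400, 1300, 1100, 900, 700, 500, 350)

# Fit active ~ time, with time measured in days since the first snapshot
days <- as.numeric(dates - dates[1])
fit  <- lm(active ~ days)

slope <- unname(coef(fit)["days"])   # change in active users per day
slope
```

A single straight line is sensitive to short spikes like the one caused by the SomeOrdinaryGamers video, so it is best read as a rough summary of the trend rather than a model of engagement.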


Conclusion

Using the data we got from Among Us Dumpy Bot (AUDB), we succeeded in answering all of our questions. We were able to determine that the most popular command was dumpy, which was used 244,697 times.

Bar plot: total popularity of each command over the collection period

Scatter plot: relationship between user engagement and server resource usage (no clear correlation)

Line graphs: relationship between server growth and user growth (positive); relationship between time and user engagement (negative)

Density graph: relationship between the memory load of the bot on the host server and the amount of bandwidth used as the bot sends data between the host server and Discord (little relationship)


Welp, thanks for reading! Feel free to check out the bot here: https://dumpy.t1c.dev