









The Birthday Problem, also known as the Birthday Paradox, is a classic problem in probability theory that deals with the likelihood of at least two people in a group sharing the same birthday. Although it is called a "paradox," it is not a true paradox but rather a surprising and counterintuitive result in probability. The Birthday Problem was first posed by the Austrian-born mathematician Richard von Mises in 1939. It gained popularity as an illustration of how human intuition often fails to grasp the true nature of probability and chance.

The problem can be stated as follows: In a group of people, what is the probability that at least two of them have the same birthday? For simplicity, we assume there are 365 days in a year (ignoring leap years), and all birthdays are equally likely.

Solution for 2 people in the room: There are 365 possible birthdays for the first person, and only 1 of those days would result in a shared birthday with the second person. Thus, the probability of a shared birthday is 1/365. A slightly different way to think about it is that the first person has 365 days to "choose" from out of 365, while the second person has only 1 of 365 ways to match, giving (365/365) × (1/365) = 1/365. Yet another way to think about it (trust me, this is the one that scales) is to find the probability that they have different birthdays and then subtract that from 1 to find the probability that they share a birthday. The first person can have any of the 365 days as their birthday, and the second person then has 364 days to choose from. The resulting calculation is 1 − (365/365) × (364/365) = 1 − 364/365 = 1/365.

Solution for 3 people in the room: When there are three people in the room, it is harder to use the first two methods we used for 2 people in the room. This is because you must do multiple comparisons between sets of different people (a similar problem to the Handshake Problem).
With three people A, B, and C, we need to consider the following scenarios: A has the same birthday as B; A has the same birthday as C; B has the same birthday as C; A, B, and C all have the same birthday. Another issue is that there is some overlap (e.g., A having the same birthday as B is also counted in the scenario where A, B, and C all have the same birthday). In order to calculate the probability we instead consider the following mutually exclusive events: A has the same birthday as B and C has a different birthday; A has the same birthday as C and B has a different birthday; B has the same birthday as C and A has a different birthday; A, B, and C all have the same birthday.

The first three all result in the same probability. To calculate it, the first person can have a birthday on any of the 365 days, the second person has only 1 day available if they are to have the same birthday as the first person, and the third person can have their birthday on any of the remaining 364 days: (365/365) × (1/365) × (364/365) = 364/365². This occurs three times, so the total probability is 3 × 364/365². For the scenario of all three having the same birthday, the first person can have a birthday on any of the 365 days, and the second and third person must have the same birthday, so each has only 1 day available: (365/365) × (1/365) × (1/365) = 1/365². Adding these together we get 3 × 364/365² + 1/365² = 1093/365² ≈ 0.0082.
As the number of people n increases, the probability of at least one shared birthday grows rapidly. For example, with a group of only 23 people, the probability of at least one shared birthday is just over 50%. This result is often surprising because 23 is much smaller than what our intuition suggests, which is why the Birthday Problem is considered a fascinating illustration of probability theory.
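A quick way to check this claim in R, using the same built-in pbirthday() function that the code below relies on (a minimal sketch; the group sizes are just illustrative choices):

# Probability of at least one shared birthday for a few group sizes
sapply(c(5, 10, 23, 50), function(n) pbirthday(n, classes = 365))
# The value for n = 23 is just over 0.5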
This R code demonstrates how to calculate the probability of at least two people sharing a birthday in a group (the Birthday Problem) using various methods, including built-in functions, custom functions, and simulations. It covers groups of 2 and 3 people. The code starts with preliminary results for demonstration purposes, using the built-in pbirthday function to calculate the probability for 2 people. It then calculates the same probability using basic arithmetic and a custom function p_no_match. After that, the code simulates the Birthday Problem for 2 people by generating random birthdays and checking for duplicates. It repeats this simulation 100,000 times (1e5) to approximate the probability, and then compares the probability obtained through simulation to the exact probability calculated using pbirthday. The code continues by doing the same for a group of 3 people: it calculates the probability using pbirthday, basic arithmetic, and simulation, and compares the simulation probability to the exact probability. Finally, the code introduces a custom function my_pbirthday that uses the permutation formula to calculate the probability.

# R has a built-in function, pbirthday
message("Preliminary results for demonstration")
pbirthday(n = 2, classes = 365)

# Same probability from basic arithmetic
1 / 365
1 - (365 / 365) * (364 / 365)
# Custom function to calculate the probability that there are no matches
p_no_match <- function(d, n) {
  prod(d:(d - n + 1)) / d^n
}
1 - p_no_match(365, 2)
message("\n\n----- 2 People in Room -----")
# Simulation approach
set.seed(6644)
(birthdays <- sort(sample(365, size = 25, replace = TRUE)))
# Console output (truncated): 187 204 213 229 ...
message(paste("Any duplicated birthdays? ", any(duplicated(birthdays))))
# Do this many times to approximate the probability calculated above
check_sample <- function(d, n) {
  birthdays <- sample(d, size = n, replace = TRUE)
  any(duplicated(birthdays))
}
results <- replicate(1e5, check_sample(365, 2))
message(paste("Probability from simulation: ", mean(results)))
message(paste("Probabiltiy from prbirthday: ", pbirthday(n = 2 , classes = 365 )))
# See how far off we were
diff <- mean(results) - pbirthday(n = 2, classes = 365)
message(paste("Difference between simulation and exact: ", diff))
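The 3-person portion and the my_pbirthday function described above are not included in this listing. A minimal sketch of what they could look like, following the same pattern as the 2-person case (an assumption, not the original code):

message("\n\n----- 3 People in Room -----")
pbirthday(n = 3, classes = 365)

# Basic arithmetic: 1 minus the probability that all three birthdays differ
1 - (365 / 365) * (364 / 365) * (363 / 365)

# Simulation, reusing check_sample() from above
results <- replicate(1e5, check_sample(365, 3))
message(paste("Probability from simulation: ", mean(results)))
message(paste("Difference between simulation and exact: ",
              mean(results) - pbirthday(n = 3, classes = 365)))

# Custom function using the permutation formula:
# P(at least one match) = 1 - perm(classes, n) / classes^n
my_pbirthday <- function(n, classes = 365) {
  1 - prod(classes:(classes - n + 1)) / classes^n
}
my_pbirthday(3)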
Estimating pi by throwing random darts at a circle inscribed in a unit square is a classic example of the Monte Carlo method, a computational technique that uses random sampling to solve mathematical problems. This approach is sometimes called the "dart method" and is closely related in spirit to Buffon's needle problem, an earlier scheme for estimating pi from random trials. The Monte Carlo method was named after the famous casino in Monaco, as it relies on random sampling and probability, much like gambling. The technique was developed during the 1940s by Stanislaw Ulam, John von Neumann, and Nicholas Metropolis while working on the Manhattan Project. However, the idea of using random sampling to estimate pi dates back to the 18th century, with the work of Georges-Louis Leclerc, Comte de Buffon, a French mathematician and naturalist.

The problem can be stated as follows: Given a circle of radius 1/2 inscribed in a square of side length 1, we can estimate the value of pi by throwing random darts at the square and counting the number of darts that land inside the circle. The circle's area is given by the formula πr², where r is the radius. Since the radius of the circle is 1/2, the circle's area is π(1/2)² = π/4. The square's area is 1. A dart thrown uniformly at random therefore lands inside the circle with probability equal to the ratio of the two areas, (π/4)/1 = π/4. We can estimate this probability by "throwing" random darts at the square and finding the proportion that land in the circle. Let c be the number of darts that land in the circle and n be the total number of darts thrown.
The estimate of π is 4 times the proportion of darts that land in the circle: π ≈ 4 × c/n.
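For instance, if 7,850 out of 10,000 simulated darts landed inside the circle, the estimate would be 4 × 0.785 = 3.14 (these counts are made up purely to illustrate the arithmetic).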
This R code demonstrates how to approximate the value of pi using the Monte Carlo method, by simulating random darts thrown at a circle inscribed in a square, as previously discussed. The code includes visualizations using the ggplot2 and ggforce packages.
● Load the required libraries (ggplot2 and ggforce).
● Generate 1000 random x and y coordinates in the range of 0 to 1.
● Determine if the points lie inside the circle using the distance formula (x − 0.5)² + (y − 0.5)² < 0.5².
● Create a scatter plot of the points, color-coded based on whether they are inside or outside the circle.
● Calculate the proportion of points inside the circle and the approximation of pi by multiplying the proportion by 4.
● Create a custom function get_pi() that takes the number of points to be simulated as input, generates random x and y coordinates, calculates the proportion of points inside the circle, and returns the approximation of pi.
● Run the get_pi() function with 500 points and observe the result.
● Run the simulation of 500 points 10,000 times (1e4), and store the results in a variable pis.
● Analyze the distribution of pi approximations using summary statistics, a histogram, and quantiles.
● Repeat the previous two steps with a higher number of points (10,000) to observe the improvement in the accuracy of the pi approximation.

# If you don't have the following packages, then install them by uncommenting
# install.packages(c("ggplot2", "ggforce"))
library(ggplot2)
library(ggforce)

# Generate many uniform xs and ys between 0 and 1
x <- runif(1e3)
y <- runif(1e3)

# Logical indicating whether or not the point "landed" inside the circle
inside_circle <- (x - 0.5)^2 + (y - 0.5)^2 < 0.5^2

# Proportion of darts inside the circle, times 4, approximates pi
mean(inside_circle) * 4
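The color-coded scatter plot described above is not included in this listing; a minimal sketch of what it might look like, assuming ggforce's geom_circle() is used to draw the inscribed circle (the data-frame name dart_df is illustrative, not from the original code):

# Plot the darts, colored by whether they landed inside the circle
dart_df <- data.frame(x, y, inside_circle)
ggplot(dart_df, aes(x = x, y = y, color = inside_circle)) +
  geom_point(alpha = 0.5) +
  ggforce::geom_circle(data = data.frame(x0 = 0.5, y0 = 0.5, r = 0.5),
                       aes(x0 = x0, y0 = y0, r = r), inherit.aes = FALSE) +
  coord_fixed() +
  theme_minimal()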
# Custom function that repeats the above for any number of points
get_pi <- function(n_points) {
  x <- runif(n_points)
  y <- runif(n_points)
  inside_circle <- (x - 0.5)^2 + (y - 0.5)^2 < 0.5^2
  mean(inside_circle) * 4
}

# See what happens if we select 500 points
get_pi(500)
# Run the simulation of 500 points 10,000 times
pis <- replicate(1e4, get_pi(500))

# Look at the distribution of pi approximations
summary(pis)
hist(pis, 100)
quantile(pis, probs = c(0.025, 0.5, 0.975))
# Run the simulation with many, many more points
pis <- replicate(1e4, get_pi(1e4))
summary(pis)
hist(pis, 100)
quantile(pis, probs = c(0.025, 0.5, 0.975))
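One way to quantify the improvement in accuracy mentioned above is to compare the spread of the two sets of estimates. A small sketch (the object names pis_500 and pis_10000 are illustrative, not from the original code, and fewer replications are used to keep it quick):

pis_500 <- replicate(1e3, get_pi(500))
pis_10000 <- replicate(1e3, get_pi(1e4))
sd(pis_500)    # spread of estimates with 500 darts per trial
sd(pis_10000)  # noticeably smaller spread with 10,000 darts per trial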
This R code demonstrates how to approximate the integral of f(x) = sin(πx) over [0, 1] using Monte Carlo integration.
● Create a function called monte_carlo_integration that takes the number of random points (n) as an input and performs the following steps:
○ Generate n random points (x) uniformly distributed over the interval [0, 1].
○ Evaluate the function f(x) for each random point x.
○ Calculate the sum of all the function values (f_x) and multiply it by the length of the interval (b − a = 1) divided by the total number of random points (n) to get the integral approximation.
○ Return the random points (x), function values (f_x), and the integral approximation.
● Perform the Monte Carlo integration for 1e7 random points and print the integral approximation.
● Create a ggplot2 visualization that shows the function curve in red, the random points in blue, and vertical dashed lines connecting 50 of the random points to the x-axis.
The code also prints the exact answer using R's built-in integrate function, allowing you to compare the Monte Carlo approximation with the true integral value.

# Load the ggplot2 library for visualization
library(ggplot2)
set.seed(1)

# Define the function f(x) = sin(pi * x)
f <- function(x) {
  sin(pi * x)
}

# Monte Carlo integration
monte_carlo_integration <- function(n) {
  # Generate n random points uniformly distributed over [0, 1]
  x <- runif(n)
  # Evaluate the function f(x) for each random point x
  f_x <- f(x)
  # Take the value of the function times the width of each rectangle and sum
  integral_approximation <- (1 / n) * sum(f_x)
  return(list(x = x, f_x = f_x, integral_approximation = integral_approximation))
}

# Number of random points
n <- 1e7

# Approximate the integral using Monte Carlo integration
result <- monte_carlo_integration(n)
integral_approximation <- result$integral_approximation
print(paste("Integral approximation using", n, "random points:", integral_approximation))
# Output from the previous print (truncated): "... 0.636563788515587"

print(paste("Exact answer", integrate(f, 0, 1)$value))
# Select a random sample of 50 points for visualization
sample_size <- 50
sample_indices <- sample(1:n, sample_size)

# Create a data frame for ggplot with the random sample
data <- data.frame(x = result$x[sample_indices], f_x = result$f_x[sample_indices])

# Create the visualization
ggplot(data, aes(x = x, y = f_x)) +
  geom_point(color = "blue", alpha = 0.5) +
  geom_segment(aes(x = x, y = 0, xend = x, yend = f_x), linetype = "dashed", alpha = 0.5) +
  stat_function(fun = f, geom = "line", color = "red") +
  labs(title = "Monte Carlo Integration",
       subtitle = paste("Integral Approximation:", round(integral_approximation, 4)),
       x = "x", y = "f(x)") +
  theme_minimal()
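As a follow-up, one could check how the approximation converges by rerunning the function for several sample sizes and comparing against integrate(). A small sketch (the sample sizes are arbitrary illustrative choices):

# Compare the Monte Carlo error for increasing numbers of points
exact <- integrate(f, 0, 1)$value
for (n_points in c(1e3, 1e5, 1e7)) {
  approx <- monte_carlo_integration(n_points)$integral_approximation
  print(paste("n =", n_points, "-> absolute error:", abs(approx - exact)))
}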