Yahoo Answers is shutting down on May 4th, 2021 (Eastern Time) and the Yahoo Answers website is now in read-only mode. There will be no changes to other Yahoo properties or services, or your Yahoo account. You can find more information about the Yahoo Answers shutdown and how to download your data on this help page.
Trending News
Monkeys typing poetry at random?
Suppose we let k monkeys type on k typewriters at random typing any character next with a probability of q. When a monkey dies it is immediately replaced by another monkey. Each typewriter is pressed 10^7 times a year. We let this experiment run for 10^10 years (a little less than the entire life of Earth). What the monkeys do not know is, that they are actually set up for the specific task of randomly producing the following quote from Shakespeares 'As You Like It', Act V, Scene 1:
The fool doth think he is wise, but the wise man knows himself to be a fool.
This quote is 76 characters long (counting punctuation and spaces). The capital 'T' at the beginning is the only capital 'T' throughout the phrase. The question is:
what is the longest substring (starting from the beginning of the phrase) that has an expected time of at most 10^10 years to occur.
Feel free to consider simpler cases and provide answers to these - it might aim towards a solution of the full sized problem. This question was inspired by the following:
@gianlino: Let M be a set of characters. Then form k independent sequences of characters chosen uniformly from M. So q = 1/|M| or |M| = 1/q as Michael stated. Now consider a specific sequence a(1),...,a(76) in M with a(i) ≠ a(1) for i>1. Then I ask:
What is the longest length L of a subsequence of the form a(1),...,a(L) so that the expected length of the k (equally long) sequences is at most 10^17?
@Michael: I covered the case k=1 in the related q. I also gave specific examples:
1. k=2 didn't differ much from half the expectation [E(k=1)=10000 vs. E(k=2)=5000.75]
2. a(i) = a(1) for some i>1 did matter [E(012345)=1000000 vs. E(012034)=999000]
I am mainly interested in the cases k>1 as I already solved k=1, but, being too ignorant as always =), I din't know this was called "elementary renewal".
@gianlino: the expected length of the k sequences, for a(1),...,a(L) to occur, that is...
@gianlino: The sequence a(1),...,a(L) must occur at least once in one of the k equally long independent sequences of characters chosen uniformly from M. For which L is the expected length of the k sequences at most 10^17 for that to happen?
@gianlino: Let E(L,k) denote the expected time for a(1,L) = a(1),...,a(L) to occur in k sequences. Clearly E(L,k) > L. But if we 'just divide by k' then E(L,k) = E(L,1)/k. Then we can choose k big enough to get E(L,k) < L. A contradiction! I am particularly interested in generalizations to large k's.
@Michael: Hilarious! Your test makes me think of gianlino's prev. q "how do the monkeys interact". Some other monkey called Stein already wrote "rose is a rose is a rose". I am sure that is just pure coincidence. Otherwise this suggests that real life monkeys may not conform to the model of being independent... I'll get back to you about your bounds when I have read it more carefully. Looks interesting indeed!
1 Answer
- Anonymous8 years agoFavorite Answer
**EDITED: I wrote and rewrote my answer several times, first giving conservative bounds (I elaborate on those below), and then striving to get tighter ones. I now erase my analysis of the tighter bounds because I based them on the following incorrect claim: I defined X[n] as an indicator function that is 1 if the desired L-character string ends at slot n, and 0 else. I then claimed (erroneously):
Pr[X[n]=1 | X[r] = 0 for all r < n]
= Pr[X[n] = 1 | X[r] = 0 for all r in {n-L+1, ..., n-1}]
While this claim seemed intuitive, it is not true because the value of X[r] for r < n-L+1 can influence the values of X[m] for m in {n-L+1, ..., n-1}, and hence ultimately has some loose dependency with X[n].
**Okay, I'll revert back to my more conservative bounds from before:
1) Upper bound. Let STRING be the desired L-character string. Let T be the number of time slots for the first monkey to type STRING. Break the timeline into consecutive L-character blocks. Let T' be the time for the first occurrence of STRING in one of these blocks (necessarily a multiple of L). Note that T' >= T since now the problem is more constrained.
Looking at a particular block, the probability that at least one monkey types STRING in that block is 1 - (1-q^L)^k. Thus:
E[T] <= E[T'] = L/[1 - (1-q^L)^k]
Using 1/q = 55, k=1000000, L=12 gives E[T'] ~= 9.1946 x 10^(15).
http://www.wolframalpha.com/input/?i=12%2F%281-%28...
Therefore, E[T] <= E[T'] <= 10^(17) if L=12. So we can do at least 12 characters, being "The fool dot"
2) Lower bound. Let n >= L. For a single monkey, the expected number of occurrences of STRING in n time slots is exactly (n-L+1)q^L. Thus:
Pr[monkey types STRING at least once in n slots] <= (n-L+1)q^L
Pr[monkey does not type STRING in first n slots] >= 1 - (n-L+1)q^L
Define g(n) = Pr[monkey does not type STRING in first n slots], so g(n) >= 1 - (n-L+1)q^L. Then, assuming (n-L+1)q^L < 1:
Pr[None of the k monkeys type STRING in first n slots]
= g(n)^k
>= [1 - (n-L+1)q^L]^k
Then:
E[T]
= sum_{n=0}^{infinity} Pr[T>n]
= sum_{n=0}^{infinity} Pr[None of the monkeys type STRING in first n slots]
= L-1 + sum_{n=L}^{infinity} Pr[None of the monkeys type STRING in first n slots]
= L-1 + sum_{n=L}^{infinity} g(n)^k
>= L-1 + sum_{n in [L, …, L-1 + 1/q^L]} g(n)^k
>= L-1 + sum_{n in [L, …, L-1+1/q^L]} (1-(n-L+1)q^L)^k
This lower bound is hard to numerically compute because it sums over so many terms! Let's lower bound this lower bound:
Define z = k/(k+1). Then 1 - (n-L+1)q^L >= z if and only if n <= L - 1 + (1-z)(1/q)^L. Let M be the number of slots in the set {L, L+1, …, floor( L-1 + (1-z)(1/q)^L)}. Note that:
M >= [L-1 + (1-z)(1/q)^L - 1] - L + 1 = (1-z)(1/q)^L - 1.
Then:
sum_{n in [L, …, L-1+1/q^L]} (1-(n-L+1)q^L)^k
>= Mz^k
>= z^k[(1-z)(1/q)^L - 1]
>= z^k(1-z)(1/q)^L - 1
= (k/(k+1))^k (1/(k+1))(1/q)^L - 1
>= (1/e)(1/(k+1))(1/q)^L - 1
Thus:
E[T] >= L-2 + (1/e)(1/(k+1))(1/q)^L
For 1/q = 55, k=1000000, L=14 we get:
lower bound ~= 8.52673 x 10^17.
http://www.wolframalpha.com/input/?i=14-2+%2B+%281...
So L=14 is too large. Since we know L=12 works in this case, the answer is either L=12 or L=13. =)
***
Once again I will (pretend?) to be a monkey and test this via random typing:
sfawp awjRose is a rose is a rose is a rose092-4t9 jskv wlfmsfw0e2onfaosd
Nope. No Shakespeare this time either. =)