Picking up a conversation from many

Someone asked a very interesting question on Physics Stack Exchange: Listening to one of two people talking at the same time. The poster asked how we can distinguish between two people talking simultaneously. He had some hypotheses involving frequency range, for example when a man and a woman are talking. That is surely an easy case, but frequency is not the only cue we can use.

One answer went into a lot of detail about the fact that we have two ears, so a sound reaches one ear slightly before the other because of the finite speed of sound in air. Our brain can use this delay, along with other cues, to pinpoint a source's location, and probably in a similar way to isolate a single person speaking in a crowd. This is the well-known cocktail party effect.
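To get a feel for how small the delay between the two ears is, here is a rough back-of-the-envelope sketch. It assumes a distant source, an ear spacing of about 0.2 m (my own round number, not from the answer), and a straight-line path that ignores how sound bends around the head. The function name and constants are only illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C
EAR_SPACING = 0.2       # m, assumed round figure for the distance between ears


def interaural_delay(angle_deg, ear_spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Arrival-time difference (seconds) between the two ears.

    angle_deg is the direction of a distant source measured from straight
    ahead: 0 means in front, 90 means directly to one side.
    Simplified model: extra path length = ear_spacing * sin(angle).
    """
    extra_path = ear_spacing * math.sin(math.radians(angle_deg))
    return extra_path / c


if __name__ == "__main__":
    for angle in (0, 30, 90):
        delay_us = interaural_delay(angle) * 1e6
        print(f"{angle:>2} degrees: {delay_us:.0f} microseconds")
```

Under these assumptions, even a source directly to one side gives a delay of well under a millisecond, which is the kind of timing difference the brain would have to resolve.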

However, I thought that our brain was much more powerful than that. Using the excellent Audacity sound processing program, I took a sample of a man talking and superposed on it the same sample delayed by about 1 minute. I then played only the right channel of the stereo signal on a single speaker. So here I have the exact same voice, with the same frequencies, coming from a single source, but saying two completely different things. I was surprised by the ease with which I could still follow either the original speech or the time-delayed one at will.
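For anyone who wants to repeat the experiment without an audio editor, the mixing step itself is simple. The sketch below is not what Audacity does internally; it just adds a list of mono samples to a copy of itself shifted later in time. The function names and the 44100 Hz sample rate are assumptions for illustration.

```python
def delay_in_samples(delay_seconds, sample_rate):
    """Convert a delay in seconds to a whole number of samples."""
    return round(delay_seconds * sample_rate)


def mix_with_delayed_copy(samples, delay_samples):
    """Return the signal added to a copy of itself shifted later in time.

    samples is a list of mono sample values. The result is longer than the
    input by delay_samples, so the delayed copy is not cut off.
    """
    mixed = [0.0] * (len(samples) + delay_samples)
    for i, value in enumerate(samples):
        mixed[i] += value                  # original speech
        mixed[i + delay_samples] += value  # same speech, starting later
    return mixed


if __name__ == "__main__":
    # Toy signal so the overlap is easy to see by eye.
    print(mix_with_delayed_copy([1.0, 2.0, 3.0], 2))
    # A one-minute delay at an assumed 44100 Hz sample rate:
    print(delay_in_samples(60, 44100))
```

With real audio the summed samples can exceed the valid range, so in practice you would probably scale the result down (for example by half) before writing it out.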

This is probably because speech has a certain structure. For example, after hearing “cr” you expect a vowel, as in “cry”. You certainly don’t expect “crbttjk”, at least not in English. If the speaker plays “i” and “f” at the same time after I heard “cr”, my brain probably treats the “f” as noise. This is still really amazing.