Everything sound & ear training related

SoundGym

profile
Flynn Moen
Nov 04, 2022
Hello experts of SoundGym,

I am having a very hard time differentiating between short delays (5-30ms) in the Delay Control game. I can hear the phasing for very short delays (less than 5ms) which makes it easier to identify, but with 5 to 30ms delays, I cannot seem to hear any differences even when I focus my entire attention on the sound. What am I supposed to focus on when I try to differentiate between these short delays?

Thank you in advance!
profile
Andy Lowe
Nov 04, 2022
I know exactly what you mean! Would love to get some hints on this.
profile
Cuantas Vacas
Nov 04, 2022
I've been searching for accurate info about this subject for months, but all I've found are vague 'descriptions' and, usually, in contradiction with each other!
profile
Shen 某某
Nov 04, 2022
I think the game makes no sense. We don't need to figure out what a short delay sounds like or how long it is. I think spending more time on learning to tell different frequencies apart is more valuable. So I gave up on the game and never play it.
profile
James Rosewater
Nov 04, 2022
Hey Flynn,

I'm by no means an expert, but I have spent some time studying the theory in uni, which might be of some help...

The phasing (comb-filter) effect, which causes timbral changes in the audio, occurs between 0.1 and 20ms, after which the dip and peak frequencies fall too close together to be perceived. When comparing the pre- and post-delay audio, you can hear these timbral shifts because the in-phase frequencies add together and the out-of-phase frequencies cancel out.

The 20-40ms region is known as the Haas window, in which there is little change in timbre, but a slight thickening/doubling effect. In this region you still perceive a single sound, rather than a distinct echo, but I think of it as getting fuller, warmer and slightly more blurred.

Hope this is somewhat helpful :)
profile
Cuantas Vacas
Nov 05, 2022
Thanks for stopping by and sharing some clarifying information, @James Rosewater ! I'd like to ask a question about the timbral changes, which are the only thing I can perceive when comparing delays if one is between 1 and 5 ms and the other falls in the 6-20 ms range. I was wondering, for a given delay time... is the perceived intensity or quality of the timbral changes measurable? Do these two parameters behave differently depending on the instrument used? I don't know if this makes any sense, but I'm always searching for a method that lets me know what I am listening for when I'm asked to tell 3 ms from 5 ms...
profile
Flynn Moen
Nov 05, 2022
Thank you @James Rosewater for the explanation! I'll keep that information in mind when I do my next levels.

@Cuantas Vacas I am really not looking forward to telling apart 3ms from 5ms delays...
profile
James Rosewater
Nov 05, 2022
@Cuantas Vacas The timbral changes are indeed measurable and consistent. The comb filters, resulting from the phase misalignment, are essentially frequency summations and cancellations which occur at regular intervals through the frequency spectrum. These summations and cancellations will always be the same for a given delay time.

For example, at 1ms the first peak frequency is 1 kHz. All subsequent peak frequencies at this delay time are integer multiples of the first one (in this case, 2 kHz, 3 kHz, 4 kHz,...), with the cancellations being the values directly in between (500 Hz, 1.5 kHz, 2.5 kHz,...). The formula for the first peak frequency is 1 / delay time in seconds, and the first dip sits at half of that frequency.

Therefore:
1ms delay: First Peak = 1000Hz; First dip = 500Hz
2ms delay: First Peak = 500Hz; First dip = 250Hz
3ms delay: First Peak = 333Hz; First dip = 167Hz
4ms delay: First Peak = 250Hz; First dip = 125Hz
5ms delay: First Peak = 200Hz; First dip = 100Hz
10ms delay: First Peak = 100Hz; First dip = 50Hz
15ms delay: First Peak = 66.7Hz; First dip = 33.3Hz
20ms delay: First Peak = 50Hz; First dip = 25Hz
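
The values in this list can be reproduced with a few lines of Python. This is only a rough sketch, assuming the original signal and a single delayed copy are mixed at equal level and with the same polarity; the function name `comb_frequencies` is made up for illustration.

```python
def comb_frequencies(delay_ms, count=3):
    """Return (peaks, dips) in Hz for a signal mixed with one delayed copy.

    Assumption: equal level and same polarity, so peaks sit at n / T
    and dips sit halfway between them, at (n - 0.5) / T.
    """
    delay_s = delay_ms / 1000.0
    peaks = [n / delay_s for n in range(1, count + 1)]
    dips = [(n - 0.5) / delay_s for n in range(1, count + 1)]
    return peaks, dips


# Print the first peak and dip for the delay times listed above.
for ms in (1, 2, 3, 4, 5, 10, 15, 20):
    peaks, dips = comb_frequencies(ms, count=1)
    print(f"{ms:>2} ms: first peak = {peaks[0]:.1f} Hz, first dip = {dips[0]:.1f} Hz")
```

Raising `count` lists the higher peaks and dips too, which is handy for checking where the comb lands in the mids for a given delay.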

As the delay time increases, the first peak and dip frequencies get lower, and all the subsequent peaks and dips occur closer together. (The result being a downwards frequency sweep, which gets slightly less strong as you move towards 20ms)

I find that delay times between 1-10ms have a lot more pronounced resonances in the mid to upper mid frequencies, and more noticeable 'holes' where frequencies have been cancelled out, than the delay times between 10-20ms. While the shape of the comb filter will always be consistent at a given delay time, it's a lot easier to hear all these effects in audio with more harmonic content.

I hope you can use this information to develop what to listen for in these situations. I myself haven't been training my ears for that long, and am not at that level yet where I am trying to differentiate 3ms from 5ms, so I can't really say if there is any deeper method or approach to identify the different frequencies.
profile
Flynn Moen
Nov 05, 2022
Wow. I'm blown away by the explanation. I'm gonna keep a screenshot of this for the next time I play Delay Control haha
profile
Wolfgang Robinig
Nov 05, 2022
James already gave an excellent explanation!
But there's even more to it: our brain can interpret short delays as room information. The early reflections of a reverb already give us a lot of info about the room, and the first reflection alone can create some impression of the room or space the source might be in. Since it's only one reflection, the info is not distinct and can be interpreted in multiple ways:

* The size of the room: Shorter delay times indicate a smaller room size (3ms means roughly 1m difference in path length between direct sound and first reflection, which could be a small room; 30ms translates to roughly 10m difference between direct sound and first reflection, which could be a bigger room).

* The distance between the instrument and the listener: Within a fixed room size, a small predelay time means that the instrument is further away (closer to the wall behind it). With very small delay times the source sounds more distant.
On the other hand, the bigger the predelay gets, the closer the instrument will sound. (Yep, it's really this way around!)
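
The delay-to-distance figures above follow from the speed of sound. A minimal sketch, assuming roughly 343 m/s (dry air at about 20 °C); the constant and the function name `path_difference_m` are illustrative and not part of the original post:

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def path_difference_m(delay_ms):
    """Extra distance the reflection travels compared with the direct sound."""
    return SPEED_OF_SOUND_M_PER_S * delay_ms / 1000.0


for ms in (3, 15, 30, 45):
    print(f"{ms:>2} ms -> about {path_difference_m(ms):.1f} m of extra path")
```

With this assumed speed, 3 ms works out to a little over 1 m and 30 ms to a little over 10 m, which matches the rounded figures in the list.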

So you can listen for:
- phasing (1 - 20 ms)
- first peak in phasing (1 - 10 ms)
- Haas window (20 - 40 ms)
- room size (especially between 15 and 45 ms; longer delay - bigger room)
- perceived distance (especially between 15 and 45 ms; longer delay - shorter distance)

Hope that helps!
profile
Cuantas Vacas
Nov 05, 2022
😦😮 At this moment I feel amazed, overwhelmed and grateful, each at 33.33% of my whole capacity for feeling, @James Rosewater & @Wolfgang Robinig !! You both have kindly taken the time to condense your knowledge into accessible texts that make so much sense for someone who's quite lost between sound physics, perception and many misunderstood concepts taken from here and there...

Now, I might proceed as @Flynn Moen suggested, and keep your explanations printed on paper always at hand. But first I need to read them again and again until I'm sure I have understood every part of this precious information, and then replace the disastrous mess of concepts I keep in my brain for the real, good stuff!!

Thank you again. It's great to have people like you around!🙏
profile
Flynn Moen
Nov 05, 2022
Yes, thank you again @James Rosewater and @Wolfgang Robinig for taking your time to explain our questions! We greatly appreciate it 😄
profile
li lac
Nov 06, 2022
heyy @James Rosewater thankss for providing the 1st dip and peak numbers that correspond to the delay time it actually really gave me a new perspective on how to hear these 30ms delay after struggling for so long!! i do have a question tho, sometimes SG give the option of ex. 10ms vs 18ms and based on the info, 18ms is supposed to have lower 1st peak and dip compared to the 10ms one, but the delayed sound has some 300hz boost in it (from what i hear) compared to the original sound & the answer is the 18ms option, so this made me think how does the amplitude of the successive peaks and dips changes (after the first peak and dip)? i would deeply appreciate if u can share about those too!
profile
li lac
Nov 06, 2022
@James Rosewater less than 30ms delay*
profile
James Rosewater
Nov 06, 2022
@li lac Yeah of course :) I'm always happy to help!

The amplitude of the peaks/dips really depends on the frequency content that is already there in the recording.

Take for instance a delay of 10ms. At the first dip (50 Hz), that particular frequency will in theory cancel completely. At the first peak (100 Hz), the two signals will instead sum together, resulting in an increase of about 6dB. This applies to all other dips/peaks as well.
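
The theoretical numbers here can be checked with a short calculation. This is a sketch under the idealised assumption of a steady tone summed with an equal-level copy of itself; the function name `comb_gain_db` and the small cancellation threshold are choices made for this example only.

```python
import cmath
import math


def comb_gain_db(freq_hz, delay_ms):
    """Level change in dB at freq_hz when a signal is summed with an
    equal-level copy of itself delayed by delay_ms."""
    delay_s = delay_ms / 1000.0
    # Direct signal (1) plus the delayed copy, which is phase-shifted by 2*pi*f*T.
    magnitude = abs(1 + cmath.exp(-2j * math.pi * freq_hz * delay_s))
    if magnitude < 1e-12:
        # Numerically tiny: treat as complete cancellation.
        return float("-inf")
    return 20 * math.log10(magnitude)


# 10 ms delay: first dip, a point halfway between dip and peak, and first peak.
for freq in (50, 75, 100):
    print(f"{freq} Hz: {comb_gain_db(freq, 10):.2f} dB")
```

Frequencies between a dip and a peak land somewhere between full cancellation and the full boost, which is why the comb is heard as a pattern of resonances rather than as isolated tones.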

But of course when delaying a recording of a guitar or other instrument, the amplitude at any given frequency will be slightly different 10ms later, so the amount of summation/cancellation will vary ever so slightly.

Furthermore, if there is a lot of information at e.g. 300 Hz, but less in the lows, where the first peaks/dips are, the effects may be more perceivable in mids.
profile
li lac
Nov 06, 2022
@James Rosewater ahhh i seee! thanks for the explanation, so imma try summarizing to make sure im on the right track, if there's an option of 4ms and 20ms, i will be listening for the peaks and dips of 4ms which is 250,500,750hz etc..(integer multiple of the first peak) and also try to listen for dip (midpoint of peaks) vs 20ms which is 50,100,150,200,250hz etc? if all the peaks are around the same amplitudes then some of the peak frequencies (between the 4ms and 20ms one) will overlap too rightt
profile
john astacio
Nov 06, 2022
I had that issue before, but now I've figured it out: don't pay attention to the sound, JUST LISTEN TO THE ECHO of the delay
profile
James Rosewater
Nov 06, 2022
@li lac Yeah, that all looks right! Frequencies will overlap, but the resulting comb filter will also be a lot more compact as the delay approaches 20ms, resulting in less distinct peaks and dips.
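
To see the overlap and the tighter spacing in numbers, here is a small sketch comparing the 4ms and 20ms combs. It assumes whole-millisecond delay times and only looks at peaks up to 1 kHz; the function name `peak_frequencies` is hypothetical.

```python
from fractions import Fraction


def peak_frequencies(delay_ms, max_hz=1000):
    """Peak frequencies in Hz up to max_hz for an integer delay in ms."""
    # 1 / delay in seconds, kept as an exact fraction so comparisons are safe.
    first_peak = Fraction(1000, delay_ms)
    peaks = []
    n = 1
    while n * first_peak <= max_hz:
        peaks.append(n * first_peak)
        n += 1
    return peaks


peaks_4ms = peak_frequencies(4)
peaks_20ms = peak_frequencies(20)
shared = sorted(set(peaks_4ms) & set(peaks_20ms))

print("4 ms peak spacing:", float(peaks_4ms[0]), "Hz")
print("20 ms peak spacing:", float(peaks_20ms[0]), "Hz")
print("Shared peaks up to 1 kHz:", [float(f) for f in shared])
```

Every 4ms peak also appears in the 20ms comb, but the 20ms comb has five times as many peaks in the same range, which fits the less distinct character described above.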
profile
li lac
Nov 07, 2022
@James Rosewater gotchu, thanks mann! time to use these new clues to listen hahah 🙏🙌