WebPagetest Forums
Combination of edge cases triggers ~200ms delay in TTFB

Combination of edge cases triggers ~200ms delay in TTFB - josephscott - 10-24-2014 03:34 AM

I recently spun up a private instance of WPT using the ami-2a1cb142 EC2 image. In general I've been really happy with it.

Now I'm tracking an unusual edge case. Here are the factors:
- Chrome, on Windows ( the EC2 instance mentioned above )
- Nginx w/ SPDY enabled, and gzip compression ( level 5 or 6 is what I've tested with )
- A file that is 1,602 bytes or smaller ( exact size might vary depending on how well it compresses ) that is the first resource requested in a SPDY connection

To simplify testing this I've reproduced the conditions on my private site with these two files:

- https://josephscott.org/dev/spdy/test-slow.html
- https://josephscott.org/dev/spdy/test-fast.html

These contain mostly random characters; the only difference between the two is a single character ( test-fast.html is 1,603 bytes, test-slow.html is 1,602 ).
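
For anyone who wants to build a similar pair of files, here is a rough Python sketch. It assumes the files are plain runs of random ASCII characters (the real test files may include some HTML markup), and `make_test_body` is a made-up helper name. The size at which the delay shows up may differ for other content, since it depends on how well the file compresses.

```python
import gzip
import random
import string


def make_test_body(size: int, seed: int = 0) -> bytes:
    """Return `size` bytes of pseudo-random ASCII letters and digits."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(size)).encode("ascii")


# Two bodies that differ by a single trailing character.
slow_body = make_test_body(1602)   # 1,602 bytes
fast_body = slow_body + b"x"       # 1,603 bytes

# Python's gzip at level 6 is only an approximation of what Nginx sends,
# so treat the compressed sizes as a rough guide.
for name, body in (("slow", slow_body), ("fast", fast_body)):
    compressed = gzip.compress(body, compresslevel=6)
    print(name, len(body), len(compressed))
```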

The test-fast.html request is very quick and works the way I'd expect. The test-slow.html request has a ~200ms delay in the time to first byte.

If these resources were just part of a page and the initial file requested in the SPDY connection was larger, then this ~200ms delay never happens. It doesn't happen for smaller files that are requested after the initial connection.

I've provided HAR files for test results from my private instance:

- https://josephscott.org/dev/spdy/josephscott.org.141023_ZP_D2-fast.har
- https://josephscott.org/dev/spdy/josephscott.org.141023_3A_D3-slow.har

The only difference in the two tests is the resource requested, and the only difference between those is one byte.

The delay doesn't happen when testing with IE 10 - https://josephscott.org/dev/spdy/josephscott.org.141023_JS_DJ-IE10.har

So far I've only been able to recreate this delay using the WPT EC2 instance. But, I've got one other data point that confuses this as well. I found a small file on cloudflare.com to test against - https://www.cloudflare.com/static/javascripts/live/fp-c71f8d1e6cd0e89831d25176cca548d0/page/index.js?v=CB6-2013-08-01-1 - and it doesn't have the ~200ms delay.

This leads me to believe that the wptdriver isn't the only variable, but at the same time I'm only able to produce this delay on my EC2 instance.

In addition to sites at work and reproducing it on my personal site, there are others where this happens too:

- slow: https://thethemefoundry.com/wp-content/themes/ttf-reloaded/images/home/design.svg
- fast: https://thethemefoundry.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1

You can also see this when running tests from webpagetest.org:

- Chrome, slow: http://www.webpagetest.org/result/141023_K9_VX2/
- Chrome, fast: http://www.webpagetest.org/result/141023_D2_VXH/
- IE 10: http://www.webpagetest.org/result/141023_89_VXR/

It isn't an exact comparison to my private instance, but you can still see the TTFB delay in there.

I haven't been able to reproduce this on my Mac, an older Windows 7 desktop, or the Windows 8 system of a co-worker. I can't even reproduce this on separate non-WPT EC2 images:

- Windows_Server-2012-R2_RTM-English-64Bit-Base-2014.09.10 (ami-904be6f8)
- Windows_Server-2008-R2_SP1-English-64Bit-Base-2014.09.10 (ami-dc65c8b4)

Based on the data that I have so far it looks like this could be a bug on the Nginx side ( since CloudFlare doesn't seem to have this issue ), but not being able to reproduce it on other Windows clients still leaves me with a small chance that this is WPT related ( or one of the components, like dummynet ).

Has anyone else seen this issue? If not, can others reproduce it with their WPT installs? And the big piece, can anyone reproduce this on a Windows system that has never had WPT installed?

Thank you.

RE: Combination of edge cases triggers ~200ms delay in TTFB - pmeenan - 10-24-2014 03:44 AM

If you're on twitter I recommend pinging @igrigorik and pointing him to the write-up to see if he has any thoughts.

If it happened with all browsers my guess would have been something related to the SSL window size. 200ms is exactly the time for Windows delayed-acks though so something more bizarre may be going on.

RE: Combination of edge cases triggers ~200ms delay in TTFB - josephscott - 10-24-2014 04:05 AM

Thanks, I'll do that.

The delayed ack issue was one of the ideas that came up in discussions at work. Since this doesn't happen with Chrome over plain HTTPS with SPDY turned off I think we can rule that out.

RE: Combination of edge cases triggers ~200ms delay in TTFB - pmeenan - 10-24-2014 04:27 AM

Only other thing that comes to mind is if Nginx isn't setting TCP_NODELAY on the SPDY connections for some reason and Nagle is kicking in. Still, I'd expect to see it in the cert negotiation in that case too.
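
For reference, disabling Nagle is a per-socket option. A minimal Python sketch of what "setting it on a connection" means at the socket level (this is not Nginx's code, and `make_nodelay_socket` is a hypothetical helper name):

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes are sent right away instead of
    # being held back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# Read the option back to confirm it took effect (non-zero means enabled).
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print("TCP_NODELAY enabled:", nodelay_enabled)
```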

RE: Combination of edge cases triggers ~200ms delay in TTFB - pmeenan - 10-24-2014 04:29 AM

BTW, the times in Chrome are what are reported by Chrome while IE and Firefox timings are measured directly (though since WPT doesn't decode SPDY yet you can't see the raw wire-requests in Firefox). Could also be something in Chrome's measurements but I'd expect to see it elsewhere.

A tcpdump might help though there's going to be a bit of guessing to figure out which packets are the data packets.
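
Once the packets are exported, spotting the stall is mostly a matter of looking for a large gap between consecutive packets. A rough Python sketch, assuming the capture has already been reduced to (timestamp in ms, payload length) pairs; the sample values below are made up for illustration and are not from a real trace:

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (timestamp in ms, payload length in bytes)


def find_stalls(packets: List[Packet], threshold_ms: int = 150) -> List[Tuple[int, int]]:
    """Return (packet index, gap in ms) for every gap of at least threshold_ms."""
    stalls = []
    for i in range(1, len(packets)):
        gap = packets[i][0] - packets[i - 1][0]
        if gap >= threshold_ms:
            stalls.append((i, gap))
    return stalls


# Hypothetical capture: the last packet arrives roughly 200ms after the one before it.
sample = [(0, 517), (31, 1460), (32, 0), (233, 1602)]
stalls = find_stalls(sample)
print(stalls)
```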

RE: Combination of edge cases triggers ~200ms delay in TTFB - igrigorik - 10-24-2014 06:38 AM

- 3G "slow": http://www.webpagetest.org/result/141023_ZE_b6a6f4451b9045c8475681efa4e4a825/1/details/
- 3G "fast": http://www.webpagetest.org/result/141023_6F_54420cbc28ac6cca5341d30e3ee06017/3/details/
- cable "slow": http://www.webpagetest.org/result/141023_HY_66ea4b6ddf2a0200e30098bb49b13a1d/3/details/
- cable "fast": http://www.webpagetest.org/result/141023_43_b344eaa1cdd2b36ae5a56daf744c20ec/3/details/

All of the above look the same. Running with "native" (no shaping):
- http://www.webpagetest.org/result/141023_KN_6fcd65a19900d95376fc023b977e6a18/1/details/
- tcp trace: https://www.cloudshark.org/captures/458e260ec4e0?filter=tcp.stream%3D%3D3

It looks like the good old 200ms delayed ack problem on windows.

- http://support.microsoft.com/kb/214397
- http://blogs.technet.com/b/nettracer/archive/2013/01/05/tcp-delayed-ack-combined-with-nagle-algorithm-can-badly-impact-communication-performance.aspx

RE: Combination of edge cases triggers ~200ms delay in TTFB - josephscott - 10-24-2014 07:09 AM

Here is the same slow test, over plain HTTPS ( SPDY turned off ):

- http://www.webpagetest.org/result/141023_RS_11FT/

If you compare that with the "fast" test with SPDY:

- http://www.webpagetest.org/result/141023_PB_11CA/

The fast SPDY test had a TTFB of 38ms. The slow plain HTTPS test had a TTFB of 31ms. For our purposes I'll call this basically the same. Now compare those to the slow test with SPDY turned on that you linked to:

- http://www.webpagetest.org/result/141023_KN_6fcd65a19900d95376fc023b977e6a18/1/details/

The TTFB jumps to 237ms.

If this is the standard Windows ~200ms delay, then it only happens with SPDY turned on.
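
Putting the three TTFB numbers from those results side by side (a trivial Python calculation; the dictionary keys are just labels I picked):

```python
# TTFB values in ms, taken from the three test results linked above.
ttfb_ms = {
    "spdy_fast": 38,    # fast file, SPDY on
    "https_slow": 31,   # slow file, SPDY off
    "spdy_slow": 237,   # slow file, SPDY on
}

extra_vs_spdy_fast = ttfb_ms["spdy_slow"] - ttfb_ms["spdy_fast"]
extra_vs_https_slow = ttfb_ms["spdy_slow"] - ttfb_ms["https_slow"]

print(extra_vs_spdy_fast, extra_vs_https_slow)
```

Either way the extra time is right around 200ms.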

RE: Combination of edge cases triggers ~200ms delay in TTFB - pmeenan - 10-24-2014 07:13 AM

Hmm, looks like Nginx's tcp_nodelay only kicks in when a socket goes into keep-alive (so subsequent responses don't get buffered): http://nginx.org/en/docs/http/ngx_http_core_module.html#tcp_nodelay

Admittedly it's been a long time since I was solving Nagle issues and it was in a completely different space but I thought all web servers basically disabled it across-the-board.

Do you have any tcp_nopush or sendfile settings configured on the server? If it is using sendfile and tcp_nopush is enabled I could see it kicking in.

RE: Combination of edge cases triggers ~200ms delay in TTFB - josephscott - 10-24-2014 07:18 AM

I've had these turned on in Nginx for all tests against my personal site:

- sendfile on;
- tcp_nopush on;
- tcp_nodelay on;

None of those options were changed between plain HTTPS and SPDY tests.

RE: Combination of edge cases triggers ~200ms delay in TTFB - pmeenan - 10-24-2014 07:20 AM

Hmm, doesn't seem like it's a TCP_NODELAY thing. In the SPDY tcpdump that Ilya shared, the server packets have the PSH bit set on small packets which is usually a good sign that it's doing nodelay.
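
For anyone checking a capture by hand, PSH is bit 0x08 of the TCP flags byte. A small Python sketch of the check (`has_psh` is a hypothetical helper name, and the flag values in the examples are generic, not pulled from the trace):

```python
# TCP flag bits as defined in the TCP header.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10


def has_psh(flags: int) -> bool:
    """Return True if the PSH bit is set in a TCP flags byte."""
    return bool(flags & TCP_PSH)


psh_ack = has_psh(TCP_PSH | TCP_ACK)   # typical data segment pushed to the app
plain_ack = has_psh(TCP_ACK)           # bare acknowledgement
print(psh_ack, plain_ack)
```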