Great work. Some suggestions on the stats:
1. With multiple runs, the main results page is titled "Performance Results (Median Run)". The results appear (?) to be arithmetic means, not median values. Perhaps change the title?
2. I see that there is an attempt to select the one of the (10) test results that represents the "mean", but how is this determined? On which result (or aggregation) is the selection based? Perhaps provide a doc explaining the method?
3. If "Visually Complete" is available across all runs, promote it to this summary, averaged across the runs.
4. For the timings, it would be nice to get further statistics here, though I'm not sure how to determine these, or what type of distribution each one follows. For example: mean, median, std. dev., variance, etc., appropriate to the expected distribution. For last mile, all things being equal, distributions are relatively 'normal' due to the wide range of independent variables. Unfortunately, 10 trials is probably too few to overcome the variance.
5. For the 'static values' (bytes, request #, DOM elements), it might be nice to get a range.
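To make points 2, 4, and 5 concrete, here is a rough sketch in Python of the kind of per-metric summary I have in mind. Everything in it is an assumption on my part: the function names `summarize` and `representative_run` are hypothetical, the load times are made-up sample data, and picking the run closest to the median is only one possible selection rule, not a description of how the tool currently chooses its run.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for one metric across all runs."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),                       # min/max give the range
        "max": max(values),
    }


def representative_run(values):
    """Return the index of the run whose value is closest to the median.

    Hypothetical selection rule; ties go to the earliest run.
    """
    med = statistics.median(values)
    return min(range(len(values)), key=lambda i: abs(values[i] - med))


# Made-up load times (ms) for 10 runs, including one slow outlier.
load_times_ms = [2100, 2250, 1980, 2400, 2050, 3100, 2150, 2200, 2080, 2300]

stats = summarize(load_times_ms)
chosen = representative_run(load_times_ms)

print(stats)
print("representative run index:", chosen)
```

With a single slow outlier like the one in this sample, the mean sits noticeably above the median, which is why the page title and the statistic actually shown should agree.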

