So, the implementation of W'bal in GoldenCheetah has been a bit of a challenge.
The Science
I wanted to explain what we've done and how it works in this blog post, but realised that first I need to explain the science behind W'bal, W' and CP.

W' and CP
How hard you can go, in watts, for half an hour is going to be very different from how hard you can go for, say, 20 seconds. And how hard you can go for a very long time will be different again. But when it comes to reviewing and tracking changes in your performance and planning future workouts, you quickly realise how useful it is to have a good understanding of your own limits.

In 1965 two scientists, Monod and Scherrer, presented a ‘Critical Power Model’ in which the Critical Power of a muscle is defined as ‘the maximum rate of work that it can keep up for a very long time without fatigue’. They also proposed an ‘energy store’ (later termed W’, pronounced double-ewe-prime) representing a finite amount of work that can be done above that Critical Power.
Monod and Scherrer CP Model

In the chart above, the solid red curved line represents the best power you can perform for a given duration; a so-called 'power-duration' curve. As the duration gets longer, to the right, the power you can sustain tends towards the asymptote CP. At shorter durations we can work above CP, and to do this we use a finite capacity (sometimes called the Anaerobic Work Capacity, AWC). It is in the region of 10-40 kJ; mine, for example, is typically 23 kJ. The harder we go, the quicker it is used up, hence the shape of the curve.
In cycling parlance W’ would be referred to as the matchbook: the harder you go, the quicker it is used up, but temper your efforts and you can ‘save a match’ for the last sprint. CP, on the other hand, is that intensity (or power output) where you are uncomfortable but stable, akin to your TT pace. You know that if you try to go any harder you are gonna blow up pretty quickly.
Monod and Scherrer also provided a mathematical formula to estimate the maximum power you can sustain for any given duration, using W’ and CP as parameters. This formula is pretty reliable for durations between 2 and 20 minutes or so, but less so at shorter and longer durations. And so, over the last 50 years, variations of these models have been developed to address this, and it continues to be a topic of great scientific interest.
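To make that concrete, the classic two-parameter form is usually written as P(t) = CP + W'/t: your best average power for a duration t is CP plus your W' spread across that duration. A minimal sketch, plugging in the CP and W' I mention later in this post (this is an illustration of the textbook formula, not GoldenCheetah code):

```cpp
#include <iostream>

// Two-parameter critical power model: the best power sustainable for a
// duration t (in seconds) is CP plus the finite capacity W' spread over t.
double predictedPower(double cp, double wprimeJoules, double seconds)
{
    return cp + wprimeJoules / seconds;
}

int main()
{
    double cp = 275;        // watts
    double wprime = 23000;  // joules (23 kJ)

    // e.g. the model's predicted best 5 minute power
    std::cout << predictedPower(cp, wprime, 300) << " W\n"; // ~352 W
}
```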
W'bal
Unless we’re riding the pursuit or a very flat time trial, when we train and race we tend to ride sustained efforts interspersed with recovery. These intermittent bouts might occur when we climb a hill, sprint out of a corner or bridge a gap. In fact almost all training and racing away from the turbo tends to be variable because of this.

Now, we know from the Critical Power model that when we work above CP we start eating into our limited W’ stores. If we keep going hard enough for long enough we will blow when it’s all gone. But we also know that W’ is replenished over time.
When we work below CP the energy stores within the muscles are restocked. The further below CP we are the faster we will recover, and for the first 30 seconds of recovery we get the most bang for buck as blood-flow into the muscles is still high from the previous bout.
W'bal for an evenly paced 2x20 workout
Phil Skiba along with his colleagues Weerapong Chidnok, Anni Vanhatalo, and Andrew Jones derived a formula for tracking the levels of W’ remaining during intermittent exercise, called W’bal, that we can plot alongside power. It is particularly useful for assessing workouts for likely failure and reviewing and comparing intervals within a single workout, even when they are of differing durations.
It is likely that in the near future you will see W’bal appear on bike computer headunits to show you the capacity remaining as you race.
Now, everyone is different. It appears that highly trained individuals recover their W' much faster than detrained individuals, and this needs to be taken into account. So as well as the recovery mechanism there is also a formula to determine the recovery timescale in seconds, 'tau'. It is computed from all the data in the ride file, but it is specific to an individual rather than a workout. So in theory, once you know your personal tau it should be used for all analysis -- but of course, as W' and CP change, no doubt so does your recovery period tau.
The Maths
There are two parts: working out what tau is, then computing W'bal as a data series for a workout.

Tau
The formula for computing tau is easy, so let's start there:

tau = 546 * e^(-0.01 * DCP) + 316

It's largely made up of constants as you can see; 546, 316 and 0.01 are all derived from experiments and tests performed during the research. The only moving part is DCP -- this is basically how many watts below CP you recovered at. It is computed as the difference between your CP and the average of all power below CP.
It is trivial to calculate, since the only thing we need to compute from a ride is the average power below CP.
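As a rough sketch of that calculation, assuming 1-second power samples (function and variable names are my own, not the actual GoldenCheetah code):

```cpp
#include <cmath>
#include <vector>

// Tau from the average of all power below CP: DCP is how far below CP
// we recovered, on average, and the constants are those quoted above.
double computeTau(const std::vector<double>& watts, double cp)
{
    double sum = 0;
    int count = 0;
    for (double w : watts)
        if (w < cp) { sum += w; ++count; }

    double avgBelowCP = count ? sum / count : 0.0;
    double dcp = cp - avgBelowCP;

    return 546.0 * std::exp(-0.01 * dcp) + 316.0;
}
```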
W'bal
Mathematically it is relatively simple to describe:

W'bal(t) = W' - integral [u=0..t] W'exp(u) * e^(-(t-u)/tau) du

W'bal formula from Skiba et al
This equation is basically saying that to compute W'bal for a specific point in time (t), we take the starting W' (remember we all have a CP and W'; my bests are 275 W and 23 kJ respectively) and subtract an integral.

That integral is the sum of all values from time 0 to the point in time (t) that we are computing W'bal for, i.e. we need to compute a sum of some formula from the beginning of the ride to this point just to work out what W'bal is at one point in time.

The formula we apply at each point in time is itself pretty simple: it is the W' expended at that point in time with an exponential decay applied, i.e. W' recovers exponentially from the point at which it was used.
So, if at the start of a ride I was 100 W above CP (and I am using 1-second samples), then in that 1 second I expended (100 W x 1 s) 100 joules. So W'exp at time 0 is 100 joules. At each point in time after that effort it will have recovered a little, exponentially. This is calculated by applying an exponential decay that takes into account how long after the sample we are (t-u).

So 10 seconds after that 100 joule effort, the outstanding deficit from it will have decayed to 100 x e^(-10/tau) joules (with a tau of 300 that is roughly 97 joules, i.e. only about 3 joules have been recovered so far) -- even though at that 10 second point I might still be working above CP !

But of course this is only right because the recovery time constant 'tau' reflects the overall recovery rate (it is computed from the entire ride), so although we are computing recovery at a point in time when we might still be working above CP, it all balances out in the end -- or at least that is the theory.
So -- to compute an entire data series of W'bal for a ride from start to finish we will need to compute an integral for every single point in the ride. That is very expensive in computational terms !
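To show just how expensive, here is a deliberately literal translation of that integral into code, assuming 1-second samples (a sketch with my own names, not the WPrime class itself) -- note the nested loops:

```cpp
#include <cmath>
#include <vector>

// Naive W'bal: for every second t, sum the exponentially decayed above-CP
// expenditure of every prior second. This is O(n^2) in the ride length.
std::vector<double> wbalNaive(const std::vector<double>& watts,
                              double cp, double wprime, double tau)
{
    const int n = (int) watts.size();
    std::vector<double> wbal(n);

    for (int t = 0; t < n; ++t) {
        double deficit = 0;
        for (int u = 0; u <= t; ++u) {
            double expended = watts[u] > cp ? watts[u] - cp : 0; // joules in second u
            deficit += expended * std::exp(-double(t - u) / tau);
        }
        wbal[t] = wprime - deficit; // W' remaining at time t
    }
    return wbal;
}
```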
The Implementation
Part One: Copying Excel
Initially I was handed an implementation by Dr Skiba in Excel, and given it was the reference I needed to implement it like-for-like.

The trouble is, computing integrals of exponential recoveries is not something that Excel is particularly good at. Dr Skiba had clearly needed to jump through a few hoops just to get Excel to do what he wanted. I would share the original sheet, but it contains a few things that cannot be shared.
To give you a clue though, it computed a decay factor as a column of 1200 data points and then computed a given point in time by using an Excel function SUMPRODUCT on the ride data, in reverse order.
I then took that tortuous approach and reimplemented it in C++, including the temporary data series and a SUMPRODUCT function. The results were predictably poor from a performance perspective but at least we had an implementation that worked !
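The shape of that spreadsheet-mirroring approach, in C++, is roughly the following (a simplified sketch of the idea rather than the code that was committed; names are mine). W'bal at second t is then W' minus deficitAt(expended, t, tau), where expended holds the above-CP joules for each second:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// SUMPRODUCT equivalent: multiply two columns element-wise and sum.
static double sumProduct(const std::vector<double>& a, const std::vector<double>& b)
{
    double total = 0;
    const size_t n = std::min(a.size(), b.size());
    for (size_t i = 0; i < n; ++i) total += a[i] * b[i];
    return total;
}

// Deficit at second t: a fixed 1200-row decay column applied against the
// last 1200 seconds of above-CP expenditure, taken in reverse order.
double deficitAt(const std::vector<double>& expended, int t, double tau)
{
    std::vector<double> decay(1200), recent(1200, 0.0);
    for (int k = 0; k < 1200; ++k) decay[k] = std::exp(-k / tau);
    for (int k = 0; k < 1200 && t - k >= 0; ++k) recent[k] = expended[t - k];
    return sumProduct(decay, recent);
}
```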
So, in order that the whole of GoldenCheetah wasn't slowed to a crawl computing it, I isolated the code in its own C++ class (WPrime) and only accessed its data when I really needed it (i.e. when the user has requested it is displayed on a plot).
I didn't add any metrics (because it would mean W'bal would need to be computed every time we wanted one of the metrics) and bought myself a book on calculus (and, for other reasons, linear algebra) with a view to reimplementing it more efficiently (or at least reimplementing the SUMPRODUCT function).
Part Two: Reimplementing, Threads & Domain-specific Optimisation
So after examining the code I realised that the SUMPRODUCT code could be replaced. It computes W'bal exactly as the formula suggests, by looking back 1200 points, applying the replenishment exponential decay to every point and summing them for the current point in time.

First of all, the 1200 points back is arbitrary. Tau might be 60 or it might be 500. Secondly, it means we effectively loop 1200 times for every second in a ride, and a ride is typically 4000 - 10000 samples. So in total that is 1200 x 10000, or 12 million iterations !
So the most obvious thing to do was to split the computation into separate threads; but since we need to support low end netbooks as well as desktop workstations I took the (lazy) approach of just splitting into two threads, a and b. Thread a works on the first half of the ride, and thread b works on the second half of the ride. It had a pretty immediate effect.
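A sketch of that split (simplified, with my own names; the real WPrime code is a little more involved):

```cpp
#include <cmath>
#include <thread>
#include <vector>

// Fill wbal[lo..hi) using the same O(n^2) sum as before. The two threads
// write to disjoint halves of the output, so no locking is required.
static void computeRange(const std::vector<double>& expended, std::vector<double>& wbal,
                         double wprime, double tau, int lo, int hi)
{
    for (int t = lo; t < hi; ++t) {
        double deficit = 0;
        for (int u = 0; u <= t; ++u)
            deficit += expended[u] * std::exp(-double(t - u) / tau);
        wbal[t] = wprime - deficit;
    }
}

std::vector<double> wbalTwoThreads(const std::vector<double>& expended,
                                   double wprime, double tau)
{
    const int n = (int) expended.size();
    std::vector<double> wbal(n);

    std::thread a([&] { computeRange(expended, wbal, wprime, tau, 0, n / 2); });
    std::thread b([&] { computeRange(expended, wbal, wprime, tau, n / 2, n); });
    a.join();
    b.join();
    return wbal;
}
```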
Secondly I turned the algorithm on its head. Instead of looking backward from time t and summing the decay of all prior points to get the value for point t, I did the opposite. I took each point at which W' was depleted and computed how much of that expenditure had been recovered at every point in time into the future, accumulating the results as I went. I only needed to loop from the points where W' had been depleted.
That point needs to be explained.
On a typical ride, let's say a tempo ride for 2 hours, you will go above CP on relatively few occasions. Even a TT will hover above and below CP. So typically you will spend a lot less time above CP than at it or below it. But the earlier implementation didn't care; it summed up all prior points regardless of whether there was any depletion or not.
By turning it round, in a ride of say 10000 samples there will typically only be 500 where W' is depleted, so we only need to compute the replenishment for those samples. We thus have 500 x 1200 (600,000) loops instead of the previous 12 million. Quite a saving. But this only works because we take into account what rides typically look like, i.e. it is a domain-specific optimisation.
Lastly, I looked at the 1200 loops. It was arbitrary. We know how fast the decay is, because we computed it ! Tau is the time constant of the recovery: after tau seconds roughly two thirds of an expenditure has been recovered, and after 3 x tau virtually all of it has. So instead of looping 1200 times I changed it to loop up to Tau * 3; but that would only be helpful for riders with fast recovery -- us slow MAMILs may have a 500s recovery, so this would make it take longer.
The fastest code is always the simplest !
So as well as limiting the loop based upon the calculated tau, I also looked at bounding based upon the value left. Since we are working with numbers like 10000 - 45000 joules, we're not too fussed once we're down to increments of tens of joules. So I bounded the looping to stop when there was less than 10 joules of W' left to compute. Compared to the fixed loops, that saved over 20% on previous runtimes.
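Putting the inversion and both bounds together, the resulting loops look roughly like this (a simplified sketch with my own names, not the shipped WPrime code; the cut-offs shown are the ones described above):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Only seconds spent above CP create a deficit; each one is propagated
// forward with exponential decay, stopping at 3 * tau or once less than
// 10 joules of that expenditure remains outstanding.
std::vector<double> wbalInverted(const std::vector<double>& watts,
                                 double cp, double wprime, double tau)
{
    const int n = (int) watts.size();
    std::vector<double> deficit(n, 0.0);

    for (int u = 0; u < n; ++u) {
        const double expended = std::max(0.0, watts[u] - cp); // joules in second u
        if (expended == 0.0) continue;

        const int limit = std::min(n, u + (int) (tau * 3));
        for (int v = u; v < limit; ++v) {
            const double outstanding = expended * std::exp(-double(v - u) / tau);
            deficit[v] += outstanding;
            if (outstanding < 10.0) break; // the remainder is negligible
        }
    }

    std::vector<double> wbal(n);
    for (int v = 0; v < n; ++v) wbal[v] = wprime - deficit[v];
    return wbal;
}
```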
So now we have an implementation that is typically 20x faster than the original. It will compute W'bal for 500 one-hour-plus rides in less than 30 seconds on a dual-core, 2009-vintage MacBook Pro.
Part Three: Where Next?
Well, somewhere around Spring of this year Andy Froncioni and Dave Clarke both came up with a differential form of the formula, where "the exponentials dropped out".
This differential form didn't need tau, since it "took into account how low you are below CP implicitly". BUT, we know that there is a big difference between athletes in tau, and the differential form has an implicit Tau = W' / CP. This will typically result in a tau of 60-100s, as opposed to the circa 300-400s values I was seeing for my own personal data. For some highly trained riders this will not be an issue.
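My reading of that differential-style update, sketched with 1-second samples, is something like the following. This is only an illustration of the implicit Tau = W'/CP behaviour described above, with my own names, not a definitive statement of the Froncioni/Clarke form:

```cpp
#include <algorithm>
#include <vector>

// Differential-style W'bal: spend above CP joule for joule; below CP,
// recover at a rate proportional to how far below CP you are and how
// depleted W' is. At zero watts the time constant works out to W'/CP.
std::vector<double> wbalDifferential(const std::vector<double>& watts,
                                     double cp, double wprime)
{
    std::vector<double> wbal(watts.size());
    double level = wprime;

    for (size_t i = 0; i < watts.size(); ++i) {
        if (watts[i] > cp)
            level -= (watts[i] - cp);                             // depletion
        else
            level += (cp - watts[i]) * (wprime - level) / wprime; // recovery
        level = std::min(level, wprime);
        wbal[i] = level;
    }
    return wbal;
}
```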
However, it shows great promise and no doubt will be adapted to address this and other problems in time. We have implemented it in GoldenCheetah and it can be selected in preferences.
Speed up with OpenCL
I spent some time this weekend converting the code to use OpenCL - this uses the GPUs in most modern computers to perform parallel computation. It is unbelievably powerful, and with high-end graphics cards can perform vectorised functions very, very quickly.
In fact, OpenCL will be a focus in v3.2 of GoldenCheetah as we move away from computing ride metrics using the CPU to make metric computation real-time and move away from the 'Updating Statistics' dialog we have all come to endure.
One of the first candidates for optimisation via OpenCL is clearly the WPrime class.
UPDATE October 2014: Dave Waterworth has adjusted the original formula to remove the cost of the integral computation, making W'bal computable in a single pass. It means this is now super fast and ready to go onto a headunit.
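I won't reproduce the exact derivation here, but the essence of a single-pass form can be sketched as follows: because each second's decayed deficit is just the previous second's deficit multiplied by a constant factor, plus any new above-CP expenditure, a running sum replaces the per-sample integral entirely. A sketch under that assumption (my own names, not Dave's code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Single-pass W'bal: carry the exponentially decayed sum of prior above-CP
// expenditure forward one sample at a time -- O(n) for the whole ride.
std::vector<double> wbalSinglePass(const std::vector<double>& watts,
                                   double cp, double wprime, double tau)
{
    const double decay = std::exp(-1.0 / tau); // per-second decay factor
    std::vector<double> wbal(watts.size());
    double deficit = 0.0;

    for (size_t t = 0; t < watts.size(); ++t) {
        deficit *= decay;                        // everything spent so far recovers a little
        deficit += std::max(0.0, watts[t] - cp); // joules spent above CP this second
        wbal[t] = wprime - deficit;
    }
    return wbal;
}
```

This produces the same series as the naive nested loops earlier in the post, because the decayed sum satisfies that simple recurrence -- which is why it is fast enough to run on a headunit.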