Our lovely family dog has passed. We miss him.

# 40 Goals

Follow the exploits of a forty-something trying to beat father time by setting ambitious goals and striving to achieve them ...

## Wednesday, January 13, 2016

## Tuesday, January 12, 2016

### Impossible twice before breakfast

I've been working on a workout editor over the Christmas break. It's not done, but it lets you edit ERG files and get the MMP and W'bal plotted as you go.

I've just re-used the sustained interval algorithm to find aerobic efforts that are impossible (at least according to the model, anyway). It highlights sections that are impossible with a red line along the x-axis.

Because it is so fast we can search every time the user drags a point. Really pleased with the results and, TBH, it feels like the best use of the algorithm!


## Tuesday, November 10, 2015

### GoldenCheetah Download Stats

I recently looked into the number of times the release binaries for GoldenCheetah were being downloaded. Until we shifted to hosting at GitHub it had been tricky to find out, but we can now get stats going back to 3.0SP2.

It is clear that v3.1 is very popular, but v3.2 has only been out for 3 months and downloads are running at about 250 per day at present. The numbers below for all releases keep changing; I find it surprising that 1 or 2 people will download 3.0 every day (!)

I wish there was a way we could estimate the number of real users from these stats...


**Release Builds (via GoldenCheetah.org)**

v3.0-SP2: 26,865

v3.1: 73,875

v3.2: 15,619

**Development Builds (via forum announcements)**

v3.2-RC1X: 146

v3.3-DEV9: 67
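If you want to pull these numbers yourself, the GitHub releases API exposes a download_count for every release asset. A minimal sketch (the helper names and the owner/repo arguments are my own, not anything shipped with GoldenCheetah):

```python
import json
from urllib.request import urlopen

def count_downloads(releases):
    """Tally download_count per release tag from parsed GitHub API JSON."""
    return {r["tag_name"]: sum(a["download_count"] for a in r.get("assets", []))
            for r in releases}

def fetch_release_downloads(owner, repo):
    """Fetch the releases list for a repo and tally downloads per tag."""
    url = "https://api.github.com/repos/%s/%s/releases" % (owner, repo)
    with urlopen(url) as resp:
        return count_downloads(json.load(resp))

# e.g. fetch_release_downloads("GoldenCheetah", "GoldenCheetah")
```

Note the API paginates long release lists, so a complete tally may need to follow the paging links.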

## Sunday, May 17, 2015

### Finding TTE and Sustained Efforts in a ride

### Defining the problem

Any given training ride or race will contain periods of sustained effort, sometimes to exhaustion. Perhaps during the last sprint, or over a long climb, bridging a gap or chasing on after a roundabout or corner.

Being able to identify these efforts is rather useful.

The trouble is, deciding what represents a maximal or sustained effort is often debated, and the debate generally falls into discussions about intensity and FTP or Critical Power. These discussions have tended to then focus on trying to account for the interval duration, periods of freewheeling, applying smoothing and so on.

But we already have an excellent description of what constitutes a maximal effort. It is the primary purpose of any power duration model.

Power duration models estimate the maximal effort you can sustain for any given duration through to exhaustion. So if you want to identify maximal efforts, it's your friend.

Using the model below we can see, for example, that the athlete it represents could sustain 300 watts for 10 minutes before having to stop with exhaustion. So a time-to-exhaustion (TTE) of 10 minutes at 300 watts.

A sustained interval is going to be similar; it will lie on or near the points on this curve.

And so, the problem of finding *maximal* efforts is quite easy to define (we will come back to sustained intervals at the end):

Identify intervals of any duration whose average power is equal to (or higher) than the estimate from your chosen power duration model.

### Solving the problem

The problem is not difficult to solve. You could simply iterate over an entire ride of data from beginning to end, computing the average power for every possible interval length from here to the end of the ride and seeing if it equals or betters your PD model estimate.

The code would look something like this:

```python
# Brute force: check every (start, length) pair against the PD model.
for i in range(n):                       # every start point in the ride
    interval_total = 0
    for j in range(n - i):               # every possible length from here
        interval_total += power[i + j]
        interval_average = interval_total / (j + 1)
        if interval_average >= model_estimate(j + 1):
            print("Max effort at", i, "of", interval_average)
```

This approach is described as taking O(n²) time, i.e. the length of time it takes to run grows as the square of the input. This is because we run over the ride and at every point run over every remaining point of the ride. A 1 hour (3600s) ride is 3600 x 3600 iterations. That's a lot of iterations, and realistically it is not practical to do for all rides, especially as they get longer. Users get impatient and don't like the fan whirring on their computers!

So the real problem to solve is how we can find these maximal efforts quickly.

## An example

Let's use this example ride.

I rode at a moderate intensity of about 200w for 30 minutes then turned up the gas for 10 minutes at 300w. It turns out, according to the model above, that the 300w effort was maximal. So we will try to find it.

## Monotonicity

One of the really useful aspects of power (and heartrate) data is that it is impossible to record negative values. You are either generating power, or you are not. If we convert power to energy (joules, which is calculated as power x time) then we can accumulate that over the course of a ride.

In the chart above you will see energy slowly increasing over time as the ride progresses. Indeed the interval we are looking for starts at 1800s, where 360kJ of energy has been accumulated, and ends at 2400s, when a further 180kJ has been accumulated.

If we create an array of the accumulated energy throughout the ride we will be able to calculate the average power for any given interval with a single subtraction!

This insight was highlighted by Mark Rages in 2011 and forms the basis for a similar algorithm that finds best intervals for all durations in a ride.
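The trick can be sketched in a few lines (names are mine; this assumes 1-second power samples, so joules are just watts summed):

```python
def cumulative_energy(power):
    """Accumulated joules at each second, assuming 1-second power samples."""
    cum = [0]
    for p in power:
        cum.append(cum[-1] + p)
    return cum

def interval_average(cum, start, length):
    """Average power of the interval [start, start+length): one subtraction."""
    return (cum[start + length] - cum[start]) / length
```

Building the array is one pass over the ride; after that, any interval's average power is O(1).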

## Maximal efforts fall on the PD curve

Looking at the chart below, you can see that if we compare the PD estimate in joules with the accumulated energy from any given point in a ride, the maximal effort will be where the two curves intersect (I have drawn the PD model estimate as a parabola as an aid to explain the algorithm; in reality it is linear and quite steep).

So we have now redefined the problem; we will look through a ride, from beginning to end, and at each point we will look for the interval that has an accumulated energy that is equal to (or higher) than the PD model estimated joules for that duration.

## Scan back not forward

Clearly the trick above to compute the interval average with a single subtraction is going to be significantly faster than computing each one by iterating over the data. But it will still mean iterating for every interval length possible at every point in the ride.

So my insight into solving this problem was to start at the maximum interval length we are looking for and scan backwards. But crucially, we are going to exploit the monotonic nature of the data to skip those intervals that cannot possibly be maximal.

We are going to repeat the following process at every point within the ride. In the example above we are looking at point n=1800s which is, by no coincidence, the starting point for the maximal effort in our example.

First we compute the total joules for a 1hr (3600s) interval using the subtraction trick from above. Next, using our chosen power duration model we calculate the duration it would take to deplete the energy in a maximal effort.

In our case, over that 3600s we might have accumulated, say, 360 kJ. The PD model tells us that a maximal effort of 2000 seconds would burn 360 kJ*. This is considerably shorter than the 3600s we are currently looking at.

Now, because the accumulated series is monotonic we know that it is impossible for any values between 2000s and 3600s to be higher than the value at 3600s. So we can skip back to an interval of length 2000s and not bother checking intervals of length 2000-3600s from this point in the ride.

As we repeat this scan back we jump back to 800s and more until eventually we find our maximal effort; the TTE returned by the PD model at 600s is equal to the 600s length of the interval we are at. So it is a maximal effort.

So, in our example above, we have found the maximal interval starting from 1800s into the ride in about 4 iterations. This will be repeated at every point in the ride, but typically we will only need to iterate a handful of times at each point in the ride.
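Putting the pieces together, here is a sketch of the scan-back search (my own illustrative code, not the GoldenCheetah implementation; it assumes 1-second samples, a Monod & Scherrer model with parameters cp and w_prime, and an arbitrary 120s minimum duration). The energy-to-duration inversion t = (J − W′)/CP is the one described in the footnote:

```python
def find_maximal_efforts(power, cp, w_prime, min_dur=120):
    """Scan-back search for maximal efforts, assuming 1-second samples
    and the Monod & Scherrer model P(t) = W'/t + CP."""
    cum = [0]                                # accumulated joules
    for p in power:
        cum.append(cum[-1] + p)
    n = len(power)
    efforts = []
    for i in range(n):                       # each candidate start point
        t = n - i                            # longest interval from here
        while t >= min_dur:
            joules = cum[i + t] - cum[i]
            # invert the model: how long would a maximal effort of this
            # much energy last?  t = (J - W') / CP
            tte = (joules - w_prime) / cp
            if tte >= t:
                efforts.append((i, t))       # on or above the PD curve
                break
            # monotonicity: no interval between tte and t can qualify,
            # so skip straight back
            t = min(t - 1, int(tte))
    return efforts
```

On the example ride above (30 minutes at 200w then 10 minutes at 300w, with CP = 250 and W' = 30 kJ) this returns the single effort starting at 1800s and lasting 600s, taking only a handful of iterations at most start points.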

In computational terms, we have moved from an algorithm of O(n²) time to one that iterates over all points but then only looks at a subset; more formally, to a runtime of O(n log n).

In reality, this new approach performs dramatically faster for rides where the rider doesn't do any sustained efforts, because the scan back jumps back in a couple of iterations. When a ride contains lots of hard efforts it can take longer. But in my experience so far the worst case is typically 100 times faster than the brute force approach and the best case is 10,000 times faster. It's a pretty neat trick.

* converting energy to a TTE duration means we take the power duration model formula and solve for time rather than for power. For example, the Monod and Scherrer formula is P(t) = W'/t + CP. If we convert that to joules we get Joules(t) = (W'/t + CP) * t. We can then take that and solve for t, in which case we end up with a rather elegant (and fast) t = (Joules - W') / CP.

## But, the model is always wrong

Now, of course, in the real world the model is generally wrong. It is quite common to see efforts that exceed the estimates a fair amount. With training we get stronger and we don't always update our model parameters (CP, W' etc) to keep up.

So we end up with the situation below:

You can see that the model estimate is much lower than the actual performance. So we end up with a range of intervals from 400-600s long that are 'maximal'.

Now as it stands, our algorithm will stop when it finds the maximal effort at 600s, but if you look closely you will see that the accumulated energy from the "peak" doesn't rise. In other words, the 600s interval has a few zeroes at the end of it. From a user perspective this is quite odd.

So to address this, we cannot stop once we hit an interval that is TTE; we must keep going back, 1 second at a time, until we are no longer at a TTE effort. As we do this we can calculate the 'quality' of the interval in relation to the PD model and only keep the interval at that "peak".

Once we get to 400s we will iterate back to 0 quite quickly and can stop searching.

### Finally, Sustained intervals

So now we can apply exactly the same algorithm to find sustained intervals. Any 'near' maximal effort is clearly a sustained effort; we just need to define what we mean by 'near'. And for cases where the model is wrong but predicts a slightly higher TTE than we are capable of, it is very useful to seek values that are 'near' to the estimate.

By applying a bit of a tolerance we can account for when the model is over and under estimating the TTE we are capable of.


In the case of the implementation in GoldenCheetah, we have chosen to use 85% of maximal as a sustained effort and this seems to align to most user expectations so far.

We may tweak it later, but for now it seems to be working well and will be released alongside v3.2 in the next couple of months.
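In code the sustained-effort test is then tiny (a sketch; the 85% default mirrors the figure mentioned above, and the helper name is mine):

```python
def is_sustained(joules, duration, cp, w_prime, tolerance=0.85):
    """Near-maximal: accumulated energy within a tolerance of the
    Monod & Scherrer estimate W' + CP * t for that duration."""
    return joules >= tolerance * (w_prime + cp * duration)
```

The same tolerance covers both directions: efforts slightly under the curve still qualify, and a model that slightly over-predicts TTE no longer hides genuine efforts.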

### Multiple Overlapping Efforts

The next problem to consider is, as shown in the case above, the situation where we have sustained efforts embedded within a longer one. It shows that whilst out doing a hearty 2hr tempo ride we perform a long vo2max interval of 6 minutes that also contains a near maximal sprint. Do we select the best peak value (the vo2max effort in the middle) or are there actually 3 sustained efforts?

That's for another blog post ...

## Saturday, October 11, 2014

### W'bal optimisation by a mathematician!

So the integral computation for W'bal was expensive.

I tried to optimise from a domain and programming perspective, whereas Dave Waterworth, a mathematician, found a much more elegant and fast reformulation.


**This means W'bal can EASILY be computed as you ride.**

**To explain the math here are his words:**

I posted a comment on your blog post on optimising the W'bal model. I've done some more thinking and I definitely think it can be done without visiting the previous samples, as the Skiba formula can be decomposed further. From your blog I believe the integral part of the equation is:

Basically this takes a weighted sum of preceding W'exp samples where the weight decays at a rate determined by tau, older samples are weighted less than newer ones. We can approximate as a sum provided tau is large compared to Ts (the sample rate):

Basic properties of exponential functions allow the formula to be rearranged as below. Written this way, the integral can be computed as a running sum rather than an inner/outer loop.

i.e. Decompose into:

The sum can be updated at each time step, then the integral calculated very efficiently.

The same logic can be applied if the time step isn’t uniform, as the original integral can be rearranged before converting to discrete form – you’d just need to use a numerical integration technique which can handle variable step length (I'd imagine the above will be fine, just vary Ts).
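To make the decomposition concrete, here is a one-pass sketch (my own names; 1-second samples). Rather than carrying e^(t/tau) weights, which overflow for long rides, it keeps the decayed sum as a running value and multiplies by e^(−1/tau) each step; algebraically it is the same rearrangement Dave describes:

```python
import math

def wbal_series(power, cp, w_prime, tau):
    """W'bal per second via a running exponentially-decayed sum of W'
    expenditure: one pass, no inner loop over previous samples."""
    decay = math.exp(-1.0 / tau)
    integral = 0.0
    out = []
    for p in power:
        # decay everything expended so far by one second, add this second's
        integral = integral * decay + max(0.0, p - cp)
        out.append(w_prime - integral)
    return out
```

This visits each sample exactly once, so W'bal can indeed be computed live as you ride.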

## Saturday, July 12, 2014

### W'bal its implementation and optimisation

So, the implementation of W'bal in GoldenCheetah has been a bit of a challenge.

In the chart above the solid red curved line represents the best power you can perform for a given duration; a so-called 'power-duration' curve. As time gets longer to the right, the power you can sustain tends towards the asymptote 'CP'. But at shorter durations we work at power above CP, and to do this we use a finite capacity (sometimes called the Anaerobic Work Capacity, AWC). It is in the region of 10-40 kJ; mine for example is typically 23 kJ. The harder we go, the quicker it is used up, hence the shape of the curve above.

First of all, looking back 1200 points is arbitrary. Tau might be 60 or it might be 500. Secondly it means we effectively loop 1200 times for every second in a ride, and a ride typically has 4,000-10,000 samples. So in total it is 1200 x 10,000, or 12 million iterations!

So the most obvious thing to do was to split the computation into separate threads; but since we need to support low end netbooks as well as desktop workstations I took the (lazy) approach of just splitting into two threads, a and b. Thread a works on the first half of the ride, and thread b works on the second half of the ride. It had a pretty immediate effect.

Secondly I turned the algorithm on its head. Instead of looking backward from time t and summing the decay of all prior points to get the value for point t, I did the opposite: I took each point at which W' was depleted and computed how much of that expenditure had been recovered at every future point in time, summing as I went.

That point needs to be explained.

On a typical ride, let's say a tempo ride for 2 hours, you will go above CP on relatively few occasions. Even a TT will hover above and below CP. So typically you will spend a lot less time above CP than at or below it. But the earlier implementation didn't care; it summed up all prior points regardless of whether there was depletion or not.

By turning it round, in a ride of say 10,000 samples there will typically only be 500 where W' is depleted, so we only need to compute the replenishment for those samples. We thus have 500 x 1200 loops instead of the previous 12 million. Quite a saving. But this only works because we take into account what rides typically look like, i.e. it is a domain specific optimisation.

Lastly, I looked at the 1200 loops. It was arbitrary. We know what the decay half-life is, because we computed it ! Tau tells us how many seconds before we have recovered to half. So instead of looping for 1200 times I changed it to loop to Tau * 3; but that would only be helpful for riders with fast recovery .. us slow MAMILs may have a 500s recovery, so this would make it take longer.

So as well as limiting the loop based upon the tau calculated I also looked at bounding based upon the value left. Since we are working in numbers like 10000 - 45000 joules we're not too fussed when we're down to increments of 10s of joules. So I bounded the looping to stop when we were left with less than 10 joules of W' being computed. Compared to the fixed loops that saved over 20% on previous runtimes.

So now, we have an implementation that is typically 20x faster than the original. It will now compute W'bal for 500 1hr+ rides in less than 30 seconds on a dual-core 2009 vintage Macbook Pro.

Well, somewhere around Spring of this year Andy Froncioni and Dave Clarke both came up with a differential form of the formula, where "the exponentials dropped out".

This differential form didn't need tau since it "took into account how low you are below CP implicitly". BUT, we know that there is a big difference between athletes in tau and the differential form has an implicit Tau = W' / CP. This will typically result in a tau of 60-100s as opposed to the circa 300-400 values I was seeing for my own personal data. For some, highly trained, riders this will not be an issue.

However, it shows great promise and no doubt will be adapted to address this and other problems in time. We have implemented it in GoldenCheetah and it can be selected in preferences.
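For illustration, one common discretisation of the differential form looks like this (a sketch with my own names, not the GoldenCheetah source). Above CP the depletion is linear; below CP the remaining deficit refills at a rate proportional to how far below CP you are, which is where the implicit tau = W'/CP comes from:

```python
def wbal_differential(power, cp, w_prime):
    """Differential-form W'bal, 1-second samples."""
    w = float(w_prime)
    out = []
    for p in power:
        if p > cp:
            w -= (p - cp)                            # linear depletion above CP
        else:
            w += (w_prime - w) * (cp - p) / w_prime  # proportional refill
        out.append(w)
    return out
```

Note there is no per-athlete tau parameter anywhere: the recovery rate falls out of W' and CP alone, which is exactly the limitation described above.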

I spent some time this weekend converting the code to use OpenCL – this uses the GPUs in most modern computers to perform parallel computation. It is unbelievably powerful and with high end graphics cards can perform vectorized functions very, very quickly.

In fact, OpenCL will be a focus in v3.2 of GoldenCheetah as we move away from computing ride metrics using the CPU to make metric computation real-time and move away from the 'Updating Statistics' dialog we have all come to endure.

One of the first candidates for optimisation via OpenCL is clearly the WPrime class.

### The Science

I wanted to explain what we've done and how it works in this blog post, but realised that first I need to explain the science behind W'bal, W' and CP.

#### W' and CP

How hard can you go, in watts, for half an hour is going to be very different to how hard you can go for, say, 20 seconds. And then thinking about how hard you can go for a very long time will be different again. But when it comes to reviewing and tracking changes in your performance and planning future workouts you quickly realise how useful it is to have a good understanding of your own limits.

In 1965 two scientists Monod and Scherrer presented a ‘Critical Power Model’ where the Critical Power of a muscle is defined as ‘the maximum rate of work that it can keep up for a very long time without fatigue’. They also proposed an ‘energy store’ (later to be termed W’, pronounced double-ewe-prime) that represented a finite amount of work that could be done above that Critical Power.

*Monod and Scherrer CP Model*

In cycling parlance W’ would be referred to as the matchbook– the harder you go the quicker it will be used up, but temper your efforts and you can ‘save a match’ for the last sprint. CP, on the other hand, is that intensity (or power output) where you are uncomfortable but stable, akin to your TT pace. You know that if you try to go any harder you are gonna blow up pretty quickly.

Monod and Scherrer also provided a mathematical formula to estimate the maximum power you can go for any given duration using W’ and CP as parameters. This formula is pretty reliable for durations between 2 and 20 minutes or so, but less so over short and longer durations. And so, over the last 50 years, variations of these models have been developed to address this, and it continues to be a topic of great scientific interest.

#### W' bal

Unless we’re riding the pursuit or a very flat time trial, when we train and race we tend to ride sustained efforts interspersed with recovery. These intermittent bouts might occur when we climb a hill, or sprint out of a corner or bridge a gap. In fact almost all training and racing away from the turbo tends to be variable because of this.

Now, we know from the Critical Power model that when we work above CP we start eating into our limited W’ stores. If we keep going hard enough for long enough we will blow when it’s all gone. But, we also know that it will also be replenished over time too.

When we work below CP the energy stores within the muscles are restocked. The further below CP we are the faster we will recover, and for the first 30 seconds of recovery we get the most bang for buck as blood-flow into the muscles is still high from the previous bout.

*W'bal for an evenly paced 2x20 workout*

Phil Skiba along with his colleagues Weerapong Chidnok, Anni Vanhatalo, and Andrew Jones derived a formula for tracking the levels of W’ remaining during intermittent exercise, called W’bal, that we can plot alongside power. It is particularly useful for assessing workouts for likely failure and reviewing and comparing intervals within a single workout, even when they are of differing durations.

It is likely that in the near future you will see W’bal appear on bike computer headunits to show you the capacity remaining as you race.

Now everyone is different. It appears that highly trained individuals recover their W' much faster than detrained individuals. This needs to be taken into account. And so as well as the recovery mechanism there is also a formula to determine the recovery timescale in seconds, this is computed based upon all the data in the file, but is specific to an individual and not a workout. So in theory, once you know your personal 'tau' it should be used for all analysis -- but of course, as W' and CP change, no doubt, so does your recovery period 'tau'.

### The Maths

There are two parts; working out what tau is, then computing W'bal as a data series for a workout.

#### Tau

The formula for computing tau is easy, so let's start there: tau = 546 · e^(−0.01 · DCP) + 316.

It's largely comprised of a bunch of constants as you can see; 546, 316 and 0.01 are all derived from experiments and tests performed during the research. The only moving part is DCP -- this is basically how many watts below CP you recovered at. It is computed as the difference between your CP and the average of all work performed below CP.

It is trivial to calculate since the only thing we need to compute from a ride is just the average power below CP.
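As a sketch (my own helper name; assumes 1-second samples and the constants quoted above):

```python
import math

def skiba_tau(power, cp):
    """tau = 546 * e^(-0.01 * DCP) + 316, where DCP is CP minus the
    average of all power recorded below CP."""
    below = [p for p in power if p < cp]
    dcp = cp - sum(below) / len(below)
    return 546.0 * math.exp(-0.01 * dcp) + 316.0
```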

#### W'bal

Mathematically it is relatively simple to describe:

*W'bal formula from Skiba et al*

This equation is basically saying that to compute the W'bal for a specific point in time (t) then we take the starting W' (remember we all have a CP and W', my bests are 275 and 23kJ respectively) and subtract an integral.

That integral is the sum of all values from time 0 to the point in time (t) that we are computing W'bal for, i.e. we need to compute a sum of some formula from the beginning of the ride to this point *just to work out what W'bal is at one point in time*.

The formula we apply at each point in time is itself pretty simple; it is the W' expended at each point in time with an exponential decay applied, i.e. W' is recovering exponentially since the point it was used.

So, if at the start of a ride I was 100w above CP (and I am using 1 second samples) then in that 1 second I expended (100w x 1s) 100 joules. So W'EXP at time 0 is 100. At each point in time after that effort it will have recovered a little, exponentially. This is calculated as applying an exponential decay taking into account how long after the sample I am (t-u).

So 10 seconds after that 100 joule effort, the outstanding expenditure has decayed to 100 · e^(−10/tau) joules, i.e. I will have recovered the rest -- even though at that 10 second point I might still be working above CP !

But of course this is only right because the recovery time 'tau' also takes into account the overall recovery rate (it is computed from the

*entire ride*) and so although we are computing recovery at a point in time when we might still be working above CP, it all balances out in the end, or at least that is the theory.

So -- to compute an entire data series of W'bal for a ride from start to finish we will need to compute an integral for every single point in the ride. That is very expensive in computational terms !
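To make the cost concrete, here is a brute-force rendering of that integral in C++. It follows the formula literally: for every sample t it looks back over every earlier sample and sums the decayed expenditure, so it is O(n²). This is my own sketch (assuming 1-second samples), not the GoldenCheetah implementation:

```cpp
#include <cmath>
#include <vector>

// Brute-force W'bal: one integral per sample, summing the decayed
// expenditure of every prior sample. Names are illustrative.
std::vector<double> wbalNaive(const std::vector<double> &watts,
                              double CP, double W, double tau)
{
    std::vector<double> wbal(watts.size());
    for (size_t t = 0; t < watts.size(); t++) {
        double integral = 0.0;
        for (size_t u = 0; u <= t; u++) {
            // joules expended above CP at second u (0 if below CP)
            double expended = watts[u] > CP ? watts[u] - CP : 0.0;
            integral += expended * std::exp(-double(t - u) / tau);
        }
        wbal[t] = W - integral;
    }
    return wbal;
}
```

Correct, but every second of a 10,000 sample ride pays for every second before it, which is exactly the cost the rest of this post is about removing.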

### The Implementation

#### Part One: Copying Excel

Initially I was handed an implementation by Dr Skiba in Excel. And given it was the reference, I needed to implement it like for like.

The trouble is, computing integrals of exponential recoveries is not something that Excel is particularly good at. Dr Skiba had clearly needed to jump through a few hoops just to get Excel to do what he wanted. I would share the original sheet, but it contains a few things that cannot be shared.

To give you a clue though, it computed a decay factor as a column of 1200 data points and then computed a given point in time by using an Excel function SUMPRODUCT on the ride data, in reverse order.

I then took that tortuous approach and reimplemented it in C++, including the temporary data series and a SUMPRODUCT function. The results were predictably poor from a performance perspective but at least we had an implementation that worked !

So in order that the whole of GoldenCheetah wasn't slowed to a crawl computing it, I isolated the code in its own C++ class (WPrime) and only accessed its data when I really needed it (i.e. when the user has requested it is displayed on a plot).

I didn't add any metrics (because it would mean W'bal would need to be computed every time we wanted one of the metrics) and bought myself a book on calculus (and for other reasons, linear algebra) with a view to reimplementing more efficiently (or at least reimplementing the SUMPRODUCT function).

#### Part Two: Reimplementing, Threads & Domain specific optimisation

So after examining the code I realised that the sumproduct code could be replaced. It computes W'bal exactly as the formula suggests, by looking back 1200 points, applying the replenishment exponential decay to every point and summing them for the current point in time.

First of all, the 1200 points back is arbitrary. Tau might be 60 or it might be 500. Secondly, it means we effectively loop 1200 times for every second in a ride, which is typically 4000 - 10000 samples. So in total it is 1200 x 10000, or 12 million iterations !

So the most obvious thing to do was to split the computation into separate threads; but since we need to support low end netbooks as well as desktop workstations I took the (lazy) approach of just splitting into two threads, a and b. Thread a works on the first half of the ride, and thread b works on the second half of the ride. It had a pretty immediate effect.

Secondly I turned the algorithm on its head. Instead of looking backward from time t and summing the decay of all prior points to get the value for point t I did the opposite. I took the point at which W' was depleted and computed how much it was recovered for every point in time into the future, summing them as I added to them.

**I only needed to loop from where W' had been depleted**.

That point needs to be explained.

On a typical ride, let's say a tempo ride for 2 hours, you will go above CP on relatively few occasions. Even a TT will hover above and below CP. So typically you will spend a lot less time above CP than at or below it. But the earlier implementation didn't care; it summed over all prior points regardless of whether there was any depletion or not.

By turning it round, in a ride of say 10000 samples, there will typically only be 500 where W' is depleted .. so we only need to compute the replenishment for those samples. We thus have 500x1200 loops instead of the previous 12 million. Quite a saving. But this only works because we take into account what rides typically look like, i.e. it is a domain specific optimisation.

Lastly, I looked at the 1200 loops. It was arbitrary. We know what the decay timescale is, because we computed it ! Tau tells us how many seconds it takes for the outstanding W' to decay away to about a third (1/e) of its value. So instead of looping 1200 times I changed it to loop to Tau x 3; but that would only be helpful for riders with fast recovery .. us slow MAMILs may have a 500s recovery, so this would make it take longer.

The fastest code is always the simplest !

So as well as limiting the loop based upon the calculated tau, I also looked at bounding based upon the value left. Since we are working in numbers like 10000 - 45000 joules, we're not too fussed when we're down to increments of 10s of joules. So I bounded the looping to stop when there was less than 10 joules of W' left to compute. Compared to the fixed loops, that saved over 20% on previous runtimes.
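Putting the reversal and both bounds together, the optimised direction looks roughly like this. Again a sketch with my own names, assuming 1-second samples, not the actual WPrime code:

```cpp
#include <cmath>
#include <vector>

// Reversed algorithm: only samples where W' is actually depleted push
// their decayed expenditure forwards, and the inner loop stops at
// 3 * tau or once the residual drops under 10 joules.
std::vector<double> wbalFast(const std::vector<double> &watts,
                             double CP, double W, double tau)
{
    size_t n = watts.size();
    std::vector<double> integral(n, 0.0);
    size_t limit = size_t(tau * 3.0);                // bound on the decay window
    for (size_t u = 0; u < n; u++) {
        double expended = watts[u] - CP;
        if (expended <= 0.0) continue;               // most samples skip the inner loop
        for (size_t t = u; t < n && t - u <= limit; t++) {
            double residual = expended * std::exp(-double(t - u) / tau);
            if (residual < 10.0) break;              // below 10 J we stop caring
            integral[t] += residual;
        }
    }
    std::vector<double> wbal(n);
    for (size_t t = 0; t < n; t++) wbal[t] = W - integral[t];
    return wbal;
}
```

The outer loop still visits every sample, but the expensive inner loop now only runs for the (typically few hundred) depleted ones, which is the domain specific part of the optimisation.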

So now, we have an implementation that is typically 20x faster than the original. It will now compute W'bal for 500 1hr+ rides in less than 30 seconds on a dual-core 2009 vintage Macbook Pro.

#### Part Three : Where Next ?

**Differential Equation**

Well, somewhere around Spring of this year, Andy Froncioni and Dave Clarke both came up with a differential form of the formula, where "the exponentials dropped out".

This differential form didn't need tau since it "took into account how low you are below CP implicitly". BUT, we know that there is a big difference between athletes in tau and the differential form has an implicit Tau = W' / CP. This will typically result in a tau of 60-100s as opposed to the circa 300-400 values I was seeing for my own personal data. For some, highly trained, riders this will not be an issue.

However, it shows great promise and no doubt will be adapted to address this and other problems in time. We have implemented it in GoldenCheetah and it can be selected in preferences.

**Speed up with OpenCL**

I spent some time this weekend converting the code to use OpenCL - this uses the GPUs in most modern computers to perform parallel computation. It is unbelievably powerful and with high end graphics cards can perform vectorized functions very, very quickly.

In fact, OpenCL will be a focus in v3.2 of GoldenCheetah as we move away from computing ride metrics using the CPU to make metric computation real-time and move away from the 'Updating Statistics' dialog we have all come to endure.

One of the first candidates for optimisation via OpenCL is clearly the WPrime class.
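It is also worth noting that the integral itself admits a single-pass rewrite (this is the spirit of the Waterworth adjustment mentioned in the update below): because every term shares the same decay factor, the decayed sum obeys the recurrence I(t) = I(t−1) · e^(−1/tau) + expended(t), so the whole series falls out of one forward loop. A sketch under the same 1-second sample assumption, with my own names:

```cpp
#include <cmath>
#include <vector>

// Single-pass W'bal via the recurrence
//   I(t) = I(t-1) * e^(-1/tau) + expended(t)
// giving exactly the integral's value at every sample in O(n).
std::vector<double> wbalSinglePass(const std::vector<double> &watts,
                                   double CP, double W, double tau)
{
    std::vector<double> wbal(watts.size());
    const double decay = std::exp(-1.0 / tau);   // per-second decay factor
    double integral = 0.0;
    for (size_t t = 0; t < watts.size(); t++) {
        double expended = watts[t] > CP ? watts[t] - CP : 0.0;
        integral = integral * decay + expended;
        wbal[t] = W - integral;
    }
    return wbal;
}
```

No look-back, no look-ahead, and cheap enough for a head unit.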

**UPDATE October 2014 : Dave Waterworth has adjusted the original formula to remove the cost of integral computation, making W'bal computable in a single pass. It means this is now super fast and ready to go onto a headunit.**

## Friday, July 04, 2014

### Version 3.1 with Compare Mode

A new release candidate should come out shortly .. a year on and GoldenCheetah is now a fairly good platform for Cycling Analysis.

I've recorded a video to show the main stuff ..

What's New in GoldenCheetah Version 3.1 from mark liversedge on Vimeo.

.. just need to find the motivation to actually ride my bike !
