## Thursday, September 13, 2012

### Online Population Projection

For some reason, this came to my attention. Because, math was wrong? What?

Ben Foster anticipated Facebook's billionth user would log in last month, in August 2012. He wasn't alone. Here was NBC News' technology blog, reporting on a prediction by iCrossing:

And Bloomberg Businessweek would only call it for "later this year:"

Which all seem to presume linear growth from here on out. Indefinitely? I don't know. Which just makes me wonder, to what extent can we apply usual population growth-type logic to online populations? If Facebook were growing within an environment with biological limiting factors, we would have expected what we've already seen, for example, exponential growth at first. For quite a while, Facebook was growing at a lovely exponential clip of approximately 10% a month. This shows their growth for December 2004 through December 2009, with an exponential regression to fit:

However, that growth rate was clearly unsustainable. Had it continued, Facebook's population would have reached seven billion in June of 2012. Obviously, that didn't happen.
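As a rough check on that kind of extrapolation, here is a minimal sketch of the compound-growth arithmetic. The starting figure of roughly 350 million users in December 2009 is an assumption that does not come from this post, and `months_to_reach` is just an illustrative helper name.

```python
import math

def months_to_reach(start_users, target_users, monthly_rate):
    """Months of steady compound growth needed to go from start_users to target_users."""
    return math.log(target_users / start_users) / math.log(1 + monthly_rate)

# Assumption (not from the post): about 350 million users in December 2009.
START_USERS = 350e6
TARGET_USERS = 7e9
MONTHLY_RATE = 0.10  # the "approximately 10% a month" figure from the post

months = months_to_reach(START_USERS, TARGET_USERS, MONTHLY_RATE)
print(f"About {months:.1f} months after December 2009")
```

Under that assumed starting point, the answer comes out to a bit over 31 months, which lands in mid-2012.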

In biological populations with finite space and resources, we expect growth that looks exponential at first, but due to limiting factors, levels off eventually. And, indeed, Facebook's growth did not continue exponentially.

And I suppose it has looked rather linear for a while, but I'm not sure that's the best model. The rate of increase has slightly decreased over the past year or so (shout-out to the second derivative!).

If we apply a logistic model to the data so far, we get:

Which has Facebook reaching a billion users in April of 2013, and predicts its eventual population will top out at less than 1.1 billion.

But this all raises more questions than it resolves. Facebook may be approaching its maximum realistic number of users within the United States. However, as far as I understand, it has lots of room to grow in other huge markets. So this logistic growth model is flawed as well.

I'll cop to not understanding how graphing calculators come up with logistic regression equations, like, at all. At least not with nearly the depth that I understand how they calculate linear regression equations. I simply know how to apply it as a blunt instrument to a table of values.

On the other hand, linear growth, as the news organizations have used, has not panned out: we're past August 2012 and have not reached a billion users yet. Have worldwide limiting factors in the internet ecosystem unavoidably kicked in already? Should we expect another period of exponential growth in the future, if Facebook catches on in India? Are there reasons to think this linear-looking growth will continue for a long time? And for how long? I don't know if any of these are answerable! But I do love the questions.

Here is a GeoGebra file, if you want to play.

Matthew Sauter said...

This subject matter is something most if not all of my students could relate to. I feel like it is a good discussion point about the limitations of growth models. It would be interesting to get the breakdown by country, compare each population to the number of people on Facebook, and look for a correlation with internet access to better predict what could happen in the near future.

Tom Fiddaman said...

I did some modeling of generalized logistic growth of Facebook & Groupon, with models available here: http://models.metasd.com/facebook-valuation-with-a-logistic-model/ (runnable with free versions of Vensim). Discussion of the models is here: http://blog.metasd.com/?s=facebook

mrwardteaches said...

This is an awesome find. So awesome I literally implanted it into my "Parent's Back to School" talk tonight just an hour before showtime.

In my Algebra classes I played first act videos and talked about how we can take what's in the textbook and create rich problem solving experiences.

I didn't have anything decent for my analysis class - until I saw this. I threw it in there, talked about how the textbook gives us Denver's logistic growth, but how we can instead debate and discuss those models with Facebook's growth. There were only three parents present, but I think it definitely came across as a refreshing take!

Now I just have to hope FB doesn't hit the Billion mark before I get to exponential growth!

gasstationwithoutpumps said...

I think that one of the common methods for fitting logistic models is a general curve-fitting method:
http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm

thescamdog said...

This one is fantastic and timely (for me). I was just working on a population growth task today. This one is much more compelling.

Tom Fiddaman said...

Levenberg-Marquardt works for fitting, as does pretty much any hill-climbing method (I used a Powell search in my link above).

There's also a pencil and paper way to fit the logistic. Plot the growth rate against the population, and you get a downward-sloping line, starting from a maximum growth rate that prevails for small population. The point at which that line intersects the population axis yields an estimate of the carrying capacity (population @ which growth rate = 0).

This is easy to do by hand. It's not an unbiased estimate, but it's fine as long as there's not too much noise. As I recall, this is how M. King Hubbert estimated the logistic for his famous prediction of a peak in US oil production.

Sue VanHattum said...

Tom, could you point me to a longer, more detailed explanation of that?

Tuomas said...

Note though that your fit is VERY sensitive to that last point. Moving it slightly back/forth will have a huge effect on the predicted number of users --> since you don't actually know that the number of users obeys this formula you can't yet reliably predict it with that accuracy.

Kate Nowak said...

So, the last point /is/ the prediction. I should have made it a different color or something, sorry. Is that what you're talking about? I mean, otherwise, I think I pretty much agree with you. I think I was pretty clear that there's no reason to be confident about Apr '13.

Tom Fiddaman said...

Sue - were you looking for a reference to Hubbert, or the fitting procedure?

Sue VanHattum said...

Fitting procedure. But the other would be lovely too.

Tom Fiddaman said...

Here's a bit more:

The logistic equation is

dP/dt = r*P*(K-P)/K

= r*P*(1-P/K)

where P is population, r is a growth rate parameter, and K is carrying capacity (the eventual max population).

So, when P is small, (1-P/K) is approximately 1, and this gives simple exponential growth at the rate r. When P=K, growth stops altogether. There's an inflection point at P/K=1/2, where the behavior changes from accelerating growth to saturation (the upper half of the S-shaped curve).
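Those three properties can be checked numerically. The sketch below uses the standard closed-form solution of the logistic equation; the values of `R`, `K` and `P0` are hypothetical and chosen only for illustration, not fitted to Facebook's data.

```python
import math

def logistic(t, p0, r, k):
    """Closed-form solution of dP/dt = r*P*(1 - P/K) with P(0) = p0."""
    return k / (1 + (k - p0) / p0 * math.exp(-r * t))

def growth(p, r, k):
    """Right-hand side of the logistic equation: absolute growth per unit time."""
    return r * p * (1 - p / k)

# Hypothetical parameters, for illustration only.
R, K, P0 = 0.10, 1000.0, 1.0

# While P is small, one time step multiplies P by roughly exp(R).
early_ratio = logistic(1, P0, R, K) / P0
print(early_ratio, math.exp(R))

# Absolute growth is largest at P = K/2 and vanishes at P = K.
print(growth(0.4 * K, R, K), growth(0.5 * K, R, K), growth(0.6 * K, R, K))
print(growth(K, R, K))
```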

So, if you plot the effective growth rate of your data, log(P(t)/P(t-dt))/dt on the y-axis against P on the x-axis, you get a downward sloping line. It intercepts the y axis at r, the max growth rate, and intercepts the x axis at K.
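The procedure just described can be sketched in a few lines. This is a minimal illustration on synthetic data generated from a known logistic curve, so the recovered r and K can be compared with the true values. The parameter values are hypothetical, and plotting each growth rate against the midpoint of the two populations (rather than one endpoint) is an assumption made here to reduce discretization error.

```python
import math

def logistic(t, p0, r, k):
    """Closed-form logistic curve, used here only to make synthetic data."""
    return k / (1 + (k - p0) / p0 * math.exp(-r * t))

def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_logistic(pops, dt=1.0):
    """Estimate (r, K) from equally spaced population values."""
    # Effective growth rate over each interval: log(P(t)/P(t-dt))/dt.
    rates = [math.log(b / a) / dt for a, b in zip(pops, pops[1:])]
    # Assumption: use the interval midpoint as the x-coordinate.
    mids = [(a + b) / 2 for a, b in zip(pops, pops[1:])]
    slope, intercept = fit_line(mids, rates)
    # The y-intercept estimates r; the x-intercept (-intercept/slope) estimates K.
    return intercept, -intercept / slope

# Hypothetical "true" parameters for the synthetic data.
TRUE_R, TRUE_K = 0.10, 1000.0
pops = [logistic(t, 10.0, TRUE_R, TRUE_K) for t in range(60)]

r_est, k_est = estimate_logistic(pops)
print(f"r is about {r_est:.3f}, K is about {k_est:.0f}")
```

With noise-free data the estimates land very close to the true values; real data would be noisier, which is where the caveats that follow come in.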

The flies in the ointment are:
- A linear fit to the data might not properly weight the errors for the low-P points vs. high-P points.
- A small uncertainty about the slope of the growth rate line can translate to a substantial uncertainty in K.
- The logistic assumption might not be true - the data might represent multiple superimposed logistics (different regions), or the growth rate might be nonlinear (dP/dt = r*P*(1-P/K)^alpha). Or, population could go down (which the logistic can't handle, but seems like a possibility for Facebook at some point).
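To make the second caveat concrete: since the growth-rate line crosses the population axis at K = -r/slope, a modest absolute error in a shallow slope moves K a long way, and not symmetrically. The numbers below are hypothetical, and `carrying_capacity` is just an illustrative helper.

```python
def carrying_capacity(r, slope):
    """X-intercept of the line rate = r + slope * P, i.e. the implied K."""
    if slope >= 0:
        raise ValueError("a logistic fit needs a downward-sloping line")
    return -r / slope

# Hypothetical fit: r = 0.10 per month, slope = -1e-4 per user-unit.
R_FIT = 0.10
BEST_SLOPE = -1.0e-4
SLOPE_ERROR = 0.5e-4  # assumed uncertainty in the fitted slope

k_best = carrying_capacity(R_FIT, BEST_SLOPE)
k_low = carrying_capacity(R_FIT, BEST_SLOPE - SLOPE_ERROR)   # steeper line
k_high = carrying_capacity(R_FIT, BEST_SLOPE + SLOPE_ERROR)  # shallower line

print(k_low, k_best, k_high)
```

The shallower the line, the more the estimate of K balloons, which is one reason fits made early in the growth curve tend to be unreliable.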

These flies explain why the history of forecasting with the logistic is so spotty. Generally, the more that can be done to put bounds on K from first principles, the better the forecast. For Facebook, one might ask how many people in emerging markets have internet connections, for example.

Even in the absence of a good point forecast, merely contemplating that growth can't go on forever is often a useful revelation for people.

Sue VanHattum said...

Thank you! (I need to play with this somehow to understand it. I'd like to do that, but I'm not sure when...)

GregT said...

New Data Point: Oct 4, 2012