life expectancy

Recently I stumbled upon this math problem:
The positive integer N has a finite value but is unknown to us.
We are looking for a function f(X) which minimizes the sum
E = |f(0) - N| + |f(1) - N| + |f(2) - N| + ... + |f(N-1) - N|
for almost all N.

Notice that we do not know the value of N, so f(X) cannot depend on it, which eliminates the trivial solution f(X) = N among others.
In order to illustrate what I am looking for, consider as first example
1] f(X) = 0, which results in E = N2.
However, there is a better solution
2] f(X) = X, which results in E = (N+1)*N/2, which is less for almost all N (except N=1).

Unfortunately, I do not know the best solution f(X) and this is where you are invited to leave a comment to help me out.
But I do have strong evidence (i.e. a numerical test up to large values of N) that
3] f(X) = X + sqrt(X) is an even better solution.

So what is the motivation for this problem and why the title for this blog post?
Consider a process or phenomenon which exists already for X years and we try to estimate the total lifetime N without any further information. So our estimate can only depend on X and we try to find a function which minimizes the total estimation error E as described above; every year we make an estimate f(X) which is wrong by the amount |f(X) - N| and we try to minimize the sum of those errors. In some sense this is a variation of the infamous 'doomsday argument'.

It is now obvious why 1] is a bad solution and 2] is much better. Btw the function f(X) = 2*X would give the same total error E minus a small constant, so whether we assume that the process ends immediately or estimate that it will last twice as long (as it already did) makes no significant difference.

Btw the solution 3] creates a paradox: The best estimate for the life expectancy seems to depend on what units one uses: We get a different result if we calculate with days rather than years.

added later: If I assume f(X) = c*X, perhaps motivated by that paradox, then I can show that c=sqrt(2) minimizes E for large enough N. However, the function
4] f(X) = sqrt(2)*X + sqrt(X) seems to be an even better candidate, but I do not know how to determine a,b to minimize E for f(X) = a*X + b*sqrt(X) or determine the general solution for f(X).

added later: There are better ways to illustrate the problem; e.g. a box contains an unknown number N of candies. You take out one after another and at each step X you have to guess N. At the end there is a penalty proportional to the sum of your wrong guesses.
Perhaps a "deeper" example considers a Turing machine, which performed already X steps and we try to guess after how many steps N it halts.
The reason these examples are "better" is that there is no change of physical units (e.g. from years to days) that would affect N.