
Arthur Charpentier, ENSAE ParisTech, PhD in Mathematics (KU Leuven), Fellow of the French Institute of Actuaries, professor of Actuarial Science at UQàM. Formerly assistant professor at ENSAE ParisTech, associate professor at École Polytechnique, and assistant professor in economics at Université de Rennes 1.

# Smoothing Mortality Rates in R

11.14.2013

Recently, I was working with Julie, a student of mine from Rennes, on mortality tables. We work on genealogical datasets from a small region in Québec, and we can observe a lot of volatility. If I borrow one of her graphs, we get something like the following:

Since we have some missing data, we wanted to use some Generalized Nonlinear Models. So let us see how to get a smooth estimator of the mortality surface.  We will write some code that we can use on our data later on (the dataset we have has been obtained after signing a lot of official documents, and I guess I cannot upload it here, even partially).

```
DEATH <- read.table(
  "http://freakonometrics.free.fr/Deces-France.txt",
  header=TRUE)
EXPO <- read.table(
  "http://freakonometrics.free.fr/Exposures-France.txt",
  header=TRUE)
library(gnm)
D=DEATH$Male
E=EXPO$Male
A=as.numeric(as.character(DEATH$Age))
Y=DEATH$Year
I=(A<100)
base=data.frame(D=D,E=E,Y=Y,A=A)
subbase=base[I,]
subbase=subbase[!is.na(subbase$A),]
```

The first idea can be to use a Poisson model, where the mortality rate is a smooth function of the age and the year, something like

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}s(x,t)}])$

```
library(mgcv)
# a bivariate smooth of age and year (thin-plate basis), with exposure as offset
regbsp=gam(D~s(A,Y)+offset(log(E)),data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regbsp,newdata=data.frame(A=a,Y=y,E=1))
vX=trunc(seq(0,99,length=41))
vY=trunc(seq(1900,2005,length=41))
vZ=outer(vX,vY,predmodel)
persp(vX,vY,vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
  ylab="Years (1900-2005)",zlab="Mortality rate (log)")
```

The mortality surface is here

It is also possible to extract the average value of the years, which is the interpretation of the $a_x$ coefficient in the Lee-Carter model,

```
predAx=function(a) mean(predict(regbsp,newdata=data.frame(A=a,
  Y=seq(min(subbase$Y),max(subbase$Y)),E=1)))
plot(seq(0,99),Vectorize(predAx)(seq(0,99)),col="red",lwd=3,type="l")
```

We have the following smoothed mortality rate

Recall that the Lee-Carter model is

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}a_x+b_x\cdot k_t}])$

where parameter estimates can be obtained using

```
regnp=gnm(D~factor(A)+Mult(factor(A),factor(Y))+offset(log(E)),
  data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regnp,newdata=data.frame(A=a,Y=y,E=1))
vZ=outer(vX,vY,predmodel)
persp(vX,vY,vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
  ylab="Years (1900-2005)",zlab="Mortality rate (log)")
```

The (crude) mortality surface is

with the following $a_x$ coefficients.

`plot(seq(1,99),coefficients(regnp)[2:100],col="red",lwd=3,type="l")`

Here we have a lot of coefficients, and unfortunately, on a smaller dataset, much more variability. Can we smooth our Lee-Carter model, to get something which looks like

$D_{x,t}\sim\mathcal{P}(E_{x,t}\cdot \exp[{\color{blue}s_a(x)+s_b(x)\cdot s_k(t)}])$

Actually, we can, and the code is rather simple:
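The idea is to replace the `factor(A)` and `factor(Y)` terms of the crude Lee-Carter fit with spline bases, so that the age and period effects are forced to be smooth. A sketch along those lines, using `splines::bs()` inside `gnm`'s `Mult()` term, might look like the following (the knot locations here are illustrative, not the ones used on the Québec data):

```r
# Sketch: smooth Lee-Carter via cubic spline bases inside gnm's Mult() term.
# Knot locations are illustrative placeholders.
library(splines)
library(gnm)
knotsA=c(20,40,60,80)          # knots for the age effects s_a and s_b
knotsY=c(1920,1945,1980,2000)  # knots for the period effect s_k
regsp=gnm(D~bs(A,knots=knotsA,degree=3)+
            Mult(bs(A,knots=knotsA,degree=3),
                 bs(Y,knots=knotsY,degree=3))+
            offset(log(E)),
          data=subbase,family=quasipoisson)
# same surface plot as before, now smoothed in both directions
predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1))
vZ=outer(vX,vY,predmodel)
persp(vX,vY,vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
  ylab="Years (1900-2005)",zlab="Mortality rate (log)")
```

Each `bs()` call replaces roughly a hundred factor levels with a handful of spline coefficients (here, 7 per basis), which is what tames the variability on a small dataset.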

Published at DZone with permission of Arthur Charpentier, author and DZone MVB.


"Starting from scratch" is seductive but disease ridden