Big Data/Analytics Zone is brought to you in partnership with:

Arthur Charpentier, ENSAE, PhD in Mathematics (KU Leuven), Fellow of the French Institute of Actuaries, professor at UQàM in Actuarial Science. Former professor-assistant at ENSAE Paritech, associate professor at Ecole Polytechnique and professor assistant in economics at Université de Rennes 1. Arthur is a DZone MVB and is not an employee of DZone and has posted 160 posts at DZone. You can read more from them at their website. View Full User Profile

Smoothing Mortality Rates in R

11.14.2013
| 2420 views |
  • submit to reddit

Recently, I was working with Julie, a student of mine, coming from Rennes, on mortality tables. Actually, we work on genealogical datasets from a small region in Québec, and we can observe a lot of volatiliy. If I borrow one of her graphs, we get something like

Since we have some missing data, we wanted to use some Generalized Nonlinear Models. So let us see how to get a smooth estimator of the mortality surface.  We will write some code that we can use on our data later on (the dataset we have has been obtained after signing a lot of official documents, and I guess I cannot upload it here, even partially).

DEATH <- read.table(
"http://freakonometrics.free.fr/Deces-France.txt",
header=TRUE)
EXPO  <- read.table(
"http://freakonometrics.free.fr/Exposures-France.txt",
header=TRUE,skip=2)
library(gnm)
D=DEATH$Male
E=EXPO$Male
A=as.numeric(as.character(DEATH$Age))
Y=DEATH$Year
I=(A<100)
base=data.frame(D=D,E=E,Y=Y,A=A)
subbase=base[I,]
subbase=subbase[!is.na(subbase$A),]

The first idea can be to use a Poisson model, where the mortality rate is a smooth function of the age and the year, something like

library(mgcv)
regbsp=gam(D~s(A,Y,bs="cr")+offset(log(E)),data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regbsp,newdata=data.frame(A=a,Y=y,E=1))
vX=trunc(seq(0,99,length=41))
vY=trunc(seq(1900,2005,length=41))
vZ=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")

The mortality surface is here

It is also possible to extract the average value of the years, which is the interpretation of the  coefficient in the Lee-Carter model,

predAx=function(a) mean(predict(regbsp,newdata=data.frame(A=a,
Y=seq(min(subbase$Y),max(subbase$Y)),E=1)))
plot(seq(0,99),Vectorize(predAx)(seq(0,99)),col="red",lwd=3,type="l")

We have the following smoothed mortality rate

Recall that the Lee-Carter model is

where parameter estimates can be obtained using

regnp=gnm(D~factor(A)+Mult(factor(A),factor(Y))+offset(log(E)),
data=subbase,family=quasipoisson)
predmodel=function(a,y) predict(regnp,newdata=data.frame(A=a,Y=y,E=1))
vZ=outer(vX,vY,predmodel)
persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)",
ylab="Years (1900-2005)",zlab="Mortality rate (log)")

The (crude) mortality surface is

with the following  coefficients.

plot(seq(1,99),coefficients(regnp)[2:100],col="red",lwd=3,type="l")

Here we have a lot of coefficients, and unfortunately, on a smaller dataset, we have much more variability. Can we smooth our Lee-Carter model? To get something which looks like

Actually, we can, and the code is rather simple:

Published at DZone with permission of Arthur Charpentier, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)