### Recommended Links

Like this piece? Share it with your friends:

Recently, in the ACT2040 class (on non-life insurance), we’ve discussed the difference between observable and non-observable heterogeneity in ratemaking (from an economic perspective). To illustrate that point (we will spend more time, later on, discussing observable and non-observable risk factors), we looked at the following simple example. Let

denote the height of a person. Consider the following dataset:

> Davis=read.table(
+ "http://socserv.socsci.mcmaster.ca/jfox/Books/Applied-Regression-2E/datasets/Davis

There is a small typo in the dataset, so let us make manual changes here.

> Davis[12,c(2,3)]=Davis[12,c(3,2)]

Here, the variable of interest is the height of a given person:

> X=Davis$height

If we look at the histogram, we have:

> hist(X,col="light green", border="white",proba=TRUE,xlab="",main="")

Can we assume that we have a Gaussian distribution?

Maybe not … here, if we fit a Gaussian distribution, plot it, and add a kernel based estimator, we get:

> (param <- fitdistr(X,"normal")$estimate)
> f1 <- function(x) dnorm(x,param[1],param[2])
> x=seq(100,210,by=.2)
> lines(x,f1(x),lty=2,col="red")
> lines(density(X))

If you look at that black line, you might think of a mixture, something like:

(using standard mixture notations). Mixtures are obtained when we have a **non-observable heterogeneity factor**: with probability , we have a random variable (call it type [1]), and with probability , a random variable (call it type [2]). So far, nothing new. And we can fit such a mixture distribution, such as:

> library(mixtools)
> mix <- normalmixEM(X)
number of iterations= 335
> (param12 <- c(mix$lambda[1],mix$mu,mix$sigma))
[1] 0.4002202 178.4997298 165.2703616 6.3561363 5.9460023

If we plot that mixture of two Gaussian distributions, we get:

> f2 <- function(x){ param12[1]*dnorm(x,param12[2],param12[4])
+ (1-param12[1])*dnorm(x,param12[3],param12[5]) }
> lines(x,f2(x),lwd=2, col="red") lines(density(X))

Not bad. Actually, we can try to maximize the likelihood with our own codes:

> logdf <- function(x,parameter){
+ p <- parameter[1]
+ m1 <- parameter[2]
+ s1 <- parameter[4]
+ m2 <- parameter[3]
+ s2 <- parameter[5]
+ return(log(p*dnorm(x,m1,s1)+(1-p)*dnorm(x,m2,s2)))
+ }
> logL <- function(parameter) -sum(logdf(X,parameter))
> Amat <- matrix(c(1,-1,0,0,0,0,
+ 0,0,0,0,1,0,0,0,0,0,0,0,0,1), 4, 5)
> bvec <- c(0,-1,0,0)
> constrOptim(c(.5,160,180,10,10), logL, NULL, ui = Amat, ci = bvec)$par
[1] 0.5996263 165.2690084 178.4991624 5.9447675 6.3564746

Here, we include some constraints, to insurance that the probability belongs to the unit interval, and that the variance parameters remain positive. Note that we have something close to the previous output.