Construction and Selection of Parametric Models (25-30%)

1. Estimate the parameters of failure time and loss distributions

Assuming {$n$} independent observations throughout, label them {$x_1, \ldots, x_n$}. The general problem is: given a distribution, find the parameters of the distribution.
a) Maximum likelihood
Given a distribution {$X$} with parameter {$\theta$}, construct a likelihood function {$L(\theta)$} and find the value of {$\theta$} that maximizes {$L$}, or equivalently the loglikelihood function {$l(\theta) = \log{L(\theta)}$}. More specifically, given a series of events {$A_1, \ldots, A_n$}, where each {$A_j$} is an observed value (or interval), then
{$$L(\theta) = \prod_{j=1}^{n} Prob(X\in A_j | \theta).$$} When {$A_j$} is a single point observation {$x_j$}, we replace {$Prob(X=x_j)$} (which is 0 for a continuous distribution) by the density {$f(x_j)$}. When {$A_j$} is an interval, we compute the probability from the cdf {$F$}.
If there is more than one parameter, we maximize the likelihood function by setting the partial derivatives of {$l$} with respect to each parameter to 0.
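As an illustration of a), here is a minimal sketch assuming an exponential model with mean {$\theta$} and made-up data; it maximizes the loglikelihood numerically and checks against the closed-form answer (the sample mean):

```python
# A minimal sketch of maximum likelihood estimation, assuming an exponential
# model with mean theta; the data below are hypothetical, not from the text.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # observed values x_1, ..., x_n

def neg_loglik(theta):
    # l(theta) = sum_j log f(x_j | theta), with f(x) = exp(-x/theta) / theta
    return -np.sum(-np.log(theta) - x / theta)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")
print(res.x)       # numerical MLE of theta
print(x.mean())    # closed-form MLE for the exponential: the sample mean
```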
If the data is grouped into {$(0, c_1], (c_1, c_2], \ldots, (c_k,\infty)$}, and there are {$n_j$} observations in each {$(c_{j-1}, c_j]$}, then
{$$L(\theta) = \prod_{j=1}^{k} [F(c_j) - F(c_{j-1})]^{n_j} \text{ (all cdf values given }\theta).$$}

b) Method of moments
If there are {$p$} parameters, form and solve a system of {$p$} equations by equating the 1st through {$p$}th moments of the data to those of the distribution.
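For example, a minimal sketch assuming a gamma model with shape {$\alpha$} and scale {$\theta$} (mean {$\alpha\theta$}, variance {$\alpha\theta^2$}) and made-up data:

```python
# A minimal sketch of the method of moments for a gamma(alpha, theta) model.
# The data are hypothetical, not from the text.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # hypothetical observed losses

m1 = x.mean()          # first sample moment
m2 = np.mean(x**2)     # second (raw) sample moment

# match mean = alpha*theta and variance = alpha*theta^2 to the sample moments
theta_hat = (m2 - m1**2) / m1      # sample variance / sample mean
alpha_hat = m1 / theta_hat         # sample mean / scale
print(alpha_hat, theta_hat)
```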
c) Percentile matching
If there are {$p$} parameters, and {$g_1, \ldots, g_p$} are arbitrary values between 0 and 100, form and solve a system of {$p$} equations by equating the {$g_1, \ldots, g_p$}th percentiles of the data to those of the distribution. More precisely, use the smoothed empirical estimate of the {$g$}th percentile: look at the {$n+1$} intervals formed by {$(-\infty, x_1), (x_1, x_2), \ldots, (x_n, \infty)$}. The {$g$}th percentile lands in the interval {$(x_{j}, x_{j+1})$}, where {$j = \text{greatest integer} \leq g/100 \cdot (n+1)$}. Let {$h$} be the leftover decimal part. Then the {$g$}th percentile is the weighted average {$(1-h)x_j + h x_{j+1}$}.
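A minimal sketch of the smoothed empirical percentile; the helper function below is hypothetical and assumes the position {$g/100\cdot(n+1)$} falls between two order statistics:

```python
# A minimal sketch of the smoothed empirical percentile described above;
# g is in percent, and this helper is an illustration, not from the text.
import numpy as np

def smoothed_percentile(x, g):
    x = np.sort(x)
    n = len(x)
    pos = g / 100 * (n + 1)   # position among the n+1 intervals
    j = int(pos)              # greatest integer <= pos
    h = pos - j               # leftover decimal part
    # weighted average of the jth and (j+1)th order statistics
    return (1 - h) * x[j - 1] + h * x[j]

x = np.array([3.0, 5.0, 8.0, 12.0, 20.0])
print(smoothed_percentile(x, 50))   # 8.0, the smoothed 50th percentile for n = 5
```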
d) Bayesian procedures
2. Estimate the parameters of failure time and loss distributions with censored and/or truncated data using maximum likelihood

If the data is censored at {$c$}, and there are {$m$} censored observations, then the likelihood factor to use for those observations is {$ [1-F(c)]^m $}.
If the data is truncated at {$t$}, we can either
Use the shift approach: ignore all observations under {$t$}, and shift all observations above {$t$} by subtracting {$t$} from each of them.
Use the unshifted approach: ignore all observations under {$t$}, and keep the original observations {$x_j$} above {$t$}, but instead of {$f(x_j)$} use {$f(x_j)/(1-F(t))$}, i.e., the density of {$x_j$} conditional on {$X \geq t$}. (See the sketch below.)
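A minimal sketch combining both adjustments, assuming an exponential model with mean {$\theta$}, a hypothetical truncation point {$d$}, a hypothetical censoring point {$u$}, and the unshifted approach:

```python
# A minimal sketch of MLE with left-truncation at d and right-censoring at u,
# assuming an exponential model with mean theta; d, u, and the data are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

d, u = 1.0, 10.0
x = np.array([2.3, 4.1, 6.0, 10.0, 10.0])   # values equal to u are censored
censored = x >= u

def neg_loglik(theta):
    # uncensored: log[ f(x)/(1-F(d)) ] = -(x - d)/theta - log(theta)
    ll = np.sum(-(x[~censored] - d) / theta - np.log(theta))
    # censored:   log[ (1-F(u))/(1-F(d)) ] = -(u - d)/theta
    ll += -censored.sum() * (u - d) / theta
    return -ll

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")
# for the exponential the MLE equals total shifted exposure / number uncensored
print(res.x)
```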
3. Estimate the variance of estimators and the confidence intervals for the parameters and functions of parameters of failure time and loss distributions

We can estimate the variance of the mle (maximum likelihood estimator) by the following (due to a long and complicated theorem): if the underlying density function {$f$} is smooth enough, then {$Var(\hat{\theta}) \approx \frac{1}{I(\theta)}$}, where
{$$I(\theta) = -E\left[ \frac{\partial^2}{\partial \theta^2} l(\theta)\right] = E\left[ \left(\frac{\partial}{\partial \theta} l(\theta)\right)^2\right]$$} {$I$} is called (Fisher's) information.
The multivariable version of this is: {$I(\theta)$} is an information matrix, whose {$(r,s)$}th element is
{$$I(\theta)_{rs} = -E\left[ \frac{\partial}{\partial \theta_r}\frac{\partial}{\partial \theta_s} l(\theta)\right] = E\left[ \frac{\partial}{\partial \theta_r} l(\theta)\frac{\partial}{\partial \theta_s} l(\theta)\right].$$} The inverse {$I(\theta)^{-1}$} is the (asymptotic) covariance matrix of the estimators: the variances of the individual estimators are on its diagonal, and the off-diagonal entries are the covariances.
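For a quick one-parameter illustration (an exponential with mean {$\theta$}, my own example rather than the book's): {$l(\theta) = -n\ln\theta - \frac{1}{\theta}\sum x_j$}, so
{$$\frac{\partial^2}{\partial \theta^2} l(\theta) = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum x_j, \qquad I(\theta) = -E\left[\frac{n}{\theta^2} - \frac{2}{\theta^3}\sum X_j\right] = \frac{n}{\theta^2}, \qquad Var(\hat{\theta}) \approx \frac{\theta^2}{n}.$$}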
Steps for the calculation:
a) Start with a distribution (the book example uses the lognormal; see here for an example using the Bernoulli). Find a formula for the likelihood function {$L$} and the loglikelihood function {$l$}, in terms of the parameters, the number of sample points {$n$}, and each observed value {$x_1, \ldots, x_n$}.
b) Take the 2nd partial derivatives with respect to each parameter.
c) Find the expected values of the 2nd derivatives (basically by substituting the known expected values of the distribution we started with).
d) Put the negatives of the result from c) into matrix form. The result is an information matrix in terms of the parameters.
e) Estimate the parameters by setting the 1st derivatives to 0.
f) Estimate the values of the information matrix using the estimated parameters from e).
g) Estimate a confidence interval using the fact that the variances are the diagonal entries of the inverse of the information matrix from f) (when that matrix is diagonal, simply the reciprocals of its diagonal entries), as sketched in the code below.
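Here is a minimal numerical sketch of steps e)-g), using the one-parameter exponential information {$I(\theta)=n/\theta^2$} derived above (my example, not the book's lognormal):

```python
# Steps e)-g) for a one-parameter exponential model with mean theta,
# using I(theta) = n / theta^2. Data are hypothetical.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])
n = len(x)

theta_hat = x.mean()            # e) MLE from setting l'(theta) = 0
info = n / theta_hat**2         # d), f) information evaluated at the MLE
se = np.sqrt(1 / info)          # g) Var(theta_hat) ~ 1 / I(theta_hat)
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)   # approximate 95% CI
```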
Variations:
a) Instead of taking expected values in c) above, plug in observed values (this gives the observed information).
b) Instead of taking derivatives analytically in b) above, use the following approximation of the 2nd derivatives:
{$$ \frac{\partial}{\partial \theta_r}\frac{\partial}{\partial \theta_s} l(\vec\theta) \approx \frac{ l(\vec{\theta}+\frac{1}{2}h_r\vec{e_r} + \frac{1}{2}h_s \vec{e_s}) -l(\vec{\theta}+\frac{1}{2}h_r\vec{e_r} - \frac{1}{2}h_s \vec{e_s}) -l(\vec{\theta}-\frac{1}{2}h_r\vec{e_r} + \frac{1}{2}h_s \vec{e_s}) +l(\vec{\theta}-\frac{1}{2}h_r\vec{e_r}- \frac{1}{2}h_s \vec{e_s}) }{h_rh_s}.$$}
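A sketch of this finite-difference approximation in code, applied to a lognormal loglikelihood; the data, parameter values, and step size are hypothetical choices of mine:

```python
# A minimal sketch of the central-difference approximation above for a
# two-parameter (lognormal) loglikelihood. Everything here is illustrative.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # hypothetical observations

def loglik(theta):
    mu, sigma = theta
    # lognormal: log f(x) = -log(x) - log(sigma) - 0.5*log(2*pi) - (log(x)-mu)^2/(2*sigma^2)
    z = (np.log(x) - mu) / sigma
    return np.sum(-np.log(x) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2)

def second_partial(f, theta, r, s, h=1e-3):
    e_r, e_s = np.zeros(len(theta)), np.zeros(len(theta))
    e_r[r], e_s[s] = h, h
    return (f(theta + e_r/2 + e_s/2) - f(theta + e_r/2 - e_s/2)
            - f(theta - e_r/2 + e_s/2) + f(theta - e_r/2 - e_s/2)) / (h * h)

theta = np.array([1.0, 0.5])   # evaluate near some parameter estimate
hessian = np.array([[second_partial(loglik, theta, r, s) for s in range(2)] for r in range(2)])
print(-hessian)                # observed information matrix
```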
There is also the delta method, which is used to approximate the expected value and variance of a function of an estimator. The one-dimensional statement of the theorem: suppose {$\hat{\theta}$} is an estimator of {$\theta$} that has an asymptotic normal distribution with mean {$\theta$} and variance {$\sigma^2/n$}. Then {$g(\hat{\theta})$} has an asymptotic normal distribution with mean {$g(\theta)$} and asymptotic variance {$g'(\theta)^2 \sigma^2/n$}.
Steps for the calculation:
a) Find the mle {$\hat{\theta}$} of the parameter {$\theta$}.
b) Figure out the quantity we're estimating, {$g(\theta)$}. This could be, say, a probability.
c) Then the new mle is {$g(\hat{\theta})$}.
d) Find the mean and variance of the first estimator {$\hat{\theta}$}; then find the mean and variance of the second estimator according to the theorem. (A numerical sketch follows below.)
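A minimal numerical sketch of these steps, assuming an exponential model with mean {$\theta$} and the hypothetical target {$g(\theta) = P(X>5) = e^{-5/\theta}$}:

```python
# A minimal sketch of the delta method, assuming an exponential model with
# mean theta and g(theta) = P(X > 5) = exp(-5/theta). Data and threshold are hypothetical.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])
n = len(x)

theta_hat = x.mean()                  # a) MLE of theta
g_hat = np.exp(-5 / theta_hat)        # b), c) the new mle is g(theta_hat)

var_theta = theta_hat**2 / n          # d) Var(theta_hat) ~ theta^2 / n for the exponential
g_prime = (5 / theta_hat**2) * np.exp(-5 / theta_hat)
var_g = g_prime**2 * var_theta        # d) delta-method variance of g(theta_hat)
print(g_hat, var_g)
```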
4. Apply the following concepts in estimating failure time and loss distributions:

a) Unbiasedness
b) Asymptotic unbiasedness
c) Consistency
d) Mean squared error
e) Uniform minimum variance estimator
5. Determine the acceptability of a fitted model and/or compare models

If the data has been modeled with a parametric model, with a resulting distribution function {$F$} and density function {$f$}, and the data is truncated at {$t$}, then the modified functions are:
{$$F^*(x)= \frac{F(x)-F(t)}{1-F(t)}, \text{ for } x\geq t, \text{ and } 0 \text{ for } x<t;$$} {$$f^*(x)= \frac{f(x)}{1-F(t)}, \text{ for } x\geq t, \text{ and } 0 \text{ for } x<t.$$} The point: compare {$F^*$} with {$F_n$} (the empirical estimate). There are several ways to do this:
Graphically: plot a few things (either {$F^*$} against {$F_n$} directly, or related quantities such as their difference) and visually judge how close the functions are.
Hypothesis tests: this involves constructing the following elements:
Null hypothesis {$H_0$}: The data came from a population with the stated model.
Alternative hypothesis {$H_1$}: The data did not come from such a population.
Test statistic: a function of the observations, defining what the test is that we're using.
Rejection region, the boundary of which consists of the critical value(s), labeled {$c$}; if the test statistic is in the region, the null hypothesis is rejected, otherwise we fail to reject it.
Type I error occurs when we reject a null hypothesis when it is true. Define {$\alpha$} to be the (significance) level, the probability of rejecting a null hypothesis when it is true.
Type II error occurs when we do not reject a null hypothesis when the alternative is true. The probability of this kind of error is denoted {$\beta$}.
Sometimes we're given the rejection region, and we can calculate {$\alpha$}. More often, it seems, we're given {$\alpha$}, and the methods to calculate the test statistic and rejection region, and we determine whether to reject {$H_0$} based on whether the test statistic is in the rejection region.
Given the same analysis on the same set of observations, if {$\alpha_1 > \alpha_2$}, then {$\alpha_1$} would result in a larger rejection region than {$\alpha_2$}, so we can have a situation where {$H_0$} is not rejected at the level of {$\alpha_2$}, but is rejected at the level of {$\alpha_1$}.
We can also compute a p-value (attained significance level) -- the smallest level of significance {$\alpha$} for which {$H_0$} is rejected based on the test statistic. The way to calculate the p-value should be stated as part of the test method.
If {$p$} is above 10%, then the data gives little or no evidence to support the alternative hypothesis.
If {$p$} is below 1%, then the data gives strong support for the alternative hypothesis.
a) Graphical procedures
b) Kolmogorov-Smirnov test
If {$t$} is the truncation point and {$u$} is the censoring point, the test statistic is
{$$D = \max_{t\leq x\leq u} |F_n(x) - F^*(x)|.$$}
The critical values are: {$$ 1.22/\sqrt{n} \text{ for }\alpha = 0.1, \qquad 1.36/\sqrt{n} \text{ for }\alpha = 0.05, \qquad 1.63/\sqrt{n} \text{ for }\alpha = 0.01.$$} Reject {$H_0$} if {$D$} exceeds the critical value.
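A minimal sketch of computing {$D$}, assuming an exponential fitted model, truncation at {$t$}, and no censoring; all numbers are hypothetical:

```python
# A minimal sketch of the Kolmogorov-Smirnov statistic D, assuming an
# exponential fitted model with mean theta_hat, truncation at t, no censoring.
import numpy as np

t, theta_hat = 1.0, 4.0
x = np.sort(np.array([1.8, 2.5, 3.9, 6.2, 9.0]))   # observations above t
n = len(x)

def F_star(y):
    # truncated fitted cdf (F(y) - F(t)) / (1 - F(t)) for the exponential
    F = lambda z: 1 - np.exp(-z / theta_hat)
    return (F(y) - F(t)) / (1 - F(t))

# F_n is a step function, so the supremum of |F_n - F*| occurs at the jump
# points; compare F* with the empirical cdf just after and just before each x_j
Fs = F_star(x)
D = max(np.max(np.abs(np.arange(1, n + 1) / n - Fs)),
        np.max(np.abs(np.arange(0, n) / n - Fs)))
print(D, 1.36 / np.sqrt(n))   # reject H_0 at alpha = 0.05 if D exceeds the critical value
```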
c) Anderson-Darling test
If {$t$} is the truncation point and {$u$} is the censoring point, the test statistic is
{$$A^2 =n \int_t^u \frac{[F_n(x)-F^*(x)]^2}{F^*(x)(1-F^*(x))} f^*(x)\,dx.$$}
The critical values are: {$$ 1.933 \text{ for }\alpha = 0.1, \qquad 2.492 \text{ for }\alpha = 0.05, \qquad 3.857 \text{ for }\alpha = 0.01.$$} Unlike the K-S critical values, these do not depend on the sample size; reject {$H_0$} if {$A^2$} exceeds the critical value.
d) Chi-square goodness-of-fit test
Select {$k-1$} arbitrary values, {$t=c_0 < c_1< \cdots < c_{k-1} < c_k=\infty$}. Define
{$$\hat{p_j} = F^*(c_j)-F^*(c_{j-1}); \text{ i.e., the probability that a (truncated) observation falls in } (c_{j-1}, c_j];$$} {$$p_{n,j} = F_n(c_j)-F_n(c_{j-1}); \text{ i.e., the same probability under the empirical distribution}.$$}
The test statistic is {$$\chi^2 =\sum_{j=1}^k \frac{n [\hat{p}_j -p_{n,j}]^2}{\hat{p}_j }.$$}
Equivalently, with expected counts {$E_j = n\hat{p}_j$} and observed counts {$O_j = n p_{n,j}$}, {$$\chi^2 =\sum_{j=1}^k \frac{[E_j-O_j]^2}{E_j }.$$} The critical value comes from a chi-square distribution with degrees of freedom equal to {$k-1$} minus the number of estimated parameters.
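A minimal sketch, assuming an exponential fitted model and hypothetical group boundaries and counts:

```python
# A minimal sketch of the chi-square statistic for grouped data, assuming an
# exponential fitted model with mean theta_hat; boundaries and counts are hypothetical.
import numpy as np
from scipy.stats import chi2

theta_hat = 4.0
c = np.array([0.0, 2.0, 5.0, 10.0, np.inf])   # t = c_0 < c_1 < ... < c_k = infinity
observed = np.array([40, 35, 20, 5])          # O_j, observed count in each interval
n = observed.sum()

F = lambda y: 1 - np.exp(-y / theta_hat)      # F(inf) evaluates to 1
p_hat = F(c[1:]) - F(c[:-1])                  # fitted probability of each interval
expected = n * p_hat                          # E_j

chi_sq = np.sum((expected - observed) ** 2 / expected)
df = len(observed) - 1 - 1                    # k - 1 - (one estimated parameter)
print(chi_sq, chi2.ppf(0.95, df))             # reject H_0 at alpha = 0.05 if chi_sq exceeds this
```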
e) Likelihood ratio test
Here our hypotheses are:
{$H_0$}: The data came from a population with distribution A. (null hypothesis)
{$H_1$}: The data came from a population with distribution B, where A is a special case of B. (alternative hypothesis)
Let {$\theta_0, \theta_1$} be the values of the parameters that maximize the likelihood function {$L$}, within the range of values allowed by the null and alternative hypotheses, respectively. Let {$L_0 = L(\theta_0), L_1 = L(\theta_1)$}. The test statistic is
{$$T = 2 \ln (L_1/L_0).$$} Reject {$H_0$} when {$T$} exceeds the critical value from a chi-square distribution with degrees of freedom equal to the number of free parameters in B minus the number in A.
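A minimal sketch, taking the null model to be an exponential and the alternative a gamma (the exponential is the gamma with {$\alpha = 1$}); the data are hypothetical:

```python
# A minimal sketch of the likelihood ratio test: null = exponential,
# alternative = gamma, of which the exponential is a special case.
import numpy as np
from scipy.stats import gamma, expon, chi2
from scipy.optimize import minimize

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2, 0.9, 2.6, 4.4])   # hypothetical data

# Null: exponential; the MLE of the mean is the sample mean
l0 = np.sum(expon.logpdf(x, scale=x.mean()))

# Alternative: gamma(alpha, theta); maximize the loglikelihood numerically
nll = lambda p: -np.sum(gamma.logpdf(x, a=p[0], scale=p[1]))
res = minimize(nll, x0=[1.0, x.mean()], bounds=[(1e-6, None), (1e-6, None)])
l1 = -res.fun

T = 2 * (l1 - l0)                      # = 2 ln(L1 / L0)
print(T, chi2.ppf(0.95, df=1))         # one extra free parameter in the alternative
```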
f) Schwarz Bayesian Criterion