Construction and Selection of Parametric Models (25-30%)

1. Estimate the parameters of failure time and loss distributions

Assuming {$n$} independent observations throughout, label them {$x_1, \ldots, x_n$}. The general problem is: given a distribution, find the parameters of the distribution.
a) Maximum likelihood
Given a distribution {$X$} with parameter {$\theta$}, construct a likelihood function {$L(\theta)$} and find the value of {$\theta$} that maximizes {$L$}, or equivalently the loglikelihood function {$l(\theta) = \log{L(\theta)}$}. More specifically, given a series of events {$A_1, \ldots, A_n$}, where each {$A_j$} is an observed value (or interval), then
{$$L(\theta) = \prod_{j=1}^{n} Prob(X\in A_j | \theta).$$} When {$A_j$} is a single point observation {$x_j$}, we replace {$Prob(X=x_j)$} (which is 0 for a continuous distribution) by the density {$f(x_j)$}. When {$A_j$} is an interval, we compute the probability from the cdf {$F$}.
If there is more than one parameter, we maximize the likelihood function by setting the partial derivatives of {$l$} with respect to each parameter to 0.
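As an illustration of a), here is a minimal sketch assuming an exponential model with mean {$\theta$} and made-up data; it maximizes the loglikelihood numerically and checks against the closed-form answer (the sample mean):

```python
# A minimal sketch of maximum likelihood estimation, assuming an exponential
# model with mean theta; the data below are hypothetical, not from the text.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # observed values x_1, ..., x_n

def neg_loglik(theta):
    # l(theta) = sum_j log f(x_j | theta), with f(x) = exp(-x/theta) / theta
    return -np.sum(-np.log(theta) - x / theta)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")
print(res.x)       # numerical MLE of theta
print(x.mean())    # closed-form MLE for the exponential: the sample mean
```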
If the data is grouped into {$(0, c_1], (c_1, c_2], \ldots, (c_k,\infty)$}, and there are {$n_j$} observations in each {$(c_{j-1}, c_j]$}, then
{$$L(\theta) = \prod_{j=1}^{k} [F(c_j) - F(c_{j-1})]^{n_j} \text{ (all cdf values given }\theta).$$}

b) Method of moments
If there are {$p$} parameters, form and solve a system of {$p$} equations by equating the 1st through {$p$}th moments of the data to those of the distribution.
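For example, a minimal sketch assuming a gamma model with shape {$\alpha$} and scale {$\theta$} (mean {$\alpha\theta$}, variance {$\alpha\theta^2$}) and made-up data:

```python
# A minimal sketch of the method of moments for a gamma(alpha, theta) model.
# The data are hypothetical, not from the text.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # hypothetical observed losses

m1 = x.mean()          # first sample moment
m2 = np.mean(x**2)     # second (raw) sample moment

# match mean = alpha*theta and variance = alpha*theta^2 to the sample moments
theta_hat = (m2 - m1**2) / m1      # sample variance / sample mean
alpha_hat = m1 / theta_hat         # sample mean / scale
print(alpha_hat, theta_hat)
```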
c) Percentile matching
If there are {$p$} parameters, and {$g_1, \ldots, g_p$} are arbitrary values between 0 and 100, form and solve a system of {$p$} equations by equating the {$g_1, \ldots, g_p$}th percentiles of the data to those of the distribution. More precisely, use the smoothed empirical estimate of the {$g$}th percentile: look at the {$n+1$} intervals formed by {$(-\infty, x_1), (x_1, x_2), \ldots, (x_n, \infty)$}. The {$g$}th percentile lands in the interval {$(x_{j}, x_{j+1})$}, where {$j = \text{greatest integer} \leq g/100 \cdot (n+1)$}. Let {$h$} be the leftover decimal part. Then the {$g$}th percentile is the weighted average {$(1-h)x_j + h x_{j+1}$}.
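A minimal sketch of the smoothed empirical percentile; the helper function below is hypothetical and assumes the position {$g/100\cdot(n+1)$} falls between two order statistics:

```python
# A minimal sketch of the smoothed empirical percentile described above;
# g is in percent, and this helper is an illustration, not from the text.
import numpy as np

def smoothed_percentile(x, g):
    x = np.sort(x)
    n = len(x)
    pos = g / 100 * (n + 1)   # position among the n+1 intervals
    j = int(pos)              # greatest integer <= pos
    h = pos - j               # leftover decimal part
    # weighted average of the jth and (j+1)th order statistics
    return (1 - h) * x[j - 1] + h * x[j]

x = np.array([3.0, 5.0, 8.0, 12.0, 20.0])
print(smoothed_percentile(x, 50))   # 8.0, the smoothed 50th percentile for n = 5
```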
d) Bayesian procedures
2. Estimate the parameters of failure time and loss distributions with censored and/or truncated data using maximum likelihood

If the data is censored at {$c$}, and there are {$m$} censored observations, then the likelihood factor to use for those observations is {$ [1-F(c)]^m $}.
If the data is truncated at {$t$}, we can either
Use the shift approach: ignore all observations under {$t$}, and shift all observations above {$t$} by subtracting {$t$} from each of them.
Use the unshifted approach: ignore all observations under {$t$}, and keep the original observations {$x_j$} above {$t$}, but instead of {$f(x_j)$} use {$f(x_j)/(1-F(t))$}, i.e., the density of {$x_j$} conditional on {$X \geq t$}. (See the sketch below.)
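A minimal sketch combining both adjustments, assuming an exponential model with mean {$\theta$}, a hypothetical truncation point {$d$}, a hypothetical censoring point {$u$}, and the unshifted approach:

```python
# A minimal sketch of MLE with left-truncation at d and right-censoring at u,
# assuming an exponential model with mean theta; d, u, and the data are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

d, u = 1.0, 10.0
x = np.array([2.3, 4.1, 6.0, 10.0, 10.0])   # values equal to u are censored
censored = x >= u

def neg_loglik(theta):
    # uncensored: log[ f(x)/(1-F(d)) ] = -(x - d)/theta - log(theta)
    ll = np.sum(-(x[~censored] - d) / theta - np.log(theta))
    # censored:   log[ (1-F(u))/(1-F(d)) ] = -(u - d)/theta
    ll += -censored.sum() * (u - d) / theta
    return -ll

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1e3), method="bounded")
# for the exponential the MLE equals total shifted exposure / number uncensored
print(res.x)
```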
3. Estimate the variance of estimators and the confidence intervals for the parameters and functions of parameters of failure time and loss distributions

We can estimate the variance of the mle (maximum likelihood estimator) by the following (due to a long and complicated theorem): if the underlying density function {$f$} is smooth enough, then {$Var(\hat{\theta}) \approx \frac{1}{I(\theta)}$}, where
{$$I(\theta) = -E\left[ \frac{\partial^2}{\partial \theta^2} l(\theta)\right] = E\left[ \left(\frac{\partial}{\partial \theta} l(\theta)\right)^2\right]$$} {$I$} is called (Fisher's) information.
The multivariable version of this is: {$I(\theta)$} is an information matrix, whose {$(r,s)$}th element is
{$$I(\theta)_{rs} = -E\left[ \frac{\partial}{\partial \theta_r}\frac{\partial}{\partial \theta_s} l(\theta)\right] = E\left[ \frac{\partial}{\partial \theta_r} l(\theta)\frac{\partial}{\partial \theta_s} l(\theta)\right].$$} The inverse {$I(\theta)^{-1}$} is the (asymptotic) covariance matrix of the estimators: the variances of the individual estimators are on its diagonal, and the off-diagonal entries are the covariances.
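For a quick one-parameter illustration (an exponential with mean {$\theta$}, my own example rather than the book's): {$l(\theta) = -n\ln\theta - \frac{1}{\theta}\sum x_j$}, so
{$$\frac{\partial^2}{\partial \theta^2} l(\theta) = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum x_j, \qquad I(\theta) = -E\left[\frac{n}{\theta^2} - \frac{2}{\theta^3}\sum X_j\right] = \frac{n}{\theta^2}, \qquad Var(\hat{\theta}) \approx \frac{\theta^2}{n}.$$}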
Steps for the calculation:
a) Start with a distribution (the book example uses the lognormal; see here for an example using the Bernoulli). Find a formula for the likelihood function {$L$} and the loglikelihood function {$l$}, in terms of the parameters, the number of sample points {$n$}, and each observed value {$x_1, \ldots, x_n$}.
b) Take the 2nd partial derivatives with respect to each parameter.
c) Find the expected values of the 2nd derivatives (basically by substituting the known expected values of the distribution we started with).
d) Put the negatives of the result from c) into matrix form. The result is an information matrix in terms of the parameters.
e) Estimate the parameters by setting the 1st derivatives to 0.
f) Estimate the values of the information matrix using the estimated parameters from e).
g) Estimate a confidence interval using the fact that the variances are the diagonal entries of the inverse of the information matrix from f) (when that matrix is diagonal, simply the reciprocals of its diagonal entries), as sketched in the code below.
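Here is a minimal numerical sketch of steps e)-g), using the one-parameter exponential information {$I(\theta)=n/\theta^2$} derived above (my example, not the book's lognormal):

```python
# Steps e)-g) for a one-parameter exponential model with mean theta,
# using I(theta) = n / theta^2. Data are hypothetical.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])
n = len(x)

theta_hat = x.mean()            # e) MLE from setting l'(theta) = 0
info = n / theta_hat**2         # d), f) information evaluated at the MLE
se = np.sqrt(1 / info)          # g) Var(theta_hat) ~ 1 / I(theta_hat)
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)   # approximate 95% CI
```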
Variations:
a) Instead of taking expected values in c) above, plug in observed values (this gives the observed information).
b) Instead of taking derivatives analytically in b) above, use the following approximation of the 2nd derivatives:
{$$ \frac{\partial}{\partial \theta_r}\frac{\partial}{\partial \theta_s} l(\vec\theta) \approx \frac{ l(\vec{\theta}+\frac{1}{2}h_r\vec{e_r} + \frac{1}{2}h_s \vec{e_s}) -l(\vec{\theta}+\frac{1}{2}h_r\vec{e_r} - \frac{1}{2}h_s \vec{e_s}) -l(\vec{\theta}-\frac{1}{2}h_r\vec{e_r} + \frac{1}{2}h_s \vec{e_s}) +l(\vec{\theta}-\frac{1}{2}h_r\vec{e_r}- \frac{1}{2}h_s \vec{e_s}) }{h_rh_s}.$$}
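A sketch of this finite-difference approximation in code, applied to a lognormal loglikelihood; the data, parameter values, and step size are hypothetical choices of mine:

```python
# A minimal sketch of the central-difference approximation above for a
# two-parameter (lognormal) loglikelihood. Everything here is illustrative.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])   # hypothetical observations

def loglik(theta):
    mu, sigma = theta
    # lognormal: log f(x) = -log(x) - log(sigma) - 0.5*log(2*pi) - (log(x)-mu)^2/(2*sigma^2)
    z = (np.log(x) - mu) / sigma
    return np.sum(-np.log(x) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * z**2)

def second_partial(f, theta, r, s, h=1e-3):
    e_r, e_s = np.zeros(len(theta)), np.zeros(len(theta))
    e_r[r], e_s[s] = h, h
    return (f(theta + e_r/2 + e_s/2) - f(theta + e_r/2 - e_s/2)
            - f(theta - e_r/2 + e_s/2) + f(theta - e_r/2 - e_s/2)) / (h * h)

theta = np.array([1.0, 0.5])   # evaluate near some parameter estimate
hessian = np.array([[second_partial(loglik, theta, r, s) for s in range(2)] for r in range(2)])
print(-hessian)                # observed information matrix
```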
There is also the delta method, which is used to approximate the expected value and variance of a function of an estimator. The one-dimensional statement of the theorem: suppose {$\hat{\theta}$} is an estimator of {$\theta$} that has an asymptotic normal distribution with mean {$\theta$} and variance {$\sigma^2/n$}. Then {$g(\hat{\theta})$} has an asymptotic normal distribution with mean {$g(\theta)$} and asymptotic variance {$g'(\theta)^2 \sigma^2/n$}.
Steps for the calculation:
a) Find the mle {$\hat{\theta}$} of the parameter {$\theta$}.
b) Figure out the quantity we're estimating, {$g(\theta)$}. This could be, say, a probability.
c) Then the new mle is {$g(\hat{\theta})$}.
d) Find the mean and variance of the first estimator {$\hat{\theta}$}; then find the mean and variance of the second estimator according to the theorem. (A numerical sketch follows below.)
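A minimal numerical sketch of these steps, assuming an exponential model with mean {$\theta$} and the hypothetical target {$g(\theta) = P(X>5) = e^{-5/\theta}$}:

```python
# A minimal sketch of the delta method, assuming an exponential model with
# mean theta and g(theta) = P(X > 5) = exp(-5/theta). Data and threshold are hypothetical.
import numpy as np

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2])
n = len(x)

theta_hat = x.mean()                  # a) MLE of theta
g_hat = np.exp(-5 / theta_hat)        # b), c) the new mle is g(theta_hat)

var_theta = theta_hat**2 / n          # d) Var(theta_hat) ~ theta^2 / n for the exponential
g_prime = (5 / theta_hat**2) * np.exp(-5 / theta_hat)
var_g = g_prime**2 * var_theta        # d) delta-method variance of g(theta_hat)
print(g_hat, var_g)
```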
4. Apply the following concepts in estimating failure time and loss distributions:

a) Unbiasedness
b) Asymptotic unbiasedness
c) Consistency
d) Mean squared error
e) Uniform minimum variance estimator
5. Determine the acceptability of a fitted model and/or compare models

If the data has been modeled with a parametric model, with a resulting distribution function {$F$} and density function {$f$}, and the data is truncated at {$t$}, then the modified functions are:
{$$F^*(x)= \frac{F(x)-F(t)}{1-F(t)}, \text{ for } x\geq t, \text{ and } 0 \text{ for } x<t;$$} {$$f^*(x)= \frac{f(x)}{1-F(t)}, \text{ for } x\geq t, \text{ and } 0 \text{ for } x<t.$$} The point: compare {$F^*$} with {$F_n$} (the empirical estimate). There are several ways to do this:
Graphically: plot a few things (either {$F^*$} against {$F_n$} directly, or related quantities such as their difference) and visually judge how close the functions are.
Hypothesis tests: this involves constructing the following elements:
Null hypothesis {$H_0$}: The data came from a population with the stated model.
Alternative hypothesis {$H_1$}: The data did not come from such a population.
Test statistic: a function of the observations, defining what the test is that we're using.
Rejection region, the boundary of which consists of the critical value(s), labeled {$c$}; if the test statistic is in the region, the null hypothesis is rejected, otherwise we fail to reject it.
Type I error occurs when we reject a null hypothesis when it is true. Define {$\alpha$} to be the (significance) level, the probability of rejecting a null hypothesis when it is true.
Type II error occurs when we do not reject a null hypothesis when the alternative is true. The probability of this kind of error is denoted {$\beta$}.
Sometimes we're given the rejection region, and we can calculate {$\alpha$}. More often, it seems, we're given {$\alpha$}, and the methods to calculate the test statistic and rejection region, and we determine whether to reject {$H_0$} based on whether the test statistic is in the rejection region.
Given the same analysis on the same set of observations, if {$\alpha_1 > \alpha_2$}, then {$\alpha_1$} would result in a larger rejection region than {$\alpha_2$}, so we can have a situation where {$H_0$} is not rejected at the level of {$\alpha_2$}, but is rejected at the level of {$\alpha_1$}.
We can also compute a p-value (attained significance level) -- the smallest level of significance {$\alpha$} for which {$H_0$} is rejected based on the test statistic. The way to calculate the p-value should be stated as part of the test method.
If {$p$} is above 10%, then the data gives little or no evidence to support the alternative hypothesis.
If {$p$} is below 1%, then the data gives strong support for the alternative hypothesis.
a) Graphical procedures
b) Kolmogorov-Smirnov test
If {$t$} is the truncation point and {$u$} is the censoring point, the test statistic is
{$$D = \max_{t\leq x\leq u} |F_n(x) - F^*(x)|.$$}
The critical values are: {$$ 1.22/\sqrt{n} \text{ for }\alpha = 0.1, \qquad 1.36/\sqrt{n} \text{ for }\alpha = 0.05, \qquad 1.63/\sqrt{n} \text{ for }\alpha = 0.01.$$} Reject {$H_0$} if {$D$} exceeds the critical value.
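A minimal sketch of computing {$D$}, assuming an exponential fitted model, truncation at {$t$}, and no censoring; all numbers are hypothetical:

```python
# A minimal sketch of the Kolmogorov-Smirnov statistic D, assuming an
# exponential fitted model with mean theta_hat, truncation at t, no censoring.
import numpy as np

t, theta_hat = 1.0, 4.0
x = np.sort(np.array([1.8, 2.5, 3.9, 6.2, 9.0]))   # observations above t
n = len(x)

def F_star(y):
    # truncated fitted cdf (F(y) - F(t)) / (1 - F(t)) for the exponential
    F = lambda z: 1 - np.exp(-z / theta_hat)
    return (F(y) - F(t)) / (1 - F(t))

# F_n is a step function, so the supremum of |F_n - F*| occurs at the jump
# points; compare F* with the empirical cdf just after and just before each x_j
Fs = F_star(x)
D = max(np.max(np.abs(np.arange(1, n + 1) / n - Fs)),
        np.max(np.abs(np.arange(0, n) / n - Fs)))
print(D, 1.36 / np.sqrt(n))   # reject H_0 at alpha = 0.05 if D exceeds the critical value
```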
c) Anderson-Darling test
If {$t$} is the truncation point and {$u$} is the censoring point, the test statistic is
{$$A^2 =n \int_t^u \frac{[F_n(x)-F^*(x)]^2}{F^*(x)(1-F^*(x))} f^*(x)\,dx.$$}
The critical values are: {$$ 1.933 \text{ for }\alpha = 0.1, \qquad 2.492 \text{ for }\alpha = 0.05, \qquad 3.857 \text{ for }\alpha = 0.01.$$} Unlike the K-S critical values, these do not depend on the sample size; reject {$H_0$} if {$A^2$} exceeds the critical value.
d) Chi-square goodness-of-fit test
Select {$k-1$} arbitrary values, {$t=c_0 < c_1< \cdots < c_{k-1} < c_k=\infty$}. Define
{$$\hat{p_j} = F^*(c_j)-F^*(c_{j-1}); \text{ i.e., the probability that a (truncated) observation falls in } (c_{j-1}, c_j];$$} {$$p_{n,j} = F_n(c_j)-F_n(c_{j-1}); \text{ i.e., the same probability under the empirical distribution}.$$}
The test statistic is {$$\chi^2 =\sum_{j=1}^k \frac{n [\hat{p}_j -p_{n,j}]^2}{\hat{p}_j }.$$}
Equivalently, with expected counts {$E_j = n\hat{p}_j$} and observed counts {$O_j = n p_{n,j}$}, {$$\chi^2 =\sum_{j=1}^k \frac{[E_j-O_j]^2}{E_j }.$$} The critical value comes from a chi-square distribution with degrees of freedom equal to {$k-1$} minus the number of estimated parameters.
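A minimal sketch, assuming an exponential fitted model and hypothetical group boundaries and counts:

```python
# A minimal sketch of the chi-square statistic for grouped data, assuming an
# exponential fitted model with mean theta_hat; boundaries and counts are hypothetical.
import numpy as np
from scipy.stats import chi2

theta_hat = 4.0
c = np.array([0.0, 2.0, 5.0, 10.0, np.inf])   # t = c_0 < c_1 < ... < c_k = infinity
observed = np.array([40, 35, 20, 5])          # O_j, observed count in each interval
n = observed.sum()

F = lambda y: 1 - np.exp(-y / theta_hat)      # F(inf) evaluates to 1
p_hat = F(c[1:]) - F(c[:-1])                  # fitted probability of each interval
expected = n * p_hat                          # E_j

chi_sq = np.sum((expected - observed) ** 2 / expected)
df = len(observed) - 1 - 1                    # k - 1 - (one estimated parameter)
print(chi_sq, chi2.ppf(0.95, df))             # reject H_0 at alpha = 0.05 if chi_sq exceeds this
```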
e) Likelihood ratio test
Here our hypotheses are:
{$H_0$}: The data came from a population with distribution A. (null hypothesis)
{$H_1$}: The data came from a population with distribution B, where A is a special case of B. (alternative hypothesis)
Let {$\theta_0, \theta_1$} be the values of the parameters that maximize the likelihood function {$L$}, within the range of values allowed by the null and alternative hypotheses, respectively. Let {$L_0 = L(\theta_0), L_1 = L(\theta_1)$}. The test statistic is
{$$T = 2 \ln (L_1/L_0).$$} Reject {$H_0$} when {$T$} exceeds the critical value from a chi-square distribution with degrees of freedom equal to the number of free parameters in B minus the number in A.
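A minimal sketch, taking the null model to be an exponential and the alternative a gamma (the exponential is the gamma with {$\alpha = 1$}); the data are hypothetical:

```python
# A minimal sketch of the likelihood ratio test: null = exponential,
# alternative = gamma, of which the exponential is a special case.
import numpy as np
from scipy.stats import gamma, expon, chi2
from scipy.optimize import minimize

x = np.array([1.5, 2.0, 3.7, 5.1, 8.2, 0.9, 2.6, 4.4])   # hypothetical data

# Null: exponential; the MLE of the mean is the sample mean
l0 = np.sum(expon.logpdf(x, scale=x.mean()))

# Alternative: gamma(alpha, theta); maximize the loglikelihood numerically
nll = lambda p: -np.sum(gamma.logpdf(x, a=p[0], scale=p[1]))
res = minimize(nll, x0=[1.0, x.mean()], bounds=[(1e-6, None), (1e-6, None)])
l1 = -res.fun

T = 2 * (l1 - l0)                      # = 2 ln(L1 / L0)
print(T, chi2.ppf(0.95, df=1))         # one extra free parameter in the alternative
```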
f) Schwarz Bayesian Criterion