Probability Distributions I

B.R. Martin , in Statistics for Physical Science, 2012

3.3.1 Joint Probability Distributions

The multivariate joint density function f(x_1, x_2, …, x_n) of the n continuous random variables x_1, x_2, …, x_n is a single-valued non-negative real number for all real values of x_1, x_2, …, x_n, normalized so that

(3.17) ∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} f(x_1, x_2, …, x_n) ∏_{i=1}^{n} dx_i = 1,

and the probability that x_1 falls between any two numbers a_1 and b_1, x_2 falls between any two numbers a_2 and b_2, …, and x_n falls between any two numbers a_n and b_n, simultaneously, is defined by

(3.18) P[a_1 ≤ x_1 ≤ b_1; …; a_n ≤ x_n ≤ b_n] ≡ ∫_{a_n}^{b_n} ⋯ ∫_{a_1}^{b_1} f(x_1, x_2, …, x_n) ∏_{i=1}^{n} dx_i.

Similarly, the multivariate joint distribution function F(x_1, x_2, …, x_n) of the n random variables x_1, x_2, …, x_n is

(3.19) F(x_1, x_2, …, x_n) ≡ ∫_{−∞}^{x_n} ⋯ ∫_{−∞}^{x_1} f(t_1, t_2, …, t_n) ∏_{i=1}^{n} dt_i.

For simplicity, consider the case of just two random variables x and y. These could correspond to the energy and angle of emission of a particle emitted in a nuclear scattering reaction. If an event A corresponds to the variable x being observed in the range ( x , x + d x ) and the event B corresponds to the variable y being observed in the range ( y , y + d y ) , then

P[A ∩ B] = probability of x being in (x, x + dx) and y being in (y, y + dy) = f(x, y) dx dy.

As noted in Chapter 1, the joint density function corresponds to the density of points on a scatter plot of x and y in the limit of an infinite number of points. This is illustrated in Fig. 3.3, using the data shown on the scatter plot of Fig. 1.3(b).

FIGURE 3.3. A scatter plot of 1000 events that are functions of two random variables x and y, showing two infinitesimal bands dx and dy. The area of intersection of the bands is dx dy, and f(x, y) dx dy is the probability of finding x in the interval (x, x + dx) and y in the interval (y, y + dy).
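The correspondence between the joint density and the density of scatter-plot points can be checked numerically. The following sketch (not from the text; the choice of two independent unit-exponential variables, so that f(x, y) = e^{−(x+y)} for x, y > 0, is an illustrative assumption) compares the closed-form probability from Eq. (3.18) with the fraction of sampled points that land in a rectangle:

```python
import random
import math

random.seed(0)

# Assumed joint density f(x, y) = exp(-(x + y)) for x, y > 0: two independent
# unit-exponential variables, so Eq. (3.18) has the closed form
#   P[a <= x <= b; c <= y <= d] = (e^-a - e^-b)(e^-c - e^-d).
a, b, c, d = 0.5, 1.5, 0.2, 1.0
exact = (math.exp(-a) - math.exp(-b)) * (math.exp(-c) - math.exp(-d))

# Monte Carlo: the fraction of scatter-plot points falling in the rectangle
# approaches the integral of f(x, y) over it.
n = 200_000
hits = sum(1 for _ in range(n)
           if a <= random.expovariate(1.0) <= b
           and c <= random.expovariate(1.0) <= d)
estimate = hits / n
print(exact, estimate)
```

With 200,000 points the two values typically agree to about two decimal places, which is the scatter-plot picture of Fig. 3.3 made quantitative.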


URL: https://www.sciencedirect.com/science/article/pii/B9780123877604000032

Random Variables

Sheldon M. Ross , in Introduction to Probability Models (Twelfth Edition), 2019

2.5.3 Covariance and Variance of Sums of Random Variables

The covariance of any two random variables X and Y, denoted by Cov ( X , Y ) , is defined by

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − Y E[X] − X E[Y] + E[X] E[Y]]
= E[XY] − E[Y] E[X] − E[X] E[Y] + E[X] E[Y]
= E[XY] − E[X] E[Y]

Note that if X and Y are independent, then by Proposition 2.3 it follows that Cov ( X , Y ) = 0 .

Let us consider now the special case where X and Y are indicator variables for whether or not the events A and B occur. That is, for events A and B, define

X = 1 if A occurs, 0 otherwise;  Y = 1 if B occurs, 0 otherwise

Then,

Cov(X, Y) = E[XY] − E[X] E[Y]

and, because XY will equal 1 or 0 depending on whether or not both X and Y equal 1, we see that

Cov(X, Y) = P{X = 1, Y = 1} − P{X = 1} P{Y = 1}

From this we see that

Cov(X, Y) > 0 ⇔ P{X = 1, Y = 1} > P{X = 1} P{Y = 1}
⇔ P{X = 1, Y = 1} / P{X = 1} > P{Y = 1}
⇔ P{Y = 1 | X = 1} > P{Y = 1}

That is, the covariance of X and Y is positive if the outcome X = 1 makes it more likely that Y = 1 (which, as is easily seen by symmetry, also implies the reverse).

In general it can be shown that a positive value of Cov ( X , Y ) is an indication that Y tends to increase as X does, whereas a negative value indicates that Y tends to decrease as X increases.
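The indicator-variable case above is easy to reproduce by simulation. The sketch below (an illustrative example, not from the text) uses nested events A = {U < 0.5} ⊂ B = {U < 0.7} for a uniform U, so that P{Y = 1 | X = 1} = 1 > P{Y = 1} and the covariance is exactly 0.5 − 0.5 × 0.7 = 0.15:

```python
import random

random.seed(1)

# Indicator variables for two positively associated events: draw U uniform on
# (0,1), set A = {U < 0.5} and B = {U < 0.7}, so A implies B.
n = 100_000
draws = []
for _ in range(n):
    u = random.random()
    draws.append((1 if u < 0.5 else 0, 1 if u < 0.7 else 0))

p_xy = sum(x * y for x, y in draws) / n   # estimates P{X=1, Y=1} = 0.5
p_x = sum(x for x, _ in draws) / n        # estimates P{X=1} = 0.5
p_y = sum(y for _, y in draws) / n        # estimates P{Y=1} = 0.7
cov = p_xy - p_x * p_y                    # should be near 0.5 - 0.35 = 0.15
print(cov)
```

The positive covariance reflects exactly the equivalence displayed above: observing X = 1 raises the conditional probability that Y = 1.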

Example 2.33

The joint density function of X , Y is

f(x, y) = (1/y) e^{−(y + x/y)},  0 < x, y < ∞

(a)

Verify that the preceding is a joint density function.

(b)

Find Cov ( X , Y ) .

Solution:  To show that f(x, y) is a joint density function we need to show it is nonnegative, which is immediate, and that ∬ f(x, y) dy dx = 1. We prove the latter as follows:

∬ f(x, y) dy dx = ∫_0^∞ ∫_0^∞ (1/y) e^{−(y + x/y)} dy dx
= ∫_0^∞ e^{−y} ∫_0^∞ (1/y) e^{−x/y} dx dy
= ∫_0^∞ e^{−y} dy
= 1

To obtain Cov ( X , Y ) , note that the density function of Y is

f_Y(y) = e^{−y} ∫_0^∞ (1/y) e^{−x/y} dx = e^{−y}

Thus, Y is an exponential random variable with parameter 1, showing (see Example 2.21) that

E [ Y ] = 1

We compute E [ X ] and E [ X Y ] as follows:

E[X] = ∬ x f(x, y) dy dx = ∫_0^∞ e^{−y} ∫_0^∞ (x/y) e^{−x/y} dx dy

Now, ∫_0^∞ (x/y) e^{−x/y} dx is the expected value of an exponential random variable with parameter 1/y, and thus is equal to y. Consequently,

E[X] = ∫_0^∞ y e^{−y} dy = 1

Also

E[XY] = ∬ x y f(x, y) dy dx = ∫_0^∞ y e^{−y} ∫_0^∞ (x/y) e^{−x/y} dx dy = ∫_0^∞ y² e^{−y} dy

Integration by parts (dv = e^{−y} dy, u = y²) gives

E[XY] = ∫_0^∞ y² e^{−y} dy = −y² e^{−y} |_0^∞ + ∫_0^∞ 2y e^{−y} dy = 2 E[Y] = 2

Consequently,

Cov(X, Y) = E[XY] − E[X] E[Y] = 2 − 1 = 1
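The factorization used in the solution, f_Y(y) = e^{−y} with X | Y = y exponential with mean y, also gives a direct way to check the answer by simulation. A minimal sketch (sampling scheme is mine, but it follows from the density in the example):

```python
import random

random.seed(2)

# f(x, y) = (1/y) e^{-(y + x/y)} factors as f_Y(y) = e^{-y} times an
# exponential density in x with mean y, so we can sample hierarchically.
n = 200_000
pairs = []
for _ in range(n):
    y = random.expovariate(1.0)        # Y ~ exponential(1)
    x = random.expovariate(1.0 / y)    # X | Y = y ~ exponential with mean y
    pairs.append((x, y))

mx = sum(x for x, _ in pairs) / n      # should be near E[X] = 1
my = sum(y for _, y in pairs) / n      # should be near E[Y] = 1
mxy = sum(x * y for x, y in pairs) / n # should be near E[XY] = 2
cov = mxy - mx * my                    # should be near the exact value 1
print(mx, my, cov)
```

The sample covariance lands close to 1, in agreement with the exact computation above.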

The following are important properties of covariance.


URL: https://www.sciencedirect.com/science/article/pii/B978012814346900007X

Elements of Probability

Sheldon Ross , in Simulation (Fifth Edition), 2013

2.10 Conditional Expectation and Conditional Variance

If X and Y are jointly discrete random variables, we define E[X | Y = y], the conditional expectation of X given that Y = y, by

E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x P{X = x, Y = y} / P{Y = y}

In other words, the conditional expectation of X, given that Y = y, is defined like E[X] as a weighted average of all the possible values of X, but now with the weight given to the value x being equal to the conditional probability that X equals x given that Y equals y.

Similarly, if X and Y are jointly continuous with joint density function f ( x , y ) , we define the conditional expectation of X , given that Y = y , by

E[X | Y = y] = ∫ x f(x, y) dx / ∫ f(x, y) dx

Let E[X | Y] denote that function of the random variable Y whose value at Y = y is E[X | Y = y]; and note that E[X | Y] is itself a random variable. The following proposition is quite useful.

Proposition 7

(2.11) E[E[X | Y]] = E[X]

If Y is a discrete random variable, then Equation (2.11) states that

E[X] = Σ_y E[X | Y = y] P{Y = y}

whereas if Y is continuous with density g , then (2.11) states

E[X] = ∫_{−∞}^{∞} E[X | Y = y] g(y) dy

We now give a proof of the preceding proposition when X and Y are discrete:

Σ_y E[X | Y = y] P{Y = y} = Σ_y Σ_x x P{X = x | Y = y} P{Y = y}
= Σ_y Σ_x x P{X = x, Y = y}
= Σ_x x Σ_y P{X = x, Y = y}
= Σ_x x P{X = x}
= E[X]

We can also define the conditional variance of X , given the value of Y , as follows:

Var(X | Y) = E[(X − E[X | Y])² | Y]

That is, Var(X | Y) is a function of Y, which at Y = y is equal to the variance of X given that Y = y. By the same reasoning that yields the identity Var(X) = E[X²] − (E[X])², we have that

Var(X | Y) = E[X² | Y] − (E[X | Y])²

Taking expectations of both sides of the above equation gives

(2.12) E[Var(X | Y)] = E[E[X² | Y]] − E[(E[X | Y])²] = E[X²] − E[(E[X | Y])²]

Also, because E[E[X | Y]] = E[X], we have that

(2.13) Var(E[X | Y]) = E[(E[X | Y])²] − (E[X])²

Upon adding Equations (2.12) and (2.13) we obtain the following identity, known as the conditional variance formula.

The Conditional Variance Formula

Var(X) = E[Var(X | Y)] + Var(E[X | Y])
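The conditional variance formula can be verified exactly on a small discrete example. The sketch below (the two-point mixture is my illustrative choice, not from the text) computes both sides in exact rational arithmetic:

```python
from fractions import Fraction as F

# Check Var(X) = E[Var(X|Y)] + Var(E[X|Y]) on a toy example:
# Y is a fair coin; given Y=0, X is uniform on {0, 1};
# given Y=1, X is uniform on {2, 4}.
cond = {0: [F(0), F(1)], 1: [F(2), F(4)]}
p_y = {0: F(1, 2), 1: F(1, 2)}

def mean(vals):
    return sum(vals) / len(vals)

def var(vals):
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# Left side: Var(X) computed from the full joint distribution.
all_x = [(x, p_y[y] / len(cond[y])) for y in cond for x in cond[y]]
ex = sum(x * p for x, p in all_x)
var_x = sum((x - ex) ** 2 * p for x, p in all_x)

# Right side: E[Var(X|Y)] + Var(E[X|Y]).
e_var = sum(p_y[y] * var(cond[y]) for y in cond)
var_e = var([mean(cond[y]) for y in cond])   # Y is fair, so a plain average works
print(var_x, e_var + var_e)
```

Both sides come out to 35/16, and using `Fraction` makes the identity hold exactly rather than up to floating-point error.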


URL: https://www.sciencedirect.com/science/article/pii/B9780124158252000024

The local Gaussian partial correlation

Dag Tjøstheim , ... Bård Støve , in Statistical Modeling Using Local Gaussian Approximation, 2022

11.2 The local Gaussian partial correlation

Let X = (X_1, …, X_p)^T be a random vector. Denote by (X_1, X_2, X_3) a partition of X into vectors of dimensions p_1, p_2, and p_3, respectively, such that X^{(1)} = (X_1, X_2) = (X_1, …, X_{p_1+p_2})^T consists of the first p_1 + p_2 components of X, and X^{(2)} = X_3 = (X_{p_1+p_2+1}, …, X_{p_1+p_2+p_3})^T contains the remaining p_3 variables, where p = p_1 + p_2 + p_3. We assume that the mean vector μ and covariance matrix Σ of X exist, and partition them correspondingly, writing

(11.1) μ = (μ_1, μ_2, μ_3)^T and Σ = [Σ_11 Σ_12 Σ_13; Σ_21 Σ_22 Σ_23; Σ_31 Σ_32 Σ_33],

where Σ_ij is the covariance matrix of (X_i, X_j), i, j = 1, 2, 3. There are two main concepts of correlation when X^{(2)} = X_3 is given, the partial and the conditional correlations, which coincide for several joint distributions, among them the Gaussian. See, for example, Baba et al. (2004) for details. We will use the partial correlation as the starting point when defining the LGPC. The partial variance–covariance matrix of X^{(1)} = (X_1, X_2) given X^{(2)} = X_3 is

(11.2) Σ_{12|3} = Σ_{11} − Σ_{12} (Σ_{22})^{−1} Σ_{21},

where

Σ_{11} = [Σ_11 Σ_12; Σ_21 Σ_22], Σ_{12} = [Σ_13; Σ_23], Σ_{21} = [Σ_31 Σ_32], and Σ_{22} = Σ_33 (the left-hand sides denoting the blocks of Σ under the coarser partition (X^{(1)}, X^{(2)})),

and Σ 12 | 3 is the covariance matrix in the conditional (Gaussian) distribution of X ( 1 ) given X 3 if X is jointly normal. The partial correlation matrix between X 1 and X 2 given X 3 is naturally defined as

(11.3) R_{12|3} = D^{−1/2} Σ_{12|3} D^{−1/2},

where D = diag(Σ_{12|3}). In the same way, we identify the partial correlation matrix (11.3) with the correlation matrix in the conditional (Gaussian) distribution of X^{(1)} given X^{(2)} if X is jointly normal. Eqs. (11.2) and (11.3) will serve as the starting point for our definition of the local partial correlation.
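For the scalar case p_1 = p_2 = p_3 = 1, Eqs. (11.2) and (11.3) reduce to a small Schur-complement computation that can be checked directly. A minimal sketch (the correlation values are illustrative assumptions):

```python
import math

# Partial correlation of (X1, X2) given X3 via Eqs. (11.2)-(11.3),
# for the scalar case p1 = p2 = p3 = 1, with illustrative correlations.
r12, r13, r23 = 0.6, 0.5, 0.4

# Blocks under the coarse partition: Sigma_11 is 2x2, Sigma_12 is 2x1,
# Sigma_22 = 1 (all variables standardized).
s11 = [[1.0, r12], [r12, 1.0]]
s12 = [r13, r23]

# Schur complement: Sigma_{12|3} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21.
p = [[s11[i][j] - s12[i] * s12[j] for j in range(2)] for i in range(2)]

# Normalize by D^{-1/2} ... D^{-1/2} to obtain the partial correlation.
rho_partial = p[0][1] / math.sqrt(p[0][0] * p[1][1])

# Closed form: (r12 - r13 r23) / (sqrt(1 - r13^2) sqrt(1 - r23^2)).
closed = (r12 - r13 * r23) / (math.sqrt(1 - r13**2) * math.sqrt(1 - r23**2))
print(rho_partial, closed)
```

The matrix route and the closed-form expression agree to machine precision, which is the global version of the identity exploited locally in Eq. (11.9) below.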

11.2.1 Definition

We further assume that the components of X are continuous with joint density function f X , and we again set up a local likelihood framework for obtaining local estimates of the parameters in the multivariate normal distribution. Given a point x , we approximate f X in a neighborhood of x by a multivariate Gaussian density

(11.4) ψ(x, v) = (2π)^{−p/2} |Σ(x)|^{−1/2} exp{−(1/2) (v − μ(x))^T Σ^{−1}(x) (v − μ(x))},

where x = (x_1, …, x_p)^T, μ(x) = {μ_j(x)}, and Σ(x) = {σ_jk(x)} for j, k = 1, …, p. Moving to another point y, there is another (generally different) Gaussian approximation ψ(y, v). In this way, we approximate f_X by a family of multivariate Gaussian densities defined by a set of smooth parameter functions {μ(x), Σ(x)}, and if f_X is itself a Gaussian density, then the parameter functions collapse to constants corresponding to the true parameter values, and ψ(x) ≡ f_X(x). Hjort and Jones (1996) provide a general framework for estimating such parameter functions non-parametrically from a given data set using a local likelihood procedure, and the basic idea in the following treatment is to replace the components in the partial covariance matrix (11.2) by their locally estimated counterparts to obtain a local measure of conditional dependence.

In this chapter, we use the same transformation technique as that introduced in Chapter 4.7 and subsequently used in the following chapters. This improves and simplifies the estimation of the LGPC. It is shown that the estimation of the local parameter functions {μ(x), Σ(x)} becomes easier by transforming each X_j to a standard normal variable Z_j = Φ^{−1}(U_j), where U_j = F_j(X_j) is a uniform variable, with F_j being the cumulative distribution function of X_j. Define the random vector Z by this transformation of X = (X^{(1)}, X^{(2)}) = (X_1, X_2, X_3) = (X_1, X_2, …, X_p)^T to marginal standard normality:

(11.5) Z = (Φ^{−1}(F_1(X_1)), Φ^{−1}(F_2(X_2)), …, Φ^{−1}(F_p(X_p)))^T.

The transformation enables us to simplify the local Gaussian approximation (11.4) by writing the density f Z of Z at the point v = z as

(11.6) f_Z(z) ≈ ψ(z, R(z)) = |2π R(z)|^{−1/2} exp{−(1/2) z^T R^{−1}(z) z},

where, as in Otneim and Tjøstheim (2017, 2018) and in Eqs. (9.6) and (9.7) in Chapter 9, in a further simplified approximation we have fixed the local means and standard deviations at μ_j(z) ≡ 0 and σ_j²(z) ≡ 1, j = 1, …, p, and where R(z) = {ρ_jk(z)} is the local correlation matrix.

In practice, we do not know F j , but we can instead use the empirical distribution function

F̂_j(x) = (1/n) Σ_{i=1}^n I(X_{ji} ≤ x),

where I is the indicator function, n is the number of observations, and X j i is the ith observation of X j , and where 1 / n can be replaced by 1 / ( n + 1 ) for small or moderate sample sizes. This results in pseudo-standard normal variables Z ˆ j = Φ 1 ( F ˆ j ( X j ) ) .
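The empirical transformation to pseudo-standard normal variables amounts to replacing each observation by the normal quantile of its rank. A minimal sketch, using only the Python standard library (the exponential marginal for X_j is an illustrative assumption):

```python
import random
from statistics import NormalDist

random.seed(3)

# Transform a sample of X_j to pseudo-standard-normal scores
# Z_j = Phi^{-1}(F_hat_j(X_j)), using ranks/(n + 1) so that the argument of
# the inverse normal CDF stays strictly inside (0, 1).
xs = [random.expovariate(0.7) for _ in range(5000)]   # any continuous X_j
n = len(xs)
rank = {x: r for r, x in enumerate(sorted(xs), start=1)}
zs = [NormalDist().inv_cdf(rank[x] / (n + 1)) for x in xs]

# The scores should be approximately marginal standard normal.
m = sum(zs) / n
s2 = sum(z * z for z in zs) / n
print(round(m, 3), round(s2, 3))
```

The sample mean is essentially 0 and the sample second moment close to 1, so downstream local-likelihood estimation can work with Ẑ_j as if it were marginally standard normal.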

In the following, we will not always distinguish between Z_j and Ẑ_j. In fact, by using the technique of proof of Theorem 4.5 under the regularity conditions of that theorem, the error made by estimating R(z) from the empirically transformed variables Ẑ_j instead of Z_j is, in the limit, smaller than the estimation error in the local correlations themselves.

In this chapter, we refer to X and its probability density function f X as being on the x -scale and to Z and its probability density function f Z as being on the z -scale. For further discussion of the simplified z -approximation, we refer to Chapters 9.2.2 and 9.7.

Denote by ( Z ( 1 ) , Z ( 2 ) ) = ( Z 1 , Z 2 , Z 3 ) the partitioning of Z corresponding to the partitioning ( X ( 1 ) , X ( 2 ) ) = ( X 1 , X 2 , X 3 ) of X . A natural definition of the local partial covariance matrix of Z ( 1 ) | Z ( 2 ) is the local version of Eq. (11.2):

(11.7) Σ_{12|3}(z) = R_{11}(z^{(1)}) − R_{12}(z) (R_{22}(z^{(2)}))^{−1} R_{21}(z).

If p 1 = p 2 = 1 , then Σ 12 | 3 ( z ) is a 2 × 2 matrix, and we define the local Gaussian partial correlation α ( z ) between the two variables in Z ( 1 ) = ( Z 1 , Z 2 ) given Z ( 2 ) = Z 3 in accordance with the ordinary (global) partial correlation provided by Eq. (11.3):

(11.8) α(z) = R_{12|3}(z) = {Σ_{12|3}(z)}_{12} / ({Σ_{12|3}(z)}_{11}^{1/2} {Σ_{12|3}(z)}_{22}^{1/2}),

which, when Z ( 2 ) = Z 3 is scalar, reduces to

(11.9) α(z) = ρ_{12|3}(z_1, z_2 | z_3) = (ρ_12(z_1, z_2) − ρ_13(z_1, z_3) ρ_23(z_2, z_3)) / (√(1 − ρ_13²(z_1, z_3)) √(1 − ρ_23²(z_2, z_3))).

This is easily recognizable as a local version of the standard global partial correlation coefficient. It is of course possible to introduce an LGPC α(x) directly on the x-scale, but that representation is in many ways harder to handle both computationally and asymptotically. For a multivariate Gaussian distribution, we have that α_X(x) = α_Z(z) = α. In the remainder of this chapter, we write mainly in terms of the z-representation using the LGPC α(z) = α_Z(z), but when we write the local partial correlation between X_1 and X_2 given X_3 = x_3 at the point (x_1, x_2, x_3), this is simply α(z) with z_j = Φ^{−1}(F_j(x_j)), j = 1, …, p, inserted.

In the more general case where p 1 > 1 and/or p 2 > 1 , Σ 12 | 3 ( z ) is a ( p 1 + p 2 ) × ( p 1 + p 2 ) matrix that describes the non-linear conditional dependence among the variables in Z ( 1 ) = ( Z 1 , Z 2 ) given Z ( 2 ) = Z 3 (or, alternatively, X ( 1 ) = ( X 1 , X 2 ) given X ( 2 ) = X 3 ), which in particular can be used to analyze the conditional dependence between two sets of random variables of dimensions p 1 and p 2 , respectively, given a third set of variables of dimension p 3 . This case poses no conceptual challenges to our approach, but it requires a rather large investment in new notation and does not lead to simple formulas like Eq. (11.9). We will therefore, for the most part, focus on the local Gaussian partial correlation between two stochastic scalar variables in the remainder of this chapter. A complete description of the general multivariate case can be found in the online supplement of Otneim and Tjøstheim (2021) and, somewhat briefly, in Section 11.8.


URL: https://www.sciencedirect.com/science/article/pii/B9780128158616000183

RANDOM VARIABLE THEORY

Daniel T. Gillespie , in Markov Processes, 1992

Random Variable Transformation (RVT) Theorem

If the n random variables X 1, …, Xn have the joint density function P(x 1,…,xn ), and if the m random variables Y 1, …, Ym are defined by Yi =fi (X 1,…,Xn ) [i=1 to m], then the joint density function Q(y 1,…,ym ) of Y 1, …, Ym is given by

(1.6-4) Q(y_1, …, y_m) = ∫_{−∞}^{∞} dx_1 ⋯ ∫_{−∞}^{∞} dx_n P(x_1, …, x_n) ∏_{i=1}^{m} δ(y_i − f_i(x_1, …, x_n)).
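A quick sanity check of the RVT theorem is the case n = 2, m = 1 with Y = X_1 + X_2 and independent uniform X_1, X_2 (my illustrative choice). Carrying out the delta-function integral in Eq. (1.6-4) by hand gives the triangular density Q(y) = y on [0, 1] and Q(y) = 2 − y on [1, 2], which a simulation reproduces:

```python
import random

random.seed(4)

# RVT sketch: Y = f1(X1, X2) = X1 + X2 with X1, X2 ~ uniform(0, 1).
# Eq. (1.6-4) with the delta function integrated out yields the
# triangular density Q(y) = y for 0 <= y <= 1 and Q(y) = 2 - y for 1 <= y <= 2.
n = 400_000
ys = [random.random() + random.random() for _ in range(n)]

def q(y):
    # Analytic result of the RVT integral for this transformation.
    return y if y <= 1.0 else 2.0 - y

# Compare histogram-bin frequencies against the analytic density.
width = 0.1
results = {}
for lo in (0.3, 0.7, 1.6):
    frac = sum(1 for y in ys if lo <= y < lo + width) / (n * width)
    results[lo] = frac
    print(lo, round(frac, 3), round(q(lo + width / 2), 3))
```

The empirical bin heights match the analytic triangular density to within sampling noise.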

A useful special case of the RVT theorem is the following result:


URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500065

SPECIAL RANDOM VARIABLES

Sheldon M. Ross , in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009

EXAMPLE 5.4e

The random vector X, Y is said to have a uniform distribution over the two-dimensional region R if its joint density function is constant for points in R, and is 0 for points outside of R. That is, if

f(x, y) = c if (x, y) ∈ R, 0 otherwise

Because

1 = ∬_R f(x, y) dx dy = ∬_R c dx dy = c × Area of R

it follows that

c = 1 / Area of R

For any region A ⊆ R,

P{(X, Y) ∈ A} = ∬_{(x,y) ∈ A} f(x, y) dx dy = ∬_{(x,y) ∈ A} c dx dy = Area of A / Area of R

Suppose now that X, Y is uniformly distributed over the rectangular region R = {(x, y) : 0 ≤ x ≤ a, 0 ≤ y ≤ b}.

Its joint density function is

f(x, y) = c if 0 ≤ x ≤ a, 0 ≤ y ≤ b; 0 otherwise

where c = 1 / Area of rectangle = 1/(ab). In this case, X and Y are independent uniform random variables. To show this, note that for 0 ≤ x ≤ a, 0 ≤ y ≤ b,

(5.4.5) P{X ≤ x, Y ≤ y} = c ∫_0^x ∫_0^y dy dx = xy/(ab)

First letting y = b, and then letting x = a, in the preceding shows that

(5.4.6) P{X ≤ x} = x/a,  P{Y ≤ y} = y/b

Thus, from Equations 5.4.5 and 5.4.6 we can conclude that X and Y are independent, with X being uniform on (0, a) and Y being uniform on (0, b).
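Equations 5.4.5 and 5.4.6 are easy to confirm empirically. A minimal sketch (the values of a, b and the evaluation point are illustrative assumptions):

```python
import random

random.seed(5)

# X, Y uniform on the rectangle [0, a] x [0, b]: check Eqs. (5.4.5)-(5.4.6).
a, b = 2.0, 3.0
n = 100_000
pts = [(a * random.random(), b * random.random()) for _ in range(n)]

x0, y0 = 1.2, 1.0
p_joint = sum(1 for x, y in pts if x <= x0 and y <= y0) / n  # x0*y0/(a*b) = 0.2
p_x = sum(1 for x, _ in pts if x <= x0) / n                  # x0/a = 0.6
p_y = sum(1 for _, y in pts if y <= y0) / n                  # y0/b = 1/3
print(p_joint, p_x * p_y)
```

The joint probability factors into the product of the marginals, which is exactly the independence conclusion of the example.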


URL: https://www.sciencedirect.com/science/article/pii/B9780123704832000102

Multivariate stochastic orders

Félix Belzunce , ... Julio Mulero , in An Introduction to Stochastic Orders, 2016

3.5 The multivariate likelihood ratio order

In a similar way to the univariate case, it is possible to check the multivariate stochastic and hazard rate orders in terms of a property of the joint density functions. This property leads to the definition of the multivariate likelihood ratio order, which is a generalization of the univariate likelihood ratio order. In this section, the results are given in the continuous case but they can be restated for the general case. The main intention of the multivariate likelihood ratio order is to provide a sufficient condition for the multivariate hazard rate order.

Definition 3.5.1

Given two continuous random vectors X = (X 1,…,X n ) and Y = (Y 1,…,Y n ) with joint density functions f and g, respectively, we say that X is smaller than Y in the multivariate likelihood ratio order, denoted by Xlr Y, if

f(x) g(y) ≤ f(x ∧ y) g(x ∨ y), for all x, y ∈ ℝⁿ.

Clearly, this is a generalization of the likelihood ratio order in the univariate case. However, the multivariate likelihood ratio order is not necessarily reflexive. When a random vector X satisfies Xlr X, we have the MTP2 property introduced in Section 1.3.
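The defining inequality can be spot-checked numerically for densities where the order is known to hold. In the sketch below (my illustrative example, not from the text) f is a bivariate normal with correlation 0.5, which is MTP2 since its precision matrix has a nonpositive off-diagonal entry, and g is the same density shifted to mean (1, 1), so the ratio g/f is increasing and X ≤lr Y follows from the theory:

```python
import math
import random

random.seed(6)

def dens(v, mean, rho):
    # Bivariate normal density with unit variances and correlation rho.
    dx, dy = v[0] - mean[0], v[1] - mean[1]
    quad = (dx * dx - 2 * rho * dx * dy + dy * dy) / (1 - rho * rho)
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(1 - rho * rho))

# Spot-check f(x)g(y) <= f(x ^ y) g(x v y) at random pairs of points,
# with f = N((0,0), rho=0.5) and g = N((1,1), rho=0.5).
ok = True
for _ in range(10_000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    lo = (min(x[0], y[0]), min(x[1], y[1]))   # componentwise minimum x ^ y
    hi = (max(x[0], y[0]), max(x[1], y[1]))   # componentwise maximum x v y
    if dens(x, (0, 0), 0.5) * dens(y, (1, 1), 0.5) > \
       dens(lo, (0, 0), 0.5) * dens(hi, (1, 1), 0.5) * (1 + 1e-12):
        ok = False
print(ok)
```

No violations appear, consistent with the sufficient conditions of Theorem 3.5.2 below; a counterexample search like this is also a cheap way to falsify a conjectured ordering.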

The following result provides a set of sufficient conditions for the multivariate likelihood ratio order.

Theorem 3.5.2

Let X = (X 1,…,X n ) and Y = (Y 1,…,Y n ) be two continuous random vectors with joint density functions f and g, respectively. If X or Y or both are MTP2, and

(3.13) f(y) g(x) ≤ f(x) g(y), for all x, y ∈ ℝⁿ such that x ≤ y,

then

X ≤lr Y.

Proof

Let us assume that X is MTP2 (in the other case the proof is similar), then we have the following inequalities:

f(x) g(y) ≤ [f(x ∧ y) f(x ∨ y) / f(y)] g(y) ≤ f(x ∧ y) g(x ∨ y),

for all x, y ∈ ℝⁿ, where the first inequality follows from the MTP2 property and the second one from (3.13) applied to the ordered pair y ≤ x ∨ y. Therefore, X ≤lr Y.

The multivariate likelihood ratio order is preserved under conditioning on sublattices, as we see next. Recall that a subset A ⊆ ℝⁿ is called a sublattice if x, y ∈ A implies x ∧ y ∈ A and x ∨ y ∈ A. This result will be used to show the relationship between the multivariate likelihood ratio order and the multivariate dynamic hazard rate order. The proof is obvious from the definition.

Theorem 3.5.3

Let X = (X 1,…,X n ) and Y = (Y 1,…,Y n ) be two continuous random vectors. If Xlr Y, then

[X | X ∈ A] ≤lr [Y | Y ∈ A], for every sublattice A ⊆ ℝⁿ.

In particular, from the previous theorem, the multivariate likelihood ratio order is preserved under marginalization. This result is useful because, in some cases, it is easier to provide a result in the multivariate case rather than in the univariate case.

Theorem 3.5.4

Let X = (X 1,…,X n ) and Y = (Y 1,…,Y n ) be two continuous random vectors. If Xlr Y, then

X_I ≤lr Y_I, for all I ⊆ {1, …, n}.

Next, it is shown that the multivariate likelihood ratio order is stronger than the multivariate hazard rate order.

Theorem 3.5.5

Let X = (X 1,…,X n ) and Y = (Y 1,…,Y n ) be two continuous random vectors. If Xlr Y, then

X ≤hr Y.

Proof

Let us check the conditions for the definition of the multivariate dynamic hazard rate order. In particular, denoting by η and λ the multivariate dynamic hazard rates of X and Y, respectively, let us see if

η_k(t | h_t) ≥ λ_k(t | h′_t), for all t ≥ 0,

where

h_t = {X_{I∪J} = x_{I∪J}, X_{\overline{I∪J}} > te},

and

h′_t = {Y_I = y_I, Y_{\overline{I}} > te},

whenever I ∩ J = ∅, 0 ≤ x_I ≤ y_I ≤ te, 0 ≤ x_J ≤ te, and for all k ∈ \overline{I∪J}.

The result will follow by proving that, as we shall see later,

(3.14) [X_{\overline{I∪J}} | h_t] ≤lr [Y_{\overline{I∪J}} | h′_t].

Condition (3.14) will follow if

(3.15) [X_{\overline{I∪J}} | X_{I∪J} = x_{I∪J}] ≤lr [Y_{\overline{I∪J}} | Y_I = y_I, Y_J > te],

holds.

Let us see that (3.15) follows if X ≤lr Y holds. Denoting by f and g the joint density functions of (X_I, X_J, X_{\overline{I∪J}}) and (Y_I, Y_J, Y_{\overline{I∪J}}), respectively, and by f_{I,J} and g_{I,J} the joint densities of X_{I,J} and Y_{I,J}, respectively, we see that the joint density function of [X_{\overline{I∪J}} | X_{I∪J} = x_{I∪J}] is given by

f_{\overline{I∪J}}(x_{\overline{I∪J}}) = f(x_I, x_J, x_{\overline{I∪J}}) / f_{I,J}(x_I, x_J),

and the joint density function of [Y_{\overline{I∪J}} | Y_I = y_I, Y_J > te] is given by

g_{\overline{I∪J}}(x_{\overline{I∪J}}) = ∫_{y_J > te} g(y_I, y_J, x_{\overline{I∪J}}) dy_J / ∫_{y_J > te} g_{I,J}(y_I, y_J) dy_J.

Given y_J > te, we see that x_J ≤ te < y_J and, analogously, x_I ≤ y_I. Since X ≤lr Y, given y_{\overline{I∪J}}, we see that

f(x_I, x_J, x_{\overline{I∪J}}) g(y_I, y_J, y_{\overline{I∪J}}) ≤ f(x_I, x_J, x_{\overline{I∪J}} ∧ y_{\overline{I∪J}}) g(y_I, y_J, x_{\overline{I∪J}} ∨ y_{\overline{I∪J}}),

which, upon integration, yields

f_{\overline{I∪J}}(x_{\overline{I∪J}}) g_{\overline{I∪J}}(y_{\overline{I∪J}}) ≤ f_{\overline{I∪J}}(x_{\overline{I∪J}} ∧ y_{\overline{I∪J}}) g_{\overline{I∪J}}(x_{\overline{I∪J}} ∨ y_{\overline{I∪J}}).

Therefore, from the previous inequality, we see that (3.15) holds. Now, from Theorem 3.5.3, we see that (3.14) also holds. In particular, [X_k | h_t] ≤lr [Y_k | h′_t] holds for all k ∈ \overline{I∪J}. Now, from (2.25), we see that the hazard rates of [X_k | h_t] and [Y_k | h′_t] are ordered and, consequently, we see that

η_k(t | h_t) ≥ λ_k(t | h′_t).

Finally, a result for the multivariate likelihood ratio order among random vectors with conditionally independent components is provided. In particular, we consider the same background as that in Theorem 3.3.8.

Theorem 3.5.6

Assume that

(i)

X i (θ) =st Y i (θ), for all θ and for all i = 1,…,n,

(ii)

X_i(θ) ≤lr Y_j(θ′), for all θ ≤ θ′ and for all 1 ≤ i ≤ j ≤ n, and

(iii)

θ_1 ≤lr θ_2.

Then,

(X_1, …, X_n) ≤lr (Y_1, …, Y_n).

Proof

Let f i (x|θ) be the density function of X i (θ). From condition (ii), we see that

∏_{i=1}^n f_i(x_i | θ_i) is TP2 in (x_1, …, x_n, θ_1, …, θ_n).

Furthermore, condition (iii) is equivalent to the fact that h_i(θ_1,…,θ_n) is TP2 in (θ_1,…,θ_n, i) ∈ S × {1, 2}. Due to these facts and from Theorem 1.2.3, we see that ∫ f(x_1,…,x_n | θ_1,…,θ_n) dH_i(θ) is TP2 in (x_1,…,x_n, i), and we get the result.


URL: https://www.sciencedirect.com/science/article/pii/B978012803768300003X

Convex Functions, Partial Orderings, and Statistical Applications

In Mathematics in Science and Engineering, 1992

13.6 Some Properties of Log-Concave Density Functions

Log-concave density functions which satisfy (13.19) play an important role in statistics and probability. In the following we observe some known facts concerning this class of densities.

13.24 Fact

Let X 1, …, X n be i.i.d. univariate random variables with a common density function h(x). If h(x) is a log-concave function of x for x ∈ ℝ, then the joint density function of (X 1, …, X n ) is a log-concave function of x for x ∈ ℝ n .

13.25 Fact

If f(x) = g(T(x)) where g: ℝ → [0, ∞) is decreasing and T(x) is a convex function of x for x ∈ ℝ n , then f is a log-concave function of x for x ∈ ℝ n .

The following theorem, due to Prékopa (1971) and Brascamp and Lieb (1975), shows that the integral of a log-concave function is log-concave:

13.26 Theorem

Let f(x, y): ℝ n+m → [0, ∞) be a log-concave function of (x, y) for x ∈ ℝ n and y ∈ ℝ m . Then the function g: ℝ n → [0, ∞) given by

(13.27) g(x) = ∫_{ℝ^m} f(x, y) dy

is log-concave.

Proof

We adopt the proof in Brascamp and Lieb (1975). First note that it suffices to prove the theorem for m = n = 1 because the general case follows by Fubini's theorem and induction. Let x 1, x 2 be two points in ℝ such that g(x 1)g(x 2) ≠ 0. For convenience we may assume that

sup_y f(x_1, y) = sup_y f(x_2, y);

because otherwise we can replace f(x, y) by e^{bx} f(x, y) for suitably chosen b and the problem remains unchanged. For each fixed λ > 0, denote

C_1(λ) = {(x, y) : f(x, y) ≥ λ} ⊆ ℝ²,  C_2(x, λ) = {y : f(x, y) ≥ λ}.

Then, by log-concavity of f, C 1(λ) is convex and C 2(x, λ) is an interval. (For the convexity of C 1(λ), see Fact 13.28.) Letting v(x, λ) = ∫_{C_2(x,λ)} dy be the Lebesgue measure of the set C 2(x, λ), we have, by Theorem 13.18,

v(αx_1 + (1 − α)x_2, λ) ≥ α v(x_1, λ) + (1 − α) v(x_2, λ) for all α ∈ [0, 1].

Since g(x) can be expressed as g(x) = ∫_0^∞ v(x, λ) dλ, we have

g(αx_1 + (1 − α)x_2) ≥ α g(x_1) + (1 − α) g(x_2) ≥ (g(x_1))^α (g(x_2))^{1−α}

for all α ∈ [0, 1], where the second inequality follows from the arithmetic mean-geometric mean inequality.

A simple application of Theorem 13.26 is (Brascamp and Lieb, 1975; see also Barlow and Proschan, 1981, p. 104):

13.27 Corollary

The convolution of two log-concave density functions in ℝⁿ is log-concave.

Proof

Let f 1, f 2 be log-concave density functions. Then f 1(xy)f 2(y) is jointly log-concave in (x, y) ∈ ℝ2n . Thus, by Theorem 13.26,

g(x) = ∫_{ℝⁿ} f_1(x − y) f_2(y) dy

is log-concave.
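Corollary 13.27 can be checked numerically on a grid. The sketch below (the Gaussian and Laplace densities are illustrative choices) forms a discrete convolution and tests the discrete analogue of log-concavity, g(k)² ≥ g(k−1)g(k+1):

```python
import math

# Numerical spot-check of the corollary: convolve a Gaussian density with a
# Laplace density on a grid, then verify discrete log-concavity of the result
# via g(k)^2 >= g(k-1) g(k+1).
h = 0.05
grid = [h * k for k in range(-200, 201)]
f1 = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in grid]
f2 = [0.5 * math.exp(-abs(x)) for x in grid]

m = len(grid)
# g(x_k) ~ sum_j f1(x_k - x_j) f2(x_j) h, restricted to the common grid.
g = []
for k in range(m):
    s = 0.0
    for j in range(m):
        i = k - j + 200            # index of x_k - x_j on the grid
        if 0 <= i < m:
            s += f1[i] * f2[j]
    g.append(s * h)

bad = sum(1 for k in range(1, m - 1)
          if g[k] * g[k] < g[k - 1] * g[k + 1] * (1 - 1e-9))
print(bad)
```

No violations occur; in fact the discrete sum is itself a convolution of two log-concave sequences, so the grid check mirrors the continuous theorem rather than merely approximating it.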

A density function f is said to be unimodal if the set

(13.28) D_λ = {x : x ∈ ℝⁿ, f(x) ≥ λ}

is a convex set in ℝ n for all λ > 0. The following facts show how log-concavity and unimodality are related.

13.28 Fact

If f: ℝ n → [0, ∞) is a probability density function, then log-concavity of f implies unimodality of f.

Proof

Let x 1, x 2 ∈ ℝ n be in D λ. Then for every α ∈ [0, 1], we have, by (13.20),

(13.29) log f(αx_1 + (1 − α)x_2) ≥ α log f(x_1) + (1 − α) log f(x_2) ≥ α log λ + (1 − α) log λ = log λ.

Thus αx 1 + (1 − α)x 2 is also in D λ.

A function f is said to be Schur-concave if y ≻ x (y majorizes x) implies f(x) ≥ f(y) for all x, y ∈ ℝ n (see Definition 12.23). It is known that all Schur-concave functions are permutation invariant (see Theorem 12.24). Furthermore, it is known that

13.29 Fact

If f: ℝ n → [0, ∞) is a permutation-invariant and log-concave function of x ∈ ℝ n , then it is a Schur-concave function of x ∈ ℝ n .

Proof

Assume y ≻ x and, without loss of generality, it may be assumed that x, y are of the form

x = (x_1, x_2, x_3, …, x_n),  y = (y_1, y_2, x_3, …, x_n),

where y_2 < x_2 ≤ x_1 < y_1 and x_1 + x_2 = y_1 + y_2. Let y* = (y_2, y_1, x_3, …, x_n). Then there exists an α ∈ (0, 1) such that x = αy + (1 − α)y*. Thus, by the permutation-invariance and log-concavity properties of f, we have

log f(x) = log f(αy + (1 − α)y*) ≥ α log f(y) + (1 − α) log f(y*) = log f(y).


URL: https://www.sciencedirect.com/science/article/pii/S0076539208628258

Preliminaries

Félix Belzunce , ... Julio Mulero , in An Introduction to Stochastic Orders, 2016

1.3.5 Parametric families of multivariate distributions

In this section, some known multivariate distributions are recalled. These models will be compared in Chapter 3 in some multivariate stochastic orders.

It is worth mentioning that the mean of a random vector X = (X 1,…,X n ) is given by

E[X] = (E[X_1], …, E[X_n])^T,

and the covariance matrix of X is

Cov(X) = E[(X − E[X])(X − E[X])^T].

Next, we consider some multivariate distributions. First, the definition of the multivariate normal distribution is recalled.

Definition 1.3.2

Given a random vector X = (X 1,…,X n ), it is said that X follows a multivariate normal distribution with mean vector μ ∈ ℝⁿ and covariance matrix Σ ∈ ℝⁿ×ⁿ, denoted by X ∼ N n (μ, Σ), if its joint density function is given by

f(x) = (2π)^{−n/2} |Σ|^{−1/2} exp{−(1/2) (x − μ)^T Σ^{−1} (x − μ)}, for all x ∈ ℝⁿ.

The marginal distributions follow univariate normal models. Furthermore, the copula of an N n (μ, Σ) distribution is the same as that of N n (0, P), where P is the correlation matrix obtained from the covariance matrix Σ. In this sense, all multivariate normal distributions with the same dimension and correlation matrix have the same (Gaussian) copula.

Another well-known multivariate family is the elliptically contoured model. Let us consider the formal definition.

Definition 1.3.3

Given a random vector X = (X 1,…,X n ), it is said that X follows an elliptically contoured distribution, denoted by X ∼ E n (μ, Σ, g), if its joint density function is given by

(1.17) f(x) = |Σ|^{−1/2} g((x − μ)^T Σ^{−1} (x − μ)), for all x ∈ ℝⁿ,

where μ is the median vector (which is also the mean vector if the latter exists), Σ is a symmetric positive definite matrix which is proportional to the covariance matrix, if the latter exists, and g: ℝ₊ → ℝ₊ is such that ∫ g(x) dx < +∞.

A particular case of an elliptically contoured distribution is the multivariate normal, obtained by taking

g(x) = (2π)^{−n/2} exp(−x/2).

Some other particular cases are described in Ref. [52].


URL: https://www.sciencedirect.com/science/article/pii/B9780128037683000016

GENERAL FEATURES OF A MARKOV PROCESS

Daniel T. Gillespie , in Markov Processes, 1992

2.1 The Markov State Density Function

We consider a time-evolving or 'dynamical' system whose possible states can be represented by points on the real axis, and we let

(2.1-1) X(t) ≡ the state point, or state, of the system at time t.

We shall assume that the value of X at some initial time t 0 is fixed,

(2.1-2) X ( t 0 ) = x 0 ,

but that X(t) for any t > t 0 can be predicted only probabilistically; more specifically, we assume that X(t) for any given t > t 0 is a random variable, as defined in Section 1.2. Since it makes sense to inquire about the state of the system at successive instants t 1, t 2, …, tn , where t 0 < t 1 < t 2 < … < tn , we can ascribe to the corresponding n random variables X(t 1), X(t 2), …, X(tn ) a joint density function P_n^{(1)}, which is defined as follows:

(2.1-3) P_n^{(1)}(x_n, t_n; x_{n−1}, t_{n−1}; …; x_1, t_1 | x_0, t_0) dx_n dx_{n−1} ⋯ dx_1 ≡ Prob{X(t_i) ∈ [x_i, x_i + dx_i) for i = 1, 2, …, n, given that X(t_0) = x_0, with t_0 ≤ t_1 ≤ … ≤ t_n}.

If all these assumptions are satisfied, then we say that X(t) is a stochastic process.

It is evident that a stochastic process X(t) has infinitely many joint density functions P_n^{(1)}, corresponding to n = 1, 2, … . And associated with each of these joint density functions is a plethora of subordinate density functions; for example,

P_{n−j}^{(j+1)}(x_n, t_n; …; x_{j+1}, t_{j+1} | x_j, t_j; …; x_1, t_1; x_0, t_0)

is defined to be the joint density function of the n − j random variables X(t_{j+1}), …, X(t_n), given the j + 1 conditions X(t_0) = x_0, X(t_1) = x_1, …, X(t_j) = x_j. Notice that the subscript on the density function P_k^{(j)} refers to the number of (x, t) pairs to the left of the 'given' bar, while the superscript refers to the number of (x, t) pairs to the right of the 'given' bar; thus, P_k^{(j)} is a k-variate joint density function with j conditionings.

It is always possible to calculate the function P_{n−1}^{(1)} from the function P_n^{(1)} by simply integrating the latter over any one of the variables x_1, …, x_n. However, it is not in general possible to deduce the function P_{n+1}^{(1)} from the function P_n^{(1)}. This 'open-ended' nature of the density functions for a general stochastic process usually makes any substantive analysis extremely difficult. But we shall be concerned here with only a very restricted subclass of stochastic processes, namely those that have the 'past-forgetting' property that, for all j ≥ 2 and t_{i−1} ≤ t_i,

(2.1-4) P_1^{(j)}(x_j, t_j | x_{j−1}, t_{j−1}; …; x_1, t_1; x_0, t_0) = P_1^{(1)}(x_j, t_j | x_{j−1}, t_{j−1}) ≡ P(x_j, t_j | x_{j−1}, t_{j−1}).

This is called the Markov property, and it says that only the most recent conditioning matters: Given that X(t') = x', then our ability to predict X(t) for any t > t' will not be enhanced by a knowledge of any values of the process earlier than t'. Any stochastic process X(t) that has this past-forgetting property is called a Markovian stochastic process, or more simply, a Markov process. In what follows it may always be assumed, unless explicitly stated otherwise, that the stochastic process X(t) under consideration is a Markov process.

The Markov property (2.1-4) breaks the open-endedness of the hierarchy of joint state density functions in a dramatic way. For the joint density function P_2^{(1)} we have

P_2^{(1)}(x_2, t_2; x_1, t_1 | x_0, t_0) = P_1^{(1)}(x_1, t_1 | x_0, t_0) P_1^{(2)}(x_2, t_2 | x_1, t_1; x_0, t_0)  [by (1.5-9d)]

= P_1^{(1)}(x_1, t_1 | x_0, t_0) P_1^{(1)}(x_2, t_2 | x_1, t_1).  [by (2.1-4)]

Hence, writing P_1^{(1)} ≡ P in accordance with the notation suggested in Eq. (2.1-4), we have

(2.1-5) P_2^{(1)}(x_2, t_2; x_1, t_1 | x_0, t_0) = P(x_2, t_2 | x_1, t_1) P(x_1, t_1 | x_0, t_0).

The same kind of reasoning shows that

P_3^{(1)}(x_3, t_3; x_2, t_2; x_1, t_1 | x_0, t_0) = P(x_3, t_3 | x_2, t_2) P(x_2, t_2 | x_1, t_1) P(x_1, t_1 | x_0, t_0),

and more generally, for any set of times t_n ≥ t_{n−1} ≥ … ≥ t_0,

(2.1-6) P_n^{(1)}(x_n, t_n; …; x_1, t_1 | x_0, t_0) = ∏_{i=1}^{n} P(x_i, t_i | x_{i−1}, t_{i−1}).

So for a Markov process, every conditioned state density function P_n^{(1)} can be written solely in terms of the particular conditioned state density function P_1^{(1)} ≡ P. The function P_1^{(1)} ≡ P thus becomes the principal focus of our study, and we shall henceforth refer to it as the Markov state density function. For future reference, the formal definition of the Markov state density function is [cf. Eq. (2.1-3)]

(2.1-7) P(x_2, t_2 | x_1, t_1) dx_2 ≡ Prob{X(t_2) ∈ [x_2, x_2 + dx_2), given X(t_1) = x_1, with t_2 ≥ t_1}.
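The factorization (2.1-6) is internally consistent only if integrating the intermediate state out of a two-step product returns the one-step density (the Chapman-Kolmogorov condition). A minimal sketch checks this for a driftless Wiener process, whose Markov state density is my illustrative choice (a Gaussian in x_2 with mean x_1 and variance t_2 − t_1):

```python
import math

def p(x2, t2, x1, t1):
    # Assumed Markov state density of a driftless Wiener process:
    # Gaussian with mean x1 and variance t2 - t1.
    v = t2 - t1
    return math.exp(-(x2 - x1) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# Integrate the intermediate state x1 out of P(x2,t2|x1,t1) P(x1,t1|x0,t0)
# and compare with the direct one-step density P(x2,t2|x0,t0).
x0, t0, t1, t2, x2 = 0.0, 0.0, 1.0, 3.0, 0.8
h = 0.01
lhs = sum(p(x2, t2, x1, t1) * p(x1, t1, x0, t0) * h
          for x1 in [h * k for k in range(-1000, 1001)])
rhs = p(x2, t2, x0, t0)
print(round(lhs, 6), round(rhs, 6))
```

The two numbers agree to high accuracy, confirming that the product form (2.1-5) marginalizes back to the Markov state density (2.1-7) for this process.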


URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500077