Independence for Continuous Joint Density Function
Probability Distributions I
B.R. Martin , in Statistics for Physical Science, 2012
3.3.1 Joint Probability Distributions
The multivariate joint density function f(x_1, x_2, …, x_n) of the n continuous random variables x_1, x_2, …, x_n is a single-valued non-negative real number for all real values of its arguments, normalized so that
(3.17)
\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1, \]
and the probability that x_1 falls between any two numbers a_1 and b_1, x_2 falls between any two numbers a_2 and b_2, …, and x_n falls between any two numbers a_n and b_n, simultaneously, is defined by
(3.18)
\[ P(a_1 \le x_1 \le b_1, \ldots, a_n \le x_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n. \]
Similarly, the multivariate joint distribution function F(x_1, x_2, …, x_n) of the n random variables is
(3.19)
\[ F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(u_1, \ldots, u_n)\, du_1 \cdots du_n. \]
For simplicity, consider the case of just two random variables x and y. These could correspond to the energy and angle of emission of a particle emitted in a nuclear scattering reaction. If an event A corresponds to the variable x being observed in the range [x, x + dx] and the event B corresponds to the variable y being observed in the range [y, y + dy], then
\[ P(A \cap B) = f(x, y)\, dx\, dy. \]
As noted in Chapter 1, the joint density function corresponds to the density of points on a scatter plot of x and y in the limit of an infinite number of points. This is illustrated in Fig. 3.3, using the data shown on the scatter plot of Fig. 1.3(b).
FIGURE 3.3. A scatter plot of 1000 events that are functions of two random variables x and y showing two infinitesimal bands dx and dy. The area of intersection of the bands is dx dy, and f(x, y) dx dy is the probability of finding x in the interval [x, x + dx] and y in the interval [y, y + dy].
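As a concrete numerical check of the normalization (3.17) and probability (3.18) integrals, the sketch below uses an assumed joint density f(x, y) = e^{-(x+y)} on the positive quadrant (an illustrative choice, not from the text, picked because its integrals are known in closed form) and integrates it on a midpoint grid:

```python
import numpy as np

# Assumed joint density f(x, y) = exp(-(x + y)) for x, y >= 0 -- an
# illustrative example, not the density discussed in the text.
def f(x, y):
    return np.exp(-(x + y))

# Normalization (3.17): midpoint-rule integral over a grid that covers
# essentially all of the probability mass.
h = 0.02
xs = np.arange(h / 2, 20, h)
X, Y = np.meshgrid(xs, xs)
total = f(X, Y).sum() * h * h

# Probability (3.18) that 0 <= x <= 1 and 0 <= y <= 2, simultaneously.
hx, hy = 0.002, 0.004
xs1 = np.arange(hx / 2, 1, hx)
ys1 = np.arange(hy / 2, 2, hy)
X1, Y1 = np.meshgrid(xs1, ys1)
p = f(X1, Y1).sum() * hx * hy

exact = (1 - np.exp(-1)) * (1 - np.exp(-2))
print(total, p, exact)
```

For this separable density the double integrals factor, so the numerical values can be compared against (1 − e^{−1})(1 − e^{−2}) directly.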
URL: https://www.sciencedirect.com/science/article/pii/B9780123877604000032
Random Variables
Sheldon M. Ross , in Introduction to Probability Models (Twelfth Edition), 2019
2.5.3 Covariance and Variance of Sums of Random Variables
The covariance of any two random variables X and Y, denoted by Cov(X, Y), is defined by
\[ \operatorname{Cov}(X, Y) = E\bigl[(X - E[X])(Y - E[Y])\bigr] = E[XY] - E[X]E[Y]. \]
Note that if X and Y are independent, then by Proposition 2.3 it follows that Cov(X, Y) = 0.
Let us consider now the special case where X and Y are indicator variables for whether or not the events A and B occur. That is, for events A and B, define
\[ X = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{otherwise,} \end{cases} \qquad Y = \begin{cases} 1, & \text{if } B \text{ occurs} \\ 0, & \text{otherwise.} \end{cases} \]
Then,
\[ \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y], \]
and, because XY will equal 1 or 0 depending on whether or not both X and Y equal 1, we see that
\[ \operatorname{Cov}(X, Y) = P\{X = 1, Y = 1\} - P\{X = 1\}P\{Y = 1\}. \]
From this we see that
\[ \operatorname{Cov}(X, Y) > 0 \iff P\{X = 1, Y = 1\} > P\{X = 1\}P\{Y = 1\} \iff P\{Y = 1 \mid X = 1\} > P\{Y = 1\}. \]
That is, the covariance of X and Y is positive if the outcome X = 1 makes it more likely that Y = 1 (which, as is easily seen by symmetry, also implies the reverse).
In general it can be shown that a positive value of Cov(X, Y) is an indication that Y tends to increase as X does, whereas a negative value indicates that Y tends to decrease as X increases.
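The indicator-variable identity above is easy to check by simulation. The overlapping events A and B below are hypothetical choices built from a single uniform draw, so that P(A ∩ B) is known exactly:

```python
import numpy as np

# X = indicator of A = {U < 0.6}, Y = indicator of B = {U > 0.3},
# for U uniform on (0, 1).  These events are illustrative choices.
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
x = (u < 0.6).astype(float)
y = (u > 0.3).astype(float)

cov_empirical = np.mean(x * y) - np.mean(x) * np.mean(y)
cov_exact = 0.3 - 0.6 * 0.7   # P(A and B) = P(0.3 < U < 0.6) = 0.3
print(cov_empirical, cov_exact)
```

Here the covariance is negative: given A (a small value of U), the event B (a large value of U) becomes less likely, in line with the discussion above.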
Example 2.33
The joint density function of X, Y is
\[ f(x, y) = \frac{1}{y}\, e^{-(y + x/y)}, \qquad 0 < x, y < \infty. \]
(a) Verify that the preceding is a joint density function.
(b) Find Cov(X, Y).
Solution: To show that f(x, y) is a joint density function we need to show it is nonnegative, which is immediate, and that \(\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dy\, dx = 1\). We prove the latter as follows:
\[ \int_0^{\infty} \int_0^{\infty} \frac{1}{y}\, e^{-(y + x/y)}\, dx\, dy = \int_0^{\infty} e^{-y}\, dy = 1. \]
To obtain Cov(X, Y), note that the density function of Y is
\[ f_Y(y) = e^{-y} \int_0^{\infty} \frac{1}{y}\, e^{-x/y}\, dx = e^{-y}, \qquad y > 0. \]
Thus, Y is an exponential random variable with parameter 1, showing (see Example 2.21) that
\[ E[Y] = 1. \]
We compute E[X] and E[XY] as follows:
\[ E[X] = E\bigl[E[X \mid Y]\bigr], \qquad E[XY] = E\bigl[E[XY \mid Y]\bigr] = E\bigl[Y\, E[X \mid Y]\bigr]. \]
Now, E[X | Y = y] is the expected value of an exponential random variable with parameter 1/y, and thus is equal to y. Consequently,
\[ E[X] = E[Y] = 1. \]
Also
\[ E[XY] = E\bigl[Y\, E[X \mid Y]\bigr] = E[Y^2] = \int_0^{\infty} y^2 e^{-y}\, dy. \]
Integration by parts gives
\[ E[Y^2] = 2. \]
Consequently,
\[ \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] = 2 - 1 = 1. \]
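A Monte Carlo cross-check of Cov(X, Y) = 1 is possible because the solution identifies the structure of the joint density: Y is exponential with parameter 1 and, conditionally on Y = y, X is exponential with mean y (so that E[X | Y = y] = y). A sketch under that reading:

```python
import numpy as np

# Y ~ exponential with parameter 1; given Y = y, X ~ exponential with
# mean y, matching the conditional structure identified in the solution.
rng = np.random.default_rng(1)
n = 1_000_000
y = rng.exponential(1.0, size=n)
x = rng.exponential(y)          # scale (mean) parameter varies with y
cov = np.mean(x * y) - np.mean(x) * np.mean(y)
print(cov)                      # should be close to 1
```

The sample covariance converges to E[XY] − E[X]E[Y] = 2 − 1 = 1 as n grows.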
The following are important properties of covariance:
\[ \operatorname{Cov}(X, X) = \operatorname{Var}(X), \qquad \operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X), \qquad \operatorname{Cov}(cX, Y) = c\, \operatorname{Cov}(X, Y), \]
\[ \operatorname{Cov}\Bigl( \sum_{i=1}^{n} X_i,\; \sum_{j=1}^{m} Y_j \Bigr) = \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{Cov}(X_i, Y_j). \]
URL: https://www.sciencedirect.com/science/article/pii/B978012814346900007X
Elements of Probability
Sheldon Ross , in Simulation (Fifth Edition), 2013
2.10 Conditional Expectation and Conditional Variance
If X and Y are jointly discrete random variables, we define E[X | Y = y], the conditional expectation of X given that Y = y, by
\[ E[X \mid Y = y] = \sum_{x} x\, P\{X = x \mid Y = y\}. \]
In other words, the conditional expectation of X, given that Y = y, is defined like E[X] as a weighted average of all the possible values of X, but now with the weight given to the value x being equal to the conditional probability that X equals x given that Y equals y.
Similarly, if X and Y are jointly continuous with joint density function f(x, y), we define the conditional expectation of X, given that Y = y, by
\[ E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, \frac{f(x, y)}{f_Y(y)}\, dx, \qquad \text{where } f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx. \]
Let E[X | Y] denote that function of the random variable Y whose value at Y = y is E[X | Y = y]; and note that E[X | Y] is itself a random variable. The following proposition is quite useful.
Proposition 7
(2.11)
\[ E[X] = E\bigl[E[X \mid Y]\bigr]. \]
If Y is a discrete random variable, then Equation (2.11) states that
\[ E[X] = \sum_{y} E[X \mid Y = y]\, P\{Y = y\}, \]
whereas if Y is continuous with density f_Y(y), then (2.11) states
\[ E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\, dy. \]
We now give a proof of the preceding proposition when X and Y are discrete:
\[ \sum_{y} E[X \mid Y = y]\, P\{Y = y\} = \sum_{y} \sum_{x} x\, P\{X = x \mid Y = y\}\, P\{Y = y\} = \sum_{x} x \sum_{y} P\{X = x, Y = y\} = \sum_{x} x\, P\{X = x\} = E[X]. \]
We can also define the conditional variance of X, given the value of Y, as follows:
\[ \operatorname{Var}(X \mid Y) = E\bigl[ (X - E[X \mid Y])^2 \,\big|\, Y \bigr]. \]
That is, Var(X | Y) is a function of Y, which at Y = y is equal to the variance of X given that Y = y. By the same reasoning that yields the identity Var(X) = E[X²] − (E[X])², we have that
\[ \operatorname{Var}(X \mid Y) = E[X^2 \mid Y] - \bigl( E[X \mid Y] \bigr)^2. \]
Taking expectations of both sides of the above equation gives
(2.12)
\[ E\bigl[ \operatorname{Var}(X \mid Y) \bigr] = E\bigl[ E[X^2 \mid Y] \bigr] - E\bigl[ (E[X \mid Y])^2 \bigr] = E[X^2] - E\bigl[ (E[X \mid Y])^2 \bigr]. \]
Also, because E[E[X | Y]] = E[X], we have that
(2.13)
\[ \operatorname{Var}\bigl( E[X \mid Y] \bigr) = E\bigl[ (E[X \mid Y])^2 \bigr] - \bigl( E[X] \bigr)^2. \]
Upon adding Equations (2.12) and (2.13) we obtain the following identity, known as the conditional variance formula.
The Conditional Variance Formula
\[ \operatorname{Var}(X) = E\bigl[ \operatorname{Var}(X \mid Y) \bigr] + \operatorname{Var}\bigl( E[X \mid Y] \bigr). \]
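The conditional variance formula can be verified numerically for a hypothetical two-stage experiment (Y uniform on (1, 2), and X normal with mean Y and standard deviation Y; these choices are illustrative, not from the text):

```python
import numpy as np

# Hypothetical two-stage experiment: Y ~ Uniform(1, 2); X | Y=y ~ N(y, y^2).
# Then E[Var(X | Y)] = E[Y^2] = 7/3 and Var(E[X | Y]) = Var(Y) = 1/12, so
# the conditional variance formula predicts Var(X) = 7/3 + 1/12.
rng = np.random.default_rng(2)
n = 500_000
y = rng.uniform(1, 2, size=n)
x = rng.normal(loc=y, scale=y)

var_x = np.var(x)
predicted = 7 / 3 + 1 / 12
print(var_x, predicted)
```

Both the within-group term E[Var(X | Y)] and the between-group term Var(E[X | Y]) are available in closed form here, which is what makes the comparison clean.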
URL: https://www.sciencedirect.com/science/article/pii/B9780124158252000024
The local Gaussian partial correlation
Dag Tjøstheim , ... Bård Støve , in Statistical Modeling Using Local Gaussian Approximation, 2022
11.2 The local Gaussian partial correlation
Let X = (X_1, …, X_p) be a random vector. Denote by X = (X^{(1)}, X^{(2)}) a partition of X into vectors of dimensions p_1 and p_2, respectively, such that X^{(1)} consists of the first p_1 components in X, and X^{(2)} contains the remaining variables, where p_1 + p_2 = p. We assume that the mean vector μ and covariance matrix Σ of X exist and partition them correspondingly, writing
(11.1)
\[ \boldsymbol\mu = \begin{pmatrix} \boldsymbol\mu_1 \\ \boldsymbol\mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \]
where Σ_{ij} is the covariance matrix of (X^{(i)}, X^{(j)}), i, j = 1, 2. There are two main concepts of correlation when X^{(2)} is given, the partial and conditional correlations, which coincide in several joint distributions, among them the Gaussian. See, for example, Baba et al. (2004) for details. We will use the partial correlation as a starting point when defining the LGPC. The partial variance–covariance matrix of X^{(1)} given X^{(2)} is
(11.2)
\[ \Sigma_{11\cdot 2} = \Sigma_{11} - \Sigma_{12}\, \Sigma_{22}^{-1}\, \Sigma_{21}, \]
which is the covariance matrix in the conditional (Gaussian) distribution of X^{(1)} given X^{(2)} if X is jointly normal. The partial correlation matrix between the components of X^{(1)} given X^{(2)} is naturally defined as
(11.3)
\[ R_{11\cdot 2} = D^{-1/2}\, \Sigma_{11\cdot 2}\, D^{-1/2}, \]
where D = diag(Σ_{11·2}). We identify in the same way the partial correlation matrix (11.3) with the correlation matrix in the conditional (Gaussian) distribution of X^{(1)} given X^{(2)} if X is jointly normal. Eqs. (11.2) and (11.3) will serve as the starting point for our definition of the local partial correlation.
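Equations (11.2) and (11.3) amount to a couple of lines of linear algebra. The sketch below computes the (global) partial covariance and partial correlation for an assumed 3 × 3 covariance matrix, with the third variable as the conditioning block; the matrix entries are illustrative:

```python
import numpy as np

# Assumed covariance matrix of (X1, X2, X3); conditioning block = {X3}.
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.3],
                  [0.4, 0.3, 1.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Partial covariance, Eq. (11.2): Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21.
S_partial = S11 - S12 @ np.linalg.inv(S22) @ S21

# Partial correlation, Eq. (11.3): rescale by the partial standard deviations.
d = np.sqrt(np.diag(S_partial))
R_partial = S_partial / np.outer(d, d)
print(R_partial[0, 1])
```

For this matrix the off-diagonal entry agrees with the textbook scalar formula (ρ₁₂ − ρ₁₃ρ₂₃)/√((1 − ρ₁₃²)(1 − ρ₂₃²)).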
11.2.1 Definition
We further assume that the components of X are continuous with joint density function f, and we again set up a local likelihood framework for obtaining local estimates of the parameters in the multivariate normal distribution. Given a point x, we approximate f in a neighborhood of x by a multivariate Gaussian density
(11.4)
\[ \psi\bigl(\mathbf v, \boldsymbol\mu(\mathbf x), \Sigma(\mathbf x)\bigr) = (2\pi)^{-p/2}\, |\Sigma(\mathbf x)|^{-1/2} \exp\Bigl( -\tfrac{1}{2} (\mathbf v - \boldsymbol\mu(\mathbf x))^{T}\, \Sigma(\mathbf x)^{-1}\, (\mathbf v - \boldsymbol\mu(\mathbf x)) \Bigr), \]
where μ(x) = (μ_1(x), …, μ_p(x)) and Σ(x) = (σ_{ij}(x)) for i, j = 1, …, p. Moving to another point y, there is another (generally, different) Gaussian approximation ψ(v, μ(y), Σ(y)). In this way, we approximate f by a family of multivariate Gaussian densities defined by a set of smooth parameter functions (μ(x), Σ(x)), and if f is itself a Gaussian density, then the parameter functions collapse to constants corresponding to the true parameter values, μ(x) ≡ μ and Σ(x) ≡ Σ. Hjort and Jones (1996) provide a general framework for estimating such parameter functions non-parametrically from a given data set using a local likelihood procedure, and the basic idea in the following treatment is replacing the components in the partial covariance matrix (11.2) by their locally estimated counterparts to obtain a local measure of conditional dependence.
In this chapter, we use the same transformation technique as that introduced already in Chapter 4.7 and subsequently used in the following chapters. This improves and simplifies the estimation of the LGPC. It is shown that the estimation of the local parameter functions becomes easier by transforming each X_i to a standard normal variable Z_i = Φ^{-1}(U_i), where U_i = F_i(X_i) is a uniform variable, with F_i being the cumulative distribution function of X_i. Define the random vector Z by this transformation of X to marginal standard normality:
(11.5)
\[ \mathbf Z = (Z_1, \ldots, Z_p) = \bigl( \Phi^{-1}(F_1(X_1)), \ldots, \Phi^{-1}(F_p(X_p)) \bigr). \]
The transformation enables us to simplify the local Gaussian approximation (11.4) by writing the density of Z at the point z as
(11.6)
\[ \psi\bigl(\mathbf z, R(\mathbf z)\bigr) = (2\pi)^{-p/2}\, |R(\mathbf z)|^{-1/2} \exp\Bigl( -\tfrac{1}{2} \mathbf z^{T} R(\mathbf z)^{-1} \mathbf z \Bigr), \]
where, as in Otneim and Tjøstheim (2017, 2018) and in Eqs. (9.6) and (9.7) in Chapter 9, in a further simplified approximation, we have fixed local means and standard deviations μ_i(z) ≡ 0 and σ_i(z) ≡ 1, i = 1, …, p, and where R(z) is the local correlation matrix.
In practice, we do not know the marginal distribution functions F_i, but we can instead use the empirical distribution function
\[ \widehat F_i(x) = \frac{1}{n} \sum_{k=1}^{n} I(X_{ki} \le x), \]
where I is the indicator function, n is the number of observations, and X_{ki} is the kth observation of X_i, and where 1/n can be replaced by 1/(n + 1) for small or moderate sample sizes. This results in pseudo-standard normal variables \(\widehat Z_i = \Phi^{-1}(\widehat F_i(X_i))\).
In the following, we will not always distinguish between Z and \(\widehat{\mathbf Z}\). In fact, by using the technique of proof of Theorem 4.5 under the regularity conditions of that theorem, the error made by estimating R(z) using the empirically transformed variables \(\widehat{\mathbf Z}\) instead of Z is smaller in the limit than the estimation error made when estimating the local correlations themselves.
In this chapter, we refer to X and its probability density function as being on the x -scale and to Z and its probability density function as being on the z -scale. For further discussion of the simplified z -approximation, we refer to Chapters 9.2.2 and 9.7.
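The marginal transformation to pseudo-standard normal variables can be sketched as follows; ranks divided by n + 1 play the role of the rescaled empirical distribution function, and the exponential marginal is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.stats import norm

# Pseudo-standard-normal scores: Z-hat = Phi^{-1}(rank / (n + 1)), i.e.
# Phi^{-1} applied to the empirical distribution function with the
# 1/(n+1) rescaling that keeps the argument strictly inside (0, 1).
rng = np.random.default_rng(3)
x = rng.exponential(size=1000)          # arbitrary continuous marginal
n = len(x)
ranks = np.argsort(np.argsort(x)) + 1   # rank of each observation, 1..n
z = norm.ppf(ranks / (n + 1))

print(np.mean(z), np.std(z))
```

The resulting scores have mean zero by symmetry and a standard deviation slightly below one; on this z-scale the local correlation matrix R(z) is the only remaining parameter function to estimate.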
Denote by Z = (Z^{(1)}, Z^{(2)}) the partitioning of Z corresponding to the partitioning of X. A natural definition of the local partial covariance matrix of Z^{(1)} given Z^{(2)} is the local version of Eq. (11.2):
(11.7)
\[ \Sigma_{11\cdot 2}(\mathbf z) = R_{11}(\mathbf z) - R_{12}(\mathbf z)\, R_{22}(\mathbf z)^{-1}\, R_{21}(\mathbf z). \]
If p_1 = 2, then Σ_{11·2}(z) is a 2 × 2 matrix, and we define the local Gaussian partial correlation between the two variables in Z^{(1)} given Z^{(2)} in accordance with the ordinary (global) partial correlation provided by Eq. (11.3):
(11.8)
\[ \alpha(\mathbf z) = D(\mathbf z)^{-1/2}\, \Sigma_{11\cdot 2}(\mathbf z)\, D(\mathbf z)^{-1/2}, \qquad D(\mathbf z) = \operatorname{diag}\bigl( \Sigma_{11\cdot 2}(\mathbf z) \bigr), \]
which, when Z^{(2)} is scalar, reduces to
(11.9)
\[ \alpha(\mathbf z) = \frac{\rho_{12}(\mathbf z) - \rho_{13}(\mathbf z)\, \rho_{23}(\mathbf z)}{\sqrt{\bigl(1 - \rho_{13}^2(\mathbf z)\bigr)\bigl(1 - \rho_{23}^2(\mathbf z)\bigr)}}. \]
This is easily recognizable as a local version of the standard global partial correlation coefficient. It is of course possible to introduce an LGPC directly on the x-scale, but that representation is in many ways harder to handle both computationally and asymptotically. For a multivariate Gaussian distribution, the LGPC coincides with the ordinary (global) partial correlation. In the remainder of this chapter, we write mainly in terms of the z-representation using the LGPC α(z), but when we write the local partial correlation between X_1 and X_2 given X^{(2)} at the point x, this is simply α(z) with z_i = Φ^{-1}(F_i(x_i)), i = 1, …, p, inserted.
In the more general case where p_1 > 2, Σ_{11·2}(z) is a p_1 × p_1 matrix that describes the non-linear conditional dependence among the variables in Z^{(1)} given Z^{(2)}, which in particular can be used to analyze the conditional dependence between two sets of random variables given a third set of variables. This case poses no conceptual challenges to our approach, but it requires a rather large investment in new notation and does not lead to simple formulas like Eq. (11.9). We will therefore, for the most part, focus on the local Gaussian partial correlation between two stochastic scalar variables in the remainder of this chapter. A complete description of the general multivariate case can be found in the online supplement of Otneim and Tjøstheim (2021) and, somewhat briefly, in Section 11.8.
URL: https://www.sciencedirect.com/science/article/pii/B9780128158616000183
RANDOM VARIABLE THEORY
Daniel T. Gillespie , in Markov Processes, 1992
Random Variable Transformation (RVT) Theorem
If the n random variables X_1, …, X_n have the joint density function P(x_1, …, x_n), and if the m random variables Y_1, …, Y_m are defined by Y_i = f_i(X_1, …, X_n) [i = 1 to m], then the joint density function Q(y_1, …, y_m) of Y_1, …, Y_m is given by
(1.6-4)
\[ Q(y_1, \ldots, y_m) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(x_1, \ldots, x_n)\, \prod_{i=1}^{m} \delta\bigl( y_i - f_i(x_1, \ldots, x_n) \bigr)\, dx_1 \cdots dx_n. \]
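A quick way to see the RVT theorem in action is to integrate out the delta function by Monte Carlo. For the illustrative case n = 2, m = 1 with Y = X₁ + X₂ and X₁, X₂ independent Uniform(0, 1) (an assumed example, not from the text), Eq. (1.6-4) yields the triangular density Q(y) = y on [0, 1] and 2 − y on [1, 2], which a histogram of samples reproduces:

```python
import numpy as np

# Y = X1 + X2 with X1, X2 iid Uniform(0, 1).  Carrying out the delta-
# function integral in (1.6-4) gives the triangular density
# Q(y) = y on [0, 1] and Q(y) = 2 - y on [1, 2].
rng = np.random.default_rng(4)
x1 = rng.uniform(size=1_000_000)
x2 = rng.uniform(size=1_000_000)
y = x1 + x2

hist, edges = np.histogram(y, bins=40, range=(0, 2), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
q_exact = np.where(mids < 1, mids, 2 - mids)
print(np.max(np.abs(hist - q_exact)))
```

Because the exact density is piecewise linear, the bin-averaged histogram heights coincide with the midpoint values up to sampling noise.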
A useful special case of the RVT theorem is the following result:
URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500065
SPECIAL RANDOM VARIABLES
Sheldon M. Ross , in Introduction to Probability and Statistics for Engineers and Scientists (Fourth Edition), 2009
EXAMPLE 5.4e
The random vector X, Y is said to have a uniform distribution over the two-dimensional region R if its joint density function is constant for points in R, and is 0 for points outside of R. That is, if
\[ f(x, y) = \begin{cases} c, & \text{if } (x, y) \in R \\ 0, & \text{otherwise.} \end{cases} \]
Because
\[ 1 = \iint_{R} f(x, y)\, dx\, dy = c \iint_{R} dx\, dy, \]
it follows that
\[ c = \frac{1}{\text{Area}(R)}. \]
For any region A ⊂ R,
\[ P\{(X, Y) \in A\} = \iint_{A} f(x, y)\, dx\, dy = \frac{\text{Area}(A)}{\text{Area}(R)}. \]
Suppose now that X, Y is uniformly distributed over the following rectangular region R:
\[ R = \{(x, y) : 0 \le x \le a,\; 0 \le y \le b\}. \]
Its joint density function is
\[ f(x, y) = \begin{cases} \dfrac{1}{ab}, & \text{if } (x, y) \in R \\ 0, & \text{otherwise,} \end{cases} \]
since Area(R) = ab. In this case, X and Y are independent uniform random variables. To show this, note that for 0 ≤ x ≤ a, 0 ≤ y ≤ b
(5.4.5)
\[ P\{X \le x, Y \le y\} = \frac{xy}{ab}. \]
First letting y = b, and then letting x = a, in the preceding shows that
(5.4.6)
\[ P\{X \le x\} = \frac{x}{a}, \qquad P\{Y \le y\} = \frac{y}{b}. \]
Thus, from Equations 5.4.5 and 5.4.6 we can conclude that X and Y are independent, with X being uniform on (0, a) and Y being uniform on (0, b).
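The factorization argument can be checked by sampling; the rectangle dimensions a, b and the evaluation point below are arbitrary illustrative choices:

```python
import numpy as np

# (X, Y) uniform on [0, a] x [0, b]; the joint cdf should factor as
# P{X <= x} P{Y <= y} = (x/a)(y/b), as in Eqs. (5.4.5)-(5.4.6).
rng = np.random.default_rng(5)
a, b = 2.0, 3.0
X = rng.uniform(0, a, size=200_000)
Y = rng.uniform(0, b, size=200_000)

x0, y0 = 0.8, 2.0                        # arbitrary evaluation point
joint = np.mean((X <= x0) & (Y <= y0))
product = np.mean(X <= x0) * np.mean(Y <= y0)
print(joint, product, (x0 / a) * (y0 / b))
```

Both the empirical joint frequency and the product of marginal frequencies settle at (x₀/a)(y₀/b) = 4/15, consistent with independence.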
URL: https://www.sciencedirect.com/science/article/pii/B9780123704832000102
Multivariate stochastic orders
Félix Belzunce , ... Julio Mulero , in An Introduction to Stochastic Orders, 2016
3.5 The multivariate likelihood ratio order
In a similar way to the univariate case, it is possible to check the multivariate stochastic and hazard rate orders in terms of a property of the joint density functions. This property leads to the definition of the multivariate likelihood ratio order, which is a generalization of the univariate likelihood ratio order. In this section, the results are given in the continuous case but they can be restated for the general case. The main intention of the multivariate likelihood ratio order is to provide a sufficient condition for the multivariate hazard rate order.
Definition 3.5.1
Given two continuous random vectors X = (X_1,…,X_n) and Y = (Y_1,…,Y_n) with joint density functions f and g, respectively, we say that X is smaller than Y in the multivariate likelihood ratio order, denoted by X ≤lr Y, if
\[ f(\mathbf x)\, g(\mathbf y) \le f(\mathbf x \wedge \mathbf y)\, g(\mathbf x \vee \mathbf y) \quad \text{for all } \mathbf x, \mathbf y \in \mathbb{R}^n, \]
where x ∧ y and x ∨ y denote the componentwise minimum and maximum, respectively.
Clearly, this is a generalization of the likelihood ratio order in the univariate case. However, the multivariate likelihood ratio order is not necessarily reflexive. When a random vector X satisfies X ≤lr X, we have the MTP2 property introduced in Section 1.3.
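The defining inequality can be probed numerically. The sketch below checks X ≤lr X (the MTP2 case mentioned above) for a bivariate Gaussian density with positive correlation, evaluated on randomly drawn pairs of points; the normalizing constant is omitted since it appears once on each side and cancels. This is an illustration, not a proof:

```python
import numpy as np

# Unnormalized bivariate Gaussian density with rho = 0.5 > 0; such a
# density is MTP2, so f(u) f(w) <= f(u ∧ w) f(u ∨ w) at every pair.
def density(v, rho=0.5):
    x, y = v
    return np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

rng = np.random.default_rng(6)
ok = True
for _ in range(10_000):
    u = rng.uniform(-3, 3, size=2)
    w = rng.uniform(-3, 3, size=2)
    lo, hi = np.minimum(u, w), np.maximum(u, w)   # u ∧ w and u ∨ w
    lhs = density(u) * density(w)
    rhs = density(lo) * density(hi)
    ok = ok and (lhs <= rhs * (1 + 1e-9))         # tiny float slack
print(ok)
```

With a negative ρ the same check fails at many pairs, which matches the fact that the MTP2 property is a genuine restriction on the density.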
The following result provides a set of sufficient conditions for the multivariate likelihood ratio order.
Theorem 3.5.2
Let X = (X_1,…,X_n) and Y = (Y_1,…,Y_n) be two continuous random vectors with joint density functions f and g, respectively. If X or Y or both are MTP2, and
(3.13)
\[ f(\mathbf y)\, g(\mathbf x) \le f(\mathbf x)\, g(\mathbf y) \quad \text{for all } \mathbf x \le \mathbf y, \]
then
\[ \mathbf X \le_{\mathrm{lr}} \mathbf Y. \]
Proof
Let us assume that X is MTP2 (in the other case the proof is similar); then we have the following inequalities:
\[ f(\mathbf x)\, g(\mathbf y) = \frac{f(\mathbf x)\, f(\mathbf y)}{f(\mathbf y)}\, g(\mathbf y) \le \frac{f(\mathbf x \wedge \mathbf y)\, f(\mathbf x \vee \mathbf y)}{f(\mathbf y)}\, g(\mathbf y) \le f(\mathbf x \wedge \mathbf y)\, g(\mathbf x \vee \mathbf y), \]
for all x, y, where the first inequality follows from the MTP2 property and the second one from (3.13) applied to y ≤ x ∨ y. Therefore, X ≤lr Y.
The multivariate likelihood ratio order is preserved under conditioning on sublattices, as we see next. Recall that a subset A ⊆ ℝⁿ is called a sublattice if x, y ∈ A implies x ∧ y ∈ A and x ∨ y ∈ A. This result will be used to show the relationship between the multivariate likelihood ratio order and the multivariate dynamic hazard rate order. The proof is obvious from the definition.
Theorem 3.5.3
Let X = (X_1,…,X_n) and Y = (Y_1,…,Y_n) be two continuous random vectors, and let A ⊆ ℝⁿ be a sublattice. If X ≤lr Y, then
\[ [\mathbf X \mid \mathbf X \in A] \le_{\mathrm{lr}} [\mathbf Y \mid \mathbf Y \in A]. \]
In particular, from the previous theorem, the multivariate likelihood ratio order is preserved under marginalization. This result is useful because, in some cases, it is easier to provide a result in the multivariate case rather than in the univariate case.
Theorem 3.5.4
Let X = (X_1,…,X_n) and Y = (Y_1,…,Y_n) be two continuous random vectors. If X ≤lr Y, then
\[ \mathbf X_I \le_{\mathrm{lr}} \mathbf Y_I \quad \text{for any } I \subseteq \{1, \ldots, n\}. \]
Next, it is shown that the multivariate likelihood ratio order is stronger than the multivariate hazard rate order.
Theorem 3.5.5
Let X = (X_1,…,X_n) and Y = (Y_1,…,Y_n) be two continuous random vectors. If X ≤lr Y, then X is smaller than Y in the multivariate dynamic hazard rate order.
Proof
Let us check the conditions for the definition of the multivariate dynamic hazard rate order. In particular, denoting by η and λ the multivariate dynamic hazard rates of X and Y, respectively, let us see if
where
and
whenever , 0 ≤x I ≤y I ≤ t e, 0 ≤x J ≤ t e, and for all .
The result will follow by proving that, as we shall see later,
(3.14)
Condition (3.14) will follow if
(3.15)
holds. Let us see that (3.15) follows if X ≤lr Y holds. Denoting by f and g the joint density functions of and , respectively, and by f I,J and g I,J the joint densities of X I,J and Y I,J , respectively, we see that the joint density function of is given by
and the joint density function of is given by
Given y J > t e, we see that x J < t e <y J and, analogously, x I ≤y I . Since X ≤lr Y, given , we see that
which, upon integration, yields
Therefore, from the previous inequality, we see that (3.15) holds. Now, from Theorem 3.5.3, we see that (3.14) also holds. In particular, holds, for all . Now, since (2.25), we see that the hazard rates of and are ordered and, consequently, we see that
Finally, a result for the multivariate likelihood ratio order among random vectors with conditionally independent components is provided. In particular, we consider the same background as that in Theorem 3.3.8.
Theorem 3.5.6
Assume that
(i) X_i(θ) =st Y_i(θ), for all θ and for all i = 1,…,n,
(ii) X_i(θ) ≤lr Y_j(θ′), for all θ ≤ θ′ and for all 1 ≤ i ≤ j ≤ n, and
(iii) θ_1 ≤lr θ_2.
Then, X ≤lr Y.
Proof
Let f i (x|θ) be the density function of X i (θ). From condition (ii), we see that
Furthermore, condition (iii) is equivalent to the fact that h i (θ 1,…,θ n ) is TP2 in (θ 1,…,θ n ,i) ∈ S ×{1,2}. Due to these facts and from Theorem 1.2.3, we see that is TP2 in (x 1,…,x n ,i), and we get the result.
URL: https://www.sciencedirect.com/science/article/pii/B978012803768300003X
Convex Functions, Partial Orderings, and Statistical Applications
In Mathematics in Science and Engineering, 1992
13.6 Some Properties of Log-Concave Density Functions
Log-concave density functions which satisfy (13.19) play an important role in statistics and probability. In the following we observe some known facts concerning this class of densities.
13.24 Fact
Let X_1, …, X_n be i.i.d. univariate random variables with a common density function h(x). If h(x) is a log-concave function of x for x ∈ ℝ, then the joint density function of (X_1, …, X_n), f(x) = h(x_1)h(x_2)⋯h(x_n), is a log-concave function of x for x ∈ ℝⁿ.
13.25 Fact
If f(x) = g(T(x)), where g: ℝ → [0, ∞) is decreasing and log-concave and T(x) is a convex function of x for x ∈ ℝⁿ, then f is a log-concave function of x for x ∈ ℝⁿ.
The following theorem, due to Prékopa (1971) and Brascamp and Lieb (1975), shows that the integral of a log-concave function is log-concave:
13.26 Theorem
Let f(x, y): ℝ^{n+m} → [0, ∞) be a log-concave function of (x, y) for x ∈ ℝⁿ and y ∈ ℝᵐ. Then the function g: ℝⁿ → [0, ∞) given by
(13.27)
\[ g(\mathbf x) = \int_{\mathbb{R}^m} f(\mathbf x, \mathbf y)\, d\mathbf y \]
is log-concave.
Proof
We adopt the proof in Brascamp and Lieb (1975). First note that it suffices to prove the theorem for m = n = 1 because the general case follows by Fubini's theorem and induction. Let x_1, x_2 be two points in ℝ such that g(x_1)g(x_2) ≠ 0. For convenience we may assume that
\[ \sup_{y} f(x_1, y) = \sup_{y} f(x_2, y), \]
because otherwise we can replace f(x, y) by e^{bx} f(x, y) for suitably chosen b and the problem remains unchanged. For each fixed λ > 0, denote
\[ C_1(\lambda) = \{(x, y) : f(x, y) \ge \lambda\}, \qquad C_2(x, \lambda) = \{y : (x, y) \in C_1(\lambda)\}. \]
Then, by log-concavity of f, C_1(λ) is convex and C_2(x, λ) is an interval. (For the convexity of C_1(λ), see Fact 13.28.) Letting v(x, λ) = ∫_{C_2(x, λ)} dy be the Lebesgue measure of the set C_2(x, λ), we have, by Theorem 13.18,
Since g(x) can be expressed as g(x) = ∫_0^∞ v(x, λ) dλ, we have
\[ g\bigl( \alpha x_1 + (1 - \alpha) x_2 \bigr) \ge [g(x_1)]^{\alpha}\, [g(x_2)]^{1 - \alpha} \]
for all α ∈ [0, 1], where the second inequality follows from the arithmetic mean–geometric mean inequality.
A simple application of Theorem 13.26 is (Brascamp and Lieb, 1975; see also Barlow and Proschan, 1981, p. 104):
13.27 Corollary
The convolution of two log-concave density functions in ℝ n is log-concave.
Proof
Let f_1, f_2 be log-concave density functions. Then f_1(x − y) f_2(y) is jointly log-concave in (x, y) ∈ ℝ^{2n}. Thus by Theorem 13.26
\[ (f_1 * f_2)(\mathbf x) = \int_{\mathbb{R}^n} f_1(\mathbf x - \mathbf y)\, f_2(\mathbf y)\, d\mathbf y \]
is log-concave.
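Corollary 13.27 can be illustrated discretely: convolving a Laplace density with a Gaussian density on a grid and checking that the logarithm of the result has non-positive second differences, away from the truncated edges of the grid (the particular densities and grid are illustrative choices):

```python
import numpy as np

# Laplace and Gaussian densities on a grid; their discrete convolution
# approximates the true convolution, which Corollary 13.27 says is
# log-concave, i.e. log g has non-positive second differences.
xs = np.linspace(-10, 10, 1001)
h = xs[1] - xs[0]
f1 = 0.5 * np.exp(-np.abs(xs))                  # Laplace, log-concave
f2 = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)    # Gaussian, log-concave
g = np.convolve(f1, f2, mode="same") * h        # midpoint-rule convolution

logg = np.log(g)
second_diff = logg[:-2] - 2 * logg[1:-1] + logg[2:]
core = second_diff[300:700]     # roughly |x| <= 4, away from grid edges
print(core.max(), abs(g.sum() * h - 1))
```

The check is restricted to the central part of the grid because the finite domain truncates the convolution near the boundary, where the discretization no longer tracks the true (log-concave) function.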
A density function f is said to be unimodal if the set
(13.28)
\[ D_\lambda = \{\mathbf x : f(\mathbf x) \ge \lambda\} \]
is a convex set in ℝⁿ for all λ > 0. The following facts show how log-concavity and unimodality are related.
is a convex set in ℝ n for all λ > 0. The following facts show how log-concavity and unimodality are related.
13.28 Fact
If f: ℝⁿ → [0, ∞) is a probability density function, then log-concavity of f implies unimodality of f.
Proof
Let x_1, x_2 ∈ ℝⁿ be in D_λ. Then for every α ∈ [0, 1], we have, by (13.20),
(13.29)
\[ f\bigl( \alpha \mathbf x_1 + (1 - \alpha) \mathbf x_2 \bigr) \ge [f(\mathbf x_1)]^{\alpha}\, [f(\mathbf x_2)]^{1 - \alpha} \ge \lambda^{\alpha}\, \lambda^{1 - \alpha} = \lambda. \]
Thus αx_1 + (1 − α)x_2 is also in D_λ.
A function f is said to be Schur-concave if y ≻ x implies f(x) ≥ f(y) for all x, y ∈ ℝ n (see Definition 12.23). It is known that all Schur-concave functions are permutation invariant (see Theorem 12.24). Furthermore, it is known that
13.29 Fact
If f: ℝ n → [0, ∞) is a permutation-invariant and log-concave function of x ∈ ℝ n , then it is a Schur-concave function of x ∈ ℝ n .
Proof
Assume y ≻ x and, without loss of generality, it may be assumed that x, y are of the form
\[ \mathbf x = (x_1, x_2, x_3, \ldots, x_n), \qquad \mathbf y = (y_1, y_2, x_3, \ldots, x_n), \]
where y_2 < x_2 ≤ x_1 < y_1 and x_1 + x_2 = y_1 + y_2. Let y* = (y_2, y_1, x_3, …, x_n). Then there exists an α ∈ (0, 1) such that x = αy + (1 − α)y*. Thus by the permutation-invariance and log-concavity properties of f, we have
\[ f(\mathbf x) \ge [f(\mathbf y)]^{\alpha}\, [f(\mathbf y^*)]^{1 - \alpha} = f(\mathbf y). \]
URL: https://www.sciencedirect.com/science/article/pii/S0076539208628258
Preliminaries
Félix Belzunce , ... Julio Mulero , in An Introduction to Stochastic Orders, 2016
1.3.5 Parametric families of multivariate distributions
In this section, some known multivariate distributions are recalled. These models will be compared in Chapter 3 in some multivariate stochastic orders.
It is worth mentioning that the mean of a random vector X = (X_1,…,X_n) is given by
\[ E[\mathbf X] = \bigl( E[X_1], \ldots, E[X_n] \bigr), \]
and the covariance matrix of X is
\[ \Sigma = \bigl( \operatorname{Cov}(X_i, X_j) \bigr)_{i, j = 1}^{n}. \]
Next, we consider some multivariate distributions. First, the definition of the multivariate normal distribution is recalled.
Definition 1.3.2
Given a random vector X = (X_1,…,X_n), it is said that X follows a multivariate normal distribution with mean vector μ and covariance matrix Σ, denoted by X ∼ N_n(μ, Σ), if its joint density function is given by
\[ f(\mathbf x) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\Bigl( -\tfrac{1}{2} (\mathbf x - \boldsymbol\mu)^{T} \Sigma^{-1} (\mathbf x - \boldsymbol\mu) \Bigr), \qquad \mathbf x \in \mathbb{R}^n. \]
The marginal distribution functions follow univariate normal models. Furthermore, the copula of a N_n(μ, Σ) is the same as that of N_n(0, P), where P is the correlation matrix obtained from the covariance matrix Σ. In this sense, all multivariate normal distributions with the same dimension and correlation matrix have the same (Gaussian) copula.
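Definition 1.3.2 translates directly into code. The sketch below evaluates the N_n(μ, Σ) density from the formula and compares it with scipy's implementation; the particular μ, Σ, and evaluation point are arbitrary illustrative choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary mean vector, covariance matrix, and evaluation point.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, 0.5])

# Density formula from Definition 1.3.2.
d = x - mu
quad = d @ np.linalg.solve(Sigma, d)          # (x - mu)^T Sigma^{-1} (x - mu)
val = np.exp(-0.5 * quad) / ((2 * np.pi) ** (len(x) / 2)
                             * np.sqrt(np.linalg.det(Sigma)))

ref = multivariate_normal(mu, Sigma).pdf(x)
print(val, ref)
```

Using `np.linalg.solve` rather than forming Σ⁻¹ explicitly is the standard numerically stable way to evaluate the quadratic form.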
Another well-known multivariate family is the elliptically contoured model. Let us consider the formal definition.
Definition 1.3.3
Given a random vector X = (X_1,…,X_n), it is said that X follows an elliptically contoured distribution, denoted by X ∼ E_n(μ, Σ, g), if its joint density function is given by
(1.17)
\[ f(\mathbf x) = |\Sigma|^{-1/2}\, g\bigl( (\mathbf x - \boldsymbol\mu)^{T} \Sigma^{-1} (\mathbf x - \boldsymbol\mu) \bigr), \]
where μ is the median vector (which is also the mean vector if the latter exists), Σ is a symmetric positive definite matrix which is proportional to the covariance matrix, if the latter exists, and g is a non-negative function such that f integrates to one. A particular case of elliptically contoured distributions is the multivariate normal, obtained by taking
\[ g(t) = (2\pi)^{-n/2}\, e^{-t/2}. \]
Some other particular cases are described in Ref. [52].
URL: https://www.sciencedirect.com/science/article/pii/B9780128037683000016
GENERAL FEATURES OF A MARKOV PROCESS
Daniel T. Gillespie , in Markov Processes, 1992
2.1 The Markov State Density Function
We consider a time-evolving or 'dynamical' system whose possible states can be represented by points on the real axis, and we let
(2.1-1)
\[ X(t) \equiv \text{the state of the system at time } t. \]
We shall assume that the value of X at some initial time t_0 is fixed,
(2.1-2)
\[ X(t_0) = x_0, \]
but that X(t) for any t > t_0 can be predicted only probabilistically; more specifically, we assume that X(t) for any given t > t_0 is a random variable, as defined in Section 1.2. Since it makes sense to inquire about the state of the system at successive instants t_1, t_2, …, t_n, where t_0 < t_1 < t_2 < … < t_n, we can ascribe to the corresponding n random variables X(t_1), X(t_2), …, X(t_n) a joint density function P_n^{(1)}, which is defined as follows:
(2.1-3)
\[ P_n^{(1)}(x_1, t_1; \ldots; x_n, t_n \mid x_0, t_0)\, dx_1 \cdots dx_n \equiv \text{probability that } X(t_i) \in [x_i, x_i + dx_i) \text{ for each } i = 1, \ldots, n, \text{ given that } X(t_0) = x_0. \]
If all these assumptions are satisfied, then we say that X(t) is a stochastic process.
It is evident that a stochastic process X(t) has infinitely many joint density functions P_n^{(1)}, corresponding to n = 1, 2, … . And associated with each of these joint density functions is a plethora of subordinate density functions; for example,
\[ P_{n-j}^{(j+1)}(x_{j+1}, t_{j+1}; \ldots; x_n, t_n \mid x_j, t_j; \ldots; x_1, t_1; x_0, t_0) \]
is defined to be the joint density function of the n − j random variables X(t_{j+1}), …, X(t_n), given the j + 1 conditions X(t_0) = x_0, X(t_1) = x_1, …, X(t_j) = x_j. Notice that the subscript on the density function P_k^{(j)} refers to the number of (x, t) pairs to the left of the 'given' bar, while the superscript refers to the number of (x, t) pairs to the right of the 'given' bar; thus, P_k^{(j)} is a k-variate joint density function with j conditionings.
It is always possible to calculate the function P_{n-1}^{(1)} from the function P_n^{(1)} by simply integrating the latter over any one of the variables x_1, …, x_n. However, it is not in general possible to deduce the function P_{n+1}^{(1)} from the function P_n^{(1)}. This 'open-ended' nature of the density functions for a general stochastic process usually makes any substantive analysis extremely difficult. But we shall be concerned here with only a very restricted subclass of stochastic processes, namely those that have the 'past-forgetting' property that, for all j ≥ 2 and t_{j-1} ≤ t_j,
(2.1-4)
\[ P_1^{(j)}(x_j, t_j \mid x_{j-1}, t_{j-1}; \ldots; x_1, t_1; x_0, t_0) = P_1^{(1)}(x_j, t_j \mid x_{j-1}, t_{j-1}). \]
This is called the Markov property, and it says that only the most recent conditioning matters: Given that X(t') = x', then our ability to predict X(t) for any t > t' will not be enhanced by a knowledge of any values of the process earlier than t'. Any stochastic process X(t) that has this past-forgetting property is called a Markovian stochastic process, or more simply, a Markov process. In what follows it may always be assumed, unless explicitly stated otherwise, that the stochastic process X(t) under consideration is a Markov process.
The Markov property (2.1-4) breaks the open-endedness of the hierarchy of joint state density functions in a dramatic way. For the joint density function P_2^{(1)} we have
\[ P_2^{(1)}(x_2, t_2; x_1, t_1 \mid x_0, t_0) = P_1^{(2)}(x_2, t_2 \mid x_1, t_1; x_0, t_0)\, P_1^{(1)}(x_1, t_1 \mid x_0, t_0). \]
Hence, writing P_1^{(1)} ≡ P in accordance with the notation suggested in Eq. (2.1-4), we have
(2.1-5)
\[ P_2^{(1)}(x_2, t_2; x_1, t_1 \mid x_0, t_0) = P(x_2, t_2 \mid x_1, t_1)\, P(x_1, t_1 \mid x_0, t_0). \]
The same kind of reasoning shows that
\[ P_3^{(1)}(x_3, t_3; x_2, t_2; x_1, t_1 \mid x_0, t_0) = P(x_3, t_3 \mid x_2, t_2)\, P(x_2, t_2 \mid x_1, t_1)\, P(x_1, t_1 \mid x_0, t_0), \]
and more generally, for any set of times t_0 < t_1 < ⋯ < t_n,
(2.1-6)
\[ P_n^{(1)}(x_n, t_n; \ldots; x_1, t_1 \mid x_0, t_0) = \prod_{i=1}^{n} P(x_i, t_i \mid x_{i-1}, t_{i-1}). \]
So for a Markov process, every conditioned state density function can be written solely in terms of the particular conditioned state density function P(x, t | x′, t′). The function P(x, t | x′, t′) thus becomes the principal focus of our study, and we shall henceforth refer to it as the Markov state density function. For future reference, the formal definition of the Markov state density function is [cf. Eq. (2.1-3)]
(2.1-7)
\[ P(x, t \mid x', t')\, dx \equiv \text{probability that } X(t) \in [x, x + dx), \text{ given that } X(t') = x'. \]
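The factorization (2.1-6) has a discrete-state analogue that is easy to test: for a two-state Markov chain (the transition matrix below is a hypothetical example), the probability of a sample path equals the product of one-step conditional probabilities:

```python
import numpy as np

# Hypothetical one-step transition probabilities of a two-state chain.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
rng = np.random.default_rng(7)

def simulate(n_paths, steps=3, x0=0):
    """Simulate n_paths independent trajectories of length steps + 1."""
    paths = np.empty((n_paths, steps + 1), dtype=int)
    paths[:, 0] = x0
    for t in range(steps):
        u = rng.uniform(size=n_paths)
        # next state is 1 whenever u exceeds P(current state -> 0)
        paths[:, t + 1] = (u > P[paths[:, t], 0]).astype(int)
    return paths

paths = simulate(500_000)
target = np.array([0, 1, 1, 0])
empirical = np.mean(np.all(paths == target, axis=1))
factorized = P[0, 1] * P[1, 1] * P[1, 0]   # product of one-step conditionals
print(empirical, factorized)
```

The empirical frequency of the path 0 → 1 → 1 → 0 converges to the product of the one-step conditionals, which is exactly the content of the Markov factorization in this discrete setting.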
URL: https://www.sciencedirect.com/science/article/pii/B9780080918372500077
Source: https://www.sciencedirect.com/topics/mathematics/joint-density-function