cosmoslik_plugins.samplers package¶
Submodules¶
-
class
cosmoslik_plugins.samplers.emcee.
emcee
(params, num_samples, nwalkers=100, cov_est=None, output_extra_params=None, output_file=None, output_freq=10)¶ Bases:
cosmoslik.cosmoslik.SlikSampler
-
__init__
(params, num_samples, nwalkers=100, cov_est=None, output_extra_params=None, output_file=None, output_freq=10)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
cosmoslik_plugins.samplers.emcee.
multivariate_normal
(mean, cov[, size, check_valid, tol])¶ Draw random samples from a multivariate normal distribution.
The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.
Parameters: - mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
- cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
- size (int or tuple of ints, optional) – Given a shape of, for example,
(m,n,k)
,m*n*k
samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is(m,n,k,N)
. If no shape is specified, a single (N-D) sample is returned. - check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
- tol (float, optional) – Tolerance when checking the singular values in covariance matrix.
Returns: out – The drawn samples, of shape size, if that was provided. If not, the shape is
(N,)
.In other words, each entry
out[i,j,...,:]
is an N-dimensional value drawn from the distribution.Return type: ndarray
Notes
The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.
Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).
Instead of specifying the full covariance matrix, popular approximations include:
- Spherical covariance (cov is a multiple of the identity matrix)
- Diagonal covariance (cov has non-negative elements, and only on the diagonal)
This geometrical property can be seen in two dimensions by plotting generated data-points:
>>> mean = [0, 0] >>> cov = [[1, 0], [0, 100]] # diagonal covariance
Diagonal covariance means that points are oriented along x or y-axis:
>>> import matplotlib.pyplot as plt >>> x, y = np.random.multivariate_normal(mean, cov, 5000).T >>> plt.plot(x, y, 'x') >>> plt.axis('equal') >>> plt.show()
Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
References
[1] Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991. [2] Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001. Examples
>>> mean = (1, 2) >>> cov = [[1, 0], [0, 1]] >>> x = np.random.multivariate_normal(mean, cov, (3, 3)) >>> x.shape (3, 3, 2)
The following is probably true, given that 0.6 is roughly twice the standard deviation:
>>> list((x[0,0,:] - mean) < 0.6) [True, True]
-
class
cosmoslik_plugins.samplers.metropolis_hastings.
metropolis_hastings
(params, output_file=None, output_extra_params=None, num_samples=100, print_level=0, cov_est=None, proposal_scale=2.4, proposal_update=True, proposal_update_start=1000, mpi_comm_freq=100, max_weight=10, debug_output=False, temp=1, reseed=True, yield_rejected=False)¶ Bases:
cosmoslik.cosmoslik.SlikSampler
An adaptive metropolis hastings sampler.
To run with MPI, call your script with:
cosmoslik -n <nchains+1> script.py(Note one process is a “master” so run one more process than you want chains)
-
__init__
(params, output_file=None, output_extra_params=None, num_samples=100, print_level=0, cov_est=None, proposal_scale=2.4, proposal_update=True, proposal_update_start=1000, mpi_comm_freq=100, max_weight=10, debug_output=False, temp=1, reseed=True, yield_rejected=False)¶ Parameters: - params – The script to which this sampler is attached
- output_file – File where to save the chain (if running with MPI, everything still gets dumped into one file). By default only sampled parameters get saved. Use cosmoslik.utils.load_chain to load chains.
- output_extra_params – Extra parameters besides the sampled ones which to save to file. Arbitrary objects can be outputted, in which case entires should be tuples of (<name>,’object’), or for more efficient and faster file write/reads (<name>,<dtype>) where <dtype> is a valid numpy dtype (e.g. ‘(10,10)d’ for a 10x10 array of doubles, etc…)
- num_samples – The number of total desired samples (including rejected ones)
- print_level – 0/1/2 to print little/medium/alot
- cov_est – One or a list of covariances which will be combined with K.chains.combine_cov (see documentation there for understood formats) to produce the full proposal covariance. Covariance for any sampled parameter not provided here will be taken from the scale attribute of that parameters. This should be a best estimate of the posterior covariance. The actual proposal covariance is multiplied by proposal_scale**2 / N where N is the number of parameters. (default: diagonal covariance taken from the scale of each parameter)
- proposal_scale – Scale the proposal matrix. (default: 2.4)
- proposal_update – Whether to update the proposal matrix. Ignored if not running with MPI. The proposal is updated by taking the sample covariance of the last half of each chain. (default: True)
- proposal_update_start – If proposal_update is True, how many total samples (including rejected) per chain to wait before starting to do updates (default: 1000).
- mpi_comm_freq – Number of accepted samples to wait inbetween the chains communicating with the master process and having their progress written to file (default: 50)
- max_weight – If a the chain stays in the same location more than this number of samples, it is broken up as distinct steps
- reseed – Draw a random seed based on system time and process number before starting. (default: True)
- yield_rejected – Yield samples with 0 weight (default: False)
- debug_output – Print (code) debugging messages.
-
sample
(lnl)¶ Returns a generator which yields samples from the likelihood using the Metropolis-Hastings algorithm.
- The samples returned are tuples of (x,weight,lnl,extra)
- lnl - likelihood weight - the statistical weight (could be 0 for rejected steps) x - the vector of parameter values extra - the extra information returned by lnl
-
-
cosmoslik_plugins.samplers.metropolis_hastings.
multivariate_normal
(mean, cov[, size, check_valid, tol])¶ Draw random samples from a multivariate normal distribution.
The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.
Parameters: - mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
- cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
- size (int or tuple of ints, optional) – Given a shape of, for example,
(m,n,k)
,m*n*k
samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is(m,n,k,N)
. If no shape is specified, a single (N-D) sample is returned. - check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
- tol (float, optional) – Tolerance when checking the singular values in covariance matrix.
Returns: out – The drawn samples, of shape size, if that was provided. If not, the shape is
(N,)
.In other words, each entry
out[i,j,...,:]
is an N-dimensional value drawn from the distribution.Return type: ndarray
Notes
The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.
Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).
Instead of specifying the full covariance matrix, popular approximations include:
- Spherical covariance (cov is a multiple of the identity matrix)
- Diagonal covariance (cov has non-negative elements, and only on the diagonal)
This geometrical property can be seen in two dimensions by plotting generated data-points:
>>> mean = [0, 0] >>> cov = [[1, 0], [0, 100]] # diagonal covariance
Diagonal covariance means that points are oriented along x or y-axis:
>>> import matplotlib.pyplot as plt >>> x, y = np.random.multivariate_normal(mean, cov, 5000).T >>> plt.plot(x, y, 'x') >>> plt.axis('equal') >>> plt.show()
Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.
References
[1] Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991. [2] Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001. Examples
>>> mean = (1, 2) >>> cov = [[1, 0], [0, 1]] >>> x = np.random.multivariate_normal(mean, cov, (3, 3)) >>> x.shape (3, 3, 2)
The following is probably true, given that 0.6 is roughly twice the standard deviation:
>>> list((x[0,0,:] - mean) < 0.6) [True, True]
-
cosmoslik_plugins.samplers.metropolis_hastings.
seed
(seed=None)¶ Seed the generator.
This method is called when RandomState is initialized. It can be called again to re-seed the generator. For details, see RandomState.
Parameters: seed (int or 1-d array_like, optional) – Seed for RandomState. Must be convertible to 32 bit unsigned integers. See also
RandomState()
-
cosmoslik_plugins.samplers.metropolis_hastings.
uniform
(low=0.0, high=1.0, size=None)¶ Draw samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low, high)
(includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.Parameters: - low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
- high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g.,
(m, n, k)
, thenm * n * k
samples are drawn. If size isNone
(default), a single value is returned iflow
andhigh
are both scalars. Otherwise,np.broadcast(low, high).size
samples are drawn.
Returns: out – Drawn samples from the parameterized uniform distribution.
Return type: ndarray or scalar
See also
randint()
- Discrete uniform distribution, yielding integers.
random_integers()
- Discrete uniform distribution over the closed interval
[low, high]
. random_sample()
- Floats uniformly distributed over
[0, 1)
. random()
- Alias for random_sample.
rand()
- Convenience function that accepts dimensions as input, e.g.,
rand(2,2)
would generate a 2-by-2 array of floats, uniformly distributed over[0, 1)
.
Notes
The probability density function of the uniform distribution is
\[p(x) = \frac{1}{b - a}\]anywhere within the interval
[a, b)
, and zero elsewhere.When
high
==low
, values oflow
will be returned. Ifhigh
<low
, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition.Examples
Draw samples from the distribution:
>>> s = np.random.uniform(-1,0,1000)
All values are within the given interval:
>>> np.all(s >= -1) True >>> np.all(s < 0) True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt >>> count, bins, ignored = plt.hist(s, 15, normed=True) >>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r') >>> plt.show()
-
class
cosmoslik_plugins.samplers.priors.
priors
(params, output_extra_params=None, **kwargs)¶ Bases:
cosmoslik_plugins.samplers.metropolis_hastings.metropolis_hastings
Sample from the prior.
-
__init__
(params, output_extra_params=None, **kwargs)¶ Args: params:
The script to which this sampler is attached- output_file:
- File where to save the chain (if running with MPI, everything still gets dumped into one file). By default only sampled parameters get saved. Use cosmoslik.utils.load_chain to load chains.
- output_extra_params:
- Extra parameters besides the sampled ones which to save to file. Arbitrary objects can be outputted, in which case entires should be tuples of (<name>,’object’), or for more efficient and faster file write/reads (<name>,<dtype>) where <dtype> is a valid numpy dtype (e.g. ‘(10,10)d’ for a 10x10 array of doubles, etc…)
- num_samples:
- The number of total desired samples (including rejected ones)
- print_level:
- 0/1/2 to print little/medium/alot
- cov_est:
- One or a list of covariances which will be combined with K.chains.combine_cov (see documentation there for understood formats) to produce the full proposal covariance. Covariance for any sampled parameter not provided here will be taken from the scale attribute of that parameters. This should be a best estimate of the posterior covariance. The actual proposal covariance is multiplied by proposal_scale**2 / N where N is the number of parameters. (default: diagonal covariance taken from the scale of each parameter)
- proposal_scale:
- Scale the proposal matrix. (default: 2.4)
- proposal_update:
- Whether to update the proposal matrix. Ignored if not running with MPI. The proposal is updated by taking the sample covariance of the last half of each chain. (default: True)
- proposal_update_start:
- If proposal_update is True, how many total samples (including rejected) per chain to wait before starting to do updates (default: 1000).
- mpi_comm_freq:
- Number of accepted samples to wait inbetween the chains communicating with the master process and having their progress written to file (default: 50)
- max_weight:
- If a the chain stays in the same location more than this number of samples, it is broken up as distinct steps
- reseed:
- Draw a random seed based on system time and process number before starting. (default: True)
- yield_rejected:
- Yield samples with 0 weight (default: False)
- debug_output:
- Print (code) debugging messages.
-
-
cosmoslik_plugins.samplers.priors.
uniform
(low=0.0, high=1.0, size=None)¶ Draw samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low, high)
(includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.Parameters: - low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
- high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g.,
(m, n, k)
, thenm * n * k
samples are drawn. If size isNone
(default), a single value is returned iflow
andhigh
are both scalars. Otherwise,np.broadcast(low, high).size
samples are drawn.
Returns: out – Drawn samples from the parameterized uniform distribution.
Return type: ndarray or scalar
See also
randint()
- Discrete uniform distribution, yielding integers.
random_integers()
- Discrete uniform distribution over the closed interval
[low, high]
. random_sample()
- Floats uniformly distributed over
[0, 1)
. random()
- Alias for random_sample.
rand()
- Convenience function that accepts dimensions as input, e.g.,
rand(2,2)
would generate a 2-by-2 array of floats, uniformly distributed over[0, 1)
.
Notes
The probability density function of the uniform distribution is
\[p(x) = \frac{1}{b - a}\]anywhere within the interval
[a, b)
, and zero elsewhere.When
high
==low
, values oflow
will be returned. Ifhigh
<low
, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition.Examples
Draw samples from the distribution:
>>> s = np.random.uniform(-1,0,1000)
All values are within the given interval:
>>> np.all(s >= -1) True >>> np.all(s < 0) True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt >>> count, bins, ignored = plt.hist(s, 15, normed=True) >>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r') >>> plt.show()
-
cosmoslik_plugins.samplers.utils.
initialize_covariance
(sampled, covs=None)¶ Get the covariance for the set of parameters in sampled
Prepare the proposal covariance based on anything passed to self.proposal_cov, defaulting to the scale of each sampled parameter otherwise.