cosmoslik_plugins.samplers package

Submodules

class cosmoslik_plugins.samplers.emcee.emcee(params, num_samples, nwalkers=100, cov_est=None, output_extra_params=None, output_file=None, output_freq=10)

Bases: cosmoslik.cosmoslik.SlikSampler

__init__(params, num_samples, nwalkers=100, cov_est=None, output_extra_params=None, output_file=None, output_freq=10)

Initialize self. See help(type(self)) for accurate signature.

cosmoslik_plugins.samplers.emcee.multivariate_normal(mean, cov[, size, check_valid, tol])

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix.
Returns:

out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)
  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along x or y-axis:

>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

References

[1]Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2]Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

The following is probably true, given that 0.6 is roughly twice the standard deviation:

>>> list((x[0,0,:] - mean) < 0.6)
[True, True]
class cosmoslik_plugins.samplers.metropolis_hastings.metropolis_hastings(params, output_file=None, output_extra_params=None, num_samples=100, print_level=0, cov_est=None, proposal_scale=2.4, proposal_update=True, proposal_update_start=1000, mpi_comm_freq=100, max_weight=10, debug_output=False, temp=1, reseed=True, yield_rejected=False)

Bases: cosmoslik.cosmoslik.SlikSampler

An adaptive metropolis hastings sampler.

To run with MPI, call your script with:

cosmoslik -n <nchains+1> script.py

(Note one process is a “master” so run one more process than you want chains)

__init__(params, output_file=None, output_extra_params=None, num_samples=100, print_level=0, cov_est=None, proposal_scale=2.4, proposal_update=True, proposal_update_start=1000, mpi_comm_freq=100, max_weight=10, debug_output=False, temp=1, reseed=True, yield_rejected=False)
Parameters:
  • params – The script to which this sampler is attached
  • output_file – File where to save the chain (if running with MPI, everything still gets dumped into one file). By default only sampled parameters get saved. Use cosmoslik.utils.load_chain to load chains.
  • output_extra_params – Extra parameters besides the sampled ones which to save to file. Arbitrary objects can be outputted, in which case entires should be tuples of (<name>,’object’), or for more efficient and faster file write/reads (<name>,<dtype>) where <dtype> is a valid numpy dtype (e.g. ‘(10,10)d’ for a 10x10 array of doubles, etc…)
  • num_samples – The number of total desired samples (including rejected ones)
  • print_level – 0/1/2 to print little/medium/alot
  • cov_est – One or a list of covariances which will be combined with K.chains.combine_cov (see documentation there for understood formats) to produce the full proposal covariance. Covariance for any sampled parameter not provided here will be taken from the scale attribute of that parameters. This should be a best estimate of the posterior covariance. The actual proposal covariance is multiplied by proposal_scale**2 / N where N is the number of parameters. (default: diagonal covariance taken from the scale of each parameter)
  • proposal_scale – Scale the proposal matrix. (default: 2.4)
  • proposal_update – Whether to update the proposal matrix. Ignored if not running with MPI. The proposal is updated by taking the sample covariance of the last half of each chain. (default: True)
  • proposal_update_start – If proposal_update is True, how many total samples (including rejected) per chain to wait before starting to do updates (default: 1000).
  • mpi_comm_freq – Number of accepted samples to wait inbetween the chains communicating with the master process and having their progress written to file (default: 50)
  • max_weight – If a the chain stays in the same location more than this number of samples, it is broken up as distinct steps
  • reseed – Draw a random seed based on system time and process number before starting. (default: True)
  • yield_rejected – Yield samples with 0 weight (default: False)
  • debug_output – Print (code) debugging messages.
sample(lnl)

Returns a generator which yields samples from the likelihood using the Metropolis-Hastings algorithm.

The samples returned are tuples of (x,weight,lnl,extra)
lnl - likelihood weight - the statistical weight (could be 0 for rejected steps) x - the vector of parameter values extra - the extra information returned by lnl
cosmoslik_plugins.samplers.metropolis_hastings.multivariate_normal(mean, cov[, size, check_valid, tol])

Draw random samples from a multivariate normal distribution.

The multivariate normal, multinormal or Gaussian distribution is a generalization of the one-dimensional normal distribution to higher dimensions. Such a distribution is specified by its mean and covariance matrix. These parameters are analogous to the mean (average or “center”) and variance (standard deviation, or “width,” squared) of the one-dimensional normal distribution.

Parameters:
  • mean (1-D array_like, of length N) – Mean of the N-dimensional distribution.
  • cov (2-D array_like, of shape (N, N)) – Covariance matrix of the distribution. It must be symmetric and positive-semidefinite for proper sampling.
  • size (int or tuple of ints, optional) – Given a shape of, for example, (m,n,k), m*n*k samples are generated, and packed in an m-by-n-by-k arrangement. Because each sample is N-dimensional, the output shape is (m,n,k,N). If no shape is specified, a single (N-D) sample is returned.
  • check_valid ({ 'warn', 'raise', 'ignore' }, optional) – Behavior when the covariance matrix is not positive semidefinite.
  • tol (float, optional) – Tolerance when checking the singular values in covariance matrix.
Returns:

out – The drawn samples, of shape size, if that was provided. If not, the shape is (N,).

In other words, each entry out[i,j,...,:] is an N-dimensional value drawn from the distribution.

Return type:

ndarray

Notes

The mean is a coordinate in N-dimensional space, which represents the location where samples are most likely to be generated. This is analogous to the peak of the bell curve for the one-dimensional or univariate normal distribution.

Covariance indicates the level to which two variables vary together. From the multivariate normal distribution, we draw N-dimensional samples, \(X = [x_1, x_2, ... x_N]\). The covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\) (i.e. its “spread”).

Instead of specifying the full covariance matrix, popular approximations include:

  • Spherical covariance (cov is a multiple of the identity matrix)
  • Diagonal covariance (cov has non-negative elements, and only on the diagonal)

This geometrical property can be seen in two dimensions by plotting generated data-points:

>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]]  # diagonal covariance

Diagonal covariance means that points are oriented along x or y-axis:

>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x')
>>> plt.axis('equal')
>>> plt.show()

Note that the covariance matrix must be positive semidefinite (a.k.a. nonnegative-definite). Otherwise, the behavior of this method is undefined and backwards compatibility is not guaranteed.

References

[1]Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
[2]Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.

Examples

>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = np.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)

The following is probably true, given that 0.6 is roughly twice the standard deviation:

>>> list((x[0,0,:] - mean) < 0.6)
[True, True]
cosmoslik_plugins.samplers.metropolis_hastings.seed(seed=None)

Seed the generator.

This method is called when RandomState is initialized. It can be called again to re-seed the generator. For details, see RandomState.

Parameters:seed (int or 1-d array_like, optional) – Seed for RandomState. Must be convertible to 32 bit unsigned integers.

See also

RandomState()

cosmoslik_plugins.samplers.metropolis_hastings.uniform(low=0.0, high=1.0, size=None)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Parameters:
  • low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
  • high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0.
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.
Returns:

out – Drawn samples from the parameterized uniform distribution.

Return type:

ndarray or scalar

See also

randint()
Discrete uniform distribution, yielding integers.
random_integers()
Discrete uniform distribution over the closed interval [low, high].
random_sample()
Floats uniformly distributed over [0, 1).
random()
Alias for random_sample.
rand()
Convenience function that accepts dimensions as input, e.g., rand(2,2) would generate a 2-by-2 array of floats, uniformly distributed over [0, 1).

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), and zero elsewhere.

When high == low, values of low will be returned. If high < low, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition.

Examples

Draw samples from the distribution:

>>> s = np.random.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 15, normed=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()
class cosmoslik_plugins.samplers.priors.priors(params, output_extra_params=None, **kwargs)

Bases: cosmoslik_plugins.samplers.metropolis_hastings.metropolis_hastings

Sample from the prior.

__init__(params, output_extra_params=None, **kwargs)

Args: params:

The script to which this sampler is attached
output_file:
File where to save the chain (if running with MPI, everything still gets dumped into one file). By default only sampled parameters get saved. Use cosmoslik.utils.load_chain to load chains.
output_extra_params:
Extra parameters besides the sampled ones which to save to file. Arbitrary objects can be outputted, in which case entires should be tuples of (<name>,’object’), or for more efficient and faster file write/reads (<name>,<dtype>) where <dtype> is a valid numpy dtype (e.g. ‘(10,10)d’ for a 10x10 array of doubles, etc…)
num_samples:
The number of total desired samples (including rejected ones)
print_level:
0/1/2 to print little/medium/alot
cov_est:
One or a list of covariances which will be combined with K.chains.combine_cov (see documentation there for understood formats) to produce the full proposal covariance. Covariance for any sampled parameter not provided here will be taken from the scale attribute of that parameters. This should be a best estimate of the posterior covariance. The actual proposal covariance is multiplied by proposal_scale**2 / N where N is the number of parameters. (default: diagonal covariance taken from the scale of each parameter)
proposal_scale:
Scale the proposal matrix. (default: 2.4)
proposal_update:
Whether to update the proposal matrix. Ignored if not running with MPI. The proposal is updated by taking the sample covariance of the last half of each chain. (default: True)
proposal_update_start:
If proposal_update is True, how many total samples (including rejected) per chain to wait before starting to do updates (default: 1000).
mpi_comm_freq:
Number of accepted samples to wait inbetween the chains communicating with the master process and having their progress written to file (default: 50)
max_weight:
If a the chain stays in the same location more than this number of samples, it is broken up as distinct steps
reseed:
Draw a random seed based on system time and process number before starting. (default: True)
yield_rejected:
Yield samples with 0 weight (default: False)
debug_output:
Print (code) debugging messages.
cosmoslik_plugins.samplers.priors.uniform(low=0.0, high=1.0, size=None)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Parameters:
  • low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
  • high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0.
  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.
Returns:

out – Drawn samples from the parameterized uniform distribution.

Return type:

ndarray or scalar

See also

randint()
Discrete uniform distribution, yielding integers.
random_integers()
Discrete uniform distribution over the closed interval [low, high].
random_sample()
Floats uniformly distributed over [0, 1).
random()
Alias for random_sample.
rand()
Convenience function that accepts dimensions as input, e.g., rand(2,2) would generate a 2-by-2 array of floats, uniformly distributed over [0, 1).

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), and zero elsewhere.

When high == low, values of low will be returned. If high < low, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition.

Examples

Draw samples from the distribution:

>>> s = np.random.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 15, normed=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()
cosmoslik_plugins.samplers.utils.initialize_covariance(sampled, covs=None)

Get the covariance for the set of parameters in sampled

Prepare the proposal covariance based on anything passed to self.proposal_cov, defaulting to the scale of each sampled parameter otherwise.

Module contents