Class ZipfDistribution

  • All Implemented Interfaces:
    DiscreteDistribution

    public class ZipfDistribution
    extends java.lang.Object
    Implementation of the Zipf distribution.

    Parameters: For a random variable X whose values are distributed according to this distribution, the probability mass function is given by

       P(X = k) = 1 / (H(N,s) * k^s)    for k = 1,2,...,N.
     
    H(N,s) is the normalizing constant: the generalized harmonic number of order N with exponent s, i.e. H(N,s) = 1/1^s + 1/2^s + ... + 1/N^s.
    • N is the number of elements
    • s is the exponent
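
The definition above can be sketched in plain Java. This is a minimal illustration that does not use the library; the class name ZipfPmfSketch and its helper methods are hypothetical, and the PMF is taken to be the term 1 / k^s divided by the generalized harmonic number so that the probabilities sum to 1.

```java
final class ZipfPmfSketch {
    private ZipfPmfSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        // Iterate from n down to 1; for m > 0 this adds the smallest terms first.
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** P(X = k) for k in [1, n]; 0 outside that range (convention of this sketch). */
    static double pmf(int k, int n, double s) {
        if (k < 1 || k > n) {
            return 0;
        }
        return 1.0 / (Math.pow(k, s) * generalizedHarmonic(n, s));
    }

    public static void main(String[] args) {
        // Print the PMF for N = 3, s = 1.
        for (int k = 1; k <= 3; k++) {
            System.out.println("P(X = " + k + ") = " + pmf(k, 3, 1.0));
        }
    }
}
```

For N = 3 and s = 1 the harmonic number is 11/6, so the three probabilities are 6/11, 3/11 and 2/11.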
    • Constructor Summary

      Constructors 
      Constructor Description
      ZipfDistribution​(int numberOfElements, double exponent)
      Creates a distribution.
    • Method Summary

      Modifier and Type Method Description
      DiscreteDistribution.Sampler createSampler​(UniformRandomProvider rng)
      Creates a sampler.
      double cumulativeProbability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).
      double getExponent()
      Get the exponent characterizing the distribution.
      double getMean()
      Gets the mean of this distribution.
      int getNumberOfElements()
      Get the number of elements (e.g. corpus size) for the distribution.
      int getSupportLowerBound()
      Gets the lower bound of the support.
      int getSupportUpperBound()
      Gets the upper bound of the support.
      double getVariance()
      Gets the variance of this distribution.
      int inverseCumulativeProbability​(double p)
      Computes the quantile function of this distribution.
      boolean isSupportConnected()
      Indicates whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support.
      double logProbability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
      double probability​(int x)
      For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).
      double probability​(int x0, int x1)
      For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1).
      static int[] sample​(int n, DiscreteDistribution.Sampler sampler)
      Utility function for allocating an array and filling it with n samples generated by the given sampler.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ZipfDistribution

        public ZipfDistribution​(int numberOfElements,
                                double exponent)
        Creates a distribution.
        Parameters:
        numberOfElements - Number of elements.
        exponent - Exponent.
        Throws:
        java.lang.IllegalArgumentException - if numberOfElements <= 0 or exponent <= 0.
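
The documented argument checks can be illustrated with a stand-alone sketch. ZipfParametersSketch is a hypothetical class, not the library's implementation; it only mirrors the stated contract that a non-positive numberOfElements or exponent is rejected with an IllegalArgumentException.

```java
final class ZipfParametersSketch {
    private final int numberOfElements;
    private final double exponent;

    ZipfParametersSketch(int numberOfElements, double exponent) {
        // Mirror the documented contract: both parameters must be strictly positive.
        if (numberOfElements <= 0) {
            throw new IllegalArgumentException(
                "numberOfElements must be > 0: " + numberOfElements);
        }
        if (exponent <= 0) {
            throw new IllegalArgumentException(
                "exponent must be > 0: " + exponent);
        }
        this.numberOfElements = numberOfElements;
        this.exponent = exponent;
    }

    int getNumberOfElements() {
        return numberOfElements;
    }

    double getExponent() {
        return exponent;
    }
}
```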
    • Method Detail

      • getNumberOfElements

        public int getNumberOfElements()
        Get the number of elements (e.g. corpus size) for the distribution.
        Returns:
        the number of elements
      • getExponent

        public double getExponent()
        Get the exponent characterizing the distribution.
        Returns:
        the exponent
      • probability

        public double probability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the value of the probability mass function at x.
      • logProbability

        public double logProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns log(P(X = x)), where log is the natural logarithm.
        Parameters:
        x - Point at which the PMF is evaluated.
        Returns:
        the logarithm of the value of the probability mass function at x.
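
One way to evaluate the log-PMF without first computing a possibly tiny probability is to work with logarithms directly: log P(X = x) = -s * log(x) - log(H(N,s)). The following is a sketch under that assumption; ZipfLogPmfSketch and its methods are hypothetical names, and returning negative infinity outside the support is a convention chosen for this illustration.

```java
final class ZipfLogPmfSketch {
    private ZipfLogPmfSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** Natural logarithm of P(X = x) for a Zipf distribution with parameters n and s. */
    static double logPmf(int x, int n, double s) {
        if (x < 1 || x > n) {
            // Probability zero outside the support, so the logarithm is -infinity.
            return Double.NEGATIVE_INFINITY;
        }
        return -s * Math.log(x) - Math.log(generalizedHarmonic(n, s));
    }
}
```

Exponentiating the result recovers the PMF value, which is a convenient consistency check.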
      • cumulativeProbability

        public double cumulativeProbability​(int x)
        For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
        Parameters:
        x - Point at which the CDF is evaluated.
        Returns:
        the probability that a random variable with this distribution takes a value less than or equal to x.
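
Because the PMF terms are 1 / k^s scaled by a common constant, the CDF at x can be written as a ratio of two generalized harmonic numbers, H(x,s) / H(N,s). The sketch below illustrates this; ZipfCdfSketch is a hypothetical class and is not taken from the library source.

```java
final class ZipfCdfSketch {
    private ZipfCdfSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** P(X <= x) for a Zipf distribution with n elements and exponent s. */
    static double cdf(int x, int n, double s) {
        if (x <= 0) {
            return 0; // below the support
        }
        if (x >= n) {
            return 1; // at or above the upper bound of the support
        }
        return generalizedHarmonic(x, s) / generalizedHarmonic(n, s);
    }
}
```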
      • getMean

        public double getMean()
        Gets the mean of this distribution. For number of elements N and exponent s, the mean is Hs1 / Hs, where
        • Hs1 = generalizedHarmonic(N, s - 1),
        • Hs = generalizedHarmonic(N, s).
        Returns:
        the mean, or Double.NaN if it is not defined.
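
The ratio Hs1 / Hs follows directly from the definition of the expectation. The derivation below is a short sketch that takes the PMF as P(X = k) = 1 / (k^s * H(N,s)) and writes H_{N,m} for generalizedHarmonic(N, m).

```latex
\begin{align*}
E[X] &= \sum_{k=1}^{N} k \, P(X = k)
      = \sum_{k=1}^{N} \frac{k}{k^{s} \, H_{N,s}} \\
     &= \frac{1}{H_{N,s}} \sum_{k=1}^{N} \frac{1}{k^{s-1}}
      = \frac{H_{N,s-1}}{H_{N,s}}
\end{align*}
```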
      • getVariance

        public double getVariance()
        Gets the variance of this distribution. For number of elements N and exponent s, the variance is (Hs2 / Hs) - (Hs1^2 / Hs^2), where
        • Hs2 = generalizedHarmonic(N, s - 2),
        • Hs1 = generalizedHarmonic(N, s - 1),
        • Hs = generalizedHarmonic(N, s).
        Returns:
        the variance, or Double.NaN if it is not defined.
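
The moment formulas for getMean and getVariance can be checked numerically with a small stand-alone sketch. ZipfMomentsSketch and its methods are hypothetical names; the code only evaluates the harmonic-number expressions quoted above and is not the library's implementation.

```java
final class ZipfMomentsSketch {
    private ZipfMomentsSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** Mean = Hs1 / Hs. */
    static double mean(int n, double s) {
        double hs1 = generalizedHarmonic(n, s - 1);
        double hs = generalizedHarmonic(n, s);
        return hs1 / hs;
    }

    /** Variance = (Hs2 / Hs) - (Hs1^2 / Hs^2), i.e. E[X^2] - E[X]^2. */
    static double variance(int n, double s) {
        double hs2 = generalizedHarmonic(n, s - 2);
        double hs1 = generalizedHarmonic(n, s - 1);
        double hs = generalizedHarmonic(n, s);
        return (hs2 / hs) - (hs1 * hs1) / (hs * hs);
    }

    public static void main(String[] args) {
        System.out.println("mean = " + mean(3, 1.0));
        System.out.println("variance = " + variance(3, 1.0));
    }
}
```

For N = 3 and s = 1 the harmonic numbers are Hs = 11/6, Hs1 = 3 and Hs2 = 6, giving a mean of 18/11 and a variance of 72/121.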
      • getSupportLowerBound

        public int getSupportLowerBound()
        Gets the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0), i.e. inf {x in Z | P(X <= x) > 0}. By convention, Integer.MIN_VALUE should be substituted for negative infinity. The lower bound of the support is always 1, regardless of the parameters.
        Returns:
        lower bound of the support (always 1)
      • getSupportUpperBound

        public int getSupportUpperBound()
        Gets the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1), i.e. inf {x in Z | P(X <= x) = 1}. By convention, Integer.MAX_VALUE should be substituted for positive infinity. The upper bound of the support is the number of elements.
        Returns:
        upper bound of the support
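
The relationship between the support bounds and the quantile function can be illustrated with a linear-scan sketch. ZipfQuantileSketch is a hypothetical class; it assumes the quantile is the smallest integer x with P(X <= x) >= p (and the smallest x with P(X <= x) > 0 when p = 0), and it is not how the library computes the value.

```java
final class ZipfQuantileSketch {
    private ZipfQuantileSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** Smallest k in [1, n] whose cumulative probability reaches p. */
    static int quantile(double p, int n, double s) {
        if (p < 0 || p > 1) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        double h = generalizedHarmonic(n, s);
        double cumulative = 0;
        for (int k = 1; k <= n; k++) {
            cumulative += 1.0 / (Math.pow(k, s) * h);
            // For p = 0 this returns 1, because the cumulative sum is already positive.
            if (cumulative >= p) {
                return k;
            }
        }
        // Rounding may leave the running sum slightly below 1; the answer is then n.
        return n;
    }
}
```

With this definition, quantile(0, N, s) is 1 and quantile(1, N, s) is N, matching the lower and upper bounds of the support.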
      • isSupportConnected

        public boolean isSupportConnected()
        Indicates whether the support is connected, i.e. whether all integers between the lower and upper bound of the support are included in the support. The support of this distribution is connected.
        Returns:
        true
      • probability

        public double probability​(int x0,
                                  int x1)
        For a random variable X whose values are distributed according to this distribution, this method returns P(x0 < X <= x1). The default implementation uses the identity P(x0 < X <= x1) = P(X <= x1) - P(X <= x0).
        Specified by:
        probability in interface DiscreteDistribution
        Parameters:
        x0 - Lower bound (exclusive).
        x1 - Upper bound (inclusive).
        Returns:
        the probability that a random variable with this distribution will take a value between x0 and x1, excluding the lower and including the upper endpoint.
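
The identity used by the default implementation can be sketched as a difference of two CDF values. ZipfIntervalSketch is a hypothetical class; the check that x0 does not exceed x1 is an assumption of this sketch and is not documented above.

```java
final class ZipfIntervalSketch {
    private ZipfIntervalSketch() {}

    /** Generalized harmonic number H(n, m) = sum over k = 1..n of 1 / k^m. */
    static double generalizedHarmonic(int n, double m) {
        double sum = 0;
        for (int k = n; k >= 1; k--) {
            sum += 1.0 / Math.pow(k, m);
        }
        return sum;
    }

    /** P(X <= x) for a Zipf distribution with n elements and exponent s. */
    static double cdf(int x, int n, double s) {
        if (x <= 0) {
            return 0;
        }
        if (x >= n) {
            return 1;
        }
        return generalizedHarmonic(x, s) / generalizedHarmonic(n, s);
    }

    /** P(x0 < X <= x1) = P(X <= x1) - P(X <= x0). */
    static double probability(int x0, int x1, int n, double s) {
        if (x0 > x1) {
            // Assumption of this sketch: reject an inverted interval.
            throw new IllegalArgumentException("x0 must not exceed x1");
        }
        return cdf(x1, n, s) - cdf(x0, n, s);
    }
}
```

Note that the lower endpoint is excluded, so probability(x - 1, x, n, s) equals the PMF at x.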
      • sample

        public static int[] sample​(int n,
                                   DiscreteDistribution.Sampler sampler)
        Utility function for allocating an array and filling it with n samples generated by the given sampler.
        Parameters:
        n - Number of samples.
        sampler - Sampler.
        Returns:
        an array of size n.
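
The behaviour described for sample amounts to allocating an int array and filling it from the sampler. The sketch below shows that pattern with java.util.function.IntSupplier standing in for DiscreteDistribution.Sampler; the stand-in and the class name SampleArraySketch are assumptions made so that the example compiles without the library.

```java
import java.util.function.IntSupplier;

final class SampleArraySketch {
    private SampleArraySketch() {}

    /** Allocates an array of size n and fills it with values drawn from the sampler. */
    static int[] sample(int n, IntSupplier sampler) {
        int[] samples = new int[n];
        for (int i = 0; i < n; i++) {
            samples[i] = sampler.getAsInt();
        }
        return samples;
    }

    public static void main(String[] args) {
        // A deterministic stand-in sampler that always returns 1.
        int[] values = sample(5, () -> 1);
        System.out.println(values.length);
    }
}
```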