Class WeightedStringsFromCSV
- java.lang.Object
  - io.virtdata.libbasics.shared.distributions.WeightedStringsFromCSV

All Implemented Interfaces:
java.util.function.LongFunction<java.lang.String>
public class WeightedStringsFromCSV extends java.lang.Object implements java.util.function.LongFunction<java.lang.String>

Provides sampling of a given field in a CSV file according to discrete probabilities. The CSV file must have a header line, which is used to find the named columns for value and weight. The value column contains the string result to be returned by the function. The weight column contains the floating-point weight or mass associated with the value on the same line. All the weights are normalized automatically.

If multiple file names are given and the files share the same format, they are all read in the same way.
If the first entry in the filenames list is 'map', the values are not selected pseudo-randomly. Instead, they are mapped over in a stable, unsorted order as input values vary from 0L to Long.MAX_VALUE.
Generally, you want to leave out the 'map' directive to get "random sampling" of these values.
This function works the same as the three-parameter form of WeightedStrings, which is deprecated in favor of this one. Use this one instead.
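The behavior described above (header lookup, automatic weight normalization, and the difference between sampling and 'map' mode) can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class name `WeightedCsvSketch`, the `mapMode` flag, the bit-mixing step, and the naive comma splitting (no quoted fields) are all assumptions made for illustration, and the sketch reads CSV lines from a list rather than from files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

// Illustrative sketch only; NOT the io.virtdata implementation.
// The class name, the mapMode flag, and the hashing step are assumptions.
class WeightedCsvSketch implements LongFunction<String> {
    private final String[] values;
    private final double[] cumulative; // running sum of normalized weights
    private final boolean mapMode;

    WeightedCsvSketch(String valueColumn, String weightColumn,
                      boolean mapMode, List<String> csvLines) {
        this.mapMode = mapMode;
        // The first line is the header; use it to locate the named columns.
        List<String> header = splitLine(csvLines.get(0));
        int valueIdx = header.indexOf(valueColumn);
        int weightIdx = header.indexOf(weightColumn);
        if (valueIdx < 0 || weightIdx < 0) {
            throw new IllegalArgumentException("Named column not found in header");
        }
        List<String> vals = new ArrayList<>();
        List<Double> weights = new ArrayList<>();
        double total = 0.0;
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            List<String> fields = splitLine(line);
            double w = Double.parseDouble(fields.get(weightIdx));
            vals.add(fields.get(valueIdx));
            weights.add(w);
            total += w;
        }
        if (vals.isEmpty() || total <= 0.0) {
            throw new IllegalArgumentException("No usable weighted rows");
        }
        values = vals.toArray(new String[0]);
        cumulative = new double[values.length];
        double running = 0.0;
        for (int i = 0; i < values.length; i++) {
            running += weights.get(i) / total; // normalize so weights sum to 1
            cumulative[i] = running;
        }
    }

    // Naive split: assumes no quoted fields or embedded commas.
    private static List<String> splitLine(String line) {
        List<String> out = new ArrayList<>();
        for (String field : line.split(",")) out.add(field.trim());
        return out;
    }

    // Scrambles the input bits (SplitMix64-style finalizer) and scales to [0, 1).
    private static double hashedUnit(long input) {
        long z = input + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return (z >>> 11) * 0x1.0p-53;
    }

    @Override
    public String apply(long input) {
        // Map mode (assumed): scale the input range 0..Long.MAX_VALUE directly.
        double u = mapMode ? (double) input / (double) Long.MAX_VALUE
                           : hashedUnit(input);
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) return values[i];
        }
        return values[values.length - 1]; // guard against rounding at the top end
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("name,weight", "alpha,1.0", "beta,3.0");
        WeightedCsvSketch sampler = new WeightedCsvSketch("name", "weight", false, csv);
        for (long i = 0; i < 5; i++) {
            System.out.println(i + " -> " + sampler.apply(i));
        }
    }
}
```

With the real class, the equivalent construction would follow the documented signature, for example `new WeightedStringsFromCSV("name", "weight", "names.csv")`, where the column and file names are hypothetical. The sketch uses a linear scan over cumulative weights for clarity; the library may use a different lookup structure.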
Constructor Summary
Constructors
Constructor: WeightedStringsFromCSV(java.lang.String valueColumn, java.lang.String weightColumn, java.lang.String... filenames)
Description: Create a sampler of strings from the given CSV file.
Method Summary
Modifier and Type: java.lang.String
Method: apply(long value)
Constructor Detail
WeightedStringsFromCSV
public WeightedStringsFromCSV(java.lang.String valueColumn, java.lang.String weightColumn, java.lang.String... filenames)

Create a sampler of strings from the given CSV file. The CSV file must have plain CSV headers as its first line.

Parameters:
valueColumn - The name of the value column to be sampled
weightColumn - The name of the weight column, which must be parsable as a double
filenames - One or more file names which will be read in to the sampler buffer