01-15-2015, 07:55 AM
I was looking through the code and noticed the following comment (in the random name generation):
Code:
// TODO: how do I weight name vectors by frequency, without making them
// gargantuan?
Ok, here's how: build a TreeMap keyed by running weight totals and use its lowerEntry() method.
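In case lowerEntry() is unfamiliar: it returns the entry with the greatest key strictly less than the argument, or null if there is none (floorEntry() is the variant that also accepts an equal key). A tiny standalone demo; the class name and sample names are made up:

```java
import java.util.Map;
import java.util.TreeMap;

// Demonstrates TreeMap.lowerEntry(): the entry with the greatest key
// strictly less than the given key, or null if no such key exists.
class LowerEntryDemo {
    static String lower(TreeMap<Integer, String> map, int key) {
        Map.Entry<Integer, String> entry = map.lowerEntry(key);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(0, "Anna");
        map.put(3, "Beth");
        System.out.println(lower(map, 0)); // null (no key below 0)
        System.out.println(lower(map, 3)); // Anna (key 3 itself is excluded)
        System.out.println(lower(map, 4)); // Beth
    }
}
```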
A bit of code. Assume you have read your names and their corresponding weights into the following data structure:
Code:
private List<Pair<String, Integer>> nameList;
... which, in the context of name generation, would probably look as follows:
Code:
/* 40000 should be enough for the current firstnames_female.txt */
nameList = new ArrayList<Pair<String,Integer>>(40000);
/* ... loop through the lines */
String name = values[0];
int weight = Integer.parseInt(values[1]);
nameList.add(Pair.<String,Integer>of(name, weight));
/* ... */
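The line loop is elided above. Purely as a hypothetical sketch, assuming each line holds a name and an integer weight separated by a comma (the real file layout and separator aren't shown here), the loading step could look like this. java.util.Map.Entry stands in for the Pair class, whose library isn't named; the class name NameListLoader is made up:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical loader: assumes lines of the form "Name,Weight".
// Map.Entry stands in for the Pair type used above.
class NameListLoader {
    static List<Map.Entry<String, Integer>> parse(List<String> lines) {
        List<Map.Entry<String, Integer>> nameList = new ArrayList<>(lines.size());
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",");
            String name = values[0].trim();
            int weight = Integer.parseInt(values[1].trim());
            nameList.add(new SimpleEntry<String, Integer>(name, weight));
        }
        return nameList;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Anna,3", "Beth,2");
        System.out.println(parse(lines)); // [Anna=3, Beth=2]
    }
}
```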
You can now generate the TreeMap with the weights as follows:
Code:
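private TreeMap<Integer, String> nameDistribution;
private int maxNameWeight;

private void generateNameDistribution()
{
    // Create (or reset) the map; calling put() on an uninitialized field would throw a NullPointerException
    nameDistribution = new TreeMap<Integer, String>();
    int count = 0;
    for (Pair<String, Integer> name : nameList) {
        // Key = sum of the weights of all names before this one
        nameDistribution.put(count, name.first);
        count += name.second;
    }
    maxNameWeight = count; // total weight of all names
}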
The format of the data is { (sum of the weights of all names before this one) : (name) }. Easy enough.
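To make that format concrete, here is a standalone sketch that builds the same kind of map for three made-up names with weights 3, 2 and 5. It uses java.util.Map.Entry in place of the Pair class (an assumption, since the Pair type's library isn't named), and the class name is hypothetical:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Builds { (sum of weights before this name) : (name) }.
class NameDistribution {
    final TreeMap<Integer, String> distribution = new TreeMap<>();
    final int totalWeight;

    NameDistribution(List<Map.Entry<String, Integer>> names) {
        int count = 0;
        for (Map.Entry<String, Integer> name : names) {
            distribution.put(count, name.getKey()); // key = weights of all earlier names
            count += name.getValue();
        }
        totalWeight = count; // plays the role of maxNameWeight
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> names = Arrays.<Map.Entry<String, Integer>>asList(
                new SimpleEntry<String, Integer>("Anna", 3),
                new SimpleEntry<String, Integer>("Beth", 2),
                new SimpleEntry<String, Integer>("Cleo", 5));
        NameDistribution d = new NameDistribution(names);
        System.out.println(d.distribution); // {0=Anna, 3=Beth, 5=Cleo}
        System.out.println(d.totalWeight);  // 10
    }
}
```

So Anna owns the draws 0 to 2, Beth 3 to 4 and Cleo 5 to 9: each name gets a slice of [0, 10) exactly as wide as its weight.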
Now to get a random weighted name out of it:
Code:
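private String getRandomName(Random rnd)
{
    // nextInt(maxNameWeight) + 1 is in [1, maxNameWeight] (needs maxNameWeight > 0);
    // lowerEntry() returns the entry with the greatest key strictly below it,
    // so each name is picked with probability weight / maxNameWeight.
    return nameDistribution.lowerEntry(rnd.nextInt(maxNameWeight) + 1).getValue();
}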
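To convince yourself the lookup is unbiased, you can skip the randomness and try every value rnd.nextInt(maxNameWeight) can return. This standalone check (class and method names are made up) counts how often each name comes out, and also confirms that floorEntry(r) gives the same result as lowerEntry(r + 1) for integer keys:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Exhaustive check of the lookup: try every value nextInt(total) can return.
class PickCheck {
    static Map<String, Integer> countPicks(TreeMap<Integer, String> dist, int total) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int r = 0; r < total; r++) {
            String viaLower = dist.lowerEntry(r + 1).getValue(); // as in getRandomName()
            String viaFloor = dist.floorEntry(r).getValue();     // equivalent for int keys
            if (!viaLower.equals(viaFloor)) {
                throw new IllegalStateException("lookups disagree at r = " + r);
            }
            counts.merge(viaLower, 1, Integer::sum);
        }
        return counts;
    }

    static String getRandomName(TreeMap<Integer, String> dist, int total, Random rnd) {
        return dist.lowerEntry(rnd.nextInt(total) + 1).getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> dist = new TreeMap<>();
        dist.put(0, "Anna"); // weight 3
        dist.put(3, "Beth"); // weight 2
        dist.put(5, "Cleo"); // weight 5
        System.out.println(countPicks(dist, 10)); // {Anna=3, Beth=2, Cleo=5}
        System.out.println(getRandomName(dist, 10, new Random()));
    }
}
```

Each name comes out exactly as many times as its weight, which is why a uniform nextInt() plus lowerEntry() gives a correctly weighted pick.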
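That's it. It also works with non-integer weights if you make the keys Double, draw rnd.nextDouble() * maxNameWeight and look that up with floorEntry() (the + 1 trick only makes sense for integers). Hope it helps.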
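A minimal sketch of a non-integer variant, as one possible adaptation (an assumption; only the int version appears above, and the class and method names are made up). Keys are double running sums, a uniform value in [0, 1) is scaled by the total, and floorEntry() finds the owning name. Negative or non-finite weights are rejected and zero weights are skipped:

```java
import java.util.Random;
import java.util.TreeMap;

// Weighted name picker with double weights (illustrative sketch).
class DoubleWeightedNames {
    private final TreeMap<Double, String> distribution = new TreeMap<>();
    private double totalWeight = 0.0;

    void add(String name, double weight) {
        if (!(weight >= 0 && weight < Double.POSITIVE_INFINITY)) {
            throw new IllegalArgumentException("bad weight for " + name + ": " + weight);
        }
        if (weight == 0) {
            return; // a zero-weight name can never be picked, so don't store it
        }
        distribution.put(totalWeight, name); // key = sum of earlier weights
        totalWeight += weight;
    }

    // u must be in [0, 1); u * totalWeight then lands in [0, totalWeight).
    String pick(double u) {
        return distribution.floorEntry(u * totalWeight).getValue();
    }

    // Assumes at least one name with a positive weight was added.
    String getRandomName(Random rnd) {
        return pick(rnd.nextDouble());
    }

    public static void main(String[] args) {
        DoubleWeightedNames names = new DoubleWeightedNames();
        names.add("Anna", 0.5);
        names.add("Beth", 1.5);
        System.out.println(names.pick(0.2)); // Anna (0.2 * 2.0 = 0.4, below Beth's key 0.5)
        System.out.println(names.getRandomName(new Random()));
    }
}
```

floorEntry() is used here instead of lowerEntry() because there is no "next" double to step to the way + 1 does for ints; since the smallest key is 0.0 and the scaled value is never negative, the lookup always finds an entry once a positive weight has been added.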