C++: Quickest way to get integer within a range -
i need generate hash keys n=100 million keys. study appears murmur3 (murmurhash3_x86_32, see murmur3 hash) fastest hashing function best latency , small enough collision rate. problem facing function returns key void *
. more specifically, template is:
void murmurhash3_x86_32 (const void *key, int len, uint32_t seed, void *out);
as hash table size smaller largest hash can generate, need table range [0, n-1]. easiest solution seems to use %
operator. known slow operator, wonder if there quicker way solve problem.
one interesting suggestion found given is there alternative using % (modulus) in c/c++? on stackoverflow itself. suggests 'a power of two, following works (assuming twos complement representation)':
return & (n-1);
my issue on newer cpus, (or of times?), performance degrades @ around sizes 2^n, iirc, due multiway cache lines. (this link provides illustration regarding insertions big memory, part 3.5: google sparsehash!).
at moment, advantages of murmur3 seems nullified hard-ware related issues , known inefficiency of %
operator. performance constraint, request low latency , faster solutions requirement if not murmurhash3_x86_32.
the problem facing function returns key
void *
.
it not. returns nothing (void
). hash result recorded in buffer specify (a pointer to) via last argument. murmurhash3_x86_32()
, makes sense pointer uint32_t
.
as hash table size smaller largest hash can generate, need table range [0, n-1]. easiest solution seems to use % operator. known slow operator, wonder if there quicker way solve problem.
the %
not easiest solution, usual one. "slow" relative -- %
slower +
, much, much faster 1 call murmurhash3_x86_32()
.
one interesting suggestion found [...] suggests [using power-of-two table size, , computing modulus via
&
operator]
note contrary assertion in answer, in fact has no dependency @ on twos' complement representation.
my issue on newer cpus, (or of times?), performance degrades @ around sizes 2^n, iirc, due multiway cache lines. (this link provides illustration regarding insertions big memory, part 3.5: google sparsehash!).
the performance degredation described in report linked attributed re-hashing, seems quite plausible. has nothing operations you're asking about. conceivable cache (lack of) associativity impact performance large hash tables, not more having large hash tables generally. memory access patterns inherent in use of hash table naturally produce poor cache locality. that's practically point.
at moment, advantages of murmur3 seems nullified hard-ware related issues , known inefficiency of % operator. performance constraint, request low latency , faster solutions requirement if not murmurhash3_x86_32.
you way overthinking this. lack of effective use of cpu cache price pay using large hash table. not associated hash function (as long hash function job well). cost of single arithmetic operation, whether %
or &
, not noticeable compared cost of computing hash operate on, hardly matters of choose. if want tiny advantage operation use power-of-two sized table , &
operator. on other hand, throws away of hash bits went trouble compute. consider instead choosing prime hash table size , %
operator -- hash bits contribute bucket selection, may improve spread.
Comments
Post a Comment