Is there a data structure that only stores hash codes and not the actual objects?
My use-case is that I'm looking for a data structure in Java that will let me see if an object with the same hash code is inside (by calling contains()), but I will never need to iterate through the elements or retrieve the actual objects. A HashSet is close, but from my understanding, it still contains references to the actual objects, and that would be a waste of memory since I won't ever need the contents of the actual objects. The best option I can think of is a HashSet of type Integer storing only the hash codes, but I'm wondering if there is a built-in data structure that would accomplish the same thing (and only accept one type as opposed to HashSet of type Integer which will accept the hash code of any object).
java hashset
|
show 5 more comments
My use-case is that I'm looking for a data structure in Java that will let me see if an object with the same hash code is inside (by calling contains()), but I will never need to iterate through the elements or retrieve the actual objects. A HashSet is close, but from my understanding, it still contains references to the actual objects, and that would be a waste of memory since I won't ever need the contents of the actual objects. The best option I can think of is a HashSet of type Integer storing only the hash codes, but I'm wondering if there is a built-in data structure that would accomplish the same thing (and only accept one type as opposed to HashSet of type Integer which will accept the hash code of any object).
java hashset
4
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
5
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
7
TheHashSet
will contain a reference to your object, not a copy, so don't worry about space. AHashSet<Integer>
would probably use up more space because it has references to integers.
– Sweeper
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago
|
show 5 more comments
My use-case is that I'm looking for a data structure in Java that will let me see if an object with the same hash code is inside (by calling contains()), but I will never need to iterate through the elements or retrieve the actual objects. A HashSet is close, but from my understanding, it still contains references to the actual objects, and that would be a waste of memory since I won't ever need the contents of the actual objects. The best option I can think of is a HashSet of type Integer storing only the hash codes, but I'm wondering if there is a built-in data structure that would accomplish the same thing (and only accept one type as opposed to HashSet of type Integer which will accept the hash code of any object).
java hashset
My use-case is that I'm looking for a data structure in Java that will let me see if an object with the same hash code is inside (by calling contains()), but I will never need to iterate through the elements or retrieve the actual objects. A HashSet is close, but from my understanding, it still contains references to the actual objects, and that would be a waste of memory since I won't ever need the contents of the actual objects. The best option I can think of is a HashSet of type Integer storing only the hash codes, but I'm wondering if there is a built-in data structure that would accomplish the same thing (and only accept one type as opposed to HashSet of type Integer which will accept the hash code of any object).
java hashset
java hashset
edited 3 hours ago
B Yellow
asked 3 hours ago
B YellowB Yellow
413
413
4
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
5
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
7
TheHashSet
will contain a reference to your object, not a copy, so don't worry about space. AHashSet<Integer>
would probably use up more space because it has references to integers.
– Sweeper
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago
|
show 5 more comments
4
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
5
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
7
TheHashSet
will contain a reference to your object, not a copy, so don't worry about space. AHashSet<Integer>
would probably use up more space because it has references to integers.
– Sweeper
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago
4
4
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
5
5
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
7
7
The
HashSet
will contain a reference to your object, not a copy, so don't worry about space. A HashSet<Integer>
would probably use up more space because it has references to integers.– Sweeper
3 hours ago
The
HashSet
will contain a reference to your object, not a copy, so don't worry about space. A HashSet<Integer>
would probably use up more space because it has references to integers.– Sweeper
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago
|
show 5 more comments
5 Answers
5
active
oldest
votes
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. A single bit is stored per hash value.
The Guava library provides an implementation in Java.
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefinedhashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".
– Leo Aso
2 hours ago
add a comment |
There is no data structure like that, but you can create a wrapper around HashSet to suit your needs. Something like this:
public class MyHashSet{
private HashSet<Integer> set;
public MyHashSet(){
set = new HashSet<>();
}
public void add(int hash) {
set.add(hash);
}
public boolean contains(int hash) {
return set.contains(hash);
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
add a comment |
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
public class HashCodeSet<T> {
private final HashSet<Integer> hashCodes;
public MyHashSet() {
hashCodes = new HashSet<>();
}
public MyHashSet(int initialCapacity) {
hashCodes = new HashSet<>(initialCapacity);
}
public HashCodeSet(HashCodeSet toCopy) {
hashCodes = new HashSet<>(toCopy.hashCodes);
}
public void add(T element) {
hashCodes.add(element.hashCode());
}
public boolean containsHashCodeOf(T element) {
return hashCodes.contains(element.hashCode());
}
@Override
public boolean equals(o: Object) {
return o == this || o instanceof HashCodeSet &&
((HashCodeSet) o).hashCodes.equals(hashCodes);
}
@Override
public int hashCode() {
return hashCodes.hashCode(); // hash-ception
}
@Override
public String toString() {
return hashCodes.toString();
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
add a comment |
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
add a comment |
If you want to track if a hash code is already present and to do it memory efficient a BitSet
may suite your requirements.
Look at the following example:
public static void main(String args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet
"implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment the implementation of the BitSet
isn't the most memory efficient one. But there are more memory efficient implementations of a bit set - though not built-in.
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDKBitSet
.
– LuCio
2 hours ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55190768%2fis-there-a-data-structure-that-only-stores-hash-codes-and-not-the-actual-objects%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. A single bit is stored per hash value.
The Guava library provides an implementation in Java.
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefinedhashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".
– Leo Aso
2 hours ago
add a comment |
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. A single bit is stored per hash value.
The Guava library provides an implementation in Java.
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefinedhashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".
– Leo Aso
2 hours ago
add a comment |
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. A single bit is stored per hash value.
The Guava library provides an implementation in Java.
A Bloom filter can tell whether an object might be a member, or is definitely not a member. You can control the likelihood of false positives. A single bit is stored per hash value.
The Guava library provides an implementation in Java.
edited 2 hours ago
answered 2 hours ago
Andy ThomasAndy Thomas
68k980133
68k980133
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefinedhashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".
– Leo Aso
2 hours ago
add a comment |
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefinedhashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".
– Leo Aso
2 hours ago
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
Nice. This seems like the solution for very low storage overhead. But you have to worry about the false negative case. If you can statistically eliminate that, this is great!
– Steve
2 hours ago
1
1
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
False positives, but you can control their probability. Another disadvantage is that you can't remove elements.
– Andy Thomas
2 hours ago
The question was for a data structure that checks using only the predefined
hashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".– Leo Aso
2 hours ago
The question was for a data structure that checks using only the predefined
hashCode()
, which can potentially have 2^31 values (counting only positives). A Bloom filter that uses one hash function with 2 ^ 31 possible values would be extraordinarily large, seeing as it is basically just a BitSet. I don't see how that counts as "very low storage overhead".– Leo Aso
2 hours ago
add a comment |
There is no data structure like that, but you can create a wrapper around HashSet to suit your needs. Something like this:
public class MyHashSet{
private HashSet<Integer> set;
public MyHashSet(){
set = new HashSet<>();
}
public void add(int hash) {
set.add(hash);
}
public boolean contains(int hash) {
return set.contains(hash);
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
add a comment |
There is no data structure like that, but you can create a wrapper around HashSet to suit your needs. Something like this:
public class MyHashSet{
private HashSet<Integer> set;
public MyHashSet(){
set = new HashSet<>();
}
public void add(int hash) {
set.add(hash);
}
public boolean contains(int hash) {
return set.contains(hash);
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
add a comment |
There is no data structure like that, but you can create a wrapper around HashSet to suit your needs. Something like this:
public class MyHashSet{
private HashSet<Integer> set;
public MyHashSet(){
set = new HashSet<>();
}
public void add(int hash) {
set.add(hash);
}
public boolean contains(int hash) {
return set.contains(hash);
}
}
There is no data structure like that, but you can create a wrapper around HashSet to suit your needs. Something like this:
public class MyHashSet{
private HashSet<Integer> set;
public MyHashSet(){
set = new HashSet<>();
}
public void add(int hash) {
set.add(hash);
}
public boolean contains(int hash) {
return set.contains(hash);
}
}
answered 3 hours ago
Pritam BanerjeePritam Banerjee
11k64567
11k64567
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
add a comment |
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
1
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
@Steve His main concern was a reference to the objects. Here it is storing only referencing to Integer objects.
– Pritam Banerjee
2 hours ago
add a comment |
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
public class HashCodeSet<T> {
private final HashSet<Integer> hashCodes;
public MyHashSet() {
hashCodes = new HashSet<>();
}
public MyHashSet(int initialCapacity) {
hashCodes = new HashSet<>(initialCapacity);
}
public HashCodeSet(HashCodeSet toCopy) {
hashCodes = new HashSet<>(toCopy.hashCodes);
}
public void add(T element) {
hashCodes.add(element.hashCode());
}
public boolean containsHashCodeOf(T element) {
return hashCodes.contains(element.hashCode());
}
@Override
public boolean equals(o: Object) {
return o == this || o instanceof HashCodeSet &&
((HashCodeSet) o).hashCodes.equals(hashCodes);
}
@Override
public int hashCode() {
return hashCodes.hashCode(); // hash-ception
}
@Override
public String toString() {
return hashCodes.toString();
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
add a comment |
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
public class HashCodeSet<T> {
private final HashSet<Integer> hashCodes;
public MyHashSet() {
hashCodes = new HashSet<>();
}
public MyHashSet(int initialCapacity) {
hashCodes = new HashSet<>(initialCapacity);
}
public HashCodeSet(HashCodeSet toCopy) {
hashCodes = new HashSet<>(toCopy.hashCodes);
}
public void add(T element) {
hashCodes.add(element.hashCode());
}
public boolean containsHashCodeOf(T element) {
return hashCodes.contains(element.hashCode());
}
@Override
public boolean equals(o: Object) {
return o == this || o instanceof HashCodeSet &&
((HashCodeSet) o).hashCodes.equals(hashCodes);
}
@Override
public int hashCode() {
return hashCodes.hashCode(); // hash-ception
}
@Override
public String toString() {
return hashCodes.toString();
}
}
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
add a comment |
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
public class HashCodeSet<T> {
private final HashSet<Integer> hashCodes;
public MyHashSet() {
hashCodes = new HashSet<>();
}
public MyHashSet(int initialCapacity) {
hashCodes = new HashSet<>(initialCapacity);
}
public HashCodeSet(HashCodeSet toCopy) {
hashCodes = new HashSet<>(toCopy.hashCodes);
}
public void add(T element) {
hashCodes.add(element.hashCode());
}
public boolean containsHashCodeOf(T element) {
return hashCodes.contains(element.hashCode());
}
@Override
public boolean equals(o: Object) {
return o == this || o instanceof HashCodeSet &&
((HashCodeSet) o).hashCodes.equals(hashCodes);
}
@Override
public int hashCode() {
return hashCodes.hashCode(); // hash-ception
}
@Override
public String toString() {
return hashCodes.toString();
}
}
There is no such built-in data structure, because such a data structure is rarely needed. It's easy to build one, though.
public class HashCodeSet<T> {
private final HashSet<Integer> hashCodes;
public MyHashSet() {
hashCodes = new HashSet<>();
}
public MyHashSet(int initialCapacity) {
hashCodes = new HashSet<>(initialCapacity);
}
public HashCodeSet(HashCodeSet toCopy) {
hashCodes = new HashSet<>(toCopy.hashCodes);
}
public void add(T element) {
hashCodes.add(element.hashCode());
}
public boolean containsHashCodeOf(T element) {
return hashCodes.contains(element.hashCode());
}
@Override
public boolean equals(o: Object) {
return o == this || o instanceof HashCodeSet &&
((HashCodeSet) o).hashCodes.equals(hashCodes);
}
@Override
public int hashCode() {
return hashCodes.hashCode(); // hash-ception
}
@Override
public String toString() {
return hashCodes.toString();
}
}
answered 2 hours ago
Leo AsoLeo Aso
5,27411029
5,27411029
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
add a comment |
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
1
1
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I think the OP's question wasn't about API, but about memory usage. This doesn't help with that since the result still acts like a HashSet.
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
I realize that I was unfair completely on this. I've removed my objections. Feel free to delete your comments on my comments. I still think there's something more to the problem, but I was off base. Sorry
– Steve
2 hours ago
add a comment |
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
add a comment |
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
add a comment |
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
You could use a primitive collection implementation like IntSet to store values of hash codes. Obviously as others have mentioned this assumes collisions aren't a problem.
answered 2 hours ago
MarkMark
24.2k44783
24.2k44783
add a comment |
add a comment |
If you want to track if a hash code is already present and to do it memory efficient a BitSet
may suite your requirements.
Look at the following example:
public static void main(String args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet
"implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment the implementation of the BitSet
isn't the most memory efficient one. But there are more memory efficient implementations of a bit set - though not built-in.
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDKBitSet
.
– LuCio
2 hours ago
add a comment |
If you want to track if a hash code is already present and to do it memory efficient a BitSet
may suite your requirements.
Look at the following example:
public static void main(String args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet
"implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment the implementation of the BitSet
isn't the most memory efficient one. But there are more memory efficient implementations of a bit set - though not built-in.
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDKBitSet
.
– LuCio
2 hours ago
add a comment |
If you want to track if a hash code is already present and to do it memory efficient a BitSet
may suite your requirements.
Look at the following example:
public static void main(String args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet
"implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment the implementation of the BitSet
isn't the most memory efficient one. But there are more memory efficient implementations of a bit set - though not built-in.
If you want to track if a hash code is already present and to do it memory efficient a BitSet
may suite your requirements.
Look at the following example:
public static void main(String args) {
BitSet hashCodes = new BitSet();
hashCodes.set("1".hashCode());
System.out.println(hashCodes.get("1".hashCode())); // true
System.out.println(hashCodes.get("2".hashCode())); // false
}
The BitSet
"implements a vector of bits that grows as needed.". It's a JDK "built-in data structure" which doesn't contain "references to the actual objects". It stores only if "the same hash code is inside".
EDIT:
As @Steve mentioned in his comment the implementation of the BitSet
isn't the most memory efficient one. But there are more memory efficient implementations of a bit set - though not built-in.
edited 2 hours ago
answered 2 hours ago
LuCioLuCio
2,8721924
2,8721924
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDKBitSet
.
– LuCio
2 hours ago
add a comment |
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDKBitSet
.
– LuCio
2 hours ago
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
I don't know how a BitSet stores individual bits. Since obviously this usage will spread bits across a very large input domain, are you sure those are stored efficiently? Just asking. The naive assumption is that the structure would be an array of bytes, where the array was just expanded to include any bit position and all positions before that, which would be monstrously inefficient. But I don't know how it actually represents bits spread way apart.
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
It appears that your solution won't work. See github.com/brettwooldridge/SparseBitSet
– Steve
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDK
BitSet
.– LuCio
2 hours ago
@Steve You're right. Found additionally this post. But the idea of an bit set is basically not bad. It's rather the implementation of the JDK
BitSet
.– LuCio
2 hours ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55190768%2fis-there-a-data-structure-that-only-stores-hash-codes-and-not-the-actual-objects%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
Is your hash function perfect? Or can you have multiple objects with the same hash value?
– arshajii
3 hours ago
5
what about hashing collisions?
– Nathan Hughes台湾不在中国
3 hours ago
7
The
HashSet
will contain a reference to your object, not a copy, so don't worry about space. AHashSet<Integer>
would probably use up more space because it has references to integers.– Sweeper
3 hours ago
I agree with @Sweeper, unless you have a real need for super-duper optimization. Also, your second idea with storing hashcodes as integer wouln't be more efficient as it would store the hash+the hash of the hash.
– Joel
3 hours ago
@Sweeper The HashSet uses internally a HashMap. The memory space is the same.
– Octavian R.
3 hours ago