- Deprecating finalizers and using cleaner queues (https://openjdk.org/jeps/421)
- Weak references (https://learn.microsoft.com/en-us/dotnet/api/system.weakrefe..., https://docs.oracle.com/javase/8/docs/api/java/lang/ref/Weak...)
Related tickets:
"runtime: add AddCleanup and deprecate SetFinalizer" - https://github.com/golang/go/issues/67535
"weak: new package providing weak pointers" - https://github.com/golang/go/issues/67552
One day Go will have all those features it once deemed unnecessary in "complex" languages.
People often want to keep millions of S3 objects in memory, so reducing the memory used would be very desirable.
I interned all the strings used (there are a lot of duplicates, like Content-Type) and it reduced the memory usage by about 30%, which is great.
I wonder how much difference this little fix mentioned in the article for go1.23.2 will make? https://github.com/golang/go/issues/69370
The article also mentions strings.Clone, which has been around for a while. Using it is very easy, and it stops big strings from being pinned in memory. I had a problem with this in the S3 backend, where a small string was pinning the entire XML response from the server, which strings.Clone fixed.
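A minimal sketch of the pinning problem described above and the strings.Clone fix. The helper name and response text are made up for illustration; strings.Clone itself is the standard library function:

```go
package main

import (
	"fmt"
	"strings"
)

// extractContentType pulls a small header value out of a large response.
// Without strings.Clone, the returned substring would share the response's
// backing array, keeping the entire response alive in memory.
func extractContentType(response string) string {
	const key = "Content-Type: "
	i := strings.Index(response, key)
	if i < 0 {
		return ""
	}
	v := response[i+len(key):]
	if j := strings.IndexByte(v, '\n'); j >= 0 {
		v = v[:j]
	}
	// Copy just these few bytes so the large response can be collected.
	return strings.Clone(v)
}

func main() {
	// Imagine this is a multi-megabyte XML response from S3.
	resp := "HTTP/1.1 200 OK\nContent-Type: application/xml\n\n<...>"
	fmt.Println(extractContentType(resp)) // application/xml
}
```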
Look past the "loading the whole book in memory" example; the author gets to the point soon enough.
The IP address example is OK. It's true, and it highlights some important points. But keep in mind pointers are 64-bit. If you're not handling IPv6 and you're shuffling a lot of them, you're probably better off just keeping the uint64 and converting to a string and allocating the struct as needed. Interning doesn't appear to be much of a win in that narrow case. But if you do care about IPv6 and you're connecting to millions of upstreams, it's not unreasonable.
It's neat that it's available. It's good to be aware of interning, but it's generally not a huge win. For a few special cases, it can be really awesome.
** edit: uint32 for IPv4. Bit counting is hard.
https://cs.opensource.google/go/go/+/refs/tags/go1.23.1:src/...
This is called an IPv4-Mapped Address.
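A sketch of the "keep the integer, convert on demand" idea from the comment above, assuming IPv4 packed into a uint32. The ipv4String helper is hypothetical; net/netip and encoding/binary are standard library packages:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// ipv4String converts a packed big-endian uint32 IPv4 address into its
// dotted-quad string only when needed, so the hot path stores 4 bytes
// per address instead of a 16-byte string header plus backing bytes.
func ipv4String(ip uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], ip)
	return netip.AddrFrom4(b).String()
}

func main() {
	fmt.Println(ipv4String(0xC0A80001)) // 192.168.0.1
}
```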
Writing your own interner is under a hundred lines of code and takes about 15 minutes. Writing a thread-safe interner is a bit trickier, but there are good libraries for that.
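For illustration, a minimal thread-safe interner along those lines. The Interner type and method names are mine, not from any particular library, and a real one would need some eviction story (this sketch grows without bound):

```go
package main

import (
	"fmt"
	"sync"
)

// Interner maps equal strings to a single canonical copy so that
// duplicates share one allocation.
type Interner struct {
	mu sync.Mutex
	m  map[string]string
}

func NewInterner() *Interner {
	return &Interner{m: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing s as the canonical
// copy if it is the first occurrence.
func (in *Interner) Intern(s string) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	if c, ok := in.m[s]; ok {
		return c
	}
	in.m[s] = s
	return s
}

func main() {
	in := NewInterner()
	a := in.Intern("Content-Type")
	b := in.Intern(string([]byte("Content-Type"))) // force a fresh allocation
	fmt.Println(a == b) // true: both resolve to the same canonical string
}
```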
> I have a very large plaintext file and I’m loading it fully in memory. This results in a string variable book that occupies 282 MiB of memory.
At what point does something become (very) large? On 64-bit machines, though, strings shorter than 8 bytes are better off not interned.
This will not work for the average struct, and if you do use it, you can no longer include, for example, any `[]byte` fields in the struct, or else it will no longer compile.
I always find it funny that `string` is comparable, but `[]rune` and `[]byte` are not.
For slice types it's more complex because there is mutability and implicit aliasing, and the backing array pointed at may change. I guess the latter should point us toward deep value comparison for slice types, since any other comparison would be quite flimsy.
This is an instance of dictionary encoding where terms are words and dictionary code space is the entire machine addressing space (heap).
The disadvantage of this approach is that you can only "deduplicate" words that are exactly the same; you cannot remove duplicated substrings. Another disadvantage is that the dictionary code size is 8 bytes, which means you achieve some saving only for words longer than 8 bytes (plus some other overhead for the interning index).
Compression algorithms that use dictionary encoding typically use code/symbol sizes smaller than 64 bits, often variable-length numbers produced by some kind of entropy coding that can use short bit sequences for codes representing frequent dictionary entries and longer bit sequences for rarer ones.
But the idea is the same. During decompression you read each "number" from the compressed stream and effectively use it to index an array that contains the target (sub)strings.
The algorithm in the post just interprets the 64-bit dictionary code as an index in the giant array that is the machine addressing space.
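The analogy can be sketched with a toy dictionary coder that uses small uint16 codes in place of the 64-bit "codes" (pointers) that interning effectively uses. All names here are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// dict assigns each distinct word a small code and remembers the
// reverse mapping for decoding.
type dict struct {
	codes map[string]uint16
	words []string
}

func newDict() *dict { return &dict{codes: make(map[string]uint16)} }

// encode returns the existing code for word, or assigns the next one.
func (d *dict) encode(word string) uint16 {
	if c, ok := d.codes[word]; ok {
		return c
	}
	c := uint16(len(d.words))
	d.codes[word] = c
	d.words = append(d.words, word)
	return c
}

// decode indexes the dictionary array with each code, just as an
// interner "indexes" the machine address space with a pointer.
func (d *dict) decode(codes []uint16) string {
	out := make([]string, len(codes))
	for i, c := range codes {
		out[i] = d.words[c]
	}
	return strings.Join(out, " ")
}

func main() {
	d := newDict()
	text := strings.Fields("to be or not to be")
	codes := make([]uint16, len(text))
	for i, w := range text {
		codes[i] = d.encode(w)
	}
	fmt.Println(codes)           // [0 1 2 3 0 1]
	fmt.Println(d.decode(codes)) // to be or not to be
}
```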