Last year I had a chance to talk about the internals of our service, Pathtraq, at the Percona Performance Conference (slides), where I described the methods we use to compress the URLs in our database to below 40% of their original size. However, I had not released the source code until now. I am sorry for the delay, but I have finally uploaded the code to github.com/kazuho/url_compress.
It is generally considered difficult to achieve a high compression ratio on short texts. This is because most compression algorithms are adaptive, i.e., a short text reaches its end before the compressor has learned how to encode it …
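To illustrate the point about adaptive compressors, here is a minimal Python sketch using the standard `zlib` module. It is not the method used in url_compress; it only shows that a general-purpose compressor gains nothing on a single short URL, and that handing it prior knowledge (here, a preset dictionary) helps. The sample URL and the dictionary contents are assumptions made up for this example.

```python
import zlib

# Hypothetical sample URL and preset dictionary, chosen only for illustration.
URL = b"http://www.example.com/index.html"
PRESET_DICT = b".org/.net/.com/index.htmlhttp://www."


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with zlib, optionally priming it with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


plain = deflate(URL)                 # the compressor starts with no knowledge
primed = deflate(URL, PRESET_DICT)   # the compressor starts with common fragments

print("original:", len(URL), "bytes")
print("zlib, no dictionary:", len(plain), "bytes")
print("zlib, preset dictionary:", len(primed), "bytes")
```

Without a dictionary the output is larger than the input, because the stream ends before the compressor has seen any repetition worth exploiting. With the dictionary, fragments such as `http://www.` are already known when compression begins, so the result is smaller.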