Last year I had a chance to talk about the internals of our
service, Pathtraq, at the Percona
Performance Conference (slides). In that talk I described the methods we
use to compress the URLs in our database to below 40% of
their original size, but I had not released the source
code until now. I am sorry for the delay; I have finally
uploaded the code to github.com/kazuho/url_compress.

It is generally considered difficult to achieve a high compression
ratio on short texts. This is because most
compression algorithms are adaptive, i.e., short texts
reach their end before the compressor has learned how to encode them …
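
The short-text problem is easy to observe with a general-purpose compressor. The following is only a minimal sketch and is not the method used in url_compress: it uses Python's standard `zlib` module, and the preset dictionary `URL_DICT`, the helper names, and the sample URL are all assumptions made up for illustration. It compares plain deflate against deflate primed with a dictionary of substrings that are assumed to be common in URLs, so the compressor does not have to "learn" them from the input itself.

```python
import zlib

# Hypothetical preset dictionary: substrings assumed to be common in URLs.
# In zlib, strings placed near the end of the dictionary are the cheapest to reference.
URL_DICT = b".org/.net/.com/index.htmlhttps://http://www."


def compress_plain(data: bytes) -> bytes:
    """Compress with plain zlib; the compressor starts with no prior knowledge."""
    return zlib.compress(data, 9)


def compress_with_dict(data: bytes, zdict: bytes = URL_DICT) -> bytes:
    """Compress with zlib primed by a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes = URL_DICT) -> bytes:
    """Decompress data produced by compress_with_dict (same dictionary required)."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    url = b"http://www.example.com/index.html"  # made-up sample input
    plain = compress_plain(url)
    primed = compress_with_dict(url)
    print("original bytes:        ", len(url))
    print("plain zlib bytes:      ", len(plain))
    print("zlib + dictionary bytes:", len(primed))
    assert decompress_with_dict(primed) == url
```

For an input this short, the plain output is typically no smaller than the original, because the adaptive model has nothing to build on and the container adds a few bytes of overhead. The primed variant does better only because the dictionary already contains most of the URL, which is the general idea behind using prior knowledge for short strings.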