More than three years ago, one prominent Wikipedia community
member, who didn’t have enough time to work on our technology
but had enough of it to criticize it, had a problem. He was
working for a supercomputing lab that boasted one of the world’s
fastest clusters (IBM blue-something), and to relieve the stress
of his hard work, he was trying to solve a computer science
mystery:
Given a dictionary, find the words that have the most anagrams.
With a few-hundred-thousand-word dictionary, on an 8-way machine,
this should complete in less than 12 hours.
The supercomputing-lab-inspired solution was something like this
(I may be wrong; years have passed):
- Put all the words into an array
- Start generating letter sequences: aaaa, aaab, aaac, …
- For every generated sequence, traverse the array and count how
many matches it hits
- Pick the sequences with the most matches
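The steps above can be sketched roughly as follows. This is only an illustration of the brute-force idea as recalled here, not the lab's actual code. The function name `brute_force_anagrams`, the fixed `length` parameter, and the reading of "match" as "the word is an anagram of the generated sequence" are all assumptions.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase


def brute_force_anagrams(words, length, alphabet=ascii_lowercase):
    """Hypothetical sketch: try every letter sequence of a given length."""
    # Step 1: put all the words into an array (a plain list here).
    array = list(words)
    best_count = 0
    best = []
    # Step 2: generate letter sequences: aaaa, aaab, aaac, ...
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        target = Counter(candidate)
        # Step 3: traverse the whole array and collect the matches.
        # Assumption: a "match" is a word using exactly the same letters.
        hits = [w for w in array if len(w) == length and Counter(w) == target]
        # Step 4: keep the sequences with the most matches so far.
        if len(hits) > best_count:
            best_count, best = len(hits), [(candidate, hits)]
        elif hits and len(hits) == best_count:
            best.append((candidate, hits))
    return best_count, best
```

Called on a toy list such as `["tea", "eat", "ate", "tan", "nat", "bat"]` with `length=3`, it walks every three-letter sequence and rescans the whole list each time. With 26 letters, length 4 alone means 456,976 candidate sequences, each paired with a full pass over the dictionary, and the count grows by a factor of 26 with every extra letter.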
See, the problem was not …