• 2 Posts
  • 89 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • “could help solve” was the quote.

    Physics is like that joke about halving the distance to a woman at a bar*. I don’t expect it will ever be entirely solved, but whatever stands as the “for all practical purposes” of the era might. I’m taking “help solve” as just another halving of the distance in this analogy.

    * A mathematician and an engineer are sitting at a table drinking when a very beautiful woman walks in and sits down at the bar.

    The mathematician sighs. “I’d like to talk to her, but first I have to cover half the distance between where we are and where she is, then half of the distance that remains, then half of that distance, and so on. The series is infinite. There’ll always be some finite distance between us.”

    The engineer gets up and starts walking. “Ah, well, I figure I can get close enough for all practical purposes.”


  • The real meat of the story is in the referenced blog post: https://blog.codingconfessions.com/p/how-unix-spell-ran-in-64kb-ram

    TL;DR

    If you’re short on time, here’s the key engineering story:

    • McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.

    • For fast lookups, he initially used a Bloom filter—perhaps one of its first production uses. Interestingly, Dennis Ritchie provided the implementation. They tuned it to have such a low false positive rate that they could skip actual dictionary lookups.

    • When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.

    • They computed that 27-bit hash codes would keep collision probability acceptably low, but needed compression.

    • McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.

    • Using Golomb’s code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word—remarkably close to the theoretical minimum of 13.57 bits.

    • Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance.
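To make the hash-difference idea concrete, here is a minimal sketch, not McIlroy's actual code. It hashes words to 27 bits, sorts the hashes, and encodes the gaps between neighbours with a Rice code, which is the special case of Golomb's code where the divisor is a power of two. The hash function (truncated MD5), the parameter `k`, and all function names are my own assumptions for illustration. `approx_min_bits_per_word` uses the usual approximation log2(2^bits / n) + log2(e) for the cost of storing n random values out of 2^bits.

```python
import hashlib
import math

HASH_BITS = 27  # hash width quoted in the blog post


def hash27(word: str) -> int:
    # Stand-in hash (assumption): top 27 bits of MD5, not the original hash.
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - HASH_BITS)


def rice_encode(values, k: int) -> str:
    """Encode non-negative ints as a unary quotient plus a k-bit remainder."""
    parts = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        parts.append("1" * q + "0")          # quotient in unary, 0-terminated
        if k:
            parts.append(format(r, f"0{k}b"))  # remainder in k binary digits
    return "".join(parts)


def rice_decode(bits: str, k: int):
    """Inverse of rice_encode."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the terminating 0
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values


def compress(words, k: int) -> str:
    """Sort the hashes and encode the first hash followed by the gaps."""
    hashes = sorted({hash27(w) for w in words})
    gaps = [hashes[0]] + [b - a for a, b in zip(hashes, hashes[1:])]
    return rice_encode(gaps, k)


def decompress(bits: str, k: int):
    """Rebuild the sorted hash list by summing the decoded gaps."""
    hashes, total = [], 0
    for gap in rice_decode(bits, k):
        total += gap
        hashes.append(total)
    return hashes


def approx_min_bits_per_word(n: int, hash_bits: int = HASH_BITS) -> float:
    # Approximate information-theoretic floor per stored hash.
    return math.log2(2 ** hash_bits / n) + math.log2(math.e)
```

A lookup would hash the candidate word and check whether that value appears in `decompress(...)`. With n = 30,000 and 27-bit hashes, `approx_min_bits_per_word` comes out at about 13.57, which matches the theoretical minimum quoted above. The partitioning step from the last bullet is left out here.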





  • After a lifetime of dealing with dandruff, I’ve recently discovered the best anti-dandruff shampoo in the world, hands down: hot water.

    No shampoo at all. Just scrubbing with hot water every two to three days. After about a month or two, the scalp stops overcompensating for the degreasing effect of shampoos and calms down. The first month will be pretty itchy and dandruffy, though. Tapering off shampoo use gradually and using a spray-on or drip-on lotion for seborrheic dermatitis can keep things under control until the scalp normalizes.

    I think this was the video that I first got the idea from: https://www.youtube.com/watch?v=zmDBYsRJN7A