Link Rodeo: Go Package Management and Boring Technology

Here are a number of interesting topics for you to think about this week.

I’ve been learning the Go programming language recently, and in the process I’ve been having conversations about it with friends and colleagues. Go has a unique package management system that has already caused me a number of headaches. The recommended method for taking care of package dependencies is lacking, at best. Over at Nerdbucket, my buddy Nerdmaster has written a thoughtful piece about it.

While I’m talking about new technology, I’d also like to contradict myself by agreeing with Dan McKinley’s great piece, “Choose Boring Technology.” He argues that a project should be careful about adopting lots of new tools and technologies. It reminds me of a time recently when I was looking for a Node.js programmer, and one of the replies I got back was, “For us, Node.js is glue. Its ecosystem is still too young to support anything long-term. Libraries and packages move too fast to build a product that will need actual maintenance.” I’ve had the same feeling about many technologies I’ve wanted to try, such as Ocsigen for OCaml, which has a build system and API that is always several steps ahead of its documentation.

This post’s featured photo is courtesy of Flickr user MunicipioPinas.

An English-language Stemmer for OCaml

A stemming algorithm attempts to reduce words to their stem. For instance, “swimming” would be reduced to “swim”, and “avocados” would become “avocado”. This is useful in a number of situations, most especially in searching text. This library is a direct port of the Porter English stemming algorithm.

It was one of my first OCaml projects. I wrote it back in 2003, when I was still new to the language. I had been spending a lot of time writing C libraries that were being called by Perl scripts for my day job. Perl has, or had, a cumbersome, messy interface to C that made such interfaces very difficult to write and maintain.

When I discovered how easy it was to link C libraries into OCaml, I was overjoyed! This was my first attempt. Before reading further, check out my ocaml-stemmer library on GitHub.

Updating the Code

Recently, while overhauling all of my publicly-available code, I decided to update my English-language stemmer for OCaml. It’s not a very large piece of code, but its age really shows. It wouldn’t compile cleanly with the latest version of OCaml. It looks like the code of somebody who hasn’t really grokked functional programming yet. Just look at this.

let rec replace_end word (rule_list : (int * string * string * int) list) =
  match rule_list with
      hd :: tl ->
        if (match_rule word hd) then
          let (rule, _, _, _) = hd in
            (rule, apply_rule word hd)
          replace_end word tl
    | [] ->
        (0, word)

Ouch, right?

I decided that the scary code would stand as a good message ((Or maybe I should say a good warning.)) to future functional programmers. For now, I just wanted to get this code to compile and not look messy. That ended up being easy.

Finding The Bug

Once I got it compiled cleanly, however, I found a bug. Back in 2003, I was big on test-driven development. I wrote tests for lots of code. The OCaml stemmer, it turns out, has been broken for quite a while. It doesn’t handle words with apostrophes correctly!

I thought that fixing the bug it would be a challenge. However, I quickly I discovered in the OCaml manual that the or operator was deprecated, and that || should be used instead. Embarrassingly, the or operator was deprecated back in 2002. That never should have been in the code! You can view the commit which fixed the bug here.

My Stemmer Library is Now on OPAM

This is my second library on OPAM, including my prime number library. You can view it on OPAM here.

Prime Number Library for OCaml

A couple of weeks ago, I cleaned up my prime number library for OCaml. This library has a number of primality-testing methods in it, but my favorite is the Miller-Rabin primality test. It’s fast and rather accurate.

If you’d like to take a look at the library, please check out the camlprime GitHub page. The library is pretty easy to use. If you download and compile the library, you’ll end up with a toplevel that you can play with.

The file has some examples of how to use the primality tests. However, my favorite thing about this library is that it includes a lazy list implementation of prime numbers. The following example shows how to set up a lazy list of prime numbers proved using the MR algorithm in the toplevel.

# open Num ;;
# let prime_list = Prime.make (Prime.miller_rabin 500) (num_of_int 500) ;;
val prime_list : Num.num LazyList.t = LazyList.Node (Int 503, <lazy>)
# Prime.nth prime_list 500 ;;
- : Num.num = Int 4363

The library is pretty fast, even for really large numbers. I’ve tested it on 300-digit prime numbers, and I’m sure it will scale to sizes much larger than that.

Any thoughts or improvements? Let me know in the comments.