Thursday, December 31, 2009

Book review: Pro Git by Scott Chacon

Like many others, I use git as a version control system for my Lisp code. Version control is one of those things that offer a lot of benefits with relatively little investment: I picked up a few basic git commands from tutorials on the internet (Github has a nice collection), but didn't bother to learn about git in depth.

However, as others started to contribute to my libraries, I was forced to learn more about certain features, most importantly branching and merging. After spending a few hours doing things the wrong way, I realized that I need to do some reading, and I was lucky to find Pro Git by Scott Chacon, one of the prominent git developers. Thanks to the author and the publisher (Apress), the book is available online. I really appreciate that Apress makes its books available online so I can start reading them immediately, and I have already ordered a dead tree copy.

The book is a pleasure to read, and is really useful. Instead of simply rehashing the manual, it focuses on explaining the concepts of git, and illustrates them with typical workflows. I found this invaluable: understanding the concepts behind git made me realize that I have been doing things the wrong way. I used subversion before, and it was still influencing the way I use git. For example, branching in git is really, really cheap, and I learned that I should do it more often.

My favorite chapters from this book are Git Branching, which explains the concept of branches, gives examples of basic branching and merging, branch management and typical workflows, and also discusses remote branches, and Distributed Git, which talks about distributed workflows in more detail.

I have learned a lot from this book, and hopefully it will help me manage my Lisp libraries better. So far I have been keeping changes on my hard disk between "releases" (updates to master), but now I think that I will make development branches for new features and push them more often.

Maxell pendrive suckage

I had to reinstall the OS on my parents' computer during the Christmas holidays. I forgot to bring my external hard disk with me, so I decided to buy a pendrive to backup their files.

Unfortunately, I picked a Maxell Venture to do the job. Maxell puts this pendrive in their Business Range line. I learned about this later because this is not indicated on the packaging. If it had been, I wouldn't have touched the product with a ten-foot pole. I consider pendrives a commodity: they should present a standard interface, and the only qualitative differentiation I can imagine is sleek and/or durable design or novelty packaging (not that I care for the latter).

Apparently, some pendrive manufacturers think otherwise. Maxell decided that it would indulge consumers with "extra features", such as "security and file compression software". Unfortunately, this makes the pendrive interface non-standard. The pendrive shows up as two physical devices, one contains a small partition called SECRET, the other has a large partition called PUBLIC. Apparently, this feature is governed by the firmware. There is no way to repartition the device (which shows up as /dev/sdb and /dev/sdc at the same time), and it is impossible to get rid of these "features". An added bonus is that I found it impossible to make this pendrive boot: the BIOS of my computer was understandably confused about this whole arrangement.

Nice job, Maxell. I returned the pendrive to the store the next day. When I googled for a possible solution, I learned about a similar "extension" called U3, which is equally useless but at least it can be removed (albeit only using Windows), and the U3 folks are decent enough to offer removal software on their website. By the way, Maxell support still hasn't answered my e-mail.

I think that companies should be free to offer "features" that break pendrives, but they should be obliged by law to display this warning message:

You are purchasing a pendrive with a non-standard interface.
Generally, this sucks.
Do yourself a favor and buy from one of our competitors.

Friday, December 25, 2009

Syntax highlighting with org-mode

Blogging is probably the best way to disseminate information about new (features in) libraries in the Lisp community—if you are reading this, you are probably subscribed to Planet Lisp, which nicely aggregates all Lisp-related blogs that are registered. On the other hand, blogging about Lisp is—or was—quite a hassle to me. I am using Google's blogger.com, which has nice features, but I am used to Emacs and prefer typing text there. Including Lisp code snippets was a pain in the butt: I had to keep playing with <pre> tags and it still didn't look right. But now I had a bit of time to explore the issue and found a workflow that is

pretty painless and
does syntax highlighting.

It uses org-mode, which I already use to manage my agenda. I would like to thank EMACS-FU, Blogistic Reflections, Drew Crampsie, Ross Lonstein, and Jānis Džeriņš for their help and/or blog posts on using org-mode.

Here is what you should do:

In Emacs, set org-export-htmlize-output-type to css. This makes org-mode emit formatting with <span> elements instead of inline style directives, which is the default.
Open a source file with org-mode (eg sample.org). Leave it empty, we just need it to generate some CSS. Use the Emacs function org-export (usually C-c C-e), and extract the CSS part. This is the first part of all the CSS styling you need.
Generate the second part with org-export-htmlize-generate-css.
Paste both parts into the CSS section served by your blog engine. For example, on blogger.com you should go to Layout, then Edit HTML, and paste it after the <Variable name=...> section where the CSS formatting starts. There will be some overlap between the two parts and possibly your existing template, but it should not matter. Edit colors, fonts, etc to the style you want. All I did was change the colors for the <pre> tags.
Turn off auto-conversion of line breaks. On blogger.com, you can find Convert line breaks in the Settings/Formatting section. Unless you do this, your blog engine could introduce unnecessary <br/> tags in the HTML.

You are now ready to post. A simple example looks like this in org-mode:

#+TITLE:     Sample blog post
#+AUTHOR:    Your Name
#+EMAIL:     your@email.address
#+LANGUAGE:  en
#+OPTIONS:   H:3 num:nil toc:nil \n:nil @:t ::t |:t ^:t -:t f:t *:tl creator:nil
#+OPTIONS:   TeX:t LaTeX:nil skip:nil d:nil tags:not-in-toc author:nil timestamp:nil
#+INFOJS_OPT: view:nil toc:nil ltoc:t mouse:underline buttons:0 path:http://orgmode.org/org-info.js

Here is some sample source code:
#+BEGIN_SRC lisp
(defun add2 (x)
  (+ x 2))
#+END_SRC
You can include example output:
#+BEGIN_EXAMPLE
CL-USER> (add2 3)
5
#+END_EXAMPLE
See the [[http://orgmode.org/org.html][Org Manual]] for further details.

You can include formatting options as shown in the example.

I use the following snippet in Emacs for settings and keybindings:

(defun org-export-body-as-html ()
  (interactive)
  (org-export-as-html 3 nil nil "blog" t))

(add-hook 'org-mode-hook
          (lambda ()
            (setq org-export-htmlize-output-type 'css)
            (local-set-key (quote [?\C-c ?\C-x]) 'org-export-body-as-html)))

Sunday, December 20, 2009

A tour of xarray: part 1

A tour of xarray

I frequently come across situations which require taking slices from arrays, and sometimes I need other high-level array manipulations, such as index permutations or projections to row- or column major. In an earlier attempt called affi, I handled these things using affine mappings, but now I realize that those are not general enough, and my need for these manipulations to be super-efficient was mostly imaginary. I started working on a library, which evolved to xarray, which is able to do these things (and much more—see the next post!) for any kind of array-like object, provided that they have the required minimal interface, defined as a few CLOS method. This blog post is the first of a series which is meant to give a tour of xarray.

The minimal interface

Xarray handles objects which can be addressed using a rectilinear coordinate system. No assumptions are made about the storage model, and if you want some class to be xarray-compatible, just define methods xdims, which gives a list of dimensions, and xref (also (setf xref), if your object is modifiable, but this is not required for basic functionality), which is a generic function pretty much like aref. Finally, there is the generic function xelttype to query element type. This is all you need to provide.

Based on xdims, reasonable defaults are provided for xrank, which returns an integer for the number of dimensions, xsize, which returns the total number of elements in the object, and (xdim object i), which is defined as (nth (xdims object) i) by default. Feel free to define these, though, if there is a "natural" way to do this for your objects (like it is for arrays, below).

To demonstrate how easy it is to do this for any kind of object, here are the definitions from array.lisp, which defines these methods for Common Lisp arrays:

(defmethod xelttype ((object array))
  (array-element-type object))

(defmethod xrank ((object array))
  (array-rank object))

(defmethod xdims ((object array))
  (array-dimensions object))

(defmethod xdim ((object array) axis-number)
  (array-dimension object axis-number))

(defmethod xsize ((object array))
  (array-total-size object))

(defmethod xref ((object array) &rest subscripts)
  (apply #'aref object subscripts))

(defmethod (setf xref) (value (object array) &rest subscripts)
  (setf (apply #'aref object subscripts) value))

In case a specialized method is not provided for an object, the default fallback methods (in atoms.lisp) just handle objects as atoms, ie arrays of rank 0. After you define the methods xdims, xref and xelttype—and nothing more!—for your class, you can use xarray's views. The (somewhat sloppy) terminology of xarray is to call classes which provide a minimal interface xrefable.

Views

Technically, views are functions mapping a set of rectilinear indexes to a (sub)set of an xrefable object called the ancestor. Practically, they are objects that walk and quack like arrays, but are usually a thin layer on other array-like objects.

Let's see some examples. I will use a Common Lisp array as a starting point, but of course these examples would work with any object as long as it is xrefable.

Slices

Slicing is the most obvious view that you can have for an array:

CL-USER> (defparameter *a* (make-array '(3 4) :initial-contents
                              '((1 2 3 4)
                                (5 6 7 8)
                                (9 10 11 12))))
CL-USER> (slice *a* 2 :all)
#<SLICE-VIEW 
#(9 10 11 12)  {B1025A1}>

Instead of an array, you get a slice-view object. Print-object for views usually just prints the elements as if they were an array, even if the ancestor isn't (this was simple to do, fancy printing may be introduced later at some point, but don't hold your breath). 2 selects the elements of same index along the 0th dimension, while :all selects all elements along the 1st.

You can get an array using take:

CL-USER> (take t (slice *a* 2 :all))
#(9 10 11 12)

The first argument tells take to return an object similar to the ancestor—we will tackle the concept of similarity and object creation in the next post about xarray.

For now, notice how slice dropped a dimension when you asked for a single index. You can avoid that by making it a list:

CL-USER>
(slice *a* '(2) :all)
#<SLICE-VIEW 
#2A((9 10 11 12))  {B9EBF59}>

You can drop unit dimensions with (drop object).

Lists of two elements select ranges (inclusive), which can also reverse elements, for example, ='(2 0)= would select the third, second and first elements along that dimensions. Negative integers count from the largest index (which is -1; think of -i as (- (xdim object d) i)) backwards, and :rev reverses all elements (so it is equivalent to \'(0 -1)). Finally, a vector just picks individual indexes (which can repeat, occur in any order, be negative following the conventions above, etc):

CL-USER> (slice *a* -1 :all)
#<SLICE-VIEW 
#(9 10 11 12)  {AC4FF31}>
CL-USER> (slice *a* :rev '(1 3))
#<SLICE-VIEW 
#2A((10 11 12) (6 7 8) (2 3 4))  {BAC83F9}>
CL-USER> (slice *a* '2 #(1 1 3))
#<SLICE-VIEW 
#(10 10 12)  {BB66859}>

Permutations

A permutation permutes the dimensions of an xrefable object:

CL-USER> (take t (permutation *a* 1 0))
#2A((1 5 9) (2 6 10) (3 7 11) (4 8 12))

Transposing is a special case of permutations, with indexes 1 and 0 (as above). Of course, you can combine views:

CL-USER> (slice (permutation *a* 1 0) :rev :rev)
#<SLICE-VIEW 
#2A((12 8 4) (11 7 3) (10 6 2) (9 5 1))  {AAAC849}>

Miscellaneous views

Two other special views are column-major-projection and flat views. The first one provides a projection of an object as if the elements of the two were arranged in column-major order (note that this may not be the actual storage model, as xarray does not make assumptions the latter but uses xref; however, for some special objects faster methods may be provided):

CL-USER> (column-major-projection *a* 6 2)
#<COLUMN-MAJOR-PROJECTION-VIEW 
#2A((1 3) (5 7) (9 11) (2 4) (6 8) (10 12))  {BE28921}>

Flat views just return an object with a single dimension, with the order of elements unspecified. This is advantageous if the object has a natural representation in some storage model, and we want to perform operations where the order of elements does not matter (eg sum the elements).

Semantics of views

(setf xref), when provided, sets elements in the ancestor of an object: if that is another view, the original ancestor (ie the object which is not a view) is modified. (Thus views always share structure. If you want to avoid this, use take to create a new object.

All the operations shown above are generic functions, you are free to specialize them. These methods

don't have to return subclasses of view, they are free to return any xrefable object as long as it shares structure,
are allowed to combine views.

For example, some CL array slices with contiguous row-major indexes could be implemented as displaced arrays, or a combination of a transpose and a slice can be folded into one view if the object supports that. Currently, these efficiency hacks are not implemented as I don't really need them, but the infrastructure is there.

Next time, I will address object creation and some generic operations on xrefable objects.

Sunday, December 6, 2009

first steps with ECL

I am living in Austria now and my German is very basic, so I use online translators to cope with the occasional e-mail in German that I can't decipher. I am using the mutt mail user agent, and in the past I have configured it to open HTML attachments in Firefox. However, with plain text messages manual copy & paste became tedious, so I decided to write a little script that opens them in Firefox.

This gave me a good excuse to play around with ECL, something that I wanted to do for a while. The choice of CL as a scripting language might seem odd to some people, but I find it is optimal for me. I am aware that there are DSLs for this specific purpose (called "scripting languages") out there, and some even look like Lisp. However, I find that it takes a bit of time to find my way around a new language, and since at this point I am convinced that I will keep using CL as my main programming language, learning a scripting language that gives no additional insight would be a wasted effort.

Here is the end result. The script is really simple, and it benefited quite a bit from the comments of people on comp.lang.lisp. I like using this script, but the greatest benefit I derived from writing it is discovering what a gem ECL is. For example, I only need to use the two simple commands

(compile-file "savebody.lisp" :system-p t)
(c:build-program "savebody" :lisp-files '("savebody.o"))

to compile the script to a small (37K) (I am serious — it is really that small) executable file via C. A nice summary of simple compilation cases can be found here. Reading the ECL manual, I found that there are even more powerful facilities for building programs.

I am very impressed with ECL, and will definitely consider it as a remarkable implementation in the future.

Tamas K Papp's blog