Chris Arcand

Principal Engineer building Terraform at HashiCorp 👨‍💻 Formerly: Red Hat, NBC SportsEngine, RubyMN organizer. Once played the clarinet for a living, now making a living to play hockey. I make pointless connections to Minnesota as a Minnesotan does.

St. Paul, MN

Manually setting languages on GitHub with Linguist's (Improved) Vim Modeline Detection

TL;DR. Show me the good stuff!

As you may know, GitHub has had filetype-specific syntax highlighting since [forever] and a fun little breakdown of languages on a repository’s home page since 2012. The latest incarnation of the latter looks like this:

Languages are recognized for both of these features by GitHub’s Linguist, an open source Ruby gem released in 2011 from GitHub’s original highlighting lexers. It works fairly well using file extensions and common project layouts (for example, ignoring vendored file locations).

However, for something with an uncommon layout or extension it’s understandably inaccurate. My personal dotfiles are a good example, where I designate a file to be automatically symlinked to my home directory by the installer scripts by adding a .symlink extension. Here’s my zshrc from a while ago:

Manually setting languages for Linguist

The other day I discovered an interesting feature of Linguist added last year: you can manually set the reported language in a file using Vim or Emacs modelines. To summarize for everyone, Vim modelines are magic comments that set options for Vim for the particular file.

For example, I may decide that I want Vim to turn on spell checking for a foreign language in one file only. I can add a comment (using whatever comment syntax the file type requires) in the first or last few lines to set the spellcheck language to German and wrap text at 120 characters: vim: spell spelllang=de tw=120.

Or, more realistically, I can set the syntax for a file with an awkward or very general file extension - like (gasp) the zshrc.symlink example from above. With # vim: syntax=zsh, Linguist should recognize that it’s a shell file, set the correct syntax, and report it as such!
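To make the idea concrete, here is a minimal Python sketch of how a tool might look for a modeline of the short form (vim: option option=value) near the top or bottom of a file. It is only an illustration: the helper name find_modeline_options is hypothetical, the window of five lines mirrors the default of Vim's modelines option, and none of this is how Vim or Linguist is actually implemented.

```python
import re

# Vim's 'modelines' option defaults to 5, so only that many lines at the
# top and bottom of a file are checked. This sketch copies that default.
MODELINES = 5

# Assumption: only the short form "vim: opt opt=value" is handled here.
# The marker must sit at the start of the line or follow whitespace.
SHORT_FORM = re.compile(r"(?:^|\s)vim:\s*(.+)$")


def find_modeline_options(text, window=MODELINES):
    """Return the options of the first modeline found, or an empty dict."""
    lines = text.splitlines()
    # Only the first and last `window` lines are candidates.
    candidates = lines[:window] + lines[-window:]
    for line in candidates:
        match = SHORT_FORM.search(line)
        if match is None:
            continue
        options = {}
        # Options are separated by whitespace or colons.
        for token in re.split(r"[\s:]+", match.group(1).strip()):
            if not token:
                continue
            name, _, value = token.partition("=")
            # Boolean options such as "spell" carry no value.
            options[name] = value or True
        return options
    return {}
```

Given the text "# vim: spell spelllang=de tw=120", the sketch returns {"spell": True, "spelllang": "de", "tw": "120"}. A modeline buried in the middle of a long file is ignored, just as Vim would ignore it.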

No soup for you

…Except it didn’t when I tried it. I found out that Linguist’s parser is very restrictive about the format of the modelines it accepts:

  • In Vim there are two different ways to use a modeline: vim: syntax=java or vim: set syntax=java:. The latter is compatible with some versions of Vi, Vim’s predecessor. Linguist does not support the first, more common syntax.
  • Linguist recognizes the Vim filetype option and not the Vim syntax option. They do different things in the editor. It’s very common that people don’t care about the filetype and only want the syntax to be correct, and so use that one instead.

But you can haz all Vim modelines, now

I love regular expressions, so I hunted down where Linguist does this modeline parsing and submitted a patch that’s been accepted and merged. When the next version of Linguist is released (current version: v4.7.4), you’ll be able to use all the various common ways of identifying a file on GitHub with Vim modelines.
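For a rough feel of what more permissive modeline matching can look like, here is a small Python sketch. It is not the regular expression from the Linguist patch (Linguist is written in Ruby, and the patch itself is not reproduced here). The function name language_from_modeline is hypothetical, and accepting Vim's short option names ft and syn is an assumption made for this illustration.

```python
import re

# Matches the modeline marker in both forms:
#   vim: syntax=java
#   vim: set syntax=java:
# and captures whatever options follow it.
MARKER = re.compile(r"(?:^|\s)vim:\s*(?:set?\s+)?(?P<options>.*)$")

# Accepts either the filetype or the syntax option. The short names
# "ft" and "syn" are an assumption of this sketch.
LANGUAGE_OPTION = re.compile(
    r"(?:^|[\s:])(?:filetype|ft|syntax|syn)=(?P<lang>\w+)"
)


def language_from_modeline(line):
    """Return the language named in a Vim modeline, or None."""
    marker = MARKER.search(line)
    if marker is None:
        return None
    option = LANGUAGE_OPTION.search(marker.group("options"))
    return option.group("lang") if option else None
```

With this sketch, "# vim: syntax=zsh" yields "zsh" and "// vim: set syntax=java:" yields "java", while a modeline that sets neither option, such as "# vim: spell", yields None.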

From the updated Linguist README:
