Using `html-pipeline` in Jekyll.

With the Jekyll team having pretty much rejected my idea for custom Markdown on the notion that they "have them all" (which is actually wrong, but whatever), I had to think of a way to get html-pipeline to replace the built-in Markdown processors. Doing so reduced the hacks I needed to get better Pygments support, and it even gives me the flexibility to add anything I want to the pipeline for my content.

Instead of patching their classes, or even caring about what they had going on, I decided the best approach was to get their Markdown processors out of the way by changing the Markdown extension to something I would probably never use. In this case I chose .jekyll. I also decided to disable their Pygments support.

markdown_ext: jekyll
pygments: false

The Pygments processor built into html-pipeline is already a great feature, but I decided that if I ever wanted to go back to Octopress I should probably keep my site backwards compatible with its core highlighting syntax, since I had already cleaned that up and ported it into my site. I wanted to keep GitHub's built-in timeout handling for the processor, and I also wanted to add the "Octopress" style figure elements.

require "timeout"
require "pygments"
require "html/pipeline"

module HTML
  class Pipeline
    class PygmentsFilter < Filter
      Templates = {
        :line_number => %{<span class="line-number">%s</span>\n},
        :code_line => '<span class="line">%s</span>',
        :code_wrapper => <<-HTML
          <figure class="code">
            <div class="highlight">
              <table>
                <tbody>
                  <tr>
                    <td class="gutter">
                      <pre>%s</pre>
                    </td>
                    <td class="code">
                      <pre><code class="%s">%s</code></pre>
                    </td>
                  </tr>
                </tbody>
              </table>
            </div>
          </figure>
        HTML
      }

      # --
      # Searches for the Pygments lexer and then attempts to highlight,
      # catching the timeout (if one happens) and moving along as if nothing
      # ever happened (by not highlighting your code of course.) Then it
      # wraps it in what I hope is close to an Octopress style `figure`.
      # --
      def call
        doc.search("pre").each do |node|
          next unless lang = node["lang"]
          next unless lexer = Pygments::Lexer[lang]

          # A timeout returns nil, so guard before digging out the <pre> body.
          html = highlight_without_timeout(lexer, node.inner_text)
          next unless html && (out = html[/<pre>(.+)<\/pre>/m, 1])

          code = wrap_lines_and_create_numbers(out)
          node.replace Templates[:code_wrapper] % [
            code[:line_numbers],
            lang,
            code[:code]
          ]
        end

        doc
      end

      # --
      # Highlights the text, returning nil if Pygments times out.
      # --
      def highlight_without_timeout(lexer, text)
        lexer.highlight(text)
      rescue Timeout::Error
        nil
      end

      # --
      # Loop through each line of a Pygments highlighted code block, wrap
      # it in a `line` span, and build a matching `line-number` span from
      # the index so people know which line of code they are reading.
      # --
      def wrap_lines_and_create_numbers(lines)
        code, line_numbers = "", ""

        lines.each_line.with_index(1) do |line, index|
          code += Templates[:code_line] % line
          line_numbers += Templates[:line_number] % index
        end

        {
          :code => code,
          :line_numbers => line_numbers
        }
      end
    end
  end
end
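
To make the timeout handling easier to poke at outside of Jekyll, here is a minimal standalone sketch of the same pattern. `SlowLexer` and `extract_pre` are hypothetical names I made up for illustration; they are not part of pygments.rb or html-pipeline, and the 0.05 second limit is arbitrary. Only the `rescue Timeout::Error` shape is taken from the filter above.

```ruby
require "timeout"

# Hypothetical stand-in for a Pygments lexer. It sleeps to simulate a slow
# highlighter and enforces its own (arbitrary) 0.05 second limit.
class SlowLexer
  def initialize(delay)
    @delay = delay
  end

  def highlight(text)
    Timeout.timeout(0.05) do
      sleep @delay
      "<div class=\"highlight\"><pre>#{text}</pre></div>"
    end
  end
end

# Same shape as the filter's method: swallow the timeout and return nil.
def highlight_without_timeout(lexer, text)
  lexer.highlight(text)
rescue Timeout::Error
  nil
end

# Hypothetical helper: pull out the <pre> body, tolerating a nil result.
def extract_pre(html)
  html && html[/<pre>(.+)<\/pre>/m, 1]
end

fast = extract_pre(highlight_without_timeout(SlowLexer.new(0), "puts 1"))
slow = extract_pre(highlight_without_timeout(SlowLexer.new(1), "puts 1"))

p fast # the highlighted body
p slow # nil, so the caller can skip the node
```

The point of returning nil instead of raising is that the caller can simply skip that one code block and leave the original `pre` in place.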

Note: The markup this filter produces might not be exactly how Octopress does it, considering my site has changed over the years, but it isn't that hard to make it the same.
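
If you want to see what the line wrapping produces before adjusting it, the helper can be run on its own. This is a sketch with the two templates copied out of the filter into constants (the constant names are mine), fed a two-line string instead of real Pygments output.

```ruby
# Templates copied from the filter above; the constant names are made up.
LINE_NUMBER = %{<span class="line-number">%s</span>\n}
CODE_LINE   = '<span class="line">%s</span>'

def wrap_lines_and_create_numbers(lines)
  code = ""
  line_numbers = ""

  # with_index(1) makes the gutter start at 1 instead of 0.
  lines.each_line.with_index(1) do |line, index|
    code += CODE_LINE % line
    line_numbers += LINE_NUMBER % index
  end

  { :code => code, :line_numbers => line_numbers }
end

result = wrap_lines_and_create_numbers("a = 1\nb = 2\n")
puts result[:line_numbers]
puts result[:code]
```

Because `each_line` keeps the trailing newline, each `line` span closes after the newline rather than before it. That is harmless inside a `pre`, but it is the first thing to change if you need the output to match another theme's markup exactly.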

The next part of my task was to build the actual converter. I wanted to do this in a way that would allow me to extend it without disrupting an existing configuration file. I opted to stick with a hash for all of the options, so that if I added anything beyond the :gfm option it would not break anything users had already set up.

I also did not want to give people the ability to select which filters they run... yet, at least not until I take the time to see what each one does and build a pre-approved list that could act as basic security. So I ended up with the following:

module Jekyll
  module Converters
    class Pipeline < Converter
      safe true

      FILTERS = [
        HTML::Pipeline::MarkdownFilter,
        HTML::Pipeline::AutolinkFilter,
        HTML::Pipeline::PygmentsFilter,
      ]

      # --
      # Make sure we have some default options.
      # @return [Hash]
      # --
      def ensure_default_opts
        @config["pipeline"] ||= {}
        @config["pipeline"]["exts"] ||= "github_markdown"
        @config["pipeline"]["opts"] ||= {}
      end

      # --
      # Setup HTML Pipeline
      # --
      def setup
        unless @setup
          ensure_default_opts
          @parser = HTML::Pipeline.new(FILTERS)
          @setup = true
        end
      end

      # --
      # The extension we output.
      # --
      def output_ext(ext)
        ".html"
      end

      # --
      # The extension we match.
      # --
      def matches(ext)
        ensure_default_opts
        ext =~ /\.(#{@config["pipeline"]["exts"]})\Z/
      end

      # --
      # Convert the content to HTML.
      # @return [String]
      # --
      def convert(content)
        setup
        gfm = @config["pipeline"]["opts"]["gfm"]
        @parser.call(content, :gfm => gfm)[:output].to_s
      end
    end
  end
end
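
The defaults and the extension matching are easy to try outside of Jekyll with a plain hash standing in for the site config. This is only a sketch: `with_pipeline_defaults` and `pipeline_ext?` are hypothetical helpers that mirror `ensure_default_opts` and `matches`, and the `md|markdown` value is my own illustration. It works because `exts` is interpolated straight into a regular expression.

```ruby
# Hypothetical helper mirroring ensure_default_opts on a plain hash.
def with_pipeline_defaults(config)
  config["pipeline"] ||= {}
  config["pipeline"]["exts"] ||= "github_markdown"
  config["pipeline"]["opts"] ||= {}
  config
end

# Same regular expression as the converter's `matches` method.
def pipeline_ext?(config, ext)
  !!(ext =~ /\.(#{config["pipeline"]["exts"]})\Z/)
end

# No "pipeline" key at all: every default gets filled in.
defaults = with_pipeline_defaults({})

# A user-supplied hash: existing keys are left alone.
custom = with_pipeline_defaults({
  "pipeline" => { "exts" => "md|markdown", "opts" => { "gfm" => true } }
})

p pipeline_ext?(defaults, ".github_markdown")
p pipeline_ext?(defaults, ".md")
p pipeline_ext?(custom, ".markdown")
p custom["pipeline"]["opts"]["gfm"]
```

Keeping everything under one `pipeline` hash with `||=` defaults means new options can be added later without overwriting anything a user has already put in their config.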

After I threw all that into the _plugins folder and ran a jekyll build, html-pipeline took over and did its job. I am happy, to say the least. With all that said, I hope that you too will switch to html-pipeline, because it lets you adjust your content on the fly a lot more easily than Jekyll does.