play/type blog

We are creating Germany's juiciest event platform, boomloop.com, because we love the Internet more than our own mothers. See for yourself: check out boomloop.com.


23
Oct

Ruby and Bash FTW

Here’s a little thing that made me happy today. I spent some time automating PDF printing. I have a directory structure like so: receipts/2008/10. When needed, everything is packed into a single PDF and sent to the printer down the road. To do this, every file has to be converted to PDF and combined into a single file. Input formats are RTF, TXT, PDF or multipage TIFF.

So today I wrote a dirty Ruby script for this. Mashing little Unix tools into your Ruby code is good. Dirty, trash-eating, shell-opening code, I salute you! It’s taken me a while to come around to this, being the ex-Java guy I am. Two years away from jcp.org seem to have done me good ;)

Low effort pdf conversion using the shell

Sam suggested I should look into ruby-cocoa, since Preview can print many formats to PDF. But a bit of googling led me to CUPS-PDF, with which you can set up a printer that dumps PDFs into a specified directory. Follow along at http://www.codepoetry.net/projects/cups-pdf-for-mosx.

So let’s see if we have everything we need to get started. First off, let’s check if the CUPS printer has been installed:

    raise "install cups." unless `lpstat -a`.include?(CUPS_PRINTER)

lpstat -a lists the print queues configured on your machine. We search that output for CUPS_PRINTER with String#include?, which returns a proper true or false. We’ll actually use this printer to convert input files, like RTFs, into PDF files on disk.
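A slightly stricter variant of this check compares whole queue names instead of matching anywhere in the output. This is only a sketch: printer_listed? is a made-up helper name, and the sample text is an invented stand-in for what lpstat -a prints in an English locale.

```ruby
# Hypothetical helper: true if `printer` is the queue name at the start
# of any line of `lpstat -a`-style output.
def printer_listed?(lpstat_output, printer)
  lpstat_output.each_line.any? { |line| line.split.first == printer }
end

# Invented sample output, for illustration only.
sample = "CUPS_PDF accepting requests since Thu Oct 23 10:00:00 2008\n" \
         "Office_Laser accepting requests since Thu Oct 23 10:00:00 2008\n"

puts printer_listed?(sample, "CUPS_PDF")  # true
puts printer_listed?(sample, "Missing")   # false
```

In the real script you would pass in the output of `lpstat -a` instead of the sample string.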

So now we want to grab the files and pass them to the printer. This should all be automatic. A quick search reveals lp:

    `lp -d #{ CUPS_PRINTER } #{ file }`

Nice, this now dumps the converted file into the CUPS root directory, which by default is cups-pdf on your desktop.
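The PDF does not appear the instant lp returns, so the script has to wait for it. Here is a minimal sketch of polling with a timeout, so that a failed job cannot hang the script forever. wait_for_file and its keyword arguments are made-up names; the job_<id>-*.pdf naming is the pattern the full script relies on.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: polls for the first file matching `pattern` and
# returns its path, or nil once `timeout` seconds have passed.
def wait_for_file(pattern, timeout: 10, interval: 0.1)
  deadline = Time.now + timeout
  loop do
    found = Dir.glob(pattern).first
    return found if found
    return nil if Time.now >= deadline
    sleep interval
  end
end

# Demo against a temporary directory instead of the real CUPS-PDF folder.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "job_42-receipt.pdf"))
  puts wait_for_file(File.join(dir, "job_42-*.pdf"))             # path of the PDF
  p wait_for_file(File.join(dir, "job_43-*.pdf"), timeout: 0.3)  # nil
end
```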

Multipage Tiffs

Unfortunately lp only prints the first page of multipage TIFFs to PDF. So what we need to do is extract the pages from the TIFF. Let’s see if there’s anything available:

    man -k tiff

This shows us tiffutil, which can be used to ‘manipulate tiff images’. Let’s see how this works:

    man tiffutil

Bingo, there’s an -extract option which pulls out the specified page. So now we just need to know how many pages are in the TIFF. There’s an -info option that gives us the information we need. Each image is described with an entry that starts with “Directory at”. Let’s count the number of times this appears:

    `tiffutil -info #{ file } | grep "Directory at" | wc -l`.strip.to_i

Here we’re piping the output of tiffutil to grep, which results in a line per found image. We pipe this through wc -l to count the number of images.
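The same count can also be done in Ruby once you have the tiffutil output as a string, which saves the grep and wc processes. This is a sketch only: tiff_page_count is a made-up helper name, and the sample text is an abbreviated, invented stand-in for real tiffutil -info output.

```ruby
# Hypothetical helper: counts the "Directory at" entries, one per image,
# in `tiffutil -info`-style output.
def tiff_page_count(info_output)
  info_output.each_line.count { |line| line.include?("Directory at") }
end

# Invented sample output, for illustration only.
sample = <<~INFO
  Directory at 0x8
    Image Width: 1728 Image Length: 2292
  Directory at 0x1f40
    Image Width: 1728 Image Length: 2292
INFO

puts tiff_page_count(sample)  # 2
```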

Combining the pdf

Thanks again to Sam for pointing me to pdfcombine. Just specify all the files you want to combine and the output file with -o, and you’re done. So let’s see if it’s installed…

    raise "install pdfcombine command line tool." if `which pdfcombine`.strip.empty?

which searches $PATH for the specified executable and prints its full path, or nothing if it isn’t found. An empty string is still truthy in Ruby, so we test for empty output explicitly. To run it:

    `pdfcombine #{ exports.join(" ") } -o #{ packfile }`

And you’re done!
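One caveat with interpolating file names straight into a shell command: a name with a space in it turns into two arguments. Below is a rough sketch of a PATH lookup that needs no subshell, plus a command builder that escapes every argument. find_executable and combine_command are made-up helper names, the file names are invented, and the -o flag is simply the one used in this post.

```ruby
require 'shellwords'

# Hypothetical helper: returns the full path of `name` if an executable
# file with that name is on PATH, else nil (similar in spirit to `which`).
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical helper: builds the combine command with each argument escaped.
def combine_command(inputs, packfile)
  Shellwords.join(["pdfcombine", *inputs, "-o", packfile])
end

puts combine_command(["export/taxi receipt.pdf", "export/hotel.pdf"], "export/10-packed.pdf")
# pdfcombine export/taxi\ receipt.pdf export/hotel.pdf -o export/10-packed.pdf
```

The resulting string can be passed to backticks as before. Passing the arguments as a list to system would avoid the shell altogether.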

The script

    require 'fileutils'
    require 'logger'

    #
    #  Combines all txt, rtf, multipage tiff and pdf files in a named subdirectory.
    #  Results in a single pdf called <subdir>/export/<subdir>-packed.pdf
    #
    class ReceiptsPrinter

      # requires you to be in this subdirectory for safety reasons (FileUtils.rm_r is used.)
      RECEIPTS_ROOT = "/Users/me/Documents/2008"

      # cups prints to this directory
      CUPS_ROOT = "/Users/me/Desktop/cups-pdf"

      # cups printer name
      CUPS_PRINTER = "CUPS_PDF"

      # 
      #  pass in the subdirectory name which includes the files
      #
      def initialize(subdir)
        # please meet the conditions (checked before anything is touched).
        raise ArgumentError, "specify the subdir to process." unless subdir
        raise "cd to #{ RECEIPTS_ROOT } before starting." if Dir.pwd != RECEIPTS_ROOT
        # backticks return a string, and even "" is truthy, so test for empty output
        raise "install pdfcombine command line tool." if `which pdfcombine`.strip.empty?
        raise "install cups to enable conversions: http://www.codepoetry.net/projects/cups-pdf-for-mosx" unless `lpstat -a`.include?(CUPS_PRINTER)

        @subdir = subdir
        @files = Dir.glob("#{ subdir }/*.*")
        @export_dir = "#{ subdir }/export/"
        @logger = Logger.new(STDOUT)

        # sets up a fresh export directory (rm_rf ignores a missing directory)
        FileUtils.rm_rf @export_dir
        FileUtils.mkdir_p @export_dir
      end

      #
      #  iterates through all files in subdirectory, converting them to pdf if necessary, then combines them
      #  into a single pdf. 
      #
      def print
        @files.each { |file| convert(file) }
        packfile = combine

        @logger.info "Done! Packed it up into #{ packfile }."
      end

      private
        def convert(file)
          case type(file)
          when "pdf" then stage(file)
          when "tiff"
            # lp only prints the first page, so convert each page separately
            extract_tiff_pages(file).each do |page_file|
              stage cups_converter(page_file)
            end
          else
            stage cups_converter(file)
          end
        end

        def stage(file)
          @logger.info "staging file #{ file }"
          dest = @export_dir + File.basename(file)
          # FileUtils.cp copes with spaces in file names, unlike an unquoted shell cp
          FileUtils.cp(file, dest)
        end

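        def combine
          # sort explicitly; older Rubies return glob results in file-system order
          exports = Dir.glob("#{ @export_dir }*.pdf").sort
          # @export_dir already ends with a slash
          packfile = "#{ @export_dir }#{ @subdir }-packed.pdf"
          command = "pdfcombine #{ exports.join(" ") } -o #{ packfile }"

          # pack it up!
          `#{ command }`

          packfile
        end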

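        def cups_converter(file)
          command = "lp -d #{ CUPS_PRINTER } #{ file }"
          # lp answers "request id is CUPS_PDF-42 (1 file(s))", or "Auftrags-ID ist ..." in a German locale.
          # a regexp literal keeps \d intact; inside a double-quoted string it would collapse to a plain "d".
          regexp = /(?:request id is|Auftrags-ID ist) #{ Regexp.escape(CUPS_PRINTER) }-(\d+)/
          output = `#{ command }`
          id = output[regexp, 1]
          raise "cups printing failed for #{ file }: #{ output }" unless id

          # give the thing a moment to generate
          sleep 1 until (pdf = Dir.glob("#{ CUPS_ROOT }/job_#{ id }-*.pdf").first)

          pdf
        end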

        def extract_tiff_pages(file)
          (1..tiff_page_number(file)).map do |page|
            extract_tiff_page(file, page - 1)
          end
        end

        def extract_tiff_page(file, page)
          filename = "#{ @export_dir }#{ page }-#{ file.split('/').last }"

          @logger.info "tiff: #{ filename }"
          `tiffutil -extract #{ page } #{ file } -out #{ filename }`

          filename
        end

        def tiff_page_number(file)
          `tiffutil -info #{ file } | grep "Directory at" | wc -l`.strip.to_i
        end

        # extension without the dot, lowercased so "PDF" and "pdf" are treated alike
        def type(file); File.extname(file).delete(".").downcase; end
    end

    printer = ReceiptsPrinter.new(month = ARGV.first)
    puts ">>>>> Please back up the files in #{ month } before doing this. Starting; press Ctrl-C to abort. \n\n"
    printer.print
