Oct

Ruby and Bash FTW
here’s a little thing that made me happy today. i spent some time automating pdf printing. i have a directory structure like so: receipts/2008/10. when needed, everything is packed into a single pdf and sent to the printer down the road. to do this, every file has to be converted to pdf and combined into a single file. input formats are rtf, txt, pdf or multipage tiff.
so today i wrote a dirty ruby script for this. mashing little unix tools into your ruby code is good. dirty, trash-eating shell opening code, i salute you! it’s taken me a while to come around to this, being the ex-java i am. 2 years away from jcp.org seem to have done me good ;)
Low effort pdf conversion using the shell
sam suggested i should look into ruby-cocoa, since preview can print many formats to pdf. but a bit of googling led me to CUPS-PDF, with which you can set up a printer that dumps pdfs into a specified directory. The setup is described at http://www.codepoetry.net/projects/cups-pdf-for-mosx.
So let’s see if we have everything we need to get started. First off, let’s check if the CUPS printer has been installed:
raise Exception.new("install cups.") unless `lpstat -a` =~ Regexp.new(CUPS_PRINTER)
lpstat -a gives you a list of all printing devices configured on your machine. We search this output for CUPS_PRINTER with a regular expression match, which returns nil (and so triggers the raise) when the printer is missing. We’ll actually use this printer to convert input files, like rtfs, into pdf files on disk.
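A plain regex match also fires on printers whose name merely contains CUPS_PRINTER. If that matters, here is a minimal sketch of a stricter check. It assumes each line of lpstat -a output starts with the printer name; printer_listed? is a made-up helper name and the sample output is invented for illustration.

```ruby
# Returns true when the `lpstat -a` output lists a printer with exactly this name.
# Assumption: every line of the output starts with the printer name.
def printer_listed?(lpstat_output, printer_name)
  lpstat_output.each_line.any? { |line| line.split.first == printer_name }
end

# Invented sample output, for illustration only.
sample = <<~OUT
  CUPS_PDF accepting requests since Sat Oct 11 10:00:00 2008
  Office_Laser accepting requests since Sat Oct 11 10:00:00 2008
OUT

puts printer_listed?(sample, "CUPS_PDF")  # prints true
puts printer_listed?(sample, "CUPS")      # prints false (prefix only, no match)
```

In the script this would be called as printer_listed?(`lpstat -a`, CUPS_PRINTER).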
So now we want to grab the files and pass them to the printer. This should all be automatic. A quick search reveals lp:
`lp -d #{ CUPS_PRINTER } #{ file }`
Nice, this now dumps the converted file into the CUPS root directory, which by default is cups-pdf on your desktop.
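To find out which pdf in that directory belongs to which input file, the job id that lp reports comes in handy (the full script below relies on it). A rough sketch of pulling the id out of lp’s output follows. It assumes the id shows up as "<printer>-<number>", as in the English "request id is CUPS_PDF-42 (1 file(s))"; other system languages word the line differently. job_id is a hypothetical helper name.

```ruby
# Extracts the numeric job id from the line `lp` prints after queuing a job.
# Assumption: the id appears as "<printer>-<number>" somewhere in the output.
def job_id(lp_output, printer_name)
  match = lp_output.match(/#{Regexp.escape(printer_name)}-(\d+)/)
  match && match[1].to_i
end

puts job_id("request id is CUPS_PDF-42 (1 file(s))", "CUPS_PDF")  # prints 42
p job_id("", "CUPS_PDF")                                          # prints nil
```

Matching only on the printer name and number, rather than the whole sentence, keeps the check independent of the system language.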
Multipage Tiffs
Unfortunately lp only prints the first page of a multipage TIFF to pdf. So what we need to do is extract the pages from the tiff one by one. Let’s see if there’s anything available:
man -k tiff
This shows us tiffutil, which can be utilized to ‘manipulate tiff images’. Let’s see how this works:
man tiffutil
Bingo, there’s an -extract option which pulls out the specified page. So now we just need to know how many pages are in the tiff. There’s an -info option that gives us the information we need. Each image is described with an entry that starts with “Directory at”. Let’s count the number of times this appears:
`tiffutil -info #{ file } | grep "Directory at" | wc -l`.strip.to_i
Here we’re piping the output of tiffutil to grep, which results in a line per found image. We pipe this through wc -l to count the number of images.
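The same count can be done on the ruby side, without grep and wc. This is only a sketch: tiff_page_count is a made-up name, and the sample text is a stand-in in which nothing but the “Directory at” marker comes from the real tool.

```ruby
# Counts the images in a tiff by counting the lines that contain "Directory at",
# the same thing `grep "Directory at" | wc -l` does.
def tiff_page_count(info_output)
  info_output.each_line.count { |line| line.include?("Directory at") }
end

# Made-up stand-in for `tiffutil -info` output; only the "Directory at"
# marker is taken from the text above, the rest is placeholder.
sample = <<~INFO
  Directory at 0x8
    (image details)
  Directory at 0x1f40
    (image details)
INFO

puts tiff_page_count(sample)  # prints 2
```

In the script it would be fed with the output of `tiffutil -info #{ file }`.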
Combining the pdf
Thanks again to sam for pointing me to pdfcombine. just specify all files you want to combine, the outfile with -o, and you’re done. So let’s see if it’s installed…
raise Exception.new("install pdfcombine command line tool. ") if `which pdfcombine`.empty?
which searches $PATH for the specified executable and prints its full path, or nothing at all if it isn’t found, hence the empty? check. To run it:
`pdfcombine #{ exports.join(" ") } -o #{ packfile }`
And you’re done!
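One caveat: joining the file names with plain spaces breaks as soon as a receipt has a space in its name. A possible way around that, sketched with ruby’s Shellwords from the standard library (pdfcombine_command is a hypothetical helper, and the file names are invented):

```ruby
require 'shellwords'

# Builds the pdfcombine command line with every argument shell-escaped,
# so file names containing spaces survive the trip through the shell.
def pdfcombine_command(exports, packfile)
  Shellwords.join(["pdfcombine", *exports, "-o", packfile])
end

puts pdfcombine_command(["export/a.pdf", "export/my receipt.pdf"], "export/10-packed.pdf")
# prints: pdfcombine export/a.pdf export/my\ receipt.pdf -o export/10-packed.pdf
```

The resulting string can go straight into backticks. Alternatively, system("pdfcombine", *exports, "-o", packfile) passes the arguments without involving a shell at all.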
The script
require 'fileutils'
require 'logger'

#
# Combines all txt, rtf, tiff multipage, pdf files in a named subdirectory.
# Results in a single pdf called <subdir>-packed.pdf
#
class ReceiptsPrinter
  # requires you to be in this directory for safety reasons (FileUtils.rm_r is used.)
  RECEIPTS_ROOT = "/Users/me/Documents/2008"
  # cups prints to this directory
  CUPS_ROOT = "/Users/me/Desktop/cups-pdf"
  # cups printer name
  CUPS_PRINTER = "CUPS_PDF"

  #
  # pass in the subdirectory name which includes the files
  #
  def initialize(subdir)
    # please meet the conditions.
    raise Exception.new("specify the subdir to process.") unless subdir
    raise Exception.new("cd to #{ RECEIPTS_ROOT } before starting.") if `pwd`.chomp != RECEIPTS_ROOT
    # `which` prints nothing when the tool is missing
    raise Exception.new("install pdfcombine command line tool. ") if `which pdfcombine`.empty?
    raise Exception.new("install cups to enable conversions: http://www.codepoetry.net/projects/cups-pdf-for-mosx ") unless `lpstat -a` =~ Regexp.new(CUPS_PRINTER)

    @subdir = subdir
    @files = Dir.glob("#{ subdir }/*.*")
    @export_dir = "#{ subdir }/export/"
    @logger = Logger.new(STDOUT)

    # sets up the export directory
    FileUtils.rm_r @export_dir rescue nil
    FileUtils.mkdir_p @export_dir
  end

  #
  # iterates through all files in subdirectory, converting them to pdf if necessary, then combines them
  # into a single pdf.
  #
  def print
    @files.each { |file| convert(file) }
    packfile = combine
    @logger.info "Done! Packed it up into #{ packfile }."
  end

  private

  def convert(file)
    case type(file)
    when "pdf" then stage(file)
    when "tiff"
      extract_tiff_pages(file).each do |page|
        stage cups_converter(page)
      end
    else
      stage cups_converter(file)
    end
  end

  def stage(file)
    @logger.info "staging file #{ file }"
    # FileUtils.cp instead of shelling out to cp, so odd file names are safe
    FileUtils.cp file, @export_dir + File.basename(file)
  end

  def combine
    exports = Dir.glob("#{ @export_dir }*.pdf")
    packfile = "#{ @export_dir }#{ @subdir }-packed.pdf"
    command = "pdfcombine #{ exports.join(" ") } -o #{ packfile }"
    # pack it up!
    `#{ command }`
    packfile
  end

  def cups_converter(file)
    command = "lp -d #{ CUPS_PRINTER } #{ file }"
    # lp reports the job id in the system language; this matches the German output
    # (an English system prints "request id is ..." instead).
    regexp = /Auftrags-ID ist #{ CUPS_PRINTER }-(\d+) .*/
    string, id = *`#{ command }`.match(regexp)
    raise Exception.new("cups printing failed for #{ file }: #{ string }") unless id
    # give the thing a moment to generate
    sleep 1 while !(pdf = Dir.glob("#{ CUPS_ROOT }/job_#{ id }-*.pdf").first)
    pdf
  end

  def extract_tiff_pages(file)
    (1..tiff_page_number(file)).map do |page|
      # tiffutil counts pages from 0
      extract_tiff_page(file, page - 1)
    end
  end

  def extract_tiff_page(file, page)
    filename = "#{ @export_dir }#{ page }-#{ File.basename(file) }"
    @logger.info "tiff: #{ filename }"
    `tiffutil -extract #{ page } #{ file } -out #{ filename }`
    filename
  end

  def tiff_page_number(file)
    `tiffutil -info #{ file } | grep "Directory at" | wc -l`.strip.to_i
  end

  def type(file); file.split(".").last; end
end
month = ARGV.first
puts ">>>>> Please back up the files in #{ month } before doing this. Starting; press CTRL-C to abort. \n\n"
ReceiptsPrinter.new(month).print
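One thing the script glosses over: cups_converter keeps sleeping forever if the pdf never shows up in the cups directory. A minimal sketch of a polling loop that gives up after a while could look like this. wait_for_file and its timeout and interval parameters are made up for illustration, and the demo uses a temp directory rather than the real CUPS_ROOT.

```ruby
require 'tmpdir'
require 'fileutils'

# Polls until a file matching `pattern` appears and returns its path,
# or returns nil once `timeout` seconds have passed.
def wait_for_file(pattern, timeout: 30, interval: 0.5)
  deadline = Time.now + timeout
  loop do
    found = Dir.glob(pattern).first
    return found if found
    return nil if Time.now >= deadline
    sleep interval
  end
end

# Demo in a temp directory: one job file exists, the other never appears.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "job_7-receipt.pdf"))
  puts wait_for_file(File.join(dir, "job_7-*.pdf"), timeout: 1)
  p wait_for_file(File.join(dir, "job_8-*.pdf"), timeout: 1, interval: 0.1)  # prints nil
end
```

In cups_converter the pattern would be "#{ CUPS_ROOT }/job_#{ id }-*.pdf", with a raise when nil comes back.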


