Optimizing PDF Compression
Some PDF files use suboptimal image compression; this page lists ways to fix this by recompressing or recreating the file. The methods described here should be lossless: if a file cannot be compressed losslessly, it is better to submit the larger file.
Methods
Using PDF Split and Merge
See here for information on this program.
Using pdfimages
This method reconstructs the PDF file by extracting its images, recompressing them, and building a new PDF file from the recompressed images. The following Ruby script does this.
Prerequisites
- The Ruby interpreter for running the script.
- PDFtk
- The convert command from ImageMagick.
- The tiff2pdf command from LibTIFF; Windows port can be found here.
- The pdfimages command from the xpdf tools.
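Before running the script, it may help to confirm that the external commands listed above are actually on the PATH. The following is a minimal sketch of such a check; the helper names which and missing_tools are hypothetical and are not part of the script below.

```ruby
# Hypothetical helper: return the full path of a command found on PATH, or nil.
def which(command)
  # On Windows, PATHEXT lists the executable extensions; elsewhere use the bare name.
  extensions = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]
  directories = ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).reject(&:empty?)
  directories.each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, command + ext)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# The external commands named in the prerequisites list.
REQUIRED_TOOLS = %w[pdftk convert tiff2pdf pdfimages]

# Hypothetical helper: return the tools from the list that could not be found.
def missing_tools(tools = REQUIRED_TOOLS)
  tools.reject { |tool| which(tool) }
end

missing = missing_tools
if missing.empty?
  puts "All required tools were found."
else
  puts "Missing tools: " + missing.join(", ")
end
```

Note that on Windows a command named convert also ships with the operating system, so finding convert on the PATH does not by itself prove that ImageMagick is installed.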
What PDF files this method cannot be used on
- Grayscale and colour PDF files (Group4 compression only stores black-and-white images, so these would lose information)
- PDF files that do not contain images (i.e. re-typeset PDF files)
- Other PDF files created by unusual methods (generally rare)
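One rough way to spot the first two cases is to look at what pdfimages extracted. The sketch below assumes that pdfimages writes black-and-white images as .pbm files and everything else under another extension (such as .ppm); that naming is an assumption to check against your pdfimages version, and the helper name suitability is hypothetical.

```ruby
# Hypothetical helper: classify a list of file names written by pdfimages.
# Assumption: 1-bit (black-and-white) images are saved with a .pbm extension,
# while grayscale and colour images get some other extension.
def suitability(extracted_files)
  # Nothing extracted: the PDF holds no images (e.g. a re-typeset score).
  return :no_images if extracted_files.empty?

  non_bilevel = extracted_files.reject { |name| File.extname(name).downcase == ".pbm" }
  non_bilevel.empty? ? :ok : :not_monochrome
end

# Example with made-up file names:
puts suitability(["tmp/images-000.pbm", "tmp/images-001.ppm"])
```

A result other than :ok suggests leaving the file alone rather than recompressing it with this method.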
Common problems and solutions
- The resulting PDF file has its colours inverted
- Solution: run the script with -negate as the last argument: ./pdfcompress.rb somefile.pdf -negate on *nix, or ruby pdfcompress.rb somefile.pdf -negate on Windows.
#!/usr/bin/ruby
require( 'fileutils' )
BASICCONVERTOPTIONS = " -compress Group4"
DELETEIGNOREFILE = false #Automatically delete files which grow in size after recompression?
TMPDIRNAME = "tmpx139toslw"
if ARGV[0].nil?
  $stderr.puts "Syntax: pdfcompress.rb <PDF file> ( <additional convert options> )"
  exit 1
end
# Additional convert options (e.g. -negate) are placed before the basic ones
if ARGV[1].nil?
  convertoptions = BASICCONVERTOPTIONS
else
  convertoptions = ARGV[1] + BASICCONVERTOPTIONS
end
begin
  Dir.mkdir( TMPDIRNAME )
  file = ARGV[0]
  $stderr.puts "Processing file " + file + "..."
  # Extract the images from the PDF into the temp dir
  system( "pdfimages", file, File.join( TMPDIRNAME, "images" ) )
  # Recompress each image to a Group4 TIFF, then wrap it in a one-page PDF
  Dir.glob( File.join( TMPDIRNAME, "*" ) ).sort.each { |imagefile|
    $stderr.print "\rCompressing " + File.basename( imagefile ) + "..."
    tifffile = imagefile.sub( /\.[^.]*$/, ".tiff" )
    pdffile = imagefile.sub( /\.[^.]*$/, ".pdf" )
    # Passing arguments separately avoids shell quoting problems in file names
    system( "convert", *convertoptions.split, imagefile, tifffile )
    system( "tiff2pdf", "-o", pdffile, tifffile )
  }
  $stderr.print "\n"
  # Put them all together now; sort so the pages stay in order
  $stderr.print "Combining PDF files... "
  output_filename = File.basename( file, File.extname( file ) ) + ".2.pdf"
  pagefiles = Dir.glob( File.join( TMPDIRNAME, "*.pdf" ) ).sort
  system( "pdftk", *pagefiles, "cat", "output", output_filename )
  $stderr.print "Done\n"
  # Compare the sizes
  if File.size( file ) > File.size( output_filename )
    $stdout.puts "Compressed file " + File.basename( file ) + " - Compressed from " + File.size( file ).to_s + " to " + File.size( output_filename ).to_s
  else
    $stdout.puts "Ignored file " + File.basename( file ) + " - Changed from " + File.size( file ).to_s + " to " + File.size( output_filename ).to_s
    File.delete( output_filename ) if DELETEIGNOREFILE
  end
ensure
  # Clean up the temp dir and everything in it
  FileUtils.rm_rf( TMPDIRNAME )
end

