User:Geniusboy98/Obtaining scores from NYPO Archive

Obtaining and cleaning scores/parts from NYPO Archives

Be sure to check out Mazin's guide as well.

  1. Obtain Piupiannissimo's Java Downloader.
  2. Find the archive tag in the URL of the file you wish to download:
    (Image: archive tag in the URL)
  3. Use the program to download the score as described here.
    • On OS X, the downloader might save the images without a file extension, and ScanTailor rejects files that lack one. If this happens, open a terminal, change to the directory containing the images, and run for file in *; do mv "$file" "$file.jpg"; done
  4. Install ScanTailor and ImageMagick.
    • If using Windows, download the installers from each program's download page.
    • If using Linux, install the scantailor and imagemagick packages with your distribution's package manager.
    • If using OS X, install Homebrew and run the command brew install scantailor imagemagick from the Terminal.
  5. Import the images into ScanTailor. Save the project in the same directory as the images.
    (TODO: add screenshots!)
  6. Complete the first five of ScanTailor's six steps, listed in the left panel:
    1. Click the arrow next to Fix Orientation. Let ScanTailor auto-process, then check that each image is correctly oriented.
    2. If the scanned images are 2-up, click the arrow next to Split Pages. Let ScanTailor auto-process, and ensure that it correctly identifies the page splits. Even if the pages are not 2-up, check every image to make sure the crop bars do not cut off any text; you will probably have to make adjustments.
    3. Click the arrow next to Deskew. Let ScanTailor auto-process, then check that each image was rotated correctly. Unless the original scan is particularly poor, the angle for each image should fall between about -1.5° and +1.5°.
    4. Click the arrow next to Select Content. Let ScanTailor automatically identify the crop region in each image, and then check each image to ensure ScanTailor got it right. If a crop bar is incorrect, drag it to the appropriate position. Make sure no page numbers, plate numbers, staff names, braces, etc. are cut off. If a crop bar reaches the edge of the image or leaves more margin than necessary, drag it inward to remove the excess white space. Also crop out any pencil markings in the margins.
    5. Click the arrow next to Margins. The default settings are appropriate. No manual corrections are necessary in this step.
  7. Click the Output entry, but don't click its arrow yet! Make sure that 600 is selected under Output Resolution (DPI), Black and White under Mode, and the smallest brush under Despeckling.
    (TODO: dealing with page warping)
  8. This step takes the longest. Select the Fill Zones tab on the right, and on each image draw polygons over every handwritten mark and stamp, taking care not to cover any printed notation.
    Use your judgement. Most pencil markings can be erased, but if they appear to correct an error, leave them. If they mark a cut, erase them. Erase written-in accidentals unless they correct an error. Erase the New York Philharmonic Society stamp or any other seal if present. SAVE THE PROJECT FILE FREQUENTLY!
  9. After you have cleaned each page and double-checked your work, click the Output arrow at left. Let it auto-process; no further manual changes are necessary.
  10. In a terminal, change to the out folder that ScanTailor created inside the folder containing the original images. It holds the processed high-quality TIF images. To produce the final PDF, run this command:
    convert -compress group4 *.tif SCORE_NAME.pdf
    The convert command is part of the ImageMagick suite. On Windows, you might have to give the full path to convert in the Program Files\ImageMagick directory, because Windows ships an unrelated convert.exe utility of its own.
  11. The PDF is ready to be uploaded! In the Misc. Info field, make sure to include the {{HiRes}} template, and in the Scanner field add {{NYPODA|<insert original NYPO artifact tag here>}}.
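The renaming loop in step 3 appends .jpg to every file in the directory, including any that already have an extension. A slightly more careful variant, sketched here as a POSIX shell function, only renames regular files whose names contain no dot. Like the original loop, it assumes the extensionless files are JPEG images; the function name add_jpg_ext is hypothetical.

```shell
# Hypothetical helper: give a .jpg extension to every regular file in a
# directory whose name has no dot in it. Assumes those files are JPEGs.
add_jpg_ext() {
  dir=${1:-.}                        # default to the current directory
  for file in "$dir"/*; do
    [ -f "$file" ] || continue       # skip subdirectories and an unmatched glob
    name=${file##*/}                 # strip the directory part
    case "$name" in
      *.*) ;;                        # already has an extension: leave it alone
      *) mv -- "$file" "$file.jpg" ;;
    esac
  done
}
```

Paste the definition into the terminal, then run, for example, add_jpg_ext /path/to/images.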
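The PDF command in step 10 can also be wrapped in a small shell function that stops with a message when ImageMagick or the TIF files are missing. This is a sketch, not part of the original workflow: make_score_pdf is a hypothetical name, and it prefers ImageMagick 7's magick command, falling back to convert for ImageMagick 6. Because the shell expands *.tif in sorted order, it assumes the image file names already sort in page order (for example, zero-padded page numbers).

```shell
# Hypothetical helper: combine ScanTailor's TIF output into one
# Group 4 (CCITT fax) compressed PDF.
# Usage: make_score_pdf OUT_DIR OUTPUT.pdf
make_score_pdf() {
  out_dir=$1
  pdf=$2
  # Prefer ImageMagick 7's "magick"; fall back to ImageMagick 6's "convert".
  if command -v magick >/dev/null 2>&1; then
    im=magick
  elif command -v convert >/dev/null 2>&1; then
    im=convert
  else
    echo "ImageMagick not found on PATH" >&2
    return 1
  fi
  # Only pick up .tif files, so subfolders or an earlier PDF are ignored.
  set -- "$out_dir"/*.tif
  if [ ! -f "$1" ]; then
    echo "No .tif files found in $out_dir" >&2
    return 1
  fi
  "$im" -compress group4 "$@" "$pdf"
}
```

For example, from inside the out folder: make_score_pdf . SCORE_NAME.pdf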