Digitization Procedures

Integrate left & right sides of the batch edit

  1. Open NameWiz.
  2. Name the files in their respective folders. Do the left folder first then the right one.
  3. Enter the following information:
    • Enable Alphanumeric, Increment & Numerical.
    • The Padding Mask should be the name of the file followed by an 'e' for edit and four zeros (ex. abce0000).
    • Start with is zero if doing the left side and one if doing the right side.
    • Increment by is two.
  4. Click Preview. Scroll down to check the results. If everything is OK, click Apply.
  5. Scroll down again to make sure all the files were renamed properly. This will be indicated by an 'ok' beside each file. If 'error' appears beside any of the files, click Undo and try again.
  6. If everything was renamed properly, click Reset and repeat for the right side (except start with 1).
  7. Once you have named the left and right images in their respective folder create another folder and combine the images.
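
The numbering scheme above interleaves the two sides: left-hand images take the even numbers and right-hand images take the odd numbers, so the combined folder sorts into page order. The following Python sketch only illustrates the naming pattern. The function name edit_names and the book code abc are made up for the example and are not part of NameWiz.

```python
def edit_names(book_code, count, side):
    """Return the edit file names expected for one side of the book.

    side: "left" starts at 0, "right" starts at 1; both increment by two.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    start = 0 if side == "left" else 1
    # Padding mask: book code + 'e' for edit + four-digit counter.
    return [f"{book_code}e{start + 2 * i:04d}" for i in range(count)]


left = edit_names("abc", 3, "left")    # abce0000, abce0002, abce0004
right = edit_names("abc", 3, "right")  # abce0001, abce0003, abce0005

# Combining both folders and sorting by name gives the page order.
combined = sorted(left + right)
print(combined)
```

If the combined list has a gap or a duplicate, one of the two sides was most likely renamed with the wrong start value.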

QA/Edit

  1. Make sure you have the book with you to QA.
  2. Open ThumbsPlus. Select the folder containing the batched edits you wish to QA. Once the folder is selected, ThumbsPlus will automatically load the images in the view area.
  3. Once the images are loaded, double click on the first image to display the full image (If the image does not fit to the screen select View>Zoom>Fit All or hit F6 to display the full image. Hitting F6 again will set the zoom to 100%).
  4. The first thing you should notice is that the covers are in grayscale, and the back cover follows the front cover (e.g. front cover = lwge0000, back cover = lwge0001). Replace the front cover with the colour cover scan (done previously). Delete the grayscale back cover (lwge0001), and insert the colour back cover scan after the very last scanned image.
  5. You can now view each image in succession by hitting the space bar to move forward, and backspace to move backwards. Examine each image comparing it to the page in the actual book. Make sure to keep track of file numbers that need corrections and a description of the problem. Check for the following problems:
    • make sure there are no missing pages and that all the pages are in order
    • levels (image is not too dark or too light)
    • blemishes that obscure the text (such as dust or hair that was picked up during scanning). Do not correct anything that's in the original document
    • uneven text (Note: Some books will have pages that contain text that has been printed unevenly on the page. Make sure the text is straight even if the page itself happens to be crooked). If you are ever unsure if text should be straightened, use the grid in Photoshop to check. If the text falls well outside a gridline then it probably needs straightening. If you're still unsure, please ask. An easy method is to ask yourself, 'Is the text distracting to read?' Remember we are looking for straight text, but it does not have to be perfect
    • watch for wigglies (Wigglies occur when the book gets bumped or moved while it is being scanned. The only way to correct this problem is to rescan the page)
    • it is recommended that there be a black border around all four edges of the page. It's not vital to correct this problem unless there are only a few pages to correct or if all the pages need the same correction. If the latter is the case, a batch can correct this. In either case the point is not to spend any unnecessary time correcting a purely esthetic issue
    • watch for 'bleed through'. This occurs when the pages are really thin and the text from the other side of the page comes through and creates a 'dirty' looking image. This problem can have a significant effect on OCRing. If the problem occurs throughout the book and the bleed-through is severe enough, the book will most likely have to be rebatched. If there is only a small number of pages to correct they can be done individually
    • all the images should be grayscale (except colour covers and full colour inserts). The only way to tell in Thumbs if an image isn't grayscale is that the image looks a little funny when you have the image zoomed in to full screen size (Thumbs reads indexed colour images as grayscale). To double check the image open Photoshop, go to Image>Mode> Grayscale. Grayscale should be checked off
  6. Once you've checked each individual image, select options>display>list view then scroll through the thumbnails watching the information beside each image for resolution (must be 300dpi) and any unusual file dimensions (ex. If all the images are averaging a size around 2000x3000x8, give or take a few hundred pixels, and you come across an image that's 450x1500x8, that image should be flagged).
  7. Once you have made a note of all of the problems in the book, open Photoshop and make the corrections. If you have only a few pages that need to be rescanned and both scanners are occupied ask if you can squeeze in a few scans (This gives the person scanning a book a well deserved break). If you have many images that need to be rescanned then it is better to wait for a more opportune time.
  8. If the covers of the book have not been scanned in colour yet please do so. Until this step is done the book is not finished the QA stage.
  9. If you are unable to finish your QA during your shift make sure your QA correction notes are clear if someone else will be taking over. If you will be finishing the corrections later be sure to keep your notes in a safe place such as the book itself or in the red binder.
  10. When all the corrections have been made the last step is to make sure everything is numbered properly and that there are no missing numbers, etc. NameWiz can re-number everything again (After any renumbering you can quickly check the new order in Thumbs because it will update its database immediately). Once the book has been renumbered and you are absolutely sure there are no further corrections to be made then you are finished this stage and the book is ready to be burned onto a CD.

Note: If the problem cannot easily be corrected in Photoshop, rescan the image. Example: Trying to erase a blemish we inadvertently caused might take 15 min. to correct; rescanning the image would be faster.
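
Two of the QA checks above (unusual file dimensions in step 6 and missing numbers in step 10) can also be expressed as a short script. This is only a sketch of the idea, not a replacement for checking the images in ThumbsPlus. The function names and the 25% tolerance are assumptions chosen for the example, and the image sizes are typed in by hand rather than read from the files.

```python
import re
from statistics import median


def missing_numbers(names):
    """Return the counters missing between the lowest and highest file number."""
    numbers = set()
    for name in names:
        # Four-digit counter at the end of the name, extension optional.
        match = re.search(r"(\d{4})(?:\.\w+)?$", name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]


def odd_dimensions(images, tolerance=0.25):
    """Flag images whose width or height is far from the median.

    images: list of (name, width, height) tuples.
    tolerance: allowed fractional deviation (assumed value, 0.25 = 25%).
    """
    if not images:
        return []
    mid_w = median(w for _, w, _ in images)
    mid_h = median(h for _, _, h in images)
    return [
        name
        for name, w, h in images
        if abs(w - mid_w) > tolerance * mid_w or abs(h - mid_h) > tolerance * mid_h
    ]


print(missing_numbers(["lwge0000", "lwge0001", "lwge0003"]))
print(odd_dimensions([
    ("lwge0000", 2000, 3000),
    ("lwge0001", 2100, 2950),
    ("lwge0002", 450, 1500),
    ("lwge0003", 1950, 3050),
]))
```

Note that missing_numbers cannot detect pages missing before the first file or after the last one, so the book still has to be compared against the images.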

Burn the edits to CD

(General Guidelines For Burning A CD).

  • Create a new data CD layout for each CD you burn (Don't save the previous layout)
  • Give the CD the appropriate name and number (Be careful with the CD numbers)
  • Avoid burning a CD over the network unless it is absolutely necessary. If you must burn over the network create an ISO image first
  • Make sure the file system is ISO9660
  • Burn the CD as a single data track and close the CD (also displayed as Disc-at-once)
  • The mode should be '1' CDROM not '2' CDROM XA
  • Select the proper write speed. Normally you would choose Max Speed supported by the burner but you might have to specify a write speed which should be the fastest based on the CD burner and/or the CD-R (Generally 4x-8x is used)
  • You don't have to use the file navigation system on the CD burner software. You can drag and drop the images you need to burn into the appropriate area
  • Make sure the file names are no more than eight characters (not including the file extension). Selecting ISO9660 limits the filename size to eight characters; anything beyond that gets truncated, which could potentially scramble the filenames and their order
  • If you ever get a buffer under-run error or the software crashes, restart the computer to clear the buffer
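
The eight-character filename limit mentioned above can be checked before a CD layout is created. The sketch below is a hypothetical helper, not a feature of the CD burner software. It takes a list of file names and returns the ones that would be truncated.

```python
import os


def too_long_for_iso9660(filenames, limit=8):
    """Return the names whose base part (without the extension) exceeds the limit."""
    return [name for name in filenames if len(os.path.splitext(name)[0]) > limit]


# 'lwge0000' is exactly eight characters; 'lwge00001' is nine.
print(too_long_for_iso9660(["lwge0000.tif", "lwge00001.tif"]))
```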

Web Processing

  1. Create 5 folders in the book's directory. Call them a, b, c, z and uncompressed (the uncompressed files will be used for OCRing later).
  2. Open Photoshop. If the History/Actions window is not already open, click Window > Show Actions to open it. Create a new folder by clicking on the folder icon at the bottom of the Actions window. Name it with your book code and the process (for example abc_web or ddd_batch). Click on the file icon to create an action.
  3. Click on File > Open to open an edit image. Double click the magnifying glass to zoom in at 100%.
  4. Click File > Automate > Fit Image to adjust the image size. First start off with 800 pixels (width) x 600 pixels (height).
  5. Select Filter > Sharpen > Unsharp Mask to sharpen the image. You want the text on the page to be clear and easy to read. The following settings are a guideline:
    • Amount: 125% - 150%
    • Radius: 1.0 - 1.5 pixels
    • Threshold: 10 - 15 levels

    • Change levels if necessary; sharpening can cause the text to darken.
  6. Select File > Save As, then choose the 'a' folder. Save the image as a JPEG. A JPEG Options menu will appear. Make sure the settings are as follows:
    • Quality: 5, Medium
    • Format: Baseline ("Standard")
    • Size: 56.6Kbps
  7. Click on History at the top of the History/Actions window. In that window, under actions, click on Open to go back to the very beginning.
  8. Go back to the actions palette.
  9. Repeat steps 4-7 with the following image sizes:
    • 1024 x 768 (b)
    • 10000 x 1000 (c)
    • 256 x 256 (z)

    • Note: the 256x256 thumbnail image can be sharpened much more than the other sizes.
  10. Click File > Save As. Choose the uncompressed folder and select TIFF in the Save As box. When a TIFF Options menu pops up, uncheck LZW Compression. This will prepare the files for the OCR process.
  11. When done, close the image only (NOT Photoshop). Stop recording the action. Click File > Automate > Batch to set up the web process. Use the following settings:
    • Set: (your folder name, e.g. abc_web)
    • Action: (your action, e.g. action 2)
    • Source: Folder
    • Choose: (select your edits folder)
    • Check box for Override Action "Open" commands
    • Destination: Folder
    • Choose: (select the book directory, e.g. C:\_law_books\abc)
    • Do not check the box for Override Action "Save in" commands
    • Errors: Log errors to file
    • Save As: (save in your book directory with a name like web_errors)

    • Click OK when done.
  12. When the web processing batch is done and you've checked to make sure the error log does not show any errors, open NameWiz and rename the files in their folders with the appropriate names.
  13. Once the files have been renamed combine all the web images into one folder then delete the empty folders (a,b,c,z)
  14. Burn the web files.

Note: There are occasions when these sizes do not apply (e.g. when we have a particularly large book with small text). If this occurs please adjust the image sizes to display the text at similar sizes as other books.
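
Fit Image scales a page proportionally so that it fits inside the width and height entered, which means a tall page is limited by the height value rather than the width. The sketch below shows that arithmetic for a typical 2000 x 3000 edit image. The function name fit_within is made up for the example, and Photoshop's own rounding may differ by a pixel.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) proportionally to fit inside the given box."""
    scale = min(max_width / width, max_height / height)
    return round(width * scale), round(height * scale)


# A 2000 x 3000 page fitted to the 'a' size and the 'z' thumbnail size.
print(fit_within(2000, 3000, 800, 600))   # limited by the 600 pixel height
print(fit_within(2000, 3000, 256, 256))   # limited by the 256 pixel height
```

Running the numbers like this for a large book with small text shows how wide the page will actually be at each size before the batch is set up.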

OCR

  1. Open ABBYY.
  2. Choose File>New Batch. Name the batch the same as the book code you are working on.
  3. Select Open & Read. Highlight all the images in the uncompressed folder that's within the web folder of the book you're working on. Click Open.
  4. Check to make sure all the images have been read. If any images were missed read those images.
  5. When all the images have been read select File>Save Text As and enter the following settings:
    • Save as Type: text document (*.txt)
    • Save Pages: all pages
    • File Options: Create a separate file for each page
    • Format: keep line breaks and use blank lines
  6. After closing ABBYY open NameWiz and rename files to correspond with the book code (e.g. lwg0000.txt; there is no 'e', 'a', 'b', etc.)
  7. Check the OCRs to make sure they match the images.
  8. When you are positive that all the OCRs are completed and accurate you can delete the uncompressed files.
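
Before deleting the uncompressed files, it can help to confirm that every image has a text file with the same four-digit counter. The sketch below compares two lists of file names. The function names are hypothetical, and the check only shows that a text file exists for each page; it does not tell you whether the OCR text is accurate, which still has to be checked by eye as in step 7.

```python
import re


def counters(names):
    """Collect the four-digit counter found at the end of each file name."""
    found = set()
    for name in names:
        match = re.search(r"(\d{4})(?:\.\w+)?$", name)
        if match:
            found.add(match.group(1))
    return found


def pages_without_text(image_names, text_names):
    """Return the counters that have an image but no OCR text file."""
    return sorted(counters(image_names) - counters(text_names))


print(pages_without_text(
    ["lwge0000.tif", "lwge0001.tif", "lwge0002.tif"],
    ["lwg0000.txt", "lwg0002.txt"],
))
```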

Table of Contents

When you open up the Local History Meta Data Template (Excel) you will find four sections that you need to fill in: Administration, Descriptive, Image Metadata, and Structural Metadata.

ADMINISTRATION SHEET — see attached Administrative worksheet

Fill in as many of the fields as possible.

DESCRIPTIVE SHEET — see attached Descriptive worksheet

To get the information for this sheet you have to go to the U of C Library Website (or the website of the library where the book is from) and search the catalogue for the book's information. Whenever [blank] occurs, delete [blank] and leave the space empty.

  1. Go to your web browser and access the U of C website www.ucalgary.ca
  2. Click on the Library Link located at the top of the page
  3. Click on the Catalogue link located at the top of the page
  4. Insert the title of the book and click title
  5. You will see a list of books, click on the book title that you want

IMAGE METADATA SHEET — see attached example Image Metadata — 11t

To complete this page:

  1. Close the table of contents that you are working on.
  2. Open up the ImageValues program.
  3. In the Source Directory select your edited images from the computer.
  4. Click on Select Excel sheet, and choose the table you are working on.
  5. At the bottom left you will see Ready. At this point click the Start button. ImageValues should now start counting through all the images.
  6. When the book has been processed it will say Ready again. You can now re-open the table of contents.

The Filename, Image Height, Width, and CRC-32 will now contain values for the book. If the filename pages are out of order select the first five columns (not including headers), then select data from the menu, and click sort>sort by>column A>Ascending. Click ok.
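
For reference, a CRC-32 value is a checksum of the file's bytes, so it changes if the image file is altered after the sheet is filled in. The sketch below shows how a CRC-32 can be computed with Python's standard zlib module. Whether ImageValues uses exactly this CRC-32 variant and output format is an assumption, so compare against a known value from the sheet before relying on it.

```python
import zlib


def crc32_hex(data):
    """Return the CRC-32 of a bytes object as eight uppercase hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def file_crc32(path, chunk_size=65536):
    """Return the CRC-32 of a file, reading it in chunks to limit memory use."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


# Standard CRC-32 check value for the ASCII digits 1 to 9.
print(crc32_hex(b"123456789"))
```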

Fill in missing information:

CD - Insert only the number of the CD where the edits are located (e.g. 1870). Do not enter the full name of the CD.

Page - Insert the page numbers for the book (e.g. if page number 1 begins at image 0010 and ends at image 0232, insert the first two page numbers, highlight them and drag them down until 0232 is reached). If there are breaks in the page numbers, highlight the spaces where there are no page numbers and right click (insert>shift cells down).

Order - Enter the order of the pages. If the book was not microfiche, enter the number 1 in the first field and the number 2 in the field directly below it. Highlight the first two cells you entered and drag down to where the file codes end. If the book was microfiche, the actual front of the book usually doesn't start until image 3 or 4, for example. For microfiche don't insert the number 1 in the first field (image 0000). Instead, enter the number 1 at the actual start (or cover page) of the book. Highlight the first two cells you entered and drag down to where the actual book ends, not where the file codes end (The last image of any microfiche is usually a symbol>end). Leave the fields empty for any images past where the actual book ends and for any blank pages that don't have a number associated with the page.
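
The Order column for a microfiche book can be pictured as a list with empty fields before the cover and after the end of the book. The sketch below is only an illustration with a hypothetical function name. It does not handle blank pages inside the book, which still have to be cleared by hand in the sheet.

```python
def order_column(image_count, first_image=0, last_image=None):
    """Return the Order value for each image, or None where the field stays empty.

    first_image: index of the image where the actual book starts.
    last_image: index of the image where the actual book ends
                (defaults to the last image).
    """
    if last_image is None:
        last_image = image_count - 1
    column = []
    for index in range(image_count):
        if first_image <= index <= last_image:
            column.append(index - first_image + 1)
        else:
            column.append(None)  # left empty in the sheet
    return column


# Non-microfiche book: numbering starts at the first image.
print(order_column(4))
# Microfiche: the book starts at image 0003 and ends at image 0005.
print(order_column(7, first_image=3, last_image=5))
```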

STRUCTURAL METADATA — see attached example of Structural Metadata — 11t

Chapter Name: - Insert whatever title headings there are (e.g. Index, Table of Contents, Chapter Heading, etc.)

Number: - Insert the Chapter number when available

Order: - Insert the order of the titles, starting with 0 for the cover

Page File Reference: - Insert the file reference to the Chapter name (e.g. lwg0004). We use only the 3-letter volume code [lwg] + the four-digit counter [0000].
