Transcript E-Text

Processing PDF: How to Go from PDF to E-text to Audio

Gaeir Dietrich, Director
High Tech Center Training Unit of the California Community Colleges
Foothill Community College District

PDF from Publishers

• Portable document format (PDF)
• Reads the same on any computer
• Looks like the book
• Smaller than TIFFs
• Contains all the text
  – Always check to make sure the book is the right one!
• Easy for publishers

Requesting through ATN

• Access Text Network
  – Now free for requesting files from ATN member publishers
  – Paid membership to exchange files
  – www.accesstext.org
• Not all publishers
  – But ATN does have the largest ones

Other Resources at ATN

• Accessible Textbook Finder
  – http://www.accesstext.org/atf.php
• Link to Publisher Lookup
  – http://www.publisherlookup.org/
  – Will have to contact non-ATN member publishers directly

Using Publisher PDFs

• Sometimes students can use files directly
• Most often files will need further processing for student use
• At the very least, large files need to be broken into chapters
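Breaking a large file into chapters comes down to turning a list of chapter start pages into page ranges. The following is a minimal sketch in Python, assuming you have read the start pages off the table of contents yourself. The function name `chapter_ranges` and the sample page numbers are hypothetical, and the actual splitting would still be done in Acrobat Pro or another PDF tool.

```python
def chapter_ranges(start_pages, total_pages):
    """Return (first_page, last_page) for each chapter, using 1-based PDF page numbers.

    start_pages: PDF page number where each chapter begins, in ascending order.
    total_pages: number of pages in the whole PDF.
    """
    if not start_pages:
        return []
    if sorted(set(start_pages)) != list(start_pages):
        raise ValueError("start pages must be unique and ascending")
    if start_pages[0] < 1 or start_pages[-1] > total_pages:
        raise ValueError("start pages must fall inside the document")
    # Each chapter ends on the page before the next chapter starts;
    # the last chapter runs to the end of the file.
    ends = [next_start - 1 for next_start in start_pages[1:]] + [total_pages]
    return list(zip(start_pages, ends))


# Hypothetical 120-page book with chapters starting on PDF pages 9, 41, and 87.
for number, (first, last) in enumerate(chapter_ranges([9, 41, 87], 120), start=1):
    print(f"Chapter {number}: pages {first}-{last}")
```

Pages before the first chapter (front matter) are left out of the ranges on purpose; add a start page of 1 if the student needs them.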

PDF Strengths

• Good format for large print
  – Cropping
  – Fit to page on large pages
  – Print sections on large pages (tiling)
• Adobe Reader has some nice features
  – Change colors
  – Reflow
  – Limited voicing
• Easy for most publishers to create
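To get a feel for what tiling costs in paper, the arithmetic can be sketched in Python. This is a rough estimate only: it assumes no printer margins and no overlap between tiles, and the function name `sheets_needed` and the page sizes are hypothetical examples rather than anything Adobe Reader reports.

```python
import math


def sheets_needed(page_w, page_h, scale, sheet_w, sheet_h):
    """Return (columns, rows, total_sheets) needed to tile one enlarged page.

    page_w, page_h:   size of the original page (any unit, e.g. inches).
    scale:            enlargement factor (2.0 means 200%).
    sheet_w, sheet_h: size of the paper in the printer, in the same unit.
    Assumes no margins and no overlap between tiles.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    columns = math.ceil(page_w * scale / sheet_w)
    rows = math.ceil(page_h * scale / sheet_h)
    return columns, rows, columns * rows


# A letter-size page enlarged to 200% and printed on letter-size paper.
print(sheets_needed(8.5, 11, 2.0, 8.5, 11))
```

Real printers add margins, so the actual sheet count can be higher than this estimate.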

PDF Weaknesses

• Not always fully accessible
  – Screen readers do not always like them, even when they are text-based
  – Reading order can be problematic
• May be graphics (pictures of text)
• May have too much security

As an Aside…

• When faculty create PDFs…
  – The PDF always started as something else…usually a Word file
  – Try to get the starting document
  – Security concerns?

• Word files can be password protected
  – Office Button > Prepare > Encrypt Document

Types of PDF Documents

• Text-based
  – Text can be selected
• Graphical
  – Picture of text (i.e., a graphic)
  – Text cannot be selected
• Use text-select tool to tell the difference
• Files may be “locked”
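The text-select check can be approximated in code once some PDF library has pulled the text from each page. The following is a minimal sketch in Python, assuming the per-page text has already been extracted by another tool. The function name `classify_pdf` and the 20-character threshold are hypothetical choices, not part of any Adobe product.

```python
def classify_pdf(page_texts, min_chars=20):
    """Guess whether a PDF is text-based, graphical, or mixed.

    page_texts: one string per page, as extracted by a PDF library
                (an empty string means no selectable text was found).
    min_chars:  assumed threshold; pages with fewer non-space characters
                are treated as pictures of text.
    """
    if not page_texts:
        raise ValueError("no pages to classify")
    # Count non-whitespace characters so blank lines do not count as text.
    has_text = [len("".join(text.split())) >= min_chars for text in page_texts]
    if all(has_text):
        return "text-based"
    if not any(has_text):
        return "graphical"
    return "mixed"


print(classify_pdf(["Chapter 1. The cell is the basic unit of life.",
                    "Figure 1.1 shows a typical animal cell."]))
print(classify_pdf(["", "  "]))
```

A result of "graphical" or "mixed" suggests that at least some pages are pictures of text; a locked file may also return no text even when it is text-based, so treat the result as a hint and confirm with the text-select tool.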

Processing PDFs

• Adobe Acrobat Professional
• Good OCR program
  – Abbyy FineReader
  – Nuance OmniPage
• IF you are a Kurzweil campus, you will also need Kurzweil

Adobe Tools

• Adobe Reader
  – Free
  – Useful for students who need minimal accessibility features
  – http://www.adobe.com/products/reader/
• Adobe Acrobat Professional
  – Essential for alt media specialists
  – Extract text, create accessible PDFs, enable Adobe Reader features
  – www.uscollegebuy.com (discounted price)

Acrobat Reader

• Reads aloud
  – But does not highlight or track
• Enlarges text
  – Nice reflow feature
• Changes text/background colors
• Text highlighting, sticky notes, and comments
• Access text-based PDFs

Process with Acrobat Pro

• Cropping
• Enlargement for printing
• Tiling
• Combining
• Some text extraction
• Works with text-based PDF

Processing Graphical PDFs

• Must run optical character recognition (OCR)
  – Computers cannot read pictures
  – OCR programs recognize the “characters” in the picture
• How you process the file depends on the end format the student wants!

Various Options

• OmniPage or FineReader
  – FineReader generally easier to learn
  – Save to Word or HTML or Text based on student preference
• Use virtual printer with Kurzweil
  – Create KESI files
• R&W
  – Save as Word

Which One When?

• Want a Word file?
  – Best choice is OmniPage or FineReader
• Want a Kurzweil document?
  – Use Kurzweil to process the PDF
• For students to do themselves?
  – Whichever program they prefer
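The choices on this slide amount to a small lookup keyed on the end format. The following is a minimal sketch in Python for anyone who wants to build the rule into an intake form or request tracker. The function name `recommend_tool`, its parameters, and the dictionary keys are hypothetical labels; only the recommendations themselves come from the slide.

```python
# Recommendations taken from the slide; the keys are hypothetical labels.
RECOMMENDED_TOOL = {
    "word": "OmniPage or FineReader",
    "kurzweil": "Kurzweil",
}


def recommend_tool(end_format, student_does_it=False, student_preference=None):
    """Suggest which program should process a graphical PDF.

    end_format:         the format the student wants ("Word" or "Kurzweil").
    student_does_it:    True when the student will process the file.
    student_preference: the program the student prefers, if known.
    """
    if student_does_it:
        # Students processing their own files use whatever they like best.
        return student_preference or "whichever program the student prefers"
    key = end_format.strip().lower()
    if key not in RECOMMENDED_TOOL:
        raise ValueError(f"no recommendation recorded for {end_format!r}")
    return RECOMMENDED_TOOL[key]


print(recommend_tool("Word"))
print(recommend_tool("Kurzweil"))
print(recommend_tool("Word", student_does_it=True, student_preference="Kurzweil"))
```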

Why?

• OCR programs are designed to make extraction and editing easy
• Document readers (R&W, Kurzweil, etc.) are designed to make reading easy…NOT editing.

NEVER!!!

• Do NOT run OCR with FineReader or OmniPage…save to PDF…and then take into Kurzweil, R&W, etc.
• Kurzweil, R&W, WYNN will run their own OCR on the PDF!
  – Wastes time, adds error to do OCR twice
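The rule can be written as a simple check on a planned workflow. The following is a minimal sketch in Python, assuming a workflow is described as an ordered list of (tool, output format) steps. The function name `runs_ocr_twice` and the step format are hypothetical, and the check is simplified: it only looks at the step immediately before the document reader.

```python
OCR_PROGRAMS = {"finereader", "omnipage"}
DOCUMENT_READERS = {"kurzweil", "r&w", "wynn"}


def runs_ocr_twice(steps):
    """Return True if an OCR program's PDF is handed straight to a document reader.

    steps: ordered list of (tool, output_format) tuples,
           e.g. [("FineReader", "PDF"), ("Kurzweil", "KESI")].
    """
    ocr_pdf_made = False
    for tool, output_format in steps:
        name = tool.lower()
        if name in DOCUMENT_READERS and ocr_pdf_made:
            # The reader will run its own OCR on a PDF that was already OCRed.
            return True
        # Remember whether this step produced a PDF from an OCR program.
        ocr_pdf_made = name in OCR_PROGRAMS and output_format.lower() == "pdf"
    return False


print(runs_ocr_twice([("FineReader", "PDF"), ("Kurzweil", "KESI")]))
print(runs_ocr_twice([("FineReader", "Word")]))
print(runs_ocr_twice([("Kurzweil", "KESI")]))
```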

OCR Programs

• Treat PDFs the same as a TIFF
  – If you OCR scanned documents, use the same process
• Load image file
• Select zones
• Create templates as needed

PDF Bottom Line

• Source files vs. end-user files
  – Source files = for you to create alt media from
  – End-user files = alt media formats
• PDF
  – Consider PDFs as source files (files to process) that sometimes double as end-user files (for certain students with limited access issues)

Resource Info

• Gaeir Dietrich
• [email protected]
• 408-996-6047
• www.htctu.net
• Alt media listserv
• Manuals online