Tool Development for Optimized Scalable Ingestion Workflows: The Case of the Japanese American WWII Incarceration Camp Incident Cards
Developer: Magdalena Balderas | Professor: Dr. Kathy Weaver | Client: Dr. Richard Marciano

Cultural institutions are currently responsible for millions to billions of objects. Some of these objects are accessible to the public, but most are not, because of limited resources and rules that restrict access.

POSSIBILITIES

If ingestion workflows can be optimized to run at scale, everyone will benefit: cultural institutions, scholars, researchers, students, and the general public. More digital data allows web databases to be created in which people, places, dates, and other entities can be linked. This allows events to be comprehensively integrated with the available data.

Japanese American WWII Incarceration Camps: Tule Lake

The case of the Japanese American WWII Incarceration Camp Incident Cards is an interesting one. The cards were initially accessible but were removed from public access in the fall of 2014. Updates to the access policy in 2015 allow them to be accessed again, provided they do not reference minors and can be automatically appraised. The Digital Curation Innovation Center (DCIC) established a partnership with the National Archives and helped fund the recent digitization of the camp incident cards and their automated appraisal.
Challenges Faced:
• Unobtainable data in the designated timeline
• Lack of funding
• Time constraints
• Uncertainty of findings and possibilities

INCIDENT CARDS

[Image: Internment Card, Box 8, Tule Lake 0307. Provided by the National Archives and Records Administration]
[Image: 1943, “RIOT”. Image retrieved from www.vintag.es. Provided by the National Archives and Records Administration]
[Images: cards from Box 12 and Box 15]

Successes of the project:
• Comprehensive understanding and description of the ingestion workflow for cultural institutions and their objects
• Comprehensive analysis of how commercial and open source software fail to meet the requirements of the ingestion process for cultural institutions
• A clear vision of the future research and software development needed in digital curation for automated ingestion, especially of data from cultural institutions

INGESTION WORKFLOW

Cultural Object Digitization
• Provided by National Archives and Records Administration (NARA) Scan Lab

Optical Character Recognition (OCR)
• KoFax Express**
• Tesseract
• Cuneiform Linux
• ABBYY FineReader

Text Extraction
• PDF2Text**
• Zilla PDF
• PDF to Text
• TextfromPDF

Named Entity Recognition (NER)
• Alchemy API
• OpenNLP
• Stanford NER
• OpenCalais

Ingestion into Database
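The stages listed in the ingestion workflow can be sketched as a small chained pipeline. This is only an illustrative sketch, not the project's implementation: it assumes the OCR stage has already produced raw text, replaces the NER tools with a toy regular expression and a hypothetical gazetteer (`KNOWN_PLACES`), and uses an in-memory SQLite database as a stand-in for the target database. All function names, table names, and the sample text are made up for illustration.

```python
import re
import sqlite3

# Hypothetical gazetteer standing in for a real NER tool (assumption).
KNOWN_PLACES = ["Tule Lake"]


def normalize_text(raw):
    """Text-extraction stand-in: collapse the irregular whitespace OCR leaves behind."""
    return re.sub(r"\s+", " ", raw).strip()


def recognize_entities(text):
    """Toy NER stand-in: return (label, value) pairs for years and known places."""
    entities = [("DATE", year) for year in re.findall(r"\b19\d{2}\b", text)]
    entities += [("PLACE", place) for place in KNOWN_PLACES if place in text]
    return entities


def ingest(conn, box, card_id, text, entities):
    """Store one card and its entities; the schema is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cards "
        "(box TEXT, card_id TEXT, text TEXT, PRIMARY KEY (box, card_id))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities "
        "(box TEXT, card_id TEXT, label TEXT, value TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO cards VALUES (?, ?, ?)", (box, card_id, text)
    )
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?)",
        [(box, card_id, label, value) for label, value in entities],
    )
    conn.commit()


def run_pipeline(conn, box, card_id, ocr_output):
    """Chain the stages: text extraction, entity recognition, database ingestion."""
    text = normalize_text(ocr_output)
    entities = recognize_entities(text)
    ingest(conn, box, card_id, text, entities)
    return entities


if __name__ == "__main__":
    # Made-up sample text, not a transcription of a real card.
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection, "Box 12", "0001", "Incident  at Tule\nLake,   1943."))
```

Keeping each stage behind its own function means any one of the tools compared above could, in principle, be swapped in for a stand-in without changing the other stages.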