There are 10 types of people. Those who understand binary


There are 10 types of people.
Those who understand binary and
those who don’t.
We got the Vocab.







Hardware
Software
Server
Storage types
Download & Upload (FTP, SSH)
The Internet (Browsers, ISP, LAN, POP, NAP, routers, backbones)
Languages/Code (binary, unicode, hex, ASCII, SGML, HTML,
XHTML, DHTML, CSS, PHP, Java, Flash, XML, XSL)
 File formats (rtf, GIF, TIFF, JPEG, MP3, MP4, WAV, dtd)
 Random TLAs (NEH, ODH, ALA, ISO, W3C, TEI, GIS)
 “Deep Freeze” (Security Software; Terminals on campus erase all
files/programs on reboot that are not part of the original image)
Hardware--external physical equipment
The physical computer (tower, monitor,
laptop)
Printer
External storage device
Early Computers
 The first computers
filled rooms, such as
ENIAC pictured
here. ENIAC was
built for the
U.S. Army and
completed in 1945.
Early PC created in the 1980s
Software--inside the computer
 Operating System
 The infrastructure that manages and coordinates
files/programs
 MS-DOS, Windows, OS X
 Programs/applications
 MS Office (including Word), Adobe Suite
(Photoshop, Dreamweaver)
 Open Source v. Proprietary
 “Off the Shelf” v. “In-House”
The mythic “Server”
 Often refers to hardware in a specified location
(frequently part of a server farm or cluster) that is used
for web site delivery, data storage, and/or delivery of
multimedia files (streaming video server).
 Has its own operating system (such as Unix) and
software (such as Apache) to “serve” customers.
 Files must be uploaded from a PC to the server using
FTP, SSH or “Fetch” to appear on the Internet.
 Generally, UND’s servers are maintained by ITSS.
Early Storage Mediums
Punch card
Magnetic Tape
Floppy disks
8 inch (above)
5.25 inch (above)
3.5 inch (below)
8 inch/3.5 inch (below)
How data is stored.
 Punch cards & magnetic tape stored information sequentially
(linear)
 Floppies (and later CDs, Jump Drives) have random access (can
jump to the file you want in any order)
 RAM (random access memory) bits (a binary digit; base 2
made up of zeros and ones) of information are stored in memory
cells, each with a specific address in the computer's memory (not on
the hard drive). 8 bits = 1 byte
 ASCII (American Standard Code for Information Interchange),
and Unicode are ways of translating our alphabets into numbers that
can be stored in binary; hexadecimal (hex) is a shorter way of
writing those binary numbers.
 Databases (groups of related data with fields, such as MS Access
or MySQL) NB Excel is NOT a database app; it is a spreadsheet.
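The bit, byte, and encoding ideas above can be tried out in a few lines of Python. This is only a minimal sketch, not part of the original slides; the sample letters and variable names are chosen for illustration.

```python
# "10" read as a binary numeral is the number two (the joke in the title).
two = int("10", 2)

# One byte is 8 bits, so a byte can hold 2**8 different values.
values_per_byte = 2 ** 8

# ASCII and Unicode assign each character a number (a code point).
letter = "A"
code_point = ord(letter)
as_binary = format(code_point, "08b")  # the same number written as 8 bits
as_hex = format(code_point, "02X")     # the same number written in hex

# A character outside ASCII takes more than one byte in UTF-8.
e_acute_bytes = "é".encode("utf-8")

print(two, values_per_byte)
print(letter, code_point, as_binary, as_hex)
print(len(e_acute_bytes))
```

Running it shows that "A" is stored as the number 65, which is 01000001 in binary and 41 in hex, and that an accented letter needs two bytes in UTF-8.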
Where data is stored.
 External storage (jump drive, CD, DVD,
external hard drive)--something that is
connected to, and can be removed from, the
PC. Now usually connected through a USB port.
 On your local hard drive (inside your personal
computer and it’s only available on your
personal computer)
 “On the server” files either saved or uploaded
to a designated server space.
How the Internet works…basically.
 Enter a URI (Uniform Resource Identifier), commonly called a URL (Uniform
Resource Locator), into your browser
 Browser (IE, Netscape, Mozilla, Firefox, Safari, SeaMonkey) on
a local computer reads and displays Internet files.
 Browser/local computer is connected via modem, AirPort (Wi-Fi),
Ethernet card, or cable to a local area network (LAN) or Internet
service provider (ISP). Can also connect through a VPN (Virtual
Private Network, which is set up as a security measure).
 LAN or ISP connects to larger networks at a Network
Access Point (NAP)
 Through routers and “backbones” (fiber optic cables) information
is delivered/routed to its destination
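The first step above, entering a URL, can be illustrated by splitting an address into the parts a browser works with. This is a minimal sketch using Python's standard library; the address is a hypothetical one made up for the example, and nothing is actually sent over the network.

```python
from urllib.parse import urlsplit

# Hypothetical address used only for illustration.
url = "http://www.example.edu/library/index.html"

parts = urlsplit(url)
print(parts.scheme)    # the protocol the browser will use
print(parts.hostname)  # the server the request is routed to
print(parts.path)      # the file requested from that server
```

The host name is what the routers use to find the server; the path only matters once the request arrives there.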
Languages & Codes
 Standard Generalized Markup Language (SGML) (1986), established by
International Organization for Standardization (ISO) (est. 1947)
 Hypertext Markup Language (HTML) (1991); basic structure language
(static)
 Cascading Style Sheets (CSS) presentation definition language (how a
page should look)
 JavaScript (not the same language as Java) client side scripting language that is often used
to run small applications (roll overs, pop ups, site counts, date, etc.)
 Flash--Adobe software that allows for multimedia/interactive displays
 DHTML (dynamic HTML); animated or interactive using CSS, JavaScript, or
Flash (among others)
 Hypertext Preprocessor (PHP); server side scripting language used to
create dynamic web pages (runs on the server and sends its output to the
browser as HTML)
Languages & Codes, part Deux
 Extensible HTML (XHTML); well formed and valid
HTML (lower case, open and close tags); current
standard through W3C (World Wide Web Consortium)
 Extensible Markup Language (XML) derived from
SGML
 Extensible Stylesheet Language (XSL; its transformation
part is XSLT) (used to transform XML files into something
else, often used to export XML to XHTML)
 Document Type Definition (DTD, .dtd file) file that
defines the elements and structure allowed in a set of XML documents.
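"Well formed" has a precise meaning: every tag that is opened is closed, in the right order. A quick way to see this is to hand two snippets to an XML parser. The sketch below uses Python's standard library; the sample markup and the helper name `is_well_formed` are made up for illustration. Note that it checks only well-formedness, since this parser does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every tag is opened and closed in order.
good = "<page><title>Transcript</title><p>First paragraph.</p></page>"
# The <p> tag is never closed.
bad = "<page><title>Transcript</title><p>First paragraph.</page>"

print(is_well_formed(good))
print(is_well_formed(bad))
```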
File Formats
 .rtf (Rich Text Format) cross platform file that enables .doc files to be read by
programs not created by Microsoft
 .pdf (Adobe) Portable Document Format for document exchange (usually not
interactive)
 GIF (Graphics Interchange Format) bit map image file format
 JPEG (Joint Photographic Experts Group) Compressed image file format for Web
delivery
 TIFF (Tagged Image File Format) high resolution image file (archival standard)
 dpi (dots per inch) is not a file format; it describes the resolution/digital image quality (600 dpi is
currently the archival standard)
 MP3 digital audio file format (compressed)
 MP4 digital audio/visual file format (used for streaming video web delivery)
 WAV (WAVE) uncompressed waveform audio format (generally thought of as
archival standard for audio files)
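To get a feel for what 600 dpi means in practice, the pixel size of a scan is simply the page size in inches multiplied by the dpi. The sketch below assumes a US letter page (8.5 x 11 inches); the function name is made up for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan of a page of the given size."""
    return round(width_in * dpi), round(height_in * dpi)

# Assumed page size: US letter, 8.5 x 11 inches.
archival = pixel_dimensions(8.5, 11, 600)
screen = pixel_dimensions(8.5, 11, 72)

print(archival)
print(screen)
```

At 600 dpi the page is 5100 x 6600 pixels, compared with 612 x 792 at 72 dpi, which is why archival scans are kept as large TIFF files and smaller JPEGs are made for Web delivery.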
TLAs (Three letter acronyms)
 NEH (National Endowment for the Humanities)
 ODH (Office of Digital Humanities, sub-division of NEH)
 ALA (American Library Association)
 ISO (International Organization for Standardization)
 W3C (World Wide Web Consortium)
 TEI (Text Encoding Initiative)
 GIS (Geographic/Geospatial Information System)
TAKE FIVE
Dr. William Caraher to speak
next.
First Digital Literature Project
Unlike many other interdisciplinary experiments, humanities computing has a very well-known
beginning. In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even to this day is a
monumental task: to make an index verborum of all the words in the works of St Thomas Aquinas
and related authors, totaling some 11 million words of medieval Latin. Father Busa imagined that
a machine might be able to help him, and, having heard of computers, went to visit Thomas J.
Watson at IBM in the United States in search of support (Busa 1980). Some assistance was
forthcoming and Busa began his work. The entire texts were gradually transferred to punched
cards and a concordance program written for the project. The intention was to produce printed
volumes, of which the first was published in 1974 (Busa 1974). A purely mechanical concordance
program, where words are alphabetized according to their graphic forms (sequences of letters),
could have produced a result in much less time, but Busa would not be satisfied with this. He
wanted to produce a "lemmatized" concordance where words are listed under their dictionary
headings, not under their simple forms. His team attempted to write some computer software to
deal with this and, eventually, the lemmatization of all 11 million words was completed in a
semiautomatic way with human beings dealing with word forms that the program could not
handle. Busa set very high standards for his work. His volumes are elegantly typeset and he would
not compromise on any levels of scholarship in order to get the work done faster.
http://www.digitalhumanities.org/companion/
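The difference between a purely mechanical concordance and a lemmatized one can be shown with a toy example. This is only a sketch of the idea, not Busa's actual program: the four-entry lemma table and the sample words are invented for illustration, and a real project would need a full Latin lexicon plus human review.

```python
from collections import defaultdict

# Hypothetical mini-lexicon mapping Latin word forms to dictionary headings.
LEMMAS = {"est": "sum", "sunt": "sum", "deus": "deus", "dei": "deus"}

def concordance(words, lemmatize=False):
    """Map each heading to the positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(words):
        form = word.lower()
        # Forms missing from the lexicon are listed under their own spelling.
        heading = LEMMAS.get(form, form) if lemmatize else form
        index[heading].append(position)
    return dict(sorted(index.items()))

# Made-up sample words, not a quotation from Aquinas.
text = "Deus est et dei sunt".split()
by_form = concordance(text)
by_lemma = concordance(text, lemmatize=True)

print(by_form)
print(by_lemma)
```

Listed by form, "est" and "sunt" are separate entries; lemmatized, both fall under "sum". Any form the table does not know stays under its own spelling, which is the kind of case that had to be handled by people.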
Historical conference on Dig Lit.
In 1964, IBM organized a conference at Yorktown Heights. The
subsequent publication, Literary Data Processing Conference Proceedings,
edited by Jess Bessinger and Stephen Parrish (1965), almost reads
like something from twenty or so years later, except for the reliance
on punched cards for input. Papers discuss complex questions in
encoding manuscript material and also in automated sorting for
concordances where both variant spellings and the lack of
lemmatization are noted as serious impediments. The first
of a regular series of conferences on literary and linguistic computing,
and the precursor of what became the Association for Literary and
Linguistic Computing/Association for Computers and the Humanities
(ALLC/ACH) conferences, was held at the University of Cambridge in 1970.
http://www.digitalhumanities.org/companion/
First Projects--literature & linguistics
 Frequently focused on concordances
 quantitative approaches to style and authorship studies
 Computers and the Humanities began publication in 1966
under the editorship of Joseph Raben.
 Centre for Literary and Linguistic Computing in
Cambridge established in 1963
 Oxford Text Archive (OTA) est. 1976
 Medieval and Classical texts were among the first
collaborative digital collections.
 Unicode invented in the late 1980s to contend with character
representation issues
Exemplary Digital History and
Literature Collections
 Journals of Lewis and Clark (U of Nebraska,
Lincoln)
 Eyes on the Prize (WashU)
 The Revised Dred Scott Collection (WashU)
 Walt Whitman Archive (U of Nebraska,
Lincoln)
 University of Virginia E-text Center
Nuremberg…
 Harvard Collection (images, full text, searchable, uses
in-house database, NMT cases Medical, Milch and
Pohl)
 Yale Collection (Avalon Project) only the text (HTML)
from the Blue Series (IMT) and Red Series (vols. 1-4
Nazi Aggression). No images, not searchable.
 U of MO, KC excerpts from various trials, in HTML
without page images and not searchable.
How our project is different.
 A case/theme that hasn’t been done anywhere
else.
 Full text in TEI P5 XML (which will be fully
searchable and built according to international
standards)
 Page images
 Critical apparatus/maps to enhance the
information, particularly to be used by
instructors.
Stage 1--Transcription
 Transcribe the text of the transcripts
 NOT a facsimile, only need to maintain paragraph
breaks/page breaks
 Maintain spelling (if it is “wrong” add [sic] after the
word)
 DO NOT maintain hyphenated word breaks
 Save file as .rtf (to avoid MS Word proprietary codes)
 Images of the transcripts to be transcribed are
available online OR you can go to Special Collections
to view the original/request photocopies
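The rule about hyphenated word breaks can be checked, or applied to a draft, with a short script. This is a minimal sketch, not part of the project's required workflow; the sample sentence is made up and the function name is hypothetical. It assumes the draft is available as plain text.

```python
import re

def join_hyphenated_breaks(text):
    """Rejoin words split by a hyphen at the end of a line."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

# Made-up sample with one end-of-line hyphen and one paragraph break.
sample = "The defen-\ndant was present.\n\nThe tribunal adjourned."
cleaned = join_hyphenated_breaks(sample)

print(cleaned)
```

The blank line between paragraphs is left alone, so paragraph breaks survive. The script cannot tell a line-end break from a word that is genuinely hyphenated (for example "cross-examination" split after the hyphen), so the result still needs to be read against the page image.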