A NEW PAGE TABLE FOR 64


COSC 1306 COMPUTER SCIENCE AND PROGRAMMING

Jehan François Pâris


CHAPTER VI FILES

Chapter Overview  We will learn how to read, create and modify files  Essential if we want to store our program inputs and results.

 Pay special attention to

pickled files

 They are very easy to use!

The file system  Provides

long term storage

of information.  Will store data in

stable storage

(disk)  Cannot be RAM because:
 Dynamic RAM loses its contents when powered off
 Static RAM is too expensive
 System crashes can corrupt contents of the main memory

Overall organization  Data managed by the file system are grouped in

user-defined

data sets called

files

 The file system must provide a mechanism for

naming

these data  Each file system has its own set of conventions  All modern operating systems use a

hierarchical directory structure

Windows solution  Each device and each disk partition is identified by a letter  A: and B: were used by the floppy drives  C: is the first

disk partition

of the hard drive  If the hard drive has no other disk partition, D: denotes the DVD drive  Each device and each disk partition has its

own hierarchy of folders

Windows solution

[Diagram: separate folder hierarchies for C: (with folders such as Users, Windows and Program Files), a second disk D:, and a flash drive F:]

UNIX/LINUX organization  Each device and disk partition has its own directory tree  Disk partitions are glued together through the mount operation to form a single tree  Typical user does not know where her files are stored

UNIX/LINUX organization

[Diagram: the root partition, whose root directory / contains bin and usr, next to a separate other partition]

The magic mount

Second partition can be accessed as /usr

Mac OS organization  Similar to Windows  Disk partitions are not merged  Represented by separate icons on the desktop

Accessing a file (I)  Your Python programs are stored in a folder AKA directory  On my home PC it is

C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python

 All files in that folder can be directly accessed through their names 

"myfile.txt"

Accessing a file (II)  Files in folders inside that folder —

subfolders

—can be accessed by specifying first the subfolder 

Windows style:

"test\\sample.txt"

Note the double backslash

Linux/Unix/Mac OS X style:

"test/sample.txt"

Generally works for Windows

Why the double backslash?

 The backslash is an

escape character

in Python  Combines with its successor to represent

non-printable characters

‘\n’

represents a newline 

‘\t’

represents a tab  Must use ‘

\\

’ to represent a plain backslash
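A minimal sketch of the escape rules above. The raw-string form (the r prefix) is not on the slide; it is shown here only as a common alternative, and the file name is just an illustration.

```python
# The backslash combines with the next character.
tabbed = "a\tb"               # three characters: 'a', TAB, 'b'
plain = "test\\sample.txt"    # '\\' stands for ONE backslash

# A raw string (r prefix) switches escape processing off.
raw = r"test\sample.txt"

print(len(tabbed))    # 3
print(plain)          # test\sample.txt
print(plain == raw)   # True
```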

Accessing a file (III)  For other files, must use full pathname 

Windows Style:

"C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"

Accessing file contents  Two step process:  First we

open

the file  Then we access its contents 

read

write

 When we are done, we

close

the file
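The open / access / close sequence can be sketched as follows. The with statement is not covered on the slide; it is used here as one common way to make sure the file is closed. The scratch file name is hypothetical.

```python
import os
import tempfile

# Create a scratch file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as fh:      # step 1: open
    fh.write("hello\n")          # step 2: access (write)
# step 3: the file is closed automatically when the block ends

with open(path, "r") as fh:
    contents = fh.read()         # access (read)

print(contents, end="")
print(fh.closed)                 # True
```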

What happens at open() time?

 The system verifies  That you are an

authorized user

 That you have the

right permission

Read permission

Write permission

 Execute permission exists but doesn’t apply here  The system then returns a

file handle

/

file descriptor

The file handle  Gives the user  Fast direct access to the file  No folder lookups  Authority to execute the file operations whose permissions have been requested

Python open() 

open(name, mode = 'r', buffering = -1)

where 

name

is name of file 

mode

is

permission requested

 Default is

r

for read only 

buffering

specifies the

buffer size

Use system default value

(code -1)

The modes  Can request 

‘r’

for read-only 

‘w’

for write-only  Always overwrites the file  ‘

a

’ for append  Writes at the end 

‘r+’

or

‘a+’

for updating (read + write/append)
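A short sketch of how 'w' and 'a' differ, assuming a scratch file created only for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as fh:   # creates the file
    fh.write("first\n")
with open(path, "w") as fh:   # 'w' again: the old contents are lost
    fh.write("second\n")
with open(path, "a") as fh:   # 'a': new data goes at the end
    fh.write("third\n")
with open(path, "r") as fh:
    result = fh.read()

print(result, end="")
```

Only the lines written by the second 'w' and by the 'a' should survive.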

Examples 

f1 = open("myfile.txt")

same as

f1 = open("myfile.txt", "r")

f2 = open("test\\sample.txt", "r")

f3 = open("test/sample.txt", "r")

f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

Reading a file  Three ways:  Global reads  Line by line  Pickled files

Global reads 

fh.read()

 Returns

whole contents

of file specified by file handle

fh

 File contents are stored in a

single string

that might be very large

Example 

f2 = open("test\\sample.txt", "r")
bigstring = f2.read()
print(bigstring)
f2.close() # not required

Output of example 

To be or not to be that is the question Now is the winter of our discontent

 Exact contents of file

‘test\sample.txt’ followed by an extra return

Line-by-line reads 

for line in fh : # do not forget the colon
    # anything you want
fh.close() # not required: Python does it

Example 

f3 = open("test/sample.txt", "r")
for line in f3 : # do not forget the colon
    print(line)
f3.close() # not required

Output  To be or not to be that is the question Now is the winter of our discontent  With one or more

extra blank lines

Why?

  Each line ends with an end-of-line marker

print(…) adds

an extra end-of-line
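One way to avoid the doubled end-of-line is to tell print not to add its own. A minimal sketch, in which io.StringIO stands in for an open text file and the two lines of text are shortened samples:

```python
import io

# io.StringIO behaves like a text file opened for reading.
fh = io.StringIO("To be or not to be\nNow is the winter")

shown = []
for line in fh:
    print(line, end="")   # end="" keeps print from adding a newline
    shown.append(line)    # each line still carries its own '\n', if any
print()                   # finish the last line on the screen
```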

Trying to remove blank lines 

print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 : # do not forget the colon
    print(line[:-1]) # remove last char
f5.close() # not required
print('-----------------------------------------------------')

The output 

--------------------------------------------------- To be or not to be that is the question Now is the winter of our disconten -----------------------------------------------------

The last line did not end with an EOL!

A smarter solution (I) 

Only remove the last character if it is an EOL

if line[-1] == '\n' :
    print(line[:-1])
else :
    print(line)

A smarter solution (II) 

print('----------------------------------------------------')
fh = open("test/sample.txt", "r")
for line in fh : # do not forget the colon
    if line[-1] == '\n' :
        print(line[:-1]) # remove last char
    else :
        print(line)
print('-----------------------------------------------------')
fh.close() # not required

It works!

 --------------------------------------------------- To be or not to be that is the question Now is the winter of our discontent -----------------------------------------------------

Making sense of file contents  Most files contain more than one data item per line  COSC 713-743-3350 UHPD 713-743-3333  Must split lines 

mystring.split(sepchar)

where

sepchar

is a separation character  returns a list of items

Splitting strings
>>> text = "Four score and seven years ago"
>>> text.split()
['Four', 'score', 'and', 'seven', 'years', 'ago']
>>> record = "1,'Baker, Andy', 83, 89, 85"
>>> record.split(',')
['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']

Not what we wanted!
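For a record like this one, the standard csv module can do the splitting. This is a sketch, not the approach taken on the slides; the quotechar and skipinitialspace settings are chosen to match this particular record.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# quotechar="'" : fields are quoted with single quotes in this record
# skipinitialspace=True : ignore the blank that follows each comma
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)

print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```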

Example

# how2split.py

print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 :
    words = line.split()
    for xxx in words :
        print(xxx)
f5.close() # not required
print('-----------------------------------------------------')

Output 

--------------------------------------------------- To be … of our discontent -----------------------------------------------------

Picking the right separator (I) 

Commas

 CSV Excel format  Values are separated by commas  Strings are stored without quotes  Unless they contain a comma  “Doe, Jane”, freshman, 90, 90  Quotes within strings are doubled
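Python's csv module applies these quoting rules by default. A minimal sketch, writing into an in-memory buffer; the second row is hypothetical data added to show the doubled quotes.

```python
import csv
import io

out = io.StringIO()                          # stands in for an output file
writer = csv.writer(out, lineterminator="\n")

writer.writerow(["Doe, Jane", "freshman", 90, 90])
writer.writerow(['Aguirre, "Lalo" Eduardo', "senior", 85, 80])

text = out.getvalue()
print(text, end="")
# "Doe, Jane",freshman,90,90
# "Aguirre, ""Lalo"" Eduardo",senior,85,80
```

Only the fields that contain a comma or a double quote get wrapped.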

Picking the right separator (II) 

Tabs( ‘\t’)

Advantages:

 Your fields will appear nicely aligned  Spaces, commas, … are not an issue 

Disadvantage:

 You do not see them  They look like spaces

Why it is important  When you must pick your file format, you should decide how the data inside the file will be used:  People will read them  Other programs will use them  Will be used by people and machines

An exercise  Converting tab-separated data to CSV format  Replacing tabs by commas  Easy  Will use string replace function

First attempt 

fh_in = open('grades.txt', 'r') # the 'r' is optional
buffer = fh_in.read()
newbuffer = buffer.replace('\t', ',')
fh_out = open('grades0.csv', 'w')
fh_out.write(newbuffer)
fh_in.close()
fh_out.close()
print('Done!')

The output 

Alice    90    90    90    90    90
Bob      85    85    85    85    85
Carol    75    75    75    75    75

becomes

Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol,75,75,75,75,75

Dealing with commas (I)  Work line by line  For each line  split input into fields using TAB as separator  store fields into a list  Alice 90 90 90 90 90 becomes ['Alice', '90', '90', '90', '90', '90']

Dealing with commas (II)  Put within double quotes any entry containing one or more commas  Output list entries separated by commas 

['"Baker, Alice"', 90, 90, 90, 90, 90]

becomes

"Baker, Alice",90,90,90,90,90

Dealing with commas (III)  Our troubles are not over:  Must store somewhere all lines until we are done  Store them in a list

Dealing with double quotes  Before wrapping items that contain commas in double quotes, replace  All double quotes by pairs of double quotes 

'Aguirre, "Lalo" Eduardo'

becomes

'Aguirre, ""Lalo"" Eduardo'

then

'"Aguirre, ""Lalo"" Eduardo"'

Order matters (I)  We must double the inside double quotes before wrapping the string into double quotes;  From

'Aguirre, "Lalo" Eduardo'

go to

'Aguirre, ""Lalo"" Eduardo'

then to

'"Aguirre, ""Lalo"" Eduardo"'

Order matters (II)  Otherwise;  We go from

'Aguirre, "Lalo" Eduardo'

to

'"Aguirre, "Lalo" Eduardo"'

then to

'""Aguirre, ""Lalo"" Eduardo""'

with

all

double quotes doubled
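The two orders can be compared directly. A small sketch; the function names quote_right and quote_wrong are made up for this illustration.

```python
def quote_right(item):
    doubled = item.replace('"', '""')    # 1. double the inner quotes
    return '"' + doubled + '"'           # 2. then wrap the result

def quote_wrong(item):
    wrapped = '"' + item + '"'           # wrapping first ...
    return wrapped.replace('"', '""')    # ... doubles the wrappers too

name = 'Aguirre, "Lalo" Eduardo'
print(quote_right(name))   # "Aguirre, ""Lalo"" Eduardo"
print(quote_wrong(name))   # ""Aguirre, ""Lalo"" Eduardo""
```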

General organization (I)
linelist = [ ] # the same file in CSV format
for line in file
    itemlist = line.split(…)
    linestring = '' # always start with an empty line
    for item in itemlist :
        remove any trailing newline
        double all double quotes
        if item contains comma, wrap
        add to linestring

General organization (II)
for line in file
    …
    for each item in itemlist
        double all double quotes
        if item contains comma, wrap
        add to linestring
    append linestring to linelist

General organization (III)
for line in file
    …
    remove last comma of linestring
    add newline at end of linestring
    append linestring to linelist
for linestring in linelist
    write linestring into output file

The program (I) 

# betterconvert2csv.py

""" Convert tab-separated file to csv
"""
fh = open('grades.txt','r') #input file
linelist = [ ] # global data structure
for line in fh : # we process an input line
    itemlist = line.split('\t')
    # print(str(itemlist)) # just for debugging
    linestring = '' # start afresh

The program (II) 

    for item in itemlist : #we process an item
        item = item.replace('"','""') # for quotes
        if item[-1] == '\n' : # remove it
            item = item[:-1]
        if ',' in item : # wrap item
            linestring += '"' + item + '"' + ','
        else : # just append
            linestring += item + ','
    # end of item loop

The program (III) 

    # must replace last comma by newline
    linestring = linestring[:-1] + '\n'
    linelist.append(linestring)
# end of line loop
fh.close()
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()

Notes  Most print statements used for debugging were removed  Space considerations  Observe that the inner loop adds a comma after each item  Wanted to remove the last one  Must also add a newline at end of each line

The input file 

Alice 90 Bob Carol 85 75 Doe, Jane 90 85 75 90 90 85 75 90 90 85 75 90 Fulano, Eduardo "Lalo" 90 90 85 75 80 90 70 90 90

The output file 

Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol,75,75,75,75,75
"Doe, Jane",90,90,90,80,75
"Fulano, Eduardo ""Lalo""",90,90,90,90

Mistakes being made (I) 

Mixing lists and strings:

 Earlier draft of program declared 

linestring = [ ]

and did 

linestring.append(item)

Outcome was

['Alice,', '90,', … ]

instead of 

'Alice,90, …'

Mistakes being made (II) 

Forgetting to add a newline

 Output was a single line 

Doing the append inside the inner loop:

 Output was 

Alice,90 Alice,90,90 Alice,90,90,90 …

Mistakes being made (III) 

Forgetting that strings are immutable:

 Trying to do 

linestring[-1] = '\n'

instead of 

linestring = linestring[:-1] + '\n'

Bigger issue:

 Do we have to remove the last comma?
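One possible answer is no: the string method join puts the separator between the items only, so there is no trailing comma to remove. A sketch with sample items, not the code used on the slides:

```python
items = ["Alice", "90", "90", "90"]     # sample fields, already cleaned up

# join inserts ',' BETWEEN items; nothing is added after the last one.
linestring = ",".join(items) + "\n"

print(linestring, end="")   # Alice,90,90,90
```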

Could we have done better? (I)  Make the program

more readable by decomposing it into functions

 A function to process each line of input 

do_line(line)

 Input is a string ending with newline  Output is a string in CSV format  Should call a function processing individual items

Could we have done better? (II)  A function to process individual items 

do_item(item)

 Input is a string  Returns a string  With double quotes "doubled"  Without a newline  Within quotes if it contains a comma

The new program (I) 

def do_item(item) :
    item = item.replace('"','""')
    if item[-1] == '\n' :
        item = item[:-1]
    if ',' in item :
        item = '"' + item + '"'
    return item

The new program (II) 

def do_line(line) :
    itemlist = line.split('\t')
    linestring = '' # start afresh
    for item in itemlist :
        linestring += do_item(item) + ','
    if linestring != '' and linestring[-1] == ',' :
        linestring = linestring[:-1]
    linestring += '\n'
    return linestring

The new program (III) 

fh = open('grades.txt','r')
linelist = [ ]
for line in fh :
    linelist.append(do_line(line))
fh.close()

The new program (IV) 

fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()

Why it is better  Program is decomposed into small modules that are much easier to understand  Each fits on a PowerPoint slide
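For comparison, the whole conversion can also be sketched with the standard csv module, which is not what the slides use. The in-memory buffers and the three sample rows are assumptions made so the sketch runs on its own.

```python
import csv
import io

# Sample tab-separated input (hypothetical grades).
tsv = 'Alice\t90\t85\nDoe, Jane\t90\t75\nFulano, Eduardo "Lalo"\t80\t70\n'
source = io.StringIO(tsv)     # stands in for the input file
target = io.StringIO()        # stands in for the output file

# QUOTE_NONE: treat double quotes in the input as ordinary characters.
reader = csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
writer = csv.writer(target, lineterminator="\n")

for row in reader:            # row is a list of fields
    writer.writerow(row)      # the writer wraps and doubles quotes as needed

result = target.getvalue()
print(result, end="")
```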

The break statement  Makes the program exit the loop it is in  In next example, we are looking for

first instance

of a string in a file  Can exit as soon as it is found

Example (I) 

searchstring = input('Enter search string:')
found = False
fh = open('grades.txt')
for line in fh :
    if searchstring in line :
        print(line)
        found = True
        break

Example (II) 

if found == True :
    print("String %s was found" % searchstring)
else :
    print("String %s NOT found " % searchstring)
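The same search can be sketched without keyboard input or a real file. The function name first_match and the sample lines are hypothetical.

```python
def first_match(lines, searchstring):
    """Return the first line containing searchstring, or None."""
    match = None
    for line in lines:
        if searchstring in line:
            match = line
            break            # stop scanning: only the first hit matters
    return match

lines = ["Alice 90\n", "Bob 85\n", "Bobby 70\n"]   # stands in for a file
print(first_match(lines, "Bob"))    # the 'Bob 85' line, not 'Bobby 70'
print(first_match(lines, "Zed"))    # None
```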

Flags  A variable like

found

 That can either be

True

or

False

 That is used in a condition for an

if

or a

while is often referred to as a flag

A dumb mistake   Unlike C and its family of languages, Python does not let you write 

if found = True

for 

if found == True

There are still cases where we can make mistakes!

Example 

>>> b = 5
>>> c = 8
>>> a = b = c
>>> a
8

>>> a = b == c
>>> a
True

HANDLING EXCEPTIONS

When a wrong value is entered  When user is prompted for 

number = int(input("Enter a number: "))

and enters  a non-numerical string a

ValueError

exception is raised and the program terminates  Python programs can catch errors

The try… except pair (I) 

try:
    … statements being tried …
except Exception as ex:
    … statements handling the exception …

 Observe  the colons  the indentation

The try… except pair (II) 

try:
    … statements being tried …
except Exception as ex:
    … statements handling the exception …

 If an exception occurs while the program executes the statements between the

try

and the

except,

control is

immediately transferred

to the

statements after the except
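The transfer of control can be observed with a small sketch. It catches ValueError specifically rather than the general Exception used on the slides; that choice, and the steps list, are only for illustration.

```python
steps = []                      # records which statements actually ran

try:
    steps.append("before")
    number = int("abc")         # raises ValueError
    steps.append("after")       # never reached
except ValueError as ex:
    steps.append("handler")     # control jumps straight here

print(steps)                    # ['before', 'handler']
```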

A better example 

done = False
while not done :
    filename = input("Enter a file name: ")
    try :
        fh = open(filename)
        done = True
    except Exception as ex:
        print('File %s does not exist' % filename)
print(fh.read())

An Example (I) 

done = False
while not done :
    try :
        number = int(input('Enter a number:'))
        done = True
    except Exception as ex:
        print('You did not enter a number')
print("You entered %.2f." % number)
input("Hit enter when done with program.")

A simpler solution 

done = False
while not done :
    myinput = (input('Enter a number:'))
    if myinput.isdigit() :
        number = int(myinput)
        done = True
    else :
        print('You did not enter a number')
print("You entered %.2f." % number)
input("Hit enter when done with program.")
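One caveat with this approach: isdigit() only accepts strings made of digits, so a minus sign or a decimal point makes it return False. A quick check on a few sample inputs:

```python
samples = ["42", "-5", "3.5", ""]

# Map each sample string to what isdigit() says about it.
results = {text: text.isdigit() for text in samples}

for text, answer in results.items():
    print(repr(text), answer)
```

Only "42" passes, so negative numbers would be rejected by the loop above.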

PICKLED FILES

Pickled files 

import pickle

 Provides a way to save complex data structures in a file  Sometimes said to provide a

serialized representation

of Python objects

Basic primitives (I) 

dump(object,fh)

 appends a sequential representation of

object

into file with file handle

fh

object

is virtually any Python object 

fh

is the handle of a file that must have been opened in

'wb'

mode  b is a special option that allows us to

write or read binary data

Basic primitives (II) 

target = load( filehandle)

 assigns to

target

next pickled object stored in file

filehandle

target

is virtually any Python object 

filehandle

is the handle of a file that was opened in

rb

mode

Example (I) 

>>> mylist = [ 2, 'Apples', 5, 'Oranges']

>>> mylist
[2, 'Apples', 5, 'Oranges']

>>> fh = open('testfile', 'wb') # b for BINARY

>>> import pickle

>>> pickle.dump(mylist, fh)

>>> fh.close()

Example (II) 

>>> fhh = open('testfile', 'rb') # b for BINARY

>>> theirlist = pickle.load(fhh)

>>> theirlist
[2, 'Apples', 5, 'Oranges']

>>> theirlist == mylist
True

What was stored in testfile?

 Some binary data containing the strings 'Apples' and 'Oranges'
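This can be checked without a file: pickle.dumps returns the same bytes that pickle.dump would write. A minimal sketch using the default protocol:

```python
import pickle

mylist = [2, 'Apples', 5, 'Oranges']
data = pickle.dumps(mylist)       # bytes, not a str

print(type(data))
print(b'Apples' in data)          # the text of each string is stored as is
print(b'Oranges' in data)
print(pickle.loads(data) == mylist)
```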

Using ASCII format  Can require a pickled representation of objects that only contains printable characters  Must specify

protocol = 0

Advantage:

 Easier to debug 

Disadvantage:

 Takes more space
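A rough way to see the space cost is to compare sizes for one sample object. The list of 1000 integers is chosen only for illustration; for very small objects the difference can be negligible.

```python
import pickle

data = list(range(1000))          # sample object

ascii_form = pickle.dumps(data, protocol=0)   # ASCII-only representation
binary_form = pickle.dumps(data)              # default binary protocol

print(len(ascii_form), len(binary_form))      # sizes in bytes
print(max(ascii_form) < 128)                  # True: only ASCII codes
```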

Example 

import pickle
mydict = {'Alice': 22, 'Bob' : 27}
fh = open('asciifile.txt', 'wb') # MUST be 'wb'
pickle.dump(mydict, fh, protocol = 0)
fh.close()
fhh = open('asciifile.txt', 'rb')
theirdict = pickle.load(fhh)
print(mydict)
print(theirdict)

The output 

{'Bob': 27, 'Alice': 22}
{'Bob': 27, 'Alice': 22}

What is inside asciifile.txt?

(dp0VBobp1L27Ls

V

Alicep2L22Ls.

Dumping multiple objects (I) 

import pickle
fh = open('asciifile.txt', 'wb')
for k in range(3, 6) :
    mylist = [i for i in range(1,k)]
    print(mylist)
    pickle.dump(mylist, fh, protocol = 0)
fh.close()

Dumping multiple objects (II) 

fhh = open('asciifile.txt', 'rb')
lists = [ ] # initializing list of lists
while 1 : # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break
fhh.close()
print(lists)

Dumping multiple objects (III)  Note the way we test for end-of-file (

EOF

) 

while 1 : # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break

The output 

[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[[1, 2], [1, 2, 3], [1, 2, 3, 4]]
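The read-until-EOFError loop can be wrapped in a small helper. This is a sketch: the name load_all is hypothetical, and io.BytesIO stands in for a file opened in 'wb' and then 'rb' mode.

```python
import io
import pickle

def load_all(fh):
    """Yield every pickled object in fh until end-of-file."""
    while True:
        try:
            yield pickle.load(fh)
        except EOFError:          # nothing left to read
            return

buffer = io.BytesIO()             # in-memory stand-in for a binary file
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer)

buffer.seek(0)                    # rewind before reading
lists = list(load_all(buffer))
print(lists)
```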

What is inside asciifile.txt?

(lp0L1LaL2La.(lp0L1LaL2LaL3La.(lp0L1LaL2 LaL3LaL4La.

Practical considerations  You rarely pick the format of your input files 

May have to do format conversion

 You often have to use specific formats for your output files 

Often dictated by program that will use them

 Otherwise

stick with pickled files

!