阅读(4373) 赞(10)

文件(2)

2016-02-24 15:48:26 更新

上一节，对文件有了初步认识。要牢记，文件无非也是一种类型的数据。

文件的状态

很多时候，我们需要获取一个文件的有关状态（也称为属性），比如创建日期，访问日期，修改日期，大小，等等。在os模块中，有这样一个方法，专门让我们查看文件的这些状态参数的。

>>> import os
>>> file_stat = os.stat("131.txt")      #查看这个文件的状态
>>> file_stat                           #文件状态是这样的。从下面的内容，有不少从英文单词中可以猜测出来。
posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600)

>>> file_stat.st_ctime                  #这个是文件创建时间
1407734600.0882277

这是什么时间？看不懂！别着急，换一种方式。在python中，有一个模块time，是专门针对时间设计的。

>>> import time                         
>>> time.localtime(file_stat.st_ctime)  #这回看清楚了。
time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0)

read/readline/readlines

上节中，简单演示了如何读取文件内容，但是，在用dir(file)的时候，会看到三个函数：read/readline/readlines，它们各自有什么特点，为什么要三个？一个不行吗？

在读者向下看下面内容之前，请想一想，如果要回答这个问题，你要用什么方法？注意，我问的是用什么方法能够找到答案，不是问答案内容是什么。因为内容，肯定是在某个地方存放着呢，关键是用什么方法找到。

搜索？是一个不错的方法。

还有一种，就是在交互模式下使用的，你肯定也想到了。

>>> help(file.read)

用这样的方法，可以分别得到三个函数的说明：

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

readline(...)
    readline([size]) -> next line from the file, as a string.

    Retain newline.  A non-negative size argument limits the maximum
    number of bytes to return (an incomplete line may be returned then).
    Return an empty string at EOF.

readlines(...)
    readlines([size]) -> list of strings, each a line from the file.

    Call readline() repeatedly and return a list of the lines so read.
    The optional size argument, if given, is an approximate bound on the
    total number of bytes in the lines returned.

对照一下上面的说明，三个的异同就显现了。

EOF什么意思？End-of-file。在维基百科中居然有对它的解释：

In computing, End Of File (commonly abbreviated EOF[1]) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader,[2] or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended.

明白EOF之后，就对比一下：

read：如果指定了参数size，就按照该指定长度从文件中读取内容，否则，就读取全文。被读出来的内容，全部塞到一个字符串里面。这样有好处，就是东西都到内存里面了，随时取用，比较快捷；“成也萧何败萧何”，也是因为这点，如果文件内容太多了，内存会吃不消的。文档中已经提醒注意在“non-blocking”模式下的问题，关于这个问题，不是本节的重点，暂时不讨论。
readline：那个可选参数size的含义同上。它则是以行为单位返回字符串，也就是每次读一行，依次循环，如果不限定size，直到最后一个返回的是空字符串，意味着到文件末尾了(EOF)。
readlines：size同上。它返回的是以行为单位的列表，即相当于先执行readline()，得到每一行，然后把这一行的字符串作为列表中的元素塞到一个列表中，最后将此列表返回。

依次演示操作，即可明了。有这样一个文档，名曰：you.md，其内容和基本格式如下：

You Raise Me Up When I am down and, oh my soul, so weary; When troubles come and my heart burdened be; Then, I am still and wait here in the silence, Until you come and sit awhile with me. You raise me up, so I can stand on mountains; You raise me up, to walk on stormy seas; I am strong, when I am on your shoulders; You raise me up: To more than I can be.

分别用上述三种函数读取这个文件。

>>> f = open("you.md")
>>> content = f.read()
>>> content
'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n'
>>> f.close()

提示：养成一个好习惯，只要打开文件，不用该文件了，就一定要随手关闭它。如果不关闭它，它还驻留在内存中，后面又没有对它的操作，是不是浪费内存空间了呢？同时也增加了文件安全的风险。

注意：在python中，'\n'表示换行，这也是UNIX系统中的规范。但是，在奇葩的windows中，用'\r\n'表示换行。python在处理这个的时候，会自动将'\r\n'转换为'\n'。

请仔细观察，得到的就是一个大大的字符串，但是这个字符串里面包含着一些符号\n，因为原文中有换行符。如果用print输出这个字符串，就是这样的了，其中的\n起作用了。

>>> print content
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

用readline()读取，则是这样的：

>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'
>>> f.readline()
'When troubles come and my heart burdened be;\n'
>>> f.close()

显示出一行一行读取了，每操作一次f.readline()，就读取一行，并且将指针向下移动一行，如此循环。显然，这种是一种循环，或者说可迭代的。因此，就可以用循环语句来完成对全文的读取。

#!/usr/bin/env python
# coding=utf-8

f = open("you.md")

while True:
    line = f.readline()
    if not line:         #到EOF，返回空字符串，则终止循环
        break
    print line ,         #注意后面的逗号，去掉print语句后面的'\n'，保留原文件中的换行

f.close()                #别忘记关闭文件

将其和文件"you.md"保存在同一个目录中，我这里命名的文件名是12701.py，然后在该目录中运行python 12701.py，就看到下面的效果了：

~/Documents$ python 12701.py 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

也用readlines()来读取此文件：

>>> f = open("you.md")
>>> content = f.readlines()
>>> content
['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n']

返回的是一个列表，列表中每个元素都是一个字符串，每个字符串中的内容就是文件的一行文字，含行末的符号。显而易见，它是可以用for来循环的。

>>> for line in content:
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
>>> f.close()

读很大的文件

前面已经说明了，如果文件太大，就不能用read()或者readlines()一次性将全部内容读入内存，可以使用while循环和readlin()来完成这个任务。

此外，还有一个方法：fileinput模块

>>> import fileinput
>>> for line in fileinput.input("you.md"):
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

我比较喜欢这个，用起来是那么得心应手，简洁明快，还用for。

对于这个模块的更多内容，读者可以自己在交互模式下利用dir()，help()去查看明白。

还有一种方法，更为常用：

>>> for line in f:
...     print line ,
... 
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

之所以能够如此，是因为file是可迭代的数据类型，直接用for来迭代即可。

seek

这个函数的功能就是让指针移动。特别注意，它是以字节为单位进行移动的。比如：

>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'

现在已经移动到第四行末尾了，看seek()的能力：

>>> f.seek(0)

意图是要回到文件的最开头，那么如果用f.readline()应该读取第一行。

>>> f.readline()
'You Raise Me Up\n'

果然如此。此时指针所在的位置，还可以用tell()来显示，如

>>> f.tell()
17L
>>> f.seek(4)

f.seek(4)就将位置定位到从开头算起的第四个字符后面，也就是"You "之后，字母"R"之前的位置。

>>> f.tell()
4L

tell()也是这么说的。这时候如果使用readline()，得到就是从当前位置开始到行末。

>>> f.readline()
'Raise Me Up\n'
>>> f.close()

seek()还有别的参数，具体如下：

seek(...) seek(offset[, whence]) -> None. Move to new file position.

Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable.

whence的值：

默认值是0，表示从文件开头开始计算指针偏移的量（简称偏移量）。这是offset必须是大于等于0的整数。
是1时，表示从当前位置开始计算偏移量。offset如果是负数，表示从当前位置向前移动，整数表示向后移动。
是2时，表示相对文件末尾移动。

← 文件(1)

迭代 →

文件(2)

文件的状态

read/readline/readlines

读很大的文件

seek

推荐文章

推荐教程

最新教程