Python: Opening Files


Common basic file types

Binary files
Text files
CSV
XML
JSON
HTML
Excel
Word
Images
Audio
Video

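The table below lists the function or module used to handle each file type.
Here, "built-in function" means that the data can be processed directly once it has been read.

File type	Built-in function	Standard module	Non-standard module
Binary file	open()	None	-
Text file	open()	None	-
CSV	None	csv	-
XML	None	xml	-
JSON	None	json	-
HTML	None	html	-
Excel	None	None	Excel API (non-Windows), Excel API (Windows)
Word	None	None	Word API (non-Windows), Word API (Windows)
Image	None	None	pypng
Audio	None	wave	-
Video	None	None	moviepy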

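Binary file code examples

Python has no type literally named "binary"; binary data is represented by two built-in types, bytes and bytearray, and other values have to be converted with these functions before they appear as binary data. Before using them, you can open the interpreter and run help() on each one, for example help(bytes), to look up what it does.
This section first covers how bytes and bytearray are used, then how to convert integers and strings to bytes, and finally how to read and write binary files.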

bytes

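When converting data, the input can be one of the following kinds:

Integer
String
Iterable: every element must be an integer
Buffer: not demonstrated here, because I have not used it when writing Python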


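bytes(1)                        # b'\x00'
bytes(2)                        # b'\x00\x00'
bytes(3)                        # b'\x00\x00\x00'
bytes(4)                        # b'\x00\x00\x00\x00'
bytes('str'.encode('utf8'))     # b'str'
bytes('str'.encode('ascii'))    # b'str'
bytes([0,1,255])                # b'\x00\x01\xff'
bytes((2))                      # b'\x00\x00', because (2) is the integer 2
bytes((1,2))                    # b'\x01\x02'
bytes({1,2})                    # two bytes; a set has no guaranteed order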

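Code explanation

Passing an integer to bytes() sets the number of bytes, all initialized to zero; for example, bytes(1) is a single zero byte.
Passing a string to bytes() only works when an encoding is supplied; the result is the string's bytes in that encoding.
Passing a list, tuple, or set to bytes() converts it directly to bytes. Note that (2) is not a tuple but the integer 2 in parentheses, so bytes((2)) behaves exactly like bytes(2); a one-element tuple is written (2,). Each element must be in the range 0-255, because one byte is 8 bits and holds 2^8 = 256 distinct values.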
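The constructor forms described above can be checked side by side. This is a minimal sketch; the variable names are only for illustration, and the two-argument form bytes('str', 'utf8') is an equivalent shortcut that the examples above do not use.

```python
# Different ways to build a bytes object.
zeros = bytes(3)                    # an integer gives that many zero bytes
from_text = bytes('str', 'utf8')    # a string needs an explicit encoding
from_list = bytes([0, 1, 255])      # an iterable of integers in range(256)
not_a_tuple = bytes((2))            # (2) is just the integer 2
one_tuple = bytes((2,))             # (2,) is a real one-element tuple

print(zeros)        # b'\x00\x00\x00'
print(from_text)    # b'str'
print(from_list)    # b'\x00\x01\xff'
print(not_a_tuple)  # b'\x00\x00'
print(one_tuple)    # b'\x02'

# Values outside 0-255 are rejected with a ValueError.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```

The trailing comma is what makes a tuple, which is why bytes((2)) and bytes((2,)) give different results.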

bytearray

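When converting data, the input can be one of the following kinds:

Integer
String
Iterable: every element must be an integer
Buffer: not demonstrated here, because I have not used it when writing Python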


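bytearray(1)                        # bytearray(b'\x00')
bytearray(2)
bytearray(3)
bytearray(4)
bytearray('str'.encode('utf8'))     # bytearray(b'str')
bytearray('str'.encode('ascii'))
bytearray([0,1,255])                # bytearray(b'\x00\x01\xff')
bytearray((2))                      # same as bytearray(2)
bytearray((1,2))                    # bytearray(b'\x01\x02')
bytearray({1,2})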

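Code explanation

Passing an integer to bytearray() sets the number of bytes, all initialized to zero; for example, bytearray(1) is a single zero byte.
Passing a string to bytearray() only works when an encoding is supplied; the result is the string's bytes in that encoding.
Passing a list, tuple, or set to bytearray() converts it directly. Note that (2) is not a tuple but the integer 2 in parentheses, so bytearray((2)) behaves exactly like bytearray(2). Each element must be in the range 0-255, because one byte is 8 bits and holds 2^8 = 256 distinct values.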

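bytes vs. bytearray

These are two different data types, but converting between them is simple: call bytes() on a bytearray to get bytes, and call bytearray() on bytes to get a bytearray. In addition, a string literal prefixed with b is a bytes object.

b'string'
bytearray(b'string')
bytes(bytearray(b'string'))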
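The practical difference between the two types is mutability: bytes cannot be changed after creation, while bytearray can be edited in place. The following sketch illustrates this; the variable names are made up for the example.

```python
immutable = b'string'
mutable = bytearray(immutable)   # copy the data into a mutable buffer

mutable[0] = ord('S')            # item assignment works on bytearray
mutable.extend(b'!')             # and it can grow in place

try:
    immutable[0] = ord('S')      # bytes does not support item assignment
    bytes_is_immutable = False
except TypeError:
    bytes_is_immutable = True

back_to_bytes = bytes(mutable)   # freeze the result again
print(back_to_bytes)             # b'String!'
```

A common pattern is to build data up in a bytearray and convert it to bytes once it is complete.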

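Converting an integer to bytes

(-3).to_bytes(3,'big',signed=True)       # b'\xff\xff\xfd'
(3).to_bytes(3,'big',signed=True)        # b'\x00\x00\x03'
(3).to_bytes(3,'big',signed=False)       # b'\x00\x00\x03'
(-3).to_bytes(4,'little',signed=True)    # b'\xfd\xff\xff\xff'
(3).to_bytes(4,'little',signed=True)     # b'\x03\x00\x00\x00'

To convert an integer to bytes, call .to_bytes() on it. Its arguments are, in order, the length, the byte order, and the sign flag. The length is the number of bytes, so 1 means one byte. The byte order is either big or little: big puts the most significant byte first, little puts the least significant byte first. signed controls whether the sign is stored (as two's complement): True stores it, False does not. It must be True when the integer is negative, otherwise an OverflowError is raised.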
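The reverse direction uses int.from_bytes(), which takes the same byte order and signed arguments. The sketch below shows a round trip and what happens when the sign flag does not match; the variable names are illustrative only.

```python
value = -3
big = value.to_bytes(3, 'big', signed=True)
little = value.to_bytes(3, 'little', signed=True)
print(big)     # b'\xff\xff\xfd'
print(little)  # b'\xfd\xff\xff'

# Decoding with the same settings restores the original number.
restored = int.from_bytes(big, 'big', signed=True)
print(restored)  # -3

# Reading the same bytes as unsigned gives a different number.
unsigned = int.from_bytes(big, 'big', signed=False)
print(unsigned)  # 16777213

# A negative number cannot be stored without the sign flag.
try:
    value.to_bytes(3, 'big', signed=False)
    negative_needs_signed = False
except OverflowError:
    negative_needs_signed = True
```

Both sides of an exchange have to agree on length, byte order, and signedness, since none of them is recorded in the bytes themselves.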

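Converting strings to bytes and bytes to strings

str1 = 'string'.encode('utf8')
str1.decode('utf8')
str2 = 'string'.encode('ascii')
str2.decode('ascii')

encode() turns a string into bytes in the chosen encoding. A Python 3 str holds Unicode text rather than raw bytes, so even though encode() defaults to UTF-8, the bytes object it returns is a different type from the original string.
UTF-8 and ASCII are used as example encodings here; encode performs the encoding and decode performs the decoding.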
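The difference between encodings only becomes visible with non-ASCII text. The sketch below uses the two-character word 檔案 ("file") as sample input, which is an arbitrary choice for illustration.

```python
text = '檔案'                        # two characters
utf8_bytes = text.encode('utf8')

print(len(text))        # 2 characters
print(len(utf8_bytes))  # 6 bytes, three per character in UTF-8
round_trip_ok = utf8_bytes.decode('utf8') == text
print(round_trip_ok)    # True

# ASCII has no representation for these characters.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

For pure ASCII text such as 'string', the UTF-8 and ASCII encodings produce identical bytes, which is why the earlier examples look the same for both.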

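Binary file handling

The built-in open() function is all that is needed to handle binary files. The mode string is built from the following characters.

Read: r
Write: w
Append: a
Binary: b
Plus sign: +

Code example: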

Writing

with open('file.bin', 'wb') as f_write:
    f_write.write(b'123\n')
# no close() needed: the with block closes the file on exit

Reading

with open('file.bin', 'rb') as f_read:
    data = f_read.read()

with open('file.bin', 'rb') as f_read:
    for line in f_read:
        print(line)

# for line in f_read visits the same lines as for line in f_read.readlines()

with open('file.bin', 'rb') as f_read:
    for line in f_read.readlines():
        print(line)

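Code explanation

Working with a file always involves three steps: opening the file, reading or writing its content, and closing the file. The underlying idea is to acquire a resource, work with it, and release it. Keep this in mind, because opening the same file again without having closed it can cause serious problems in a program.

When writing, the mode character w stands for write and b stands for binary.
When reading, the mode character r stands for read and b stands for binary.

The files here are opened with a with statement, which earlier articles did not cover because not every programming language has an equivalent. with exists so that the file is still released when an error occurs later in the block, so it is the recommended style. Writing f = open('file.bin', 'wb') also works, but then f.close() has to be called explicitly.

as binds the object returned by open() to the variable name that follows it.

You are free to try the f = open() style, but in a larger program the file can easily be left open when something unexpected fails.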
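The guarantee that with provides can be observed directly. In this sketch a RuntimeError is raised on purpose inside the block (a simulated failure, not something from the original examples), and the file is created in a temporary directory so nothing real is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.bin')

    try:
        with open(path, 'wb') as f_write:
            f_write.write(b'123\n')
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass
    closed_by_with = f_write.closed     # the with block closed the file

    # The manual style needs try/finally to give the same guarantee.
    f_manual = open(path, 'rb')
    try:
        content = f_manual.read()
    finally:
        f_manual.close()
    closed_manually = f_manual.closed

print(closed_by_with, closed_manually, content)
```

The with version is shorter and cannot forget the close() call, which is the main reason it is preferred.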

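Of the five mode characters, the ones most likely to raise questions are w, r, a, and +. The plus sign opens the file for updating, that is for both reading and writing, and is combined with another mode, as in wb+ or w+b; the other modes also work without it. w stands for write: if the file does not exist it is created, and if it exists its content is overwritten. r reads an existing file. a appends: new data is added after the existing content, and the file is created if it does not exist.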
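The effect of each mode can be compared on a single file. This is a small sketch that works in a temporary directory; the file name and contents are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.bin')

    with open(path, 'wb') as f:      # w: create the file or truncate it
        f.write(b'123\n')
    with open(path, 'ab') as f:      # a: add to the end
        f.write(b'456\n')
    with open(path, 'rb') as f:      # r: read only
        after_append = f.read()

    with open(path, 'r+b') as f:     # r+: read and write, no truncation
        f.write(b'9')                # overwrites the first byte
        f.seek(0)
        after_update = f.read()

    with open(path, 'wb') as f:      # w again: old content is discarded
        f.write(b'new\n')
    with open(path, 'rb') as f:
        after_rewrite = f.read()

print(after_append)   # b'123\n456\n'
print(after_update)   # b'923\n456\n'
print(after_rewrite)  # b'new\n'
```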

As an aside, Python has a module called pickle that stores objects in binary form; it is worth a look if you are interested.
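As a brief illustration of pickle, the sketch below serializes a dictionary to bytes and restores it. The record contents are invented for the example. pickle.dump() and pickle.load() do the same with a file opened in 'wb' or 'rb' mode.

```python
import pickle

record = {'name': 'a', 'scores': [1, 2, 3]}

payload = pickle.dumps(record)     # Python object -> bytes
restored = pickle.loads(payload)   # bytes -> a new, equal object

print(type(payload).__name__)      # bytes
print(restored == record)          # True
print(restored is record)          # False, it is a copy
```

Unpickling can execute arbitrary code, so only load pickle data that comes from a trusted source.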

Text file handling

Code example:

Writing

with open('file.txt', 'w') as f_write:
    f_write.write('123\n')

Reading

with open('file.txt', 'r') as f_read:
    text = f_read.read()

with open('file.txt', 'r') as f_read:
    for line in f_read:
        print(line)

# for line in f_read visits the same lines as for line in f_read.readlines()

with open('file.txt', 'r') as f_read:
    for line in f_read.readlines():
        print(line)

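Code explanation

This is essentially the same as binary handling; the only difference is the missing b. In text mode, open() can also set the encoding used for the file, for example open('file.txt', 'w', encoding='UTF-8'), so files in other encodings can be read as well. The encoding argument is not allowed in binary mode.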
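The relation between text mode and binary mode can be seen by writing a file with an explicit encoding and reading it back both ways. This sketch uses a temporary directory and an arbitrary sample string.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.txt')

    with open(path, 'w', encoding='utf-8') as f:
        f.write('開啟檔案')

    with open(path, 'r', encoding='utf-8') as f:
        as_text = f.read()           # decoded for us: a str

    with open(path, 'rb') as f:
        as_bytes = f.read()          # raw content: bytes

print(type(as_text).__name__, len(as_text))     # str 4
print(type(as_bytes).__name__, len(as_bytes))   # bytes 12
print(as_bytes.decode('utf-8') == as_text)      # True
```

Passing encoding explicitly is a good habit, because the default depends on the platform and its settings.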

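CSV handling

A CSV file can be handled either with the built-in open() function or with the csv module from the standard library. The csv module is specialized for CSV files, but I cannot say exactly what it improves; there is little reason to study that in detail unless a language lacks a good CSV library and you have to write one yourself. I will still show basic handling with open() alone.

Link: CSV file description.
The examples first write a file with the csv module, then read it back with plain string processing, and finally read it again with the csv module to compare the two approaches.

CSV code examples

Writing with csv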


import csv

with open('test.csv', 'w', newline='') as csv_write:
    writer = csv.writer(csv_write)
    writer.writerow(['title_A', 'title_B', 'title_C'])
    writer.writerow(['A1', 'B1', 'C1'])
    writer.writerow(['A2', 'B2', 'C2'])

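Code explanation

After opening the file with with and open(), csv.writer creates the writer object, and its writerow() method writes one row per call.

Open test.csv in a program to check the result.

Reading the content as a string and processing it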

with open('test.csv', 'r', newline='') as csv_read:
    text = csv_read.read()   # named text rather than str, to avoid shadowing the built-in

str_csv = []
str_l = text.split('\r\n')
for s in str_l:
    if s != '':
        str_csv.append(s.split(','))

for s in str_csv:
    print(s)

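Code explanation

read() reads all the data. split() then divides it, first on the line terminator \r\n and then on commas, appending each row to the str_csv list, which must be defined beforehand. Splitting on \r\n leaves an empty string at the end, so empty strings are skipped before the comma split.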
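One concrete case where plain string splitting falls short is a field that itself contains a comma and is therefore quoted. The sketch below compares both approaches on an in-memory sample; the data is invented for the illustration.

```python
import csv
import io

# The second field of the data row contains a comma, so it is quoted.
raw = 'name,comment\r\nA1,"hello, world"\r\n'

# Plain string processing, as in the example above.
naive_rows = [line.split(',') for line in raw.split('\r\n') if line]

# The csv module understands the quoting rules.
csv_rows = list(csv.reader(io.StringIO(raw, newline='')))

print(naive_rows[1])  # ['A1', '"hello', ' world"']
print(csv_rows[1])    # ['A1', 'hello, world']
```

Handling of quotes, embedded commas, and embedded line breaks is the main thing the csv module adds over split().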

Reading with csv

import csv

with open('test.csv', 'r', newline='') as csv_read:
    rows = csv.reader(csv_read)
    for s in rows:   # iterate while the file is still open
        print(s)

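Code explanation

csv.reader reads the data and the result is assigned to rows, which the for loop then prints row by row.

csv.DictReader

import csv

with open('test.csv', 'r', newline='') as csv_read:
    rows = csv.DictReader(csv_read)
    for s in rows:
        print(s)

Each row is displayed together with the column titles.
The available dialects are csv.excel, csv.excel_tab, and csv.unix_dialect.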
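A dialect is selected with the dialect argument. The following sketch reads tab-separated data with csv.excel_tab through DictReader; the sample data is held in memory and is made up for the example.

```python
import csv
import io

# Tab-separated sample data with a header row.
raw = 'title_A\ttitle_B\r\nA1\tB1\r\nA2\tB2\r\n'

reader = csv.DictReader(io.StringIO(raw, newline=''), dialect=csv.excel_tab)
records = [dict(row) for row in reader]

print(reader.fieldnames)  # ['title_A', 'title_B']
print(records)
# [{'title_A': 'A1', 'title_B': 'B1'}, {'title_A': 'A2', 'title_B': 'B2'}]
```

The first row supplies the keys, so each record can be accessed by column title instead of by position.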

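XML code examples

XML file description
Python offers several ways to handle XML, listed below. The examples cover etree, the DOM parser, and the SAX parser.

xml.etree.ElementTree: the approach recommended for Python
xml.dom: uses a DOM parser, with two modules underneath
xml.dom.minidom:
xml.dom.pulldom:
xml.sax: uses a SAX parser
xml.parsers.expat:

etree code:

Creating XML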


import xml.etree.ElementTree as ET

root = ET.Element('root')
doc = ET.SubElement(root, 'doc')
doc2 = ET.SubElement(root, 'doc2')
ET.SubElement(doc, 'field1', a='a', b='b', c='c').text = 'value1'
ET.SubElement(doc, 'field2', a='a', b='b', c='c').text = 'value2'

ET.SubElement(doc2, 'field1', a='a', b='b', c='c').text = 'value3'
ET.SubElement(doc2, 'field2', a='a', b='b', c='c').text = 'value4'
ET.SubElement(doc2, 'field2', attrib={"id": "1"}).text = 'value5'   # attribute values must be strings

tree = ET.ElementTree(root)   # wrap the root so it can be written out
tree.write('filename.xml')

Code explanation

Import ElementTree under the name ET.
The root variable holds the root element, which is created with Element.

Reading XML

import xml.etree.ElementTree as ET   # cElementTree was removed in Python 3.9

tree = ET.parse('filename.xml')
root = tree.getroot()
for i in tree.iter():
    for element in i:   # the direct children of each element
        print(element.tag)
        print(element.text)

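Modifying XML

for i in tree.iter('field1'):
    i.text = 'modify'
tree.write('filename.xml')

Deleting a specific node

doc = tree.find('doc')
doc.remove(doc.find('field2'))
tree.write('filename.xml')

Searching for nodes

doc = tree.find('doc')
docs = tree.findall('doc')

When deleting a node, make sure that remove() is called on the element object that actually belongs to tree and is the direct parent of the node; otherwise the wrong object is modified or the call fails.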
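The point about deleting from the right object can be reproduced in memory. This sketch builds a small tree, parses it again from a string instead of a file, and shows that remove() only succeeds on the direct parent; the element names follow the examples above.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
doc = ET.SubElement(root, 'doc')
ET.SubElement(doc, 'field1', a='a').text = 'value1'
ET.SubElement(doc, 'field2', a='a').text = 'value2'

# Serialize and parse again, as if the XML had come from a file.
xml_text = ET.tostring(root, encoding='unicode')
parsed_root = ET.fromstring(xml_text)

parent = parsed_root.find('doc')       # direct parent of field2
parent.remove(parent.find('field2'))   # works: field2 is a child of parent

remaining = [child.tag for child in parent]
print(remaining)  # ['field1']

# remove() on an element that is not the direct parent raises ValueError.
try:
    parsed_root.remove(parent.find('field1'))
    wrong_parent_rejected = False
except ValueError:
    wrong_parent_rejected = True

# The original tree built above is a separate object and is unchanged.
original_children = [child.tag for child in doc]
```

Note that doc and parent describe the same XML but are different objects, so changes to one are not reflected in the other.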

dom code:

Testing turned up a bug: depending on how the XML is laid out, the data cannot be displayed.

Test environment: macOS High Sierra, Python 2.7.15, Python 3.6.5.
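The original does not say what the bug was. One plausible cause, offered here only as an assumption, is that minidom keeps the indentation and line breaks of pretty-printed XML as whitespace-only text nodes, so code that expects only element children sees unexpected entries. The sketch below shows that behavior on an in-memory sample.

```python
from xml.dom import minidom

# Indented XML, as a file written by hand or by a pretty-printer might look.
pretty = '<root>\n  <doc>\n    <field1>value1</field1>\n  </doc>\n</root>'
dom = minidom.parseString(pretty)
doc = dom.getElementsByTagName('doc')[0]

# Indentation and newlines show up as text nodes between the elements.
all_children = [node.nodeName for node in doc.childNodes]
print(all_children)  # ['#text', 'field1', '#text']

# Keeping only element nodes gives the expected structure.
elements = [node for node in doc.childNodes
            if node.nodeType == node.ELEMENT_NODE]
field_text = elements[0].firstChild.data
print(field_text)  # value1
```

Filtering by nodeType, or using getElementsByTagName(), makes the code independent of how the XML is laid out.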

Creating XML

Reading XML

Modifying XML
Deleting a specific node
Searching for nodes

SAX code:

A handler has to be written first; it is then passed in when the parser is set up.

Reading XML


import xml.sax

class Handler(xml.sax.ContentHandler):
    def __init__(self):
        self.CurrentData = ""
        self.type = ""

    # called at the start of every element
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if self.CurrentData == 'doc':
            print('****doc****')

    # called at the end of every element
    def endElement(self, tag):
        print(self.type)

    # called with the text inside an element
    def characters(self, content):
        if self.CurrentData == 'field1':
            self.type = content
        elif self.CurrentData == 'field2':
            self.type = content

if __name__ == "__main__":
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    handler = Handler()   # lowercase name so the class is not overwritten
    parser.setContentHandler(handler)
    parser.parse("filename.xml")

Code explanation

SAX works through a handler: first define how each kind of tag should be handled, then give the content to the parser for analysis.
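A variation on the handler idea is to collect values instead of printing them. The sketch below is one possible way to do that, not the handler from the example above: the class name FieldCollector is invented, and the XML is parsed from a byte string with xml.sax.parseString() so that no file is needed.

```python
import xml.sax

class FieldCollector(xml.sax.ContentHandler):
    """Collect the text of every field1 and field2 element."""

    def __init__(self):
        super().__init__()
        self.current_tag = ''
        self.buffer = []
        self.values = []

    def startElement(self, tag, attributes):
        self.current_tag = tag
        self.buffer = []

    def characters(self, content):
        # characters() may be called several times for one element,
        # so the pieces are gathered and joined at the end tag.
        if self.current_tag in ('field1', 'field2'):
            self.buffer.append(content)

    def endElement(self, tag):
        if tag in ('field1', 'field2'):
            self.values.append((tag, ''.join(self.buffer)))
        self.current_tag = ''

xml_bytes = (b'<root><doc><field1>value1</field1>'
             b'<field2>value2</field2></doc></root>')
collector = FieldCollector()
xml.sax.parseString(xml_bytes, collector)
print(collector.values)  # [('field1', 'value1'), ('field2', 'value2')]
```

Because SAX reports events while reading, the handler has to keep its own state, here the current tag and a text buffer.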

JSON code examples

JSON file description

Creating JSON


import json

data = [{'dataType':'person','content':{'name':'a','sexual':'boy'}},{'dataType':'person','content':{'name':'b'}}]

data_json = json.dumps(data)
with open('text.json', 'w') as f_write:
    f_write.write(data_json)

Reading JSON

import json

with open('text.json', 'r') as f_read:
    data_json = f_read.read()

data = json.loads(data_json)   # loads() parses a string; load() takes a file object
print(data)

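Code explanation

After importing the module, dumps converts Python data to a JSON string, and loads (or load, for a file object) converts JSON back into Python objects; open handles the reading and writing.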
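The two pairs of functions, dumps/loads for strings and dump/load for file objects, can be compared in a short sketch. An io.StringIO buffer stands in for a file here, and the sample data is a shortened version of the data above.

```python
import io
import json

data = [{'dataType': 'person', 'content': {'name': 'a'}}]

# String-based pair.
text = json.dumps(data)          # Python object -> JSON string
from_string = json.loads(text)   # JSON string -> Python object

# File-based pair, using an in-memory buffer in place of a real file.
buffer = io.StringIO()
json.dump(data, buffer, indent=2)
buffer.seek(0)
from_file = json.load(buffer)

print(text)
print(from_string == data, from_file == data)  # True True
```

With a real file, json.dump(data, f_write) and json.load(f_read) remove the need for the separate write() and read() calls.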

HTML code examples

HTML file description

Reading HTML


from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data  :", data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

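Code explanation

This also works through a handler, the same idea as SAX for XML: first define how each kind of tag should be handled, then give the content to the parser for analysis.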
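In practice the handler methods usually store something rather than print it. The following sketch extracts the page title; the class name TitleParser and its attributes are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Remember the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears while inside <title>.
        if self.in_title:
            self.title += data

parser = TitleParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
parser.close()
print(parser.title)  # Test
```

The in_title flag plays the same role as the current-tag state in a SAX handler.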

Excel code examples: link

Word code examples: link

Image code examples: PNG, JPEG

Audio code examples: link

Video code examples: link

https://kknews.cc/zh-tw/tech/p8kop88.html
https://www.qa-knowhow.com/?p=3773
https://pythonhosted.org/pypng/
