Python Notes (Part 1)

Notes on a few things I picked up while using Python recently. This post is based on Python 3.5.

Getting the current path

Get the working directory the script was launched from, and the directory that contains the script:

import sys
import os

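# Working directory the script was launched from
print(os.getcwd())
print(os.path.abspath('.'))
print(os.path.realpath('.'))

# Path of the running script (__file__ and sys.argv[0] may be relative;
# abspath/realpath make them absolute)
print(__file__)
print(os.path.abspath(__file__))
print(os.path.realpath(__file__))
print(sys.argv[0])

# Directory containing the running script
print(os.path.dirname(__file__))  # may be empty if __file__ is a bare file name
print(os.path.split(os.path.realpath(__file__))[0])
print(sys.path[0])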
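
A common use of the script directory is locating a file that ships next to the script, regardless of the working directory. The following is a minimal sketch; the helper name resource_path and the file name config.ini are made up for illustration.

```python
import os

def resource_path(script_file, name):
    """Return the absolute path of `name` in the directory containing `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, name)

# In a real script you would pass __file__; a literal path is used here
# so the snippet also runs in an interactive session.
example = resource_path(os.path.join('project', 'app.py'), 'config.ini')
print(example)
```

Taking abspath before dirname avoids the empty string that dirname returns for a bare file name.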

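Importing modules and packages

To use your own classes or functions from other files in a script, mark each package directory with an __init__.py file, and use the contents of __init__.py to control what the package exports.

As an example, assume the directory structure shown below. Each module contains a single function; classes defined in modules are imported the same way.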

.
├────__init__.py
├────app.py
├────mod_1.py
├─┬──pac_2
│ ├────__init__.py
│ └────mod_2.py
└─┬──pac_3
  ├────__init__.py
  ├────mod_3.py
  ├─┬──pac_4
  │ ├────__init__.py
  │ └────mod_4.py
  └─┬──pac_5
    ├────__init__.py
    ├────mod_5_1.py
    └────mod_5_2.py

app.py is the script entry point, the pac_* entries are directories (packages), and the mod_*.py files are modules.

All mod_*.py files have the same content:

def func():
    print('This is ' + __name__)

Every __init__.py is empty except the one in pac_5, which contains:

__all__ = ['mod_5_1', 'mod_5_2']

The content of app.py:

import mod_1
from pac_2.mod_2 import func as func2
from pac_3.mod_3 import func as func3
from pac_3.pac_4.mod_4 import func as func4
from pac_3.pac_5 import *

if __name__ == '__main__':
    mod_1.func()
    func2()
    func3()
    func4()
    mod_5_1.func()
    mod_5_2.func()

The output is:

This is mod_1
This is pac_2.mod_2
This is pac_3.mod_3
This is pac_3.pac_4.mod_4
This is pac_3.pac_5.mod_5_1
This is pac_3.pac_5.mod_5_2
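
The effect of __all__ on a star-import can be checked without creating the whole tree by hand. The sketch below builds a throwaway package in a temporary directory; the names demo_pkg, mod_a and mod_b are hypothetical and not part of the example above. It runs the star-import in a fresh namespace so the imported names can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (demo_pkg is a hypothetical name).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.makedirs(pkg_dir)
module_body = "def func():\n    return 'This is ' + __name__\n"
files = {
    '__init__.py': "__all__ = ['mod_a']\n",  # only mod_a is exported
    'mod_a.py': module_body,
    'mod_b.py': module_body,
}
for file_name, body in files.items():
    with open(os.path.join(pkg_dir, file_name), 'w') as f:
        f.write(body)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star-import in a fresh namespace so we can inspect what it binds.
namespace = {}
exec('from demo_pkg import *', namespace)
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)                   # ['mod_a']
print(namespace['mod_a'].func())  # This is demo_pkg.mod_a
```

mod_b exists on disk but is not listed in __all__, so the star-import does not bind it.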

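sort and sorted

sort and sorted do similar jobs. sort is a method of list: it sorts the list in place and returns None. sorted is a built-in function that accepts any iterable, leaves it unmodified, and returns a new sorted list.

A simple example: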

l = [1,3,4,2,5]
print ('l is %s' % l)
sl = sorted(l)
print ('sorted(l) is %s' % sl)
print ('now l is %s' % l)
l.sort()
print ('after l.sort() l is %s' % l)

The output is:

l is [1, 3, 4, 2, 5]
sorted(l) is [1, 2, 3, 4, 5]
now l is [1, 3, 4, 2, 5]
after l.sort() l is [1, 2, 3, 4, 5]

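Both sort and sorted accept a key argument that specifies the sort key and a reverse argument that controls whether the order is reversed, for example:

l = [1, 222, 30, 9, 10000, -6]
print(sorted(l)) # default ordering
print(sorted(l, reverse=True)) # default ordering, reversed
print(sorted(l, key = str)) # convert to str and compare as strings
print(sorted(l, key = abs)) # sort by absolute value
print(sorted(l, key = lambda p : len(str(p)))) # lambda key: convert to str, then sort by the string's length

The result is: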

[-6, 1, 9, 30, 222, 10000]
[10000, 222, 30, 9, 1, -6]
[-6, 1, 10000, 222, 30, 9]
[1, -6, 9, 30, 222, 10000]
[1, 9, 30, -6, 222, 10000]

Both sort and sorted are stable: elements that compare equal keep their original relative order.
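
Stability is what makes multi-pass sorting work: sort by the secondary key first, then by the primary key. A small illustration with made-up records:

```python
records = [('bob', 25), ('alice', 30), ('carol', 25), ('dave', 30)]

# Elements with equal keys keep their original relative order.
by_age = sorted(records, key=lambda r: r[1])
print(by_age)
# [('bob', 25), ('carol', 25), ('alice', 30), ('dave', 30)]

# Two passes: secondary key (name) first, primary key (age, descending) last.
by_name = sorted(records, key=lambda r: r[0])
by_age_desc_then_name = sorted(by_name, key=lambda r: r[1], reverse=True)
print(by_age_desc_then_name)
# [('alice', 30), ('dave', 30), ('bob', 25), ('carol', 25)]
```

reverse=True also preserves the relative order of equal elements, which is why the names stay in ascending order within each age.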

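filter, map and reduce

When you would otherwise loop over an iterable with for and apply the same operation to every element, filter, map and reduce are sometimes worth considering; they can make the code more concise and easier to read.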

filter

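Filter: extracts the elements of an iterable that satisfy a given condition and returns them as a filter object (a lazy iterator). Two examples:

# Keep the even numbers in a list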
list_1 = [1,2,3,4,5]
filter_l = filter(lambda x : x%2 == 0,list_1)
list_2 = [i for i in filter_l]
print('list is %s' % list_1)
print('after filter list is %s' % list_2) # [2, 4]

# Remove the spaces from a string
str_1 = 'hello world !'
filter_s= filter(lambda c : c != ' ',str_1)
str_2 = ''.join(filter_s)
print('str is %s' % str_1)
print('after filter str is %s' % str_2) # helloworld!
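
One thing worth knowing about the filter object in Python 3 is that it is a one-shot iterator. The sketch below also shows that passing None as the function keeps only truthy elements.

```python
evens = filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
first_pass = list(evens)
second_pass = list(evens)  # the iterator is already exhausted
print(first_pass)   # [2, 4]
print(second_pass)  # []

# With None as the function, falsy elements are dropped.
truthy = list(filter(None, [0, 1, '', 'a', None, [], [0]]))
print(truthy)  # [1, 'a', [0]]
```

If the result is needed more than once, convert it to a list first.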

map

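Mapping: applies the same operation (a call to the given function) to every element of an iterable and returns a map object (a lazy iterator) that yields the function's return values.

# Square every element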
list_1 = [1,2,3,4,5]
map_l = map(lambda x : x ** 2,list_1)  
list_2 = [i for i in map_l]
print('list is %s' % list_1)
print('after map list is %s' % list_2) # [1, 4, 9, 16, 25]

# Convert every character to upper case
str_1 = 'hello world !'
map_s= map(lambda c : c.upper(),str_1)
str_2 = ''.join(map_s)
print('str is %s' % str_1)
print('after map str is %s' % str_2) # HELLO WORLD !
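
map also accepts several iterables, passing one element from each to the function, and stops at the shortest one. For simple cases a list comprehension is an equivalent alternative. A brief sketch:

```python
# One element from each iterable is passed to the function.
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]

# map stops when the shortest iterable runs out.
powers = list(map(pow, [2, 3, 4], [2, 2]))
print(powers)  # [4, 9]

# Equivalent list comprehension for the squaring example.
squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
print(squares)  # [1, 4, 9, 16, 25]
```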

reduce

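In Python 3, reduce lives in the functools module. The function func takes two arguments and returns a single value. reduce applies func to the first two elements of the iterable seq, then passes that result together with the next element of seq to func again, and so on until seq is exhausted, yielding a single value.

More concretely, for seq = [item1, item2, item3, item4, item5] and a function func:

reduce(func, seq) = func(func(func(func(item1, item2), item3), item4), item5)

Two examples: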

from functools import reduce

# Sum all the elements
list_1 = [1,2,3,4,5]
reduce_1 = reduce(lambda x,y : x+y,list_1)
print('list_1 is %s' % list_1)
print('sum is %s' % reduce_1) # 15

# Join strings into one string separated by underscores
list_2 = ['apple', 'orange', 'banana']
reduce_2= reduce(lambda str1, str2 : str1 + '_' + str2, list_2)
print('list_2 is %s' % list_2)
print('combined list_2 is %s' % reduce_2) # apple_orange_banana
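
For these two particular cases the standard library already offers shortcuts, and reduce is more useful when no such shortcut exists. A short comparison, using operator.mul for a product as an illustrative case:

```python
from functools import reduce
import operator

numbers = [1, 2, 3, 4, 5]
fruits = ['apple', 'orange', 'banana']

total = sum(numbers)       # same result as reduce(lambda x, y: x + y, numbers)
joined = '_'.join(fruits)  # same result as the reduce-based concatenation
product = reduce(operator.mul, numbers)  # a case where reduce is a natural fit

print(total)    # 15
print(joined)   # apple_orange_banana
print(product)  # 120
```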

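reduce also accepts an optional third positional argument, an initial value (called init here; the documentation names it initializer). When it is given, the first call to func receives the initial value and the first element of seq. Supplying it also avoids the exception raised when seq is empty: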

from functools import reduce

list_1 = [1, 2, 3]
sum_1_no_init = reduce(lambda x, y : x+y , list_1)
print('sum_1_no_init is %s' % sum_1_no_init)
sum_1_init_1000 = reduce(lambda x, y : x+y , list_1, 1000)
print('sum_1_init_1000 is %s' % sum_1_init_1000)

list_2 = []
try:
	sum_2_no_init = reduce(lambda x, y : x+y , list_2)
	print('sum_2_no_init is %s' % sum_2_no_init)
except Exception as e:
	print (e)
sum_2_init_1000 = reduce(lambda x, y : x+y , list_2, 1000)
print('sum_2_init_1000 is %s' % sum_2_init_1000)

The output is:

sum_1_no_init is 6
sum_1_init_1000 is 1006
reduce() of empty sequence with no initial value
sum_2_init_1000 is 1000
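
The behaviour described above can be sketched in plain Python. This is an illustrative re-implementation, not the actual functools code; my_reduce and the _MISSING sentinel are names invented for the sketch.

```python
from functools import reduce

_MISSING = object()  # sentinel meaning "no initial value given"

def my_reduce(func, seq, init=_MISSING):
    it = iter(seq)
    if init is _MISSING:
        # Without an initial value, the first element seeds the accumulator.
        try:
            acc = next(it)
        except StopIteration:
            raise TypeError('reduce() of empty sequence with no initial value')
    else:
        acc = init
    for item in it:
        acc = func(acc, item)
    return acc

add = lambda x, y: x + y
print(my_reduce(add, [1, 2, 3]))        # 6
print(my_reduce(add, [1, 2, 3], 1000))  # 1006
print(my_reduce(add, [], 1000))         # 1000
```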

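Serialization

Three ways of serializing data in Python are covered here; they differ in what they serialize and in where they are appropriate: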

struct

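struct converts between Python values and C-style structs laid out according to a format string, serializing them to a byte string. It suits packing primitive types into bytes, for example to build protocol headers or to send data over the network, and the result can be read from other languages and on other platforms. Note that the default (native) mode uses the platform's own sizes, alignment and byte order; for data exchanged between machines, prefix the format with an explicit byte order such as '<' or '!'.

Use struct.pack to pack values into bytes and struct.unpack to unpack them again:

from struct import pack, unpack

a, b, c = 1, 2, 3
print(a, b, c)
# Native mode: the bytes below are from a 64-bit little-endian system (e.g. Linux) and vary by platform
p = pack('hhl', a, b, c) # b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
print(p)
a, b, c = unpack('hhl', p)
print(a, b, c) # 1 2 3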
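
For data that has to be read on another machine, an explicit byte-order prefix makes the layout independent of the platform. A minimal sketch with the same three values:

```python
import struct

values = (1, 2, 3)

# '<' = little-endian, standard sizes, no padding: h is 2 bytes, l is 4 bytes.
little = struct.pack('<hhl', *values)
print(little)                   # b'\x01\x00\x02\x00\x03\x00\x00\x00'
print(struct.calcsize('<hhl'))  # 8

# '!' = network byte order (big-endian), commonly used for protocol headers.
network = struct.pack('!hhl', *values)
print(network)                  # b'\x00\x01\x00\x02\x00\x00\x00\x03'

print(struct.unpack('<hhl', little))  # (1, 2, 3)
```

With a prefix the result is 8 bytes on every platform, whereas the native 'hhl' format above produced 16 bytes because of alignment padding and an 8-byte long.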

pickle

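pickle serializes and deserializes Python objects. Combined with file I/O, it is a convenient way to persist objects.

Use pickle.dump to serialize an object and pickle.load to deserialize it: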

import pickle

# A user-defined class
class Addition:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def value(self):
        return self.x + self.y

# ================================================
# Serialize with dump
data_1 = {
    'num': 500,
    'list': [0, 'apple'],
    'dict': {'one': 1, 'two': 2, 'three': 3},
    'tuple': ('string', u'Unicode string'),
    'n': None,
}
data_2 = [10, 100, 1000]
data_3 = Addition(1, 2)

# pickle writes bytes, so the file is opened in binary mode
with open('data.pkl', 'wb') as f:
    pickle.dump(data_1, f, -1)
    pickle.dump(data_2, f, -1)
    pickle.dump(data_3, f, -1)

# ================================================
# Deserialize with load, in the same order the objects were dumped
with open('data.pkl', 'rb') as f:
    data_1 = pickle.load(f)
    data_2 = pickle.load(f)
    data_3 = pickle.load(f)

print(data_1)
print(data_2)
print(data_3)
print(data_3.x, data_3.y, data_3.value())

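The third argument of pickle.dump(obj, file, protocol=None) selects the protocol version. Protocol 0 is the original human-readable (ASCII) format, and protocols 1 and above are binary formats, with higher versions generally being more efficient. In Python 3.5 the available protocols are 0 to 4 and the default is 3 (later Python versions raise the default). In Python 3 pickle always reads and writes bytes, so the file must be opened in binary mode whichever protocol is used. A negative value such as -1 selects the highest protocol version supported.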
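
The protocol argument can be explored in memory with pickle.dumps and pickle.loads, which take the same protocol parameter as dump. The sketch below round-trips a small made-up dictionary through every protocol the running interpreter supports; the sizes printed depend on the Python version, so none are shown.

```python
import pickle

data = {'num': 500, 'list': [0, 'apple'], 'n': None}

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol)
    assert pickle.loads(blob) == data  # loads detects the protocol itself
    sizes[protocol] = len(blob)
print(sizes)  # serialized size in bytes per protocol

# -1 is shorthand for the highest supported protocol.
newest = pickle.dumps(data, -1)
print(newest == pickle.dumps(data, pickle.HIGHEST_PROTOCOL))  # True
```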

json

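json is a widely used, language-independent serialization format; the json module converts between Python objects and JSON strings.

Use json.dumps to serialize an object and json.loads to deserialize it: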

import json

data = {'a' : 1,
		'b' : 2,
		'c' : 3,
		'd' : [ 4.1, 4.2, 4.3 ],
		'e' : { 'e1': 5.1, 'e2': 5.2}
		}

jstr = json.dumps(data)
print (jstr)
print (len(jstr)) # 75

data = json.loads(jstr)
print (data)
print (len(data)) # 5
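
Unlike pickle, JSON only knows a handful of types, so a round trip does not always return the original types. The following sketch shows a tuple coming back as a list, and uses the default parameter of json.dumps to handle a type JSON does not support; encode_extra is a hypothetical helper written for this illustration.

```python
import json

def encode_extra(obj):
    """Hypothetical fallback: serialize sets as sorted lists."""
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError('%r is not JSON serializable' % (obj,))

data = {'t': (1, 2), 'n': None, 'ok': True, 's': {3, 1, 2}}
text = json.dumps(data, sort_keys=True, default=encode_extra)
print(text)  # {"n": null, "ok": true, "s": [1, 2, 3], "t": [1, 2]}

back = json.loads(text)
print(back['t'])  # [1, 2]  (the tuple came back as a list)
print(back['s'])  # [1, 2, 3]  (the set came back as a list)
```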

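Regex substitution with re.sub

re.sub(pattern, repl, string, count=0, flags=0) replaces the parts of string that match the regular expression pattern and returns the resulting string.

A minimal example: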

import re

# Replace every digit in the input string with _
inStr = "ab2ui8xyz99pq1"
outStr = re.sub(r"\d", "_", inStr)
print(outStr) # ab_ui_xyz__pq_

# Censor the word "hell" with ****
inStr = "open the shell oh what the hell i mean hello"
outStr = re.sub(r"\bhell\b", "****", inStr) # use a raw string; otherwise \b means backspace
print(outStr) # open the shell oh what the **** i mean hello
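
The count and flags parameters in the signature can be illustrated the same way. The sketch below also uses re.subn, which returns the number of replacements alongside the new string.

```python
import re

digits = "ab2ui8xyz99pq1"

# count limits how many matches are replaced (0, the default, means all).
first_two = re.sub(r"\d", "_", digits, count=2)
print(first_two)  # ab_ui_xyz99pq1

# flags changes how the pattern matches, here ignoring case.
greeting = re.sub(r"hello", "bye", "Hello hello HELLO", flags=re.IGNORECASE)
print(greeting)  # bye bye bye

# subn returns (new_string, number_of_replacements).
result, replaced = re.subn(r"\d", "_", digits)
print(result, replaced)  # ab_ui_xyz__pq_ 5
```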

Using capture groups:

import re

# Use capture groups to turn A4 into A-4, and so on
inStr = "A4 is ready but B8 and B9 are not"
outStr = re.sub(r"(\w)(\d+)", r"\g<1>-\g<2>", inStr)
# Equivalent, using named groups:
# outStr = re.sub(r"(?P<letter>\w)(?P<figure>\d+)", r"\g<letter>-\g<figure>", inStr)

print(outStr) # A-4 is ready but B-8 and B-9 are not

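re.sub(pattern, repl, string, count=0, flags=0) can do more than plain substitution: the repl argument may be a function that receives each match object and returns the replacement string, as in the following example, which shuffles the inner letters of every word: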

import re
import random

def repl(m):
	inner_word = list(m.group(2))
	random.shuffle(inner_word)
	return m.group(1) + "".join(inner_word) + m.group(3)

text = "Professor Abdolmalek, please report your absences promptly."
print (text)
text = re.sub(r"(\w)(\w+)(\w)", repl, text)
print (text)
text = re.sub(r"(\w)(\w+)(\w)", repl, text)
print (text)

The output is (the shuffling is random, so the result differs between runs):

Professor Abdolmalek, please report your absences promptly.
Pssrfooer Albalomdek, pasele roerpt your acebenss pptmroly.
Prososefr Aeadblmolk, psleae reorpt your abeencss pmloptry.
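
Because the example above is random, its output cannot be reproduced exactly. A deterministic replacement function shows the same mechanism more predictably; the function double and the sample sentence are made up for illustration.

```python
import re

def double(match):
    """Return the matched number multiplied by two, as a string."""
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 12 pears")
print(doubled)  # 6 apples and 24 pears
```

The function is called once per match and must return a string.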

REFERENCE

https://docs.python.org/3/library/