StackOverFlowCn/102.md at master · qiwsir/StackOverFlowCn

#如何把两个Python字典用一个表达式合并？

原问题地址：http://stackoverflow.com/questions/38987/how-can-i-merge-two-python-dictionaries-in-a-single-expression

##问题：

我有两个Python字典，想写一个表达式，实现两个字典的合并。用update()方法，如果它能够返回合并结果而不是原地修改，就是我想要的。

>>> x = {'a':1, 'b': 2}
>>> y = {'b':10, 'c': 11}
>>> z = x.update(y)
>>> print z
None
>>> x
{'a': 1, 'b': 10, 'c': 11}

我怎样才能让最终合并后的字典是在z中，而不是在x中？（要格外清楚，最后一个成功解决dict.update()的冲突的方法，也是我所寻求的。）

##回答：

你有两个字典，你想把它们合并成一个新的字典，而又不改变原有字典中的内容：

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

y是后一个，它的值将取代x的值，因此'b'将在最终的结果中指向3。

经典的Python解法是分两步处理：

z = x.copy()
z.update(y)

PEP 448中提出的新语法在 Python 3.5中可用。新语法是这样的

z = {**x, **y}

它满足了你所提出的要求（单一的表达式）。它符合Python 3.5 PEP 478，并且它已经出现在文档What's New in Python 3.5中。

然而，由于许多组织仍然在使用Python 2，在未来几年的时间里，新语法都不太可能在生产环境中被使用。

###在Python 2中，合并字典的表达式

随着许多组织把Python版本升级到Python 3，Python 3.5中提出的新的解决方案将会成为这个问题的主要解决方案。

然而，如果你还没有使用Python 3.5，并且你还想在一个表达式中实现这个功能，最高效的方法就是把它放在一个函数中：

def merge_two_dicts(x, y):
    '''Given two dicts, merge them into a new dict as a shallow copy.'''
    z = x.copy()
    z.update(y)
    return z

然后，你就有一个单一的表达式：

z = merge_two_dicts(x, y)

你也可以创建一个函数来合并一些数量不确定的字典，从零到一个非常大的数量：

def merge_dicts(*dict_args):
    '''
    Given any number of dicts, shallow copy and merge into a new dict, precedence goes to key value pairs in latter dicts.
    '''
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

这个函数对于Python 2和Python 3中所有的字典都适用。例如，在给定的从a到g的字典中，

z = merge_dicts(a, b, c, d, e, f, g)

字典g中的键值对优先于字典a到f，依此类推（即“最后一个有效”——译者注）。

###对其他答案的评论

不要采用你所看到的投票最多的答案：

z = dict(x.items() + y.items())

在Python 2中，你在内存中为这两个字典创建两个列表，创建的第三个列表的长度等于前两个列表加起来的长度，然后创建字典并丢弃所有这三个列表。在Python 3中，这样做就行不通了，因为你是把两个dict_items加在一起，而不是两个列表。

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of    resources and computation power.

你必须明确地把它们作为列表来创建，例如z = dict(list(x.items()) + list(y.items()))。这是对资源和计算能力的一种浪费。

同样地，如果值不是可哈希对象（例如：列表），在Python 3 中合并items()也不会成功（Python 2.7中viewitems()）。即使你的值是可哈希的，因为字典是无序的，也没有定义行为的优先顺序。所以不要这样做：

>>> c = dict(a.items() | b.items())

这个例子说明了如果值不是哈希表时会发生什么情况：

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

这里有一个例子，它说明了y在什么情况下应该有优先权，但是由于字典的顺序是任意的，x的值被保留下来：

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

你不应该使用的另一个把戏：

z = dict(x, **y)

这里使用了字典构造器，并且速度非常快、内存效率高（甚至略高于我们的两步法）。但是这个把戏很难读懂，除非你确切地知道这里发生了什么情况（第二个字典被作为关键字参数传递给字典构造器），这不是预期的用法，所以不Python。另外，当关键字不是字符串时，这种方法在Python 3中也行不通。

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

在邮件列表中，吉多·范罗苏姆，Python语言的创造者，写到：

“I am fine with declaring dict({}, **{1:3}) illegal, since after all it is abuse of the ** mechanism.” “我可以宣布dict({}, **{1:3})是非法的，因为这毕竟是在滥用**机制。”

“Apparently dict(x, **y) is going around as "cool hack" for "call x.update(y) and return x". Personally I find it more despicable than cool.” “显然，dict(x, **y)是对于x.update(y) and return x的“酷黑”。我个人觉得它不是‘酷’而是卑劣。”

###正确但性能不高的点对点模式

这些方法的性能不高，但它们将提供正确的功能，性能会比copy和update或新的unpacking低得多，因为它们在更高的抽象层次上遍历每个键值对，但它们肯定尊重优先顺序（后面的字典优先）

你也可以在字典的解析式中创建字典：

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

或者在Python 2.6中（也许早在引入了生成器表达式的Python2.4中）：

dict((k, v) for d in dicts for k, v in d.items())

itertools.chain将迭代器的键值对按正确的顺序链接起来：

import itertools
z = dict(itertools.chain(x.iteritems(), y.iteritems()))

###性能分析

我只对能够正确运行的那些用法进行性能分析。

import timeit

以下适用于Ubuntu 14.04和Python 2.7（Python系统）：

>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.5726828575134277
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.163769006729126
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.iteritems(), y.iteritems()))))
1.1614501476287842
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
2.2345519065856934

Python 3.5：

>>> min(timeit.repeat(lambda: {**x, **y}))
0.4094954460160807
>>> min(timeit.repeat(lambda: merge_two_dicts(x, y)))
0.7881555100320838
>>> min(timeit.repeat(lambda: {k: v for d in (x, y) for k, v in d.items()} ))
1.4525277839857154
>>> min(timeit.repeat(lambda: dict(itertools.chain(x.items(), y.items()))))
2.3143140770262107
>>> min(timeit.repeat(lambda: dict((k, v) for d in (x, y) for k, v in d.items())))
3.2069112799945287

打赏帐号：[email protected]（支付宝）

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

102.md

Latest commit

History

102.md

File metadata and controls