Removing duplicates in lists

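Technical background

Removing duplicates from a list is a common task in Python. Data often arrives with repeated elements, and those repeats need to be dropped so that later processing stays accurate and efficient. Depending on the use case, the original order of the elements may also need to be preserved.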

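Implementation steps

Without preserving order

Using set: a set is an unordered collection in Python whose elements are unique. Converting the list to a set and back to a list removes the duplicates. The original order is not kept, and every element must be hashable.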

t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
result = list(set(t))
print(result)
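
Because a set has no defined order, the order of the list produced this way should not be relied on. If a deterministic result matters more than the original order, one option is to sort the unique values. A minimal sketch, where unique_sorted is a name chosen here for illustration and the elements are assumed to be hashable and mutually comparable:

```python
def unique_sorted(items):
    # set() drops duplicates; sorted() then imposes a deterministic order.
    # Assumes the elements are hashable and can be compared with each other.
    return sorted(set(items))

t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
print(unique_sorted(t))  # [1, 2, 3, 5, 6, 7, 8]
```

Sorting gives a stable output but is not the same as keeping the input order; for that, see the order-preserving approaches.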

Preserving order

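Using dict.fromkeys: since Python 3.7, the built-in dict is guaranteed to keep its keys in insertion order. dict.fromkeys() therefore removes duplicates while keeping the first occurrence of each element, and no import is needed.

t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
result = list(dict.fromkeys(t))
print(result)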
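
On interpreters older than Python 3.7, where plain dict ordering is not guaranteed by the language, collections.OrderedDict offers the same fromkeys() class method. A short sketch of that variant, using a sample list chosen here to make the preserved order visible:

```python
from collections import OrderedDict

t = [3, 1, 3, 2, 1]
# OrderedDict keeps keys in the order they were first inserted,
# so each value appears once, at the position of its first occurrence.
result = list(OrderedDict.fromkeys(t))
print(result)  # [3, 1, 2]
```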

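Custom loop: iterate over the list and append each element to a new list only if it is not already there. Each membership test scans the new list, so the cost grows quadratically with the list length, but the approach also works when the elements are unhashable.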

t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
s = []
for i in t:
    if i not in s:
        s.append(i)
print(s)
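
The loop relies only on equality, not on hashing, so it can handle elements that set() rejects. A minimal sketch with a list of lists, where unique_by_scan is a hypothetical helper name wrapping the same loop:

```python
def unique_by_scan(items):
    unique = []
    for item in items:
        # "not in" compares with ==, so unhashable items such as lists work.
        if item not in unique:
            unique.append(item)
    return unique

rows = [[1, 2], [3], [1, 2], [3], [4]]
print(unique_by_scan(rows))  # [[1, 2], [3], [4]]
```

Calling set(rows) on the same input would raise TypeError, because lists are not hashable.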

Core code

Deduplication with custom classes and functions

from collections import OrderedDict, Counter

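class Container:
    'Wrap any object so that it can be used as a dictionary key'
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        # Compare the wrapped objects, unwrapping another Container first.
        if isinstance(other, Container):
            return self.obj == other.obj
        return self.obj == other
    def __hash__(self):
        try:
            return hash(self.obj)
        except TypeError:
            # Unhashable objects (e.g. lists) fall back to identity, so two
            # equal but distinct lists are not treated as duplicates.
            return id(self.obj)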

class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)

def remd(sequence):
    cnt = Counter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]

def oremd(sequence):
    cnt = OrderedCounter()
    for x in sequence:
        cnt[Container(x)] += 1
    return [item.obj for item in cnt]
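
As the Container class above is written, unhashable items are hashed by id(), so two equal but separate lists would normally be kept as distinct entries. If equal unhashable items should also be collapsed, one possible alternative is to track hashable items in a set and unhashable ones in a list. This is a sketch under that assumption, not part of the code above; unique_mixed is a hypothetical name:

```python
def unique_mixed(sequence):
    seen_hashable = set()    # fast lookups for hashable items
    seen_unhashable = []     # linear scan for everything else
    result = []
    for item in sequence:
        try:
            # Membership tests on a set raise TypeError for unhashable items.
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        result.append(item)
    return result

data = [1, [1, 2], 1, [1, 2], "a", [3], "a"]
print(unique_mixed(data))  # [1, [1, 2], 'a', [3]]
```

The order of first occurrence is kept, and the slower list scan is only paid for the unhashable items.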

Deduplication with a generator

def uniqify(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

unique_list = list(uniqify([1, 2, 3, 4, 3, 2, 4, 5, 6, 7, 6, 8, 8]))
print(unique_list)
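
The generator above treats two items as duplicates only when they are equal. A common extension is to compare items by a derived key instead, for example case-insensitively. A hedged sketch of that idea, where uniqify_by and its key parameter are names introduced here for illustration:

```python
def uniqify_by(iterable, key=None):
    seen = set()
    for item in iterable:
        # Compare by key(item) when a key function is given, else by the item.
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            yield item

words = ["Apple", "apple", "Banana", "BANANA", "cherry"]
print(list(uniqify_by(words, key=str.lower)))  # ['Apple', 'Banana', 'cherry']
```

Like uniqify, this yields items lazily in order of first occurrence, so it can be applied to large or streaming inputs without building the full result first. The computed keys must be hashable.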