Iterable/Sequences in Python

Iterable/Sequences

Strings, ranges, lists and tuples.

“An iterable which supports efficient element access using integer indices via the __getitem__ special method and defines a __len__ method that returns the length of the sequence.”

  • use one index to fetch item in a list, the index can't out of the range; but if use slice method, then you can do it out of the range, because the slice will autolly find the greediest indexs.

  • assigning a list to another list, like a = b, makes a and b refers to the same list. And a list is mutable. So if we change a, then b will also be affected. It's called alias. Do avoid this.

  • Tuple also has order. Just it's immutable.

  • Fxxxk, Tuple and Set are different things! A set is like {1, 2, 3}, that is what I thought as not ordered and not duplicated. And a set has intersection() union() difference() symmetric_difference().

    To delete items in a set:

    1
    2
    3
    set1.discard(6) # If the item doesn't exist, no exception
    set1.remove(6) # Raise an exception if doesn't exist
    set1.pop() # randomly delete one item.
  • you can insert an interable in a list's slice, though they have different length!

    1
    2
    3
    4
    >>> a = [1, 2, 3, 4, 5]
    >>> a[3:3] = [6, 7]
    >>> a
    [1, 2, 3, 6, 7, 4, 5]
  • list() takes an iterable, and makes it a list;

    str() shows you the whole object's look.

  • range() is also an iterable. It can be slicing, and len()

  • a, b = b, a: the b, a on the right firstly is packed into tuple: (b, a), then it is unpacked and individually assigned to each var on the left

Sequence operations

All: (list tuple str range)

image-20210825165741137
image-20210825165741137
image-20210825165757277
image-20210825165757277

Except range:

image-20210825165814598
image-20210825165814598

List:

image-20210825165842016
image-20210825165842016

List

l.extend() treats the input as an iterable, and take each element of the iterable as a new element;

l.append() treats the input as a new element.

l.pop() Pop and return element of some index (if omitted, pop last element)

l.remove() Remove first occurrence of element

be careful which methods can cause runtime error

for-comprehension: elem for var in interable if cond

all(<iterable>): whether all are true

any(<interable>): any true

Generator expressions

only retrieve elements when need

Yield expression

Generator functions are written using the function* syntax. When called, generator functions do not initially execute their code. Instead, they return a special type of iterator, called a Generator. When a value is consumed by calling the generator's next method, the Generator function executes until it encounters the yield keyword.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
>>> def f():
... cur = 0
... while True:
... yield cur
... cur += 1
>>> a = f()
>>> type(a)
generator
>>> next(a)
0
>>> next(a)
1
>>> for i in f():
... print(i)

0
1
2

###迭代器和生成器

  • 迭代器 an iterable,跟sequence不同的是它不能slicing,也不一定有len(),因为是可以无穷的。其定义就是一个实现了__iter()____next()__方法的对象。从道理上来讲是一个可以遍历sequence,并且记住遍历位置的对象。直到访问完所有元素,抛出一个StopIteration异常。
  • 生成器 a generator:使用了yield的函数,必然返回一个迭代器。用起来和迭代器是一样的。
    • 至于yield的用法,在运行生成器的时候,其实也就是next(<generator>)的时候,函数会运行,直到遇到yield,就暂停并保存当前运行信息,返回yield的值;下一次运行时从当前位置继续

Dict and Set

  • Use short-circuit evaluation to fetch a key in a Dict, but if it doesn't exist, it will return a False, instead of raising an error:

    1
    2
    3
    'Toronto' in MLB_team and MLB_team['Toronto']
    # if 'Toronto' is in the list, => True and xxx, it will return the xxx;
    # if it's not in the list, => False and xxx, it will return False and skip running the xxx.

    But d.get() method is similar. It will give a None if the key doesn't exist.

  • set can only contains hashable items. set is unhashable. So a set can't be put into a set.

  • All mutable types are unhashable

  • set() function picking up an iterable and use it one by one to create a set. so set({1,2,3}) is ok.

  • frozenset is immutable, so also hashable. so can be put into a set.

hash

1
2
3
4
5
6
7
8
9
>>> set([1, 1.0])
{1}

>>> n1 = 1234e23
>>> n2 = 1234e23
>>> id(n1), id(n2)
(4318748880, 4318749104)
>>> hash(n1), hash(n2)
(539300935830116283, 539300935830116283)
  • Strange: Python's behavior is different in shell/script/ways:
1
2
3
>>> n1, n2 = 1234e23, 1234e23
>>> id(n1), id(n2)
(4317529008, 4317529008)

1. hash for built-in data types

  • For built-in immutable data types (i.e. float, str, int), hash uses their value, so, x == y => hash(x) == hash(y)

  • For tuple, hash depends on its elements. So, some tuple are hashable, some are not. depends on whether members are all hashable:

    1
    2
    3
    4
    5
    6
    7
    >>> t1, t2 = (1, 'abc'), (1, 'abc', [1])
    >>> hash(t1)
    7760687994730794974
    >>> hash(t2)
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
  • hash is None for built-in mutable types (i.e. list, set, dict)

2. hash for user defined classes
  • For user defined classes, by default:
    • __hash__ and __eq__ are based on id by default (but they can be overriden) (so, by default, hash, ==(equal) and is are the same)
    • duplicate is defined as: x == y <=> x is y and hash(x) == hash(y)
    • 如果你改了equal的逻辑,那么就要把hash的逻辑也改了,否则就会变成None
  • 说人话:
    • 默认来说,全都看id
    • 如果光改了eq,那hash就成none了。想保留hash的话,要满足这个条件:只要ab全等,必须ab哈希值一样(不然你是魔鬼)
  • 哈希冲突 (collision): 不同的元素(a == b: False)经过哈希函数后发现哈希值一样(hash(a) == hash(b))。是需要一些策略来进行处理,以确保把这些元素都储存下来的
  • 重复(duplicate):两个元素全等(而不是id一样!!!),hash得到的值又一样(a == b and hash(a) == hash(b)),自然认为是同一个元素,直接覆盖了