Sets versus multisets

suggest change

Sets are unordered collections of distinct elements. But sometimes we want to work with unordered collections of elements that are not necessarily distinct and keep track of the elements’ multiplicities.

Consider this example:

>>> setA = {'a','b','b','c'}
>>> setA
set(['a', 'c', 'b'])

By saving the strings 'a', 'b', 'b', 'c' into a set data structure we’ve lost the information on the fact that 'b' occurs twice. Of course saving the elements to a list would retain this information

>>> listA = ['a','b','b','c']
>>> listA
['a', 'b', 'b', 'c']

but a list data structure introduces an extra unneeded ordering that will slow down our computations.

For implementing multisets Python provides the Counter class from the collections module (starting from version 2.7):

>>> from collections import Counter
>>> counterA = Counter(['a','b','b','c'])
>>> counterA
Counter({'b': 2, 'a': 1, 'c': 1})

Counter is a dictionary where where elements are stored as dictionary keys and their counts are stored as dictionary values. And as all dictionaries, it is an unordered collection.

Feedback about page:

Feedback:
Optional: your email if you want me to get back to you:


Set:
* Set
* Sets versus multisets

Table Of Contents
2 Filter
3 List
7 Loops
22 Reduce
27 Classes
31 Set
42 Tuple
45 Enum
62 Sockets
89 urllib
92 Idioms
104 Stack
105 Profiling
109 Logging
111 os module
118 Mixins
120 ArcPy
126 Arrays
132 2to3 tool
135 Unicode
138 Neo4j
140 Curses
141 Templates
145 heapq
146 tkinter
154 Audio
155 pyglet
157 ijson
160 Flask
161 Groupby
163 pygame
165 hashlib
166 Gzip
167 ctypes
185 pyaudio
186 shelve