Example 1

suggest change

Say you have the string

s = 'AAAABBBCCDAABBB'

and you would like to split it so all the ‘A’s are in one list and so with all the ‘B’s and ‘C’, etc. You could do something like this

s = 'AAAABBBCCDAABBB'
s_dict = {}
for i in s:
    if i not in s_dict.keys():
        s_dict[i] = [i]
    else:
        s_dict[i].append(i)
s_dict

Results in

{'A': ['A', 'A', 'A', 'A', 'A', 'A'],
 'B': ['B', 'B', 'B', 'B', 'B', 'B'],
 'C': ['C', 'C'],
 'D': ['D']}

But for large data set you would be building up these items in memory. This is where groupby() comes in

We could get the same result in a more efficient manner by doing the following

# note that we get a {key : value} pair for iterating over the items just like in python dictionary
from itertools import groupby
s = 'AAAABBBCCDAABBB'
c = groupby(s)

dic = {} 
for k, v in c:
    dic[k] = list(v)
dic

Results in

{'A': ['A', 'A'], 'B': ['B', 'B', 'B'], 'C': ['C', 'C'], 'D': ['D']}

Notice that the number of ‘A’s in the result when we used group by is less than the actual number of ‘A’s in the original string. We can avoid that loss of information by sorting the items in s before passing it to c as shown below

c = groupby(sorted(s))

dic = {} 
for k, v in c:
    dic[k] = list(v)
dic

Results in

{'A': ['A', 'A', 'A', 'A', 'A', 'A'], 'B': ['B', 'B', 'B', 'B', 'B', 'B'], 'C': ['C', 'C'], 'D': ['D']}

Now we have all our ‘A’s.

Feedback about page:

Feedback:
Optional: your email if you want me to get back to you:


Groupby:
* Example 1

Table Of Contents
2 Filter
3 List
7 Loops
22 Reduce
27 Classes
31 Set
42 Tuple
45 Enum
62 Sockets
89 urllib
92 Idioms
104 Stack
105 Profiling
109 Logging
111 os module
118 Mixins
120 ArcPy
126 Arrays
132 2to3 tool
135 Unicode
138 Neo4j
140 Curses
141 Templates
145 heapq
146 tkinter
154 Audio
155 pyglet
157 ijson
160 Flask
161 Groupby
163 pygame
165 hashlib
166 Gzip
167 ctypes
185 pyaudio
186 shelve