Strings: Bytes versus Unicode


In Python 2 there are two kinds of string: those made of bytes, with type str, and those made of text, with type unicode.

In Python 2, an object of type str is always a byte sequence, but is commonly used for both text and binary data.

A string literal is interpreted as a byte string.

s = 'Cafe'    # type(s) == str

There are two exceptions. First, you can define a Unicode (text) literal explicitly by prefixing the literal with u:

s = u'Café'   # type(s) == unicode
b = 'Lorem ipsum'  # type(b) == str

Alternatively, you can specify that all of a module’s string literals should be Unicode (text) strings:

from __future__ import unicode_literals

s = 'Café'   # type(s) == unicode
b = 'Lorem ipsum'  # type(b) == unicode

In order to check whether your variable is a string (either Unicode or a byte string), you can use:

isinstance(s, basestring)

In Python 3, the str type is a Unicode text type.

s = 'Cafe'           # type(s) == str
s = 'Café'           # type(s) == str (note the accented trailing e)

Additionally, Python 3 added a bytes object, suitable for binary “blobs” or writing to encoding-independent files. To create a bytes object, you can prefix b to a string literal or call the string’s encode method:

# Or, if you really need a byte string:
s = b'Cafe'          # type(s) == bytes
s = 'Café'.encode()  # type(s) == bytes

To test whether a value is a string, use:

isinstance(s, str)
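
For code that has to run under both versions, one common pattern is to pick the right base type once and reuse it. The sketch below is an illustration rather than an official recipe; the names string_types and is_string are made up for this example.

```python
# Pick the base type(s) that count as "a string" on the running interpreter.
try:
    string_types = (basestring,)  # Python 2: covers both str and unicode
except NameError:
    string_types = (str,)         # Python 3: basestring no longer exists


def is_string(value):
    """Return True if value is a string type (hypothetical helper)."""
    return isinstance(value, string_types)


print(is_string('Cafe'))    # True on both versions
print(is_string(42))        # False on both versions
print(is_string(b'Cafe'))   # True on Python 2 (bytes is str), False on Python 3
```

Note that the last call gives different answers on the two versions, which is exactly the bytes/text split this section describes.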

Python 3 (3.3 and later) also accepts a u prefix on string literals to ease compatibility between Python 2 and Python 3 code bases. Since all strings in Python 3 are Unicode by default, the prefix has no effect:

u'Cafe' == 'Cafe'

Python 2’s raw Unicode string prefix ur is not supported, however:

>>> ur'Café'
  File "<stdin>", line 1
    ur'Café'
           ^
SyntaxError: invalid syntax

Note that you must encode a Python 3 text (str) object to convert it into a bytes representation of that text. The encode method uses UTF-8 by default.

You can use decode to ask a bytes object what Unicode text it represents:

>>> b = 'Café'.encode()
>>> b.decode()
'Café'
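
To make the encode/decode round trip more concrete, here is a minimal Python 3 sketch that names the codec explicitly instead of relying on the default. The variable names are chosen for illustration only.

```python
text = 'Café'

utf8_bytes = text.encode('utf-8')      # the accented e becomes two bytes
latin1_bytes = text.encode('latin-1')  # the accented e becomes one byte

print(utf8_bytes)     # b'Caf\xc3\xa9'
print(latin1_bytes)   # b'Caf\xe9'

# Decoding with the matching codec restores the original text.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)   # True

# Decoding with the wrong codec can fail outright.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)   # True
```

The same text can map to different byte sequences, so the bytes alone do not tell you which codec to use; that information has to travel with the data.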

While the bytes type exists in both Python 2 and 3 (in Python 2.6 and later it is simply an alias for str), the unicode type only exists in Python 2. To get Python 3’s Unicode-by-default string literals in Python 2, add the following to the top of your code file:

from __future__ import unicode_literals
print(repr("hi"))
# u'hi'

Another important difference is that indexing a bytes object in Python 3 returns an int:

b"abc"[0] == 97

Slicing out a single element, by contrast, returns a bytes object of length 1:

b"abc"[0:1] == b"a"
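
The following short Python 3 sketch shows the same behavior alongside a couple of ways to convert the resulting int back. It is only an illustration; the variable names are arbitrary.

```python
data = b"abc"

first = data[0]
print(first)              # 97, an int
print(list(data))         # [97, 98, 99]

# Two ways to get back to a one-byte bytes object:
print(data[0:1])          # b'a'
print(bytes([first]))     # b'a'

# And to a one-character text string:
print(chr(first))         # a
```

Code ported from Python 2 that compares data[0] against a one-character string will silently evaluate to False in Python 3, so slicing is usually the safer choice when the code must run on both.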

In addition, Python 3 fixes some surprising behavior with non-ASCII text. In Python 2, reversing a plain (byte) string reverses the individual bytes, which breaks any multi-byte character. For example, the following issue is resolved in Python 3:

# -*- coding: utf8 -*-
print("Hi, my name is Łukasz Langa.")
print(u"Hi, my name is Łukasz Langa."[::-1])
print("Hi, my name is Łukasz Langa."[::-1])

# Output in Python 2
# Hi, my name is Łukasz Langa.
# .agnaL zsakuŁ si eman ym ,iH
# .agnaL zsaku�� si eman ym ,iH

# Output in Python 3
# Hi, my name is Łukasz Langa.
# .agnaL zsakuŁ si eman ym ,iH
# .agnaL zsakuŁ si eman ym ,iH
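
The reason for the garbled Python 2 output can be reproduced in Python 3 by reversing the encoded bytes by hand. This is a small sketch, assuming UTF-8 as the encoding (as in the source file above); the variable names are illustrative.

```python
name = 'Łukasz'
encoded = name.encode('utf-8')

print(len(name))      # 6 code points
print(len(encoded))   # 7 bytes: the first letter takes two bytes in UTF-8

# Reversing text keeps each character intact.
reversed_text = name[::-1]
print(reversed_text == 'zsakuŁ')   # True

# Reversing the bytes swaps the two bytes of the first letter,
# so the result is no longer valid UTF-8.
reversed_bytes = encoded[::-1]
try:
    reversed_bytes.decode('utf-8')
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print(still_valid)    # False
```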
