Match an expression only in specific locations

Often you want to match an expression only in specific places and leave it untouched everywhere else. Consider the following sentence:

An apple a day keeps the doctor away (I eat an apple everyday).

Here “apple” occurs twice, once outside the parentheses and once inside them. To match only one of the two, you can use so-called backtracking control verbs, which are supported by the newer regex module. The idea is:

forget_this | or this | and this as well | (but keep this)

With our apple example, this would be:

import regex as re  # third-party "regex" package, not the standard library re
string = "An apple a day keeps the doctor away (I eat an apple everyday)."
rx = re.compile(r'''
    \([^()]*\) (*SKIP)(*FAIL)  # match anything in parentheses and "throw it away"
    |                          # or
    apple                      # match an apple
    ''', re.VERBOSE)
apples = rx.findall(string)
print(apples)
# ['apple'] (only the occurrence outside the parentheses)

This matches “apple” only when it can be found outside of the parentheses.
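Because the goal is often to change matches in some places and leave the others alone, the same alternation can also drive a replacement. The following is a minimal sketch under stated assumptions, not the method used above: it relies on the standard library re module, which has no (*SKIP)(*FAIL), so a capture group marks the branch to keep and a callback puts the discarded text back unchanged. The helper name swap_outside and the replacement word "orange" are made up for the example.

```python
import re

text = "An apple a day keeps the doctor away (I eat an apple everyday)."

# Left branch: consume a parenthesized span without capturing it.
# Right branch: capture the word we actually want to act on.
rx = re.compile(r"\([^()]*\)|(apple)")

def swap_outside(match):
    # group(1) is None when the parenthesized branch matched,
    # so return the matched text as-is to leave it untouched.
    if match.group(1) is None:
        return match.group(0)
    return "orange"  # hypothetical replacement for this example

result = rx.sub(swap_outside, text)
print(result)
# An orange a day keeps the doctor away (I eat an apple everyday).
```

The capture group plays the role of "(but keep this)" in the scheme above; everything matched by the other branch is consumed so that the word inside it is never seen on its own.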

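Here’s how it works: the engine scans the string from left to right and tries the alternatives in order. When the first alternative matches a parenthesized span, (*SKIP) marks the position reached and (*FAIL) forces that attempt to fail. Because of (*SKIP), the engine is not allowed to backtrack into the span and instead resumes the search right after it, so the second alternative only ever sees text outside the parentheses.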
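The discard-or-keep decision can be made visible with a small trace. This is only an approximation of the behaviour described here: it uses the standard library re module, where a capture group stands in for the control verbs, so the "discard" branch shows up as a match whose group is empty rather than as a failed attempt. The names trace and kept are illustrative.

```python
import re

text = "An apple a day keeps the doctor away (I eat an apple everyday)."

# Same two alternatives as before: parenthesized span, or the captured word.
rx = re.compile(r"\([^()]*\)|(apple)")

trace = []
for m in rx.finditer(text):
    # The parenthesized branch leaves group(1) unset, which marks it as discarded.
    action = "keep" if m.group(1) is not None else "discard"
    trace.append((m.span(), m.group(0), action))

for span, matched, action in trace:
    print(span, repr(matched), action)

# Only the entries marked "keep" are real results.
kept = [matched for _, matched, action in trace if action == "keep"]
print(kept)
# ['apple']
```

The second “apple” never appears as its own entry, because the parenthesized branch has already consumed the text around it by the time the scan gets there.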
