regex iterator Example

suggest change

When processing of captures has to be done iteratively a regex_iterator is a good choice. Dereferencing a regex_iterator returns a match_result. This is great for conditional captures or captures which have interdependence. Let’s say that we want to tokenize some C++ code. Given:

enum TOKENS {
    NUMBER,
    ADDITION,
    SUBTRACTION,
    MULTIPLICATION,
    DIVISION,
    EQUALITY,
    OPEN_PARENTHESIS,
    CLOSE_PARENTHESIS
};

We can tokenize this string: const auto input = "42/2 + -8\t=\n(2 + 2) * 2 * 2 -3"s with a regex_iterator like this:

vector<TOKENS> tokens;
const regex re{ "\\s*(\\(?)\\s*(-?\\s*\\d+)\\s*(\\)?)\\s*(?:(\\+)|(-)|(\\*)|(/)|(=))" };

for_each(sregex_iterator(cbegin(input), cend(input), re), sregex_iterator(), [&](const auto& i) {
    if(i[1].length() > 0) {
        tokens.push_back(OPEN_PARENTHESIS);
    }
    
    tokens.push_back(i[2].str().front() == '-' ? NEGATIVE_NUMBER : NON_NEGATIVE_NUMBER);
    
    if(i[3].length() > 0) {
        tokens.push_back(CLOSE_PARENTHESIS);
    }        
    
    auto it = next(cbegin(i), 4);
    
    for(int result = ADDITION; it != cend(i); ++result, ++it) {
        if (it->length() > 0U) {
            tokens.push_back(static_cast<TOKENS>(result));
            break;
        }
    }
});

match_results<string::const_reverse_iterator> sm;

if(regex_search(crbegin(input), crend(input), sm, regex{ tokens.back() == SUBTRACTION ? "^\\s*\\d+\\s*-\\s*(-?)" : "^\\s*\\d+\\s*(-?)" })) {
    tokens.push_back(sm[1].length() == 0 ? NON_NEGATIVE_NUMBER : NEGATIVE_NUMBER);
}

Live Example

A notable gotcha with regex iterators is that the regex argument must be an L-value, an R-value will not work: http://stackoverflow.com/q/29895747/2642059

Feedback about page:

Feedback:
Optional: your email if you want me to get back to you:


Regular expressions:
* regex iterator Example

Table Of Contents
8 Arrays
11 Loops
39 Streams
51 Unions
56 Lambdas
60 SFINAE
62 RAII
67 Sorting
68 Regular expressions
84 RTTI
87 Scopes
104 Profiling
107 Recursion
117 Iteration
125 Alignment
134 Semaphore
136 Debugging
139 Mutexes
142 decltype