mirror of
https://github.com/crystalidea/qt6windows7.git
synced 2025-01-23 12:24:31 +08:00
176 lines
7.7 KiB
Plaintext
// Copyright (C) 2022 Giuseppe D'Angelo <dangelog@gmail.com>.
// Copyright (C) 2022 Klarälvdalens Datakonsult AB, a KDAB Group company, info@kdab.com, author Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
// Copyright (C) 2022 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only

//! [porting-to-qregularexpression]

The QRegularExpression class introduced in Qt 5 implements Perl-compatible
regular expressions and is a big improvement upon QRegExp in terms of APIs
offered, supported pattern syntax, and speed of execution. The biggest
difference is that QRegularExpression simply holds a regular expression,
and it's \e{not} modified when a match is requested. Instead, a
QRegularExpressionMatch object is returned, to check the result of a match
and extract the captured substring. The same applies to global matching and
QRegularExpressionMatchIterator.

Other differences are outlined below.

\note QRegularExpression does not support all the features available in
Perl-compatible regular expressions. Most notably, duplicated names for
capturing groups are not supported, and using them can lead to undefined
behavior. This may change in a future version of Qt.

\section3 Different pattern syntax

Porting a regular expression from QRegExp to QRegularExpression may require
changes to the pattern itself.

In specific scenarios, QRegExp was too lenient and accepted patterns that
are simply invalid when using QRegularExpression. These are easy to detect,
because the QRegularExpression objects built with these patterns are not
valid (see QRegularExpression::isValid()).

In other cases, a pattern ported from QRegExp to QRegularExpression may
silently change semantics. Therefore, it is necessary to review the
patterns used. The most notable cases of silent incompatibility are:

\list

\li Curly braces are needed to use a hexadecimal escape like \c{\xHHHH}
with more than 2 digits. A pattern like \c{\x2022} needs to be ported
to \c{\x{2022}}, or it will match a space (\c{0x20}) followed by the
string \c{"22"}. In general, it is highly recommended to always use
curly braces with the \c{\x} escape, no matter the number of digits
specified.

\li A 0-to-n quantification like \c{{,n}} needs to be ported to \c{{0,n}}
to preserve semantics. Otherwise, a pattern such as \c{\d{,3}} would
match a digit followed by the exact string \c{"{,3}"}.

\li QRegExp by default does Unicode-aware matching, while
QRegularExpression requires a separate option; see below for more
details.

\li In QRegExp, \c{.} matches all characters by default, including the
newline character. QRegularExpression excludes the newline character
by default. To include the newline character, set the
QRegularExpression::DotMatchesEverythingOption pattern option.

\endlist

For an overview of the regular expression syntax supported by
QRegularExpression, please refer to the
\l{https://pcre.org/original/doc/html/pcrepattern.html}{pcrepattern(3)}
man page, describing the pattern syntax supported by PCRE (the reference
implementation of Perl-compatible regular expressions).

\section3 Porting from QRegExp::exactMatch()

QRegExp::exactMatch() served two purposes: it exactly matched a regular
expression against a subject string, and it implemented partial matching.

\section4 Porting from QRegExp's Exact Matching

Exact matching indicates whether the regular expression matches the entire
subject string. For example, the two classes yield the following results on
the subject string \c{"abc123"}:

\table
\header \li Pattern \li QRegExp::exactMatch() \li QRegularExpressionMatch::hasMatch()
\row \li \c{"\\d+"} \li \b false \li \b true
\row \li \c{"[a-z]+\\d+"} \li \b true \li \b true
\endtable

QRegularExpression has no direct equivalent of exact matching. If you want
to be sure that the subject string matches the regular expression
exactly, you can wrap the pattern using the QRegularExpression::anchoredPattern()
function:

\snippet code/doc_src_port_from_qregexp.cpp 0

\section4 Porting from QRegExp's Partial Matching

When using QRegExp::exactMatch(), if an exact match was not found, one
could still find out how much of the subject string was matched by the
regular expression by calling QRegExp::matchedLength(). If the returned length
was equal to the subject string's length, then one could conclude that a partial
match was found.

QRegularExpression supports partial matching explicitly by means of the
appropriate QRegularExpression::MatchType.

\section3 Global matching

Due to limitations of the QRegExp API, it was impossible to implement global
matching correctly (that is, like Perl does). In particular, patterns that
can match 0 characters (like \c{"a*"}) are problematic.

QRegularExpression::globalMatch() implements Perl global match correctly, and
the returned iterator can be used to examine each result.

For example, if you have code like:

\snippet code/doc_src_port_from_qregexp.cpp 1

You can rewrite it as:

\snippet code/doc_src_port_from_qregexp.cpp 2

\section3 Unicode properties support

When using QRegExp, character classes such as \c{\w}, \c{\d}, etc. match
characters with the corresponding Unicode property: for instance, \c{\d}
matches any character with the Unicode \c{Nd} (decimal digit) property.

Those character classes only match ASCII characters by default when using
QRegularExpression: for instance, \c{\d} matches exactly a character in the
\c{0-9} ASCII range. It is possible to change this behavior by using the
QRegularExpression::UseUnicodePropertiesOption pattern option.

\section3 Wildcard matching

There is no direct way to do wildcard matching in QRegularExpression.
However, the QRegularExpression::wildcardToRegularExpression() method
is provided to translate glob patterns into a Perl-compatible regular
expression that can be used for that purpose.

For example, if you have code like:

\snippet code/doc_src_port_from_qregexp.cpp 3

You can rewrite it as:

\snippet code/doc_src_port_from_qregexp.cpp 4

Note, however, that some shell-like wildcard patterns might not be
translated to what you expect. The following example code will silently
break if simply converted using the above-mentioned function:

\snippet code/doc_src_port_from_qregexp.cpp 5

This is because, by default, the regular expression returned by
QRegularExpression::wildcardToRegularExpression() is fully anchored.
To get a regular expression that is not anchored, pass
QRegularExpression::UnanchoredWildcardConversion as the conversion
options:

\snippet code/doc_src_port_from_qregexp.cpp 6

\section3 Minimal matching

QRegExp::setMinimal() implemented minimal matching by simply reversing the
greediness of the quantifiers (QRegExp did not support lazy quantifiers,
like \c{*?}, \c{+?}, etc.). QRegularExpression instead does support greedy,
lazy, and possessive quantifiers. The QRegularExpression::InvertedGreedinessOption
pattern option can be useful to emulate the effects of QRegExp::setMinimal():
if enabled, it inverts the greediness of quantifiers (greedy ones become
lazy and vice versa).

\section3 Caret modes

The QRegularExpression::AnchorAtOffsetMatchOption match option can be used to
emulate the QRegExp::CaretAtOffset behavior. There is no equivalent for the
other QRegExp::CaretMode modes.

//! [porting-to-qregularexpression]