Struggling with regex and regex AI isn't helping.
What I need to get back is:
Group 1 - All text to before the 3rd numeric
Group 2 - All text from the 3rd numeric to before the 5th numeric
Group 3 - All text from the 5th numeric to before the 7th numeric
Group 4 - All text from the 7th numeric to before the 11th numeric
Group 5 - All text from the 11th numeric to before the 13th numeric
Group 6 - All text from the 13th numeric to the end of the string
The tricky bit is there may be HTML code anywhere between the numbers.
The below PHP is kiiiinda working:
<?php
$gsPattern = '/([^\d]+)(\d{2})(.*?)(\d{2})(.*?)(\d{2})(.*?)(\d{4})(.*?)(\d{2})(.*?)(\d{2})(.*)/i';
$gsReplacement = '${1}${2}-${3}${4}-${5}${6}-${7}${8}-${9}${10}-${11}${12}${13}';
$gsRowCode = "<td>160301<b style=\"color: red;\">1234</b>2525</td>";
echo preg_replace($gsPattern, $gsReplacement, $gsRowCode);
echo "<br>";
$gsRowCode = "<td>16030<b style=\"color: red;\">1123</b>42525</td>";
echo preg_replace($gsPattern, $gsReplacement, $gsRowCode);
The following hopefully shows the output I'm after:
Output<br>
1.<br>
16-03-01-<b style="color: red;">1234-</b>25-25 <br>
<br>
2.<br>
16030<b style="color: red;">1123</b>42525 <br>
<br>
Would like it to output as: <br>
16-03-0<b style="color: red;">1-123</b>4-25-25 <br>
Asked Mar 21 at 3:27 by Anrik; edited Mar 21 at 3:41 by Tangentially Perpendicular.
3 Answers
APPROACH:
To help with clarity, I captured every digit separately into its own named capture group ((?P<n1>...), (?P<n2>...), (?P<n3>...), etc.).
I also captured every non-digit string into a named capture group: the one before the first digit ((?P<string1_beginning>...)), the ones between digits ((?P<string2>...), (?P<string3>...), (?P<string4>...), etc.) and the one after the last digit ((?P<string15_end>...)).
The numbers in the names of the non-digit-string capture groups ((?P<string_n>...)) and the digit capture groups ((?P<n_n>...)) match.
In the replacement string, I build the desired outcome by placing the digit capture groups and string capture groups in the required order. For example, when the named capture group ${string6} returns <b style=\"color: red;\">, the group ${string7} returns [EMPTY], and vice versa, resulting in the desired outcome.
I added a capture group between each pair of digits. This creates flexibility and allows me to select the possible string-group numbers for the replacement string based on the digit location, as you can see in the suggested replacement string below.
There will be an issue if other digits appear between the digits we are looking to capture, for example if the color is given in rgb format, which includes numbers. That, however, was not part of the scope of the question.
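If digits inside tags (an rgb color, say) ever do become a problem, one way around it is to stop using a single big pattern and instead split the row into tags and text, counting digits only in the text parts. The sketch below is a different technique from the named-group pattern in this answer, not a variant of it. The function name `gsHyphenateDigits` and the `$breakAfter` parameter are made up for illustration, and it assumes the hyphen belongs directly after the 2nd, 4th, 6th, 10th and 12th digit, which is how I read the group boundaries in the question.

```php
<?php
// Hypothetical helper: inserts '-' directly after the listed digit positions,
// ignoring any digits that appear inside HTML tags.
function gsHyphenateDigits(string $html, array $breakAfter = [2, 4, 6, 10, 12]): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes hold
    // text and odd indexes hold the tags that separated the text.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $count = 0;
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $out .= $part; // tag: copied verbatim, its digits are not counted
            continue;
        }
        // Text: count every digit and append '-' after the chosen ones.
        $out .= preg_replace_callback('/\d/', function (array $m) use (&$count, $breakAfter): string {
            $count++;
            return in_array($count, $breakAfter, true) ? $m[0] . '-' : $m[0];
        }, $part);
    }
    return $out;
}

echo gsHyphenateDigits('<td>16030<b style="color: rgb(255,0,0);">1123</b>42525</td>'), "\n";
// prints: <td>16-03-0<b style="color: rgb(255,0,0);">1-123</b>4-25-25</td>
```

Note that this keeps the tag where it is. For the first sample row it gives 16-03-01-<b style="color: red;">1234-</b>25-25 rather than shifting the highlight by one digit.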
REGEX PATTERN AND REPLACEMENT STRING (PCRE2 flavor):
$gsPattern = '^(?P<string1_beginning>[^\d]*)(?P<n1>\d)(?P<string2>[^\d]*)(?P<n2>\d)(?P<string3>[^\d]*)(?P<n3>\d)(?P<string4>[^\d]*)(?P<n4>\d)(?P<string5>[^\d]*)(?P<n5>\d)(?P<string6>[^\d]*)(?P<n6>\d)(?P<string7>[^\d]*)(?P<n7>\d)(?P<string8>[^\d]*)(?P<n8>\d)(?P<string9>[^\d]*)(?P<n9>\d)(?P<string10>[^\d]*)(?P<n10>\d)(?P<string11>[^\d]*)(?P<n11>\d)(?P<string12>[^\d]*)(?P<n12>\d)(?P<string13>[^\d]*)(?P<n13>\d)(?P<string14>[^\d]*)(?P<n14>\d)(?P<string15_end>[^\d]*$)'
$gsReplacement = '<br>${n1}${n2}-${n3}${n4}-${n5}${string6}${string7}${n6}-${n7}${n8}${n9}${string10}${string11}${n10}-${n11}${n12}-${n13}${n14} <br>'
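One caveat when moving this from regex101 into PHP: as far as I know, preg_replace() only resolves numbered references ($1, ${1}, \1) in the replacement string, so ${n1} and friends would not be expanded there. A possible workaround is preg_replace_callback(), where the named groups are available as array keys. The following is a minimal sketch under that assumption. The helper name `gsFormatRow` is hypothetical, the group names are shortened to string1..string15 so the pattern can be built in a loop, and `/` delimiters are added because PHP requires them.

```php
<?php
// Assemble the 14-digit pattern in a loop instead of typing it out.
// Each digit n<k> is preceded by a run of non-digits string<k>.
$gsPattern = '/^';
for ($k = 1; $k <= 14; $k++) {
    $gsPattern .= "(?P<string{$k}>\\D*)(?P<n{$k}>\\d)";
}
$gsPattern .= '(?P<string15>\D*)$/';

// Hypothetical helper: rebuilds the row from the named groups, in the same
// order as the replacement string in this answer.
function gsFormatRow(string $gsRowCode, string $gsPattern): string
{
    $result = preg_replace_callback($gsPattern, function (array $m): string {
        return '<br>'
            . $m['n1'] . $m['n2'] . '-'
            . $m['n3'] . $m['n4'] . '-'
            . $m['n5'] . $m['string6'] . $m['string7'] . $m['n6'] . '-'
            . $m['n7'] . $m['n8'] . $m['n9'] . $m['string10'] . $m['string11'] . $m['n10'] . '-'
            . $m['n11'] . $m['n12'] . '-'
            . $m['n13'] . $m['n14'] . ' <br>';
    }, $gsRowCode);
    return $result ?? $gsRowCode; // fall back to the input on a PCRE error
}

echo gsFormatRow('<td>160301<b style="color: red;">1234</b>2525</td>', $gsPattern), "\n";
echo gsFormatRow('<td>16030<b style="color: red;">1123</b>42525</td>', $gsPattern), "\n";
// Both lines print: <br>16-03-0<b style="color: red;">1-123</b>4-25-25 <br>
```

A row that does not contain exactly 14 digits does not match the pattern and is returned unchanged.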
Regex Demo: https://regex101.com/r/EmP12l/4
INPUT:
1: $gsRowCode = "<td>160301<b style=\"color: red;\">1234</b>2525</td>";
2: $gsRowCode = "<td>16030<b style=\"color: red;\">1123</b>42525</td>";
OUTPUT:
1: <br>16-03-0<b style=\"color: red;\">1-123</b>4-25-25 <br>
2: <br>16-03-0<b style=\"color: red;\">1-123</b>4-25-25 <br>
DESIRED OUTPUT:
<br>16-03-0<b style="color: red;">1-123</b>4-25-25 <br>
REGEX NOTES:
- ^ Matches the beginning of the string.
- (?P<string1_beginning>[^\d]*) Named capture group (?P<name>...) around a negated character class [^...]. Matches any character that is NOT a digit (\d), 0 or more times (*). In the replacement string, the text captured in this group is retrieved using ${string1_beginning}.
- (?P<n1>\d) Named capture group (?P<name>...). Matches one digit (\d). In the replacement string, the text captured in this group is retrieved using ${n1}.
- The same pattern repeats to capture a total of 14 digits and 15 non-digit strings.
- $ Matches the end of the string.
Basically it's just getting finer-granularity digit capture around the color tag, to shift the digits over by 1 place.
This is an ECMAScript example. It could be shortened further with a PCRE-style engine.
^.*?(\d{2}).*?(\d{2}).*?(?=\d*(<b[^>]*?color[^>]*?>)\d{4}(</b>))(?:(\d)\3(\d)(\d{3})\4(\d)|(\d)(\d)\3(\d{3})(\d)\4).*?(\d{2}).*?(\d{2}).*
Replace $1-$2-$5$9$3$6$10-$7$11$4$8$12-$13-$14
https://regex101.com/r/mJuvUJ/1
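For anyone who wants to try this from PHP rather than regex101, here is a rough sketch of how the same pattern and replacement could be passed to preg_replace(). The only changes are my own: `~` is used as the delimiter because the pattern contains `/`, and the variable names are borrowed from the question. Groups that do not take part in the match (5 to 8 or 9 to 12, depending on which alternative matched) come out as empty strings.

```php
<?php
// '~' is the delimiter because the pattern itself contains '/' (in </b>).
$gsPattern = '~^.*?(\d{2}).*?(\d{2}).*?(?=\d*(<b[^>]*?color[^>]*?>)\d{4}(</b>))(?:(\d)\3(\d)(\d{3})\4(\d)|(\d)(\d)\3(\d{3})(\d)\4).*?(\d{2}).*?(\d{2}).*~';
$gsReplacement = '$1-$2-$5$9$3$6$10-$7$11$4$8$12-$13-$14';

$gsRows = [
    '<td>160301<b style="color: red;">1234</b>2525</td>',
    '<td>16030<b style="color: red;">1123</b>42525</td>',
];

$gsResults = [];
foreach ($gsRows as $gsRowCode) {
    $gsResults[] = preg_replace($gsPattern, $gsReplacement, $gsRowCode);
}

echo implode("\n", $gsResults), "\n";
// Both rows print: 16-03-0<b style="color: red;">1-123</b>4-25-25
```

The surrounding <td> and </td> are consumed by the leading and trailing .*? and .* and are not put back, so they would have to be re-added if they are needed.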
Below seems to work ok:
$gsPattern = '/^(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d.*?\d.*?\d)(.*?\d.*?\d)(.*)/i';
$gsReplacement = '${1}-${2}-${3}-${4}-${5}-${6}';
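To check it against the two rows from the question, a small test script along these lines can be used (the `$gsRows` and `$gsResults` arrays are just scaffolding I added). Each `.*?\d` pair lazily consumes any non-digit text up to and including the next digit, so whatever HTML precedes a digit ends up in the same group as that digit.

```php
<?php
$gsPattern = '/^(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d)(.*?\d.*?\d.*?\d.*?\d)(.*?\d.*?\d)(.*)/i';
$gsReplacement = '${1}-${2}-${3}-${4}-${5}-${6}';

// The two sample rows from the question.
$gsRows = [
    '<td>160301<b style="color: red;">1234</b>2525</td>',
    '<td>16030<b style="color: red;">1123</b>42525</td>',
];

$gsResults = [];
foreach ($gsRows as $gsRowCode) {
    $gsResults[] = preg_replace($gsPattern, $gsReplacement, $gsRowCode);
}

echo implode("\n", $gsResults), "\n";
// prints:
// <td>16-03-01-<b style="color: red;">1234-</b>25-25</td>
// <td>16-03-0<b style="color: red;">1-123</b>4-25-25</td>
```

As with the other patterns here, digits inside a tag (for example an rgb color) would be counted too.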
numeric and numbers. – jhnc Commented Mar 21 at 5:20
<table>. 2. The data in the <table> is fundamentally flawed or is being misinterpreted. 3. If #2 is incorrect then you have an HTML <table> and you need the highlighted (e.g. <b>) part of each cell (e.g. <td>) to shift to the left by a single character. If #1 and #3 are correct there's a far better way of dealing with your problem using JavaScript and approaching the source as DOM. If #1 and #2 are correct...? – zer00ne Commented Mar 23 at 0:21