Using regex for two delimiters to extract strings

1

The question How to delete the words between two delimiters? was really helpful to me.

So I have a string like this:

string = "$blabla$blav:1085$350_X[7:0]"

I am trying to remove whatever is inside the '[]' to make the whole thing be $blabla$blav:1085$350_X.

I tried all these:

re.sub('[[^]]+]', '', string)
re.sub(r'[.+?]', '', string)
re.sub('[.*?]', '', string)

Is there a way to do this with a regex in one step?

Also, I need to capture the removed part, [7:0], for later use.


please check my updated answer.

3 Answers

9

You can use rsplit with maxsplit=1 to make sure it only splits on the last [:

string = "$blabla$blav:1085$350_X[7:0]"
s_string = string.rsplit('[', maxsplit=1)

left = s_string[0]
right = "[" + s_string[-1]
print(left)
print(right)


# output

$blabla$blav:1085$350_X
[7:0]

If you must use a regex, try a positive lookahead. Because the leading .* is greedy, it stops at the last occurrence of [:

import re

string = "$blabla$blav:1085$350_X[7:0]"
regex = r'(^.*(?=\[))(.*)'
ss = re.match(regex, string)

left = ss.group(1)
right = ss.group(2)


print(left)
print(right)


# output

$blabla$blav:1085$350_X
[7:0]
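
For comparison, here is a minimal sketch of the same idea without the lookahead: escape the brackets and use two capture groups, so a single match yields both parts. The helper name split_trailing_index is hypothetical, and the pattern assumes the bracketed part sits at the very end of the string.

```python
import re

# Hypothetical helper; assumes the bracketed part is at the very end of the string.
TRAILING_INDEX = re.compile(r'^(.*)(\[[^\]]*\])$')

def split_trailing_index(text):
    """Return (prefix, bracket_part), or (text, None) if there is no trailing [...]."""
    match = TRAILING_INDEX.match(text)
    if match is None:
        return text, None
    # The greedy (.*) backtracks only as far as the last '[' that closes at the end.
    return match.group(1), match.group(2)

left, right = split_trailing_index("$blabla$blav:1085$350_X[7:0]")
print(left)
print(right)
```

Returning None for the second element when nothing matches avoids the AttributeError you would get from calling group() on a failed match.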

7

string = '$blabla$blav:1085$350_X[7:0]'

cut_string = string.split('[')[0] # = '$blabla$blav:1085$350_X'

bracket_data = string.split('[')[1].replace(']', '') # = '7:0'

Dirty, but it just works. Note that it splits on the first [, so it assumes the string contains only one.
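
If the string might contain more than one [, or none at all, a variant with str.rpartition may be a little more forgiving. This is only a sketch under that assumption; the None fallback for the no-bracket case is my own choice, not something the question asks for.

```python
string = '$blabla$blav:1085$350_X[7:0]'

# rpartition splits on the LAST '[' and always returns a 3-tuple.
cut_string, sep, rest = string.rpartition('[')
if sep:
    # Drop the closing bracket from the captured part.
    bracket_data = rest.rstrip(']')
else:
    # No '[' found: rpartition puts the whole string in the last slot.
    cut_string, bracket_data = rest, None

print(cut_string)
print(bracket_data)
```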

9

Try the regex \[([0-9]+:[0-9]+)\]$. It matches [X:Y] where X and Y are numbers of one or more digits and the whole thing is at the end of the string. The single group in the regex returns the two numbers X:Y without the [ and ].

Use this to remove the bracketed part from the string:

import re
# Raw string so the backslashes reach the regex engine unchanged.
re.sub(r'\[([0-9]+:[0-9]+)\]$', '', string)

You can use \[([0-9]+):([0-9]+)\]$ to match the two numbers in two separate groups.

numbersRegex = re.search(r'\[([0-9]+):([0-9]+)\]$', string)
number1 = numbersRegex.group(1)
number2 = numbersRegex.group(2)
# group(0) is the whole match, brackets included, e.g. '[7:0]'
bothNumbers = numbersRegex.group(0)

It is important to use a regex rather than fixed string indexes in case the numbers have two or more digits. Otherwise, it is fine to use indices.

If the [X:Y] is not at the end of the string, just remove the $ from the regex.
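
As a rough illustration of the unanchored case, the sketch below uses [0-9]+ so multi-digit numbers are accepted and drops the trailing $. The sample string and the names high, low and stripped are made up for the example and do not come from the question.

```python
import re

# Assumed sample input, not from the question: the index is in the middle.
sample = "$blabla$350_X[15:0]_suffix"
pattern = re.compile(r'\[([0-9]+):([0-9]+)\]')  # no trailing $

match = pattern.search(sample)
if match:
    # Convert the captured digits to integers for later use.
    high, low = int(match.group(1)), int(match.group(2))
    # count=1 removes only the first [X:Y] occurrence.
    stripped = pattern.sub('', sample, count=1)
    print(high, low)
    print(stripped)
```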

You can use this website and paste the regex there. It provides explanation and a text field to test it.
