Remove everything after regex pattern match but keep pattern


I was searching for a way to remove all characters after a certain pattern match. I know there are many similar questions here on SO, but I was unable to find one that works for me. Basically, I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it but keep the pattern itself.

I've tried using:

test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)

but I still get the same string back.

Example:

dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'

Desired output:

clean = 'AA1001'


What do you want to happen to text that appears before that pattern?

There is no slash in your test string, but your regex looks for an expression with a slash in it.

I also want to delete that =)

Oh right! I accidentally put a slash in there.

Notice also in the duplicate how the second argument to re.sub is not (and could not be) a regex.
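A minimal sketch of the re.sub approach, assuming the goal stated in the comments (drop text both before and after the pattern). The regex captures the fixed pattern in group 1, and the replacement r'\1' is a plain template that inserts that group; it is not a regex. It also shows why the original attempt returned the input unchanged: when the pattern does not match, re.sub simply returns the string as-is. The helper name clean_with_sub is hypothetical.

```python
import re

# Lazy ".*?" drops any leading text, group 1 keeps the fixed pattern,
# and the trailing ".*" (DOTALL so it also spans newlines) drops the rest.
PATTERN = re.compile(r'^.*?(\w\w\d\d\d\d).*', re.DOTALL)

def clean_with_sub(text):
    # Hypothetical helper: r'\1' is a replacement template, not a regex;
    # it substitutes the text captured by group 1.
    # If nothing matches, re.sub returns the input unchanged.
    return PATTERN.sub(r'\1', text, count=1)

print(clean_with_sub('AA1001dirtydata'))
print(clean_with_sub('AA1001222%^&*'))
```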

1 Answer


You can use re.match() instead of re.sub():

import re
re.match(r'\w\w\d\d\d\d', dirty).group(0)  # returns 'AA1001'

Note: match will look for the regular expression at the beginning of the string you provide and only "match" the characters corresponding to the pattern. If you want to find the pattern partway through the string you can use re.search().
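A minimal sketch of the re.search variant, which also discards any text before the pattern. It guards against the no-match case: both re.match and re.search return None when nothing matches, so calling .group(0) directly would raise AttributeError. The helper name clean and the choice to return None on failure are assumptions for illustration.

```python
import re

# Same fixed pattern as in the question, written as a raw string.
PATTERN = re.compile(r'\w\w\d\d\d\d')

def clean(text):
    """Hypothetical helper: return the first occurrence of the pattern, or None."""
    # search() scans the whole string; match() only tries position 0.
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(clean('AA1001dirtydata'))
print(clean('junkAA1001222%^&*'))
```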
