MySQL Pattern matching to find specific text

1142 views mysql
0

My working database is from a webforum. In it, there's a table containing all the data of posts (I.e. the text a user submitted within a thread). These posts contain a column called message which is the actual content of the post. A post can contain any character, as well as smilies. Smilies are indicated by a colon, immediately followed by a short description of variable length and again a colon. I.e. :clap:. A single post can contain multiple smilies.

I am trying to come up with a way to pull out a list of all smilies within the posts table.

What I have working so far is a query that pulls a list of posts containing at least two colons:

SELECT 
    thread_id
    , post_id
    , SUBSTRING_INDEX(SUBSTRING_INDEX(message, ':', 2), ':', -1) 
FROM 
    xf_post
WHERE 
    ROUND((CHAR_LENGTH(message) - CHAR_LENGTH(REPLACE(message, ':', ""))) / CHAR_LENGTH(':')) > 1
LIMIT 50

This works, but will also return messages where a user for whatever reason included multiple colons, like for instance random : text followed : by more text, or a timestamp: 00:00:12345.

What I'm hoping to achieve is to return all occurrences of alphanumeric characters enclosed between colons, without any spaces. (Yes, this will remove all smilies that are purely numeric, but ¯\_(?)_/¯).

I fiddled with REGEXP, and came up with the following: [:][a-zA-Z]+(?=:)[:] which according to regex101 yields exactly what I want.

How can I use this to capture the output, and only see the values between the semi-colons, and preferably in such a way it would show all occurrences of a smilie within a single post?

Thank you.

answered question

maybe you can take a look to REGEXP_SUBSTR() dev.mysql.com/doc/refman/8.0/en/…

@SimonlucaLandi - Thanks. That solvs part of my issue. I can use REGEXP_SUBSTR(message, '[:][a-zA-Z]+(?=:)[:]') to show the smiliecode.

1 Answer

2

@SimonlucaLandi helped me at least figure out the way to display the results. My final query:

SELECT 
    thread_id
    , post_id
    , REGEXP_SUBSTR(message, '[:][a-zA-Z]+(?=:)[:]')
FROM 
    xf_post
WHERE 
    message REGEXP '[:][a-zA-Z]+(?=:)[:]'
LIMIT 50

posted this

Have an answer?

JD

Please login first before posting an answer.