My working database is from a webforum. In it, there's a table containing all the data of posts (I.e. the text a user submitted within a thread). These posts contain a column called
message which is the actual content of the post. A post can contain any character, as well as smilies. Smilies are indicated by a colon, immediately followed by a short description of variable length and again a colon. I.e.
:clap:. A single post can contain multiple smilies.
I am trying to come up with a way to pull out a list of all smilies within the posts table.
What I have working so far is a query that pulls a list of posts containing at least two colons:
SELECT thread_id , post_id , SUBSTRING_INDEX(SUBSTRING_INDEX(message, ':', 2), ':', -1) FROM xf_post WHERE ROUND((CHAR_LENGTH(message) - CHAR_LENGTH(REPLACE(message, ':', ""))) / CHAR_LENGTH(':')) > 1 LIMIT 50
This works, but will also return messages where a user for whatever reason included multiple colons, like for instance
random : text followed : by more text, or a timestamp:
What I'm hoping to achieve is to return all occurrences of alphanumeric characters enclosed between colons, without any spaces. (Yes, this will remove all smilies that are purely numeric, but
I fiddled with
REGEXP, and came up with the following:
[:][a-zA-Z]+(?=:)[:] which according to regex101 yields exactly what I want.
How can I use this to capture the output, and only see the values between the semi-colons, and preferably in such a way it would show all occurrences of a smilie within a single post?