Regex to extract src from img tag in nodejs

2389 views javascript regex node.js
0

I have the following string in my Node.js app:

"<div><img src=\"https://sg-test-11.slatic.net/shop/8c30af81cb50dd2d7fc2dd89adc78e3d.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/4f0a4c0f8927212a6e842aecac428024.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/fbdd76097cfb33b7341fb5a44ca8f965.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/0fe741d61cc8e2d5c60ab8b18aaeee52.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/4d5981db97464b078c7327b8749af301.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/bf57342de95551a19aa56e0a79fcf9d6.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/9fceb9e27b88db562ad571b4b1497bf7.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/bdd8afd32d979c3616e7b89f14aa8f0c.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/eb4f0524b37cf725cf273184c64cdd0f.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/08fa38e2897c569c9cecc6b72c0bc719.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/f04b5b970ba678a8ed23f26790093287.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/5c3ae56a18f9b11a20fdcdfdeb842dbc.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/f02d18c9490e5edb2ca2f4f73973d7cc.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/7cd4843c47f1dd49a7676edfbb1731d3.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/88da4712b4723b3364ad6c8ebff35bfe.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/d2434f63946ba19a656c320b7402e12e.jpeg\"/><img src=\"https://sg-test-11.slatic.net/shop/080e8825b4498483505fe58484360d8c.jpeg\"/>
<img src=\"https://sg-test-11.slatic.net/shop/daa2e16c192def5c77f386c5c073c533.jpeg\"/></div>"

Does anyone know how I can extract all the img src attributes using a regex? I can't use document.createElement in Node.js, so I need to use a regex.

I tried something like this:

var regex = /<div><img.*?src='(.*?)'/;

but I'm not able to get it to work.

Is it always going to be in exactly that format or will they sometimes be nested?

Your regex looks for src='(.*?)' with single quotes, but the HTML has src="..." with double quotes. See the problem now?

@bluejayke - you can't nest images ... <img> doesn't "take" children

@jaromanda I mean if the images are nested in other elements besides the one div
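
The two points raised in these comments (the quote mismatch, and images possibly sitting inside other elements) can both be handled by a global regex that accepts either quote style and does not anchor on the surrounding div. The following is only a sketch under assumptions: extractImgSrcs is a hypothetical helper name, the example.com URLs are placeholders, and String.prototype.matchAll needs Node.js 12 or later. A regex like this is not a full HTML parser and can be fooled by unusual markup.

```javascript
// Sketch: collect every img src from an HTML string with a global regex.
// extractImgSrcs is a hypothetical helper name, not from the original post.
function extractImgSrcs(html) {
  // "<img", any other attributes, whitespace, then src= and a quoted value.
  // The backreference \1 makes the closing quote match the opening one.
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  // Capture group 2 holds the URL without its quotes.
  return Array.from(html.matchAll(pattern), (m) => m[2]);
}

// Placeholder input: one image directly in the div, one nested in a <p>.
const sample =
  '<div><img src="https://example.com/a.jpeg"/>\n' +
  "<p><img alt='x' src='https://example.com/b.jpeg'/></p></div>";

console.log(extractImgSrcs(sample));
```

Because the pattern carries the g flag, matchAll walks the whole string, so it does not matter how many images there are or how deeply they are nested.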

1 Answer

7

A very quick and dirty approach that might be useful in this situation:

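// Assumes every tag is written exactly as <img src="..."/>
const urls = str
  .split('<img src=')                  // text following each opening tag
  .flatMap(x => x.split('/>'))         // cut at the end of each tag
  .filter(x => x.startsWith('"http'))  // keep only the quoted URLs
  .map(x => x.slice(1, -1));           // strip the surrounding double quotes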
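
To show what this split-based approach returns and where it stops working, here is a small sketch that wraps it in a function and runs it on a shortened input. The name splitImgSrcs and the example.com URLs are assumptions for illustration only, and the quotes around each URL are stripped at the end.

```javascript
// Sketch of the split-based approach as a reusable function.
// splitImgSrcs is a hypothetical name; the sample URLs are placeholders.
function splitImgSrcs(html) {
  return html
    .split('<img src=')                          // text after each opening tag
    .flatMap((part) => part.split('/>'))         // cut at each self-closing end
    .filter((part) => part.startsWith('"http'))  // keep only the quoted URLs
    .map((part) => part.slice(1, -1));           // drop the surrounding quotes
}

const html =
  '<div><img src="https://example.com/1.jpeg"/>\n' +
  '<img src="https://example.com/2.jpeg"/></div>';

console.log(splitImgSrcs(html));

// A tag with another attribute before src never matches the split string,
// so this call yields an empty array.
console.log(splitImgSrcs('<img alt="x" src="https://example.com/3.jpeg"/>'));
```

The approach is fine when the markup is generated in one fixed shape, as in the question, but it silently skips any tag that differs from `<img src="..."/>`.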
