A Java string replacement problem: how do I replace sensitive words only in the text of an HTML document?

The requirement is to replace sensitive words in an HTML document. I only want to replace them in the plain text, though: some HTML tags also contain sensitive words, and I don't want those replaced. For example, if the src of an img tag contains a sensitive word and that word gets replaced, the picture will no longer be displayed.

For example, given

it's a nice day today. abc

I want to replace abc with *, so that it becomes

it's a nice day today. *

If I simply use Java's String.replaceAll("abc", "*"), I get

it's a nice day today. *

with the abc inside the img tag's src replaced as well, so the picture can't be loaded, which is not what I want.

I've been thinking about this for a long time, and I've also considered regular expressions. Does anyone have a good idea?

Jun.14,2021

Why does the replacement have to be done with a regular expression at the page level? Wouldn't it be better to replace the words when the content is stored? Or to replace them when you read the content out and render it into the page template? Why bother replacing inside the HTML code?


The simpler approach is what the poster above said. If you don't want to do that, just take the substring in front of


.

Two methods:

One: a regular expression

(?!<[^>]*)[a][b][c](?![^<]*>)
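
A minimal sketch of how this regular expression could be applied in Java. The class name, the method name, and the sample HTML (an img whose src is abc.jpg) are assumptions made for illustration, not the asker's actual markup. The sketch keeps only the trailing lookahead `(?![^<]*>)`: the leading `(?!<[^>]*)` never fails in this position, because the next character is always the first letter of the word rather than `<`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SensitiveWordFilter {
    // Hypothetical helper: masks `word` only where it is NOT followed by a '>'
    // that closes a tag before the next '<', i.e. only outside <...>.
    static String maskOutsideTags(String html, String word, String mask) {
        Pattern p = Pattern.compile(Pattern.quote(word) + "(?![^<]*>)");
        // quoteReplacement keeps '$' and '\' in the mask from being treated specially.
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(mask));
    }

    public static void main(String[] args) {
        // Assumed sample input; the original markup was not preserved in the question.
        String html = "<p>it's a nice day today. abc<img src=\"abc.jpg\"></p>";

        // Naive replacement also rewrites the src attribute:
        // <p>it's a nice day today. *<img src="*.jpg"></p>
        System.out.println(html.replaceAll("abc", "*"));

        // Lookahead version leaves the tag alone:
        // <p>it's a nice day today. *<img src="abc.jpg"></p>
        System.out.println(maskOutsideTags(html, "abc", "*"));
    }
}
```

This relies on every `>` in the document closing a tag. A literal `>` in the text, inside an attribute value, or inside a script or comment can make it skip or replace the wrong occurrences.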


Two: parse the HTML with Jsoup, write a selector, and traverse the nodes, checking each one

I prefer the second method, because regular expressions are very difficult to maintain. If you need to replace anything else in the future, you have to work the expression out all over again.
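
A rough sketch of what the Jsoup approach might look like, assuming the jsoup library is on the classpath. The class name, method name, and sample HTML are hypothetical. It selects every element under the body and rewrites only the element's own text nodes, so attributes such as src are never touched.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

class JsoupWordFilter {
    // Hypothetical helper: replaces `word` with `mask` in text nodes only.
    static String maskTextNodes(String html, String word, String mask) {
        // Parse the input as a body fragment rather than a full document.
        Document doc = Jsoup.parseBodyFragment(html);
        // "*" matches every element, including the body itself.
        for (Element el : doc.body().select("*")) {
            // textNodes() returns only the text directly inside this element.
            for (TextNode node : el.textNodes()) {
                node.text(node.getWholeText().replace(word, mask));
            }
        }
        // Serialize the body contents back to HTML.
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Assumed sample input; the original markup was not preserved in the question.
        String html = "<p>it's a nice day today. abc<img src=\"abc.jpg\"></p>";
        System.out.println(maskTextNodes(html, "abc", "*"));
    }
}
```

Jsoup normalizes and may reindent the markup when serializing, so the output is not guaranteed to match the input byte for byte outside the replaced words.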