Find a regular expression to verify the validity of a URL. The ones found on the Internet have many loopholes.

I'm looking for a regular expression to verify the validity of a URL. The ones I found on the Internet have many loopholes, such as the following:

function IsURL(str_url){
    var strRegex = "^((https|http|ftp|rtsp|mms)?://)"
            +"?(([0-9a-z_!~*().&=+$%-]+: )?[0-9a-z_!~*().&=+$%-]+@)?" // ftp user@
            + "(([0-9]{1,3}.){3}[0-9]{1,3}" // URL in IP form, e.g. 199.194.52.184
            + "|" // allow either an IP or a domain name
            + "([0-9a-z_!~*()-]+.)*" // subdomain, e.g. www.
            + "([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]." // second-level domain
            + "[a-z]{2,6})" // first level domain- .com or .museum
            + "(:[0-9]{1,4})?" // port, e.g. :80
            + "((/?)|" // a slash isn't required if there is no file name
            + "(/[0-9a-z_!~*().;?:@&=+$,%#-]+)+/?)$";
    var re=new RegExp(strRegex);
    if (re.test(str_url)){
        return (true);
    }else{
        return (false);
    }
}
IsURL("sfas") //true
IsURL("http://www.baidu.com") //true

Here is another one:

var regex =/^http(s)?:\/\/([\w-]+\.)+[\w-]+(\/[\w- .\/?%&=]*)?$/i;
regex.test("http://www.sina/") //true

All of the above have problems. Could an expert provide one that works?


I'm sorry to tell you that validating URLs with a regular expression is much more complicated than you think. My advice is to restrict URLs to a limited range, then send a GET request and use the returned HTTP status code to check whether the URL exists and meets your requirements.
Let me explain why I give this advice.
I won't go over the components of a URL. To validate a URL, you need to check:

1. Protocol. The protocol alone cannot tell you whether a URL is legal (unless you force every URL to carry one), but there are so few protocols that you can count them on ten fingers, so they are easy to check with a regex.
2. Domain name. To check whether a domain name is legal, split it into prefix, name and suffix (extension). Domain names can now contain Chinese, Japanese, Arabic (and perhaps other scripts I don't know of), and the same goes for suffixes. A domain name that cannot be accessed at present can still be legal. For example, [xxxx. China] is also legal. As far as we know, there are as many as 882 domain suffixes, including ones in the languages just mentioned.
3. Port, user name, password. A URL with a user name, password, and port number can also be legal. For example, www.baidu.com:80 or http://username:password@www.example.com/path/index.html is legal. The URLs you usually see have no user name and password because the browser sends requests to the server as an anonymous user and the server imposes no restrictions. A URL with a user name and password is still legal.
4. Path section. The resource path of the URL, possibly followed by query parameters and a # fragment. For example, www.baidu.com:80/path/233.html?query=xxxx&query2=xxxx#some is also legal.
5. Punctuation in query parameters. www.example.com?qyuer=lily's blog is legal, but the URL gets encoded and spaces are escaped as % followed by hexadecimal digits: http://www.example.com/?qyuer. These two URLs are the same.
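
The components listed above can be pulled apart without a regex. Here is a minimal sketch, assuming an environment that provides the standard URL class (modern browsers and Node.js). The helper name splitUrl and the choice to assume http when no scheme is given are illustrative, not part of the original answer:

```javascript
// Split a URL into the parts discussed above using the standard URL class.
// `splitUrl` is a hypothetical helper name used only for this sketch.
function splitUrl(input) {
    // If no scheme is present, assume http so that "www.baidu.com:80/..." parses.
    var hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
    var url;
    try {
        url = new URL(hasScheme ? input : "http://" + input);
    } catch (e) {
        return null; // the parser rejected the string outright
    }
    return {
        protocol: url.protocol,   // e.g. "http:"
        username: url.username,
        password: url.password,
        hostname: url.hostname,
        port: url.port,           // "" when it is the scheme's default port
        pathname: url.pathname,
        search: url.search,       // query string, percent-encoded
        hash: url.hash            // fragment, including the leading "#"
    };
}

var parts = splitUrl("http://username:password@www.example.com:8080/path/index.html?query=a b#some");
// parts.search is "?query=a%20b": the space has been percent-encoded.
```

Note that the parser only checks syntax. splitUrl("sfas") still succeeds with the hostname "sfas", so this does not decide whether the domain actually exists.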

Therefore, you will not find a regular expression on the Internet that reliably checks whether a URL is legal.
If you are interested, you can follow the ideas above and implement one yourself.

Check the lengths. The maximum length of a full domain name is 255 bytes (including punctuation). Each part of the domain name is at most 63 bytes; for example, the "example" part of example.com can be at most 63 bytes long. The domain suffix is at most 13 bytes long (currently). If you also want to take the language into account, you can check the suffix length per script, for example: ([a-z]{2,13}|[\x{4e00}-\x{9fa5}]{2,3}). The user name and password in the URL can be limited to a certain length. For the protocol, check whether it falls within the range you allow; if there is none, you can default to http.
Generally, a URL is validated only to prevent malicious input. If you want to know whether the URL can actually be accessed, the best way is still to send a GET request and judge by the HTTP status code. That's what crawlers do. However, you may find that servers in mainland China cannot reach foreign URLs because of the firewall, so it is recommended to use US or Hong Kong servers to check URLs.
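
The two steps above (length limits first, then a GET request) can be sketched as follows. The limits are the ones quoted in this answer; the helper names checkHostLength and isReachable, the injectable fetchImpl parameter, and the decision to treat status codes 200 to 399 as "reachable" are all assumptions made for illustration:

```javascript
// Length limits quoted in the answer above (bytes; ASCII host names assumed).
var MAX_DOMAIN_LENGTH = 255;
var MAX_LABEL_LENGTH = 63;

// `checkHostLength` is a hypothetical helper: it only applies the two limits.
function checkHostLength(hostname) {
    if (hostname.length === 0 || hostname.length > MAX_DOMAIN_LENGTH) {
        return false;
    }
    return hostname.split(".").every(function (label) {
        return label.length > 0 && label.length <= MAX_LABEL_LENGTH;
    });
}

// `isReachable` is also hypothetical. `fetchImpl` defaults to the global
// fetch (browsers, Node.js 18+); passing one in makes the function testable.
async function isReachable(urlString, fetchImpl) {
    var doFetch = fetchImpl || fetch;
    try {
        var response = await doFetch(urlString, { method: "GET" });
        // Assumption: any 2xx or 3xx status counts as reachable.
        return response.status >= 200 && response.status < 400;
    } catch (e) {
        return false; // DNS failure, refused connection, blocked request, ...
    }
}
```

In a browser, cross-origin requests are subject to CORS, so a check like isReachable is more at home on a server.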

For the number of domain name extensions, you can check here: https://www.key-systems.net/e.

But if you are only after a ready-made answer, forget it.


The fish: https://www.npmjs.com/package.
The fishing lesson: take a look at this. Whenever you have a problem to solve, search npmjs first; it may well give you the result you want.


reg = /^((https|http|ftp|rtsp|mms):\/\/)(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z].[a-z]{2,6})(:[0-9]{1,4})?((\/?)|(\/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+\/?)$/;

You can change the regex in the function above to this one. A URL must then include a protocol to pass.

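I looked at the regex and there is an obvious mistake:

+ '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z].' // . matches any character
// should be:
+ '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\\.' // matches a literal dot

A literal dot should be matched here, but the regex uses . rather than \. (written \\. inside a string passed to new RegExp, because the backslash itself has to be escaped in the string).

Note: . matches any single character except the newline character \n. To match a literal ., use \. instead.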
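
To see the effect of escaping the dots, here is a sketch based on the protocol-required pattern from the earlier reply, with every separator dot escaped. It is written as a regex literal, so a single backslash is enough; strictUrlRegex is just an illustrative name:

```javascript
// The protocol-required pattern with the separator dots escaped as "\.".
// `strictUrlRegex` is a hypothetical name used only for this sketch.
var strictUrlRegex = /^((https|http|ftp|rtsp|mms):\/\/)(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!~*'()-]+\.)*([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((\/?)|(\/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+\/?)$/;

strictUrlRegex.test("http://sfas");           // false: the host has no literal dot
strictUrlRegex.test("http://www.baidu.com");  // true
strictUrlRegex.test("http://www.sina/");      // true: "sina" still looks like a suffix
```

Escaping the dot removes the most obvious false positive, but the pattern still cannot tell a real suffix from a made-up one.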
I won't say much more about the rest. A legal URL has basically no restrictions. All you can say is that it is at least: basically any characters + dot + basically any characters + whatever you like, all of which may be parameters.

For example, what the original poster wrote does not take Chinese domain names into account.

In addition, the npm package that kumfo linked above uses Node.js's url package, so it cannot be used in the browser.


I can tell you responsibly that you won't find a regex that meets your requirements.
Take www.sina: is it legal or illegal? You would say it is illegal because the domain suffix is wrong. But there are so many domain suffixes that a regex listing every legitimate one would be extremely long: cn, com, io, org, us, ca, me, ink, xyz, in, pro, studio, news, app. New suffixes keep emerging and the list is updated every day, so you cannot cover them all. Only if you care about a specified set of suffixes can you tell whether www.sina is legal. How many suffixes are there? Just take a look at this website.
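
If you do restrict yourself to a specified set of suffixes, the check becomes simple. A minimal sketch, where the list ALLOWED_SUFFIXES and the helper name hasAllowedSuffix are assumptions for illustration (the list is deliberately short and is not a registry of real suffixes):

```javascript
// Only the suffixes this particular application cares about (an assumed,
// illustrative list; it is not a complete registry of domain suffixes).
var ALLOWED_SUFFIXES = ["cn", "com", "io", "org"];

// `hasAllowedSuffix` is a hypothetical helper name.
function hasAllowedSuffix(hostname) {
    var labels = hostname.toLowerCase().split(".");
    if (labels.length < 2) {
        return false; // a bare name such as "sina" has no suffix at all
    }
    var suffix = labels[labels.length - 1];
    return ALLOWED_SUFFIXES.indexOf(suffix) !== -1;
}

hasAllowedSuffix("www.sina");      // false: "sina" is not in the list
hasAllowedSuffix("www.sina.com");  // true
```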