Fastest way to detect external URLs

What's the fastest method to detect whether foo='http://john.doe' is an external URL (in comparison to window.location.href)?

I know the regex version has already been accepted, but I would bet this is "faster" than running a regex that complex. String.replace is quite fast.

var isExternal = function(url) {
    var domain = function(url) {
        return url.replace('http://','').replace('https://','').split('/')[0];
    };

    return domain(location.href) !== domain(url);
}
Update

I decided to do a little more research on this and found a faster method that uses a Regex.

var isExternalRegexClosure = (function(){
    var domainRe = /https?:\/\/((?:[\w\d-]+\.)+[\w\d]{2,})/i;

    return function(url) {
        function domain(url) {
          return domainRe.exec(url)[1];  
        }

        return domain(location.href) !== domain(url);
    }
})();

In IE this is slightly faster than the String.replace method; in Chrome and Firefox it is about twice as fast. Also, defining the regex once inside the closure, rather than inside the function on every call, is about 30% faster in Firefox.

Here is a jsperf examining four different ways of determining an external hostname.

It is important to note that every method I've tried takes less than 1ms to run even on an old phone. So performance probably shouldn't be your primary consideration unless you are doing some large batch processing.
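As a comment further down points out, the regex version above throws when the URL does not start with http or https, because exec returns null. A minimal sketch of a guarded variant follows. The factory name makeRegexChecker and its pageUrl parameter (standing in for location.href, so the snippet also runs outside a browser) are assumptions for illustration, and treating unmatched URLs as internal is a design choice, not part of the original answer.

```javascript
// Hypothetical factory: pageUrl stands in for location.href.
var makeRegexChecker = function (pageUrl) {
    // Compiled once, as in the closure version above; anchored to the start of the string.
    var domainRe = /^https?:\/\/((?:[\w-]+\.)+[\w-]{2,})/i;

    function domain(url) {
        var match = domainRe.exec(url);
        // exec returns null for relative URLs, anchors, mailto:, and dot-less hosts
        return match ? match[1].toLowerCase() : null;
    }

    var pageDomain = domain(pageUrl);

    return function (url) {
        var target = domain(url);
        // Assumption: anything without an absolute http(s) host counts as internal.
        if (target === null) { return false; }
        return target !== pageDomain;
    };
};

var isExternalGuarded = makeRegexChecker('http://samedomain.com/page');
console.log(isExternalGuarded('http://google.com')); // true
console.log(isExternalGuarded('/about'));            // false
```

With this choice, mailto: links and hosts without a dot (such as localhost) are reported as internal; return true in the null branch instead if you want the opposite behavior.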

If you consider a URL external when its scheme, host, or port differs from the current page's, you could do something like this:

function isExternal(url) {
    var match = url.match(/^([^:\/?#]+:)?(?:\/\/([^\/?#]*))?([^?#]+)?(\?[^#]*)?(#.*)?/);
    if (typeof match[1] === "string" && match[1].length > 0 && match[1].toLowerCase() !== location.protocol) return true;
    if (typeof match[2] === "string" && match[2].length > 0 && match[2].replace(new RegExp(":("+{"http:":80,"https:":443}[location.protocol]+")?$"), "") !== location.host) return true;
    return false;
}
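The same scheme/host/port rule can also be expressed with the standard URL constructor instead of a hand-written regex, since a URL's origin combines exactly those three parts. This is a swapped-in technique, not the answer's method, and only a sketch: the function name isExternalByOrigin and the baseUrl parameter (standing in for location.href) are assumptions, as is the choice to treat unparseable input as not external.

```javascript
// Hypothetical helper: baseUrl stands in for location.href.
function isExternalByOrigin(url, baseUrl) {
    var base = new URL(baseUrl);
    var target;
    try {
        // Relative paths, "#anchor" and "//host/path" are resolved against the base.
        target = new URL(url, base);
    } catch (e) {
        // Assumption: input the parser rejects is treated as not external.
        return false;
    }
    // origin is scheme + host + port, with default ports and letter case normalized.
    return target.origin !== base.origin;
}

var page = 'http://samedomain.com/page';
console.log(isExternalByOrigin('https://samedomain.com/secure', page)); // true
console.log(isExternalByOrigin('http://samedomain.com:80/x', page));    // false
console.log(isExternalByOrigin('/about', page));                        // false
```

Non-hierarchical schemes such as mailto: have the opaque origin "null", so they compare as external here.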

I've been using pseudosavant's method, but ran into a few cases where it triggered false positives, such as domain-less links (/about, image.jpg) and anchor links (#about). The old method would also give inaccurate results for different protocols (http vs https).

Here's my slightly modified version:

var checkDomain = function(url) {
  if ( url.indexOf('//') === 0 ) { url = location.protocol + url; }
  return url.toLowerCase().replace(/([a-z])?:\/\//,'$1').split('/')[0];
};

var isExternal = function(url) {
  return ( ( url.indexOf(':') > -1 || url.indexOf('//') > -1 ) && checkDomain(location.href) !== checkDomain(url) );
};

Here are some tests with the updated function:

isExternal('http://google.com'); // true
isExternal('https://google.com'); // true
isExternal('//google.com'); // true (no protocol)
isExternal('mailto:mail@example.com'); // true
isExternal('http://samedomain.com:8080/port'); // true (same domain, different port)
isExternal('https://samedomain.com/secure'); // true (same domain, https)

isExternal('http://samedomain.com/about'); // false (same domain, different page)
isExternal('HTTP://SAMEDOMAIN.COM/about'); // false (same domain, but different casing)
isExternal('//samedomain.com/about'); // false (same domain, no protocol)
isExternal('/about'); // false
isExternal('image.jpg'); // false
isExternal('#anchor'); // false

It's more accurate overall, and it even ends up being marginally faster, according to some basic jsperf tests. If you don't need case-insensitive matching, you can drop the .toLowerCase() call and speed it up even more.
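The expectations listed above can be checked mechanically outside a browser. The sketch below wraps the same two functions in a hypothetical factory, makeChecker, whose pageUrl argument replaces location.href and location.protocol; the page URL http://samedomain.com/page is an assumed value chosen to match the test comments.

```javascript
// Hypothetical wrapper: pageUrl stands in for location.href.
var makeChecker = function (pageUrl) {
    // Equivalent of location.protocol, e.g. "http:"
    var pageProtocol = pageUrl.slice(0, pageUrl.indexOf(':') + 1);

    var checkDomain = function (url) {
        if (url.indexOf('//') === 0) { url = pageProtocol + url; }
        return url.toLowerCase().replace(/([a-z])?:\/\//, '$1').split('/')[0];
    };

    return function (url) {
        return (url.indexOf(':') > -1 || url.indexOf('//') > -1) &&
            checkDomain(pageUrl) !== checkDomain(url);
    };
};

var isExternal = makeChecker('http://samedomain.com/page');

// [url, expected] pairs taken from the test list above
var cases = [
    ['http://google.com', true],
    ['https://google.com', true],
    ['//google.com', true],
    ['mailto:mail@example.com', true],
    ['http://samedomain.com:8080/port', true],
    ['https://samedomain.com/secure', true],
    ['http://samedomain.com/about', false],
    ['HTTP://SAMEDOMAIN.COM/about', false],
    ['//samedomain.com/about', false],
    ['/about', false],
    ['image.jpg', false],
    ['#anchor', false]
];

var failures = cases.filter(function (c) { return isExternal(c[0]) !== c[1]; });
console.log(failures.length === 0 ? 'all cases pass' : failures);
```

One case the list does not cover: a relative link that carries a full URL in its query string, such as /redirect?to=http://x.com, contains a colon and is therefore reported as external by this approach.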

pseudosavant's answer didn't exactly work for me, so I improved it.

var isExternal = function(url) {
    return !(location.href.replace("http://", "").replace("https://", "").split("/")[0] === url.replace("http://", "").replace("https://", "").split("/")[0]);   
}

I had to build on pseudosavant's and Jon's answers because I also needed to catch URLs beginning with "//" and URLs that do not include a subdomain. Here's what worked for me:

var getDomainName = function(domain) {
    var parts = domain.split('.').reverse();
    var cnt = parts.length;
    if (cnt >= 3) {
        // see if the second level domain is a common SLD.
        if (parts[1].match(/^(com|edu|gov|net|mil|org|nom|co|name|info|biz)$/i)) {
            return parts[2] + '.' + parts[1] + '.' + parts[0];
        }
    }
    return parts[1]+'.'+parts[0];
};
var isExternalUrl = function(url) {
	var curLocationUrl = getDomainName(location.href.replace("http://", "").replace("https://", "").replace("//", "").split("/")[0].toLowerCase());
	var destinationUrl = getDomainName(url.replace("http://", "").replace("https://", "").replace("//", "").split("/")[0].toLowerCase());
	return !(curLocationUrl === destinationUrl)
};

$(document).delegate('a', 'click', function() {
	var aHrefTarget = $(this).attr('target');
	if(typeof aHrefTarget === 'undefined')
		return;
	if(aHrefTarget !== '_blank')
		return;  // not an external link
	var aHrefUrl = $(this).attr('href');
	if(aHrefUrl.substr(0,2) !== '//' && (aHrefUrl.substr(0,1) == '/' || aHrefUrl.substr(0,1) == '#'))
		return;  // this is a relative link or anchor link
	if(isExternalUrl(aHrefUrl))
		alert('clicked external link');
});
<h3>Internal URLs:</h3>
<ul>
  <li><a href="stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="//stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">//stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="//www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">//www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
</ul>
<h3>External URLs:</h3>
<ul>
  <li><a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a></li>
  <li><a href="yahoo.com" target="_blank">yahoo.com</a></li>
  <li><a href="www.yahoo.com" target="_blank">www.yahoo.com</a></li>
  <li><a href="//www.yahoo.com" target="_blank">//www.yahoo.com</a></li>
</ul>
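A variant of the same idea, sketched without jQuery: compare the "site" part of the hostname so that www.example.com and example.com count as the same site. The names registrableDomain and isExternalSite and the baseUrl parameter (standing in for location.href) are assumptions for illustration. The second-level-domain list is the same heuristic used in getDomainName above, not a complete public-suffix list.

```javascript
// Same heuristic as getDomainName above, with a guard for single-label hosts.
function registrableDomain(hostname) {
    var parts = hostname.toLowerCase().split('.').reverse();
    if (parts.length < 2) { return parts[0]; } // e.g. "localhost"
    var commonSld = /^(com|edu|gov|net|mil|org|nom|co|name|info|biz)$/;
    if (parts.length >= 3 && commonSld.test(parts[1])) {
        return parts[2] + '.' + parts[1] + '.' + parts[0];
    }
    return parts[1] + '.' + parts[0];
}

// Hypothetical helper: baseUrl stands in for location.href.
// new URL throws a TypeError on input it cannot parse.
function isExternalSite(url, baseUrl) {
    var base = new URL(baseUrl);
    var target = new URL(url, base); // resolves "//host/..." and relative paths
    // Assumption: anything that is not http(s), such as mailto:, counts as external.
    if (target.protocol !== 'http:' && target.protocol !== 'https:') { return true; }
    return registrableDomain(target.hostname) !== registrableDomain(base.hostname);
}

var here = 'https://stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls';
console.log(isExternalSite('//www.stackoverflow.com/questions', here)); // false
console.log(isExternalSite('http://www.yahoo.com', here));              // true
```

Unlike the string-replace version, a scheme-less href such as yahoo.com is resolved here as a relative path, which is how browsers treat it, so it is reported as internal.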

Comments
  • What would you consider external? Different scheme/host/port?
  • fast, simple, accurate: choose 2?
  • this could be solution : stackoverflow.com/questions/2910946/…
  • @msec: What exactly are you doing with these 20k anchors?
  • Cool. Glad it works better for you. Regexes definitely have their place, but oftentimes they are used like a chainsaw when a carving knife might be more appropriate.
  • What about magnet: or mailto: ?
  • @venimus I thought other protocols should return that it is an 'external' link but when I tried it it didn't work. Turns out I had a bug/typo in my example code. I had location.href in my domain function instead of url. With that changed it now works properly for other protocols as well.
  • I would MUCH rather maintain this code than the regex in the accepted answer. Nicely done!
  • I'm sorry, but this is a terrible answer. 1. Your first method doesn't work on URLs that start with a slash; for example, isExternal("/questions/123456") will return true. 2. Your second (regex) method will throw an exception on anything that doesn't start with http or https.
  • +1 Look at this regex beauty. ;) it's like from mr.Regex himself! ;) huhu
  • @roXon: The regular expression is actually from the current RFC for URIs.
  • For external links (facebook.com/mypage/id/123456789) I receive the following results in MSIE: test 1 (typeof match[1]): false, test 2 (typeof match[2]): true. For internal links (sub.mydomain.com = host): test 1 (typeof match[1]): true, test 2 (typeof match[2]): true (should all be false). Why is that?