How to find whether a domain uses HTTP or HTTPS (with or without WWW) using PHP?

I have a list of one million (1,000,000) domains.

+----+--------------+--------------------------+
| Id | Domain_Name  |       Correct_URL        |
+----+--------------+--------------------------+
|  1 | example1.com | http://www.example1.com  |
|  2 | example2.com | https://example2.com     |
|  3 | example3.com | https://www.example3.com |
|  4 | example4.com | http://example4.com      |
+----+--------------+--------------------------+
  • The Id and Domain_Name columns are filled.
  • The Correct_URL column is empty.

Question: I need to fill the Correct_URL column.

The problem I face is how to find the prefix that goes before the domain. It may be http://, http://www., https://, or https://www.

How do I correctly determine which of these four applies using PHP? Please note that I need to run the code against all 1,000,000 domains, so I am looking for the fastest way to check.

You could use cURL:

$url_list = ['facebook.com', 'google.com'];

foreach ($url_list as $url) {

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);            // no scheme given, so cURL starts with http://
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the final URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);

    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo $real_url; // add your db commands here

}

This one takes some time because it follows every redirect until it reaches the final URL. If you only want to check whether it is http or https, you could try this:

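$url_list = ['facebook.com', 'google.com'];

foreach ($url_list as $url) {

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep curl_exec() from printing the response body
    curl_exec($ch);

    // URL the first response redirects to; no value when there was no redirect
    $real_url = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    echo $real_url; // add your db commands here

}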
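
Since the question is about speed, the loop above could be run on many domains at once with PHP's curl_multi_* functions. The following is only a rough sketch under assumptions: batch_domains() and fetch_effective_urls() are hypothetical helper names, the batch size and the 5 second timeout are arbitrary choices, and error handling is reduced to "no HTTP response means null".

```php
<?php
// Sketch only: batch_domains() and fetch_effective_urls() are hypothetical helpers.

/** Split the domain list into batches so only $size handles are open at once. */
function batch_domains(array $domains, int $size): array
{
    return array_chunk($domains, max(1, $size));
}

/** Request every domain of one batch concurrently; returns [domain => final URL or null]. */
function fetch_effective_urls(array $domains, int $timeout = 5): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($domains as $domain) {
        $ch = curl_init($domain);                       // no scheme given, so cURL starts with http://
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: headers are enough here
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the final URL
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$domain] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $domain => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 means no HTTP response at all
        $results[$domain] = $code > 0 ? curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) : null;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

One possible use is to loop over batch_domains($url_list, 100), pass each batch to fetch_effective_urls(), and write the returned URLs to the database before starting the next batch.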

There isn't really any way other than making an HTTP request to each of the possibilities and seeing if you get a response.

While you assert "It may either http:// or http://www. or https:// or https://www.", real-world domains may provide zero, some, or all of those (as well as various others), and they may respond to requests with OKs, redirects, authentication errors, etc.

HTTP and HTTPS are not attributes of a web application; they are communication protocols handled by the endpoint (the web server, or an application firewall, etc.).

As with any network communication, one must probe the host ("www" is the host in this case) and the port (most commonly, though not necessarily, 80 for HTTP and 443 for HTTPS). This probing is a shout: you send a request, then wait and see whether a service is listening on the other side.
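
The probing described above could be sketched roughly as follows. This is an illustration, not a definitive implementation: candidate_urls(), probe_status() and pick_correct_url() are hypothetical helper names, and the preference order (a direct 2xx answer first, then anything that answered at all) is an assumption about what "correct" should mean for your table.

```php
<?php
// Sketch only: candidate_urls(), probe_status() and pick_correct_url() are hypothetical helpers.

/** Build the four prefixes named in the question for one bare domain. */
function candidate_urls(string $domain): array
{
    return [
        "https://www.$domain",
        "https://$domain",
        "http://www.$domain",
        "http://$domain",
    ];
}

/** Send one HEAD request without following redirects; 0 means nothing answered. */
function probe_status(string $url, int $timeout = 5): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // we want this URL's own answer
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);

    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

/**
 * Given [url => status code], prefer a URL that answers 2xx directly
 * (assumed policy); fall back to any URL that answered; null if none did.
 */
function pick_correct_url(array $statuses): ?string
{
    foreach ($statuses as $url => $code) {
        if ($code >= 200 && $code < 300) {
            return $url;
        }
    }
    foreach ($statuses as $url => $code) {
        if ($code > 0) {
            return $url;
        }
    }
    return null;
}
```

For one domain, you would call probe_status() on each entry of candidate_urls($domain), collect the codes in an array keyed by URL, and pass that array to pick_correct_url().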

Given a known URL, you could call both the http and https versions with get_headers; from there you can determine whether https is available, whether http redirects to https, and so on.

Details can be found here: http://php.net/manual/en/function.get-headers.php
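
A minimal sketch of that idea, assuming PHP 7.1 or later (where get_headers() accepts a stream context as its third argument). The helper names head_headers(), status_code() and redirect_target() are hypothetical, and the 5 second timeout is an arbitrary choice.

```php
<?php
// Sketch only: head_headers(), status_code() and redirect_target() are hypothetical helpers.

/** Fetch the response headers of one URL without following redirects; false on failure. */
function head_headers(string $url, int $timeout = 5)
{
    $context = stream_context_create([
        // The 'http' options are also used for https:// URLs.
        'http' => ['method' => 'HEAD', 'follow_location' => 0, 'timeout' => $timeout],
    ]);

    return @get_headers($url, false, $context);
}

/** Extract the numeric code from the status line, e.g. "HTTP/1.1 301 Moved Permanently". */
function status_code(array $headers): ?int
{
    if (isset($headers[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null;
}

/** Return the value of the first Location header, or null if there is none. */
function redirect_target(array $headers): ?string
{
    foreach ($headers as $line) {
        if (is_string($line) && stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}
```

If head_headers('http://example1.com') returns an array whose status_code() is 301 or 302 and whose redirect_target() starts with https://, the site is redirecting http to https; if head_headers() returns false, nothing usable answered on that URL.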

Comments
  • "It may either http:// or http://www. or https:// or https://www." … or possibly all of the above …? A site might be set up to respond to all four of those "versions" - or it might redirect to one "main" version. No other way to figure this out, than to make an actual HTTP request (resp. requests, in case a site doesn’t want to answer for some of those addresses at all) … and that this is not going to go quick for 1,000,000 domains should be obvious from the get-go.
  • This works fine, thanks. But I can't understand how it works. When do we tell the program to add https://www. or https:// and so on in front of the domains (facebook.com, google.com)?
  • And there is a small bug in this too. When we pass fb.com, it displays the real URL as facebook.com. But it should report that there is no real URL.
  • Well, this is how it works: just imagine you type fb.com in a browser; what is the final URL you get? That's what the above code does. It returns the actual redirected URL.
  • ... or simply ban your source IP address.
  • @YvesLeBorg — Unlikely unless all the domain names are being hosted by the same entity.
  • ... mostly yes, except I pass a bunch of domains/subdomains (.net, .io, .ca, .com) through a single load balancer and security architecture for two of my products. The same snoopers hit all of them (observed), but not many times.
  • @Quentin How did you make this answer as community wiki?
  • @IamtheMostStupidPerson — stackoverflow.com/help/privileges/community-wiki
  • I tested this. It always goes to the else part (// the site is not responding to any URL's do you need to do something here?). Do you have any idea why?
  • Sorry, I introduced a bug in the code: I mixed up $bestUrl with $best_url.