Get domain from URL to GROUP BY using MySQL

I have a table filled with URLs. The URLs are in all sorts of formats: http://foo.com, http://bar.foo.com, http://foo.com/bar, etc. But I'm only interested in the domain name itself, so in this case: foo.com. What I'd like to do is count how many times each domain name occurs in this table. So something like:

SELECT "whatever the domain is in field 'url'", COUNT(*) AS count
FROM table_with_urls
GROUP BY "whatever the domain is in field 'url'"

There are a few similar questions on Stack Overflow, but nothing really answered this. I can't use LIKE or match something with REGEXP, because I'm not (always) looking for specific domain names to match against, but mostly I just want all domain names from the table along with a total count.

Is this possible using MySQL?

Add another indexed column for 'domain' and when you do an INSERT, store this value separately.
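
One way to keep such a column populated without application code is a stored generated column (MySQL 5.7 or later), which is a different mechanism from writing the value at INSERT time. The sketch below assumes the table and column names from the question (table_with_urls, url); the column name domain and the index name idx_domain are made up for illustration. The expression yields the host part (for example bar.foo.com), not the registrable domain, and it does not strip ports or credentials.

```sql
-- Sketch only: assumes MySQL 5.7+ and the question's table_with_urls(url).
-- `domain` and `idx_domain` are hypothetical names.
ALTER TABLE table_with_urls
  ADD COLUMN domain VARCHAR(255)
    GENERATED ALWAYS AS (
      -- keep everything before the third '/', drop the scheme, drop any path
      SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '://', -1), '/', 1)
    ) STORED,
  ADD INDEX idx_domain (domain);

-- The count then reads straight from the indexed column.
SELECT domain, COUNT(*) AS count
FROM table_with_urls
GROUP BY domain;
```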

I had the same problem, and this is what I did:

SELECT SUBSTRING(url FROM 1 FOR LOCATE('/', url, 10) - 1), COUNT(*)
FROM url_list
GROUP BY SUBSTRING(url FROM 1 FOR LOCATE('/', url, 10) - 1);
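
Note that the query above keeps the scheme (it returns http://foo.com, not foo.com) and yields an empty string when the URL has no '/' after position 10, as in http://foo.com. A variant built on SUBSTRING_INDEX avoids the fixed start position. This is a sketch against the question's table_with_urls; the alias names host, hosts and domain are illustrative. Keeping only the last two labels gives foo.com for bar.foo.com, but it mis-handles suffixes such as co.uk and does not strip ports or credentials.

```sql
SELECT
  SUBSTRING_INDEX(host, '.', -2) AS domain,  -- last two labels: bar.foo.com -> foo.com
  COUNT(*) AS count
FROM (
  -- host = text before the third '/', minus the scheme, minus any path
  SELECT SUBSTRING_INDEX(
           SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '://', -1),
           '/', 1) AS host
  FROM table_with_urls
) AS hosts
GROUP BY domain;
```

For the three sample URLs in the question (http://foo.com, http://bar.foo.com, http://foo.com/bar), every row reduces to foo.com, so they fall into one group.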

If you're willing to install a MySQL extension, there is https://github.com/StirlingMarketingGroup/mysql-get-etld-p1

It extracts basically what you'd expect it to:

SELECT `get_etld_p1`('http://a.very.complex-domain.co.uk:8080/foo/bar'); -- 'complex-domain.co.uk'
SELECT `get_etld_p1`('https://www.bbc.co.uk/'); -- 'bbc.co.uk'
SELECT `get_etld_p1`('https://github.com/StirlingMarketingGroup/'); -- 'github.com'
SELECT `get_etld_p1`('https://localhost:10000/index'); -- 'localhost'
SELECT `get_etld_p1`('android-app://com.google.android.gm'); -- 'com.google.android.gm'
SELECT `get_etld_p1`('example.test.domain.com'); -- 'domain.com'
SELECT `get_etld_p1`('postgres://user:pass@host.com:5432/path?k=v#f'); -- 'host.com'
SELECT `get_etld_p1`('exzvk.omsk.so-ups.ru'); -- 'so-ups.ru'
SELECT `get_etld_p1`('http://10.64.3.5/data_check/index.php?r=index/rawdatacheck'); -- '10.64.3.5'
SELECT `get_etld_p1`('not a domain'); -- NULL

Then, if you want that to be performant, you could add a second, denormalized column that stores just those values, something like:

CREATE TABLE `db`.`sometablewithurls` (
  `SomeTableWithURLsID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `URL` TEXT NOT NULL, -- no DEFAULT: MySQL does not accept a literal default on TEXT columns
  `_ETLDP1` VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`SomeTableWithURLsID`),
  INDEX `_ETLDP1` (`_ETLDP1` ASC));
DROP TRIGGER IF EXISTS `db`.`sometablewithurls_BEFORE_INSERT`;

DELIMITER $$
USE `db`$$
CREATE DEFINER = CURRENT_USER TRIGGER `db`.`sometablewithurls_BEFORE_INSERT` BEFORE INSERT ON `sometablewithurls` FOR EACH ROW
BEGIN

set new.`_ETLDP1`=ifnull(`get_etld_p1`(new.`URL`),'');

END$$
DELIMITER ;
DROP TRIGGER IF EXISTS `db`.`sometablewithurls_BEFORE_UPDATE`;

DELIMITER $$
USE `db`$$
CREATE DEFINER = CURRENT_USER TRIGGER `db`.`sometablewithurls_BEFORE_UPDATE` BEFORE UPDATE ON `sometablewithurls` FOR EACH ROW
BEGIN

set new.`_ETLDP1`=ifnull(`get_etld_p1`(new.`URL`),'');

END$$
DELIMITER ;

Notice the index on _ETLDP1 (which stands for effective top-level domain plus one), and the triggers that set it on both insert and update, so that it stays up to date even if URL changes.
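
Rows that existed before the triggers were created still need their _ETLDP1 filled in once, after which the count from the question becomes a plain GROUP BY on the indexed column. A sketch, assuming the get_etld_p1 function from the extension is installed and the table is the one defined above; the aliases domain and count are illustrative:

```sql
-- One-off backfill for rows inserted before the triggers existed.
UPDATE `db`.`sometablewithurls`
SET `_ETLDP1` = IFNULL(`get_etld_p1`(`URL`), '')
WHERE `_ETLDP1` = '';

-- Count per domain, read from the indexed column.
SELECT `_ETLDP1` AS domain, COUNT(*) AS count
FROM `db`.`sometablewithurls`
GROUP BY `_ETLDP1`
ORDER BY count DESC;
```

Rows whose URL could not be parsed end up grouped under the empty string, since the triggers store '' when the function returns NULL.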

Comments
  • It's very difficult to define 'domain name', as you have things like 'foo.co.uk', 'mydomain.myhost.com', 'foo.museum', etc. The only accurate way to do it is to have a list of the possible top-level domains, and that list is quite long (100s of elements). Can you be more specific about what you mean by 'domain name' in your context?
  • I'm only looking for part 2 in 1:[subdomain] 2:[foo.com / foo.co.uk] 3:[/whatever]. But I was afraid it would come down to matching against all possible TLDs and such.
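
The list-based approach from the comments can be sketched in plain SQL. Everything here beyond table_with_urls and url is hypothetical: public_suffixes is an assumed lookup table with a few sample rows, where a real one would be loaded from a maintained suffix list, and the sketch handles only two-label suffixes such as co.uk.

```sql
-- Hypothetical lookup table of two-label suffixes; the sample rows are illustrative.
CREATE TABLE public_suffixes (suffix VARCHAR(63) NOT NULL PRIMARY KEY);
INSERT INTO public_suffixes (suffix) VALUES ('co.uk'), ('org.uk'), ('com.au');

SELECT
  CASE
    -- host ends in a listed suffix: keep three labels (bar.foo.co.uk -> foo.co.uk)
    WHEN ps.suffix IS NOT NULL THEN SUBSTRING_INDEX(h.host, '.', -3)
    -- otherwise keep two labels (bar.foo.com -> foo.com)
    ELSE SUBSTRING_INDEX(h.host, '.', -2)
  END AS domain,
  COUNT(*) AS count
FROM (
  SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '://', -1), '/', 1) AS host
  FROM table_with_urls
) AS h
LEFT JOIN public_suffixes AS ps
  ON ps.suffix = SUBSTRING_INDEX(h.host, '.', -2)
GROUP BY domain;
```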