BigQuery parse URL web address


I need help parsing a web URL in BigQuery. I need to remove the string/text after the last forward slash '/' and return the rest of the URL. The input URL length can vary from record to record. If the input URL has no string/text after the domain address, it should be returned as is.

Here are some examples.

Input Web URL

Expected Output

I have tried using the SPLIT function, which converts the URL string into an ARRAY, and calculating the array size with ARRAY_LENGTH. However, it doesn't cover all the scenarios I have mentioned above.

Please advise how to tackle this using Standard SQL in BigQuery.

You can use a simple REGEXP_REPLACE to remove the last "/" and the string after it.

SELECT REGEXP_REPLACE(url, r"([^/])/[^/]*$", "\\1")
FROM (
  SELECT '' as url UNION ALL
  SELECT '' as url
)

Note: \\1 (the first capture group) represents the character just before the "/". We need to capture that character to avoid matching the "//".
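To see how the capture group behaves, here is a minimal sketch of the same pattern in plain JavaScript. The function name and the example.com URLs are made up for illustration, and JavaScript writes the backreference as "$1" instead of "\\1". For this particular pattern the JavaScript regex engine and BigQuery's RE2 should agree, but treat it as a way to experiment, not a substitute for running the query.

```javascript
// Hypothetical helper mirroring REGEXP_REPLACE(url, r"([^/])/[^/]*$", "\\1").
function stripLastSegment(url) {
  // ([^/]) captures the character right before the final "/".
  // Because that character must not be a slash, the second slash of
  // "//" can never be the one that gets removed.
  return url.replace(/([^/])\/[^/]*$/, "$1");
}

// Made-up sample URLs, not the ones from the question.
const regexSamples = [
  "https://www.example.com/a/b/page.html",
  "https://www.example.com/a/",
  "https://www.example.com",
];

for (const url of regexSamples) {
  console.log(url, "->", stripLastSegment(url));
}
```

Note that a URL ending in "/" loses that trailing slash with this pattern, because [^/]* is allowed to match an empty string.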

Test Result:


I think a case expression helps fill in the blank:

select (case when url like '%//%/%' then regexp_replace(url, '/[^/]+$', '')
             else url
        end) as value
from (select '' as url union all
      select '' as url union all
      select '' as url
      ) x;
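The guard in that case expression can be sketched in JavaScript to make its behavior easier to follow. This is only an illustration under assumptions: the function name and sample URLs are hypothetical, and the LIKE '%//%/%' test is approximated with indexOf.

```javascript
// Hypothetical helper mirroring the CASE expression above.
function stripLastSegmentGuarded(url) {
  const doubleSlash = url.indexOf("//");
  // Mirrors LIKE '%//%/%': a "//" followed somewhere later by another "/".
  const hasPath =
    doubleSlash !== -1 && url.indexOf("/", doubleSlash + 2) !== -1;
  // '/[^/]+$' needs at least one character after the last slash.
  return hasPath ? url.replace(/\/[^/]+$/, "") : url;
}

console.log(stripLastSegmentGuarded("https://www.example.com/a/page.html"));
console.log(stripLastSegmentGuarded("https://www.example.com"));
console.log(stripLastSegmentGuarded("https://www.example.com/a/"));
```

Because the pattern uses [^/]+ and not [^/]*, a URL that already ends in "/" is returned unchanged here.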


Below is for BigQuery Standard SQL

SELECT url, 
  REPLACE(REGEXP_REPLACE(REPLACE(url, '//', '\\'), r'/[^/]+$', ''), '\\', '//')
FROM `project.dataset.table`  

You can test and play with the above using sample data from your question, as in the example below

WITH `project.dataset.table` AS (
  SELECT '' url
)
SELECT url, 
  REPLACE(REGEXP_REPLACE(REPLACE(url, '//', '\\'), r'/[^/]+$', ''), '\\', '//') value
FROM `project.dataset.table`  

with result

Row url                                                 value    
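The REPLACE / REGEXP_REPLACE / REPLACE chain works by temporarily hiding "//" behind a backslash so the regex only ever sees single slashes. A rough JavaScript sketch of the same three steps follows. The function name and URLs are hypothetical, and it assumes the input contains no backslash of its own.

```javascript
// Hypothetical helper mirroring the three nested calls in the query above.
function stripLastSegmentSwap(url) {
  // Step 1: REPLACE(url, '//', '\\') hides every "//" behind one backslash.
  const hidden = url.split("//").join("\\");
  // Step 2: REGEXP_REPLACE(..., r'/[^/]+$', '') drops the last "/segment".
  const trimmed = hidden.replace(/\/[^/]+$/, "");
  // Step 3: REPLACE(..., '\\', '//') restores the hidden "//".
  return trimmed.split("\\").join("//");
}

console.log(stripLastSegmentSwap("https://www.example.com/a/page.html"));
console.log(stripLastSegmentSwap("https://www.example.com"));
```

One caveat of the placeholder approach: a backslash already present in the URL would come back as "//" in step 3, so another placeholder character may be safer if your data can contain backslashes.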


Here is a JavaScript UDF solution. Not because it is better for this scenario, but because a UDF is always your last hope when things get really complicated.

(Also, I want to point out that double slashes can appear elsewhere in a URL, for example right after the https:// part. Handling that may need extra logic coded in the JavaScript.)

CREATE TEMP FUNCTION
  remove_last_part_from_url(url STRING)
  RETURNS STRING
  LANGUAGE js AS """
  // Position of the last "/" and of the "//" that follows the scheme.
  var last_slash = url.lastIndexOf('/');
  var first_double_slash = url.indexOf('//');
  // Only trim when the last "/" is not the second slash of that "//".
  if (first_double_slash != -1 
      && last_slash != -1 
      && last_slash != first_double_slash + 1) {
    return url.substr(0, last_slash);
  }
  return url;
  """ ;
SELECT remove_last_part_from_url(url)
FROM (
  SELECT '' as url UNION ALL
  SELECT '' as url UNION ALL -- double slash after https://
  SELECT 'https:/invalid_url' as url UNION ALL
  SELECT '' as url
);
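Since the UDF body is plain JavaScript, its logic can be tried locally (for example with Node.js) before running it in BigQuery. The sketch below copies that logic into a standalone function. The camelCase name and the example.com URLs are made up for illustration, and only 'https:/invalid_url' comes from the query above.

```javascript
// Standalone copy of the UDF logic, for local experimentation only.
function removeLastPartFromUrl(url) {
  const lastSlash = url.lastIndexOf("/");
  const firstDoubleSlash = url.indexOf("//");
  // Trim only if there is a "//" and the last "/" is not its second slash.
  if (
    firstDoubleSlash !== -1 &&
    lastSlash !== -1 &&
    lastSlash !== firstDoubleSlash + 1
  ) {
    return url.substring(0, lastSlash);
  }
  return url;
}

const udfSamples = [
  "https://www.example.com/a/b",
  "https://www.example.com",
  "https://www.example.com//a", // extra double slash later in the URL
  "https:/invalid_url",
];

for (const url of udfSamples) {
  console.log(url, "->", removeLastPartFromUrl(url));
}
```

With this logic the third sample keeps a trailing "/" ("https://www.example.com/"), which is the kind of case where the extra handling mentioned above would come in.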


  • Bravo! I knew there was a better way, but missed that trick. Bravo!
  • Forgot to mention - instead of "\\1" - you can use r"\1"
  • @MikhailBerlyant - Thank you for your help!
  • @kshaikh - Sure, consider also voting up helpful answers :o)