How to find number of years between two timestamp dates in Hive?

I'm trying to find the number of years between two timestamp dates in Hive.

In SQL I would use:

Datediff(year, date1, date2)

But in Hive I tried:

Datediff(year(date1), year(date2))

But this throws an error stating:

"cannot recognize input near 'datediff' '(' 'year' in expression specification"

Can someone help me with this?

There are multiple ways to achieve such results:

1) Extract & Subtraction

You can simply extract year from two dates and then perform subtraction on those two values.

select abs(extract(year from "2019-01-29") - extract(year from "2020-01-20"));

The problem with this approach is that it counts year boundaries, not elapsed time: it returns 0 for the first and last day of the same year, and 1 for 31st Dec and 1st Jan of the following year. If that is not harmful for your use case and you only need to know whether the year changed between the two dates, this approach can be useful.
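A minimal sketch of that boundary behavior, using the built-in year() function on literal date strings (the dates are made up for illustration):

```sql
-- Same calendar year, almost 12 months apart: the year difference is 0
select year('2019-12-31') - year('2019-01-01');  -- 0

-- One day apart, but across a year boundary: the year difference is 1
select year('2020-01-01') - year('2019-12-31');  -- 1
```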

2) datediff function

datediff(enddate, startdate) returns the number of days from startdate to enddate. Dividing that by 365.25 gives an approximate number of years as a decimal value.

select datediff('2019-02-01', '2019-01-27')/365.25;

You might want to truncate result of above query to two decimal places. If you are looking for an integer number only then just cast it to integer.

select cast(datediff('2019-02-01', '2019-01-27')/365.25 as int);
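Because 365.25 is an average year length, the integer cast can come out one lower than expected when the two dates are exactly one non-leap year apart. A small sketch with illustrative dates:

```sql
-- 2019 is not a leap year, so these dates are 365 days apart
select datediff('2020-01-01', '2019-01-01');  -- 365

-- 365 / 365.25 is slightly below 1, so the cast truncates it to 0
select cast(datediff('2020-01-01', '2019-01-01') / 365.25 as int);  -- 0
```

If whole years on exact anniversaries matter, a month-based or calendar-based calculation is likely a safer choice than dividing days.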

3) months_between function

This function returns the number of months between two dates.

select abs(cast(months_between('2019-01-10', '2020-01-10')as int));

The query above returns 12. To get the result in years, divide that result by 12.
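As a sketch of the division step, assuming whole completed years are wanted and the later date is passed first so the result is positive:

```sql
-- months_between(date1, date2) is positive when date1 is later than date2.
-- Dividing by 12 and flooring gives the number of completed years.
select floor(months_between('2020-01-10', '2019-01-10') / 12);  -- 1
```

Dropping the floor() keeps the fractional part if a decimal number of years is preferred.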

4) Custom UDF

This approach is more complex than all of the above, as you need to write your own UDF and then validate it against all the scenarios.

Write a custom UDF which takes two dates/strings/timestamps as input and returns the difference in years/months/days/minutes/seconds.

You can also write a query that does the same thing by combining several of the UDFs already available in Hive.

Here's the link for your reference: Hive Language Manual

You can try the following:

SELECT YEAR(date1)-YEAR(date2)

If the inputs are timestamps stored as strings, you can try the following:

hive> select current_timestamp();
OK
2019-01-29 04:57:04.128
hive> select year(from_unixtime(unix_timestamp('2019-01-29 04:57:04.128', 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd'));
OK
2019

hive> select year(from_unixtime(unix_timestamp('2021-01-29 04:57:04.128', 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd')) - year(from_unixtime(unix_timestamp('2019-01-29 04:57:04.128', 'yyyy-MM-dd HH:mm:ss.SSS'), 'yyyy-MM-dd'));
OK
2
Time taken: 0.054 seconds, Fetched: 1 row(s)

Assuming you want an integer, you can do the calculation directly:

select (case when date_format(date1, 'MMdd') < date_format(date2, 'MMdd')
             then year(date1) - year(date2) - 1
             else year(date1) - year(date2)
        end)
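A worked sketch of the same calculation with literal dates in place of the date1 and date2 columns (the values are illustrative). It uses the pattern 'MMdd', month followed by day of month; in Java-style date patterns an uppercase 'DD' means day of year, which would not compare correctly here.

```sql
-- One day short of the second anniversary: '0128' < '0129', so subtract 1
select (case when date_format('2021-01-28', 'MMdd') < date_format('2019-01-29', 'MMdd')
             then year('2021-01-28') - year('2019-01-29') - 1
             else year('2021-01-28') - year('2019-01-29')
        end);  -- 1

-- Exactly on the anniversary: '0129' is not less than '0129', so no adjustment
select (case when date_format('2021-01-29', 'MMdd') < date_format('2019-01-29', 'MMdd')
             then year('2021-01-29') - year('2019-01-29') - 1
             else year('2021-01-29') - year('2019-01-29')
        end);  -- 2
```

This assumes date1 is the later of the two dates, as in the expression above.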

Or you can use an approximation:

select datediff(date1, date2) / 365.25

Comments
  • extract(year from date2) - extract(year from date1) will give you the number of year boundaries... don't know if that is required.
  • @Sara . . . Sample data and desired results would really help. It is not obvious how a difference in years should be calculated.