Temporal relational operators

Next to the traditional relational operators, Rax features several temporal relational operators like the temporal AND (denoted in Rax as @&@) which are basically joins with additional selections and processing of overlapping time intervals. Temporal operators are defined on sets of tuples which contain at least one field of type interval (|). If there are more than one field of this type, the first one is used. An attempt to use a temporal operator on a set without intervals, will result in an empty set.

@&@ (Temporal and)

Temporal and, denoted by @&@, returns all combinations of tuples from the left and right side, for which the intervals in the tuples are overlapping. The two intervals are replaced by their overlap. The syntax for the temporal and is the following:

<set_expression> @&@ <set_expression>

or

<set_expression> @&@ :[ <conditionals> ] <set_expression>

A very simple example usage for the @&@ is trimming behavioral data to a specific period. For example:

    |: this_year := (|)"2014-01-01/P1Y";
    `print viewing_data @&@ {[this_year]};
           

In this example, we created a very simple set of tuples, containing only one interval describing the current year. Then we performed temporal and on the TV viewing data and this simple set. The result of this operations is a subset of the TV viewing data containing only viewings from year 2014. The viewings that cross the year boundary (e.g., somebody started watching on the 31st of December 2013 and finished on the 1st of January 2014) will be trimmed to only the part that is in year 2014.

In a more complex scenario, we might want to compute overlaps between two sets of intervals. In the next example, we will compute overlaps between people's lunch breaks and meetings:

    {[|:lunch, $:who]}: lunch_breaks := {
      [(|)"2000-08-20T12:00/PT1H",  "jan"],
      [(|)"2000-08-20T12:10/PT40M", "piet"],
      [(|)"2000-08-20T12:20/PT20M", "kees"]
    };

    {[$:who, |:meeting]}: meetings := {
      ["jan",  (|)"2000-08-20T12:05/PT15M"],
      ["jan",  (|)"2000-08-20T12:40/PT30M"],
      ["jan",  (|)"2000-08-20T15:40/PT15M"],
      ["kees", (|)"2000-08-20T12:15/PT30M"]
    };

    `print lunch_breaks @&@ :[.who#1 == .who#2] meetings;

    // Output:
    //           lunch          |  who |  who
    // -------------------------|------|------
    // 2000-08-20T12:05:00/PT15M| "jan"| "jan"
    // 2000-08-20T12:20:00/PT20M|"kees"|"kees"
    // 2000-08-20T12:40:00/PT20M| "jan"| "jan"
           

We use the extra conditionals to specify that we only want to compute overlaps between lunches and meetings of the same person. Finally, the last example shows a part of a real analysis, in which we compute the reach of TV channels. The reach of a TV channel is the percentage of the total audience (i.e., all respondents) that have watched the channel:

    {[#:RespondentId, |:Timeslot, #:ChannelId]}: TvExposures := ...
    {[#:RespondentId, #:MediaUniverseId, |:Period, &:Weight]}: RespondentWeights :=

    {[#:RespondentId, |:Timeslot, #:ChannelId, &:Weight]}: TvExposuresWeighted :=
      project [.RespondentId#1, .Timeslot, .ChannelId, .Weight]
      (TvExposures @&@ :[.RespondentId#1 == .RespondentId#2] RespondentWeights);
           

In this dataset, each respondent has a weight to compensate for bias in the sample. For example, if our sample has too many programmers and too few econometrists, the econometrists will be assigned a higher weight than the programmers. Moreover, since respondents can join or leave the sample during the collection of the data, their weights change overtime. In this dataset, weights are re-computed for every week. To compute the correct TV reach, we need to assign the proper weight to each TV viewing record. The respondentID of the weight and the viewing record have to match, and the viewing timeslot has to fall within the validity period of the weight. If the viewing timeslot falls on a boundary of two (or more) weight periods, it has to be cut into parts and each part has to have a different weight. If you had to express this calculation in SQL, you would probably be pretty scared by now. Luckily, the @&@ operator in Rax provides just the functionality that is needed here, and the whole calculation can be expressed in a single line of code:

    TvExposures @&@ :[.RespondentId#1 == .RespondentId#2] RespondentWeights
           

@<:@ (Temporal element-of)

Temporal element-of, denoted by @<:@, is similar to temporal and, but it takes a set of timestamps on the left side, rather than a set of intervals. It returns all combinations of tuples from the left and right side, for which the timestamps in the tuples from the left side are contained in the intervals in the tuples on the right side. The result contains the timestamps from the left side, but not the intervals from the right side. The syntax for the temporal element-of is the following:

set_expression @<:@ [ optional_where_clause ] set_expression

The syntax of the optional where-clause is the same as in regular join operators:

: [ boolean_expression ]

An example usage for the @<:@ operator is trimming sets of events to certain windows, for example, trimming customer journeys to windows of 4 weeks before the conversion.

    {[#:user_id, @:conversion_timestamp]} : conversions;
    {[#:user_id, |:lookback_period]} : conversion_windows :=
        project [.user_id, (|)[(^)"P4W", .conversion_timestamp]
        conversions;
    `print events @<:@ conversion_windows;
           

@?@ (Temporal touch)

Temporal touch, denoted in Rax by @?@, returns a set of pairs of tuples from the left and right side, for which the intervals in the tuples are overlapping. The two intervals are preserved. The syntax for the temporal touch is the following:

<set_expression> @?@ <set_expression>

or

<set_expression> @?@ :[ <conditionals> ] <set_expression>

For example, if somebody wants to watch all episodes of "House" on Channel 1, and all episodes of "Mega Mindy" on Channel 2, they can check if any two are overlapping, by executing the following code:

    {[#:ch_nr, $:title, |:air_time]}: tv_programming :=
      {[1, "House", (|)"2014-09-12T17:00:00/PT60M"],
       [1, "House", (|)"2014-09-17T17:00:00/PT60M"],
       [1, "House", (|)"2014-09-24T17:00:00/PT60M"],
       [2, "Mega Mindy", (|)"2014-09-15T17:20:00/PT30M"],
       [2, "Mega Mindy", (|)"2014-09-16T17:20:00/PT30M"],
       [2, "Mega Mindy", (|)"2014-09-17T17:20:00/PT30M"],
       [2, "Mega Mindy", (|)"2014-09-18T17:20:00/PT30M"]};

    `print tv_programming @?@
             :[.title#1 == "House" && .ch_nr#1 == 1 &&
               .title#2 == "Mega Mindy" && .ch_nr#2 == 2]
           tv_programming;
         

The code above will perform a self-join (i.e., a join of a table with itself) on table tv_programming, and find and print out all pairs of overlapping episodes of our favorite series. In this case:

    ch_nr| title |        air_time        |ch_nr|    title   |         air_time
    -----|-------|------------------------|-----|------------|-------------------------
      1  |"House"|2014-09-17T17:00:00/PT1H|  2  |"Mega Mindy"|2014-09-17T17:20:00/PT30M
         

Note that all fields from both sides are preserved and intervals are not modified.

@? (Temporal semi-touch)

Temporal semi-touch, denoted in Rax by @?, returns the tuples from the left side, for which the interval overlaps with at least one interval from the right side. Note that, unlike with @?@, only tuple fields from the left side are returned, and each tuple from the left side can be included in the result at most once, even if it overlaps with multiple intervals from the right side. The syntax for the temporal touch is the following:

<set_expression> @? <set_expression>

or

<set_expression> @? :[ <conditionals> ] <set_expression>

For example, let's assume that Mandy wants to figure out which of her after-hours meetings conflict with her fitness schedule:

    {[$:class_name, |:time]}: fitness_schedule :=
      {["spinning",  (|)"2014-17-09T18:30:00/PT1H"],
       ["body pump", (|)"2014-19-09T19:30:00/PT1H"],
       ["spinning",  (|)"2014-24-09T18:30:00/PT1H"],
       ["body pump", (|)"2014-16-09T19:30:00/PT1H"]};

    {[$:with_whom, |:time]}: late_meetings :=
      {["kees", (|)"2014-18-09T19:00/PT2H"],
       ["piet", (|)"2014-19-09T19:00/PT1H"],
       ["rob",  (|)"2014-19-09T20:00/PT1H"],
       ["mies", (|)"2014-20-09T18:00/PT2H"]};

    `print late_meetings @? fitness_schedule;

    // Output:
    // with_whom|          time
    // ---------|------------------------
    //   "piet" |2015-07-09T19:00:00/PT1H
    //   "rob"  |2015-07-09T20:00:00/PT1H
           

It seems that the meetings with Piet and Rob are conflicting with Mandy's fitness schedule. Piet wouldn't mind rescheduling the meeting, but Rob gets easily offended, so Mandy is considering skipping some of the fitness classes. To find out which ones, Mandy executes:

    `print fitness_schedule @? late_meetings;

    // Output:
    //  class_name|          time
    // -----------|------------------------
    // "body pump"|2015-07-09T19:30:00/PT1H
           

Luckily it's only one class (even though it conflicts with two meetings), so Mandy decides to skip it.

@!? (Temporal anti-touch)

Temporal anti-touch, denoted in Rax by @!?, is the exact opposite of the temporal semi-touch. It returns the tuples from the left side, for which the interval does not overlap with any interval from the right side. The syntax for the temporal anti-touch is the following:

<set_expression> @!? <set_expression>

or

<set_expression> @!? :[ <conditionals> ] <set_expression>

Continuing the example from the previous section, let's assume that Mandy wants to figure out, to which fitness classes she can go without having to reschedule any of her meetings:

    `print fitness_schedule @!? late_meetings;

    // Output:
    //  class_name|          time
    // -----------|------------------------
    //  "spinning"|2015-05-09T18:30:00/PT1H
    //  "spinning"|2015-12-09T18:30:00/PT1H
    // "body pump"|2015-04-09T19:30:00/PT1H
           

@\/ (Temporal union)

Temporal union, denoted in Rax by @\/, is a unary temporal operator, i.e. it takes only one set as an argument. Its syntax is the following:

@\/ <set_expression>

or

@\/ :[ <grouping> ] <set_expression>

Temporal union unifies all overlapping intervals in a set of tuples into a single interval. The results is a set with fewer, but longer intervals. If the group-by clause is used, the set of tuples is first divided into groups for which values of the fields in the group-by clause are equal. Then the intervals are unified within each group separately.

In the example below, we want to figure out, how many people visit a supermarket more than once a day. We have a dataset with timestamped customer purchases. If more than 15 minutes pass between two purchases for the same customer on the same day, we assume the customer left the supermarket in the meantime.

    // Define and read data.
    //
    {[@: saleTime, #: customerId, #: productId ]}:
      Timestamps :=
          <\Timestamps:"foodmart_tables/saleTimestamps.csv">;


    // 1) Extend every timestamp to a visit of 15 minutes; add a day number
    //
    {[ |: visit, #: customerId, #: day_number ]}:
      Intervals :=
          project[(|)[.saleTime, (^)"PT15M"], .customerId, .saleTime.yyyyddd] Timestamps;

    //  2) Unite the 15 minute visits into bigger visits; group by customer ID
    //     and day number
    //
    Intervals :=
      @\/:[.customerId, .day_number] Intervals;
           

In the first step, we convert timestamps into 15-minute intervals, by performing a cast: (|)[.saleTime, (^)"PT15M"]. We also add a unique day number, to be able to treat visits on different days separately. We compute the day number by using the magic tag yyyyddd which returns the concatenation of the year number and the day number within the year. In the second step, we use the temporal union operator to "glue" all overlapping intervals into a single interval, per customer, per day. In this way, all purchases that were less than 15 minutes apart, are "glued" into a single visit. Now we have a dataset with one row per visit, per customer, per day. Now we can count the number of visits per customer, per day, using the fold operator, and then compute the average number of visits:

    `print
       fold [/average(.#3)]
       fold [.customerId, .day_number, /count()] Intervals;
           

In the same way, we could use the temporal union to, for example, unify web clicks into user sessions.