Relational algebra, first described by E.F. Codd, has a wellfounded semantics used for modeling data, and defining operations on it. Relational algebra is the theoretical basis of the relational databases and SQL (Structured Query Language) that is used to query such databases.
Data modeled as sets of tuples (often referred to
as tables) can be manipulated using the
socalled relational
operators. Some of these operators are
directly based on the ones defined by Codd, like
select
, project
or various
joins. Others are based more loosely on Codd's operators, like
the fold
operation that can be used to perform
aggregate operations or partition
that can be used
to split a table into multiple tables. In this section, we will
discuss the relational operators in detail. Since SQL is a very
popular language, and many readers will be already familiar with it,
we will also provide comparisons between Rax relational operators and
SQL constructs.
Projection is an operation defined by Codd in relational algebra and is denoted there by the symbol π. Projection is formally defined as a unary operation, π_{a1,...,an}(R) where R is a relation (i.e., a set of tuples) and a_{1},...,a_{n} is a set of attribute names. The result of such projection is defined as the set of tuples obtained when all tuples in R are restricted to the set {a_{1},...,a_{n}}. In the table terminology, we say that projections cuts out a subset of columns from a table.
Like in SQL, projection in Rax is more powerful than in relational algebra: it not only allows to cut out existing columns, but also to create new columns based on the existing columns. For example:
{[#:id, $:name, @:date_of_birth]}: people := { [1, "Gosia Wrzesinska", (@)"19780208"], [2, "Ala Makota", (@)"19890604"] }; `print project [.#1, .name, ((@)"now" ~ .date_of_birth).r'months/12] people; // Output: //: #1 #2 #3 //:  //: 1"Gosia Wrzesinska"36 //: 2 "Ala Makota" 25
In the above example, we use Rax's project
operator, to compute ages of people listed in
table people
. The project
statement in
Rax has the following syntax:
project[
<list_of_expressions_with_tuple_fields> ]
<set_expression>
The list of expressions in project
can contain
arbitrary expressions that can (but do not have to) refer to the
tuple fields in the set expression. The tuple field reference can
be in a form of a number (.#1
), field label
(.name
) or field label and number. The set
expression must evaluate to a set, but not necessarily to a set
of tuples. For example, this is also a
valid project
expression:
`print project [.#1 * 2] {1..10}; // Output: {2,4,6,8,10,12,14,16,18,20}
The list of expressions in project
can also
contain ranges of tuple fields, which is
very convenient when dealing with wide tables:
{[#:id1, #:id2, #:id3, $:name1, $:name2, $:name3, &:score1, &:score2, &:score3]}: wide_table := {[1, 2, 3, "ala", "ma", "kota", 1.5, 2.5, 3.5]}; `print project [.id1 .. .id3] wide_table; // Output: //: #1#2#3 //:  //: 1 2 3 `print project [.name1 ..] wide_table; // Output: //: #1  #2  #3  #4 #5 #6 //:  //: "ala""ma""kota"1.52.53.5 `print project [.. .name3] wide_table; // Output: //: #1#2#3 #4  #5  #6 //:  //: 1 2 3"ala""ma""kota" `print project [..] wide_table; // Output: //: #1#2#3 #4  #5  #6  #7 #8 #9 //:  //: 1 2 3"ala""ma""kota"1.52.53.5
A couple more useful examples of project:
Extend points in time to 15minute intervals:
project [()[.point_in_time, (^)"PT15M"] web_clicks
Compute the number of unique respondents in a dataset:
# ! project [.respondent_id] dataset
Compute the first 10 powers of 2:
project [2**.#1] {1..10}
Generate a vector of 1s of length 5:
project [1] {1..5}
Comparison with SQL.
There is no explicit project
operator in SQL. The columns are selected and transformed in
the SELECT
clause. For example, the "people"
query in SQL would look like that^{[10]}:
SELECT id, name, DATEDIFF(year, date_of_birth, GETDATE()) FROM people
It is not possible to refer to columns by numbers in SQL. It is also not possible to address whole ranges of columns.
Selection comes also directly from relational algebra and is denoted by σ. Selection is a unary operation σ_{Υ}(R), where R is a relation and Υ is a boolean expression, whose result is a set of tuples from R for which Υ holds. The select statement in Rax has the following form:
select[
<boolean_expressions_with_tuple_fields> ]
<set_expression> The boolean expression, which we call selection condition can be an arbitrary expression that can (but does not have to) refer to tuple fields in the set expression. A tuple field reference can be in a form of a number, field label or field label and number. For example:
// Select people born before 1980 // `print select [.#1 == 1 && .date_of_birth.year < 1980] people; // Output: //: id name  date_of_birth //:  //: 1"Gosia Wrzesinska"19780208T00:00:00
The set expression must evaluate to a set, but not necessarily to a set of tuples. For example:
{@}: dates := {(@)"20121112", (@)"19780208", (@)"20150612"}; // Select dates after year 1990 // `print select [.#1.year > 1990] dates; // Output: {20121112T00:00:00,20150612T00:00:00} // Select dates falling in a period of 25 years // starting from the 1st of January 1990 // `print select [.#1 <: ()"19900101/P25Y"] dates; // Output: {20121112T00:00:00}
Comparison with SQL.
In SQL, the selection is performed using
the SELECT
clause. To perform a pure selection
(i.e. without any projection), SELECT *
should
be used. For example the "people" query in SQL would look like
this:
SELECT * FROM people WHERE id = 1 AND DATEPART(year, date_of_birth) < 1980
Natural join is a binary operator between relations: R ⨝ S, where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. We've already seen an example of a natural join in Chapter 2, Getting started:
{[$:name, #:dept_id]}: employees := { ["Gosia Wrzesinska", 1], ["Daniela Gavidia", 1], ["Matt Dobson", 2], ["JanMark Wams", 3] }; {[#:dept_id, $:dept_name]}: departments := { [1, "Software development"], [2, "Hardware maintenance"], [3, "Entertainment"] }; `print employees >< departments; // Output: //: name dept_id dept_name //:  //: "Daniela Gavidia" 1 "Software development" //: "Gosia Wrzesinska" 1 "Software development" //: "JanMark Wams"  3  "Entertainment" //: "Matt Dobson"  2 "Hardware maintenance"
In the relationalalgebra terminology, we joined
relations employees
and departments
by matching the tuples from both sides for which the value of
the dept_id
attribute is equal.
Natural join in Rax has the following syntax:
><
<set_expression> or
>< :[
<conditionals> ]
<set_expression> Rax searches for matching tuple fields in both set set expressions (both field name and type must match). Unnamed fields do not match any other fields. If there are no matching fields, the natural join turns into a Cartesian product. For example:
`print {1,2} >< {1,2}; // Output: //: #1#2 //:  //: 1 1 //: 1 2 //: 2 1 //: 2 2
You can tell Rax to warn you in such cases, by switching the warning level to 4 (more about warning levels in Chapter 19, Meta instructions and macros):
%warn 4; `print {1,2} >< {1,2}; // Error: Rax warning: No matching labels turns >< into >< // // Output: //: #1#2 //:  //: 1 1 //: 1 2 //: 2 1 //: 2 2
The extra conditions have the following syntax:
:[
<conditions> ]
The extra conditions can be used to select only a part of the tuples from the join result. The boolean expression in the conditions can refer to tuple fields from both set expressions. For example:
`print employees ><:[.dept_name == "Software development"] departments; // Output: //: name dept_id dept_name //:  //: "Daniela Gavidia" 1 "Software development" //: "Gosia Wrzesinska" 1 "Software development"
Rax will automatically detect from which set expression the tuple fields in the boolean expression are coming from.
Comparison with SQL. Not all SQL dialect provide a naturaljoin operator. For example, in SQL Server, the naturaljoin operator has to be simulated in this way:
SELECT * FROM employees JOIN departments ON (employees.dept_id = departments.dept_id)
Note that, unlike with a natural join, the result of this join will contain
the dept_id
columns from both sides.
In some SQL dialects, this can be shortened to:
SELECT * FROM employees INNER JOIN departments USING (dept_id)
MySQL and PostgreSQL do provide a naturaljoin operator:
SELECT * FROM employees NATURAL JOIN departments
The result of this statement will contain
the dept_id
column only once.
Cross join is another name for Cartesian product, which in set theory is defined in the following way:
In other words, it is a set of all combinations of elements from
sets A and B. In Rax, the cross join is denoted
by ><
. For example:
`print {1,2} >< {3,4}; // Output: //: #1#2 //:  //: 1 3 //: 1 4 //: 2 3 //: 2 4 `print employees >< departments; // Output: //: name dept_iddept_id dept_name //:  //: "Daniela Gavidia" 1  1 "Software development" //: "Daniela Gavidia" 1  2 "Hardware maintenance" //: "Daniela Gavidia" 1  3  "Entertainment" //: "Gosia Wrzesinska" 1  1 "Software development" //: "Gosia Wrzesinska" 1  2 "Hardware maintenance" //: "Gosia Wrzesinska" 1  3  "Entertainment" //: "JanMark Wams"  3  1 "Software development" //: "JanMark Wams"  3  2 "Hardware maintenance" //: "JanMark Wams"  3  3  "Entertainment" //: "Matt Dobson"  2  1 "Software development" //: "Matt Dobson"  2  2 "Hardware maintenance" //: "Matt Dobson"  2  3  "Entertainment"
Like with the natural join, the crossjoin operator in Rax can contain extra conditionals:
><
<set_expression> or
>< :[
<conditionals> ]
<set_expression>> Using the extra conditionals, one can express other types of joins, such as ϴjoin, and its special case  equijoin. For example:
`print employees ><:[.dept_id#1 == .dept_id#2] departments; // Output: // name dept_iddept_id dept_name //  // "Daniela Gavidia" 1  1 "Software development" // "Gosia Wrzesinska" 1  1 "Software development" // "JanMark Wams"  3  3  "Entertainment" // "Matt Dobson"  2  2 "Hardware maintenance"
A ϴjoin produces all possible
combinations of tuples from both relations, for which the
ϴcondition  a comparison between tuple
attributes  holds. If the ϴcondition is an equality
condition, we call the join equijoin.
In the example above, we joined sets employees
and departments
on the equality
condition .dept_id#1 == .dept_id#2
.
The extra conditionals are somewhat stronger that the
ϴcondition in the ϴjoin  it can be an arbitrary
boolean expression that can (but does not have to) refer to the
tuple fields in both sets.
Rax will automatically detect from which set expression the tuple
fields in the boolean expression are coming from. For duplicate
tuplefield labels, use the name+number naming scheme, like in the
example above, where .dept_id#1
refers to the tuple from the
left side, while .dept_id#2
refers to the tuple
from the right side.
Note that the result of the join in the example above is almost
the same as the result of the natural join in the previous
section. The dept_id
field is included in the
resulting set twice, though.
Comparison with SQL. Most SQL dialects contain a crossjoin operator:
SELECT * FROM employees CROSS JOIN departments
Often, you can also express a cross join in the following way:
SELECT * FROM employees, departments
The SQL standard contains an inner join operator which can be compared to the ϴjoin and equijoin, for example:
SELECT * FROM employees INNER JOIN departments ON (employees.dept_id == departments.dept_id)
In most SQL dialects, the INNER
keyword can be
omitted:
SELECT * FROM employees JOIN departments ON (employees.dept_id == departments.dept_id)
Finally, in SQL, you can add arbitrary boolean conditions to a join expression by using the WHEREclause:
SELECT * FROM employees JOIN departments ON (employees.dept_id == departments.dept_id) WHERE employees.dept_id < 2
Like the natural join described before, natural outer joins match tuples based on tuple fields that are common for both sets (same label and type). Unlike the natural join though, natural outer joins do not require all tuples from both relations to match. The result of an outer join retains each tuple from the input sets, even if no matching record exists. In this case, the result will contain a tuple formed by extending the unmatched tuple with "filler" values. In Rax, every type has a default value which is used as a filler in outer joins. The default values for Rax types are listed in Table 12.1, “Initial Missing Values for Rax Types”, in Chapter 10, More about types.
There are three types of (natural) outer joins:
Left outer join
in which all tuples from the left set
are retained, but only matching tuples from the right set. The
filler values will only be used on the right side. Left outer
join in Rax is denoted by =><
.
Right outer join
in which all tuples from the right set are retained but only
matching tuples from the left set. The filler values will
only be used on the left side. Right outer join in Rax is
denoted by ><=
.
Full outer join
in which all tuples from both sets will be retained. The
filler values will be used on both sides. Full outer join in
Rax is denoted by =><=
Below a couple of examples illustrating all types of outer joins:
{[$:city, $:province]}: Province := { ["Delft","ZH"], ["Leek","GR"], ["Rotterdam","ZH"] }; {[$:city, $:area]}: Area := { ["Delft","Zuid"], ["Rotterdam","Zuid"], ["Sneek","Noord"] }; `print Province =>< Area; // Output: // city province area //  // "Delft"  "ZH" "Zuid" // "Leek"  "GR"  "" // "Rotterdam" "ZH" "Zuid"
In the above example, a left outer join is performed between
tables Province
and Area
. The tuples are matched on their
common field city
. There is no matching tuple
for ["Leek","GR"]
, therefore this tuple is
filled up with a default value in the
field area
(in this case, the default value is
the empty string, ""
). Note that not all
tuples from the right side of the join are in the result set 
the tuple ["Sneek","Noord"]
, for which there
is no match, was removed.
`print Province ><= Area; // Output: // city province area //  // "Delft"  "ZH"  "Zuid" // "Rotterdam" "ZH"  "Zuid" // "Sneek"  "" "Noord" //
In the above example, a right outer join is performed between
tables Province
and Area
. This time, the
tuple ["Sneek","Noord"]
from the right side of
the join, for which there is no match on the left side, is
retained and filled up with default value in
the province
field. The ["Leek","GR"]
, for which there is
no match on the right side was removed.
`print Province =><= Area; // Output: // city province area //  // "Delft"  "ZH"  "Zuid" // "Leek"  "GR"  "" // "Rotterdam" "ZH"  "Zuid" // "Sneek"  "" "Noord"
Finally, in the full outer join, both tuples for which there is
no match on the other side, ["Leek","GR"]
and ["Sneek","Noord"]
, are retained and filled
with default values.
The formal syntax of the outer joins in Rax is, for left outer join:
=><
<set_expression> or
=>< :[
<conditionals> ]
<set_expression> For right outer join:
><=
<set_expression> or
><= :[
<conditionals> ]
<set_expression> For full outer join:
=><=
<set_expression> or
=><= :[
<conditionals> ]
<set_expression> As with other types of join, the extra conditionals can contain an arbitrary boolean expression referencing the tuple fields from both sets. The extra conditionals can be used to further restrict the result set of a join. For example:
`print Province =>< :[.province == "GR"] Area; // Output: // city provincearea //  // "Leek"  "GR"  ""
Comparison with SQL. Most SQL dialects provide some form of outerjoin operators. These are not natural join operators, though, which means the join predicate has to be explicitly specified. Here are some examples:
SELECT * FROM Province RIGHT OUTER JOIN Area ON (Province.city = Area.city) SELECT * FROM Province LEFT OUTER JOIN Area ON (Province.city = Area.city) SELECT * FROM Province FULL OUTER JOIN Area ON (Province.city = Area.city)
The most important difference, however, between outer joins in
Rax and in SQL are the "filler" values. In SQL, the nonmatching
tuples are filled with the special NULL
value. NULL
is a special value in SQL, because
it is by definition not in the domain of any of the SQL
datatypes, and therefore it's not comparable with "regular"
values and cannot be used in many functions and expressions that
only accept "regular" values. The special handling
that NULL
requires often leads to extra SQL code. For example:
SELECT * FROM Province LEFT OUTER JOIN Area ON (Province.city = Area.city) WHERE province <> "LB" OR ISNULL(province)
The last line in the above example is needed,
because NULL
is not comparable
with "LB"
. In Rax, such extra code is not
necessary:
`print Province =>< :[.province != "LB"] Area;
Aggregation in relational algebra allows to compute various functions on columns,
like summing up their elements, finding the minimum, etc. Relational algebra
contains, at least, five such aggregate
functions: G
_{Sum()},
G
_{Count()},
G
_{Average()},
G
_{Max()} and
G
_{Min()}. The functions can be performed on the
whole columns, or on parts of columns obtained by grouping the rows in the table on
the values of the grouping attributes.
In Rax, aggregations are performed using
the fold
function that applies reduction operators. The syntax
of fold
resembles the syntax of project
:
fold[
<list_of_expressions> ]
<set_expression>
Like with project
, the list of expressions can
contain arbitrary expressions referring to the tuple
fields. Additionally, list of expressions can contain aggregate
functions on tuple fields. For example:
{[#:employee_id, #:dept_id, &:salary, &:bonus]}: Salaries := { [34, 842, 23423.55, 100.00], [82, 783, 13546.47, 10.00], [89, 783, 48727.45, 0.00], [11, 842, 36144.88, 1000.00], [83, 783, 64001.67, 1000.00] }; `print fold[.dept_id, /sum(.salary)] Salaries; // Output: //: dept_id salary //:  //: 783 126275.59 //: 842  59568.43
In this example, /sum
is an aggregate function
that sums all values. The tuple fields that are used outside of
aggregate functions are used as grouping attributes. In this
example, dept_id
is the grouping attribute. So
the fold
expression will compute the sum of
salaries per department.
As with the project
operator, expressions inside a
fold
operator can be labeled. Aggregates can be labeled on the
outside and on the inside. Inside labeling can be used to give every individual
aggregate expression its own label. Outside labeling is a short hand for labeling
all the aggregate expressions the same. When an outside label as well as inside
labels are supplied the outer label is honored and all the inner labels are
ignored as demonstrated in the code below:
`print fold[.dept_id, /sum(.#3):dep_sal] Salaries; // Output: //: dept_id dep_sal //:  //: 783 126275.59 //: 842  59568.43 `print fold[.dept_id, /sum(.#3:ignore):total] Salaries; // Output: //: dept_id total //:  //: 783 126275.59 //: 842  59568.43
If more than one expression is supplied to a aggregate like /sum
,
the reason for having both inside and outside labeling becomes clearer, as
demonstrated by the code below:
`print fold[.dept_id, /sum(.#3:sal,.#4:bon)] Salaries; // Output: //: dept_id sal  bon //:  //: 783 126275.591010 //: 842  59568.431100 `print fold[.dept_id, /sum(.#3,.#4):sum] Salaries; // Output: //: dept_id sum  sum //:  //: 783 126275.591010 //: 842  59568.431100
The last fold
operator in the example above generates a table
with two columns labeled sum
. These can be addressed like any
tuple or table with repeating labels, by adding an index. For example the second
sum
can be addressed like so: .sum#2
.
The following table lists the aggregate functions that can be used
in fold
.
Table 8.3. Aggregate functions in fold
Syntax  Explanation 

/count() 
Count tuples in a table/group. For
example:
fold [/count()] employees
will count the rows in the employees
table, and
fold [.dept_id, /count()] employees
will count employees per department.

/count(
boolean expression
)

Count rows for which the boolean expression is true.
For example:
fold [/count(.dept_id == 1)] employees
will count the number of employees in the department
with ID 1 .

/count!(
field_{1},...,field_{n}
)

Count rows that are unique on the set of fields
field_{1},...,field_{n}.
For example:
fold [.dept_id, /count!(.first_name, .last_name)] employees
will count the number of unique first name / last name
combination in each department.

/sum(
field_{1},...,field_{n}
)

Return the sum of the values in each field
field_{1},...,field_{n}. Can
be applied only to fields of type #
or & .

/cat(
field_{1},...,field_{n}
)

Return the string concatenation of the values in each field
field_{1},...,field_{n}. Can be
applied only to fields of type $ . Fields as well as
the entire /cat() operator take a separator in the
form of  separator: /cat(.field"separator")

/product(
field_{1},...,field_{n}
) or /*(
field_{1},...,field_{n}
) 
Return the product of the values in each field
field_{1},...,field_{n}. Can
be applied only to fields of type #
or & .

/min(
field_{1},...,field_{n}
)

Return the minimum for each field
field_{1},...,field_{n}. Can
be applied to fields of
type # , &
and $ .

/max(
field_{1},...,field_{n}
)

Return the maximum for each field
field_{1},...,field_{n}. Can
be applied to fields of
type # , &
and $ .

/median(
field_{1},...,field_{n}
)

Return the median for each field
field_{1},...,field_{n}. Can
be applied to fields of
type # , &
and $ .

/average(
field_{1},...,field_{n}
)

Return the average (mean) for each field
field_{1},...,field_{n}. Can
be applied to fields of
type # and & .

/&&(
field_{1},...,field_{n}
)

Return the logical AND of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type ? .

/(
field_{1},...,field_{n}
)

Return the logical OR of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type ? .

/^^(
field_{1},...,field_{n}
)

Return the logical XOR of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type ? .

/&(
field_{1},...,field_{n}
)

Return the bitwise AND of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type # .

/(
field_{1},...,field_{n}
)

Return the bitwise OR of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type # .

/^(
field_{1},...,field_{n}
)

Return the bitwise XOR of all values for each field
field_{1},...,field_{n}. Can
be applied to fields of type # .

Comparison with SQL. In SQL, aggregate functions can be used in the SELECT clause mixed with other expressions on columns. Unlike in Rax, the grouping columns have to be specified explicitly in the GROUP BY clause. Nongrouping columns cannot be used outside aggregate functions. For example:
SELECT dept_id, SUM(salary) FROM employee_salaries GROUP BY dept_id
All SQL dialects support the five basic aggregate functions:
COUNT/COUNT DISTINCT, SUM, AVG, MIN and MAX. Some SQL dialects
offer additional functions.
Note, that SQL's aggregate functions
need to be well defined on unordered sets (since SQL tables are
unordered), whereas, since
all Rax sets are ordered, the Rax fold
operator
can perform operations like string concatenation where the order is
important.
Informally, partitioning is splitting a set into subsets based on
certain properties of the elements. More formally,
a partition of a set S
is a set of sets P
where:
P
does not contain the empty
set.
The union of all sets in P
is
equal to S
.
The sets in P
are pairwise
disjoint.
In Rax, sets can be partitioned using
the partition
operator, with the following
syntax:
partition[
<list_of_tuple_fields> ]
<set_expression> The set defined by the 'set_expression' will be partitioned in such a way, that values of the tuple fields on the 'list_of_tuple_fields' are equal in each subset. For example:
`print partition [.dept_id] employees; // Output: // { // {["Daniela Gavidia", 1], ["Gosia Wrzesinska", 1]}, // {["Matt Dobson", 2]}, // {["JanMark Wams", 3]} // }
In this example, we partitioned the set of employees on their department. The result is a set of 3 sets: one per each department. Each set contains employees working in the given department.
Comparison with SQL. Since in SQL the result of every query has to be a table, and a partition is a set of tables, there is no explicit partitioning operator. SQL's window functions provide a way of specifying table partitioning. The window function is applied to each subset separately and the subsets are unified to provide the end result, for example:
SELECT employee_id, dept_id, avg(salary) OVER (PARTITION BY dept_id) FROM employee_salaries;
^{[10] }We're using the SQL Server dialect of SQL  date and time functions are different in different SQL dialects