Chapter 11. Getting data in and out of Rax

Table of Contents

Processing data in files
Capturing data from external sources
Processing data in database tables

A data-analysis language without a way to import datasets from various sources would be rather useless. Rax provides two ways of importing data. First, Rax can import data from files. Second, since Rax runs on top of an SQL database, it can access data stored in SQL tables on the same database server. It is also possible to export data from Rax, either to a file or to a database table.

Processing data in files

Like most other languages, Rax provides a way to perform file I/O. In Rax, I/O is treated like variable assignment, as the example below shows:

    {[#:id, $:name]}: tableNames :=               // Define a table and
      <\tableNames:"nametab.csv">;                // Fill from a .csv file.
    `print tableNames;                            // Look at the data
    tableNames := tableNames \/ {[4, "ala"]};     // Add some more data
    <\tableNames:"nametab.csv"> := tableNames;    // Write back to the file
       

In the example above, a table tableNames is defined, and its contents are read from nametab.csv. The expression <\tableNames:"nametab.csv"> is called an I/O variable. It specifies the type of the data in a file (\tableNames in this case) and the name of the file containing the data (nametab.csv in this case). I/O variables always have the following form:

< <type> : <file_name> >

Using an I/O variable on the right side of an assignment reads data from the file and stores it in the variable on the left side. Using an I/O variable on the left side of an assignment writes data to the file. The file name suffix determines what the content of the file has to be (for input) or will be (for output). For output, the string with the file name can be preceded by an exclamation mark to force overwriting an existing file. Use this with care and at your own risk.

Currently, Rax supports two file formats: CSV and XML. A CSV file is a text file in which field values are separated by commas and tuples are separated by newlines. Fields of the following types can be read from a CSV file:

  • boolean (?) coded as T and F

  • number (#)

  • real (&) in a regular or scientific notation

  • string ($) delimited by double quotes (")

  • time (@) in ISO 8601 format, with the time part

The first line of a CSV file created by Rax contains the field names. When reading from a CSV file, Rax ignores the first line. So, if the following table is written to a CSV file:

    {[?:bool, #:num, &:real, $:str, @:time]}: aTable;
    aTable := {
      [true, 1, 2.34, "alamakota", (@)"2014-09-26T11:24:24"],
      [false, 100, 3.4e-3, "olamapsa", (@)"2010-08-02T12:12:00"]
    };
    <\aTable:"atable.csv"> := aTable;
         

then the CSV file will look like this:

    bool, num, real, str, time
    1,1,2.34,'alamakota',2014-09-26T11:24:24
    0,100,0.0034,'olamapsa',2010-08-02T12:12:00
         

The first line contains tuple-field names. The second and third lines contain the values. If the file is now read into a different set variable:

    {[?:bool1, #:num1, &:real1, $:str1, @:time1]}: anotherTable;
    anotherTable := <\anotherTable:"atable.csv">;
    `print anotherTable;
         

the new set will have the following contents:

    bool1|num1| real1|    str1   |       time1
    -----|----|------|-----------|-------------------
     true|  1 | 2.34 |"alamakota"|2014-09-26T11:24:24
    false| 100|0.0034| "olamapsa"|2010-08-02T12:12:00
         

Note that the field names in the first line of the CSV file were ignored and the field names are as in the definition of the anotherTable variable.
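The positional mapping described above can also be illustrated outside Rax. The following is a minimal Python sketch, not part of Rax: it assumes the file looks exactly like the listing above (1 and 0 for booleans, single-quoted strings), and the names CSV_TEXT, FIELDS and read_table are hypothetical. The header line is skipped, and each value is named and converted according to its position in the declaration of anotherTable.

```python
import csv
import io
from datetime import datetime

# Contents of atable.csv, copied from the listing above.
CSV_TEXT = """bool, num, real, str, time
1,1,2.34,'alamakota',2014-09-26T11:24:24
0,100,0.0034,'olamapsa',2010-08-02T12:12:00
"""

# Field names and converters follow the declaration of anotherTable,
# not the header line of the file (assumption: booleans are 1 and 0).
FIELDS = [
    ("bool1", lambda s: s == "1"),
    ("num1", int),
    ("real1", float),
    ("str1", str),
    ("time1", datetime.fromisoformat),
]


def read_table(text):
    """Parse the CSV text into a list of dicts, ignoring the header line."""
    reader = csv.reader(io.StringIO(text), quotechar="'")
    next(reader)  # the first line holds field names and is ignored
    rows = []
    for record in reader:
        row = {}
        for (name, convert), value in zip(FIELDS, record):
            row[name] = convert(value)
        rows.append(row)
    return rows


rows = read_table(CSV_TEXT)
for row in rows:
    print(row)
```

Because the values are matched by position, renaming the fields in the declaration changes the names in the result without touching the file.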

Another file format supported by Rax is XML. If aTable is written to an XML file:

    <\aTable:"atable.xml"> := aTable;
         

the file will look like this:

    <O>
      <T>
        <b_bool>1</b_bool>
        <n_num>1</n_num>
        <r_real>2.340000e+00</r_real>
        <s_str>alamakota</s_str>
        <t_time>2014-09-26T11:24:24</t_time>
      </T>
      <T>
        <b_bool>0</b_bool>
        <n_num>100</n_num>
        <r_real>3.400000e-03</r_real>
        <s_str>olamapsa</s_str>
        <t_time>2010-08-02T12:12:00</t_time>
      </T>
    </O>
         

The XML tags reflect the types of the values that were written: the <O> tags contain an ordered set, the <T> tags contain a tuple, the <b> tags contain a boolean, etc. For named tuple fields, the XML tag contains the field type and the field name separated by an underscore, for example: <b_bool>.
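The tag convention makes such files straightforward to read from other tools. The following is a minimal Python sketch, not part of Rax, that parses the listing above with the standard xml.etree.ElementTree module. The names XML_TEXT, CONVERTERS and read_tuples are hypothetical, and the mapping of the prefixes n, r, s and t to number, real, string and time is inferred from the example rather than stated by Rax.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Contents of atable.xml, copied from the listing above.
XML_TEXT = """<O>
  <T>
    <b_bool>1</b_bool>
    <n_num>1</n_num>
    <r_real>2.340000e+00</r_real>
    <s_str>alamakota</s_str>
    <t_time>2014-09-26T11:24:24</t_time>
  </T>
  <T>
    <b_bool>0</b_bool>
    <n_num>100</n_num>
    <r_real>3.400000e-03</r_real>
    <s_str>olamapsa</s_str>
    <t_time>2010-08-02T12:12:00</t_time>
  </T>
</O>"""

# One converter per type prefix (assumption: inferred from the example).
CONVERTERS = {
    "b": lambda s: s == "1",
    "n": int,
    "r": float,
    "s": str,
    "t": datetime.fromisoformat,
}


def read_tuples(text):
    """Return one dict per <T> element, keyed by field name."""
    root = ET.fromstring(text)
    tuples = []
    for tuple_elem in root.findall("T"):
        fields = {}
        for child in tuple_elem:
            # "b_bool" splits into the type prefix "b" and the name "bool".
            prefix, name = child.tag.split("_", 1)
            fields[name] = CONVERTERS[prefix](child.text)
        tuples.append(fields)
    return tuples


tuples = read_tuples(XML_TEXT)
for fields in tuples:
    print(fields)
```

Splitting on the first underscore only keeps field names that themselves contain underscores intact.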

If a file to which you are trying to write already exists, Rax will issue a warning and will not overwrite the data:

    <\anotherTable:"atable.csv"> := anotherTable;

    // Output
    // 2: warning: File "atable.csv" already exists, use !"atable.csv" to overwrite.
         

You can add a ! before the file name to force Rax to overwrite the file:

    <\anotherTable:!"atable.csv"> := anotherTable;