Types and Operations References
Primitive Types:
Int, Long, Float, Double:
defines standard arithmetic functions (+, -, *, /)
defines standard ordering/comparison operations (<, <=, >, >=, ==, !=)
Boolean:
defines equality and standard binary operations (&&, ||)
Char:
defines standard ordering/comparison operations (<, <=, >, >=, ==, !=)
String:
defines startsWith, endsWith, contains, split
Date:
defines standard ordering/comparison operations (<, <=, >, >=, ==, !=)
initialized from a String representation "yyyy-mm-dd", e.g., Date("2005-01-31")
Record Types:
Records compose primitive types into a schema. Anonymous records can be declared inline, as in:
table Select( row =>
  new Record {
    val foo = row.a
    val bar = row.b
  }
)
Records can also be given a name using type aliases, for example:
type Nation = Record {
  val nationkey: Int
  val name: String
  val regionkey: Int
}
Tuples
Tuples are completely equivalent to Records in OptiQL, except that fields are automatically named using standard Scala conventions (_1, _2, etc.). They provide a cheap syntactic alternative to declaring small anonymous records inline.
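The naming convention can be illustrated with an ordinary Scala tuple. This is a minimal sketch in plain Scala, not staged OptiQL code; the object name and the sample values are hypothetical.

```scala
// Plain Scala illustration of the _1, _2, ... field naming convention.
// Not OptiQL code: no Rep types or Delite runtime are involved.
object TupleFieldsExample {
  // A three-field row written as a tuple instead of a named record
  // (sample values are made up for illustration).
  val nation: (Int, String, Int) = (0, "ALGERIA", 0)

  // Fields are accessed through position-based names.
  val nationKey: Int = nation._1
  val name: String = nation._2
  val regionKey: Int = nation._3

  def main(args: Array[String]): Unit =
    println(s"$nationKey $name $regionKey")
}
```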
Known Issues with Records
There is currently a bug in the Scala REPL that prevents Record types from resolving correctly. This only affects the REPL; the normal compiled and interpreted modes of OptiQL function correctly. When using the REPL, this issue can be worked around in one of two ways:
1) Replace all field accesses, e.g., record.foo, with record.selectDynamic[A]("foo"), where A is the type of the field foo (e.g., Int)
2) Declare schemas using Tuples or a case class rather than a Record, e.g., case class Nation(nationkey: Int, name: String, regionkey: Int)
Table[A]
The Table is the primary collection type available in OptiQL. Every Table has a schema defined by the type parameter A, where A can be any of the primitive or Record types listed above. Each element of type A in the Table logically forms a row of the table. If A is a Record type, each field of A logically forms a column of the Table. If A is a primitive type, the Table has only one column (with type A).
Static Methods
Table(args: A*): Rep[Table[A]]
Creates a table from the given sequence, e.g., Table(1,2,3,4,5) will create a Table[Int] with rows valued 1 through 5.
Table.range(start: Rep[Int], end: Rep[Int]): Rep[Table[Int]]
Creates a table containing every Int value in [start, end), i.e., from start (inclusive) up to end (exclusive).
Table.fromFile(path: Rep[String])(selector: Rep[String] => Rep[A]): Rep[Table[A]]
Creates a Table[A] by reading in the file that exists at path. Each line of the file is provided as a String input to the selector, and the selector function should parse that line into a record of the desired type A. Records are stored in the table in the order in which they appear in the file.
Table.fromFile[A](path: Rep[String], separator: Rep[String]): Rep[Table[A]]
Creates a Table[A] by reading in the file that exists at path. The type parameter A must be passed explicitly to this function; it cannot be inferred. Each line of the file is automatically parsed according to the schema of type A and the column separator provided. The separator can be any valid regular expression. This function assumes each column can be parsed directly based on its type, e.g., if the type is a Double it uses a standard string-to-double conversion function. For more complicated file formats, use the alternative fromFile function above. Records are stored in the table in the order in which they appear in the file. Example:
Table.fromFile[Nation]("/path/to/nation_table", ",")
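The per-line parsing described for the two fromFile variants can be sketched in plain Scala. This is only an illustration of the idea (split on a regular-expression separator, then convert each column by its type); it is not OptiQL's implementation, and parseNation and the sample lines are hypothetical.

```scala
// Plain Scala sketch of separator-based line parsing; not OptiQL code.
object ParseLineExample {
  case class Nation(nationkey: Int, name: String, regionkey: Int)

  // Split the line on the separator (interpreted as a regular expression)
  // and convert each column according to the field type in the schema.
  def parseNation(line: String, separator: String): Nation = {
    val cols = line.split(separator, -1) // -1 keeps trailing empty columns
    Nation(cols(0).trim.toInt, cols(1), cols(2).trim.toInt)
  }

  // Hypothetical file contents, one record per line.
  val lines: Seq[String] = Seq("0,ALGERIA,0", "1,ARGENTINA,1")

  // Records keep the order in which the lines appear.
  val nations: Seq[Nation] = lines.map(parseNation(_, ","))

  def main(args: Array[String]): Unit =
    nations.foreach(println)
}
```

A function like parseNation plays the role of the selector in the first fromFile variant. Because the separator is a regular expression, a character such as | has a special meaning and would need escaping (for example "\\|") to be matched literally.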
Instance Methods
Select(selector: Rep[A] => Rep[B]): Rep[Table[B]]
Transforms every record in the table to a new schema as specified by the selector function. Example:
val result = lineItems Select( row =>
  new Record {
    val extendedPrice = row.extendedPrice
    val quantity = row.quantity
  }
)
produces a table, result, with only the extendedPrice and quantity fields contained in the schema for lineItems.
SelectMany(selector: Rep[A] => Rep[Table[B]]): Rep[Table[B]]
Transforms every record in the table to a new set of records with a new schema as specified by the selector function. The result is a flat Table[B] produced by concatenating the intermediate tables produced by the selector function.
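The flattening behavior of SelectMany corresponds to flatMap on ordinary Scala collections. The sketch below uses plain Scala sequences rather than OptiQL tables; the Order type and its data are hypothetical.

```scala
// Plain Scala analogue of SelectMany using flatMap; not OptiQL code.
object SelectManyExample {
  case class Order(id: Int, items: Seq[String])

  // Hypothetical input: each order carries zero or more items.
  val orders: Seq[Order] = Seq(
    Order(1, Seq("bolt", "nut")),
    Order(2, Seq.empty),
    Order(3, Seq("gear"))
  )

  // Each order is mapped to a collection of items, and the intermediate
  // collections are concatenated into one flat sequence.
  val allItems: Seq[String] = orders.flatMap(_.items)

  def main(args: Array[String]): Unit =
    println(allItems)
}
```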
Where(predicate: Rep[A] => Rep[Boolean]): Rep[Table[A]]
Produces a table containing the subset of records that pass the provided predicate, maintaining the order of those that pass.
Distinct(): Rep[Table[A]]
Produces a new table with all duplicate records eliminated. There is no guaranteed ordering of the resulting records.
Distinct(keySelector: Rep[A] => Rep[K]): Rep[Table[A]]
Produces a new table with all duplicate records eliminated, where duplicates are determined by the user-provided keySelector.
GroupBy(keySelector: Rep[A] => Rep[K]): Rep[Table[(K,Table[A])]]
Produces a nested table of groups formed by grouping each of the records in the input table according to the keySelector function. This operation should typically be followed by a Select which forms a flat table by aggregating each group.
Each created group in the result table defines two methods, key and values, which are used to access the group's key and the records corresponding to that key, respectively. There is no guaranteed ordering of the resulting groups.
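The group-then-aggregate pattern can be sketched with plain Scala collections. This is an analogue of GroupBy followed by Select, not OptiQL code; the LineItem type, its fields, and the data are hypothetical.

```scala
// Plain Scala analogue of GroupBy followed by an aggregating Select.
object GroupByExample {
  case class LineItem(returnFlag: String, quantity: Int)

  // Hypothetical input rows.
  val lineItems: Seq[LineItem] = Seq(
    LineItem("A", 10),
    LineItem("R", 5),
    LineItem("A", 7)
  )

  // groupBy builds one group per key; the following map collapses each
  // group (key, values) into a single flat row holding the summed quantity.
  val totals: Map[String, Int] =
    lineItems
      .groupBy(_.returnFlag)
      .map { case (key, values) => key -> values.map(_.quantity).sum }

  def main(args: Array[String]): Unit =
    println(totals)
}
```

As with the OptiQL groups, the result here is keyed rather than ordered, so nothing should rely on the order in which groups come back.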
Sum(selector: Rep[A] => Rep[R]): Rep[R]
Returns the sum of all the values generated by applying the selector function to each record in the table. The selector result type R must be one of the primitive numeric types (Int, Long, Float, Double).
Average(selector: Rep[A] => Rep[R]): Rep[R]
Returns the average of all the values generated by applying the selector function to each record in the table. The selector result type R must be one of the primitive numeric types (Int, Long, Float, Double).
Max(selector: Rep[A] => Rep[R]): Rep[R]
Returns the maximum of all the values generated by applying the selector function to each record in the table. The selector result type R must be a primitive orderable type.
Min(selector: Rep[A] => Rep[R]): Rep[R]
Returns the minimum of all the values generated by applying the selector function to each record in the table. The selector result type R must be a primitive orderable type.
Count(predicate: Rep[A] => Rep[Boolean]): Rep[Int]
Returns the total number of records in the table that satisfy the given predicate.
OrderBy(selector: (Rep[A] => Rep[K])*): Rep[Table[A]]
Orders the records in the table using the selector functions as the sorting criteria. The result type K of each selector must be a primitive orderable type. Each selector argument must be wrapped in asc to sort in ascending order or desc to sort in descending order. Each successive selector is applied only if all of the preceding selectors determine that two records are equal. Example:
table OrderBy(asc(_.quantity), desc(_.price))
sorts records first by the quantity field in ascending order and, if the quantities are equal, then by the price field in descending order.
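The tie-breaking rule can be sketched in plain Scala with an explicit comparison. This mirrors asc(_.quantity), desc(_.price) on an ordinary sequence and is not OptiQL code; the Item type and its data are hypothetical.

```scala
// Plain Scala analogue of OrderBy(asc(_.quantity), desc(_.price)).
object OrderByExample {
  case class Item(quantity: Int, price: Double)

  // Hypothetical input rows.
  val items: Seq[Item] = Seq(
    Item(2, 1.0),
    Item(1, 5.0),
    Item(2, 3.0),
    Item(1, 9.0)
  )

  // The second criterion is consulted only when the first one ties.
  val sorted: Seq[Item] = items.sortWith { (a, b) =>
    if (a.quantity != b.quantity) a.quantity < b.quantity // ascending quantity
    else a.price > b.price                                // descending price
  }

  def main(args: Array[String]): Unit =
    sorted.foreach(println)
}
```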
Join(tableB: Rep[Table[B]])(selectorA: Rep[A] => Rep[K], selectorB: Rep[B] => Rep[K])(resultSelector: (Rep[A],Rep[B]) => Rep[R]): Rep[Table[R]]
Produces a new table by performing an inner join between this table and the table passed as the first argument. Records in each table are checked for equality by applying the provided selector function to each (selectorA for the first table and selectorB for the second table) and checking the result of the two functions for equality. Pairs of records that are considered equal are then passed to the resultSelector which combines them into a single record for inclusion in the output table. Example:
parts.Join(lineItems)(_.p_partkey, _.l_partkey)(
  (p,l) => new Record {
    val l_discount = l.l_discount
    val p_type = p.p_type
  })
joins parts records and lineItems records that have the same partkey field and defines a new result schema by combining pieces of the two original schemas.
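The inner-join semantics can be sketched with a for-comprehension over plain Scala sequences. This nested-loop version only illustrates which pairs reach the result selector; it says nothing about how OptiQL executes a join. The case classes and data are hypothetical stand-ins for the record schemas.

```scala
// Plain Scala analogue of an inner join on partkey; not OptiQL code.
object JoinExample {
  case class Part(p_partkey: Int, p_type: String)
  case class LineItem(l_partkey: Int, l_discount: Double)
  case class Result(l_discount: Double, p_type: String)

  // Hypothetical input tables.
  val parts: Seq[Part] = Seq(Part(1, "BRASS"), Part(2, "STEEL"))
  val lineItems: Seq[LineItem] =
    Seq(LineItem(1, 0.05), LineItem(3, 0.10), LineItem(1, 0.02))

  // Keep only the pairs whose keys are equal, then combine each pair
  // into a single result record.
  val joined: Seq[Result] =
    for {
      p <- parts
      l <- lineItems
      if p.p_partkey == l.l_partkey
    } yield Result(l.l_discount, p.p_type)

  def main(args: Array[String]): Unit =
    joined.foreach(println)
}
```

Rows without a match on the other side (part 2 and the line item with key 3 here) do not appear in the output, which is what makes the join an inner join.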
Count(): Rep[Int]
Returns the total number of records in the table.
apply(i: Rep[Int]): Rep[A]
Returns the record at index i in the table.
printAsTable(maxRows: Rep[Int])
Prints the table to stdout in a user-readable row-by-column format. Each column is named according to the table's schema. The maximum number of rows to print is an optional argument.
writeAsJSON(path: Rep[String])
Writes the table in JSON format to the file path specified.