Functions¶
CQL supports 2 main categories of functions:
- the scalar functions, which simply take a number of values and produce an output with it.
- the aggregate functions, which are used to aggregate multiple rows results from a
SELECT
statement.
In both cases, CQL provides a number of native “hard-coded” functions as well as the ability to create new user-defined functions.
Note
By default, the use of user-defined functions is disabled by default for security concerns (even when
enabled, the execution of user-defined functions is sandboxed and a “rogue” function should not be allowed to do
evil, but no sandbox is perfect so using user-defined functions is opt-in). See the enable_user_defined_functions
in cassandra.yaml
to enable them.
A function is identifier by its name:
function_name ::= [keyspace_name
'.' ]name
Scalar functions¶
Native functions¶
Cast¶
The cast
function can be used to converts one native datatype to another.
The following table describes the conversions supported by the cast
function. Cassandra will silently ignore any
cast converting a datatype into its own datatype.
From | To |
---|---|
ascii |
text , varchar |
bigint |
tinyint , smallint , int , float , double , decimal , varint , text ,
varchar |
boolean |
text , varchar |
counter |
tinyint , smallint , int , bigint , float , double , decimal , varint ,
text , varchar |
date |
timestamp |
decimal |
tinyint , smallint , int , bigint , float , double , varint , text ,
varchar |
double |
tinyint , smallint , int , bigint , float , decimal , varint , text ,
varchar |
float |
tinyint , smallint , int , bigint , double , decimal , varint , text ,
varchar |
inet |
text , varchar |
int |
tinyint , smallint , bigint , float , double , decimal , varint , text ,
varchar |
smallint |
tinyint , int , bigint , float , double , decimal , varint , text ,
varchar |
time |
text , varchar |
timestamp |
date , text , varchar |
timeuuid |
timestamp , date , text , varchar |
tinyint |
tinyint , smallint , int , bigint , float , double , decimal , varint ,
text , varchar |
uuid |
text , varchar |
varint |
tinyint , smallint , int , bigint , float , double , decimal , text ,
varchar |
The conversions rely strictly on Java’s semantics. For example, the double value 1 will be converted to the text value ‘1.0’. For instance:
SELECT avg(cast(count as double)) FROM myTable
Token¶
The token
function allows to compute the token for a given partition key. The exact signature of the token function
depends on the table concerned and of the partitioner used by the cluster.
The type of the arguments of the token
depend on the type of the partition key columns. The return type depend on
the partitioner in use:
- For Murmur3Partitioner, the return type is
bigint
. - For RandomPartitioner, the return type is
varint
. - For ByteOrderedPartitioner, the return type is
blob
.
For instance, in a cluster using the default Murmur3Partitioner, if a table is defined by:
CREATE TABLE users (
userid text PRIMARY KEY,
username text,
)
then the token
function will take a single argument of type text
(in that case, the partition key is userid
(there is no clustering columns so the partition key is the same than the primary key)), and the return type will be
bigint
.
Uuid¶
The uuid
function takes no parameters and generates a random type 4 uuid suitable for use in INSERT
or
UPDATE
statements.
Timeuuid functions¶
now
¶
The now
function takes no arguments and generates, on the coordinator node, a new unique timeuuid (at the time where
the statement using it is executed). Note that this method is useful for insertion but is largely non-sensical in
WHERE
clauses. For instance, a query of the form:
SELECT * FROM myTable WHERE t = now()
will never return any result by design, since the value returned by now()
is guaranteed to be unique.
minTimeuuid
and maxTimeuuid
¶
The minTimeuuid
(resp. maxTimeuuid
) function takes a timestamp
value t
(which can be either a timestamp
or a date string <timestamps>) and return a fake timeuuid
corresponding to the smallest (resp. biggest)
possible timeuuid
having for timestamp t
. So for instance:
SELECT * FROM myTable
WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
AND t < minTimeuuid('2013-02-02 10:00+0000')
will select all rows where the timeuuid
column t
is strictly older than '2013-01-01 00:05+0000'
but strictly
younger than '2013-02-02 10:00+0000'
. Please note that t >= maxTimeuuid('2013-01-01 00:05+0000')
would still
not select a timeuuid
generated exactly at ‘2013-01-01 00:05+0000’ and is essentially equivalent to t >
maxTimeuuid('2013-01-01 00:05+0000')
.
Note
We called the values generated by minTimeuuid
and maxTimeuuid
fake UUID because they do no respect
the Time-Based UUID generation process specified by the RFC 4122. In
particular, the value returned by these 2 methods will not be unique. This means you should only use those methods
for querying (as in the example above). Inserting the result of those methods is almost certainly a bad idea.
Time conversion functions¶
A number of functions are provided to “convert” a timeuuid
, a timestamp
or a date
into another native
type.
Function name | Input type | Description |
---|---|---|
toDate |
timeuuid |
Converts the timeuuid argument into a date type |
toDate |
timestamp |
Converts the timestamp argument into a date type |
toTimestamp |
timeuuid |
Converts the timeuuid argument into a timestamp type |
toTimestamp |
date |
Converts the date argument into a timestamp type |
toUnixTimestamp |
timeuuid |
Converts the timeuuid argument into a bigInt raw value |
toUnixTimestamp |
timestamp |
Converts the timestamp argument into a bigInt raw value |
toUnixTimestamp |
date |
Converts the date argument into a bigInt raw value |
dateOf |
timeuuid |
Similar to toTimestamp(timeuuid) (DEPRECATED) |
unixTimestampOf |
timeuuid |
Similar to toUnixTimestamp(timeuuid) (DEPRECATED) |
Blob conversion functions¶
A number of functions are provided to “convert” the native types into binary data (blob
). For every
<native-type>
type
supported by CQL (a notable exceptions is blob
, for obvious reasons), the function
typeAsBlob
takes a argument of type type
and return it as a blob
. Conversely, the function blobAsType
takes a 64-bit blob
argument and convert it to a bigint
value. And so for instance, bigintAsBlob(3)
is
0x0000000000000003
and blobAsBigint(0x0000000000000003)
is 3
.
User-defined functions¶
User-defined functions allow execution of user-provided code in Cassandra. By default, Cassandra supports defining functions in Java and JavaScript. Support for other JSR 223 compliant scripting languages (such as Python, Ruby, and Scala) can be added by adding a JAR to the classpath.
UDFs are part of the Cassandra schema. As such, they are automatically propagated to all nodes in the cluster.
UDFs can be overloaded - i.e. multiple UDFs with different argument types but the same function name. Example:
CREATE FUNCTION sample ( arg int ) ...;
CREATE FUNCTION sample ( arg text ) ...;
User-defined functions are susceptible to all of the normal problems with the chosen programming language. Accordingly, implementations should be safe against null pointer exceptions, illegal arguments, or any other potential source of exceptions. An exception during function execution will result in the entire statement failing.
It is valid to use complex types like collections, tuple types and user-defined types as argument and return types. Tuple types and user-defined types are handled by the conversion functions of the DataStax Java Driver. Please see the documentation of the Java Driver for details on handling tuple types and user-defined types.
Arguments for functions can be literals or terms. Prepared statement placeholders can be used, too.
Note that you can use the double-quoted string syntax to enclose the UDF source code. For example:
CREATE FUNCTION some_function ( arg int )
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$ return arg; $$;
SELECT some_function(column) FROM atable ...;
UPDATE atable SET col = some_function(?) ...;
CREATE TYPE custom_type (txt text, i int);
CREATE FUNCTION fct_using_udt ( udtarg frozen )
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE java
AS $$ return udtarg.getString("txt"); $$;
User-defined functions can be used in SELECT
, INSERT
and UPDATE
statements.
The implicitly available udfContext
field (or binding for script UDFs) provides the necessary functionality to
create new UDT and tuple values:
CREATE TYPE custom_type (txt text, i int);
CREATE FUNCTION fct\_using\_udt ( somearg int )
RETURNS NULL ON NULL INPUT
RETURNS custom_type
LANGUAGE java
AS $$
UDTValue udt = udfContext.newReturnUDTValue();
udt.setString("txt", "some string");
udt.setInt("i", 42);
return udt;
$$;
The definition of the UDFContext
interface can be found in the Apache Cassandra source code for
org.apache.cassandra.cql3.functions.UDFContext
.
public interface UDFContext
{
UDTValue newArgUDTValue(String argName);
UDTValue newArgUDTValue(int argNum);
UDTValue newReturnUDTValue();
UDTValue newUDTValue(String udtName);
TupleValue newArgTupleValue(String argName);
TupleValue newArgTupleValue(int argNum);
TupleValue newReturnTupleValue();
TupleValue newTupleValue(String cqlDefinition);
}
Java UDFs already have some imports for common interfaces and classes defined. These imports are:
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.cassandra.cql3.functions.UDFContext;
import com.datastax.driver.core.TypeCodec;
import com.datastax.driver.core.TupleValue;
import com.datastax.driver.core.UDTValue;
Please note, that these convenience imports are not available for script UDFs.
CREATE FUNCTION¶
Creating a new user-defined function uses the CREATE FUNCTION
statement:
create_function_statement ::= CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS]function_name
'('arguments_declaration
')' [ CALLED | RETURNS NULL ] ON NULL INPUT RETURNScql_type
LANGUAGEidentifier
ASstring
arguments_declaration ::=identifier
cql_type
( ','identifier
cql_type
)*
For instance:
CREATE OR REPLACE FUNCTION somefunction(somearg int, anotherarg text, complexarg frozen<someUDT>, listarg list)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE java
AS $$
// some Java code
$$;
CREATE FUNCTION IF NOT EXISTS akeyspace.fname(someArg int)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS $$
// some Java code
$$;
CREATE FUNCTION
with the optional OR REPLACE
keywords either creates a function or replaces an existing one with
the same signature. A CREATE FUNCTION
without OR REPLACE
fails if a function with the same signature already
exists.
If the optional IF NOT EXISTS
keywords are used, the function will
only be created if another function with the same signature does not
exist.
OR REPLACE
and IF NOT EXISTS
cannot be used together.
Behavior on invocation with null
values must be defined for each
function. There are two options:
RETURNS NULL ON NULL INPUT
declares that the function will always returnnull
if any of the input arguments isnull
.CALLED ON NULL INPUT
declares that the function will always be executed.
Function Signature¶
Signatures are used to distinguish individual functions. The signature consists of:
- The fully qualified function name - i.e keyspace plus function-name
- The concatenated list of all argument types
Note that keyspace names, function names and argument types are subject to the default naming conventions and case-sensitivity rules.
Functions belong to a keyspace. If no keyspace is specified in <function-name>
, the current keyspace is used (i.e.
the keyspace specified using the USE
statement). It is not possible to create a user-defined function in one of the
system keyspaces.
DROP FUNCTION¶
Dropping a function uses the DROP FUNCTION
statement:
drop_function_statement ::= DROP FUNCTION [ IF EXISTS ]function_name
[ '('arguments_signature
')' ] arguments_signature ::=cql_type
( ','cql_type
)*
For instance:
DROP FUNCTION myfunction;
DROP FUNCTION mykeyspace.afunction;
DROP FUNCTION afunction ( int );
DROP FUNCTION afunction ( text );
You must specify the argument types (arguments_signature
) of the function to drop if there are multiple
functions with the same name but a different signature (overloaded functions).
DROP FUNCTION
with the optional IF EXISTS
keywords drops a function if it exists, but does not throw an error if
it doesn’t
Aggregate functions¶
Aggregate functions work on a set of rows. They receive values for each row and returns one value for the whole set.
If normal
columns, scalar functions
, UDT
fields, writetime
or ttl
are selected together with
aggregate functions, the values returned for them will be the ones of the first row matching the query.
Native aggregates¶
Count¶
The count
function can be used to count the rows returned by a query. Example:
SELECT COUNT (*) FROM plays;
SELECT COUNT (1) FROM plays;
It also can be used to count the non null value of a given column:
SELECT COUNT (scores) FROM plays;
Max and Min¶
The max
and min
functions can be used to compute the maximum and the minimum value returned by a query for a
given column. For instance:
SELECT MIN (players), MAX (players) FROM plays WHERE game = 'quake';
Sum¶
The sum
function can be used to sum up all the values returned by a query for a given column. For instance:
SELECT SUM (players) FROM plays;
Avg¶
The avg
function can be used to compute the average of all the values returned by a query for a given column. For
instance:
SELECT AVG (players) FROM plays;
User-Defined Aggregates¶
User-defined aggregates allow the creation of custom aggregate functions. Common examples of aggregate functions are count, min, and max.
Each aggregate requires an initial state (INITCOND
, which defaults to null
) of type STYPE
. The first
argument of the state function must have type STYPE
. The remaining arguments of the state function must match the
types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by
the state function becomes the new state. After all rows are processed, the optional FINALFUNC
is executed with last
state value as its argument.
STYPE
is mandatory in order to be able to distinguish possibly overloaded versions of the state and/or final
function (since the overload can appear after creation of the aggregate).
User-defined aggregates can be used in SELECT
statement.
A complete working example for user-defined aggregates (assuming that a keyspace has been selected using the USE
statement):
CREATE OR REPLACE FUNCTION averageState(state tuple<int,bigint>, val int)
CALLED ON NULL INPUT
RETURNS tuple
LANGUAGE java
AS $$
if (val != null) {
state.setInt(0, state.getInt(0)+1);
state.setLong(1, state.getLong(1)+val.intValue());
}
return state;
$$;
CREATE OR REPLACE FUNCTION averageFinal (state tuple<int,bigint>)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS $$
double r = 0;
if (state.getInt(0) == 0) return null;
r = state.getLong(1);
r /= state.getInt(0);
return Double.valueOf(r);
$$;
CREATE OR REPLACE AGGREGATE average(int)
SFUNC averageState
STYPE tuple
FINALFUNC averageFinal
INITCOND (0, 0);
CREATE TABLE atable (
pk int PRIMARY KEY,
val int
);
INSERT INTO atable (pk, val) VALUES (1,1);
INSERT INTO atable (pk, val) VALUES (2,2);
INSERT INTO atable (pk, val) VALUES (3,3);
INSERT INTO atable (pk, val) VALUES (4,4);
SELECT average(val) FROM atable;
CREATE AGGREGATE¶
Creating (or replacing) a user-defined aggregate function uses the CREATE AGGREGATE
statement:
create_aggregate_statement ::= CREATE [ OR REPLACE ] AGGREGATE [ IF NOT EXISTS ]function_name
'('arguments_signature
')' SFUNCfunction_name
STYPEcql_type
[ FINALFUNCfunction_name
] [ INITCONDterm
]
See above for a complete example.
CREATE AGGREGATE
with the optional OR REPLACE
keywords either creates an aggregate or replaces an existing one
with the same signature. A CREATE AGGREGATE
without OR REPLACE
fails if an aggregate with the same signature
already exists.
CREATE AGGREGATE
with the optional IF NOT EXISTS
keywords either creates an aggregate if it does not already
exist.
OR REPLACE
and IF NOT EXISTS
cannot be used together.
STYPE
defines the type of the state value and must be specified.
The optional INITCOND
defines the initial state value for the aggregate. It defaults to null
. A non-null
INITCOND
must be specified for state functions that are declared with RETURNS NULL ON NULL INPUT
.
SFUNC
references an existing function to be used as the state modifying function. The type of first argument of the
state function must match STYPE
. The remaining argument types of the state function must match the argument types of
the aggregate function. State is not updated for state functions declared with RETURNS NULL ON NULL INPUT
and called
with null
.
The optional FINALFUNC
is called just before the aggregate result is returned. It must take only one argument with
type STYPE
. The return type of the FINALFUNC
may be a different type. A final function declared with RETURNS
NULL ON NULL INPUT
means that the aggregate’s return value will be null
, if the last state is null
.
If no FINALFUNC
is defined, the overall return type of the aggregate function is STYPE
. If a FINALFUNC
is
defined, it is the return type of that function.
DROP AGGREGATE¶
Dropping an user-defined aggregate function uses the DROP AGGREGATE
statement:
drop_aggregate_statement ::= DROP AGGREGATE [ IF EXISTS ]function_name
[ '('arguments_signature
')' ]
For instance:
DROP AGGREGATE myAggregate;
DROP AGGREGATE myKeyspace.anAggregate;
DROP AGGREGATE someAggregate ( int );
DROP AGGREGATE someAggregate ( text );
The DROP AGGREGATE
statement removes an aggregate created using CREATE AGGREGATE
. You must specify the argument
types of the aggregate to drop if there are multiple aggregates with the same name but a different signature (overloaded
aggregates).
DROP AGGREGATE
with the optional IF EXISTS
keywords drops an aggregate if it exists, and does nothing if a
function with the signature does not exist.