Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Applies to:
Databricks SQL
Databricks Runtime
The TABLESAMPLE statement is used to sample the relation.
Syntax
TABLESAMPLE ( { percentage PERCENT ) |
num_rows ROWS |
BUCKET fraction OUT OF total } )
[ REPEATABLE ( seed ) ]
Parameters
percentage PERCENTAn INTEGER or DECIMAL constant
percentagebetween 0 and 100 specifying which percentage of the table's rows to sample.num_rows ROWSA constant positive INTEGER expression
num_rowsspecifying an absolute number of rows out of all rows to sample.BUCKET fraction OUT OF totalAn INTEGER constant
fractionspecifying the portion out of the INTEGER constanttotalto sample.REPEATABLE ( seed )Applies to:
Databricks SQL
Databricks Runtime 11.3 LTS and aboveAn optional positive INTEGER constant
seed, used to always produce the same set of rows. Use this clause when you want to reissue the query multiple times, and you expect the same set of sampled rows.
Note
TABLESAMPLEreturns the approximate number of rows or fraction requested.- Always use
TABLESAMPLE (percent PERCENT)if randomness is important.TABLESAMPLE (num_rows ROWS)is not a simple random sample but instead is implemented usingLIMIT. TABLESAMPLEdoes not support filter pushdowns as the sampling precedes the application of any other filters, such asWHEREclauses.
Examples
> CREATE TEMPORARY VIEW test(id, name) AS
VALUES ( 1, 'Lisa'),
( 2, 'Mary'),
( 3, 'Evan'),
( 4, 'Fred'),
( 5, 'Alex'),
( 6, 'Mark'),
( 7, 'Lily'),
( 8, 'Lucy'),
( 9, 'Eric'),
(10, 'Adam');
> SELECT * FROM test;
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
9 Eric
10 Adam
6 Mark
7 Lily
3 Evan
> SELECT * FROM test TABLESAMPLE (30 PERCENT) REPEATABLE (123);
1 Lisa
2 Mary
3 Evan
5 Alex
8 Lucy
> SELECT * FROM test TABLESAMPLE (5 ROWS);
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
> SELECT * FROM test TABLESAMPLE (BUCKET 4 OUT OF 10);
8 Lucy
2 Mary
9 Eric
6 Mark