cluster(5)
CLUSTER() SQL Commands CLUSTER()
NAME
CLUSTER - cluster a table according to an index
SYNOPSIS
CLUSTER indexname ON tablename
CLUSTER tablename
CLUSTER
DESCRIPTION
CLUSTER instructs PostgreSQL to cluster the table specified
by tablename based on the index specified by indexname. The
index must already have been defined on tablename.
When a table is clustered, it is physically reordered based
on the index information. Clustering is a one-time opera-
tion: when the table is subsequently updated, the changes
are not clustered. That is, no attempt is made to store new
or updated rows according to their index order. If one
wishes, one can periodically recluster by issuing the com-
mand again.
When a table is clustered, PostgreSQL remembers on which
index it was clustered. The form CLUSTER tablename reclus-
ters the table on the same index that it was clustered
before.
CLUSTER without any parameter reclusters all the tables in
the current database that the calling user owns, or all
tables if called by a superuser. (Never-clustered tables are
not included.) This form of CLUSTER cannot be executed
inside a transaction block.
When a table is being clustered, an ACCESS EXCLUSIVE lock is
acquired on it. This prevents any other database operations
(both reads and writes) from operating on the table until
the CLUSTER is finished.
PARAMETERS
indexname
The name of an index.
tablename
The name (possibly schema-qualified) of a table.
NOTES
CLUSTER loses all visibility information of tuples, which
makes the table look empty to any snapshot that was taken
before the CLUSTER command finished. That makes CLUSTER
unsuitable for applications where transactions that access
the table being clustered are run concurrently with CLUSTER.
SQL - Language StatementLast change: 2008-01-03 1
CLUSTER() SQL Commands CLUSTER()
This is most visible with serializable transactions, because
they take only one snapshot at the beginning of the transac-
tion, but read-committed transactions are also affected.
In cases where you are accessing single rows randomly within
a table, the actual order of the data in the table is unim-
portant. However, if you tend to access some data more than
others, and there is an index that groups them together, you
will benefit from using CLUSTER. If you are requesting a
range of indexed values from a table, or a single indexed
value that has multiple rows that match, CLUSTER will help
because once the index identifies the table page for the
first row that matches, all other rows that match are prob-
ably already on the same table page, and so you save disk
accesses and speed up the query.
During the cluster operation, a temporary copy of the table
is created that contains the table data in the index order.
Temporary copies of each index on the table are created as
well. Therefore, you need free space on disk at least equal
to the sum of the table size and the index sizes.
Because CLUSTER remembers the clustering information, one
can cluster the tables one wants clustered manually the
first time, and setup a timed event similar to VACUUM so
that the tables are periodically reclustered.
Because the planner records statistics about the ordering of
tables, it is advisable to run ANALYZE [analyze(5)] on the
newly clustered table. Otherwise, the planner may make poor
choices of query plans.
There is another way to cluster data. The CLUSTER command
reorders the original table by scanning it using the index
you specify. This can be slow on large tables because the
rows are fetched from the table in index order, and if the
table is disordered, the entries are on random pages, so
there is one disk page retrieved for every row moved. (Post-
greSQL has a cache, but the majority of a big table will not
fit in the cache.) The other way to cluster a table is to
use
CREATE TABLE newtable AS
SELECT * FROM table ORDER BY columnlist;
which uses the PostgreSQL sorting code to produce the
desired order; this is usually much faster than an index
scan for disordered data. Then you drop the old table, use
ALTER TABLE ... RENAME to rename newtable to the old name,
and recreate the table's indexes. The big disadvantage of
this approach is that it does not preserve OIDs, con-
straints, foreign key relationships, granted privileges, and
SQL - Language StatementLast change: 2008-01-03 2
CLUSTER() SQL Commands CLUSTER()
other ancillary properties of the table - all such items
must be manually recreated. Another disadvantage is that
this way requires a sort temporary file about the same size
as the table itself, so peak disk usage is about three times
the table size instead of twice the table size.
EXAMPLES
Cluster the table employees on the basis of its index
emp_ind:
CLUSTER emp_ind ON emp;
Cluster the employees table using the same index that was
used before:
CLUSTER emp;
Cluster all tables in the database that have previously been
clustered:
CLUSTER;
COMPATIBILITY
There is no CLUSTER statement in the SQL standard.
SEE ALSO
clusterdb [clusterdb(1)]
SQL - Language StatementLast change: 2008-01-03 3
Man(1) output converted with
man2html