Although indexes in PostgreSQL do not need maintenance
and tuning, it is still important to check which indexes are actually used by the real-life
query workload. Examining index usage is done with the EXPLAIN
command; its application for this purpose is illustrated in Section
10.1.
It is difficult to formulate a general procedure for determining which indexes to set up.
There are a number of typical cases that have been shown in the examples throughout the
previous sections. A good deal of experimentation will be necessary in most cases. The rest
of this section gives some tips for that.
-
Always run ANALYZE first. This command collects statistics
about the distribution of the values in the table. This information is required to guess
the number of rows returned by a query, which is needed by the planner to assign
realistic costs to each possible query plan. In the absence of any real statistics, some
default values are assumed, which are almost certain to be inaccurate. Examining an
application's index usage without having run ANALYZE is
therefore a lost cause.
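
As a minimal sketch of that first step, assuming a hypothetical table named orders with a customer_id column (neither name comes from this documentation), statistics might be refreshed before looking at any plan:

```sql
-- "orders" and "customer_id" are made-up names for illustration only.
-- Collect statistics for a single table ...
ANALYZE orders;

-- ... or for every table in the current database.
ANALYZE;

-- Only afterwards inspect the plan the planner chooses.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```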
-
Use real data for experimentation. Using test data for setting up indexes will tell
you what indexes you need for the test data, but that is all.
It is especially misleading to use proportionally reduced data sets. While a query
selecting 1000 out of 100000 rows could be a candidate for an index, one selecting 1 out
of 100 rows hardly will be, because the 100 rows will probably fit within a single disk
page, and there is no plan that can beat sequentially fetching 1 disk page.
Also be careful when making up test data, which is often unavoidable when the
application is not in production use yet. Values that are very similar, completely
random, or inserted in sorted order will skew the statistics away from the distribution
that real data would have.
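
One way to see whether a test table is too small to be meaningful is to look at the size the planner believes it has. The following is only a sketch: the table name orders is hypothetical, and the figures in pg_class are estimates that are refreshed by commands such as VACUUM and ANALYZE, so they may be stale.

```sql
-- "orders" is a made-up table name for illustration only.
-- relpages:  estimated number of disk pages the table occupies
-- reltuples: estimated number of rows in the table
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'orders';
```

If relpages comes back as 1, a sequential scan reads the whole table in a single page fetch, which is the situation described above.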
-
When indexes are not used, it can be useful for testing to force their use. There are
run-time parameters that can turn off various plan types (described in the PostgreSQL 7.3
Administrator's Guide). For instance, turning off sequential scans (enable_seqscan) and nested-loop joins (enable_nestloop),
which are the most basic plans, will force the system to use a different plan. If the
system still chooses a sequential scan or nested-loop join, then there is probably a more
fundamental reason why the index is not used; for example, the query condition does
not match the index. (What kind of query can use what kind of index is explained in the
previous sections.)
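
A short sketch of such a test session follows. The parameters enable_seqscan and enable_nestloop are the ones named above; the table orders and the column customer_id are hypothetical stand-ins for your own schema.

```sql
-- Affects the current session only; "orders" and "customer_id" are made-up names.
SET enable_seqscan TO off;
SET enable_nestloop TO off;

-- Check which plan is chosen now that the basic plan types are turned off.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Restore the defaults when the experiment is finished.
RESET enable_seqscan;
RESET enable_nestloop;
```

These settings are intended for experimentation like this, so resetting them afterwards keeps the rest of the session representative of normal planning.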
-
If forcing index usage does use the index, then there are two possibilities: either
the system is right and using the index is indeed not appropriate, or the cost estimates
of the query plans do not reflect reality. So you should time your query with and
without indexes. The EXPLAIN ANALYZE command can be useful
here.
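
The comparison could look roughly like the sketch below, again using the hypothetical orders table. Note that EXPLAIN ANALYZE actually executes the statement, so it should be used with care on queries that modify data.

```sql
-- "orders" and "customer_id" are made-up names for illustration only.

-- 1. Time the plan the planner prefers on its own.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- 2. Discourage sequential scans and time the same query again.
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
RESET enable_seqscan;
```

Comparing the estimated costs with the measured run times of the two runs indicates whether the planner's preference was justified. Running each variant more than once helps separate caching effects from real differences.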
-
If it turns out that the cost estimates are wrong, there are, again, two
possibilities. The total cost is computed from the per-row costs of each plan node times
the selectivity estimate of the plan node. The costs of the plan nodes can be tuned with
run-time parameters (described in the PostgreSQL 7.3
Administrator's Guide). An inaccurate selectivity estimate is due to
insufficient statistics. It may be possible to improve this by tuning the
statistics-gathering parameters (see ALTER TABLE reference).
If you do not succeed in adjusting the costs to be more appropriate, then you may
have to resort to forcing index usage explicitly. You may also want to contact the PostgreSQL developers to examine the issue.
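
Both kinds of adjustment can be tried out in a session before changing anything permanently. The sketch below is an assumption-laden illustration: random_page_cost is one of the planner cost parameters, the values 2 and 100 are arbitrary examples rather than recommendations, and orders and customer_id are hypothetical names.

```sql
-- Cost side: make non-sequential page fetches look cheaper to the planner.
-- The value 2 is an arbitrary example, not a recommendation.
SET random_page_cost = 2;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
RESET random_page_cost;

-- Statistics side: gather more detailed statistics for one column.
-- The target 100 is likewise an arbitrary example.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;

-- The new target only takes effect the next time statistics are collected.
ANALYZE orders;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```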