In many cases a user will not need to understand the details of the type conversion
mechanism. However, the implicit conversions done by PostgreSQL
can affect the results of a query. When necessary, these results can be tailored
by using explicit type coercion.
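As a minimal sketch of how an implicit conversion can change a result, and how explicit coercion overrides it, consider the statements below. The column aliases are illustrative only and are not part of the original text.

```sql
-- Both operands are integer literals, so the integer division
-- operator is chosen and the fractional part is discarded.
SELECT 1 / 2 AS int_division;

-- The literal 2.0 is numeric, so the integer 1 is implicitly
-- converted to numeric and numeric division is used instead.
SELECT 1 / 2.0 AS implicit_numeric;

-- Explicit coercion states the intent directly, in SQL-standard
-- syntax (CAST) and in PostgreSQL's historical syntax (::).
SELECT CAST(1 AS numeric) / 2 AS explicit_cast,
       1::numeric / 2         AS explicit_cast_pg;
```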
SQL is a strongly typed language. That is, every data item
has an associated data type which determines its behavior and allowed usage. PostgreSQL has an extensible type system that is much more
general and flexible than those of other SQL implementations. Hence,
most type conversion behavior in PostgreSQL should be
governed by general rules rather than by ad hoc heuristics,
to allow mixed-type expressions to be meaningful even with user-defined types.
The PostgreSQL scanner/parser decodes lexical elements
into only five fundamental categories: integers, floating-point numbers, strings, names,
and key words. Most extended types are first tokenized into strings. The SQL language definition allows specifying type names with strings,
and this mechanism can be used in PostgreSQL to start the
parser down the correct path. For example, the query
tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
 Label  | Value
--------+-------
 Origin | (0,0)
(1 row)
has two literal constants, of type text and point.
If a type is not specified for a string literal, then the placeholder type unknown is assigned initially, to be resolved in later stages as
described below.
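The type initially assigned to each literal can be inspected directly. The sketch below assumes a release that provides the pg_typeof function; neither that function nor the column aliases appear in the original example.

```sql
-- Typed string literals carry the named type from the start.
-- A bare string literal is initially reported as "unknown".
-- Numeric literals are typed by the scanner itself.
SELECT pg_typeof(text 'Origin') AS typed_string,
       pg_typeof(point '(0,0)') AS typed_point,
       pg_typeof('Origin')      AS bare_string,
       pg_typeof(42)            AS integer_literal,
       pg_typeof(4.2)           AS decimal_literal;
```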
There are four fundamental SQL constructs requiring
distinct type conversion rules in the PostgreSQL parser:
- Operators
  PostgreSQL allows expressions with prefix and
  postfix unary (one-argument) operators, as well as binary (two-argument) operators.
- Function calls
  Much of the PostgreSQL type system is built
  around a rich set of functions. Function calls have one or more arguments which, for
  any specific query, must be matched to the functions available in the system
  catalog. Since PostgreSQL permits function
  overloading, the function name alone does not uniquely identify the function to be
  called; the parser must select the right function based on the data types of the
  supplied arguments.
- Query targets
  SQL INSERT and UPDATE statements place the results of expressions into a
  table. The expressions in the query must be matched up with, and perhaps converted
  to, the types of the target columns.
- UNION and CASE constructs
  Since all results from the SELECT statements combined by a UNION
  must appear in a single set of columns, the types of the results of each SELECT clause must be matched up and converted to a uniform
  set. Similarly, the result expressions of a CASE construct
  must be coerced to a common type so that the CASE
  expression as a whole has a known output type.
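The four constructs listed above can each be exercised with a short statement. This is only a sketch: the table name vv and the column aliases are hypothetical, and the comments describe the resolution one would expect under the rules outlined in this chapter.

```sql
-- Operators: the untyped literal 'def' must be resolved so that
-- a suitable || operator can be chosen for a text left operand.
SELECT text 'abc' || 'def' AS concatenated;

-- Function calls: no round(integer, integer) exists, so the parser
-- must pick an overloaded candidate and convert the first argument.
SELECT round(4, 4) AS rounded;

-- Query targets: the expression result is converted to the type of
-- the target column, here a blank-padded character(20).
CREATE TEMPORARY TABLE vv (v character(20));
INSERT INTO vv SELECT 'abc' || 'def';
SELECT v, octet_length(v) AS stored_length FROM vv;

-- UNION: both branches must end up with one common column type.
SELECT 1.2 AS num UNION SELECT 1;

-- CASE: the result expressions are coerced to a common type.
SELECT CASE WHEN true THEN 1 ELSE 2.5 END AS case_result;
```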
Many of the general type conversion rules use simple conventions built on the PostgreSQL function and operator system tables. There are some
heuristics included in the conversion rules to better support conventions for the SQL standard native types such as smallint, integer, and real.
The system catalogs store information about which conversions, called casts, between data types are valid, and how to perform those
conversions. Additional casts can be added by the user with the CREATE
CAST command. (This is usually done in conjunction with defining new data types. The
set of casts between the built-in types has been carefully crafted and should not be
altered.)
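The catalog contents and the CREATE CAST command can be illustrated as follows. This is a sketch under stated assumptions: the column names of pg_cast shown here are those of recent releases, and the celsius type and celsius_to_numeric function are hypothetical names invented for the example.

```sql
-- List the casts defined from integer. In recent releases,
-- castcontext is 'i' (implicit), 'a' (assignment), or 'e' (explicit).
SELECT castsource::regtype AS source_type,
       casttarget::regtype AS target_type,
       castcontext
FROM pg_cast
WHERE castsource = 'integer'::regtype
ORDER BY target_type::text;

-- A hypothetical user-defined (composite) type and a conversion function.
CREATE TYPE celsius AS (degrees numeric);

CREATE FUNCTION celsius_to_numeric(celsius) RETURNS numeric
    AS 'SELECT ($1).degrees'
    LANGUAGE SQL IMMUTABLE;

-- Register the conversion. With no AS ASSIGNMENT or AS IMPLICIT
-- clause, the cast is applied only when requested explicitly.
CREATE CAST (celsius AS numeric)
    WITH FUNCTION celsius_to_numeric(celsius);

SELECT CAST(ROW(21.5)::celsius AS numeric) AS degrees;
```

Leaving the cast explicit-only is a deliberately conservative choice here, since every additional implicit cast gives the parser more candidates to choose from.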
An additional heuristic is provided in the parser to allow better guesses at proper
behavior for SQL standard types. There are several basic type categories defined: boolean, numeric, string, bitstring, datetime, timespan, geometric,
network, and user-defined. Each category, with the exception of
user-defined, has a preferred type which is preferentially
selected when there is ambiguity. In the user-defined category, each type is its own
preferred type. Ambiguous expressions (those with multiple candidate parsing solutions)
can often be resolved when there are multiple possible built-in types, but they will raise
an error when there are multiple choices for user-defined types.
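The category and preferred-type information can be looked up in the catalog, and its effect seen on an ambiguous expression. This sketch assumes a release in which pg_type has the typcategory and typispreferred columns and in which pg_typeof is available; the aliases are illustrative.

```sql
-- Show which types are marked as preferred within their category.
SELECT typcategory, typname
FROM pg_type
WHERE typispreferred
ORDER BY typcategory, typname;

-- The absolute-value operator @ exists for several numeric types,
-- so an untyped literal operand is ambiguous. The parser is expected
-- to settle on the preferred type of the numeric category.
SELECT @ '-4.5'            AS abs_value,
       pg_typeof(@ '-4.5') AS resolved_type;
```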
All type conversion rules are designed with several principles in mind:
- Implicit conversions should never have surprising or unpredictable outcomes.
- User-defined types, of which the parser has no a priori
  knowledge, should be "higher" in the type
  hierarchy. In mixed-type expressions, native types shall always be converted to a
  user-defined type (of course, only if conversion is necessary).
- User-defined types are not related. Currently, PostgreSQL
  does not have information available to it on relationships between types, other than
  hardcoded heuristics for built-in types and implicit relationships based on available
  functions in the catalog.
- There should be no extra overhead from the parser or executor if a query does not
  need implicit type conversion. That is, if a query is well formulated and the types
  already match up, then the query should proceed without spending extra time in the
  parser and without introducing unnecessary implicit conversion functions into the
  query.
  Additionally, if a query usually requires an implicit conversion for a function,
  and the user then defines a new function with the correct argument types, the
  parser should use this new function and no longer perform the implicit conversion
  using the old function.
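The last principle can be sketched with a pair of overloaded functions. The name describe_arg is hypothetical and exists only for this illustration.

```sql
-- Initially only a numeric version exists, so a call with an integer
-- argument relies on the implicit integer-to-numeric conversion.
CREATE FUNCTION describe_arg(numeric) RETURNS text
    AS $$ SELECT 'numeric version'::text $$
    LANGUAGE SQL;

SELECT describe_arg(42) AS chosen;

-- Once a version with the exact argument type is defined, the same
-- call matches it directly and no implicit conversion is needed.
CREATE FUNCTION describe_arg(integer) RETURNS text
    AS $$ SELECT 'integer version'::text $$
    LANGUAGE SQL;

SELECT describe_arg(42) AS chosen;
```

Running the same SELECT before and after the second definition shows which candidate the parser picked in each case.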