In: Computer Science
DBMS:
What does it mean for a catalog to be self-describing? What does the following query mean?
((TABLES JOIN COLUMNS) WHERE COLCOUNT < 3) [TABNAME, COLNAME]
Relational database management software systems contain a
"catalog." The catalog is actually a set of tables in the database
that contain information about the database. In fact, pretty much
everything the database knows about itself is contained in its
catalog. For example, there is a "tables" table in the catalog that
contains one row for every table in the database. There is a
"columns" table in the catalog that contains one row for every
column in every table in the database.
The catalog can be queried. For example, if you want to look up all
the tables that contain "customer" as part of the table name, you
can run a simple query against the "tables" table in the catalog
and look for tables with that string in the table name. However,
the catalog cannot be updated by running update queries against the
catalog. This could seriously damage your database. Rather, changes
are made through SQL DDL (data definition language) queries (such
as create, alter) against the database objects themselves. The
catalog will then reflect these changes.
The "self describing" nature of the database refers to the
database's knowledge of the objects within the database as
reflected in the contents of its catalog.
((TABLES JOIN COLUMNS) WHERE COLCOUNT < 3) [TABNAME, COLNAME]
To combine data from two tables we use the SQL JOIN
command, which comes after the FROM
command.
The JOIN
command on its own will result in a cross
product, where each row in first table is paired with each row in
the second table. Usually this is not what is desired when
combining two tables with data that is related in some way.
For that, we need to tell the computer which columns provide the
link between the two tables using the word ON
. What we
want is to join the data with the same species codes.
SELECT * FROM surveys JOIN species ON surveys.species_id = species.species_id;
ON
is like WHERE
, it filters things
out according to a test condition. We use the
table.colname
format to tell the manager what column
in which table we are referring to.
The output of the JOIN
command will have columns
from first table plus the columns from the second table. For the
above command, the output will be a table that has the following
column names:
record_id | month | day | year | plot_id | species_id | sex | hindfoot_length | weight | species_id | genus | species | taxa | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
… | |||||||||||||
96 | 8 | 20 | 1997 | 12 | DM | M | 36 | 41 | DM | Dipodomys | merriami | Rodent | |
… |
Alternatively, we can use the word USING
, as a
short-hand. In this case we are telling the manager that we want to
combine surveys
with species
and that the
common column is species_id
.
SELECT * FROM surveys JOIN species USING (species_id);
The output will only have one species_id column
record_id | month | day | year | plot_id | species_id | sex | hindfoot_length | weight | genus | species | taxa | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
… | ||||||||||||
96 | 8 | 20 | 1997 | 12 | DM | M | 36 | 41 | Dipodomys | merriami | Rodent | |
… |
We often won’t want all of the fields from both tables, so
anywhere we would have used a field name in a non-join query, we
can use table.colname
.
For example, what if we wanted information on when individuals of each species were captured, but instead of their species ID we wanted their actual species names.
SELECT surveys.year, surveys.month, surveys.day, species.genus, species.species FROM surveys JOIN species ON surveys.species_id = species.species_id;
year | month | day | genus | species | |
---|---|---|---|---|---|
… | |||||
1977 | 7 | 16 | Neotoma | albigula | |
1977 | 7 | 16 | Dipodomys | merriami | |
… |