Queries of the global surface database will come from agricultural
planners, economic planners, city planners, climate modelers, and
researchers in sociological and political studies, among others. We
anticipate upwards of one million queries per day, or about 12 queries
per second, each query requiring correlations over varying regions of
the database.
The processed database is approximately bytes. Some queries
will use substantial portions of the database, and the queries will
need to be answered on the order of minutes. (For reference, the United
States is
of the land mass of the planet.) A query that needs
access to the entire database in five minutes would require a bandwidth
of
terabytes per second. Accessing a single model set for each
grid point in the United States in five minutes requires bandwidth of
gigabytes per second.
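
The rate and bandwidth figures above follow from simple division, which the sketch below spells out. The function names and the example data size are hypothetical, introduced here only for illustration; the example size is not the database size from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_day(queries_per_second: float) -> float:
    """Convert a sustained query rate into a daily query count."""
    return queries_per_second * SECONDS_PER_DAY


def required_bandwidth(data_bytes: float, deadline_seconds: float) -> float:
    """Bytes per second needed to read data_bytes within the deadline."""
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")
    return data_bytes / deadline_seconds


# Hypothetical size, chosen only to make the arithmetic easy to follow.
EXAMPLE_DATA_BYTES = 3e15
FIVE_MINUTES = 5 * 60  # the five-minute response window used in the text
```

Under these assumptions, `queries_per_day(12)` gives 1,036,800, consistent with the 12 queries per second quoted above, and `required_bandwidth(EXAMPLE_DATA_BYTES, FIVE_MINUTES)` gives 1e13 bytes per second (10 terabytes per second) for the hypothetical 3e15-byte data set.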
The table in Figure 4.3 estimates the computational cost
required for different classes of queries. It is assumed that
queries per day are to be processed and that all are identical. Three
query sizes are estimated: (1) one that uses all of the
data contained in the database, (2) one that uses the data
corresponding to one sample for each grid point in the United States,
and (3) one that uses the data corresponding to one sample for each
grid point on a
acre farm. In addition, it is assumed that the
computation required to handle the query is either linear,
,
or quadratic.
Clearly, peta-operation machines will have difficulty with these query rates if a query requires more than linear processing. However, even large-scale queries that touch all of the archived information can (barely) be handled by such machines if efficient (linear) algorithms can be developed.
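
The style of estimate described above can be sketched as follows. This is a minimal illustration, not the calculation behind Figure 4.3. It assumes one operation per data item per unit of algorithmic cost, treats a peta-operation machine as sustaining 1e15 operations per second, and includes an N log N class purely as an assumed intermediate case. All names and input values are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400
PETA_OPS_PER_SECOND = 1e15  # assumed sustained rate of a peta-operation machine

# Operation count as a function of the number of data items n in one query.
COST_MODELS = {
    "linear": lambda n: n,
    "n log n": lambda n: n * math.log2(n),  # assumed intermediate class
    "quadratic": lambda n: n * n,
}


def sustained_ops_per_second(items_per_query, queries_per_day, model):
    """Average operation rate needed to keep up with identical queries."""
    ops_per_query = COST_MODELS[model](items_per_query)
    return ops_per_query * queries_per_day / SECONDS_PER_DAY


def fits_on_peta_machine(items_per_query, queries_per_day, model):
    """True if the sustained rate stays within the assumed machine rate."""
    rate = sustained_ops_per_second(items_per_query, queries_per_day, model)
    return rate <= PETA_OPS_PER_SECOND
```

For instance, with a hypothetical 1e9 items per query and 86,400 queries per day, the linear model needs 1e9 operations per second while the quadratic model needs 1e18, which exceeds the assumed machine rate by three orders of magnitude. This mirrors the qualitative conclusion in the text: superlinear algorithms quickly become infeasible as query size grows.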