mirror of
https://git.savannah.gnu.org/git/gperf.git
synced 2025-12-02 21:19:24 +00:00
99 lines
4.3 KiB
HTML
99 lines
4.3 KiB
HTML
<HTML>
|
|
<HEAD>
|
|
<!-- This HTML file has been created by texi2html 1.51
|
|
from gperf.texi on 20 August 2000 -->
|
|
|
|
<TITLE>Perfect Hash Function Generator - 2 Static search structures and GNU gperf</TITLE>
|
|
</HEAD>
|
|
<BODY>
|
|
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
|
|
<P><HR><P>
|
|
|
|
|
|
<H1><A NAME="SEC6" HREF="gperf_toc.html#TOC6">2 Static search structures and GNU <CODE>gperf</CODE></A></H1>
|
|
<P>
|
|
<A NAME="IDX2"></A>
|
|
|
|
</P>
|
|
<P>
|
|
A <STRONG>static search structure</STRONG> is an Abstract Data Type with certain
|
|
fundamental operations, e.g., <EM>initialize</EM>, <EM>insert</EM>,
|
|
and <EM>retrieve</EM>. Conceptually, all insertions occur before any
|
|
retrievals. In practice, <CODE>gperf</CODE> generates a <CODE>static</CODE> array
|
|
containing search set keywords and any associated attributes specified
|
|
by the user. Thus, there is essentially no execution-time cost for the
|
|
insertions. It is a useful data structure for representing <EM>static
|
|
search sets</EM>. Static search sets occur frequently in software system
|
|
applications. Typical static search sets include compiler reserved
|
|
words, assembler instruction opcodes, and built-in shell interpreter
|
|
commands. Search set members, called <STRONG>keywords</STRONG>, are inserted into
|
|
the structure only once, usually during program initialization, and are
|
|
not generally modified at run-time.
|
|
|
|
</P>
|
|
<P>
|
|
Numerous static search structure implementations exist, e.g.,
|
|
arrays, linked lists, binary search trees, digital search tries, and
|
|
hash tables. Different approaches offer trade-offs between space
|
|
utilization and search time efficiency. For example, an <VAR>n</VAR> element
|
|
sorted array is space efficient, though the average-case time
|
|
complexity for retrieval operations using binary search is
|
|
proportional to log <VAR>n</VAR>. Conversely, hash table implementations
|
|
often locate a table entry in constant time, but typically impose
|
|
additional memory overhead and exhibit poor worst case performance.
|
|
|
|
</P>
|
|
<P>
|
|
<A NAME="IDX3"></A>
|
|
<EM>Minimal perfect hash functions</EM> provide an optimal solution for a
|
|
particular class of static search sets. A minimal perfect hash
|
|
function is defined by two properties:
|
|
|
|
</P>
|
|
|
|
<UL>
|
|
<LI>
|
|
|
|
It allows keyword recognition in a static search set using at most
|
|
<EM>one</EM> probe into the hash table. This represents the "perfect"
|
|
property.
|
|
<LI>
|
|
|
|
The actual memory allocated to store the keywords is precisely large
|
|
enough for the keyword set, and <EM>no larger</EM>. This is the
|
|
"minimal" property.
|
|
</UL>
|
|
|
|
<P>
|
|
For most applications it is far easier to generate <EM>perfect</EM> hash
|
|
functions than <EM>minimal perfect</EM> hash functions. Moreover,
|
|
non-minimal perfect hash functions frequently execute faster than
|
|
minimal ones in practice. This phenomena occurs since searching a
|
|
sparse keyword table increases the probability of locating a "null"
|
|
entry, thereby reducing string comparisons. <CODE>gperf</CODE>'s default
|
|
behavior generates <EM>near-minimal</EM> perfect hash functions for
|
|
keyword sets. However, <CODE>gperf</CODE> provides many options that permit
|
|
user control over the degree of minimality and perfection.
|
|
|
|
</P>
|
|
<P>
|
|
Static search sets often exhibit relative stability over time. For
|
|
example, Ada's 63 reserved words have remained constant for nearly a
|
|
decade. It is therefore frequently worthwhile to expend concerted
|
|
effort building an optimal search structure <EM>once</EM>, if it
|
|
subsequently receives heavy use multiple times. <CODE>gperf</CODE> removes
|
|
the drudgery associated with constructing time- and space-efficient
|
|
search structures by hand. It has proven a useful and practical tool
|
|
for serious programming projects. Output from <CODE>gperf</CODE> is currently
|
|
used in several production and research compilers, including GNU C, GNU
|
|
C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
|
|
part of the official GNU distribution. Each compiler utilizes
|
|
<CODE>gperf</CODE> to automatically generate static search structures that
|
|
efficiently identify their respective reserved keywords.
|
|
|
|
</P>
|
|
<P><HR><P>
|
|
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
|
|
</BODY>
|
|
</HTML>
|