mirror of
https://git.savannah.gnu.org/git/gperf.git
synced 2025-12-02 13:09:22 +00:00
Regenerated.
This commit is contained in:
102
doc/gperf_4.html
102
doc/gperf_4.html
@@ -1,41 +1,95 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<!-- This HTML file has been created by texi2html 1.51
|
||||
from gperf.texi on 15 April 1998 -->
|
||||
from gperf.texi on 20 August 2000 -->
|
||||
|
||||
<TITLE>User's Guide to gperf - 1 Introduction</TITLE>
|
||||
<TITLE>Perfect Hash Function Generator - 2 Static search structures and GNU gperf</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
|
||||
<P><HR><P>
|
||||
|
||||
|
||||
<H1><A NAME="SEC6" HREF="gperf_toc.html#TOC6">1 Introduction</A></H1>
|
||||
|
||||
<H1><A NAME="SEC6" HREF="gperf_toc.html#TOC6">2 Static search structures and GNU <CODE>gperf</CODE></A></H1>
|
||||
<P>
|
||||
<CODE>gperf</CODE> is a perfect hash function generator written in C++. It
|
||||
transforms an <VAR>n</VAR> element user-specified keyword set <VAR>W</VAR> into
|
||||
a perfect hash function <VAR>F</VAR>. <VAR>F</VAR> uniquely maps keywords in
|
||||
<VAR>W</VAR> onto the range 0..<VAR>k</VAR>, where <VAR>k >= n</VAR>. If
|
||||
<VAR>k = n</VAR> then <VAR>F</VAR> is a <EM>minimal</EM> perfect hash function.
|
||||
<CODE>gperf</CODE> generates a 0..<VAR>k</VAR> element static lookup table and a
|
||||
pair of C functions. These functions determine whether a given
|
||||
character string <VAR>s</VAR> occurs in <VAR>W</VAR>, using at most one probe
|
||||
into the lookup table.
|
||||
<A NAME="IDX2"></A>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<CODE>gperf</CODE> currently generates the reserved keyword recognizer for
|
||||
lexical analyzers in several production and research compilers and
|
||||
language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
|
||||
Modula 3, and GNU indent. Complete C++ source code for <CODE>gperf</CODE> is
|
||||
available via anonymous ftp from <CODE>ics.uci.edu</CODE> and
|
||||
<CODE>ftp.santafe.edu</CODE>. <CODE>gperf</CODE> was also distributed along with
|
||||
the GNU libg++ library for several years. A highly portable,
|
||||
functionally equivalent K&R C version of <CODE>gperf</CODE> is archived in
|
||||
comp.sources.unix, volume 20. Finally, a paper describing
|
||||
<CODE>gperf</CODE>'s design and implementation in greater detail is available
|
||||
in the Second USENIX C++ Conference proceedings.
|
||||
A <STRONG>static search structure</STRONG> is an Abstract Data Type with certain
|
||||
fundamental operations, e.g., <EM>initialize</EM>, <EM>insert</EM>,
|
||||
and <EM>retrieve</EM>. Conceptually, all insertions occur before any
|
||||
retrievals. In practice, <CODE>gperf</CODE> generates a <CODE>static</CODE> array
|
||||
containing search set keywords and any associated attributes specified
|
||||
by the user. Thus, there is essentially no execution-time cost for the
|
||||
insertions. It is a useful data structure for representing <EM>static
|
||||
search sets</EM>. Static search sets occur frequently in software system
|
||||
applications. Typical static search sets include compiler reserved
|
||||
words, assembler instruction opcodes, and built-in shell interpreter
|
||||
commands. Search set members, called <STRONG>keywords</STRONG>, are inserted into
|
||||
the structure only once, usually during program initialization, and are
|
||||
not generally modified at run-time.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Numerous static search structure implementations exist, e.g.,
|
||||
arrays, linked lists, binary search trees, digital search tries, and
|
||||
hash tables. Different approaches offer trade-offs between space
|
||||
utilization and search time efficiency. For example, an <VAR>n</VAR> element
|
||||
sorted array is space efficient, though the average-case time
|
||||
complexity for retrieval operations using binary search is
|
||||
proportional to log <VAR>n</VAR>. Conversely, hash table implementations
|
||||
often locate a table entry in constant time, but typically impose
|
||||
additional memory overhead and exhibit poor worst case performance.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<A NAME="IDX3"></A>
|
||||
<EM>Minimal perfect hash functions</EM> provide an optimal solution for a
|
||||
particular class of static search sets. A minimal perfect hash
|
||||
function is defined by two properties:
|
||||
|
||||
</P>
|
||||
|
||||
<UL>
|
||||
<LI>
|
||||
|
||||
It allows keyword recognition in a static search set using at most
|
||||
<EM>one</EM> probe into the hash table. This represents the "perfect"
|
||||
property.
|
||||
<LI>
|
||||
|
||||
The actual memory allocated to store the keywords is precisely large
|
||||
enough for the keyword set, and <EM>no larger</EM>. This is the
|
||||
"minimal" property.
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
For most applications it is far easier to generate <EM>perfect</EM> hash
|
||||
functions than <EM>minimal perfect</EM> hash functions. Moreover,
|
||||
non-minimal perfect hash functions frequently execute faster than
|
||||
minimal ones in practice. This phenomena occurs since searching a
|
||||
sparse keyword table increases the probability of locating a "null"
|
||||
entry, thereby reducing string comparisons. <CODE>gperf</CODE>'s default
|
||||
behavior generates <EM>near-minimal</EM> perfect hash functions for
|
||||
keyword sets. However, <CODE>gperf</CODE> provides many options that permit
|
||||
user control over the degree of minimality and perfection.
|
||||
|
||||
</P>
|
||||
<P>
|
||||
Static search sets often exhibit relative stability over time. For
|
||||
example, Ada's 63 reserved words have remained constant for nearly a
|
||||
decade. It is therefore frequently worthwhile to expend concerted
|
||||
effort building an optimal search structure <EM>once</EM>, if it
|
||||
subsequently receives heavy use multiple times. <CODE>gperf</CODE> removes
|
||||
the drudgery associated with constructing time- and space-efficient
|
||||
search structures by hand. It has proven a useful and practical tool
|
||||
for serious programming projects. Output from <CODE>gperf</CODE> is currently
|
||||
used in several production and research compilers, including GNU C, GNU
|
||||
C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
|
||||
part of the official GNU distribution. Each compiler utilizes
|
||||
<CODE>gperf</CODE> to automatically generate static search structures that
|
||||
efficiently identify their respective reserved keywords.
|
||||
|
||||
</P>
|
||||
<P><HR><P>
|
||||
|
||||
Reference in New Issue
Block a user