mirror of
https://git.savannah.gnu.org/git/gperf.git
synced 2025-12-02 21:19:24 +00:00
Misc doc updates.
This commit is contained in:
17
ChangeLog
17
ChangeLog
@@ -1,3 +1,20 @@
|
|||||||
|
2002-11-09 Bruno Haible <bruno@clisp.org>
|
||||||
|
|
||||||
|
* doc/gperf.texi: Talk about "bytes" instead of "characters". Talk
|
||||||
|
about "keywords", not "keys". Talk about "input file", not "keyfile".
|
||||||
|
(@menu): Fix a menu entry.
|
||||||
|
(Contributors): Don't mention cperf.
|
||||||
|
(Motivation): Fix an off-by-one error in the definition of "minimal".
|
||||||
|
Mention GNU Java. Recommend http URL instead of anonymous ftp.
|
||||||
|
(Search Structures): Mention GNU Java.
|
||||||
|
(Output Format): Drop reference to node 'Implementation'.
|
||||||
|
(Output Details): Talk about "slot-name" instead of "key name".
|
||||||
|
(Algorithmic Details): Talk about "selected byte positons", not
|
||||||
|
"key positions". Upper limit is now 255. Explain a third reason
|
||||||
|
why duplicates can occur. Describe negative effects of
|
||||||
|
--occurrence-sort.
|
||||||
|
(Implementation): Remove chapter.
|
||||||
|
|
||||||
2002-11-07 Bruno Haible <bruno@clisp.org>
|
2002-11-07 Bruno Haible <bruno@clisp.org>
|
||||||
|
|
||||||
* src/bool-array.cc (Bool_Array::~Bool_Array): Free _storage_array.
|
* src/bool-array.cc (Bool_Array::~Bool_Array): Free _storage_array.
|
||||||
|
|||||||
184
doc/gperf.texi
184
doc/gperf.texi
@@ -7,7 +7,7 @@
|
|||||||
|
|
||||||
@c some day we should @include version.texi instead of defining
|
@c some day we should @include version.texi instead of defining
|
||||||
@c these values at hand.
|
@c these values at hand.
|
||||||
@set UPDATED 26 September 2000
|
@set UPDATED 9 November 2002
|
||||||
@set EDITION 2.7.2
|
@set EDITION 2.7.2
|
||||||
@set VERSION 2.7.2
|
@set VERSION 2.7.2
|
||||||
@c ---------------------
|
@c ---------------------
|
||||||
@@ -28,7 +28,7 @@
|
|||||||
This file documents the features of the GNU Perfect Hash Function
|
This file documents the features of the GNU Perfect Hash Function
|
||||||
Generator @value{VERSION}.
|
Generator @value{VERSION}.
|
||||||
|
|
||||||
Copyright @copyright{} 1989-2000 Free Software Foundation, Inc.
|
Copyright @copyright{} 1989-2002 Free Software Foundation, Inc.
|
||||||
|
|
||||||
Permission is granted to make and distribute verbatim copies of this
|
Permission is granted to make and distribute verbatim copies of this
|
||||||
manual provided the copyright notice and this permission notice are
|
manual provided the copyright notice and this permission notice are
|
||||||
@@ -65,7 +65,7 @@ Software Foundation instead of in the original English.
|
|||||||
|
|
||||||
@page
|
@page
|
||||||
@vskip 0pt plus 1filll
|
@vskip 0pt plus 1filll
|
||||||
Copyright @copyright{} 1989-2000 Free Software Foundation, Inc.
|
Copyright @copyright{} 1989-2002 Free Software Foundation, Inc.
|
||||||
|
|
||||||
|
|
||||||
Permission is granted to make and distribute verbatim copies of
|
Permission is granted to make and distribute verbatim copies of
|
||||||
@@ -98,13 +98,12 @@ bugs.
|
|||||||
* Copying:: GNU @code{gperf} General Public License says
|
* Copying:: GNU @code{gperf} General Public License says
|
||||||
how you can copy and share @code{gperf}.
|
how you can copy and share @code{gperf}.
|
||||||
* Contributors:: People who have contributed to @code{gperf}.
|
* Contributors:: People who have contributed to @code{gperf}.
|
||||||
* Motivation:: Static search structures and GNU GPERF.
|
* Motivation:: The purpose of @code{gperf}.
|
||||||
* Search Structures:: Static search structures and GNU @code{gperf}
|
* Search Structures:: Static search structures and GNU @code{gperf}
|
||||||
* Description:: High-level discussion of how GPERF functions.
|
* Description:: High-level discussion of how GPERF functions.
|
||||||
* Options:: A description of options to the program.
|
* Options:: A description of options to the program.
|
||||||
* Bugs:: Known bugs and limitations with GPERF.
|
* Bugs:: Known bugs and limitations with GPERF.
|
||||||
* Projects:: Things still left to do.
|
* Projects:: Things still left to do.
|
||||||
* Implementation:: Implementation Details for GNU GPERF.
|
|
||||||
* Bibliography:: Material Referenced in this Report.
|
* Bibliography:: Material Referenced in this Report.
|
||||||
|
|
||||||
* Concept Index::
|
* Concept Index::
|
||||||
@@ -115,7 +114,7 @@ High-Level Description of GNU @code{gperf}
|
|||||||
|
|
||||||
* Input Format:: Input Format to @code{gperf}
|
* Input Format:: Input Format to @code{gperf}
|
||||||
* Output Format:: Output Format for Generated C Code with @code{gperf}
|
* Output Format:: Output Format for Generated C Code with @code{gperf}
|
||||||
* Binary Strings:: Use of NUL characters
|
* Binary Strings:: Use of NUL bytes
|
||||||
|
|
||||||
Input Format to @code{gperf}
|
Input Format to @code{gperf}
|
||||||
|
|
||||||
@@ -147,8 +146,7 @@ Invoking @code{gperf}
|
|||||||
@item
|
@item
|
||||||
@cindex Bugs
|
@cindex Bugs
|
||||||
The GNU @code{gperf} perfect hash function generator utility was
|
The GNU @code{gperf} perfect hash function generator utility was
|
||||||
originally written in GNU C++ by Douglas C. Schmidt. It is now also
|
written in GNU C++ by Douglas C. Schmidt. The general
|
||||||
available in a highly-portable ``old-style'' C version. The general
|
|
||||||
idea for the perfect hash function generator was inspired by Keith
|
idea for the perfect hash function generator was inspired by Keith
|
||||||
Bostic's algorithm written in C, and distributed to net.sources around
|
Bostic's algorithm written in C, and distributed to net.sources around
|
||||||
1984. The current program is a heavily modified, enhanced, and extended
|
1984. The current program is a heavily modified, enhanced, and extended
|
||||||
@@ -175,8 +173,8 @@ routines for better reliability.
|
|||||||
@code{gperf} is a perfect hash function generator written in C++. It
|
@code{gperf} is a perfect hash function generator written in C++. It
|
||||||
transforms an @var{n} element user-specified keyword set @var{W} into a
|
transforms an @var{n} element user-specified keyword set @var{W} into a
|
||||||
perfect hash function @var{F}. @var{F} uniquely maps keywords in
|
perfect hash function @var{F}. @var{F} uniquely maps keywords in
|
||||||
@var{W} onto the range 0..@var{k}, where @var{k} >= @var{n}. If @var{k}
|
@var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}. If @var{k}
|
||||||
= @var{n} then @var{F} is a @emph{minimal} perfect hash function.
|
= @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.
|
||||||
@code{gperf} generates a 0..@var{k} element static lookup table and a
|
@code{gperf} generates a 0..@var{k} element static lookup table and a
|
||||||
pair of C functions. These functions determine whether a given
|
pair of C functions. These functions determine whether a given
|
||||||
character string @var{s} occurs in @var{W}, using at most one probe into
|
character string @var{s} occurs in @var{W}, using at most one probe into
|
||||||
@@ -184,9 +182,9 @@ the lookup table.
|
|||||||
|
|
||||||
@code{gperf} currently generates the reserved keyword recognizer for
|
@code{gperf} currently generates the reserved keyword recognizer for
|
||||||
lexical analyzers in several production and research compilers and
|
lexical analyzers in several production and research compilers and
|
||||||
language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
|
language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
|
||||||
Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is
|
GNU Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is
|
||||||
available via anonymous ftp from @code{ftp://ftp.gnu.org/pub/gnu/gperf/}.
|
available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.
|
||||||
A paper describing @code{gperf}'s design and implementation in greater
|
A paper describing @code{gperf}'s design and implementation in greater
|
||||||
detail is available in the Second USENIX C++ Conference proceedings
|
detail is available in the Second USENIX C++ Conference proceedings
|
||||||
or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
|
or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
|
||||||
@@ -198,7 +196,7 @@ or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
|
|||||||
A @dfn{static search structure} is an Abstract Data Type with certain
|
A @dfn{static search structure} is an Abstract Data Type with certain
|
||||||
fundamental operations, e.g., @emph{initialize}, @emph{insert},
|
fundamental operations, e.g., @emph{initialize}, @emph{insert},
|
||||||
and @emph{retrieve}. Conceptually, all insertions occur before any
|
and @emph{retrieve}. Conceptually, all insertions occur before any
|
||||||
retrievals. In practice, @code{gperf} generates a @code{static} array
|
retrievals. In practice, @code{gperf} generates a @emph{static} array
|
||||||
containing search set keywords and any associated attributes specified
|
containing search set keywords and any associated attributes specified
|
||||||
by the user. Thus, there is essentially no execution-time cost for the
|
by the user. Thus, there is essentially no execution-time cost for the
|
||||||
insertions. It is a useful data structure for representing @emph{static
|
insertions. It is a useful data structure for representing @emph{static
|
||||||
@@ -254,8 +252,8 @@ the drudgery associated with constructing time- and space-efficient
|
|||||||
search structures by hand. It has proven a useful and practical tool
|
search structures by hand. It has proven a useful and practical tool
|
||||||
for serious programming projects. Output from @code{gperf} is currently
|
for serious programming projects. Output from @code{gperf} is currently
|
||||||
used in several production and research compilers, including GNU C, GNU
|
used in several production and research compilers, including GNU C, GNU
|
||||||
C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
|
C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are
|
||||||
part of the official GNU distribution. Each compiler utilizes
|
not yet part of the official GNU distribution. Each compiler utilizes
|
||||||
@code{gperf} to automatically generate static search structures that
|
@code{gperf} to automatically generate static search structures that
|
||||||
efficiently identify their respective reserved keywords.
|
efficiently identify their respective reserved keywords.
|
||||||
|
|
||||||
@@ -265,11 +263,11 @@ efficiently identify their respective reserved keywords.
|
|||||||
@menu
|
@menu
|
||||||
* Input Format:: Input Format to @code{gperf}
|
* Input Format:: Input Format to @code{gperf}
|
||||||
* Output Format:: Output Format for Generated C Code with @code{gperf}
|
* Output Format:: Output Format for Generated C Code with @code{gperf}
|
||||||
* Binary Strings:: Use of NUL characters
|
* Binary Strings:: Use of NUL bytes
|
||||||
@end menu
|
@end menu
|
||||||
|
|
||||||
The perfect hash function generator @code{gperf} reads a set of
|
The perfect hash function generator @code{gperf} reads a set of
|
||||||
``keywords'' from a @dfn{keyfile} (or from the standard input by
|
``keywords'' from an input file (or from the standard input by
|
||||||
default). It attempts to derive a perfect hashing function that
|
default). It attempts to derive a perfect hashing function that
|
||||||
recognizes a member of the @dfn{static keyword set} with at most a
|
recognizes a member of the @dfn{static keyword set} with at most a
|
||||||
single probe into the lookup table. If @code{gperf} succeeds in
|
single probe into the lookup table. If @code{gperf} succeeds in
|
||||||
@@ -288,7 +286,7 @@ statement scheme that minimizes data space storage size. Furthermore,
|
|||||||
using a C @code{switch} may actually speed up the keyword retrieval time
|
using a C @code{switch} may actually speed up the keyword retrieval time
|
||||||
somewhat. Actual results depend on your C compiler, of course.
|
somewhat. Actual results depend on your C compiler, of course.
|
||||||
|
|
||||||
In general, @code{gperf} assigns values to the characters it is using
|
In general, @code{gperf} assigns values to the bytes it is using
|
||||||
for hashing until some set of values gives each keyword a unique value.
|
for hashing until some set of values gives each keyword a unique value.
|
||||||
A helpful heuristic is that the larger the hash value range, the easier
|
A helpful heuristic is that the larger the hash value range, the easier
|
||||||
it is for @code{gperf} to find and generate a perfect hash function.
|
it is for @code{gperf} to find and generate a perfect hash function.
|
||||||
@@ -300,7 +298,7 @@ Experimentation is the key to getting the most from @code{gperf}.
|
|||||||
@cindex Declaration section
|
@cindex Declaration section
|
||||||
@cindex Keywords section
|
@cindex Keywords section
|
||||||
@cindex Functions section
|
@cindex Functions section
|
||||||
You can control the input keyfile format by varying certain command-line
|
You can control the input file format by varying certain command-line
|
||||||
arguments, in particular the @samp{-t} option. The input's appearance
|
arguments, in particular the @samp{-t} option. The input's appearance
|
||||||
is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
|
is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
|
||||||
utilities @code{lex} and @code{yacc}). Here's an outline of the general
|
utilities @code{lex} and @code{yacc}). Here's an outline of the general
|
||||||
@@ -333,7 +331,7 @@ The keyword input file optionally contains a section for including
|
|||||||
arbitrary C declarations and definitions, as well as provisions for
|
arbitrary C declarations and definitions, as well as provisions for
|
||||||
providing a user-supplied @code{struct}. If the @samp{-t} option
|
providing a user-supplied @code{struct}. If the @samp{-t} option
|
||||||
@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
|
@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
|
||||||
component in the declaration section from the keyfile file. The first
|
component in the declaration section from the input file. The first
|
||||||
field in this struct must be a @code{char *} or @code{const char *}
|
field in this struct must be a @code{char *} or @code{const char *}
|
||||||
identifier called @samp{name}, although it is possible to modify this
|
identifier called @samp{name}, although it is possible to modify this
|
||||||
field's name with the @samp{-K} option described below.
|
field's name with the @samp{-K} option described below.
|
||||||
@@ -392,7 +390,7 @@ march, 3, 31, 31
|
|||||||
@end example
|
@end example
|
||||||
|
|
||||||
It is possible to omit the declaration section entirely. In this case
|
It is possible to omit the declaration section entirely. In this case
|
||||||
the keyfile begins directly with the first keyword line, e.g.:
|
the input file begins directly with the first keyword line, e.g.:
|
||||||
|
|
||||||
@example
|
@example
|
||||||
@group
|
@group
|
||||||
@@ -407,12 +405,12 @@ april, 4, 30, 30
|
|||||||
@node Keywords, Functions, Declarations, Input Format
|
@node Keywords, Functions, Declarations, Input Format
|
||||||
@subsection Format for Keyword Entries
|
@subsection Format for Keyword Entries
|
||||||
|
|
||||||
The second keyfile format section contains lines of keywords and any
|
The second input file format section contains lines of keywords and any
|
||||||
associated attributes you might supply. A line beginning with @samp{#}
|
associated attributes you might supply. A line beginning with @samp{#}
|
||||||
in the first column is considered a comment. Everything following the
|
in the first column is considered a comment. Everything following the
|
||||||
@samp{#} is ignored, up to and including the following newline.
|
@samp{#} is ignored, up to and including the following newline.
|
||||||
|
|
||||||
The first field of each non-comment line is always the key itself. It
|
The first field of each non-comment line is always the keyword itself. It
|
||||||
can be given in two ways: as a simple name, i.e., without surrounding
|
can be given in two ways: as a simple name, i.e., without surrounding
|
||||||
string quotation marks, or as a string enclosed in double-quotes, in
|
string quotation marks, or as a string enclosed in double-quotes, in
|
||||||
C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
|
C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
|
||||||
@@ -472,14 +470,13 @@ function prototypes are as follows:
|
|||||||
|
|
||||||
@deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
|
@deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
|
||||||
By default, the generated @code{hash} function returns an integer value
|
By default, the generated @code{hash} function returns an integer value
|
||||||
created by adding @var{len} to several user-specified @var{str} key
|
created by adding @var{len} to several user-specified @var{str} byte
|
||||||
positions indexed into an @dfn{associated values} table stored in a
|
positions indexed into an @dfn{associated values} table stored in a
|
||||||
local static array. The associated values table is constructed
|
local static array. The associated values table is constructed
|
||||||
internally by @code{gperf} and later output as a static local C array
|
internally by @code{gperf} and later output as a static local C array
|
||||||
called @samp{hash_table}; its meaning and properties are described below
|
called @samp{hash_table}. The relevant selected positions (i.e. indices
|
||||||
(@pxref{Implementation}). The relevant key positions are specified via
|
into @var{str}) are specified via the @samp{-k} option when running
|
||||||
the @samp{-k} option when running @code{gperf}, as detailed in the
|
@code{gperf}, as detailed in the @emph{Options} section below(@pxref{Options}).
|
||||||
@emph{Options} section below(@pxref{Options}).
|
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
@deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
|
@deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
|
||||||
@@ -491,7 +488,7 @@ a pointer to the matching keyword's structure. Otherwise it returns
|
|||||||
|
|
||||||
If the option @samp{-c} is not used, @var{str} must be a NUL terminated
|
If the option @samp{-c} is not used, @var{str} must be a NUL terminated
|
||||||
string of exactly length @var{len}. If @samp{-c} is used, @var{str} must
|
string of exactly length @var{len}. If @samp{-c} is used, @var{str} must
|
||||||
simply be an array of @var{len} characters and does not need to be NUL
|
simply be an array of @var{len} bytes and does not need to be NUL
|
||||||
terminated.
|
terminated.
|
||||||
|
|
||||||
The code generated for these two functions is affected by the following
|
The code generated for these two functions is affected by the following
|
||||||
@@ -513,19 +510,19 @@ code.
|
|||||||
@end table
|
@end table
|
||||||
|
|
||||||
If the @samp{-t} and @samp{-S} options are omitted, the default action
|
If the @samp{-t} and @samp{-S} options are omitted, the default action
|
||||||
is to generate a @code{char *} array containing the keys, together with
|
is to generate a @code{char *} array containing the keywords, together with
|
||||||
additional null strings used for padding the array. By experimenting
|
additional empty strings used for padding the array. By experimenting
|
||||||
with the various input and output options, and timing the resulting C
|
with the various input and output options, and timing the resulting C
|
||||||
code, you can determine the best option choices for different keyword
|
code, you can determine the best option choices for different keyword
|
||||||
set characteristics.
|
set characteristics.
|
||||||
|
|
||||||
@node Binary Strings, , Output Format, Description
|
@node Binary Strings, , Output Format, Description
|
||||||
@section Use of NUL characters
|
@section Use of NUL bytes
|
||||||
@cindex NUL
|
@cindex NUL
|
||||||
|
|
||||||
By default, the code generated by @code{gperf} operates on zero
|
By default, the code generated by @code{gperf} operates on zero
|
||||||
terminated strings, the usual representation of strings in C. This means
|
terminated strings, the usual representation of strings in C. This means
|
||||||
that the keywords in the input file must not contain NUL characters,
|
that the keywords in the input file must not contain NUL bytes,
|
||||||
and the @var{str} argument passed to @code{hash} or @code{in_word_set}
|
and the @var{str} argument passed to @code{hash} or @code{in_word_set}
|
||||||
must be NUL terminated and have exactly length @var{len}.
|
must be NUL terminated and have exactly length @var{len}.
|
||||||
|
|
||||||
@@ -533,12 +530,12 @@ If option @samp{-c} is used, then the @var{str} argument does not need
|
|||||||
to be NUL terminated. The code generated by @code{gperf} will only
|
to be NUL terminated. The code generated by @code{gperf} will only
|
||||||
access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
|
access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
|
||||||
However, the keywords in the input file still must not contain NUL
|
However, the keywords in the input file still must not contain NUL
|
||||||
characters.
|
bytes.
|
||||||
|
|
||||||
If option @samp{-l} is used, then the hash table performs binary
|
If option @samp{-l} is used, then the hash table performs binary
|
||||||
comparison. The keywords in the input file may contain NUL characters,
|
comparison. The keywords in the input file may contain NUL bytes,
|
||||||
written in string syntax as @code{\000} or @code{\x00}, and the code
|
written in string syntax as @code{\000} or @code{\x00}, and the code
|
||||||
generated by @code{gperf} will treat NUL like any other character.
|
generated by @code{gperf} will treat NUL like any other byte.
|
||||||
Also, in this case the @samp{-c} option is ignored.
|
Also, in this case the @samp{-c} option is ignored.
|
||||||
|
|
||||||
@node Options, Bugs, Description, Top
|
@node Options, Bugs, Description, Top
|
||||||
@@ -622,12 +619,12 @@ This option is supported for compatibility with previous releases of
|
|||||||
@section Options for fine tuning Details in the Output Code
|
@section Options for fine tuning Details in the Output Code
|
||||||
|
|
||||||
@table @samp
|
@table @samp
|
||||||
@item -K @var{key-name}
|
@item -K @var{slot-name}
|
||||||
@itemx --slot-name=@var{key-name}
|
@itemx --slot-name=@var{slot-name}
|
||||||
@cindex Slot name
|
@cindex Slot name
|
||||||
This option is only useful when option @samp{-t} has been given.
|
This option is only useful when option @samp{-t} has been given.
|
||||||
By default, the program assumes the structure component identifier for
|
By default, the program assumes the structure component identifier for
|
||||||
the keyword is @samp{name}. This option allows an arbitrary choice of
|
the keyword is @samp{slot-name}. This option allows an arbitrary choice of
|
||||||
identifier for this component, although it still must occur as the first
|
identifier for this component, although it still must occur as the first
|
||||||
field in your supplied @code{struct}.
|
field in your supplied @code{struct}.
|
||||||
|
|
||||||
@@ -636,9 +633,9 @@ field in your supplied @code{struct}.
|
|||||||
@cindex Initializers
|
@cindex Initializers
|
||||||
This option is only useful when option @samp{-t} has been given.
|
This option is only useful when option @samp{-t} has been given.
|
||||||
It permits to specify initializers for the structure members following
|
It permits to specify initializers for the structure members following
|
||||||
@var{key name} in empty hash table entries. The list of initializers
|
@var{slot-name} in empty hash table entries. The list of initializers
|
||||||
should start with a comma. By default, the emitted code will
|
should start with a comma. By default, the emitted code will
|
||||||
zero-initialize structure members following @var{key name}.
|
zero-initialize structure members following @var{slot-name}.
|
||||||
|
|
||||||
@item -H @var{hash-function-name}
|
@item -H @var{hash-function-name}
|
||||||
@itemx --hash-fn-name=@var{hash-function-name}
|
@itemx --hash-fn-name=@var{hash-function-name}
|
||||||
@@ -664,12 +661,12 @@ allows you to specify the name of generated C++ class. Default name is
|
|||||||
@itemx --seven-bit
|
@itemx --seven-bit
|
||||||
This option specifies that all strings that will be passed as arguments
|
This option specifies that all strings that will be passed as arguments
|
||||||
to the generated hash function and the generated lookup function will
|
to the generated hash function and the generated lookup function will
|
||||||
solely consist of 7-bit ASCII characters (characters in the range 0..127).
|
solely consist of 7-bit ASCII characters (bytes in the range 0..127).
|
||||||
(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
|
(Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
|
||||||
@emph{not} guarantee that a character is in this range. Only an explicit
|
@emph{not} guarantee that a byte is in this range. Only an explicit
|
||||||
test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
|
test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
|
||||||
default in versions of @code{gperf} earlier than 2.7; now the default is
|
default in versions of @code{gperf} earlier than 2.7; now the default is
|
||||||
to assume 8-bit characters.
|
to support 8-bit and multibyte characters.
|
||||||
|
|
||||||
@item -c
|
@item -c
|
||||||
@itemx --compare-strncmp
|
@itemx --compare-strncmp
|
||||||
@@ -713,7 +710,7 @@ is given.
|
|||||||
@cindex @code{switch}
|
@cindex @code{switch}
|
||||||
Causes the generated C code to use a @code{switch} statement scheme,
|
Causes the generated C code to use a @code{switch} statement scheme,
|
||||||
rather than an array lookup table. This can lead to a reduction in both
|
rather than an array lookup table. This can lead to a reduction in both
|
||||||
time and space requirements for some keyfiles. The argument to this
|
time and space requirements for some input files. The argument to this
|
||||||
option determines how many @code{switch} statements are generated. A
|
option determines how many @code{switch} statements are generated. A
|
||||||
value of 1 generates 1 @code{switch} containing all the elements, a
|
value of 1 generates 1 @code{switch} containing all the elements, a
|
||||||
value of 2 generates 2 tables with 1/2 the elements in each
|
value of 2 generates 2 tables with 1/2 the elements in each
|
||||||
@@ -735,30 +732,31 @@ This option is supported for compatibility with previous releases of
|
|||||||
@section Options for changing the Algorithms employed by @code{gperf}
|
@section Options for changing the Algorithms employed by @code{gperf}
|
||||||
|
|
||||||
@table @samp
|
@table @samp
|
||||||
@item -k @var{keys}
|
@item -k @var{selected-byte-positions}
|
||||||
@itemx --key-positions=@var{keys}
|
@itemx --key-positions=@var{selected-byte-positions}
|
||||||
Allows selection of the character key positions used in the keywords'
|
Allows selection of the byte positions used in the keywords'
|
||||||
hash function. The allowable choices range between 1-126, inclusive.
|
hash function. The allowable choices range between 1-255, inclusive.
|
||||||
The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
|
The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
|
||||||
ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
|
ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
|
||||||
in any order. Furthermore, the meta-character '*' causes the generated
|
in any order. Furthermore, the wildcard '*' causes the generated
|
||||||
hash function to consider @strong{all} character positions in each key,
|
hash function to consider @strong{all} byte positions in each keyword,
|
||||||
whereas '$' instructs the hash function to use the ``final character''
|
whereas '$' instructs the hash function to use the ``final byte''
|
||||||
of a key (this is the only way to use a character position greater than
|
of a keyword (this is the only way to use a byte position greater than
|
||||||
126, incidentally).
|
255, incidentally).
|
||||||
|
|
||||||
For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
|
For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
|
||||||
function that considers positions 1,2,4,6,7,8,9,10, plus the last
|
function that considers positions 1,2,4,6,7,8,9,10, plus the last
|
||||||
character in each key (which may differ for each key, obviously). Keys
|
byte in each keyword (which may be at a different position for each
|
||||||
with length less than the indicated key positions work properly, since
|
keyword, obviously). Keywords
|
||||||
selected key positions exceeding the key length are simply not
|
with length less than the indicated byte positions work properly, since
|
||||||
|
selected byte positions exceeding the keyword length are simply not
|
||||||
referenced in the hash function.
|
referenced in the hash function.
|
||||||
|
|
||||||
@item -l
|
@item -l
|
||||||
@itemx --compare-strlen
|
@itemx --compare-strlen
|
||||||
Compare key lengths before trying a string comparison. This might cut
|
Compare keyword lengths before trying a string comparison. This might cut
|
||||||
down on the number of string comparisons made during the lookup, since
|
down on the number of string comparisons made during the lookup, since
|
||||||
keys with different lengths are never compared via @code{strcmp}.
|
keywords with different lengths are never compared via @code{strcmp}.
|
||||||
However, using @samp{-l} might greatly increase the size of the
|
However, using @samp{-l} might greatly increase the size of the
|
||||||
generated C code if the lookup table range is large (which implies that
|
generated C code if the lookup table range is large (which implies that
|
||||||
the switch option @samp{-S} is not enabled), since the length table
|
the switch option @samp{-S} is not enabled), since the length table
|
||||||
@@ -768,21 +766,31 @@ This option is mandatory for binary comparisons (@pxref{Binary Strings}).
|
|||||||
@item -D
|
@item -D
|
||||||
@itemx --duplicates
|
@itemx --duplicates
|
||||||
@cindex Duplicates
|
@cindex Duplicates
|
||||||
Handle keywords whose key position sets hash to duplicate values.
|
Handle keywords whose selected byte sets hash to duplicate values.
|
||||||
Duplicate hash values occur for two reasons:
|
Duplicate hash values can occur for three reasons:
|
||||||
|
|
||||||
@itemize @bullet
|
@itemize @bullet
|
||||||
@item
|
@item
|
||||||
Since @code{gperf} does not backtrack it is possible for it to process
|
Since @code{gperf} does not backtrack it is possible for it to process
|
||||||
all your input keywords without finding a unique mapping for each word.
|
all your input keywords without finding a unique mapping for each word.
|
||||||
However, frequently only a very small number of duplicates occur, and
|
However, frequently only a very small number of duplicates occur, and
|
||||||
the majority of keys still require one probe into the table.
|
the majority of keywords still require one probe into the table. To
|
||||||
|
overcome this problem, the option @samp{-m 50} should be used.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
Sometimes a set of keys may have the same names, but possess different
|
Since the @code{gperf} generated hash function treats the bytes at
|
||||||
attributes. With the -D option @code{gperf} treats all these keys as
|
different byte positions with equal weight, keywords that are permutations
|
||||||
|
of each other can lead to the same hash function value if they are not
|
||||||
|
disambiguated by the set of selected byte positions. Sometimes even this
|
||||||
|
is not possible; for example, the keyword set @{"xy", "yx", "xz", "zx"@}
|
||||||
|
will always lead to duplicates, regardless how the selected byte positions
|
||||||
|
are chosen. You can use the option @samp{-D} to handle this rare case.
|
||||||
|
|
||||||
|
@item
|
||||||
|
Sometimes a set of keywords may have the same names, but possess different
|
||||||
|
attributes. With the -D option @code{gperf} treats all these keywords as
|
||||||
part of an equivalence class and generates a perfect hash function with
|
part of an equivalence class and generates a perfect hash function with
|
||||||
multiple comparisons for duplicate keys. It is up to you to completely
|
multiple comparisons for duplicate keywords. It is up to you to completely
|
||||||
disambiguate the keywords by modifying the generated C code. However,
|
disambiguate the keywords by modifying the generated C code. However,
|
||||||
@code{gperf} helps you out by organizing the output.
|
@code{gperf} helps you out by organizing the output.
|
||||||
@end itemize
|
@end itemize
|
||||||
@@ -820,7 +828,7 @@ option is not particularly useful when @samp{-S} is used. Also,
|
|||||||
@itemx --jump=@var{jump-value}
|
@itemx --jump=@var{jump-value}
|
||||||
@cindex Jump value
|
@cindex Jump value
|
||||||
Affects the ``jump value'', i.e., how far to advance the associated
|
Affects the ``jump value'', i.e., how far to advance the associated
|
||||||
character value upon collisions. @var{Jump-value} is rounded up to an
|
byte value upon collisions. @var{Jump-value} is rounded up to an
|
||||||
odd number, the default is 5. If the @var{jump-value} is 0 @code{gperf}
|
odd number, the default is 5. If the @var{jump-value} is 0 @code{gperf}
|
||||||
jumps by random amounts.
|
jumps by random amounts.
|
||||||
|
|
||||||
@@ -832,14 +840,16 @@ the generated lookup table.
|
|||||||
|
|
||||||
@item -o
|
@item -o
|
||||||
@itemx --occurrence-sort
|
@itemx --occurrence-sort
|
||||||
Reorders the keywords by sorting the keywords so that frequently
|
Reorders the keywords by sorting the keywords so that keywords containing
|
||||||
occuring key position set components appear first. A second reordering
|
frequently occuring byte values appear first. A second reordering
|
||||||
pass follows so that keys with ``already determined values'' are placed
|
pass follows so that keywords with ``already determined values'' are placed
|
||||||
towards the front of the keylist. This may decrease the time required
|
towards the front of the keyword list. This may decrease the time required
|
||||||
to generate a perfect hash function for many keyword sets, and also
|
to generate a perfect hash function for many keyword sets, and also
|
||||||
produce more minimal perfect hash functions. The reason for this is
|
produce more minimal perfect hash functions. The reason for this is
|
||||||
that the reordering helps prune the search time by handling inevitable
|
that the reordering helps prune the search time by handling inevitable
|
||||||
collisions early in the search process. On the other hand, if the
|
collisions early in the search process. On the other hand, in practice,
|
||||||
|
a decreased search time also means a less minimal hash function, and a
|
||||||
|
higher probability of duplicate hash values. Furthermore, if the
|
||||||
number of keywords is @emph{very} large using @samp{-o} may
|
number of keywords is @emph{very} large using @samp{-o} may
|
||||||
@emph{increase} @code{gperf}'s execution time, since collisions will
|
@emph{increase} @code{gperf}'s execution time, since collisions will
|
||||||
begin earlier and continue throughout the remainder of keyword
|
begin earlier and continue throughout the remainder of keyword
|
||||||
@@ -859,14 +869,14 @@ table. If @code{gperf} has difficultly with a certain keyword set try using
|
|||||||
@itemx --size-multiple=@var{size-multiple}
|
@itemx --size-multiple=@var{size-multiple}
|
||||||
Affects the size of the generated hash table. The numeric argument for
|
Affects the size of the generated hash table. The numeric argument for
|
||||||
this option indicates ``how many times larger or smaller'' the maximum
|
this option indicates ``how many times larger or smaller'' the maximum
|
||||||
associated value range should be, in relationship to the number of keys.
|
associated value range should be, in relationship to the number of keywords.
|
||||||
If the @var{size-multiple} is negative the maximum associated value is
|
If the @var{size-multiple} is negative the maximum associated value is
|
||||||
calculated by @emph{dividing} it into the total number of keys. For
|
calculated by @emph{dividing} the number of keywords by it. For
|
||||||
example, a value of 3 means ``allow the maximum associated value to be
|
example, a value of 3 means ``allow the maximum associated value to be
|
||||||
about 3 times larger than the number of input keys''.
|
about 3 times larger than the number of input keywords''.
|
||||||
|
|
||||||
Conversely, a value of -3 means ``allow the maximum associated value to
|
Conversely, a value of -3 means ``allow the maximum associated value to
|
||||||
be about 3 times smaller than the number of input keys''. Negative
|
be about 3 times smaller than the number of input keywords''. Negative
|
||||||
values are useful for limiting the overall size of the generated hash
|
values are useful for limiting the overall size of the generated hash
|
||||||
table, though this usually increases the number of duplicate hash
|
table, though this usually increases the number of duplicate hash
|
||||||
values.
|
values.
|
||||||
@@ -877,7 +887,7 @@ table should decrease the time required for an unsuccessful search, at
|
|||||||
the expense of extra table space.
|
the expense of extra table space.
|
||||||
|
|
||||||
The default value is 1, thus the default maximum associated value about
|
The default value is 1, thus the default maximum associated value about
|
||||||
the same size as the number of keys (for efficiency, the maximum
|
the same size as the number of keywords (for efficiency, the maximum
|
||||||
associated value is always rounded up to a power of 2). The actual
|
associated value is always rounded up to a power of 2). The actual
|
||||||
table size may vary somewhat, since this technique is essentially a
|
table size may vary somewhat, since this technique is essentially a
|
||||||
heuristic. In particular, setting this value too high slows down
|
heuristic. In particular, setting this value too high slows down
|
||||||
@@ -948,13 +958,13 @@ with an appropriate numerical argument that controls the number of
|
|||||||
switch statements generated.
|
switch statements generated.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
The maximum number of key positions selected for a given key has an
|
The maximum number of selected byte positions has an
|
||||||
arbitrary limit of 126. This restriction should be removed, and if
|
arbitrary limit of 255. This restriction should be removed, and if
|
||||||
anyone considers this a problem write me and let me know so I can remove
|
anyone considers this a problem write me and let me know so I can remove
|
||||||
the constraint.
|
the constraint.
|
||||||
@end itemize
|
@end itemize
|
||||||
|
|
||||||
@node Projects, Implementation, Bugs, Top
|
@node Projects, Bibliography, Bugs, Top
|
||||||
@chapter Things Still Left to Do
|
@chapter Things Still Left to Do
|
||||||
|
|
||||||
It should be ``relatively'' easy to replace the current perfect hash
|
It should be ``relatively'' easy to replace the current perfect hash
|
||||||
@@ -987,21 +997,9 @@ generate an Ada package as the code output, in addition to the current
|
|||||||
C and C++ routines.
|
C and C++ routines.
|
||||||
@end itemize
|
@end itemize
|
||||||
|
|
||||||
@node Implementation, Bibliography, Projects, Top
|
|
||||||
@chapter Implementation Details of GNU @code{gperf}
|
|
||||||
|
|
||||||
A paper describing the high-level description of the data structures and
|
|
||||||
algorithms used to implement @code{gperf} will soon be available. This
|
|
||||||
paper is useful not only from a maintenance and enhancement perspective,
|
|
||||||
but also because they demonstrate several clever and useful programming
|
|
||||||
techniques, e.g., `Iteration Number' boolean arrays, double
|
|
||||||
hashing, a ``safe'' and efficient method for reading arbitrarily long
|
|
||||||
input from a file, and a provably optimal algorithm for simultaneously
|
|
||||||
determining both the minimum and maximum elements in a list.
|
|
||||||
|
|
||||||
@page
|
@page
|
||||||
|
|
||||||
@node Bibliography, Concept Index, Implementation, Top
|
@node Bibliography, Concept Index, Projects, Top
|
||||||
@chapter Bibliography
|
@chapter Bibliography
|
||||||
|
|
||||||
[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
|
[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
|
||||||
|
|||||||
Reference in New Issue
Block a user