mirror of https://git.savannah.gnu.org/git/gperf.git (synced 2025-12-02 13:09:22 +00:00)
Implement backtracking.
@@ -7,7 +7,7 @@
 
 @c some day we should @include version.texi instead of defining
 @c these values at hand.
-@set UPDATED 16 November 2002
+@set UPDATED 20 November 2002
 @set EDITION 2.7.2
 @set VERSION 2.7.2
 @c ---------------------
@@ -993,27 +993,14 @@ through a search that minimizes the number of byte positions.
 @itemx --duplicates
 @cindex Duplicates
 Handle keywords whose selected byte sets hash to duplicate values.
-Duplicate hash values can occur for two reasons:
-
-@itemize @bullet
-@item
-Since @code{gperf} does not backtrack it is possible for it to process
-all your input keywords without finding a unique mapping for each word.
-However, frequently only a very small number of duplicates occur, and
-the majority of keywords still require one probe into the table. To
-overcome this problem, the option @samp{-m 50} should be used.
-
-@item
-Sometimes a set of keywords may have the same names, but possess different
-attributes. With the -D option @code{gperf} treats all these keywords as
+Duplicate hash values can occur if a set of keywords has the same names, but
+possesses different attributes, or if the selected byte positions are not well
+chosen. With the -D option @code{gperf} treats all these keywords as
 part of an equivalence class and generates a perfect hash function with
 multiple comparisons for duplicate keywords. It is up to you to completely
 disambiguate the keywords by modifying the generated C code. However,
 @code{gperf} helps you out by organizing the output.
-@end itemize
 
-Option @samp{-D} is extremely useful for certain large or highly
-redundant keyword sets, e.g., assembler instruction opcodes.
 Using this option usually means that the generated hash function is no
 longer perfect. On the other hand, it permits @code{gperf} to work on
 keyword sets that it otherwise could not handle.
|
||||
@@ -1025,7 +1012,7 @@ Generate the perfect hash function ``fast''. This decreases
 table-size. The iteration amount represents the number of times to
 iterate when resolving a collision. `0' means iterate by the number of
 keywords. This option is probably most useful when used in conjunction
-with options @samp{-D} and/or @samp{-S} for @emph{large} keyword sets.
+with option @samp{-o} for @emph{large} keyword sets.
 
 @item -m @var{iterations}
 @itemx --multiple-iterations=@var{iterations}
|
||||
@@ -1067,7 +1054,7 @@ produce more minimal perfect hash functions. The reason for this is
 that the reordering helps prune the search time by handling inevitable
 collisions early in the search process. On the other hand, in practice,
 a decreased search time also means a less minimal hash function, and a
-higher probability of duplicate hash values. Furthermore, if the
+higher frequency of backtracking. Furthermore, if the
 number of keywords is @emph{very} large using @samp{-o} may
 @emph{increase} @code{gperf}'s execution time, since collisions will
 begin earlier and continue throughout the remainder of keyword
|
||||
@@ -1080,8 +1067,7 @@ Utilizes randomness to initialize the associated values table. This
 frequently generates solutions faster than using deterministic
 initialization (which starts all associated values at 0). Furthermore,
 using the randomization option generally increases the size of the
-table. If @code{gperf} has difficultly with a certain keyword set try using
-@samp{-r} or @samp{-D}.
+table.
 
 @item -s @var{size-multiple}
 @itemx --size-multiple=@var{size-multiple}
|
||||
@@ -1154,16 +1140,6 @@ work efficiently on much larger keyword sets (over 15,000 keywords).
 When processing large keyword sets it helps greatly to have over 8 megs
 of RAM.
 
-However, since @code{gperf} does not backtrack no guaranteed solution
-occurs on every run. On the other hand, it is usually easy to obtain a
-solution by varying the option parameters. In particular, try the
-@samp{-r} option, and also try changing the default arguments to the
-@samp{-s} and @samp{-j} options. To @emph{guarantee} a solution, use
-the @samp{-D} and @samp{-S} options, although the final results are not
-likely to be a @emph{perfect} hash function anymore! Finally, use the
-@samp{-f} option if you want @code{gperf} to generate the perfect hash
-function @emph{fast}, with less emphasis on making it minimal.
-
 @item
 The size of the generate static keyword array can get @emph{extremely}
 large if the input keyword file is large or if the keywords are quite
|
||||
@@ -1171,7 +1147,7 @@ similar. This tends to slow down the compilation of the generated C
 code, and @emph{greatly} inflates the object code size. If this
 situation occurs, consider using the @samp{-S} option to reduce data
 size, potentially increasing keyword recognition time a negligible
-amount. Since many C compilers cannot correctly generated code for
+amount. Since many C compilers cannot correctly generate code for
 large switch statements it is important to qualify the @var{-S} option
 with an appropriate numerical argument that controls the number of
 switch statements generated.
|
||||
@@ -1192,19 +1168,11 @@ module is essential independent from other program modules. Additional
 worthwhile improvements include:
 
 @itemize @bullet
 @item
-Make the algorithm more robust. At present, the program halts with an
-error diagnostic if it can't find a direct solution and the @samp{-D}
-option is not enabled. A more comprehensive, albeit computationally
-expensive, approach would employ backtracking or enable alternative
-options and retry. It's not clear how helpful this would be, in
-general, since most search sets are rather small in practice.
-
-@item
 Another useful extension involves modifying the program to generate
 ``minimal'' perfect hash functions (under certain circumstances, the
 current version can be rather extravagant in the generated table size).
-Again, this is mostly of theoretical interest, since a sparse table
+This is mostly of theoretical interest, since a sparse table
 often produces faster lookups, and use of the @samp{-S} @code{switch}
 option can minimize the data size, at the expense of slightly longer
 lookups (note that the gcc compiler generally produces good code for
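
The documentation changes above drop the old caveat that gperf "does not backtrack". As a rough illustration of what backtracking means for this kind of search, here is a minimal Python sketch. It is not gperf's actual algorithm: it uses a toy hash (keyword length plus the associated values of the first and last byte), and the names `hash_value`, `find_asso_values`, and `max_value` are hypothetical, chosen only for this example. Keywords are assumed to be non-empty strings.

```python
def hash_value(word, asso):
    """Toy gperf-style hash: length plus associated values of first and last byte."""
    return len(word) + asso[word[0]] + asso[word[-1]]


def find_asso_values(keywords, max_value):
    """Backtracking search for associated values that give distinct hashes.

    Returns a dict mapping characters to values in range(max_value + 1),
    or None if no such assignment exists. Illustrative only.
    """
    # Characters that influence the toy hash, visited in a fixed order.
    chars = sorted({w[0] for w in keywords} | {w[-1] for w in keywords})
    asso = {}

    def consistent():
        # Only keywords whose hash is fully determined so far can collide.
        seen = set()
        for w in keywords:
            if w[0] in asso and w[-1] in asso:
                h = hash_value(w, asso)
                if h in seen:
                    return False
                seen.add(h)
        return True

    def assign(i):
        if i == len(chars):
            return True  # every character has a value and no collision remains
        for value in range(max_value + 1):
            asso[chars[i]] = value
            if consistent() and assign(i + 1):
                return True
        # Nothing works for this character: undo the choice so the caller
        # can try its next value. This step is the backtracking.
        asso.pop(chars[i], None)
        return False

    return dict(asso) if assign(0) else None
```

With keywords such as `["abc", "adc"]`, which agree in length, first byte, and last byte, the search exhausts every assignment and returns `None`. That is the toy analogue of the case where the selected byte positions are not well chosen and duplicates cannot be avoided.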