From 1ad4108b348049bf3a4da86178fa66f5d36c6b3a Mon Sep 17 00:00:00 2001
From: Bruno Haible
Date: Sun, 20 Aug 2000 17:20:23 +0000
Subject: [PATCH] Document the use of NULs.

---
 ChangeLog      |  1 +
 doc/gperf.texi | 48 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7656098..7d83ed1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -43,6 +43,7 @@
 	result.
 	(Gen_Perf::hash): Use explicit length of char_set.
 	(Gen_Perf::change): Specify explicit length of key.
+	* doc/gperf.texi: Document it.
 	* doc/help2man: New file, help2man version 1.022.
 	* Makefile.devel (all): Add doc/gperf.1.
 
diff --git a/doc/gperf.texi b/doc/gperf.texi
index 2b4caf6..93f1f23 100644
--- a/doc/gperf.texi
+++ b/doc/gperf.texi
@@ -115,6 +115,7 @@ High-Level Description of GNU @code{gperf}
 
 * Input Format:: Input Format to @code{gperf}
 * Output Format:: Output Format for Generated C Code with @code{gperf}
+* Binary Strings:: Use of NUL characters
 
 Input Format to @code{gperf}
 
@@ -259,6 +260,7 @@ efficiently identify their respective reserved keywords.
 @menu
 * Input Format:: Input Format to @code{gperf}
 * Output Format:: Output Format for Generated C Code with @code{gperf}
+* Binary Strings:: Use of NUL characters
 @end menu
 
 The perfect hash function generator @code{gperf} reads a set of
@@ -327,9 +329,9 @@ arbitrary C declarations and definitions, as well as provisions for
 providing a user-supplied @code{struct}. If the @samp{-t} option
 @emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
 component in the declaration section from the keyfile file. The first
-field in this struct must be a @code{char *} identifier called @samp{name},
-although it is possible to modify this field's name with the @samp{-K}
-option described below.
+field in this struct must be a @code{char *} or @code{const char *}
+identifier called @samp{name}, although it is possible to modify this
+field's name with the @samp{-K} option described below.
 
 Here is a simple example, using months of the year and their attributes as
 input:
@@ -406,15 +408,18 @@ in the first column is considered a comment. Everything following the
 @samp{#} is ignored, up to and including the following newline.
 
 The first field of each non-comment line is always the key itself. It
-should be given as a simple name, i.e., without surrounding
-string quotation marks, and be left-justified flush against the first
-column. In this context, a ``field'' is considered to extend up to, but
+can be given in two ways: as a simple name, i.e., without surrounding
+string quotation marks, or as a string enclosed in double-quotes, in
+C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
+or @code{\xa8}. In either case, it must start right at the beginning
+of the line, without leading whitespace.
+In this context, a ``field'' is considered to extend up to, but
 not include, the first blank, comma, or newline. Here is a simple
 example taken from a partial list of C reserved words:
 
 @example
 @group
-# These are a few C reserved words, see the c.@code{gperf} file
+# These are a few C reserved words, see the c.gperf file
 # for a complete list of ANSI C reserved words.
 unsigned
 sizeof
@@ -449,7 +454,7 @@ file, is included verbatim into the generated output file. Naturally,
 it is your responsibility to ensure that the code contained in this
 section is valid C.
 
-@node Output Format, , Input Format, Description
+@node Output Format, Binary Strings, Input Format, Description
 @section Output Format for Generated C Code with @code{gperf}
 @cindex hash table
 
@@ -509,6 +514,28 @@ with the various input and output options, and timing the resulting C
 code, you can determine the best option choices for different keyword
 set characteristics.
 
+@node Binary Strings, , Output Format, Description
+@section Use of NUL characters
+@cindex NUL
+
+By default, the code generated by @code{gperf} operates on zero
+terminated strings, the usual representation of strings in C. This means
+that the keywords in the input file must not contain NUL characters,
+and the @var{str} argument passed to @code{hash} or @code{in_word_set}
+must be NUL terminated and have exactly length @var{len}.
+
+If option @samp{-c} is used, then the @var{str} argument does not need
+to be NUL terminated. The code generated by @code{gperf} will only
+access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
+However, the keywords in the input file still must not contain NUL
+characters.
+
+If option @samp{-l} is used, then the hash table performs binary
+comparison. The keywords in the input file may contain NUL characters,
+written in string syntax as @code{\000} or @code{\x00}, and the code
+generated by @code{gperf} will treat NUL like any other character.
+Also, in this case the @samp{-c} option is ignored.
+
 @node Options, Bugs, Description, Top
 @chapter Invoking @code{gperf}
 
@@ -636,8 +663,8 @@ solely consist of 7-bit ASCII characters (characters in the range 0..127).
 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
 @emph{not} guarantee that a character is in this range. Only an explicit
 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
-default in earlier versions of @code{gperf}; now the default is to assume
-8-bit characters.
+default in versions of @code{gperf} earlier than 2.7; now the default is
+to assume 8-bit characters.
 
 @item -c
 @itemx --compare-strncmp
@@ -731,6 +758,7 @@ However, using @samp{-l} might greatly increase the size of the generated
 C code if the lookup table range is large (which implies that the switch
 option @samp{-S} is not enabled), since the length table contains as many
 elements as there are entries in the lookup table.
+This option is mandatory for binary comparisons (@pxref{Binary Strings}).
 
 @item -D
 @itemx --duplicates