Manual Reference Pages  - LEXER (1)


lexer - the shoki packet lexer




lexer [-b dbname] [-B] [-c max_count] [-C conf_file] [-d demux] [-D chrdir] [-E stop] [-f] [-F filterfile] [-h] [-l linktype] [-L logfname] [-o] [-p sample] [-r dumpfile] [-R] [-s snaplen] [-S start] [-t type] [-T] [-U luser] [-v verbosity] [-w outdump] [-W wconf] [-x] [-Z speed] [bpf_filter]


lexer is a tool for reading through pcap dumpfiles, applying various filter and search expressions, and logging messages about the contents of matching packets. Messages can be logged to stdout, to a file, via syslog(3), or to a Postgres database.

The content and format of the information logged for each packet may be user-specified via the format string in the lexer(1) config file, /usr/local/shoki/etc/lexer.conf. For information on how this works, consult the shoki_configs(5) manpage. If no format string is given, a fairly generic `IP quad' style message will be logged. The format of such messages is:

     name timestamp ip_src:sport ip_dst:dport ip_p

name is the filter name,
timestamp is the timestamp (in seconds after the start of the epoch),
ip_src is the source IP address,
sport is the TCP or UDP source port (or 0 if the packet is neither TCP nor UDP),
ip_dst is the destination IP address,
dport is the destination port, and
ip_p is the IP protocol number (e.g., 6 for TCP).
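
As an illustration of the default format, the following Python sketch splits one such message into its fields. It is not part of shoki. It assumes that fields are separated by whitespace, that the filter name contains no whitespace, that the timestamp is a plain decimal number, and that addresses are IPv4. The names QuadRecord and parse_quad_line, and the sample line in the usage note, are hypothetical.

```python
from typing import NamedTuple


class QuadRecord(NamedTuple):
    """One default-format message (hypothetical container, not part of shoki)."""
    name: str         # filter name
    timestamp: float  # seconds after the start of the epoch
    ip_src: str       # source IP address
    sport: int        # source port, 0 if neither TCP nor UDP
    ip_dst: str       # destination IP address
    dport: int        # destination port
    ip_p: int         # IP protocol number


def parse_quad_line(line: str) -> QuadRecord:
    """Parse 'name timestamp ip_src:sport ip_dst:dport ip_p'.

    Assumes whitespace-separated fields and IPv4 addresses.
    Raises ValueError if the line does not have exactly five fields.
    """
    name, timestamp, src, dst, proto = line.split()
    # Split on the last colon so the port is always the final component.
    ip_src, sport = src.rsplit(":", 1)
    ip_dst, dport = dst.rsplit(":", 1)
    return QuadRecord(name, float(timestamp), ip_src, int(sport),
                      ip_dst, int(dport), int(proto))
```

For a made-up line such as "ssh_probe 1068076800 192.0.2.10:40000 198.51.100.5:22 6", parse_quad_line returns a record whose dport is 22 and whose ip_p is 6.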

Note that using the default output format results in substantially better performance than using user-specified formats. If you're planning on using a different fixed format for some performance-sensitive application, you might consider writing a different packet printing routine.

Logging is handled through a fairly generic callback mechanism, so other actions can be taken with minimal coding effort. For more information on underlying functions in shoki, consult the libshoki(3) manpage.

lexer reads dumpfiles using zlib(3), so it can read gzip'd dumpfiles.
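
To check a dumpfile before handing it to lexer, something like the following Python sketch may help. It is an independent illustration, not how lexer itself reads files. It detects gzip compression from the two-byte gzip magic number, then decodes the 24-byte global header of a classic libpcap dumpfile to report the snaplen and linktype recorded in it. The function names are hypothetical, and only the classic microsecond-resolution magic number (0xa1b2c3d4) is recognised.

```python
import gzip
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic libpcap magic number
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_dump(path):
    """Open a dumpfile for reading, transparently handling gzip."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def read_pcap_header(path):
    """Return version, snaplen and linktype from the pcap global header."""
    with open_dump(path) as fh:
        raw = fh.read(24)
    if len(raw) < 24:
        raise ValueError("file too short for a pcap global header")
    # The header is written in the byte order of the capturing host,
    # so try little-endian first and then big-endian.
    for endian in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = \
            struct.unpack(endian + "IHHiIII", raw)
        if magic == PCAP_MAGIC:
            return {"version": (major, minor),
                    "snaplen": snaplen,
                    "linktype": linktype}
    raise ValueError("not a classic libpcap dumpfile")
```

A linktype of 1 in the result corresponds to DLT_EN10MB, the default used by the -l option described below.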


-b dbname Logs packets to Postgres database dbname. For more information, consult the README.database file in the doc directory of the shoki distribution.
-B Turns off the use of callback functions. Useful primarily when you just want to do a rules-based rewriting of a dumpfile.
-c max_count
  Read no more than max_count packets.
-C conf_file
  Read an alternate config file. By default, /usr/local/shoki/etc/lexer.conf will be used.
-d demux Tells the lexer to demux the input dumpfile into multiple dumpfiles, one for each matching signature. Currently, a matching packet will only be written to the file corresponding to the first matching signature.

The path demux must be specified, and is used as a base for the output dumpfilenames. If the value for demux is /tmp/foo, then the output files will be called /tmp/foo.1.dump, /tmp/foo.2.dump, and so on, where the numbers will be the number of the matching filter. In addition, a file (in this case) /tmp/foo.txt will be created containing a list of the signature numbers and the corresponding names and unique IDs. So if you wanted to figure out what signature the packets in /tmp/foo.1.dump correspond to, you'd look for the line for signature 1 in /tmp/foo.txt.

-D chrdir If specified, does a chroot(2) to chrdir.
-E stop Only look at packets with timestamps on or before stop. A value of seconds after the start of the epoch is assumed.

See also -S.

-f Attempt fragment reassembly. A log message will be generated for any problems encountered in the reassembly process, e.g., overlapping fragments, fragments which cannot be assembled into a complete packet, and so on. The timestamp on this log message will be the timestamp of the first fragment received.

If the -v flag is also given, a message for each `bad' fragment will be logged.

Consult the README and/or the source for more information about how frag reassembly works.

-F filterfile
  Read filter expressions from filterfile. Consult the shoki.filters(5) manpage for details of the filter format.
-h Display a usage message and exit.
-l linktype
  The (numeric) linktype to use when compiling BPF filters. Defaults to 1 (DLT_EN10MB).
-L logfile
  For filter methods that support logging to a file, output will be sent to logfile. Use `-' (without the quotes) for stdout.
-o Turn off filter rule optimisation. Unless you have very few filter rules you almost certainly want to use optimisation.

NOTE: This is the opposite of the behaviour of the flag prior to shoki-0.3.0.

-p sample Percentage of packets to use for random sampling.
-r dumpfile
  Read packets from dumpfile. The specified file must be a libpcap-style dumpfile. It may be gzip'd.
-R Don't use /dev/urandom to seed srand(3) for random sampling. If you use this option, every set of `random' samples will be the same for any given dump. This is useful primarily for testing and debugging.
-s snaplen
  Sets the default snaplen. If not specified, 65535 is assumed.

Individual filter rules can specify a different snaplen for packets matching that filter.

-S start Only packets with timestamps on or after start will be used. A value of seconds after the start of the epoch is assumed.

See also -E.

-t type Set the sensor type. This is just a convenience used for grouping sensor output. The scripts included with shoki (i.e., the collector, importer, and reporter scripts) by default want to group sensors into categories like `internal', `external' and `dmz'.

If you're running the lexer by hand to do a visual grep on the output, you probably don't have to worry about this unless you have a lot of signatures that rely on this field (i.e., have something besides `ALL' in the type field).

-T If this flag is given, the lexer will keep some basic running total statistics and include them at the end of its output.
-U luser If specified, setuid/setgid to specified luser.
-v verbosity
  Set the verbosity level to verbosity. Exactly what this entails tends to vary from release to release. In general, you won't want to specify a verbosity level unless you're doing debugging.

See also -f.

-w outdump
  Writes a libpcap-style dumpfile outdump containing packets matching the filter rules. If you just want to do policy-based rewriting of dumpfiles, you probably want to use the -B flag as well.
-W wconf Reads a set of whitening (or sanitising) rules from wconf and applies them to the data. The format of the config file is documented in the whiten.conf(5) man page.

By default, whitening takes place after filtering. See also the -x flag below.

-x Does whitening before applying filters. By default, whitening takes place after filtering.

This option has no effect if the -W option is not also used.

-Z speed If specified, this option will affect the rate at which output is generated by the lexer. Normally (when this option is not given), lexer(1) will output as fast as it can. The -Z flag specifies that a delay loop should be used so that output is generated as close to speed times as fast as the data was received as is possible. A dumpfile containing five kiloseconds' worth of captured data will therefore take 5000 seconds to output if speed is 1; 2500 seconds if speed is 2; 100 seconds if speed is 50; and so on.

Of course, setting speed to an arbitrarily large value won't help if packet arrival times are already very close together.
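
The arithmetic behind -Z can be sketched in Python as follows. This only illustrates the scaling described above. It is not lexer's actual delay loop, which this page does not document, and the function names are hypothetical.

```python
def replay_seconds(capture_seconds, speed):
    """Approximate wall-clock time needed to output a capture at `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return capture_seconds / speed


def scaled_delays(timestamps, speed):
    """Delay to wait before each packet after the first.

    `timestamps` are packet capture times in seconds, in order.
    Each original inter-packet gap is divided by `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [(later - earlier) / speed
            for earlier, later in zip(timestamps, timestamps[1:])]
```

With these definitions, replay_seconds(5000, 50) gives 100.0, matching the example above, and gaps that are already near zero stay near zero however large speed is.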


/usr/local/shoki/etc/lexer.conf lexer config file.


Stephen P. Berry

More information can be found at the shoki homepage:


Check the README at the top of the source tree.

November 6, 2003 LEXER (1) shoki