1. 12

Logswan is a fast Web log analyzer using probabilistic data structures. It is targeted at very large log files, typically API logs. It has constant memory usage regardless of the log file size, and takes approximately 4MB of RAM.

Unique visitors counting is performed using two HyperLogLog counters (one for IPv4, and another one for IPv6), providing a relative accuracy of 0.10%.
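To illustrate how a HyperLogLog counter estimates unique visitors in fixed memory, here is a minimal sketch in C. It is not Logswan's actual code: the names `struct hll`, `hll_init`, `hll_add`, `hll_count` and the `mix64` stand-in hash are assumptions made for this example, and the small-range correction of the full algorithm is left out for brevity. The standard error of HyperLogLog is roughly 1.04/sqrt(m) for m registers, so an accuracy of 0.10% would be consistent with about 2^20 registers per counter, although the description above does not state the register count.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal HyperLogLog counter (illustration only). */
struct hll {
    uint8_t  bits;       /* precision p: number of bits used as register index */
    uint32_t size;       /* m = 2^p registers */
    uint8_t *registers;  /* one byte per register */
};

/* Stand-in 64-bit hash (splitmix64 finalizer); a real analyzer
   would hash the IP address string instead of an integer. */
static uint64_t mix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Returns 0 on success, -1 on bad precision or allocation failure.
   The range 7..20 keeps the alpha formula in hll_count() valid (m >= 128). */
int hll_init(struct hll *h, uint8_t bits) {
    if (bits < 7 || bits > 20)
        return -1;
    h->bits = bits;
    h->size = 1u << bits;
    h->registers = calloc(h->size, 1);
    return h->registers ? 0 : -1;
}

void hll_free(struct hll *h) {
    free(h->registers);
    h->registers = NULL;
}

void hll_add(struct hll *h, uint64_t item) {
    uint64_t hash = mix64(item);
    uint32_t index = (uint32_t)(hash >> (64 - h->bits)); /* top p bits */
    uint64_t rest = hash << h->bits;                     /* remaining bits */
    uint8_t max_rank = (uint8_t)(64 - h->bits + 1);
    uint8_t rank = 1;

    /* rank = position of the first 1 bit in the remaining bits */
    while (rank < max_rank && !(rest & 0x8000000000000000ULL)) {
        rank++;
        rest <<= 1;
    }
    if (rank > h->registers[index])
        h->registers[index] = rank;
}

/* Raw estimate: alpha * m^2 / sum(2^-register).
   The small-range (linear counting) correction is omitted here,
   so results are only meaningful well above 2.5 * m items. */
double hll_count(const struct hll *h) {
    double m = (double)h->size;
    double alpha = 0.7213 / (1.0 + 1.079 / m);
    double sum = 0.0;

    for (uint32_t i = 0; i < h->size; i++)
        sum += 1.0 / (double)(1ULL << h->registers[i]);
    return alpha * m * m / sum;
}
```

Memory use depends only on the precision, not on the number of items added, which is what makes a constant footprint possible: with one byte per register, 2^20 registers take 1 MiB per counter.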

Project design goals include speed, memory-usage efficiency, and keeping the code as simple as possible.


  2. 9
                        Logswan 1.00 (c) by Frederic Cambus 2015                   
    Processing file : access.log
    Segmentation fault 
    1. 6

      That’s not a very good bug report. Where’s the gdb backtrace?

      1. 4

        Could you provide a backtrace? It’s hard to try to guess what’s wrong without any context.

        I received a bug report today, and it turns out that log lines like this one crash the program (the cause is known and the issue will be fixed soon): - - [18/Nov/2013:19:54:25 +0100] "-" 400 0 "-" "-"

        1. 2

          I thought the snippet in my comment had a little beauty of its own, therefore no backtrace.

          #0  __strcmp_sse2_unaligned ()
              at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:30
          #1  0x0000000000401f66 in main (argc=1, argv=0x7ffdb64c2c80)
              at /home/allan/mess/current/logswan/src/logswan.c:186
          1. 1

            I believe this is another occurrence of the bug which has already been reported and is now fixed.

            Logswan 1.01 has been tagged. Could you test and report if it solves your issue? Thanks.

        2. 2

          Wonder if Valgrind would’ve caught this? Also wondering how fast I could get an equivalent in Haskell to go. We have a HyperLogLog library that I’ve been waiting for an excuse to use outside of work.

          1. 4

            There’s no way to tell if Valgrind would have helped unless @allan provides a stack trace or example input that crashes logswan.

            My best guess? Stuff like this would more likely be caught by afl fuzzing, since I know fcambus uses logswan regularly on real-life data. Though again, without the input log file or a stack trace there is no way to tell.
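
The crash discussed in this thread (a fault inside `strcmp` reached from `main`, on a log line whose request field is just "-") is consistent with a missing token being compared without a NULL check, though the thread does not show the offending source line. As a hedged illustration of that class of bug and one way to guard against it, here is a small C sketch; `struct request`, `parse_request` and `is_http11` are hypothetical names and are not taken from logswan.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed request line (illustration only). */
struct request {
    const char *method;
    const char *resource;
    const char *protocol;
};

/* Splits a request field such as "GET / HTTP/1.1" in place.
   Returns false when any of the three tokens is missing,
   which is the case for a request field of "-". */
bool parse_request(char *field, struct request *out) {
    out->method = strtok(field, " ");
    out->resource = out->method ? strtok(NULL, " ") : NULL;
    out->protocol = out->resource ? strtok(NULL, " ") : NULL;
    return out->method && out->resource && out->protocol;
}

/* Only calls strcmp() once the token is known to exist;
   passing a NULL pointer to strcmp() is undefined behavior. */
bool is_http11(const struct request *r) {
    return r->protocol != NULL && strcmp(r->protocol, "HTTP/1.1") == 0;
}
```

With this shape, a malformed line is reported to the caller as a parse failure and can be skipped or counted as invalid instead of crashing the run.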