Would be cool to include this in lzbench so it can easily be profiled on a range of systems.
It’s always cool to see new Rust tools where C has traditionally been used. The only thing that gives me pause is how many times the unsafe keyword is used. I skimmed some of the code and saw that many functions were marked as unsafe. After cloning the repository and searching for unsafe, I found 55 matching lines.
The command used was rg -I -c unsafe src | awk '{ s += $1 } END { print s }' for those interested.
This is equivalent to wc -l, isn’t it?
Not quite. With the -c flag, ripgrep prints one count per file, each on its own line. Because we also used -I, the filename is omitted, so each line holds only the count. Then we can sum these up with awk to see the total for the entire src directory. I hope that helps.
Decompression performance is unfortunately not that great. I’m sure it has its uses, but probably not for general-public use.
I imagine it could be used for compression on slower machines (edge devices or sensors, for instance) before transmitting over the network to larger machines. I’m curious about the memory requirements though.
I skimmed the code and the decoder could be improved a lot, so it’s not necessarily an issue with the format.
MTF is not ideal for general-purpose compression though. The file tested in the readme is HTML and English text, i.e. a small alphabet with skewed frequencies, which is the best case for MTF. It will probably do significantly worse on binary files.
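For anyone unfamiliar with MTF, here is a minimal sketch of a textbook byte-wise move-to-front transform. It is not taken from the project's code; the function names mtf_encode and mtf_decode and the 256-entry starting table are my own assumptions for illustration.

```rust
/// Move-to-front encode: each byte is replaced by its current index in a
/// recency table, and that byte is then moved to the front of the table.
/// (Hypothetical helper for illustration, not the project's implementation.)
fn mtf_encode(input: &[u8]) -> Vec<u8> {
    // Start from the identity table 0, 1, ..., 255.
    let mut table: Vec<u8> = (0..=255u8).collect();
    let mut out = Vec::with_capacity(input.len());
    for &byte in input {
        // The lookup cannot fail: the table always holds all 256 byte values.
        let idx = table.iter().position(|&b| b == byte).unwrap();
        out.push(idx as u8);
        // Shift entries 0..idx one slot to the right, then put the byte first.
        table.copy_within(0..idx, 1);
        table[0] = byte;
    }
    out
}

/// Inverse transform: replay the same table updates, driven by the indices.
fn mtf_decode(codes: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0..=255u8).collect();
    let mut out = Vec::with_capacity(codes.len());
    for &code in codes {
        let idx = code as usize;
        let byte = table[idx];
        out.push(byte);
        table.copy_within(0..idx, 1);
        table[0] = byte;
    }
    out
}

fn main() {
    let text = b"aaabbbaaa";
    let encoded = mtf_encode(text);
    // Recently seen bytes turn into small indices.
    println!("{:?}", encoded);
    assert_eq!(mtf_decode(&encoded), text);
}
```

Repeated or recently seen bytes come out as small indices, which is what a later entropy-coding stage can exploit. When all byte values are roughly equally likely, the indices stay spread out and the transform gains little.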