Pcompress is an open source command-line utility for parallel compression, decompression and deduplication. It works in parallel by splitting input data into chunks.
Pcompress features Delta compression, a fixed-block option, metadata compression, support for multiple algorithms (including LZMA, multithreaded LZMA, PPMD, Bzip2 and LZ4), strong data integrity, filters, Matrix transform, encryption, message authentication, overlapped processing, a custom allocator, solid mode, padding, and much more.
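The chunk-splitting idea described above can be illustrated with a minimal Python sketch. It is not Pcompress's implementation or file format: the function names, the 1 MiB default chunk size and the use of Python's `lzma` module with a thread pool are all assumptions chosen for illustration.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut the input into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> list[bytes]:
    """Compress every chunk independently; pool.map keeps the chunk order."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lzma.compress, chunks))


def decompress_parallel(blobs: list[bytes], workers: int = 4) -> bytes:
    """Decompress the chunks independently and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, blobs))
```

Because each chunk is compressed on its own, chunks can be processed in any order on any thread. The trade-off is that redundancy spanning two chunks is not exploited, so smaller chunks generally mean more parallelism but a lower compression ratio.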
What is new in this release:
- This is a major release featuring archiving support based on Libarchive. Advanced techniques are used to detect file types and use appropriate algorithms to compress better. Filters like PackJPG for JPEGs and Dispack for executables are used to improve compressibility. A few heuristics help detect already compressed data and avoid costly compression steps. Archive entries are also sorted to cluster related content and achieve better compression. Data split boundaries for multiple parallel threads are determined from file type and rolling hash changes. Mmap and zero-copy techniques are used internally whenever possible to improve performance. Some optimizations have also been done to reduce memory usage. Usage has been simplified, with most features automatically configured based on compression level.
- With these changes, Pcompress can act as a full-fledged archiver, like tar or cpio, and provide better compression ratios and better performance than the single-file compression mode. Pcompress compares very favourably with other leading archiver utilities.
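The release notes above mention heuristics that detect already compressed data. The listing does not say which heuristics Pcompress uses; one common approach, shown here purely as an assumed example, is to estimate the Shannon entropy of a small sample and skip compression when it is close to 8 bits per byte. The sample size and the 7.5 threshold are hypothetical values.

```python
import math
from collections import Counter


def shannon_entropy(sample: bytes) -> float:
    """Return the entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_compressed(data: bytes, sample_size: int = 4096, threshold: float = 7.5) -> bool:
    """Guess whether data is already compressed by sampling its first bytes.

    The threshold is an illustrative assumption, not a value taken from Pcompress.
    """
    return shannon_entropy(data[:sample_size]) > threshold
```

A check like this is cheap compared with running a full compressor over the data, but it can misjudge inputs, so real tools usually combine it with file type detection.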
What is new in version 2.4:
- This version fixes several issues, including some corner case crashes and a couple of buffer overflows.
- Data deduplication can now be done using blocks as small as 2 KB, providing a much higher dedupe ratio than virtually any other deduplication software.
- Similarity-based deduplication performance has been improved.
- Free memory detection accuracy has also been improved.
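To show why small blocks can raise the dedupe ratio, here is a minimal sketch of block-level deduplication with fixed 2 KB blocks and SHA-256 block hashes. It is an assumption-laden illustration, not Pcompress's algorithm or on-disk layout: `dedupe_blocks`, `restore` and `dedupe_ratio` are hypothetical names, and the in-memory dictionary stands in for a real index.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 2048):
    """Split data into fixed-size blocks and store each distinct block once.

    Returns (store, recipe): store maps a SHA-256 digest to the block bytes,
    recipe is the ordered list of digests needed to rebuild the input.
    """
    store: dict[bytes, bytes] = {}
    recipe: list[bytes] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store: dict[bytes, bytes], recipe: list[bytes]) -> bytes:
    """Rebuild the original data from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


def dedupe_ratio(data: bytes, store: dict[bytes, bytes]) -> float:
    """Original size divided by the total size of the unique blocks."""
    unique_bytes = sum(len(block) for block in store.values())
    return len(data) / unique_bytes if unique_bytes else 1.0
```

Smaller blocks find more duplicates because a short repeated region is more likely to fill a whole block, but they also produce more index entries, which is why index memory use matters at this granularity.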
What is new in version 2.3:
- This version fixes a few bugs and provides several improvements in efficiency and performance.
- Similarity detection for similarity-based near-exact deduplication is now more effective.
- At the same time, memory requirements for the index have been reduced.
- Accuracy of data partitioning between threads has been improved.
- Chunking and indexing performance have been improved and the KMV Sketch computation is now more accurate.
- This release moves all the core functionality into a shared library in preparation for an API interface that will be introduced in future releases.
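The KMV (k minimum values) sketch mentioned above is a general technique: hash every item of a set, keep only the k smallest hash values, and compare two sets through these small summaries. The following is a minimal generic sketch of that technique. How Pcompress computes or uses its sketches is not described in this listing, so the hash choice, `k=64` and the function names are assumptions.

```python
import hashlib
import heapq


def _hash64(item: bytes) -> int:
    """Map an item to a 64-bit integer using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big")


def kmv_sketch(items, k: int = 64) -> set[int]:
    """Return the k smallest distinct hash values of the given items."""
    return set(heapq.nsmallest(k, {_hash64(item) for item in items}))


def estimate_similarity(sketch_a: set[int], sketch_b: set[int], k: int = 64) -> float:
    """Estimate the Jaccard similarity of two sets from their KMV sketches.

    Take the k smallest values of the combined sketches and count how many
    of them appear in both.
    """
    union_k = heapq.nsmallest(k, sketch_a | sketch_b)
    if not union_k:
        return 0.0
    shared = sum(1 for h in union_k if h in sketch_a and h in sketch_b)
    return shared / len(union_k)
```

For example, sketches built from the chunk contents of two segments can be compared in time proportional to k, regardless of how many chunks each segment holds. A larger k gives a more accurate estimate at the cost of more memory per sketch.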
What is new in version 2.2:
- This is primarily a bugfix release.
- It fixes some crashes with invalid input and build problems on Debian 6 and older non-SSE4 processors.
- The min-heap-based similarity matching for Delta Encoding has been improved and is now faster and more accurate.
- Accuracy of scalable Segmented Global Deduplication has been further improved to be greater than 95%.
- More test cases have been added.
What is new in version 2.1:
- This version adds many bugfixes and performance improvements.
- Accuracy in finding duplicates in Global Dedupe has been improved.
- SHA256 is now the default block hash algorithm for dedupe, with the ability to change it separately from the chunk verification hash.
- Overall, many performance improvements have been made: better parallelism, more SSE vectorization, faster sorting, and improved handling of the segment hash list file, resulting in less I/O and fewer random accesses.
- Bugs in calculating the in-memory index size have been fixed to avoid overflowing free RAM and swapping to disk.