Checksums
A checksum is a sequence of letters and numbers produced by running a file through a cryptographic hash algorithm, giving an identifier that is, for practical purposes, unique to that file's contents. Even the slightest change to the original file produces a different checksum, so checksums can be used both to verify file integrity and to prevent resource duplication.
To enable checksums, add the following to your config.php file:
$file_checksums = true;
It is possible to block duplicate files based on their checksums. This has a performance impact, so it is disabled by default. To enable it, use:
$file_upload_block_duplicates=true;
Please note that this will not work reliably with $file_checksums_offline=true unless the checksum script is run frequently.
Note that metadata will affect the checksum of a resource. If you have the same file with different embedded metadata, the resulting checksums will differ, so the files will not be classed as duplicates.
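The effect of embedded metadata can be illustrated with a minimal sketch. The strings below are hypothetical stand-ins for file contents, and MD5 is used purely as an assumption for illustration; the algorithm the application uses internally may differ.

```php
<?php
// Hypothetical stand-ins for two files: the same image payload with
// different embedded metadata. MD5 is an assumption for illustration only.
$payload = str_repeat("\x00\x7f", 1024);
$file_a  = "title=Sunset;" . $payload;
$file_b  = "title=Sunrise;" . $payload;

$checksum_a       = md5($file_a);
$checksum_b       = md5($file_b);
$checksum_a_again = md5($file_a);

// Identical bytes always give the same checksum.
var_dump($checksum_a === $checksum_a_again);

// Different metadata bytes give a different checksum,
// so the two files would not be treated as duplicates.
var_dump($checksum_a === $checksum_b);
```

The first comparison is true and the second is false, even though the payload itself is byte-for-byte identical.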
Advanced settings
Once this is enabled, newly uploaded resources will have their checksums generated automatically. Checksums can be generated retrospectively for previously uploaded resources by running the script:
pages/tools/update_checksums.php --recreate
Important
- This script should be run from the command line.
- When running it against a large number of resources, lower the process's scheduling priority and set its I/O scheduling class and priority (for example with nice and ionice on Linux) so that it does not negatively impact the server.
- For the same reason, you may need to disable PHP's maximum execution time (max_execution_time) for the CLI.
Other settings
$file_checksums_50k
You may want to consider whether the whole file should be checked. By default, the checksum is calculated only on the first 50k of a file. To use the whole file, set this config option to false.
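The trade-off of checking only the start of a file can be sketched as follows. This is an illustration under stated assumptions, not the application's own code: the function names are hypothetical, MD5 is assumed as the hash, and the cut-off is assumed to be 50,000 bytes.

```php
<?php
// Illustration only: MD5 and the 50,000-byte cut-off are assumptions,
// and both function names are hypothetical.

/** Hash only the first $limit bytes of a file. */
function partial_checksum(string $path, int $limit = 50000): string
{
    $data = file_get_contents($path, false, null, 0, $limit);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return md5($data);
}

/** Hash the whole file. */
function full_checksum(string $path): string
{
    $hash = md5_file($path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash;
}

// Two temporary files that are identical for the first 60,000 bytes
// and differ only in their final byte.
$a = tempnam(sys_get_temp_dir(), 'chk');
$b = tempnam(sys_get_temp_dir(), 'chk');
file_put_contents($a, str_repeat('x', 60000) . 'A');
file_put_contents($b, str_repeat('x', 60000) . 'B');

$partial_match = partial_checksum($a) === partial_checksum($b);
$full_match    = full_checksum($a) === full_checksum($b);

var_dump($partial_match); // the partial checksums are equal
var_dump($full_match);    // the full checksums differ

unlink($a);
unlink($b);
```

A partial checksum is cheaper to compute on large files, but it cannot detect differences that occur after the cut-off, which matters most when checksums are used to block duplicates.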
$file_checksums_offline
You should also consider whether to generate checksums in real time. If this option is set to true, a background cron job (scheduled task) must be used to run the pages/tools/update_checksums.php script.
Note that this option defaulted to true until version 10.4; after that release it defaults to false.
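As a sketch of how the options on this page fit together, the config.php fragment below enables checksums, hashes the whole file, generates checksums in real time and blocks duplicate uploads. The particular combination of values is only an example; choose values to suit your own installation.

```php
// config.php (fragment) - example values only
$file_checksums = true;                // enable checksums
$file_checksums_50k = false;           // use the whole file, not just the first 50k
$file_checksums_offline = false;       // generate checksums in real time
$file_upload_block_duplicates = true;  // block duplicates; has a performance impact
```

Keeping $file_checksums_offline set to false here avoids the reliability caveat that applies when duplicate blocking is combined with offline checksum generation.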
Viewing checksums
The stored checksums can be viewed by choosing the 'CSV export - metadata' option for any set of search results and selecting the 'Include data from all accessible fields' option.
Resource file integrity validation
Refer to this page on file integrity checking