Implement a large-file malware scanner for uploaded files.
Created by: deepakduggirala
Requirements:
- The scanner should be able to handle large files (1–10 GB) without causing significant processing delays.
- Files must not be loaded fully into memory for scanning, since a single upload can be up to 10 GB.
- The scanner should support chunk-based or stream-based scanning to minimize memory pressure.
- A low memory footprint is crucial for scanning multiple files concurrently.
- Signature-based: The scanner should detect known malware by matching file contents to known virus signatures.
- Heuristic-based: It should analyze behavior and file structure to identify suspicious patterns in unknown threats (e.g., code obfuscation).
- Low false-positive rate: clean files should rarely be flagged as malicious.
- The scanner should support automated signature updates to stay protected against new threats.
ClamAV
- open source
- streaming support
- supports frequent signature updates
- interfaces: CLI (clamscan, clamdscan) as well as the clamd daemon's socket API
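import pyclamd

# Connect to the local clamd daemon over its Unix socket
cd = pyclamd.ClamdUnixSocket()
# scan_file returns None for a clean file, otherwise {path: (status, detail)}
result = cd.scan_file('/path/to/file')
if result is not None:
    raise ValueError(f"Scan did not come back clean: {result}")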
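Stream files to clamd through a pipe so they are never loaded fully into memory:
# Example config in /etc/clamd.d/scan.conf
# StreamMaxLength is 100 MB here; clamd rejects larger streams, so it must be raised for the 1–10 GB target
StreamMaxLength 104857600
MaxThreads 4
# MaxScanSize caps how much data is scanned per file; content beyond the limit is not scanned
MaxScanSize 2000M
# Stream the file to clamd on standard input
cat largefile | clamdscan -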
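The same pipe-style streaming can be driven from Python by speaking clamd's INSTREAM protocol directly: the command is followed by chunks, each prefixed with its length as a 4-byte big-endian integer, and a zero-length chunk ends the stream. The following is a minimal sketch, not a finished implementation. The socket path, the chunk size, and the names instream_frames and scan_stream are assumptions for illustration; the real socket path comes from LocalSocket in the clamd config.

```python
import socket
import struct
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size; only one chunk is held in memory at a time
SOCKET_PATH = "/var/run/clamd.scan/clamd.sock"  # assumed; check LocalSocket in scan.conf


def instream_frames(fileobj: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the byte frames of one clamd INSTREAM session for fileobj."""
    # Null-terminated command that opens the stream.
    yield b"zINSTREAM\0"
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Each chunk is prefixed with its length as a 4-byte big-endian integer.
        yield struct.pack("!I", len(chunk)) + chunk
    # A zero-length chunk tells clamd the stream is complete.
    yield struct.pack("!I", 0)


def scan_stream(fileobj: BinaryIO, socket_path: str = SOCKET_PATH) -> str:
    """Stream fileobj to clamd and return its raw reply text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        for frame in instream_frames(fileobj):
            sock.sendall(frame)
        # Read until clamd closes the connection after sending its verdict.
        reply = b""
        while True:
            data = sock.recv(4096)
            if not data:
                break
            reply += data
    return reply.rstrip(b"\0").decode("utf-8", "replace").strip()
```

A caller would open the upload in binary mode and pass the file object to scan_stream. If the stream exceeds StreamMaxLength, clamd closes the connection early, so the caller should be prepared for a size-limit reply or a broken-pipe error rather than a verdict.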
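Set up hourly signature updates with freshclam (when freshclam runs as a daemon, Checks 24 in freshclam.conf gives 24 checks per day):
sudo yum install clamav clamav-update
sudo freshclam # Run a one-off update of the virus definitions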
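To confirm that updates are actually landing, a monitoring job could check how old the newest signature database file is. This is a rough sketch under stated assumptions: the database directory /var/lib/clamav, the two-day threshold, and the function names are all hypothetical choices, and the real directory is whatever DatabaseDirectory points to. File timestamps only change when freshclam downloads a new database, so the threshold should be looser than the check interval.

```python
import time
from pathlib import Path
from typing import Optional

DB_DIR = Path("/var/lib/clamav")      # assumed; see DatabaseDirectory in freshclam.conf
MAX_AGE_SECONDS = 2 * 24 * 60 * 60    # hypothetical alert threshold (two days)


def newest_signature_age(db_dir: Path = DB_DIR, now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the newest .cvd/.cld file, or None if none exist."""
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in db_dir.iterdir()
        if p.suffix in {".cvd", ".cld"}  # ClamAV database file extensions
    ]
    if not mtimes:
        return None
    return now - max(mtimes)


def signatures_are_fresh(
    db_dir: Path = DB_DIR,
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """True if at least one database file exists and the newest is within max_age."""
    age = newest_signature_age(db_dir, now)
    return age is not None and age <= max_age
```

A missing or empty database directory is treated as not fresh, which is the safer default for an upload pipeline.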
Acceptance Criteria
- The scanner can handle files up to 10 GB without memory exhaustion or significant performance degradation.
- Signatures are updated automatically at regular intervals (hourly, via freshclam).
- The scanner can be run concurrently on multiple files with minimal resource conflicts.
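The concurrency criterion could be exercised with a bounded worker pool, so that the number of in-flight scans never exceeds what clamd is configured to serve. The sketch below is illustrative only: scan_many and its scan_one callback are hypothetical names, and the default of 4 workers is assumed to mirror MaxThreads from the example config. scan_one stands in for whatever single-file scan call is chosen and is expected to return True for a clean file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

MAX_WORKERS = 4  # assumed to match MaxThreads in the clamd config


def scan_many(
    paths: Iterable[str],
    scan_one: Callable[[str], bool],
    max_workers: int = MAX_WORKERS,
) -> Dict[str, bool]:
    """Scan paths concurrently; return {path: True if clean, False if flagged}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so verdicts line up with paths.
        verdicts = list(pool.map(scan_one, paths))
    return dict(zip(paths, verdicts))
```

Threads are sufficient here because each worker spends its time waiting on clamd, which does the actual scanning. Any exception raised by scan_one propagates to the caller, so a failed scan is not silently reported as clean.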