building an offline archive: a guide
under our current fascist regime, the removal of critical data from the internet has become commonplace. today it's the CDC; tomorrow it could be the Library of Congress. because of this, it is essential that anyone who is able builds their own offline archive of critical information. this can include federal agency websites, historical archives, legal resources, banned or at-risk books, healthcare databases, and anything else you believe the trump administration might scrub.
there are already amazing free online archives that house absurd amounts of important resources. these include:
- The Internet Archive
- Anna's Archive
- Library of Congress Digital Collections
I suggest browsing the above archives as a jumping-off point for your own archive. Combined, they provide millions of pieces of music, film, books, academic sources, textbooks, art, history, and more. The Internet Archive also hosts the Wayback Machine, which holds hundreds of billions of archived web pages.
my essentials:
these are generally helpful tools to have in one form or another. i share some of my personal favorites, but there are other options.
- Document Management System (DMS): Pydio, Seafile, and Alfresco are solid options. FileZilla (an FTP client) is handy for moving files to and from a server.
- SingleFile: browser extension that allows you to save a complete webpage into a single HTML file
- Wget (Command-line tool): download entire websites or webpages
- ArchiveBox: open source tool that lets you archive public & private web content
- Tagging Software: TagSpaces, Elasticsearch, TMSU
- Secure Deletion Software: tools like Eraser and BleachBit can securely wipe sensitive files
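to make the Wget entry concrete, here is a rough sketch of driving it from Python. it assumes wget is already installed and on your PATH. the function names (build_wget_command, mirror_site) are made up for this example, and the one-second wait is just a polite default i picked, not a requirement.

```python
import shutil
import subprocess


def build_wget_command(url, dest_dir):
    """Return the wget argument list for mirroring one site into dest_dir."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so pages work offline
        "--adjust-extension",  # add .html to saved pages where needed
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--no-parent",         # never climb above the starting directory
        "--wait=1",            # assumption: pause one second between requests
        "--directory-prefix=" + str(dest_dir),
        url,
    ]


def mirror_site(url, dest_dir):
    """Run wget; raises if wget is missing or exits with an error."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    subprocess.run(build_wget_command(url, dest_dir), check=True)
```

calling mirror_site("https://example.org/", "archive/example.org") would start the download (both values are placeholders). large sites can take hours and a lot of disk space, so it is worth testing on a small one first.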
making your archive:
define your archive’s purpose: what are you trying to accomplish?
organize files: create clear hierarchies and use descriptive names, metadata, and tags
choose the right file formats: opt for universally accepted formats like PDF/A for documents, JPEG or PNG for images, and MP4 for videos.
choose a storage tool: for smaller archives, use external drives with encryption (e.g., VeraCrypt). for larger, ongoing collections, consider cloud storage solutions like Tresorit, or set up a self-hosted option like ArchiveBox.
regular updates and maintenance: schedule regular backups, check file integrity and ensure your encryption keys are securely stored
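one way to handle the "check file integrity" step is a checksum manifest: record a SHA-256 hash for every file, then compare later to spot anything missing or silently corrupted. this is a minimal sketch using only the Python standard library. the function names and the manifest layout are my own assumptions, and it expects the manifest file to live outside the archive folder.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file's path (relative to archive_dir) to its SHA-256 hash."""
    root = Path(archive_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def save_manifest(archive_dir, manifest_path):
    """Write the manifest as JSON; keep it outside archive_dir."""
    manifest = build_manifest(archive_dir)
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))


def verify_manifest(archive_dir, manifest_path):
    """Compare the archive against a saved manifest and report differences."""
    recorded = json.loads(Path(manifest_path).read_text())
    current = build_manifest(archive_dir)
    return {
        "missing": sorted(set(recorded) - set(current)),
        "changed": sorted(
            name for name in recorded
            if name in current and recorded[name] != current[name]
        ),
        "new": sorted(set(current) - set(recorded)),
    }
```

run save_manifest once after adding files, then verify_manifest on whatever schedule you back up. anything listed under "changed" that you did not edit yourself is a sign the file should be restored from another copy.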
how to protect your archive:
encryption:
- AES-256 encryption: tools like 7-Zip or VeraCrypt lock your files behind a password or key that is required to decrypt them
- GPG/PGP encryption: the GPG command-line tool lets you encrypt files before uploading them to your archive, so they stay encrypted end to end
- Full-Device Encryption (FDE): tools like BitLocker, FileVault, and LUKS for physical devices (e.g., hard drives)
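as an illustration of the GPG option, here is a hedged sketch that encrypts a single file with a passphrase using AES-256. it assumes GnuPG is installed and on your PATH, and that gpg will prompt you for the passphrase itself. the function names and the ".gpg" output naming are my own choices for the example.

```python
import shutil
import subprocess
from pathlib import Path


def build_gpg_command(source, encrypted=None):
    """Return the gpg argument list for passphrase (symmetric) encryption."""
    source = Path(source)
    if encrypted is None:
        # assumption: write the result next to the original with .gpg appended
        encrypted = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--cipher-algo", "AES256",   # use AES-256 as the cipher
        "--output", str(encrypted),  # where the encrypted copy is written
        "--symmetric",               # passphrase-based, no key pair needed
        str(source),
    ]


def encrypt_file(source, encrypted=None):
    """Run gpg; raises if gpg is missing or exits with an error."""
    if shutil.which("gpg") is None:
        raise RuntimeError("gpg is not installed or not on PATH")
    subprocess.run(build_gpg_command(source, encrypted), check=True)
```

the original file is left in place, so securely delete it afterward if it is sensitive. to get the file back, gpg --output notes.pdf --decrypt notes.pdf.gpg does the reverse (file names are placeholders).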
back-ups:
3-2-1 Backup: keep three copies of your archive, on two different types of storage media (like an external hard drive and an SSD), with one copy off-site (e.g., a remote server)
Automated Backups: software like rsync or Duplicati for regular backups, and services like Tresorit or Sync.com for offsite, encrypted backups
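if you want to see what an automated backup does under the hood, here is a small sketch in plain Python. it is not a replacement for rsync or Duplicati, just an illustration: it copies files that are new or whose contents differ, and it never deletes anything from the backup. the function name backup_tree is hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def backup_tree(source_dir, backup_dir):
    """Copy new or changed files into backup_dir; return their relative paths."""
    source_root = Path(source_dir)
    backup_root = Path(backup_dir)
    copied = []
    for source_path in sorted(source_root.rglob("*")):
        if not source_path.is_file():
            continue
        relative = source_path.relative_to(source_root)
        target = backup_root / relative
        # shallow=False compares actual file contents, not just size and date
        if target.exists() and filecmp.cmp(source_path, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_path, target)  # copy2 also keeps timestamps
        copied.append(relative.as_posix())
    return copied
```

running it once per destination (say, a second drive and a mounted remote folder) gets you toward the 3-2-1 setup. because it never removes files, something deleted from the source by mistake still survives in the backup.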
additional resources:
- Digital Preservation Coalition: resources on preserving digital content for the long-term.
- National Digital Information Infrastructure and Preservation Program: a program by the Library of Congress dedicated to preserving the nation’s digital content
- The Internet Archive: A Guide to Digital Preservation: resources for contributing and preserving content through the Internet Archive.
- Open Archiving and Digital Libraries: academic resources on digital archiving