Downloading the full catalog¶
The catalog is distributed as JSON, in two large zip files on AWS S3. It’s straightforward to download and unpack, but it does require a lot of disc space.
#!/usr/bin/bash
# Download and unpack the US TNA catalog
# https://registry.opendata.aws/nara-national-archives-catalog/
# It's on aws, so need awscli in the environment and configured.
# It's also large - a 10Gb download and 230Gb of JSON when unzipped.
cd $SCRATCH/WW2_US_logs/US_TNA_Catalog/
aws s3 cp s3://nara-national-archives-catalog/zip/nac_export_authorities_2020-11-20.zip .
unzip -DD nac_export_authorities_2020-11-20.zip
aws s3 cp s3://nara-national-archives-catalog/zip/nac_export_descriptions_2020-11-20.zip .
unzip -DD nac_export_descriptions_2020-11-20.zip