Downloading the full catalog
The catalog is distributed as JSON, in two large zip files on AWS S3. It’s straightforward to download and unpack, but it needs a lot of disk space: roughly a 10 GB download that expands to about 230 GB of JSON.
#!/usr/bin/bash

# Download and unpack the US TNA catalog
# https://registry.opendata.aws/nara-national-archives-catalog/

# It's on aws, so need awscli in the environment and configured.
# It's also large - a 10Gb download and 230Gb of JSON when unzipped.

cd $SCRATCH/WW2_US_logs/US_TNA_Catalog/

aws s3 cp s3://nara-national-archives-catalog/zip/nac_export_authorities_2020-11-20.zip .
unzip -DD nac_export_authorities_2020-11-20.zip

aws s3 cp s3://nara-national-archives-catalog/zip/nac_export_descriptions_2020-11-20.zip .
unzip -DD nac_export_descriptions_2020-11-20.zip
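Because the unpacked catalog is so large, it may be worth checking the free space on the target filesystem before starting the download. The following is a minimal sketch, not part of the original script: the function name `has_free_space` is hypothetical, and it assumes a POSIX `df` and `awk` are available.

```bash
#!/usr/bin/bash

# Hypothetical helper (an assumption, not part of the original script):
# succeed only if the filesystem holding directory $1 has at least
# $2 gigabytes available.
has_free_space() {
    local dir="$1"
    local required_gb="$2"
    local avail_kb

    # df -P forces the portable one-line-per-filesystem layout and
    # -k reports sizes in 1024-byte blocks; field 4 of the second
    # line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')

    # Convert the requirement from GB to KB and compare.
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

Calling `has_free_space "$SCRATCH/WW2_US_logs/US_TNA_Catalog/" 240 || exit 1` before the first `aws s3 cp` would stop the script early on a disk that is too small. The 240 figure is an assumed budget (the 10 GB of zip files kept alongside the 230 GB of JSON); adjust it if the zip files are deleted after unpacking.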