Uploading and importing collections
Zegami builds a collection from two things: the images, and a metadata row behind each one. This guide covers every way to get those in — direct upload and the catalogue importers — plus what the processing pipeline does once they land.
Last updated 2026-05-29
The two ways in
You create a collection with the Create collection button on the dashboard, and add to it later from the Data tab of the collection’s settings. There are two broad paths:
- Direct upload — you supply the images (and usually a metadata sheet). You own the bytes; every per-image action (replace, rotate, remove) is available later.
- Catalogue import — Zegami mirrors an external source. The upstream catalogue stays the system of record; you re-import to pull fresh data rather than editing images in place.
Direct upload
The upload wizard walks three steps:
- Images: drop a .zip or a batch of image files. JPEG, PNG, TIFF, WebP, and BMP are supported. Drag-and-drop or click to pick.
- Metadata: upload a .csv or .tsv (tab-separated is auto-detected). One row per image. Optionally mark the sheet as authoritative so its row count drives the collection (rows without a matching image become blank-image rows).
- Review: confirm the image count, the metadata join, and the collection name, then create.
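For the Images step, it can help to build the .zip locally so that only supported files go in. The following is a minimal sketch, not part of Zegami: the function name `build_upload_zip` and the exact list of file suffixes are assumptions derived from the formats named above.

```python
import zipfile
from pathlib import Path

# Suffixes for the formats the guide lists (JPEG, PNG, TIFF, WebP, BMP).
# The exact suffixes the uploader accepts are an assumption.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".webp", ".bmp"}


def build_upload_zip(image_dir, zip_path):
    """Zip every supported image in image_dir (non-recursive); return the names added."""
    image_dir = Path(image_dir)
    added = []
    # Images are already compressed, so store them without recompression.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(image_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
                # Store under the bare filename so it can match the metadata sheet.
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

Storing each file under its bare filename keeps the names in the archive identical to the values you would put in the metadata sheet's filename column.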
The join key
The single most important field is the filename column — the column whose value matches each image’s filename. Zegami joins images to rows on this column. If it’s wrong (or it’s not the first column and you didn’t say so), rows silently fail to match and images come in with blank metadata. When in doubt, put the filename column first.
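Because a bad join fails silently, a local check before uploading can catch it early. The sketch below compares the filename column of a sheet with the files in an image folder. It is an illustration only: `check_join` is a hypothetical helper, and it assumes the match is exact and case-sensitive, which this guide does not confirm.

```python
import csv
from pathlib import Path


def check_join(metadata_path, image_dir, filename_column, delimiter=","):
    """Compare the sheet's filename column with the image files on disk."""
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if filename_column not in (reader.fieldnames or []):
            raise ValueError(f"column {filename_column!r} not found in {reader.fieldnames}")
        # Skip rows whose filename cell is empty or missing.
        sheet_names = {row[filename_column].strip() for row in reader if row[filename_column]}
    image_names = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    # Assumption: names must match exactly (case-sensitive, extension included).
    return {
        "matched": sorted(sheet_names & image_names),
        "rows_without_image": sorted(sheet_names - image_names),
        "images_without_row": sorted(image_names - sheet_names),
    }
```

For a .tsv sheet, pass `delimiter="\t"`. If `matched` is empty while both other lists are full, the wrong column was probably chosen as the join key.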
Catalogue imports
Instead of uploading, point Zegami at an external source and it mirrors the images + metadata for you. Supported sources include:
- IIIF — paste a manifest URL (museums, libraries, archives).
- DICOMweb / DICOM-WSI — radiology and pathology study servers.
- The Met, NASA, Europeana — open cultural and science catalogues.
- OAI-PMH — repository harvesting.
- TCIA, IDC, IDR — open medical / bioimaging archives.
- Kaggle, ALA — datasets and biodiversity records.
After an import the pipeline runs automatically. Catalogue collections show their source name and a Re-import button in the Data tab rather than the add/edit controls — re-importing is how you refresh them. See Data tab for the upload-based vs catalogue-sourced distinction in detail.
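For the IIIF source, you may want to confirm that a manifest URL resolves and see roughly how many images it describes before pasting it in. The sketch below relies only on the public IIIF Presentation API layout (canvases under `items` in version 3, under `sequences` in version 2). It says nothing about how Zegami's importer reads the manifest, and both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_manifest(url, timeout=30):
    """Download and parse a IIIF manifest (plain HTTP GET, no authentication)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def count_iiif_canvases(manifest):
    """Return the number of canvases in a parsed IIIF Presentation manifest."""
    # Presentation API 3.x: canvases are the manifest's top-level "items".
    if "items" in manifest:
        return sum(1 for item in manifest["items"] if item.get("type") == "Canvas")
    # Presentation API 2.x: canvases live inside each sequence.
    return sum(len(seq.get("canvases", [])) for seq in manifest.get("sequences", []))
```

Calling `count_iiif_canvases(fetch_manifest(url))` gives a rough expectation for the image count to compare against the imported collection.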
What the pipeline does
Once images and metadata are in, the processing pipeline builds the artifacts the viewer needs:
- Scale + Pyramid — thumbnails and zoomable deep-zoom tiles.
- Atlas (+ KTX2) — packed texture pages so thousands of tiles render on the GPU at once.
- Parquet — a columnar copy of the metadata for fast in-browser filtering of millions of rows.
- Optional analysis — similarity embeddings (CLIP), 2D layout (UMAP), clustering (TDA), and visual features. Toggle these in the Processing tab — see Processing & analysis.
A progress banner shows each step and an item count. The collection opens when the core steps finish; optional analysis can continue in the background.
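To get a feel for how much work the Pyramid step implies, the sketch below counts tiles per zoom level for a single image. It assumes a Deep Zoom style pyramid (each level halves the one above it, down to 1 pixel) with 256-pixel tiles and no overlap. Zegami's actual tile size and level scheme are not stated in this guide, so treat the numbers as an estimate.

```python
def pyramid_tile_counts(width, height, tile_size=256):
    """Return [(level, level_width, level_height, tiles)] from smallest to full size."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    # ceil(log2(longest side)), computed with integers to avoid float rounding.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        level_w = -(-width // scale)   # ceiling division
        level_h = -(-height // scale)
        tiles = (-(-level_w // tile_size)) * (-(-level_h // tile_size))
        levels.append((level, level_w, level_h, tiles))
    return levels
```

Under these assumptions a 1024 x 768 image has 11 levels and 25 tiles in total, 12 of them at full resolution.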
Troubleshooting
“Every image came in with blank metadata.” The filename/join column is wrong. Re-upload the metadata with the matching column first, or re-create the collection and specify the join column.
“Fewer images than rows (or vice-versa).” Rows only appear if they match an image by the join key — unless you marked the sheet authoritative, in which case unmatched rows appear as blank-image rows. Check the join key and whether the sheet was set authoritative.
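The row rule above can be written as a small model to predict what a given sheet will produce. This is a sketch of the behaviour as described in this guide, not Zegami's code. `rows_shown` is a hypothetical name, and the model covers metadata rows only, not images that have no row.

```python
def rows_shown(row_keys, image_names, authoritative=False):
    """Return (key, has_image) pairs for the metadata rows expected to appear."""
    images = set(image_names)
    shown = []
    for key in row_keys:
        if key in images:
            shown.append((key, True))
        elif authoritative:
            # Authoritative sheet: the unmatched row is kept as a blank-image row.
            shown.append((key, False))
    return shown
```

With rows `a.jpg`, `b.jpg`, `c.jpg` and images `a.jpg`, `c.jpg`, `d.jpg`, the model shows two rows by default and three when the sheet is authoritative, the extra one being `b.jpg` without an image.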
“Processing is stuck at 0 / N.” The job may be queued behind another collection. Watch the dashboard activity feed; if it stays stuck for more than ~15 minutes the job may need a server-side retry — contact support.
Next steps
- Add or fix images after creation: Data tab.
- Turn on similarity search and other analysis: Collection settings.
- Publish or share the result: Sharing & publishing.