pelinker.clustering_quality_checkpoint
Structured checkpoint I/O for run/analysis/clustering_quality.py runs.
ClusteringQualityCheckpoint (dataclass)
On-disk checkpoint for resumable clustering quality runs.
Source code in pelinker/clustering_quality_checkpoint.py
FailureRecord (dataclass)
combination_key_from_members(members)
Canonical key for a combination of (model, layer) embeddings.
Members are sorted lexicographically by (model, layer). Arity is len(members).
Source code in pelinker/clustering_quality_checkpoint.py
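The canonicalization described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `sketch_combination_key` is hypothetical, layers are assumed to be strings, and the `+` separator between members is an assumption. Only the `arity:model/layer` shape of a singleton key is taken from this page.

```python
from typing import Iterable, Tuple


def sketch_combination_key(members: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical sketch of an order-independent combination key.

    ASSUMPTION: members are joined with "+"; the real separator is not
    documented here. The leading "<arity>:" prefix follows the singleton
    form "1:model/layer" mentioned on this page.
    """
    # Sorting tuples sorts lexicographically by (model, layer).
    ordered = sorted(members)
    arity = len(ordered)
    body = "+".join(f"{model}/{layer}" for model, layer in ordered)
    return f"{arity}:{body}"
```

Because the members are sorted before joining, the same set of (model, layer) pairs yields the same key regardless of the order in which it is supplied.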
compute_run_fingerprint(config)
Stable SHA256 over a JSON-serializable config (sorted keys).
Source code in pelinker/clustering_quality_checkpoint.py
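A stable hash over a sorted-key JSON dump can be sketched with the standard library alone. The function name `sketch_run_fingerprint` and the compact `separators` setting are assumptions made for illustration; the actual function may serialize the config differently.

```python
import hashlib
import json


def sketch_run_fingerprint(config: dict) -> str:
    """Hypothetical sketch: SHA256 hex digest of a canonical JSON dump.

    ASSUMPTION: compact separators and UTF-8 encoding; only "sorted keys"
    and "SHA256" are stated on this page.
    """
    # sort_keys=True makes the serialization independent of dict insertion order.
    payload = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the keys is what makes the digest stable: two configs with the same contents but different insertion order hash to the same value, while any changed parameter produces a different fingerprint.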
fingerprint_config_from_cli(*, input_dir, umap_dim, pca_components, min_class_size, seed, frac, n_embedding_batches, batch_size, prefix, n_sample, selected_labels_kb_path, max_scale, min_scale=None, clustering_grid_step=5, negative_label=NEGATIVE_LABEL, screener_kind='lda')
Parameters that must match between checkpoint and resume.
--fusion-pairs and --fusion-triples are excluded because they only affect which
fusion jobs run. Changing them invalidates fusion rows via
reconcile_fusion_checkpoint_params rather than changing the run fingerprint.
--mode is also excluded, so a run can be resumed with a different mode (e.g. singletons
computed under all, then --mode fusion2) on the same data fingerprint.
Source code in pelinker/clustering_quality_checkpoint.py
model_layer_from_singleton_key(key)
Parse 1:model/layer into (model, layer).
Source code in pelinker/clustering_quality_checkpoint.py
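Parsing a singleton key might look like the sketch below. The name `sketch_parse_singleton_key` is hypothetical. Two assumptions are made: the layer is returned as a string, and the split happens on the last `/` so that model names containing slashes stay intact. The real function may differ on both points.

```python
from typing import Tuple


def sketch_parse_singleton_key(key: str) -> Tuple[str, str]:
    """Hypothetical sketch: split "1:model/layer" into (model, layer).

    ASSUMPTIONS: layer is kept as a string; the last "/" separates model
    from layer, so a model name may itself contain "/".
    """
    arity, colon, body = key.partition(":")
    if colon != ":" or arity != "1":
        raise ValueError(f"not a singleton key: {key!r}")
    model, slash, layer = body.rpartition("/")
    if not slash or not model or not layer:
        raise ValueError(f"malformed singleton key: {key!r}")
    return model, layer
```

Rejecting any arity prefix other than `1` keeps fusion keys from being silently misread as singletons.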
reconcile_fusion_checkpoint_params(ckpt, *, fusion_pairs, fusion_triples)
If fusion CLI counts changed, drop arity >= 2 checkpoint state so fusion is recomputed while singletons stay cacheable.
Returns how many distinct fusion combination keys were removed from completed / summaries (0 if fusion params already matched this run).
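The reconciliation behavior can be illustrated with a plain dict standing in for the checkpoint. Everything structural here is an assumption: the real ClusteringQualityCheckpoint is a dataclass whose fields are not listed on this page, so the `fusion_pairs` and `fusion_triples` entries, the set/dict types of `completed` and `summaries`, and the name `sketch_reconcile_fusion` are all hypothetical.

```python
def sketch_reconcile_fusion(ckpt: dict, *, fusion_pairs: int, fusion_triples: int) -> int:
    """Hypothetical sketch: drop arity >= 2 state when fusion params change.

    ASSUMPTIONS: ckpt is a dict with "completed" (set of keys), "summaries"
    (dict keyed by combination key), and the stored "fusion_pairs" /
    "fusion_triples" values; keys start with "<arity>:".
    """
    stored = (ckpt.get("fusion_pairs"), ckpt.get("fusion_triples"))
    if stored == (fusion_pairs, fusion_triples):
        return 0  # parameters already match; nothing to invalidate

    def is_fusion(key: str) -> bool:
        return int(key.split(":", 1)[0]) >= 2

    # Count each fusion key once, even if it appears in both containers.
    removed = {k for k in ckpt["completed"] if is_fusion(k)}
    removed |= {k for k in ckpt["summaries"] if is_fusion(k)}

    ckpt["completed"] = {k for k in ckpt["completed"] if not is_fusion(k)}
    ckpt["summaries"] = {k: v for k, v in ckpt["summaries"].items() if not is_fusion(k)}
    ckpt["fusion_pairs"], ckpt["fusion_triples"] = fusion_pairs, fusion_triples
    return len(removed)
```

Singleton entries survive the call untouched, which is what allows a resumed run to reuse them while recomputing only the fusion combinations.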