pelinker.clustering_fusion_ranking
Rank single-embedding runs for fusion experiments, using DBCV (Density-Based Clustering Validation) scores as a cheap proxy before running fused clustering.
singleton_items_by_dbcv_score(valid_files, score_by_model_layer)
Return one tuple per (model, layer) pair that has a score, containing its path and mean DBCV score, sorted best-first.
Source code in pelinker/clustering_fusion_ranking.py
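The ranking described above can be pictured with the minimal sketch below. It is not the pelinker implementation: the function name singleton_items_sketch, the input shapes (valid_files as a {(model, layer): path} dict, score_by_model_layer as a {(model, layer): [per-run DBCV scores]} dict) and the output tuple layout (model, layer, path, mean_dbcv) are all assumptions made for illustration. Since DBCV ranges from -1 to 1 with higher values indicating better clustering, "best-first" is taken to mean descending mean score.

```python
from statistics import mean


# Hypothetical input shapes (assumptions, not the pelinker API):
#   valid_files:          {(model, layer): path}
#   score_by_model_layer: {(model, layer): [dbcv_run_1, dbcv_run_2, ...]}
def singleton_items_sketch(valid_files, score_by_model_layer):
    """Return (model, layer, path, mean_dbcv) tuples, highest mean DBCV first."""
    items = []
    for (model, layer), path in valid_files.items():
        scores = score_by_model_layer.get((model, layer))
        if not scores:  # skip embeddings that have no DBCV score
            continue
        items.append((model, layer, path, mean(scores)))
    # Higher DBCV is better, so sort descending by score;
    # tie-break on (model, layer) for a deterministic order.
    items.sort(key=lambda t: (-t[3], t[0], t[1]))
    return items


files = {
    ("bert", 6): "runs/bert_l6.tsv",
    ("bert", 12): "runs/bert_l12.tsv",  # no score below, so it is dropped
    ("e5", 4): "runs/e5_l4.tsv",
}
scores = {("bert", 6): [0.25, 0.5], ("e5", 4): [0.5]}
print(singleton_items_sketch(files, scores))
# [('e5', 4, 'runs/e5_l4.tsv', 0.5), ('bert', 6, 'runs/bert_l6.tsv', 0.375)]
```

Averaging over runs before ranking keeps a single noisy run from dominating the order; whether pelinker averages per-run scores or receives precomputed means is not stated here.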
top_k_fusion_candidates_by_dbcv_proxy(items, order, k)
Return up to k distinct order-tuples of distinct embeddings with the highest sums of per-embedding DBCV scores, a cheap proxy computed before running fused clustering.
Each element is a tuple (paths in combination order, models, layers, sum_singleton_scores). Components within each combination are sorted lexicographically by (model, layer), which defines the combination's identity.
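A brute-force sketch of this selection, under stated assumptions, is shown below. The name top_k_fusion_candidates_sketch, the item layout (model, layer, path, mean_dbcv), and the reading of order as "number of embeddings per combination" are assumptions, not confirmed by the source. The sketch enumerates every combination with itertools.combinations and keeps the k best with heapq.nlargest; the real function may prune the search differently.

```python
import heapq
from itertools import combinations


# Hypothetical item shape (an assumption): (model, layer, path, mean_dbcv).
def top_k_fusion_candidates_sketch(items, order, k):
    """Best k combinations of `order` distinct embeddings, ranked by the sum
    of their singleton DBCV scores (brute-force enumeration)."""
    # Keep one entry per (model, layer) so every combination has distinct
    # embeddings; on duplicates, keep the higher-scoring entry.
    best = {}
    for model, layer, path, score in items:
        key = (model, layer)
        if key not in best or score > best[key][3]:
            best[key] = (model, layer, path, score)
    # Sorting the pool by (model, layer) makes combinations() emit each
    # combination with its components already in lexicographic order.
    pool = sorted(best.values(), key=lambda t: (t[0], t[1]))

    def as_candidate(combo):
        paths = tuple(c[2] for c in combo)
        models = tuple(c[0] for c in combo)
        layers = tuple(c[1] for c in combo)
        return (paths, models, layers, sum(c[3] for c in combo))

    candidates = (as_candidate(c) for c in combinations(pool, order))
    # Highest summed proxy score first; nlargest keeps generation order on ties.
    return heapq.nlargest(k, candidates, key=lambda cand: cand[3])


items = [
    ("e5", 4, "runs/e5_l4.tsv", 0.5),
    ("bert", 6, "runs/bert_l6.tsv", 0.375),
    ("bert", 12, "runs/bert_l12.tsv", 0.25),
]
for cand in top_k_fusion_candidates_sketch(items, order=2, k=2):
    print(cand)
# (('runs/bert_l6.tsv', 'runs/e5_l4.tsv'), ('bert', 'e5'), (6, 4), 0.875)
# (('runs/bert_l12.tsv', 'runs/e5_l4.tsv'), ('bert', 'e5'), (12, 4), 0.75)
```

The number of combinations grows as C(n, order), which is why a summed singleton score is attractive as a filter: it costs only additions, whereas each fused clustering run is expensive.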