pelinker.reporting¶
ClusteringFitMetrics
dataclass
¶
Fit-time clustering diagnostics at a fixed min_cluster_size.
Source code in pelinker/reporting.py
dbcv
instance-attribute
¶
HDBSCAN relative_validity_ when available.
ClusteringHyperparameters
dataclass
¶
HDBSCAN (and related) choices selected by the grid search / smoother.
Add fields here as more knobs participate in optimization; call sites then stay typed.
Source code in pelinker/reporting.py
ClusteringReport
dataclass
¶
Report containing clustering analysis results for one sample.
Source code in pelinker/reporting.py
best_score
instance-attribute
¶
DBCV (relative_validity_) at the chosen min_cluster_size (mean when from aggregate).
manifold_oov_cv = None
class-attribute
instance-attribute
¶
CV F1 summary for 3D manifold OOV model selection; None when disabled or infeasible.
n_clusters_emergent
instance-attribute
¶
Number of HDBSCAN clusters at the chosen min_cluster_size (noise label -1 excluded).
negative_screener_cv = None
class-attribute
instance-attribute
¶
Stratified CV metrics for LDA and linear SVM (negative vs KB); None when screening is off or infeasible.
number_properties
instance-attribute
¶
Count of distinct KB entity labels in the frame used for PCA→UMAP (excludes pelinker.onto.NEGATIVE_LABEL when screening).
pca_mahalanobis_label_01
instance-attribute
¶
Same mask as pca_residual_label_01 (repeated for per-metric plots).
pca_residual_label_01
instance-attribute
¶
1 iff entity == negative_label on that row (same length as pca_residuals).
pca_spectral_entropy_label_01
instance-attribute
¶
Same mask as pca_residual_label_01 (repeated for per-metric plots).
ClusteringSearchSummaryRow
dataclass
¶
One row of the model×layer clustering search table (singleton or fusion label).
Use :meth:to_flat_dict for CSV / pandas / heatmaps (legacy column names).
Source code in pelinker/reporting.py
to_flat_dict()
¶
Keys aligned with historical results.csv and plot_heatmap expectations.
Source code in pelinker/reporting.py
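Since `to_flat_dict` yields plain dicts with legacy column names, rows from many summary rows can be collected straight into a CSV (or a pandas DataFrame). A minimal sketch of that pattern using only the standard library; the column names below are illustrative, not the actual `results.csv` schema:

```python
import csv
import io

# Hypothetical flat rows, standing in for ClusteringSearchSummaryRow.to_flat_dict() output.
rows = [
    {"model": "bert-base", "layer": 8, "dbcv": 0.41},
    {"model": "bert-base", "layer": 9, "dbcv": 0.45},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)  # one CSV line per summary row
csv_text = buf.getvalue()
```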
HyperparameterSearchStats
dataclass
¶
Distribution of chosen hyperparameters across repeated clustering samples.
Source code in pelinker/reporting.py
MeanWithUncertainty
dataclass
¶
Sample mean and standard deviation (ddof=1) over repeated runs; std=0 for a single run.
Source code in pelinker/reporting.py
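The documented convention (sample std with `ddof=1`, defined as 0 for a single run) can be sketched with the standard library; the function name here is illustrative, not the actual constructor:

```python
from statistics import mean, stdev

def mean_with_uncertainty(values):
    """Sample mean and std (ddof=1) over repeated runs; std is 0.0 for one run."""
    m = mean(values)
    # stdev() uses ddof=1 and requires at least two points, hence the guard.
    s = stdev(values) if len(values) > 1 else 0.0
    return m, s
```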
MetricMeanStd
dataclass
¶
NegativeScreenerCvSummary
dataclass
¶
Cross-validated LDA vs linear SVM on the same binary negative-detection task.
Source code in pelinker/reporting.py
NegativeScreenerInSampleMetrics
dataclass
¶
Train-set precision / recall / F1 for detecting negative_label (binary label 1).
Source code in pelinker/reporting.py
ScreenerModelCvBlock
dataclass
¶
Precision / recall / F1 for detecting the negative class (label 1) on held-out folds.
Source code in pelinker/reporting.py
clustering_report_to_jsonable_dict(report)
¶
Flatten a :class:ClusteringReport into JSON-serializable built-ins (no DataFrames/ndarrays).
Intended for json.dumps or for pickling a stable, language-adjacent blob. Schema version
is stored under "schema" for forward compatibility.
Source code in pelinker/reporting.py
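A minimal sketch of the flattening convention described above, using a hypothetical stand-in dataclass (the real `ClusteringReport` has many more fields): everything becomes JSON-serializable built-ins and the schema version lands under `"schema"`.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TinyReport:
    # Illustrative stand-in for a couple of ClusteringReport fields.
    best_score: float
    n_clusters_emergent: int

def to_jsonable(report, schema=1):
    # Flatten to built-ins and record the schema version for forward compatibility,
    # mirroring the convention documented for clustering_report_to_jsonable_dict.
    d = asdict(report)
    d["schema"] = schema
    return d

blob = json.dumps(to_jsonable(TinyReport(best_score=0.42, n_clusters_emergent=7)))
```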
clustering_search_summary_row_from_flat_dict(row)
¶
Reconstruct :class:ClusteringSearchSummaryRow from :meth:to_flat_dict output.
Source code in pelinker/reporting.py
entity_negative_label_mask_01(entities, negative_label)
¶
Per-row binary labels aligned with entities: 1 if the row's entity equals
negative_label (same convention as the negative screener positive class), else 0.
Source code in pelinker/reporting.py
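The labeling convention is fully specified above, so it can be re-implemented in one line; this is an illustrative equivalent, not the library function itself:

```python
def entity_mask_01(entities, negative_label):
    # 1 where the row's entity equals negative_label (the screener's positive
    # class), else 0 — aligned row-for-row with the input sequence.
    return [1 if e == negative_label else 0 for e in entities]
```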
linker_fit_clustering_report_path(report_dir)
¶
Filesystem path for the fit-time :class:ClusteringReport JSON under report_dir.
Source code in pelinker/reporting.py
negative_screener_cv_summary_from_eval_dict(raw)
¶
Build a typed summary from :func:pelinker.negative_screener.evaluate_negative_screener_models output.
Source code in pelinker/reporting.py
summarize_clustering_reports_for_search(reports, *, model, layer, pooled_min_cluster_size=None)
¶
Aggregate repeated :class:ClusteringReport runs into one search summary row.
When pooled_min_cluster_size is set (after aggregating grid curves across samples),
best_size / best_size_std report that single consensus hyperparameter (std is 0)
and dbcv is the mean (and std) of each sample's DBCV at that grid point.
Otherwise (independent runs or legacy callers) best_size is the mean of per-report
chosen sizes and dbcv is the mean of each report's best_score.
Raises:

| Type | Description |
|---|---|
| ValueError | if |
Source code in pelinker/reporting.py
write_clustering_report_json(path, report, *, indent=2)
¶
Serialize report with :func:clustering_report_to_jsonable_dict to UTF-8 JSON.
Parent directories are created when missing.
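The two documented behaviors (create missing parent directories, write UTF-8 JSON) can be sketched with `pathlib`; the helper name is illustrative:

```python
import json
from pathlib import Path

def write_json_utf8(path, payload, *, indent=2):
    # Create parent directories when missing, then serialize to UTF-8 JSON,
    # mirroring the behavior documented for write_clustering_report_json.
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(payload, indent=indent), encoding="utf-8")
```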