Hour Manifold Discovery Experiment¶
Research Objective¶
We investigate whether language models (specifically GPT-2) represent temporal concepts such as the 24-hour cycle using interpretable topological structures. We test whether the internal activation space maps onto specific topological manifolds better than standard baselines.
Topological Hypotheses¶
We compare three distinct topological classes against standard baselines:
1. Figure-8 Topology (Lemniscates)¶
Models the 24-hour cycle as two distinct loops (AM/PM) meeting at a central crossing point.
- Rationale: Captures the linguistic and functional distinction between morning and afternoon/evening while maintaining continuity at midnight.
- Variants: Gerono, Bernoulli, and Twisted Lemniscates.
2. Toroidal Topology¶
Models time on the surface of a torus.
- Rationale: Encodes two nested periodicities (daily cycle + sub-cycles) without self-intersection points.
- Variants: Standard Torus paths with varying radii ratios.
3. Trefoil Knot ($3_1$ Knot)¶
A non-trivial knot that winds around a torus surface.
- Rationale: Represents a complex, self-embedded cycle where the path winds 3 times around the minor axis and 2 times around the major axis.
- Significance: Tests for higher-complexity cyclic structures beyond simple circles.
Methodology¶
We utilize Supervised Multidimensional Scaling (SMDS) to map GPT-2 hidden states to target 3D manifolds.
- Data Generation: Synthesized datasets with 240 samples each (10 per hour) across 5 random seeds, 1,200 samples in total.
- Varied contexts using 51 unique names and 40 distinct actions.
- Model & Extraction: GPT-2 Small.
- Extraction of hidden states at the temporal token position (Layer 6).
- Manifold Definitions: Baselines: Standard shapes (Circle, Spiral, Helix, Linear).
- Hypotheses: Analytically defined 3D coordinates for Figure-8, Torus, and Trefoil shapes.
- Evaluation: Train/Test Split (80/20): Models are fitted on training data; scores are reported on unseen test data to verify structural generalization.
- Metric: Stress score (lower is better), measuring the distortion required to map neural activations to the target geometry.
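To make the metric concrete, here is a minimal sketch of a Kruskal-style normalized stress between two point configurations. The exact normalization used by the smds package may differ; this illustrates only the general idea that stress measures pairwise-distance distortion.

```python
import numpy as np
from scipy.spatial.distance import pdist

def normalized_stress(embedded: np.ndarray, target: np.ndarray) -> float:
    """Kruskal-style normalized stress between two point configurations."""
    d_emb = pdist(embedded)  # pairwise distances in the fitted space
    d_tgt = pdist(target)    # ideal pairwise distances on the target manifold
    return float(np.sqrt(np.sum((d_emb - d_tgt) ** 2) / np.sum(d_tgt ** 2)))

pts = np.random.default_rng(0).normal(size=(24, 3))
print(normalized_stress(pts, pts))        # → 0.0 (identical configurations)
print(normalized_stress(pts, pts * 2.0))  # → 0.5 (uniform scale mismatch counts as distortion)
```

A perfect mapping gives zero stress; any distortion, including a pure scale mismatch, raises it, which is why the pipeline also reports scale-normalized variants.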
import warnings
from pathlib import Path
from typing import List, Tuple
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
from PIL import Image
from scipy import stats
from transformers import GPT2Model, GPT2Tokenizer
from smds import UserProvidedSMDSParametrization
from smds.pipeline import open_dashboard
from smds.pipeline.discovery_pipeline import discover_manifolds
from smds.shapes.continuous_shapes import CircularShape, EuclideanShape, LogLinearShape, SemicircularShape, SpiralShape
from smds.shapes.discrete_shapes import (
ChainShape,
ClusterShape,
DiscreteCircularShape,
)
warnings.filterwarnings("ignore")
sns.set_style("whitegrid")
plt.rcParams["figure.dpi"] = 100
%matplotlib inline
1. Configuration¶
Experimental Parameters¶
- Random seeds: Multiple independent datasets to assess consistency
- Samples per hour: Balance between statistical power and computational cost
- GPT-2 layer: Middle layer (6/12) where semantic representations are typically strongest
- Cross-validation: 5-fold to ensure robust generalization
Data Diversity¶
We maximize stimulus variability to probe robust representations:
- 4 time formats: 12-hour AM/PM, 24-hour, o'clock notation, natural language
- 51 unique names: Avoid tokenization artifacts
- 40 actions: Diverse semantic contexts
RANDOM_SEEDS = [42, 123, 456, 789, 1024]
N_SAMPLES_PER_HOUR = 10
GPT2_LAYER = 6
N_FOLDS = 5
EXPERIMENT_NAME = "Hour_Manifold_Comprehensive"
NAMES = [
"Alice",
"Bob",
"Charlie",
"George",
"Kevin",
"Laura",
"Michael",
"Rachel",
"William",
"Aaron",
"Ian",
"Kyle",
"Martin",
"Rose",
"Marco",
"Andrew",
"Frank",
"Henry",
"Jack",
"Leon",
"Peter",
"Scott",
"Grant",
"Neil",
"Dean",
"Hope",
"April",
"Connor",
"Brandon",
"Joy",
"Emily",
"Hunter",
"Tyler",
"Blake",
"Dallas",
"Walker",
"John",
"Fred",
"Steve",
"Matt",
"Luke",
"Richard",
"Maria",
"Jerry",
"Robert",
"Mark",
"Max",
"Jason",
"Alex",
"Josh",
"Ryan",
]
ACTIONS = [
"walked the dog",
"made coffee",
"read a book",
"went to sleep",
"ate lunch",
"called a friend",
"watched a movie",
"wrote a letter",
"cleaned the house",
"went for a run",
"cooked dinner",
"played the piano",
"studied for the exam",
"watered the plants",
"checked emails",
"did yoga",
"baked a cake",
"painted a picture",
"fixed the car",
"shopped for groceries",
"meditated",
"took a shower",
"brushed teeth",
"turned off the lights",
"opened the window",
"locked the door",
"started the meeting",
"finished work",
"planned the trip",
"listened to music",
"charged the phone",
"fed the cat",
"drank some tea",
"organized the desk",
"took a nap",
"solved a puzzle",
"played chess",
"wrote code",
"debugged the program",
"deployed the app",
]
TIME_FORMATS = ["12h_am_pm", "24h_colon", "12h_oclock", "natural"]
print("Experimental Configuration:")
print(f" Independent datasets: {len(RANDOM_SEEDS)}")
print(f" Total samples per dataset: {N_SAMPLES_PER_HOUR * 24}")
print(f" Total samples across all datasets: {N_SAMPLES_PER_HOUR * 24 * len(RANDOM_SEEDS)}")
print(f" GPT-2 layer: {GPT2_LAYER}/12")
print(f" Cross-validation folds: {N_FOLDS}")
print("\nStimulus Diversity:")
print(f" Time formats: {len(TIME_FORMATS)}")
print(f" Unique names: {len(NAMES)}")
print(f" Unique actions: {len(ACTIONS)}")
print(f" Theoretical unique sentences: {len(NAMES) * len(ACTIONS) * len(TIME_FORMATS) * 24:,}")
Experimental Configuration:
  Independent datasets: 5
  Total samples per dataset: 240
  Total samples across all datasets: 1200
  GPT-2 layer: 6/12
  Cross-validation folds: 5

Stimulus Diversity:
  Time formats: 4
  Unique names: 51
  Unique actions: 40
  Theoretical unique sentences: 195,840
2. Data Generation Functions¶
Time Format Conversion¶
We implement 4 distinct time representations to test format-invariance:
- 12h_am_pm: "3pm", "11am" (concise)
- 24h_colon: "15:00", "23:00" (international standard)
- 12h_oclock: "3 o'clock in the afternoon" (verbose)
- natural: "three in the afternoon" (natural language)
def format_time(hour: int, format_type: str) -> str:
"""Format hour (0-23) to string for given format type."""
if format_type == "12h_am_pm":
if hour == 0:
return "12am"
elif hour < 12:
return f"{hour}am"
elif hour == 12:
return "12pm"
else:
return f"{hour - 12}pm"
elif format_type == "24h_colon":
return f"{hour:02d}:00"
elif format_type == "12h_oclock":
h = hour if hour <= 12 else hour - 12
h = 12 if h == 0 else h
period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
return f"{h} o'clock in the {period}"
elif format_type == "natural":
numbers = [
"zero",
"one",
"two",
"three",
"four",
"five",
"six",
"seven",
"eight",
"nine",
"ten",
"eleven",
"twelve",
]
h = hour if hour <= 12 else hour - 12
h = 12 if h == 0 else h
period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
return f"{numbers[h]} in the {period}"
return str(hour)
def generate_time_dataset(n_samples_per_hour: int, seed: int, time_formats: List[str]) -> Tuple[List[str], List[int]]:
"""Generate sentences with hours and return (sentences, hours)."""
sentences = []
hours = []
rng = np.random.default_rng(seed)
for hour in range(24):
for _ in range(n_samples_per_hour):
name = rng.choice(NAMES)
action = rng.choice(ACTIONS)
format_type = rng.choice(time_formats)
time_str = format_time(hour, format_type)
sentence = f"{name} {action} at {time_str}."
sentences.append(sentence)
hours.append(hour)
indices = rng.permutation(len(sentences))
sentences = [sentences[i] for i in indices]
hours = [hours[i] for i in indices]
return sentences, hours
print("Example sentences with different time formats:")
print()
for fmt in TIME_FORMATS:
example_hour = 14
formatted = format_time(example_hour, fmt)
print(f" {fmt:15s}: Alice walked the dog at {formatted}.")
print("\nExample sentences across different hours:")
print()
for hour in [0, 6, 12, 18, 23]:
formatted = format_time(hour, "12h_am_pm")
print(f" Hour {hour:2d}: Bob made breakfast at {formatted}.")
Example sentences with different time formats:

  12h_am_pm      : Alice walked the dog at 2pm.
  24h_colon      : Alice walked the dog at 14:00.
  12h_oclock     : Alice walked the dog at 2 o'clock in the afternoon.
  natural        : Alice walked the dog at two in the afternoon.

Example sentences across different hours:

  Hour  0: Bob made breakfast at 12am.
  Hour  6: Bob made breakfast at 6am.
  Hour 12: Bob made breakfast at 12pm.
  Hour 18: Bob made breakfast at 6pm.
  Hour 23: Bob made breakfast at 11pm.
3. Load GPT-2 Model¶
We use GPT-2 (small, 117M parameters):
print("Loading GPT-2 model...")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
print("\nModel loaded successfully:")
print(f" Architecture: {model.config.model_type}")
print(f" Hidden size: {model.config.hidden_size}")
print(f" Number of layers: {model.config.n_layer}")
print(f" Number of attention heads: {model.config.n_head}")
print(f" Vocabulary size: {model.config.vocab_size:,}")
Loading GPT-2 model...

Model loaded successfully:
  Architecture: gpt2
  Hidden size: 768
  Number of layers: 12
  Number of attention heads: 12
  Vocabulary size: 50,257
Layer Selection: We focus on the middle layer (Layer 6) of GPT-2 Small.
- Rationale: Research shows that the model learns in a hierarchy: lower layers focus on basic word forms, while middle layers capture grammar and structure. This structural understanding builds the foundation for the complex meanings found in the upper layers. (Hewitt, J., & Manning, C. D. (2019). A Structural Probe for Finding Syntax in Word Representations.).
def extract_hour_activations(sentences: List[str], layer_idx: int = 6) -> np.ndarray:
    """Extract GPT-2 hidden states at the hour token for each sentence."""
    activations = []
    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", padding=False)
            outputs = model(**inputs, output_hidden_states=True)
            hidden_states = outputs.hidden_states[layer_idx]
            tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
            # Heuristic: pick the first token containing a time marker.
            # Caveat: substring matching can occasionally fire on other words
            # (e.g. the "am" inside "William"), selecting an earlier token.
            hour_token_idx = -1
            for idx, token in enumerate(tokens):
                token_lower = token.lower()
                if any(x in token_lower for x in ["am", "pm", ":", "clock", "morning", "afternoon", "evening"]):
                    hour_token_idx = idx
                    break
            if hour_token_idx == -1:
                # Fallback: the second-to-last token (the word before the final period)
                hour_token_idx = -2
            activation = hidden_states[0, hour_token_idx, :].numpy()
            activations.append(activation)
    return np.array(activations)
4. Generate Multiple Datasets and Extract Activations¶
Reproducibility Protocol¶
We generate 5 independent datasets with different random seeds to:
- Assess consistency of topological structures across stimuli
- Compute confidence intervals for shape goodness-of-fit
- Test generalization beyond specific name-action combinations
Each dataset contains 240 samples (10 per hour × 24 hours).
datasets = []
print(f"Generating {len(RANDOM_SEEDS)} independent datasets...\n")
print("=" * 80)
for seed_idx, seed in enumerate(RANDOM_SEEDS):
print(f"\nDataset {seed_idx + 1}/{len(RANDOM_SEEDS)} (seed={seed})")
print("-" * 80)
sentences, hours = generate_time_dataset(
n_samples_per_hour=N_SAMPLES_PER_HOUR, seed=seed, time_formats=TIME_FORMATS
)
print(f" Generated {len(sentences)} sentences")
print(f' Example: "{sentences[0]}"')
print(f" Extracting GPT-2 activations from layer {GPT2_LAYER}...")
X_activations = extract_hour_activations(sentences, layer_idx=GPT2_LAYER)
datasets.append({"seed": seed, "sentences": sentences, "hours": np.array(hours), "activations": X_activations})
print(f" Activations shape: {X_activations.shape}")
print(f" Statistics: mean={X_activations.mean():.3f}, std={X_activations.std():.3f}")
print(f" Range: [{X_activations.min():.3f}, {X_activations.max():.3f}]")
print("\n" + "=" * 80)
print(f"Total datasets prepared: {len(datasets)}")
print(f"Total samples: {sum(len(d['sentences']) for d in datasets)}")
Generating 5 independent datasets...

================================================================================

Dataset 1/5 (seed=42)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Scott ate lunch at 16:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.161, std=19.866
  Range: [-65.555, 2875.865]

Dataset 2/5 (seed=123)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Jack went for a run at 1pm."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.085, std=14.208
  Range: [-65.555, 2875.865]

Dataset 3/5 (seed=456)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Jason started the meeting at two in the morning."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.141, std=18.615
  Range: [-65.555, 2875.865]

Dataset 4/5 (seed=789)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Peter watered the plants at 10:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.144, std=18.610
  Range: [-65.555, 2875.865]

Dataset 5/5 (seed=1024)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Marco studied for the exam at 17:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.216, std=23.220
  Range: [-65.555, 2875.865]

================================================================================
Total datasets prepared: 5
Total samples: 1200
5. Define Baseline Shapes¶
Baseline Hypothesis Set¶
We test 10 standard baseline configurations to serve as a reference point for our 3D topological hypotheses. These represent simpler, lower-dimensional structural assumptions:
Continuous (7):
- Circular (2 variants): The standard representation of cyclic time (different radii).
- Spiral (2 variants): Combines cyclicity with linear progression (different winding tightness).
- Euclidean & LogLinear: Standard linear and log-linear (Weber-Fechner) distance assumptions.
- Semicircular: Tests for partial cyclicity.
Discrete (3):
- Chain: Represents time as a sequential path without cyclicity.
- Cluster: Represents time as unordered, distinct categorical groups.
- DiscreteCircular: A step-wise cyclic representation.
Configuration Rationale¶
These baselines test if the model's representation is merely linear or simply cyclic in 2D, before we test the 3D topological hypotheses (Figure-8, Torus).
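The contrast between the linear and cyclic baselines comes down to the target distance they assume between hours. A minimal sketch (illustrative helper functions, not the smds internals):

```python
def euclidean_hour_distance(h1: int, h2: int) -> float:
    """Linear assumption: 23:00 and 00:00 are maximally far apart."""
    return float(abs(h1 - h2))

def circular_hour_distance(h1: int, h2: int) -> float:
    """Cyclic assumption: distances wrap around the 24-hour clock."""
    d = abs(h1 - h2) % 24
    return float(min(d, 24 - d))

print(euclidean_hour_distance(23, 0))  # → 23.0
print(circular_hour_distance(23, 0))   # → 1.0  (midnight wrap)
```

If GPT-2's activations place 23:00 near 00:00, the cyclic target distances will incur much less stress than the linear ones.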
all_shapes = [
CircularShape(radious=1.0, normalize_labels=True),
CircularShape(radious=2.0, normalize_labels=True),
EuclideanShape(normalize_labels=True),
LogLinearShape(normalize_labels=True),
SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=2.0),
SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=3.0),
SemicircularShape(normalize_labels=True),
ChainShape(threshold=2.0, normalize_labels=False),
ClusterShape(),
DiscreteCircularShape(),
]
print(f"Baseline Shape Configurations: {len(all_shapes)} total\n")
print("=" * 80)
print("\n1. Continuous Variants (2D/1D):")
print(" - Circular (x2): Standard cyclic clock model (R=1.0, R=2.0)")
print(" - Spiral (x2): Expanding cycle (2 turns, 3 turns)")
print(" - Linear (x3): Euclidean, LogLinear (Weber-Fechner), Semicircular")
print("\n2. Discrete Variants:")
print(" - Chain (x1): Sequential neighbor connectivity")
print(" - Cluster (x1): Unordered distinct hour states")
print(" - DiscCirc (x1): Step-wise cyclic representation")
print("=" * 80)
Baseline Shape Configurations: 10 total

================================================================================

1. Continuous Variants (2D/1D):
   - Circular (x2): Standard cyclic clock model (R=1.0, R=2.0)
   - Spiral (x2): Expanding cycle (2 turns, 3 turns)
   - Linear (x3): Euclidean, LogLinear (Weber-Fechner), Semicircular

2. Discrete Variants:
   - Chain (x1): Sequential neighbor connectivity
   - Cluster (x1): Unordered distinct hour states
   - DiscCirc (x1): Step-wise cyclic representation
================================================================================
6. Define Specific Manifold Generators¶
Mathematical Parametrizations¶
We directly generate coordinates for six manifold configurations (five parametrized curves plus a 2D trefoil projection) based on the implementations below:
1. Gerono Lemniscate¶
Classic figure-8 in 3D with a vertical wave modulation: $$x(t) = \cos(t)$$ $$y(t) = \sin(t)\cos(t)$$ $$z(t) = 0.5 \sin(t)\sin(t/2)$$
Mathematical source: Wolfram MathWorld: Eight Curve
2. Bernoulli Lemniscate¶
Infinity symbol with saddle curvature: $$x(t) = \frac{\sqrt{2}\cos(t)}{\sin^2(t) + 1}$$ $$y(t) = \frac{\sqrt{2}\cos(t)\sin(t)}{\sin^2(t) + 1}$$ $$z(t) = \frac{\sin(2t)}{4}$$
Mathematical source: Wolfram MathWorld: Lemniscate
3. Twisted Figure-8¶
Strongly twisted Lissajous variant: $$x(t) = \sin(t)$$ $$y(t) = \frac{\sin(2t)}{2}$$ $$z(t) = \sin(t)\cos(t)$$
Mathematical source: Wolfram MathWorld: Lissajous Curve
Key property: All figure-8 variants have a crossing point (self-intersection) at the center of the curve, representing the cyclic "midnight" transition.
4. Torus Path (Helical Trace)¶
A specific path winding around a torus surface: $$x(t) = (R + r\cos(v))\cos(t)$$ $$y(t) = (R + r\cos(v))\sin(t)$$ $$z(t) = r\sin(v)$$
where:
- $v = \text{ratio} \cdot t$ (determines the winding)
- $R$ = major radius (distance from center to tube center)
- $r$ = minor radius (tube thickness)
Mathematical source: Wolfram MathWorld: Torus
5. Trefoil Knot ($3_1$ Knot)¶
A non-trivial topological knot with three distinct lobes (scaled by factor $1/3$ for normalization): $$x(t) = \frac{1}{3} (\sin(t) + 2\sin(2t))$$ $$y(t) = \frac{1}{3} (\cos(t) - 2\cos(2t))$$ $$z(t) = \frac{1}{3} (-\sin(3t))$$
Mathematical source: Wolfram MathWorld: Trefoil Knot
Key properties:
- Torus: No self-intersections (smooth manifold), doubly periodic.
- Trefoil: A non-planar closed loop that winds around itself without self-intersection points.
def generate_figure8_gerono(n_points: int = 1000) -> np.ndarray:
"""Gerono lemniscate (figure-8) 3D points."""
t = np.linspace(0, 2 * np.pi, n_points)
x = np.cos(t)
y = np.sin(t) * np.cos(t)
z = 0.5 * np.sin(t) * np.sin(t / 2)
return np.stack([x, y, z], axis=1)
def generate_figure8_bernoulli(n_points: int = 1000) -> np.ndarray:
"""Bernoulli lemniscate (figure-8) 3D points."""
t = np.linspace(-np.pi, np.pi, n_points)
a = 1.0
denom = np.sin(t) ** 2 + 1
x = a * np.sqrt(2) * np.cos(t) / denom
y = a * np.sqrt(2) * np.cos(t) * np.sin(t) / denom
z = np.sin(2 * t) / 4
return np.stack([x, y, z], axis=1)
def generate_figure8_twisted(n_points: int = 1000) -> np.ndarray:
"""Twisted lemniscate (figure-8) 3D points."""
t = np.linspace(0, 2 * np.pi, n_points)
x = np.sin(t)
y = np.sin(2 * t) / 2
z = np.sin(t) * np.cos(t)
return np.stack([x, y, z], axis=1)
def generate_torus_path(n_points: int = 1000, R: float = 2.0, r: float = 1.0, ratio: float = 1.0) -> np.ndarray:
"""Torus path 3D points (major R, minor r, winding ratio)."""
t = np.linspace(0, 2 * np.pi, n_points)
v = ratio * t
x = (R + r * np.cos(v)) * np.cos(t)
y = (R + r * np.cos(v)) * np.sin(t)
z = r * np.sin(v)
return np.stack([x, y, z], axis=1)
def generate_trefoil_knot(n_points: int = 1000) -> np.ndarray:
"""Trefoil knot 3D points."""
t = np.linspace(0, 2 * np.pi, n_points)
x = np.sin(t) + 2 * np.sin(2 * t)
y = np.cos(t) - 2 * np.cos(2 * t)
z = -np.sin(3 * t)
return np.stack([x, y, z], axis=1) / 3.0
def map_hours_to_manifold(hours: np.ndarray, manifold_points: np.ndarray) -> np.ndarray:
"""Map hour indices to manifold coordinates by nearest point."""
n_points = manifold_points.shape[0]
indices = np.round((hours / 24.0) * (n_points - 1)).astype(int)
indices = np.clip(indices, 0, n_points - 1)
return manifold_points[indices]
def generate_trefoil_knot_2d(n_points: int = 1000) -> np.ndarray:
"""2D projection of trefoil knot (xy plane)."""
t = np.linspace(0, 2 * np.pi, n_points)
x = np.sin(t) + 2 * np.sin(2 * t)
y = np.cos(t) - 2 * np.cos(2 * t)
return np.stack([x, y], axis=1) / 3.0
topological_configs = [
("Gerono", generate_figure8_gerono(200)),
("Bernoulli", generate_figure8_bernoulli(200)),
("Twisted", generate_figure8_twisted(200)),
("Torus_Path", generate_torus_path(200, R=2.0, r=1.0)),
("Trefoil_Knot_3D", generate_trefoil_knot(200)),
("Trefoil_Knot_2D", generate_trefoil_knot_2d(200)),
]
print(f"Integrating {len(topological_configs)} fixed topological hypotheses...\n")
for name, template_points in topological_configs:
ndim = template_points.shape[1]
hypothesis = UserProvidedSMDSParametrization(
n_components=ndim, fixed_template=template_points, mapper=map_hours_to_manifold, name=name
)
hypothesis.name = name
all_shapes.append(hypothesis)
Integrating 6 fixed topological hypotheses...
7. List all Shapes to test¶
print("=" * 80)
print(f"Final Hypothesis List ({len(all_shapes)} Candidates):")
print("-" * 60)
for i, shape in enumerate(all_shapes):
if hasattr(shape, "name"):
display_name = f"Fixed: {shape.name}"
else:
display_name = shape.__class__.__name__
if hasattr(shape, "n_components"):
dim = f"{shape.n_components}D"
else:
dim = "2D"
print(f"{i + 1:02d}. {display_name:<30} | {dim}")
print("=" * 80)
================================================================================
Final Hypothesis List (16 Candidates):
------------------------------------------------------------
01. CircularShape                  | 2D
02. CircularShape                  | 2D
03. EuclideanShape                 | 2D
04. LogLinearShape                 | 2D
05. SpiralShape                    | 2D
06. SpiralShape                    | 2D
07. SemicircularShape              | 2D
08. ChainShape                     | 2D
09. ClusterShape                   | 2D
10. DiscreteCircularShape          | 2D
11. Fixed: Gerono                  | 3D
12. Fixed: Bernoulli               | 3D
13. Fixed: Twisted                 | 3D
14. Fixed: Torus_Path              | 3D
15. Fixed: Trefoil_Knot_3D         | 3D
16. Fixed: Trefoil_Knot_2D         | 2D
================================================================================
8. Visualize Topological Hypotheses¶
We visualize the generated 3D and 2D manifolds to verify their topological properties before using them as targets for the MDS mapping.
The plots below compare:
- Figure-8 Variants: Gerono, Bernoulli, and Twisted Lemniscates.
- Toroidal Variants: Torus path and Trefoil Knot.
fig = plt.figure(figsize=(20, 10))
selected_configs = [
("Gerono Lemniscate", topological_configs[0][1]),
("Bernoulli Lemniscate", topological_configs[1][1]),
("Twisted Figure-8", topological_configs[2][1]),
("Torus (R=2, r=1)", topological_configs[3][1]),
("Trefoil Knot", topological_configs[4][1]),
("Trefoil Knot 2D", topological_configs[5][1]),
]
for idx, (name, manifold) in enumerate(selected_configs):
ndim = manifold.shape[1]
hours_normalized = np.linspace(0, 24, manifold.shape[0])
if ndim == 2:
ax = fig.add_subplot(2, 3, idx + 1)
scatter = ax.scatter(manifold[:, 0], manifold[:, 1], c=hours_normalized, cmap="twilight", s=20, alpha=0.6)
ax.plot(manifold[:, 0], manifold[:, 1], "gray", linewidth=1, alpha=0.4)
ax.scatter(
manifold[0, 0],
manifold[0, 1],
c="red",
s=200,
marker="*",
edgecolors="black",
linewidths=2,
label="Midnight (t=0)",
zorder=10,
)
ax.set_xlabel("X", fontsize=10)
ax.set_ylabel("Y", fontsize=10)
ax.set_aspect("equal", adjustable="datalim")
else:
ax = fig.add_subplot(2, 3, idx + 1, projection="3d")
scatter = ax.scatter(
manifold[:, 0], manifold[:, 1], manifold[:, 2], c=hours_normalized, cmap="twilight", s=20, alpha=0.6
)
ax.plot(manifold[:, 0], manifold[:, 1], manifold[:, 2], "gray", linewidth=1, alpha=0.4)
ax.scatter(
manifold[0, 0],
manifold[0, 1],
manifold[0, 2],
c="red",
s=200,
marker="*",
edgecolors="black",
linewidths=2,
label="Midnight (t=0)",
zorder=10,
)
ax.set_xlabel("X", fontsize=10)
ax.set_ylabel("Y", fontsize=10)
ax.set_zlabel("Z", fontsize=10)
ax.view_init(elev=20, azim=45)
ax.set_title(name, fontsize=14, fontweight="bold")
ax.legend(fontsize=9)
cbar = plt.colorbar(scatter, ax=ax, shrink=0.6, pad=0.1)
cbar.set_label("Hour", fontsize=9)
plt.suptitle("Topological Hypotheses: Figure-8 and Toroidal Parametrizations", fontsize=16, fontweight="bold", y=0.98)
plt.tight_layout()
plt.show()
9. Run Comprehensive Manifold Discovery¶
SMDS Pipeline¶
For each dataset and shape configuration:
- Distance Calculation: Compute the ideal pairwise distance matrix ($D_{target}$) based on the shape's geometry.
- Optimization: Find the linear projection ($W$) of the GPT-2 activations that minimizes the stress (difference between activation distances and target distances).
- Cross-Validation: Train on 80% of the data, evaluate the stress score on the held-out 20% (5-fold CV).
- Aggregation: Average the normalized stress scores across all folds and datasets.
Metric: The pipeline's scale-normalized fit score (higher is better); the underlying raw stress measures distortion (lower is better).
Computational Load: The pipeline executes 400 SMDS fits in total:
- 5 Independent Datasets
- 16 Shape Configurations
- 5 Cross-Validation Folds per shape/dataset
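The core fit in step 2 can be sketched as learning a linear projection W by gradient descent on normalized stress. This is an illustrative toy version under stated assumptions, not the library's implementation; all names here are hypothetical:

```python
import torch

def pairwise_dist(Y: torch.Tensor) -> torch.Tensor:
    """Pairwise Euclidean distances (small epsilon keeps gradients finite)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return (diff.pow(2).sum(-1) + 1e-9).sqrt()

def fit_projection(X: torch.Tensor, D_target: torch.Tensor,
                   n_components: int = 3, n_steps: int = 500, lr: float = 0.05) -> torch.Tensor:
    """Learn W so that distances of X @ W approximate D_target."""
    W = torch.randn(X.shape[1], n_components, requires_grad=True)
    opt = torch.optim.Adam([W], lr=lr)
    for _ in range(n_steps):
        D = pairwise_dist(X @ W)
        loss = ((D - D_target) ** 2).sum() / (D_target ** 2).sum()  # squared normalized stress
        opt.zero_grad()
        loss.backward()
        opt.step()
    return W.detach()

# Toy check: activations that contain the target geometry among noise
# dimensions can be projected down to low stress.
torch.manual_seed(0)
Z = torch.randn(24, 3)                         # hidden 3D structure
X = torch.cat([Z, torch.randn(24, 13)], dim=1) # embedded in 16D with noise dims
D_target = pairwise_dist(Z)
W = fit_projection(X, D_target)
final = (((pairwise_dist(X @ W) - D_target) ** 2).sum() / (D_target ** 2).sum()).sqrt()
print(f"final stress: {final:.3f}")
```

In the actual pipeline this fit happens on the 80% training split and the stress is then re-evaluated on the held-out 20%, per fold.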
all_results = []
visualization_paths = []
last_csv_path = None
print("Running comprehensive manifold discovery...\n")
n_fits = len(datasets) * len(all_shapes) * N_FOLDS
print(f"Total fits: {len(datasets)} datasets × {len(all_shapes)} shapes × {N_FOLDS} folds = {n_fits}\n")
print("=" * 80)
for dataset_idx, dataset in enumerate(datasets):
print(f"\nDataset {dataset_idx + 1}/{len(datasets)} (seed={dataset['seed']})")
print("-" * 80)
results_df, csv_path = discover_manifolds(
dataset["activations"],
dataset["hours"],
shapes=all_shapes,
n_folds=N_FOLDS,
n_jobs=-1,
experiment_name=f"{EXPERIMENT_NAME}_seed{dataset['seed']}",
save_results=True,
create_png_visualization=True,
clear_cache=True,
)
results_df["dataset_seed"] = dataset["seed"]
results_df["dataset_idx"] = dataset_idx
all_results.append(results_df)
if csv_path:
last_csv_path = csv_path
result_dir = Path(csv_path).parent
viz_path = result_dir / f"{result_dir.name}_visualized.png"
if viz_path.exists():
visualization_paths.append(viz_path)
print("\nTop 5 shapes for this dataset:")
display_cols = [
col for col in results_df.columns if any(x in col.lower() for x in ["shape", "mean", "stress", "score"])
]
if display_cols:
print(results_df[display_cols].head(5).to_string(index=False))
combined_results = pd.concat(all_results, ignore_index=True)
print("\n" + "=" * 80)
print(f"Discovery complete: {combined_results.shape[0]} total results")
print(f"Visualization plots collected: {len(visualization_paths)}")
Running comprehensive manifold discovery...
Total fits: 5 datasets × 16 shapes × 5 folds = 400
================================================================================
Dataset 1/5 (seed=42)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641_visualized.png
Cache cleared
Top 5 shapes for this dataset:
shape mean_scale_normalized_stress std_scale_normalized_stress fold_scale_normalized_stress mean_non_metric_stress std_non_metric_stress fold_non_metric_stress mean_shepard_goodness_score std_shepard_goodness_score fold_shepard_goodness_score mean_normalized_stress std_normalized_stress fold_normalized_stress mean_normalized_kl_divergence
Trefoil_Knot_3D 0.611761 0.005860 [0.6061050047733393, 0.6161237116212532, 0.6041266172716558, 0.6195831209129936, 0.6128674326793493] 0.885646 0.002449 [0.8828562561770309, 0.8876638867982793, 0.8824918278776901, 0.8871257359067307, 0.8880924774459765] 0.521005 0.018450 [0.5130629612636001, 0.525210453979686, 0.5061080301417783, 0.5550763836656577, 0.5055651982981811] 0.605928 0.009781 [0.5989431395999996, 0.6159891798017821, 0.5917766239445754, 0.617243890781432, 0.6056850753916797] 0.941224
SpiralShape 0.593058 0.022953 [0.6176201410295181, 0.5883016052408122, 0.5851936592913921, 0.556467995505818, 0.6177065670043709] 0.874233 0.010365 [0.8852138623782198, 0.8739994513711462, 0.8686892820269819, 0.8579887229219502, 0.8852753503377709] 0.536947 0.055237 [0.5896699927583219, 0.519688980058846, 0.5240548170837026, 0.4489872126833727, 0.6023321489178431] 0.588821 0.026006 [0.6136773064457457, 0.5878548570193592, 0.5841194067469431, 0.5434740604739523, 0.6149792457191133] 0.778463
Trefoil_Knot_2D 0.590938 0.007418 [0.5822522421284757, 0.5997364966364604, 0.5856321760835728, 0.5998858783710803, 0.5871854662327751] 0.868411 0.003175 [0.8649571547620065, 0.8724557497317181, 0.866700101324063, 0.8720158691198401, 0.8659238841887797] 0.553431 0.017192 [0.5347779452905659, 0.5654291488452475, 0.5444694026838341, 0.5810343074152082, 0.5414443403586036] 0.585532 0.010372 [0.5746965683939902, 0.5991714481622031, 0.5758613011002074, 0.5966246997940667, 0.5813082719731264] 0.945988
ClusterShape 0.490707 0.025104 [0.4508576969229048, 0.479836335706078, 0.527328345666773, 0.4992488038994769, 0.49626542282323227] 0.762488 0.025240 [0.7178213770440961, 0.7553991006856542, 0.7918236448244781, 0.7774473011575358, 0.7699473814346514] 0.030950 0.029251 [0.04559650960273986, 0.018754551322276866, 0.0812021337820214, 0.003971616638316422, 0.005227506777120248] 0.246567 0.019180 [0.2554196775547478, 0.2532276658323124, 0.22723492423510405, 0.27431846261983883, 0.2226353956375896] 0.993871
Twisted 0.485653 0.031239 [0.437033231620364, 0.527924421332908, 0.48680201445384996, 0.46935997357002, 0.5071436102996187] 0.808917 0.017385 [0.7819538688522827, 0.8202967314026199, 0.8028006850113265, 0.8059492152378126, 0.833583440490511] 0.342918 0.089692 [0.22001421988147885, 0.49259681407457073, 0.3617748792228423, 0.2944334723521389, 0.3457701942507499] 0.424526 0.041603 [0.34546152581658596, 0.4640629930671194, 0.42360328485565946, 0.44151551214910356, 0.4479849663253328] 0.791637
Dataset 2/5 (seed=123)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W. (repeated 10×)
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5_visualized.png
Cache cleared
Top 5 shapes for this dataset:
shape                 scale_normalized_stress  non_metric_stress  shepard_goodness  normalized_stress  normalized_kl_divergence
Trefoil_Knot_3D       0.6205 ± 0.0378          0.8954 ± 0.0165    0.5186 ± 0.0952   0.6097 ± 0.0373    0.9368
SpiralShape           0.5969 ± 0.0257          0.8758 ± 0.0110    0.5586 ± 0.0509   0.5845 ± 0.0431    0.7717
Trefoil_Knot_2D       0.5929 ± 0.0445          NaN                0.5342 ± 0.0957   0.5814 ± 0.0422    0.9407
Twisted               0.4970 ± 0.0285          0.8140 ± 0.0215    0.3694 ± 0.0410   0.4393 ± 0.0446    0.7935
Gerono                0.4816 ± 0.0211          0.8205 ± 0.0142    0.2442 ± 0.0654   0.4470 ± 0.0200    0.8365
(values are mean ± std over the 5 CV folds; per-fold values are in the saved CSV)
Dataset 3/5 (seed=456)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W. (repeated 10×)
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d_visualized.png
Cache cleared
Top 5 shapes for this dataset:
shape                 scale_normalized_stress  non_metric_stress  shepard_goodness  normalized_stress  normalized_kl_divergence
SpiralShape           0.6025 ± 0.0783          0.8672 ± 0.0597    0.6134 ± 0.0822   0.5759 ± 0.1099    0.7417
Trefoil_Knot_3D       0.5863 ± 0.0674          0.8661 ± 0.0563    0.5050 ± 0.0677   0.5293 ± 0.1653    0.8999
Trefoil_Knot_2D       0.5667 ± 0.0579          NaN                0.5232 ± 0.0829   0.5207 ± 0.1342    0.9181
Twisted               0.4893 ± 0.0436          0.8133 ± 0.0241    0.3542 ± 0.0718   0.4175 ± 0.1072    0.8040
ClusterShape          0.4718 ± 0.0320          0.7438 ± 0.0357    0.0444 ± 0.0273   0.2569 ± 0.0307    0.9928
(values are mean ± std over the 5 CV folds; per-fold values are in the saved CSV)
Dataset 4/5 (seed=789)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W. (repeated 10×)
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304_visualized.png
Cache cleared
Top 5 shapes for this dataset:
shape                 scale_normalized_stress  non_metric_stress  shepard_goodness  normalized_stress  normalized_kl_divergence
Trefoil_Knot_3D       0.6022 ± 0.0203          0.8812 ± 0.0130    0.4938 ± 0.0424   0.5818 ± 0.0348    0.9297
SpiralShape           0.5844 ± 0.0258          0.8668 ± 0.0165    0.5283 ± 0.0440   0.5680 ± 0.0356    0.7465
Trefoil_Knot_2D       0.5726 ± 0.0200          NaN                0.5077 ± 0.0389   0.5513 ± 0.0345    0.9328
Twisted               0.4978 ± 0.0321          0.8118 ± 0.0325    0.3743 ± 0.0321   0.4398 ± 0.0754    0.8129
Gerono                0.4925 ± 0.0134          0.8186 ± 0.0229    0.3104 ± 0.0347   0.4545 ± 0.0324    0.8521
(values are mean ± std over the 5 CV folds; per-fold values are in the saved CSV)
Dataset 5/5 (seed=1024)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W. (repeated 10×)
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039_visualized.png
Cache cleared
Top 5 shapes for this dataset:
shape                 scale_normalized_stress  non_metric_stress  shepard_goodness  normalized_stress  normalized_kl_divergence
Trefoil_Knot_3D       0.5690 ± 0.0149          0.8672 ± 0.0146    0.4075 ± 0.0210   0.5428 ± 0.0387    0.9197
SpiralShape           0.5601 ± 0.0354          0.8546 ± 0.0209    0.4637 ± 0.0743   0.5460 ± 0.0409    0.7433
Trefoil_Knot_2D       0.5388 ± 0.0107          NaN                0.4277 ± 0.0193   0.5144 ± 0.0356    0.9256
ClusterShape          0.4735 ± 0.0252          0.7471 ± 0.0298    0.0305 ± 0.0194   0.2396 ± 0.0169    0.9933
Twisted               0.4501 ± 0.0084          0.7883 ± 0.0114    0.2404 ± 0.0394   0.3495 ± 0.0719    0.7424
(values are mean ± std over the 5 CV folds; per-fold values are in the saved CSV)
================================================================================
Discovery complete: 80 total results
Visualization plots collected: 5
Open Streamlit Dashboard¶
Run the cell below to open the Streamlit dashboard with the last discovery run. The dashboard shows results and interactive plots; it opens in your browser (or a new tab).
open_dashboard.main(last_csv_path)
Launching Dashboard...

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.2.135:8501

  Stopping...

The KeyboardInterrupt below is expected: it is raised when the dashboard subprocess is stopped.

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[13], line 1
----> 1 open_dashboard.main(last_csv_path)

File ~/code/supervised-multidimensional-scaling/smds/pipeline/open_dashboard.py:12, in main(saved_file_path)
     11 if saved_file_path:
---> 12     subprocess.run(["streamlit", "run", visualizer_path, "--", saved_file_path])

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:556, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
--> 556     stdout, stderr = process.communicate(input, timeout=timeout)

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:2024, in Popen._try_wait(self, wait_flags)
-> 2024     (pid, sts) = os.waitpid(self.pid, wait_flags)

KeyboardInterrupt:
10. Display Visualization Plots from Discovery Pipeline¶
These plots show the top-ranked manifolds for each dataset.
if visualization_paths:
    n_plots = len(visualization_paths)
    n_cols = min(2, n_plots)
    n_rows = (n_plots + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15 * n_cols, 10 * n_rows))
    if n_plots == 1:
        axes = np.array([axes])
    axes = axes.flatten()
    for idx, viz_path in enumerate(visualization_paths):
        img = Image.open(viz_path)
        axes[idx].imshow(img)
        axes[idx].axis("off")
        axes[idx].set_title(f"Dataset {idx + 1} (seed={datasets[idx]['seed']})", fontsize=16, fontweight="bold")
    # Hide any unused subplot axes
    for idx in range(len(visualization_paths), len(axes)):
        axes[idx].axis("off")
    plt.tight_layout()
    plt.show()
else:
    print("No visualization plots found.")
11. Aggregate Results Across All Datasets¶
Statistical Aggregation¶
Given a stress metric, for each shape, we compute:
- Mean score: Central tendency across datasets
- Std: Variability
- Min/Max: Range of performance
- Count: Number of measurements (should be n_datasets)
Key metric: Mean ± Std quantifies confidence in the shape ranking. Note: if no metric is specified, we aggregate over the mean of the scores across all available metrics.
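Stripped of the pandas machinery, this per-shape aggregation reduces to grouping scores by shape and computing summary statistics. A minimal stdlib sketch on toy values (the shape names and scores here are illustrative, not pipeline output):

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy (shape, score) pairs standing in for one row per dataset per shape
rows = [("Trefoil", 0.60), ("Trefoil", 0.58), ("Spiral", 0.50), ("Spiral", 0.49)]

by_shape = defaultdict(list)
for shape, score in rows:
    by_shape[shape].append(score)

# Mean, sample std, min, max, count per shape -- the same statistics the groupby computes
summary = {
    shape: {
        "mean": mean(scores),
        "std": stdev(scores),
        "min": min(scores),
        "max": max(scores),
        "count": len(scores),
    }
    for shape, scores in by_shape.items()
}
ranked = sorted(summary, key=lambda s: summary[s]["mean"], reverse=True)
print(ranked)  # shapes sorted by mean score, best first
```

The actual pipeline does the same thing with `groupby(...)["score"].agg([...])`, which also handles missing values and keeps everything in a DataFrame.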
stress = "mean_scale_normalized_stress"
shape_col = None
for col in combined_results.columns:
    if "shape" in col.lower() and shape_col is None:
        shape_col = col
        break
metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
if not metric_cols:
    metric_cols = [
        c for c in combined_results.columns if "mean" in c.lower() and ("stress" in c.lower() or "shepard" in c.lower())
    ]
combined_results = combined_results.copy()
if stress is not None and stress in combined_results.columns:
    combined_results["score"] = combined_results[stress]
else:
    combined_results["score"] = combined_results[metric_cols].mean(axis=1)
has_score = (stress is not None and stress in combined_results.columns) or len(metric_cols) > 0
if shape_col and has_score:
    aggregated = combined_results.groupby(shape_col)["score"].agg(["mean", "std", "min", "max", "count"]).reset_index()
    aggregated.columns = ["Shape", "Mean_Score", "Std_Score", "Min_Score", "Max_Score", "N_Runs"]
    aggregated = aggregated.sort_values("Mean_Score", ascending=False)
    aggregated["CV"] = (aggregated["Std_Score"] / aggregated["Mean_Score"]) * 100
    print("=" * 120)
    print("AGGREGATED RESULTS:")
    print("=" * 120)
    print(f"\nStatistics computed over {len(datasets)} independent datasets")
    print(f"Each shape tested with {N_FOLDS}-fold cross-validation\n")
    print(aggregated.to_string(index=False, float_format=lambda x: f"{x:.4f}"))
    print("\n" + "=" * 120)
    best_shape = aggregated.iloc[0]["Shape"]
    best_mean_score = aggregated.iloc[0]["Mean_Score"]
    best_std_score = aggregated.iloc[0]["Std_Score"]
    best_cv = aggregated.iloc[0]["CV"]
    print(f"\nBest Shape: {best_shape}")
    print(f"  Mean Score: {best_mean_score:.4f}")
    print(f"  Std: ±{best_std_score:.4f}")
    print(f"  Coefficient of Variation: {best_cv:.1f}%")
    print(
        f"  95% CI (approx): [{best_mean_score - 2 * best_std_score:.4f}, {best_mean_score + 2 * best_std_score:.4f}]"
    )
else:
    print("Could not identify shape and score columns for aggregation.")
    print(f"Available columns: {list(combined_results.columns)}")
========================================================================================================================
AGGREGATED RESULTS:
========================================================================================================================
Statistics computed over 5 independent datasets
Each shape tested with 5-fold cross-validation
Shape Mean_Score Std_Score Min_Score Max_Score N_Runs CV
Trefoil_Knot_3D 0.5980 0.0206 0.5690 0.6205 5 3.4391
Trefoil_Knot_2D 0.5724 0.0219 0.5388 0.5929 5 3.8326
SpiralShape 0.4951 0.0988 0.3870 0.6025 10 19.9485
Twisted 0.4840 0.0196 0.4501 0.4978 5 4.0561
ClusterShape 0.4722 0.0122 0.4569 0.4907 5 2.5872
Gerono 0.4592 0.0321 0.4131 0.4925 5 6.9897
Torus_Path 0.4589 0.0185 0.4312 0.4770 5 4.0270
CircularShape 0.4524 0.0211 0.4180 0.4774 10 4.6680
SemicircularShape 0.4255 0.0326 0.3741 0.4573 5 7.6549
DiscreteCircularShape 0.4184 0.0220 0.3843 0.4433 5 5.2637
Bernoulli 0.4013 0.0261 0.3646 0.4278 5 6.5054
EuclideanShape 0.3654 0.0454 0.2958 0.4121 5 12.4345
LogLinearShape 0.3633 0.0452 0.2902 0.4066 5 12.4485
ChainShape 0.3163 0.0223 0.2943 0.3473 5 7.0522
========================================================================================================================
Best Shape: Trefoil_Knot_3D
Mean Score: 0.5980
Std: ±0.0206
Coefficient of Variation: 3.4%
95% CI (approx): [0.5568, 0.6391]
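The CV and the approximate interval printed above are simple functions of the tabulated mean and standard deviation. A quick arithmetic check using the Trefoil_Knot_3D values (rounded from the table, so the last digit may differ slightly from the printed output):

```python
# Values from the aggregated table (Trefoil_Knot_3D)
mean_score, std_score = 0.5980, 0.0206

cv = std_score / mean_score * 100  # coefficient of variation, in percent
lo, hi = mean_score - 2 * std_score, mean_score + 2 * std_score  # crude mean +/- 2*std interval

print(f"CV = {cv:.1f}%, interval = [{lo:.4f}, {hi:.4f}]")
```

Note that mean ± 2·std describes the spread of per-dataset scores; a confidence interval for the *mean* itself would use std/√n instead.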
12. Statistical Comparison: Hypotheses vs. Baselines¶
Statistical Significance Testing¶
Objective: Rigorously determine if the proposed topological models (Figure-8, Torus, Trefoil) provide a significantly better fit than the best-performing standard baseline.
Formal Hypotheses (note that the pipeline reports fit scores for which higher is better):
- Null Hypothesis ($H_0$): The proposed topological hypothesis fits no better than the best baseline model (equal or lower fit score).
- Alternative Hypothesis ($H_1$): The proposed topological hypothesis yields a statistically significantly higher fit score (better fit).
Method: Independent two-sample t-test (one-tailed, $H_1$: hypothesis score > baseline score) comparing the distributions of fit scores across all cross-validation folds and seeds.
if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found or empty. Please run previous cells.")
else:
    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)
    if not shape_col:
        print("Error: Could not find shape column in combined_results")
    else:
        topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]
        is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))
        standard_df = combined_results[~is_topo].copy()
        topo_df = combined_results[is_topo].copy()
        if standard_df.empty or topo_df.empty:
            print("Error: Could not split results into Baseline and Topological sets.")
            print(f"Standard count: {len(standard_df)}, Topological count: {len(topo_df)}")
        else:
            def aggregate_scores(df):
                """Aggregate score by shape/variant (mean, std, count)."""
                agg = df.groupby(shape_col)["score"].agg(["mean", "std", "count"]).reset_index()
                agg.columns = ["Variant", "Mean_Score", "Std_Score", "N_Runs"]
                return agg.sort_values("Mean_Score", ascending=False)

            agg_standard = aggregate_scores(standard_df)
            agg_topo = aggregate_scores(topo_df)
            best_standard_shape = agg_standard.iloc[0]["Variant"]
            best_hypothesis = agg_topo.iloc[0]["Variant"]
            standard_scores = standard_df[standard_df[shape_col] == best_standard_shape]["score"].values
            hypothesis_scores = topo_df[topo_df[shape_col] == best_hypothesis]["score"].values
            t_stat, p_value = stats.ttest_ind(hypothesis_scores, standard_scores, alternative="greater")
            if "Torus" in best_hypothesis or "Trefoil" in best_hypothesis:
                hyp_type = "Toroidal/Knot"
            else:
                hyp_type = "Figure-8"
            pooled_std = np.sqrt((standard_scores.std(ddof=1) ** 2 + hypothesis_scores.std(ddof=1) ** 2) / 2)
            if pooled_std == 0:
                cohens_d = 0.0
            else:
                cohens_d = (hypothesis_scores.mean() - standard_scores.mean()) / pooled_std
            sig_level = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "n.s."
            effect_size_label = "small" if abs(cohens_d) < 0.5 else "medium" if abs(cohens_d) < 0.8 else "large"
            print("=" * 100)
            print("STATISTICAL COMPARISON: BASELINE vs TOPOLOGICAL HYPOTHESES")
            print("=" * 100)
            print(f"\nBest Standard Shape: {best_standard_shape}")
            print(f"  Mean Score: {standard_scores.mean():.4f} ± {standard_scores.std():.4f}")
            print(f"  n = {len(standard_scores)}")
            print(f"\nBest Topological Hypothesis: {best_hypothesis} ({hyp_type})")
            print(f"  Mean Score: {hypothesis_scores.mean():.4f} ± {hypothesis_scores.std():.4f}")
            print(f"  n = {len(hypothesis_scores)}")
            print("\nTwo-sample t-test (one-tailed, H1: hypothesis > baseline):")
            print(f"  t-statistic: {t_stat:.4f}")
            print(f"  p-value: {p_value:.6f}")
            print(f"  Significance: {sig_level}")
            if p_value < 0.05:
                print(f"\n  {hyp_type} hypothesis is SUPPORTED (p < 0.05)")
            else:
                print(f"\n{hyp_type} hypothesis is NOT supported (p >= 0.05)")
            print(f"\nCohen's d (effect size): {cohens_d:.3f}")
            print(f"Interpretation: {effect_size_label} effect")
            print("\n" + "=" * 100)
====================================================================================================
STATISTICAL COMPARISON: BASELINE vs TOPOLOGICAL HYPOTHESES
====================================================================================================

Best Standard Shape: SpiralShape
  Mean Score: 0.4951 ± 0.0937
  n = 10

Best Topological Hypothesis: Trefoil_Knot_3D (Toroidal/Knot)
  Mean Score: 0.5980 ± 0.0184
  n = 5

Two-sample t-test (one-tailed, H1: hypothesis > baseline):
  t-statistic: 2.2627
  p-value: 0.020712
  Significance: *

  Toroidal/Knot hypothesis is SUPPORTED (p < 0.05)

Cohen's d (effect size): 1.441
Interpretation: large effect

====================================================================================================
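The reported t-statistic can be approximately reproduced from summary statistics alone, since `stats.ttest_ind` defaults to the pooled (equal-variance) form. A pure-stdlib sketch using the aggregated table's sample means, sample standard deviations (ddof=1), and group sizes:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance (scipy's equal-variance default)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se

# Trefoil_Knot_3D (n=5) vs. best baseline SpiralShape (n=10), values from the aggregated table
t = pooled_t(0.5980, 0.0206, 5, 0.4951, 0.0988, 10)
print(round(t, 3))  # close to the reported t-statistic of 2.2627
```

Small discrepancies come from rounding in the table; the notebook's test runs on the raw per-fold scores.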
13. Comprehensive Visualizations¶
13.1 Consistency Analysis: Coefficient of Variation¶
if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)
    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)
    agg_all = combined_results.groupby(shape_col)["score"].agg(["mean", "std"]).reset_index()
    agg_all.columns = ["Config", "Mean_Score", "Std_Score"]
    agg_all["CV"] = (agg_all["Std_Score"] / agg_all["Mean_Score"]) * 100
    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

    def get_type(name):
        """Return 'Standard', 'Figure-8', or 'Torus' from variant name."""
        is_topo = any(k in name for k in topo_keywords)
        if not is_topo:
            return "Standard"
        elif "Torus" in name or "Trefoil" in name:
            return "Torus"
        else:
            return "Figure-8"

    agg_all["Type"] = agg_all["Config"].apply(get_type)

    # --- PLOTTING ---
    fig, ax = plt.subplots(figsize=(20, 12))
    color_map = {"Standard": "steelblue", "Figure-8": "coral", "Torus": "mediumseagreen"}
    colors = [color_map[t] for t in agg_all["Type"]]
    sizes = 350
    scatter = ax.scatter(
        agg_all["Mean_Score"], agg_all["CV"], s=sizes, c=colors, alpha=0.7, edgecolors="black", linewidths=1.0
    )
    for idx, row in agg_all.iterrows():
        ax.annotate(
            row["Config"].replace("Shape", "").replace("UserProvided", ""),
            (row["Mean_Score"], row["CV"]),
            xytext=(5, 5),
            textcoords="offset points",
            fontsize=14,
            alpha=0.9,
        )
    median_score = agg_all["Mean_Score"].median()
    median_cv = agg_all["CV"].median()
    ax.axvline(x=median_score, color="gray", linestyle=":", linewidth=2, alpha=0.8)
    ax.axhline(y=median_cv, color="gray", linestyle=":", linewidth=2, alpha=0.8)
    ax.set_xlabel("Mean Score (Higher is Better)", fontsize=20, fontweight="bold")
    ax.set_ylabel("CV of Score across Folds [%] (lower = more stable)", fontsize=20, fontweight="bold")
    ax.set_title(
        "Stability Analysis: Mean Score vs. Coefficient of Variation across CV Folds",
        fontsize=18,
        fontweight="bold",
        pad=15,
    )
    ax.text(
        0.5,
        1,
        f"Metric used: {stress}",
        transform=ax.transAxes,
        ha="center",
        va="bottom",
        fontsize=12,
        fontweight="bold",
        color="gray",
    )
    ax.text(
        0.98,
        0.02,
        "IDEAL REGION\n(High Score, Stable)",
        transform=ax.transAxes,
        ha="right",
        va="bottom",
        color="green",
        alpha=0.3,
        fontsize=12,
        fontweight="bold",
        bbox=dict(facecolor="white", alpha=0.5, edgecolor="none"),
    )
    ax.tick_params(axis="both", labelsize=20)
    ax.grid(True, alpha=0.3, linestyle="--")
    legend_elements = [
        Line2D([0], [0], marker="o", color="w", markerfacecolor="steelblue", markersize=14, label="Standard Shapes"),
        Line2D([0], [0], marker="o", color="w", markerfacecolor="coral", markersize=14, label="Figure-8 Variants"),
        Line2D(
            [0],
            [0],
            marker="o",
            color="w",
            markerfacecolor="mediumseagreen",
            markersize=14,
            label="Torus/Knot Variants",
        ),
        Line2D(
            [0],
            [0],
            color="gray",
            linestyle=":",
            linewidth=2,
            label=f"Median (Score={median_score:.2f}, CV={median_cv:.1f}%)",
),
]
ax.legend(handles=legend_elements, fontsize=16, loc="best", frameon=True, framealpha=0.9)
plt.tight_layout()
plt.show()
13.2 Detailed Distribution: Violin Plots¶
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

if "combined_results" not in locals() or combined_results.empty:
print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)
if "score" not in combined_results.columns:
valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
if valid_cols:
combined_results["score"] = combined_results[valid_cols[0]]
else:
metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
combined_results["score"] = combined_results[metric_cols].mean(axis=1)
topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]
is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))
agg_standard = combined_results[~is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
top_5_standard = agg_standard.head(5).index.tolist()
agg_topo = combined_results[is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
topo_variants = agg_topo.index.tolist()
fig, ax = plt.subplots(figsize=(16, 8))
positions = []
labels = []
pos = 0
for shape in top_5_standard:
data = combined_results[combined_results[shape_col] == shape]["score"].values
parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)
for pc in parts["bodies"]:
pc.set_facecolor("steelblue")
pc.set_alpha(0.6)
for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
if partname in parts:
parts[partname].set_edgecolor("black")
parts[partname].set_linewidth(1)
positions.append(pos)
labels.append(shape.replace("Shape", ""))
pos += 1
if top_5_standard and topo_variants:
pos += 0.5
for variant in topo_variants:
data = combined_results[combined_results[shape_col] == variant]["score"].values
parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)
is_torus = "Torus" in variant or "Trefoil" in variant
variant_color = "mediumseagreen" if is_torus else "coral"
for pc in parts["bodies"]:
pc.set_facecolor(variant_color)
pc.set_alpha(0.6)
for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
if partname in parts:
parts[partname].set_edgecolor("black")
parts[partname].set_linewidth(1)
positions.append(pos)
labels.append(variant.replace("UserProvided", "").replace("Shape", ""))
pos += 1
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45, ha="right", fontsize=10)
ax.set_ylabel("Score (Higher is Better)", fontsize=12, fontweight="bold")
ax.set_title(
f"Score Distributions: Top 5 Standard vs. Topological Hypotheses for {stress}", fontsize=14, fontweight="bold"
)
ax.grid(axis="y", alpha=0.3, linestyle="--")
legend_elements = [
Patch(facecolor="steelblue", alpha=0.6, label="Standard Shapes"),
Patch(facecolor="coral", alpha=0.6, label="Figure-8 Variants"),
Patch(facecolor="mediumseagreen", alpha=0.6, label="Torus/Knot Variants"),
]
ax.legend(handles=legend_elements, fontsize=11, loc="best")
plt.tight_layout()
plt.show()
print("\nViolin plot interpretation:")
print(" - Width: Distribution density (frequency of scores)")
print(" - Horizontal lines: Min, Max, Mean, Median")
print(" - Height: Range of scores across cross-validation folds")
Violin plot interpretation:
  - Width: Distribution density (frequency of scores)
  - Horizontal lines: Min, Max, Mean, Median
  - Height: Range of scores across cross-validation folds
print("=" * 100)
print("EXPERIMENTAL SUMMARY")
print("=" * 100)
if "combined_results" not in locals() or combined_results.empty:
print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)
if "score" not in combined_results.columns:
valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
if valid_cols:
combined_results["score"] = combined_results[valid_cols[0]]
else:
metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
combined_results["score"] = combined_results[metric_cols].mean(axis=1)
topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]
is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))
agg_standard = (
combined_results[~is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
)
agg_topo = (
combined_results[is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
)
# --- PRINT OUTPUT ---
print("\n1. BASELINE SHAPES (Top 5):")
print("-" * 100)
if not agg_standard.empty:
for idx, (name, row) in enumerate(agg_standard.head(5).iterrows()):
clean_name = name.replace("Shape", "")
print(f" {idx + 1}. {clean_name:25s} Score: {row['mean']:.4f} ± {row['std']:.4f}")
else:
print(" (No baseline shapes found)")
print("\n2. TOPOLOGICAL HYPOTHESES:")
print("-" * 100)
if not agg_topo.empty:
for idx, (name, row) in enumerate(agg_topo.iterrows()):
clean_name = name.replace("UserProvided", "").replace("Shape", "")
hyp_type = "Torus" if ("Torus" in name or "Trefoil" in name) else "Figure-8"
print(f" {idx + 1}. {clean_name:25s} [{hyp_type:10s}] Score: {row['mean']:.4f} ± {row['std']:.4f}")
else:
print(" (No topological hypotheses found)")
if not agg_standard.empty and not agg_topo.empty:
best_base_name = agg_standard.index[0]
best_base_score = agg_standard.iloc[0]["mean"]
best_base_std = agg_standard.iloc[0]["std"]
best_topo_name = agg_topo.index[0]
best_topo_score = agg_topo.iloc[0]["mean"]
best_topo_std = agg_topo.iloc[0]["std"]
hyp_type = "Torus" if ("Torus" in best_topo_name or "Trefoil" in best_topo_name) else "Figure-8"
score_diff = best_topo_score - best_base_score
improvement_pct = (score_diff / best_base_score) * 100 if best_base_score != 0 else 0
print("\n3. BEST COMPARISON:")
print("-" * 100)
print(f" Baseline: {best_base_name.replace('Shape', ''):25s} {best_base_score:.4f} ± {best_base_std:.4f}")
topo_label = best_topo_name.replace("UserProvided", "").replace("Shape", "")
print(f" Hypothesis: {topo_label:25s} [{hyp_type}] {best_topo_score:.4f} ± {best_topo_std:.4f}")
print(f" Difference: {score_diff:+.4f} ({improvement_pct:+.1f}%)")
if "p_value" in locals():
print(f" t-test: p = {p_value:.6f}")
if p_value < 0.05:
print(f" Result: ✓ {hyp_type} hypothesis SUPPORTED (p < 0.05)")
else:
print(f" Result: ✗ {hyp_type} hypothesis NOT supported (p >= 0.05)")
else:
print(" (t-test statistics not available in local scope)")
print("\n4. METHODOLOGY:")
print("-" * 100)
layer_info = GPT2_LAYER if "GPT2_LAYER" in locals() else "?"
n_seeds = len(datasets) if "datasets" in locals() else "?"
n_folds_info = N_FOLDS if "N_FOLDS" in locals() else "?"
print(f" Model: GPT-2 Small, Layer {layer_info}/12")
print(f" Datasets: {n_seeds} seeds")
print(f" Cross-validation: {n_folds_info}-fold")
if "agg_standard" in locals():
print(f" Baseline shapes: {len(agg_standard)} variants")
if "agg_topo" in locals():
print(f" Hypotheses: {len(agg_topo)} variants")
print("\n" + "=" * 100)
====================================================================================================
EXPERIMENTAL SUMMARY
====================================================================================================

1. BASELINE SHAPES (Top 5):
----------------------------------------------------------------------------------------------------
  1. Spiral                    Score: 0.4951 ± 0.0988
  2. Cluster                   Score: 0.4722 ± 0.0122
  3. Circular                  Score: 0.4524 ± 0.0211
  4. Semicircular              Score: 0.4255 ± 0.0326
  5. DiscreteCircular          Score: 0.4184 ± 0.0220

2. TOPOLOGICAL HYPOTHESES:
----------------------------------------------------------------------------------------------------
  1. Trefoil_Knot_3D           [Torus     ] Score: 0.5980 ± 0.0206
  2. Trefoil_Knot_2D           [Torus     ] Score: 0.5724 ± 0.0219
  3. Twisted                   [Figure-8  ] Score: 0.4840 ± 0.0196
  4. Gerono                    [Figure-8  ] Score: 0.4592 ± 0.0321
  5. Torus_Path                [Torus     ] Score: 0.4589 ± 0.0185
  6. Bernoulli                 [Figure-8  ] Score: 0.4013 ± 0.0261

3. BEST COMPARISON:
----------------------------------------------------------------------------------------------------
  Baseline:   Spiral                    0.4951 ± 0.0988
  Hypothesis: Trefoil_Knot_3D           [Torus] 0.5980 ± 0.0206
  Difference: +0.1028 (+20.8%)
  t-test:     p = 0.020712
  Result:     ✓ Torus hypothesis SUPPORTED (p < 0.05)

4. METHODOLOGY:
----------------------------------------------------------------------------------------------------
  Model: GPT-2 Small, Layer 6/12
  Datasets: 5 seeds
  Cross-validation: 5-fold
  Baseline shapes: 8 variants
  Hypotheses: 6 variants

====================================================================================================