Hour Manifold Discovery Experiment¶

Research Objective¶

We investigate whether Language Models (specifically GPT-2) represent temporal concepts, specifically the 24-hour cycle, using interpretable topological structures. We test if the internal activation space maps onto specific topological manifolds better than standard baselines.

Topological Hypotheses¶

We compare three distinct topological classes against standard baselines:

1. Figure-8 Topology (Lemniscates)¶

Models the 24-hour cycle as two distinct loops (AM/PM) meeting at a central crossing point.

rationale: Captures the linguistic and functional distinction between morning and afternoon/evening while maintaining continuity at midnight.
Variants: Gerono, Bernoulli, and Twisted Lemniscates.

2. Toroidal Topology¶

Models time on the surface of a torus.

rationale: Encodes two nested periodicities (daily cycle + sub-cycles) without self-intersection points.
Variants: Standard Torus paths with varying radii ratios.

3. Trefoil Knot ($3_1$ Knot)¶

A non-trivial knot that winds around a torus surface.

rationale: Represents a complex, self-embedded cycle where the path winds 3 times around the minor axis and 2 times around the major axis.
Significance: Tests for higher-complexity cyclic structures beyond simple circles.

Methodology¶

We utilize Supervised Multidimensional Scaling (SMDS) to map GPT-2 hidden states to target 3D manifolds.

Data Generation: Synthesized datasets with 2400 samples (100 per hour) to ensure dense manifold coverage.
- Varied contexts using 50+ names and 40+ distinct actions.
Model & Extraction: GPT-2 Small.
- Extraction of hidden states at the temporal token position (Layer 6).
Manifold Definitions: Baselines: Standard shapes (Circle, Spiral, Helix, Linear).
- Hypotheses: Analytically defined 3D coordinates for Figure-8, Torus, and Trefoil shapes.
Evaluation: Train/Test Split (80/20): Models are fitted on training data; scores are reported on unseen test data to verify structural generalization.
- Metric: Stress score (lower is better), measuring the distortion required to map neural activations to the target geometry.

In [1]:

Copied!





import warnings
from pathlib import Path
from typing import List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
from PIL import Image
from scipy import stats
from transformers import GPT2Model, GPT2Tokenizer

from smds import UserProvidedSMDSParametrization
from smds.pipeline import open_dashboard
from smds.pipeline.discovery_pipeline import discover_manifolds
from smds.shapes.continuous_shapes import CircularShape, EuclideanShape, LogLinearShape, SemicircularShape, SpiralShape
from smds.shapes.discrete_shapes import (
    ChainShape,
    ClusterShape,
    DiscreteCircularShape,
)

warnings.filterwarnings("ignore")

sns.set_style("whitegrid")
plt.rcParams["figure.dpi"] = 100
%matplotlib inline
import warnings
from pathlib import Path
from typing import List, Tuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from matplotlib.lines import Line2D
from matplotlib.patches import Patch
from PIL import Image
from scipy import stats
from transformers import GPT2Model, GPT2Tokenizer

from smds import UserProvidedSMDSParametrization
from smds.pipeline import open_dashboard
from smds.pipeline.discovery_pipeline import discover_manifolds
from smds.shapes.continuous_shapes import CircularShape, EuclideanShape, LogLinearShape, SemicircularShape, SpiralShape
from smds.shapes.discrete_shapes import (
    ChainShape,
    ClusterShape,
    DiscreteCircularShape,
)

warnings.filterwarnings("ignore")

sns.set_style("whitegrid")
plt.rcParams["figure.dpi"] = 100
%matplotlib inline

/Users/arwinsg/code/supervised-multidimensional-scaling/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

1. Configuration¶

Experimental Parameters¶

Random seeds: Multiple independent datasets to assess consistency
Samples per hour: Balance between statistical power and computational cost
GPT-2 layer: Middle layer (6/12) where semantic representations are typically strongest
Cross-validation: 5-fold to ensure robust generalization

Data Diversity¶

We maximize stimulus variability to probe robust representations:

4 time formats: 12-hour AM/PM, 24-hour, o'clock notation, natural language
51 unique names: Avoid tokenization artifacts
40 actions: Diverse semantic contexts

In [2]:

Copied!





RANDOM_SEEDS = [42, 123, 456, 789, 1024]
N_SAMPLES_PER_HOUR = 10
GPT2_LAYER = 6
N_FOLDS = 5
EXPERIMENT_NAME = "Hour_Manifold_Comprehensive"

NAMES = [
    "Alice",
    "Bob",
    "Charlie",
    "George",
    "Kevin",
    "Laura",
    "Michael",
    "Rachel",
    "William",
    "Aaron",
    "Ian",
    "Kyle",
    "Martin",
    "Rose",
    "Marco",
    "Andrew",
    "Frank",
    "Henry",
    "Jack",
    "Leon",
    "Peter",
    "Scott",
    "Grant",
    "Neil",
    "Dean",
    "Hope",
    "April",
    "Connor",
    "Brandon",
    "Joy",
    "Emily",
    "Hunter",
    "Tyler",
    "Blake",
    "Dallas",
    "Walker",
    "John",
    "Fred",
    "Steve",
    "Matt",
    "Luke",
    "Richard",
    "Maria",
    "Jerry",
    "Robert",
    "Mark",
    "Max",
    "Jason",
    "Alex",
    "Josh",
    "Ryan",
]

ACTIONS = [
    "walked the dog",
    "made coffee",
    "read a book",
    "went to sleep",
    "ate lunch",
    "called a friend",
    "watched a movie",
    "wrote a letter",
    "cleaned the house",
    "went for a run",
    "cooked dinner",
    "played the piano",
    "studied for the exam",
    "watered the plants",
    "checked emails",
    "did yoga",
    "baked a cake",
    "painted a picture",
    "fixed the car",
    "shopped for groceries",
    "meditated",
    "took a shower",
    "brushed teeth",
    "turned off the lights",
    "opened the window",
    "locked the door",
    "started the meeting",
    "finished work",
    "planned the trip",
    "listened to music",
    "charged the phone",
    "fed the cat",
    "drank some tea",
    "organized the desk",
    "took a nap",
    "solved a puzzle",
    "played chess",
    "wrote code",
    "debugged the program",
    "deployed the app",
]

TIME_FORMATS = ["12h_am_pm", "24h_colon", "12h_oclock", "natural"]

print("Experimental Configuration:")
print(f"  Independent datasets: {len(RANDOM_SEEDS)}")
print(f"  Total samples per dataset: {N_SAMPLES_PER_HOUR * 24}")
print(f"  Total samples across all datasets: {N_SAMPLES_PER_HOUR * 24 * len(RANDOM_SEEDS)}")
print(f"  GPT-2 layer: {GPT2_LAYER}/12")
print(f"  Cross-validation folds: {N_FOLDS}")
print("\nStimulus Diversity:")
print(f"  Time formats: {len(TIME_FORMATS)}")
print(f"  Unique names: {len(NAMES)}")
print(f"  Unique actions: {len(ACTIONS)}")
print(f"  Theoretical unique sentences: {len(NAMES) * len(ACTIONS) * len(TIME_FORMATS) * 24:,}")
RANDOM_SEEDS = [42, 123, 456, 789, 1024]
N_SAMPLES_PER_HOUR = 10
GPT2_LAYER = 6
N_FOLDS = 5
EXPERIMENT_NAME = "Hour_Manifold_Comprehensive"

NAMES = [
    "Alice",
    "Bob",
    "Charlie",
    "George",
    "Kevin",
    "Laura",
    "Michael",
    "Rachel",
    "William",
    "Aaron",
    "Ian",
    "Kyle",
    "Martin",
    "Rose",
    "Marco",
    "Andrew",
    "Frank",
    "Henry",
    "Jack",
    "Leon",
    "Peter",
    "Scott",
    "Grant",
    "Neil",
    "Dean",
    "Hope",
    "April",
    "Connor",
    "Brandon",
    "Joy",
    "Emily",
    "Hunter",
    "Tyler",
    "Blake",
    "Dallas",
    "Walker",
    "John",
    "Fred",
    "Steve",
    "Matt",
    "Luke",
    "Richard",
    "Maria",
    "Jerry",
    "Robert",
    "Mark",
    "Max",
    "Jason",
    "Alex",
    "Josh",
    "Ryan",
]

ACTIONS = [
    "walked the dog",
    "made coffee",
    "read a book",
    "went to sleep",
    "ate lunch",
    "called a friend",
    "watched a movie",
    "wrote a letter",
    "cleaned the house",
    "went for a run",
    "cooked dinner",
    "played the piano",
    "studied for the exam",
    "watered the plants",
    "checked emails",
    "did yoga",
    "baked a cake",
    "painted a picture",
    "fixed the car",
    "shopped for groceries",
    "meditated",
    "took a shower",
    "brushed teeth",
    "turned off the lights",
    "opened the window",
    "locked the door",
    "started the meeting",
    "finished work",
    "planned the trip",
    "listened to music",
    "charged the phone",
    "fed the cat",
    "drank some tea",
    "organized the desk",
    "took a nap",
    "solved a puzzle",
    "played chess",
    "wrote code",
    "debugged the program",
    "deployed the app",
]

TIME_FORMATS = ["12h_am_pm", "24h_colon", "12h_oclock", "natural"]

print("Experimental Configuration:")
print(f"  Independent datasets: {len(RANDOM_SEEDS)}")
print(f"  Total samples per dataset: {N_SAMPLES_PER_HOUR * 24}")
print(f"  Total samples across all datasets: {N_SAMPLES_PER_HOUR * 24 * len(RANDOM_SEEDS)}")
print(f"  GPT-2 layer: {GPT2_LAYER}/12")
print(f"  Cross-validation folds: {N_FOLDS}")
print("\nStimulus Diversity:")
print(f"  Time formats: {len(TIME_FORMATS)}")
print(f"  Unique names: {len(NAMES)}")
print(f"  Unique actions: {len(ACTIONS)}")
print(f"  Theoretical unique sentences: {len(NAMES) * len(ACTIONS) * len(TIME_FORMATS) * 24:,}")

Experimental Configuration:
  Independent datasets: 5
  Total samples per dataset: 240
  Total samples across all datasets: 1200
  GPT-2 layer: 6/12
  Cross-validation folds: 5

Stimulus Diversity:
  Time formats: 4
  Unique names: 51
  Unique actions: 40
  Theoretical unique sentences: 195,840

2. Data Generation Functions¶

Time Format Conversion¶

We implement 4 distinct time representations to test format-invariance:

12h_am_pm: "3pm", "11am" (concise)
24h_colon: "15:00", "23:00" (international standard)
12h_oclock: "3 o'clock in the afternoon" (verbose)
natural: "three in the afternoon" (natural language)

In [3]:

Copied!





def format_time(hour: int, format_type: str) -> str:
    """Format hour (0-23) to string for given format type."""
    if format_type == "12h_am_pm":
        if hour == 0:
            return "12am"
        elif hour < 12:
            return f"{hour}am"
        elif hour == 12:
            return "12pm"
        else:
            return f"{hour - 12}pm"

    elif format_type == "24h_colon":
        return f"{hour:02d}:00"

    elif format_type == "12h_oclock":
        h = hour if hour <= 12 else hour - 12
        h = 12 if h == 0 else h
        period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
        return f"{h} o'clock in the {period}"

    elif format_type == "natural":
        numbers = [
            "zero",
            "one",
            "two",
            "three",
            "four",
            "five",
            "six",
            "seven",
            "eight",
            "nine",
            "ten",
            "eleven",
            "twelve",
        ]
        h = hour if hour <= 12 else hour - 12
        h = 12 if h == 0 else h
        period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
        return f"{numbers[h]} in the {period}"

    return str(hour)


def generate_time_dataset(n_samples_per_hour: int, seed: int, time_formats: List[str]) -> Tuple[List[str], List[int]]:
    """Generate sentences with hours and return (sentences, hours)."""
    sentences = []
    hours = []

    rng = np.random.default_rng(seed)

    for hour in range(24):
        for _ in range(n_samples_per_hour):
            name = rng.choice(NAMES)
            action = rng.choice(ACTIONS)
            format_type = rng.choice(time_formats)
            time_str = format_time(hour, format_type)

            sentence = f"{name} {action} at {time_str}."
            sentences.append(sentence)
            hours.append(hour)

    indices = rng.permutation(len(sentences))
    sentences = [sentences[i] for i in indices]
    hours = [hours[i] for i in indices]

    return sentences, hours


print("Example sentences with different time formats:")
print()
for fmt in TIME_FORMATS:
    example_hour = 14
    formatted = format_time(example_hour, fmt)
    print(f"  {fmt:15s}: Alice walked the dog at {formatted}.")

print("\nExample sentences across different hours:")
print()
for hour in [0, 6, 12, 18, 23]:
    formatted = format_time(hour, "12h_am_pm")
    print(f"  Hour {hour:2d}: Bob made breakfast at {formatted}.")
def format_time(hour: int, format_type: str) -> str:
    """Format hour (0-23) to string for given format type."""
    if format_type == "12h_am_pm":
        if hour == 0:
            return "12am"
        elif hour < 12:
            return f"{hour}am"
        elif hour == 12:
            return "12pm"
        else:
            return f"{hour - 12}pm"

    elif format_type == "24h_colon":
        return f"{hour:02d}:00"

    elif format_type == "12h_oclock":
        h = hour if hour <= 12 else hour - 12
        h = 12 if h == 0 else h
        period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
        return f"{h} o'clock in the {period}"

    elif format_type == "natural":
        numbers = [
            "zero",
            "one",
            "two",
            "three",
            "four",
            "five",
            "six",
            "seven",
            "eight",
            "nine",
            "ten",
            "eleven",
            "twelve",
        ]
        h = hour if hour <= 12 else hour - 12
        h = 12 if h == 0 else h
        period = "morning" if hour < 12 else "afternoon" if hour < 18 else "evening"
        return f"{numbers[h]} in the {period}"

    return str(hour)


def generate_time_dataset(n_samples_per_hour: int, seed: int, time_formats: List[str]) -> Tuple[List[str], List[int]]:
    """Generate sentences with hours and return (sentences, hours)."""
    sentences = []
    hours = []

    rng = np.random.default_rng(seed)

    for hour in range(24):
        for _ in range(n_samples_per_hour):
            name = rng.choice(NAMES)
            action = rng.choice(ACTIONS)
            format_type = rng.choice(time_formats)
            time_str = format_time(hour, format_type)

            sentence = f"{name} {action} at {time_str}."
            sentences.append(sentence)
            hours.append(hour)

    indices = rng.permutation(len(sentences))
    sentences = [sentences[i] for i in indices]
    hours = [hours[i] for i in indices]

    return sentences, hours


print("Example sentences with different time formats:")
print()
for fmt in TIME_FORMATS:
    example_hour = 14
    formatted = format_time(example_hour, fmt)
    print(f"  {fmt:15s}: Alice walked the dog at {formatted}.")

print("\nExample sentences across different hours:")
print()
for hour in [0, 6, 12, 18, 23]:
    formatted = format_time(hour, "12h_am_pm")
    print(f"  Hour {hour:2d}: Bob made breakfast at {formatted}.")

Example sentences with different time formats:

  12h_am_pm      : Alice walked the dog at 2pm.
  24h_colon      : Alice walked the dog at 14:00.
  12h_oclock     : Alice walked the dog at 2 o'clock in the afternoon.
  natural        : Alice walked the dog at two in the afternoon.

Example sentences across different hours:

  Hour  0: Bob made breakfast at 12am.
  Hour  6: Bob made breakfast at 6am.
  Hour 12: Bob made breakfast at 12pm.
  Hour 18: Bob made breakfast at 6pm.
  Hour 23: Bob made breakfast at 11pm.

3. Load GPT-2 Model¶

We use GPT-2 (small, 117M parameters):

In [4]:

Copied!





print("Loading GPT-2 model...")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("\nModel loaded successfully:")
print(f"  Architecture: {model.config.model_type}")
print(f"  Hidden size: {model.config.hidden_size}")
print(f"  Number of layers: {model.config.n_layer}")
print(f"  Number of attention heads: {model.config.n_head}")
print(f"  Vocabulary size: {model.config.vocab_size:,}")
print("Loading GPT-2 model...")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print("\nModel loaded successfully:")
print(f"  Architecture: {model.config.model_type}")
print(f"  Hidden size: {model.config.hidden_size}")
print(f"  Number of layers: {model.config.n_layer}")
print(f"  Number of attention heads: {model.config.n_head}")
print(f"  Vocabulary size: {model.config.vocab_size:,}")

Loading GPT-2 model...

Model loaded successfully:
  Architecture: gpt2
  Hidden size: 768
  Number of layers: 12
  Number of attention heads: 12
  Vocabulary size: 50,257

Layer Selection: We focus on the middle layer (Layer 6) of GPT-2 Small.

Rationale: Research shows that the model learns in a hierarchy: lower layers focus on basic word forms, while middle layers capture grammar and structure. This structural understanding builds the foundation for the complex meanings found in the upper layers. (Hewitt, J., & Manning, C. D. (2019). A Structural Probe for Finding Syntax in Word Representations.).

In [5]:

Copied!





def extract_hour_activations(sentences: List[str], layer_idx: int = 6) -> np.ndarray:
    """Extract GPT-2 hidden states at hour token for each sentence."""
    activations = []

    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", padding=False)
            outputs = model(**inputs, output_hidden_states=True)
            hidden_states = outputs.hidden_states[layer_idx]

            tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
            hour_token_idx = -1

            for idx, token in enumerate(tokens):
                token_lower = token.lower()
                if any(x in token_lower for x in ["am", "pm", ":", "clock", "morning", "afternoon", "evening"]):
                    hour_token_idx = idx
                    break

            if hour_token_idx == -1:
                hour_token_idx = -2

            activation = hidden_states[0, hour_token_idx, :].numpy()
            activations.append(activation)

    return np.array(activations)
def extract_hour_activations(sentences: List[str], layer_idx: int = 6) -> np.ndarray:
    """Extract GPT-2 hidden states at hour token for each sentence."""
    activations = []

    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt", padding=False)
            outputs = model(**inputs, output_hidden_states=True)
            hidden_states = outputs.hidden_states[layer_idx]

            tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
            hour_token_idx = -1

            for idx, token in enumerate(tokens):
                token_lower = token.lower()
                if any(x in token_lower for x in ["am", "pm", ":", "clock", "morning", "afternoon", "evening"]):
                    hour_token_idx = idx
                    break

            if hour_token_idx == -1:
                hour_token_idx = -2

            activation = hidden_states[0, hour_token_idx, :].numpy()
            activations.append(activation)

    return np.array(activations)

4. Generate Multiple Datasets and Extract Activations¶

Reproducibility Protocol¶

We generate 5 independent datasets with different random seeds to:

Assess consistency of topological structures across stimuli
Compute confidence intervals for shape goodness-of-fit
Test generalization beyond specific name-action combinations

Each dataset contains (N per hour × 24 hours) samples.

In [6]:

Copied!





datasets = []

print(f"Generating {len(RANDOM_SEEDS)} independent datasets...\n")
print("=" * 80)

for seed_idx, seed in enumerate(RANDOM_SEEDS):
    print(f"\nDataset {seed_idx + 1}/{len(RANDOM_SEEDS)} (seed={seed})")
    print("-" * 80)

    sentences, hours = generate_time_dataset(
        n_samples_per_hour=N_SAMPLES_PER_HOUR, seed=seed, time_formats=TIME_FORMATS
    )

    print(f"  Generated {len(sentences)} sentences")
    print(f'  Example: "{sentences[0]}"')
    print(f"  Extracting GPT-2 activations from layer {GPT2_LAYER}...")

    X_activations = extract_hour_activations(sentences, layer_idx=GPT2_LAYER)

    datasets.append({"seed": seed, "sentences": sentences, "hours": np.array(hours), "activations": X_activations})

    print(f"  Activations shape: {X_activations.shape}")
    print(f"  Statistics: mean={X_activations.mean():.3f}, std={X_activations.std():.3f}")
    print(f"  Range: [{X_activations.min():.3f}, {X_activations.max():.3f}]")

print("\n" + "=" * 80)
print(f"Total datasets prepared: {len(datasets)}")
print(f"Total samples: {sum(len(d['sentences']) for d in datasets)}")
datasets = []

print(f"Generating {len(RANDOM_SEEDS)} independent datasets...\n")
print("=" * 80)

for seed_idx, seed in enumerate(RANDOM_SEEDS):
    print(f"\nDataset {seed_idx + 1}/{len(RANDOM_SEEDS)} (seed={seed})")
    print("-" * 80)

    sentences, hours = generate_time_dataset(
        n_samples_per_hour=N_SAMPLES_PER_HOUR, seed=seed, time_formats=TIME_FORMATS
    )

    print(f"  Generated {len(sentences)} sentences")
    print(f'  Example: "{sentences[0]}"')
    print(f"  Extracting GPT-2 activations from layer {GPT2_LAYER}...")

    X_activations = extract_hour_activations(sentences, layer_idx=GPT2_LAYER)

    datasets.append({"seed": seed, "sentences": sentences, "hours": np.array(hours), "activations": X_activations})

    print(f"  Activations shape: {X_activations.shape}")
    print(f"  Statistics: mean={X_activations.mean():.3f}, std={X_activations.std():.3f}")
    print(f"  Range: [{X_activations.min():.3f}, {X_activations.max():.3f}]")

print("\n" + "=" * 80)
print(f"Total datasets prepared: {len(datasets)}")
print(f"Total samples: {sum(len(d['sentences']) for d in datasets)}")

Generating 5 independent datasets...

================================================================================

Dataset 1/5 (seed=42)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Scott ate lunch at 16:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.161, std=19.866
  Range: [-65.555, 2875.865]

Dataset 2/5 (seed=123)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Jack went for a run at 1pm."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.085, std=14.208
  Range: [-65.555, 2875.865]

Dataset 3/5 (seed=456)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Jason started the meeting at two in the morning."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.141, std=18.615
  Range: [-65.555, 2875.865]

Dataset 4/5 (seed=789)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Peter watered the plants at 10:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.144, std=18.610
  Range: [-65.555, 2875.865]

Dataset 5/5 (seed=1024)
--------------------------------------------------------------------------------
  Generated 240 sentences
  Example: "Marco studied for the exam at 17:00."
  Extracting GPT-2 activations from layer 6...
  Activations shape: (240, 768)
  Statistics: mean=0.216, std=23.220
  Range: [-65.555, 2875.865]

================================================================================
Total datasets prepared: 5
Total samples: 1200

5. Define Baseline Shapes¶

Baseline Hypothesis Set¶

We test 10 standard baseline configurations to serve as a reference point for our 3D topological hypotheses. These represent simpler, lower-dimensional structural assumptions:

Continuous (7):
- Circular (2 variants): The standard representation of cyclic time (different radii).
- Spiral (2 variants): Combines cyclicity with linear progression (different winding tightness).
- Euclidean & LogLinear: Standard linear regression assumptions.
- Semicircular: Tests for partial cyclicity.
Discrete (3):
- Chain: Represents time as a sequential path without cyclicity.
- Cluster: Represents time as unordered, distinct categorical groups.
- DiscreteCircular: A step-wise cyclic representation.

Configuration Rationale¶

These baselines test if the model's representation is merely linear or simply cyclic in 2D, before we test the 3D topological hypotheses (Figure-8, Torus).

In [7]:

Copied!





all_shapes = [
    CircularShape(radious=1.0, normalize_labels=True),
    CircularShape(radious=2.0, normalize_labels=True),
    EuclideanShape(normalize_labels=True),
    LogLinearShape(normalize_labels=True),
    SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=2.0),
    SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=3.0),
    SemicircularShape(normalize_labels=True),
    ChainShape(threshold=2.0, normalize_labels=False),
    ClusterShape(),
    DiscreteCircularShape(),
]

print(f"Baseline Shape Configurations: {len(all_shapes)} total\n")
print("=" * 80)

print("\n1. Continuous Variants (2D/1D):")
print("  - Circular (x2): Standard cyclic clock model (R=1.0, R=2.0)")
print("  - Spiral   (x2): Expanding cycle (2 turns, 3 turns)")
print("  - Linear   (x3): Euclidean, LogLinear (Weber-Fechner), Semicircular")

print("\n2. Discrete Variants:")
print("  - Chain    (x1): Sequential neighbor connectivity")
print("  - Cluster  (x1): Unordered distinct hour states")
print("  - DiscCirc (x1): Step-wise cyclic representation")

print("=" * 80)
all_shapes = [
    CircularShape(radious=1.0, normalize_labels=True),
    CircularShape(radious=2.0, normalize_labels=True),
    EuclideanShape(normalize_labels=True),
    LogLinearShape(normalize_labels=True),
    SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=2.0),
    SpiralShape(initial_radius=0.5, growth_rate=0.1, num_turns=3.0),
    SemicircularShape(normalize_labels=True),
    ChainShape(threshold=2.0, normalize_labels=False),
    ClusterShape(),
    DiscreteCircularShape(),
]

print(f"Baseline Shape Configurations: {len(all_shapes)} total\n")
print("=" * 80)

print("\n1. Continuous Variants (2D/1D):")
print("  - Circular (x2): Standard cyclic clock model (R=1.0, R=2.0)")
print("  - Spiral   (x2): Expanding cycle (2 turns, 3 turns)")
print("  - Linear   (x3): Euclidean, LogLinear (Weber-Fechner), Semicircular")

print("\n2. Discrete Variants:")
print("  - Chain    (x1): Sequential neighbor connectivity")
print("  - Cluster  (x1): Unordered distinct hour states")
print("  - DiscCirc (x1): Step-wise cyclic representation")

print("=" * 80)

Baseline Shape Configurations: 10 total

================================================================================

1. Continuous Variants (2D/1D):
  - Circular (x2): Standard cyclic clock model (R=1.0, R=2.0)
  - Spiral   (x2): Expanding cycle (2 turns, 3 turns)
  - Linear   (x3): Euclidean, LogLinear (Weber-Fechner), Semicircular

2. Discrete Variants:
  - Chain    (x1): Sequential neighbor connectivity
  - Cluster  (x1): Unordered distinct hour states
  - DiscCirc (x1): Step-wise cyclic representation
================================================================================

6. Define Specific Manifold Generators¶

Mathematical Parametrizations¶

We directly generate 3D coordinates for five specific manifold variants based on the implementations below:

1. Gerono Lemniscate¶

Classic figure-8 in 3D with a vertical wave modulation: $$x(t) = \cos(t)$$ $$y(t) = \sin(t)\cos(t)$$ $$z(t) = 0.5 \sin(t)\sin(t/2)$$

Mathematical source: Wolfram MathWorld: Eight Curve

2. Bernoulli Lemniscate¶

Infinity symbol with saddle curvature: $$x(t) = \frac{\sqrt{2}\cos(t)}{\sin^2(t) + 1}$$ $$y(t) = \frac{\sqrt{2}\cos(t)\sin(t)}{\sin^2(t) + 1}$$ $$z(t) = \frac{\sin(2t)}{4}$$

Mathematical source: Wolfram MathWorld: Lemniscate

3. Twisted Figure-8¶

Strongly twisted Lissajous variant: $$x(t) = \sin(t)$$ $$y(t) = \frac{\sin(2t)}{2}$$ $$z(t) = \sin(t)\cos(t)$$

Mathematical source: Wolfram MathWorld: Lissajous Curve

Key property: All figure-8 variants have a crossing point (self-intersection) at $t=0$ or center, representing the cyclic "midnight" transition.

4. Torus Path (Helical Trace)¶

A specific path winding around a torus surface: $$x(t) = (R + r\cos(v))\cos(t)$$ $$y(t) = (R + r\cos(v))\sin(t)$$ $$z(t) = r\sin(v)$$

where:

$v = \text{ratio} \cdot t$ (determines the winding)
$R$ = major radius (distance from center to tube center)
$r$ = minor radius (tube thickness)

Mathematical source: Wolfram MathWorld: Torus

5. Trefoil Knot ($3_1$ Knot)¶

A non-trivial topological knot with three distinct lobes (scaled by factor $1/3$ for normalization): $$x(t) = \frac{1}{3} (\sin(t) + 2\sin(2t))$$ $$y(t) = \frac{1}{3} (\cos(t) - 2\cos(2t))$$ $$z(t) = \frac{1}{3} (-\sin(3t))$$

Mathematical source: Wolfram MathWorld: Trefoil Knot

Key properties:

Torus: No self-intersections (smooth manifold), doubly periodic.
Trefoil: Non-planar, self-intertwined loop without direct self-intersection points.

In [8]:

Copied!





def generate_figure8_gerono(n_points: int = 1000) -> np.ndarray:
    """Gerono lemniscate (figure-8) 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.cos(t)
    y = np.sin(t) * np.cos(t)
    z = 0.5 * np.sin(t) * np.sin(t / 2)
    return np.stack([x, y, z], axis=1)


def generate_figure8_bernoulli(n_points: int = 1000) -> np.ndarray:
    """Bernoulli lemniscate (figure-8) 3D points."""
    t = np.linspace(-np.pi, np.pi, n_points)
    a = 1.0
    denom = np.sin(t) ** 2 + 1
    x = a * np.sqrt(2) * np.cos(t) / denom
    y = a * np.sqrt(2) * np.cos(t) * np.sin(t) / denom
    z = np.sin(2 * t) / 4
    return np.stack([x, y, z], axis=1)


def generate_figure8_twisted(n_points: int = 1000) -> np.ndarray:
    """Twisted lemniscate (figure-8) 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t)
    y = np.sin(2 * t) / 2
    z = np.sin(t) * np.cos(t)
    return np.stack([x, y, z], axis=1)


def generate_torus_path(n_points: int = 1000, R: float = 2.0, r: float = 1.0, ratio: float = 1.0) -> np.ndarray:
    """Torus path 3D points (major R, minor r, winding ratio)."""
    t = np.linspace(0, 2 * np.pi, n_points)
    v = ratio * t
    x = (R + r * np.cos(v)) * np.cos(t)
    y = (R + r * np.cos(v)) * np.sin(t)
    z = r * np.sin(v)
    return np.stack([x, y, z], axis=1)


def generate_trefoil_knot(n_points: int = 1000) -> np.ndarray:
    """Trefoil knot 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    z = -np.sin(3 * t)
    return np.stack([x, y, z], axis=1) / 3.0


def map_hours_to_manifold(hours: np.ndarray, manifold_points: np.ndarray) -> np.ndarray:
    """Map hour indices to manifold coordinates by nearest point."""
    n_points = manifold_points.shape[0]
    indices = np.round((hours / 24.0) * (n_points - 1)).astype(int)
    indices = np.clip(indices, 0, n_points - 1)
    return manifold_points[indices]


def generate_trefoil_knot_2d(n_points: int = 1000) -> np.ndarray:
    """2D projection of trefoil knot (xy plane)."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    return np.stack([x, y], axis=1) / 3.0


topological_configs = [
    ("Gerono", generate_figure8_gerono(200)),
    ("Bernoulli", generate_figure8_bernoulli(200)),
    ("Twisted", generate_figure8_twisted(200)),
    ("Torus_Path", generate_torus_path(200, R=2.0, r=1.0)),
    ("Trefoil_Knot_3D", generate_trefoil_knot(200)),
    ("Trefoil_Knot_2D", generate_trefoil_knot_2d(200)),
]

print(f"Integrating {len(topological_configs)} fixed topological hypotheses...\n")


for name, template_points in topological_configs:
    ndim = template_points.shape[1]

    hypothesis = UserProvidedSMDSParametrization(
        n_components=ndim, fixed_template=template_points, mapper=map_hours_to_manifold, name=name
    )

    hypothesis.name = name

    all_shapes.append(hypothesis)
def generate_figure8_gerono(n_points: int = 1000) -> np.ndarray:
    """Gerono lemniscate (figure-8) 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.cos(t)
    y = np.sin(t) * np.cos(t)
    z = 0.5 * np.sin(t) * np.sin(t / 2)
    return np.stack([x, y, z], axis=1)


def generate_figure8_bernoulli(n_points: int = 1000) -> np.ndarray:
    """Bernoulli lemniscate (figure-8) 3D points."""
    t = np.linspace(-np.pi, np.pi, n_points)
    a = 1.0
    denom = np.sin(t) ** 2 + 1
    x = a * np.sqrt(2) * np.cos(t) / denom
    y = a * np.sqrt(2) * np.cos(t) * np.sin(t) / denom
    z = np.sin(2 * t) / 4
    return np.stack([x, y, z], axis=1)


def generate_figure8_twisted(n_points: int = 1000) -> np.ndarray:
    """Twisted lemniscate (figure-8) 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t)
    y = np.sin(2 * t) / 2
    z = np.sin(t) * np.cos(t)
    return np.stack([x, y, z], axis=1)


def generate_torus_path(n_points: int = 1000, R: float = 2.0, r: float = 1.0, ratio: float = 1.0) -> np.ndarray:
    """Torus path 3D points (major R, minor r, winding ratio)."""
    t = np.linspace(0, 2 * np.pi, n_points)
    v = ratio * t
    x = (R + r * np.cos(v)) * np.cos(t)
    y = (R + r * np.cos(v)) * np.sin(t)
    z = r * np.sin(v)
    return np.stack([x, y, z], axis=1)


def generate_trefoil_knot(n_points: int = 1000) -> np.ndarray:
    """Trefoil knot 3D points."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    z = -np.sin(3 * t)
    return np.stack([x, y, z], axis=1) / 3.0


def map_hours_to_manifold(hours: np.ndarray, manifold_points: np.ndarray) -> np.ndarray:
    """Map hour indices to manifold coordinates by nearest point."""
    n_points = manifold_points.shape[0]
    indices = np.round((hours / 24.0) * (n_points - 1)).astype(int)
    indices = np.clip(indices, 0, n_points - 1)
    return manifold_points[indices]


def generate_trefoil_knot_2d(n_points: int = 1000) -> np.ndarray:
    """2D projection of trefoil knot (xy plane)."""
    t = np.linspace(0, 2 * np.pi, n_points)
    x = np.sin(t) + 2 * np.sin(2 * t)
    y = np.cos(t) - 2 * np.cos(2 * t)
    return np.stack([x, y], axis=1) / 3.0


topological_configs = [
    ("Gerono", generate_figure8_gerono(200)),
    ("Bernoulli", generate_figure8_bernoulli(200)),
    ("Twisted", generate_figure8_twisted(200)),
    ("Torus_Path", generate_torus_path(200, R=2.0, r=1.0)),
    ("Trefoil_Knot_3D", generate_trefoil_knot(200)),
    ("Trefoil_Knot_2D", generate_trefoil_knot_2d(200)),
]

print(f"Integrating {len(topological_configs)} fixed topological hypotheses...\n")


for name, template_points in topological_configs:
    ndim = template_points.shape[1]

    hypothesis = UserProvidedSMDSParametrization(
        n_components=ndim, fixed_template=template_points, mapper=map_hours_to_manifold, name=name
    )

    hypothesis.name = name

    all_shapes.append(hypothesis)

Integrating 6 fixed topological hypotheses...

7. List all Shapes to test¶

In [9]:

Copied!





print("=" * 80)
print(f"Final Hypothesis List ({len(all_shapes)} Candidates):")
print("-" * 60)

for i, shape in enumerate(all_shapes):
    if hasattr(shape, "name"):
        display_name = f"Fixed: {shape.name}"
    else:
        display_name = shape.__class__.__name__

    if hasattr(shape, "n_components"):
        dim = f"{shape.n_components}D"
    else:
        dim = "2D"

    print(f"{i + 1:02d}. {display_name:<30} | {dim}")
print("=" * 80)
print("=" * 80)
print(f"Final Hypothesis List ({len(all_shapes)} Candidates):")
print("-" * 60)

for i, shape in enumerate(all_shapes):
    if hasattr(shape, "name"):
        display_name = f"Fixed: {shape.name}"
    else:
        display_name = shape.__class__.__name__

    if hasattr(shape, "n_components"):
        dim = f"{shape.n_components}D"
    else:
        dim = "2D"

    print(f"{i + 1:02d}. {display_name:<30} | {dim}")
print("=" * 80)

================================================================================
Final Hypothesis List (16 Candidates):
------------------------------------------------------------
01. CircularShape                  | 2D
02. CircularShape                  | 2D
03. EuclideanShape                 | 2D
04. LogLinearShape                 | 2D
05. SpiralShape                    | 2D
06. SpiralShape                    | 2D
07. SemicircularShape              | 2D
08. ChainShape                     | 2D
09. ClusterShape                   | 2D
10. DiscreteCircularShape          | 2D
11. Fixed: Gerono                  | 3D
12. Fixed: Bernoulli               | 3D
13. Fixed: Twisted                 | 3D
14. Fixed: Torus_Path              | 3D
15. Fixed: Trefoil_Knot_3D         | 3D
16. Fixed: Trefoil_Knot_2D         | 2D
================================================================================

8. Visualize Topological Hypotheses¶

We visualize the generated 3D and 2D manifolds to verify their topological properties before using them as targets for the MDS mapping.

The plots below compare:

Figure-8 Variants: Gerono, Bernoulli, and Twisted Lemniscates.
Toroidal Variants: Torus path and Trefoil Knot.

In [10]:

Copied!





fig = plt.figure(figsize=(20, 10))

selected_configs = [
    ("Gerono Lemniscate", topological_configs[0][1]),
    ("Bernoulli Lemniscate", topological_configs[1][1]),
    ("Twisted Figure-8", topological_configs[2][1]),
    ("Torus (R=2, r=1)", topological_configs[3][1]),
    ("Trefoil Knot", topological_configs[4][1]),
    ("Trefoil Knot 2D", topological_configs[5][1]),
]

for idx, (name, manifold) in enumerate(selected_configs):
    ndim = manifold.shape[1]
    hours_normalized = np.linspace(0, 24, manifold.shape[0])
    if ndim == 2:
        ax = fig.add_subplot(2, 3, idx + 1)
        scatter = ax.scatter(manifold[:, 0], manifold[:, 1], c=hours_normalized, cmap="twilight", s=20, alpha=0.6)
        ax.plot(manifold[:, 0], manifold[:, 1], "gray", linewidth=1, alpha=0.4)
        ax.scatter(
            manifold[0, 0],
            manifold[0, 1],
            c="red",
            s=200,
            marker="*",
            edgecolors="black",
            linewidths=2,
            label="Midnight (t=0)",
            zorder=10,
        )
        ax.set_xlabel("X", fontsize=10)
        ax.set_ylabel("Y", fontsize=10)
        ax.set_aspect("equal", adjustable="datalim")
    else:
        ax = fig.add_subplot(2, 3, idx + 1, projection="3d")
        scatter = ax.scatter(
            manifold[:, 0], manifold[:, 1], manifold[:, 2], c=hours_normalized, cmap="twilight", s=20, alpha=0.6
        )
        ax.plot(manifold[:, 0], manifold[:, 1], manifold[:, 2], "gray", linewidth=1, alpha=0.4)
        ax.scatter(
            manifold[0, 0],
            manifold[0, 1],
            manifold[0, 2],
            c="red",
            s=200,
            marker="*",
            edgecolors="black",
            linewidths=2,
            label="Midnight (t=0)",
            zorder=10,
        )
        ax.set_xlabel("X", fontsize=10)
        ax.set_ylabel("Y", fontsize=10)
        ax.set_zlabel("Z", fontsize=10)
        ax.view_init(elev=20, azim=45)
    ax.set_title(name, fontsize=14, fontweight="bold")
    ax.legend(fontsize=9)
    cbar = plt.colorbar(scatter, ax=ax, shrink=0.6, pad=0.1)
    cbar.set_label("Hour", fontsize=9)

plt.suptitle("Topological Hypotheses: Figure-8 and Toroidal Parametrizations", fontsize=16, fontweight="bold", y=0.98)
plt.tight_layout()
plt.show()
fig = plt.figure(figsize=(20, 10))

selected_configs = [
    ("Gerono Lemniscate", topological_configs[0][1]),
    ("Bernoulli Lemniscate", topological_configs[1][1]),
    ("Twisted Figure-8", topological_configs[2][1]),
    ("Torus (R=2, r=1)", topological_configs[3][1]),
    ("Trefoil Knot", topological_configs[4][1]),
    ("Trefoil Knot 2D", topological_configs[5][1]),
]

for idx, (name, manifold) in enumerate(selected_configs):
    ndim = manifold.shape[1]
    hours_normalized = np.linspace(0, 24, manifold.shape[0])
    if ndim == 2:
        ax = fig.add_subplot(2, 3, idx + 1)
        scatter = ax.scatter(manifold[:, 0], manifold[:, 1], c=hours_normalized, cmap="twilight", s=20, alpha=0.6)
        ax.plot(manifold[:, 0], manifold[:, 1], "gray", linewidth=1, alpha=0.4)
        ax.scatter(
            manifold[0, 0],
            manifold[0, 1],
            c="red",
            s=200,
            marker="*",
            edgecolors="black",
            linewidths=2,
            label="Midnight (t=0)",
            zorder=10,
        )
        ax.set_xlabel("X", fontsize=10)
        ax.set_ylabel("Y", fontsize=10)
        ax.set_aspect("equal", adjustable="datalim")
    else:
        ax = fig.add_subplot(2, 3, idx + 1, projection="3d")
        scatter = ax.scatter(
            manifold[:, 0], manifold[:, 1], manifold[:, 2], c=hours_normalized, cmap="twilight", s=20, alpha=0.6
        )
        ax.plot(manifold[:, 0], manifold[:, 1], manifold[:, 2], "gray", linewidth=1, alpha=0.4)
        ax.scatter(
            manifold[0, 0],
            manifold[0, 1],
            manifold[0, 2],
            c="red",
            s=200,
            marker="*",
            edgecolors="black",
            linewidths=2,
            label="Midnight (t=0)",
            zorder=10,
        )
        ax.set_xlabel("X", fontsize=10)
        ax.set_ylabel("Y", fontsize=10)
        ax.set_zlabel("Z", fontsize=10)
        ax.view_init(elev=20, azim=45)
    ax.set_title(name, fontsize=14, fontweight="bold")
    ax.legend(fontsize=9)
    cbar = plt.colorbar(scatter, ax=ax, shrink=0.6, pad=0.1)
    cbar.set_label("Hour", fontsize=9)

plt.suptitle("Topological Hypotheses: Figure-8 and Toroidal Parametrizations", fontsize=16, fontweight="bold", y=0.98)
plt.tight_layout()
plt.show()

No description has been provided for this image

9. Run Comprehensive Manifold Discovery¶

SMDS Pipeline¶

For each dataset and shape configuration:

Distance Calculation: Compute the ideal pairwise distance matrix ($D_{target}$) based on the shape's geometry.
Optimization: Find the linear projection ($W$) of the GPT-2 activations that minimizes the stress (difference between activation distances and target distances).
Cross-Validation: Train on 80% of the data, evaluate the stress score on the held-out 20% (5-fold CV).
Aggregation: Average the normalized stress scores across all folds and datasets.

Metric: Score (Higher is better).

Computational Load: The pipeline executes 350 SMDS fits in total:

5 Independent Datasets
14 Shape Configurations
5 Cross-Validation Folds per shape/dataset

In [11]:

Copied!





all_results = []
visualization_paths = []
last_csv_path = None

print("Running comprehensive manifold discovery...\n")
n_fits = len(datasets) * len(all_shapes) * N_FOLDS
print(f"Total fits: {len(datasets)} datasets × {len(all_shapes)} shapes × {N_FOLDS} folds = {n_fits}\n")
print("=" * 80)

for dataset_idx, dataset in enumerate(datasets):
    print(f"\nDataset {dataset_idx + 1}/{len(datasets)} (seed={dataset['seed']})")
    print("-" * 80)

    results_df, csv_path = discover_manifolds(
        dataset["activations"],
        dataset["hours"],
        shapes=all_shapes,
        n_folds=N_FOLDS,
        n_jobs=-1,
        experiment_name=f"{EXPERIMENT_NAME}_seed{dataset['seed']}",
        save_results=True,
        create_png_visualization=True,
        clear_cache=True,
    )

    results_df["dataset_seed"] = dataset["seed"]
    results_df["dataset_idx"] = dataset_idx
    all_results.append(results_df)

    if csv_path:
        last_csv_path = csv_path
        result_dir = Path(csv_path).parent
        viz_path = result_dir / f"{result_dir.name}_visualized.png"
        if viz_path.exists():
            visualization_paths.append(viz_path)

    print("\nTop 5 shapes for this dataset:")
    display_cols = [
        col for col in results_df.columns if any(x in col.lower() for x in ["shape", "mean", "stress", "score"])
    ]
    if display_cols:
        print(results_df[display_cols].head(5).to_string(index=False))

combined_results = pd.concat(all_results, ignore_index=True)
print("\n" + "=" * 80)
print(f"Discovery complete: {combined_results.shape[0]} total results")
print(f"Visualization plots collected: {len(visualization_paths)}")
all_results = []
visualization_paths = []
last_csv_path = None

print("Running comprehensive manifold discovery...\n")
n_fits = len(datasets) * len(all_shapes) * N_FOLDS
print(f"Total fits: {len(datasets)} datasets × {len(all_shapes)} shapes × {N_FOLDS} folds = {n_fits}\n")
print("=" * 80)

for dataset_idx, dataset in enumerate(datasets):
    print(f"\nDataset {dataset_idx + 1}/{len(datasets)} (seed={dataset['seed']})")
    print("-" * 80)

    results_df, csv_path = discover_manifolds(
        dataset["activations"],
        dataset["hours"],
        shapes=all_shapes,
        n_folds=N_FOLDS,
        n_jobs=-1,
        experiment_name=f"{EXPERIMENT_NAME}_seed{dataset['seed']}",
        save_results=True,
        create_png_visualization=True,
        clear_cache=True,
    )

    results_df["dataset_seed"] = dataset["seed"]
    results_df["dataset_idx"] = dataset_idx
    all_results.append(results_df)

    if csv_path:
        last_csv_path = csv_path
        result_dir = Path(csv_path).parent
        viz_path = result_dir / f"{result_dir.name}_visualized.png"
        if viz_path.exists():
            visualization_paths.append(viz_path)

    print("\nTop 5 shapes for this dataset:")
    display_cols = [
        col for col in results_df.columns if any(x in col.lower() for x in ["shape", "mean", "stress", "score"])
    ]
    if display_cols:
        print(results_df[display_cols].head(5).to_string(index=False))

combined_results = pd.concat(all_results, ignore_index=True)
print("\n" + "=" * 80)
print(f"Discovery complete: {combined_results.shape[0]} total results")
print(f"Visualization plots collected: {len(visualization_paths)}")

Running comprehensive manifold discovery...

Total fits: 5 datasets × 16 shapes × 5 folds = 400

================================================================================

Dataset 1/5 (seed=42)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641/Hour_Manifold_Comprehensive_seed42_2026-02-13_141640_2cf641_visualized.png
Cache cleared

Top 5 shapes for this dataset:
          shape  mean_scale_normalized_stress  std_scale_normalized_stress                                                                         fold_scale_normalized_stress  mean_non_metric_stress  std_non_metric_stress                                                                               fold_non_metric_stress  mean_shepard_goodness_score  std_shepard_goodness_score                                                                                 fold_shepard_goodness_score  mean_normalized_stress  std_normalized_stress                                                                                  fold_normalized_stress  mean_normalized_kl_divergence
Trefoil_Knot_3D                      0.611761                     0.005860 [0.6061050047733393, 0.6161237116212532, 0.6041266172716558, 0.6195831209129936, 0.6128674326793493]                0.885646               0.002449 [0.8828562561770309, 0.8876638867982793, 0.8824918278776901, 0.8871257359067307, 0.8880924774459765]                     0.521005                    0.018450         [0.5130629612636001, 0.525210453979686, 0.5061080301417783, 0.5550763836656577, 0.5055651982981811]                0.605928               0.009781     [0.5989431395999996, 0.6159891798017821, 0.5917766239445754, 0.617243890781432, 0.6056850753916797]                       0.941224
    SpiralShape                      0.593058                     0.022953  [0.6176201410295181, 0.5883016052408122, 0.5851936592913921, 0.556467995505818, 0.6177065670043709]                0.874233               0.010365 [0.8852138623782198, 0.8739994513711462, 0.8686892820269819, 0.8579887229219502, 0.8852753503377709]                     0.536947                    0.055237         [0.5896699927583219, 0.519688980058846, 0.5240548170837026, 0.4489872126833727, 0.6023321489178431]                0.588821               0.026006    [0.6136773064457457, 0.5878548570193592, 0.5841194067469431, 0.5434740604739523, 0.6149792457191133]                       0.778463
Trefoil_Knot_2D                      0.590938                     0.007418 [0.5822522421284757, 0.5997364966364604, 0.5856321760835728, 0.5998858783710803, 0.5871854662327751]                0.868411               0.003175  [0.8649571547620065, 0.8724557497317181, 0.866700101324063, 0.8720158691198401, 0.8659238841887797]                     0.553431                    0.017192        [0.5347779452905659, 0.5654291488452475, 0.5444694026838341, 0.5810343074152082, 0.5414443403586036]                0.585532               0.010372    [0.5746965683939902, 0.5991714481622031, 0.5758613011002074, 0.5966246997940667, 0.5813082719731264]                       0.945988
   ClusterShape                      0.490707                     0.025104  [0.4508576969229048, 0.479836335706078, 0.527328345666773, 0.4992488038994769, 0.49626542282323227]                0.762488               0.025240 [0.7178213770440961, 0.7553991006856542, 0.7918236448244781, 0.7774473011575358, 0.7699473814346514]                     0.030950                    0.029251 [0.04559650960273986, 0.018754551322276866, 0.0812021337820214, 0.003971616638316422, 0.005227506777120248]                0.246567               0.019180  [0.2554196775547478, 0.2532276658323124, 0.22723492423510405, 0.27431846261983883, 0.2226353956375896]                       0.993871
        Twisted                      0.485653                     0.031239    [0.437033231620364, 0.527924421332908, 0.48680201445384996, 0.46935997357002, 0.5071436102996187]                0.808917               0.017385  [0.7819538688522827, 0.8202967314026199, 0.8028006850113265, 0.8059492152378126, 0.833583440490511]                     0.342918                    0.089692      [0.22001421988147885, 0.49259681407457073, 0.3617748792228423, 0.2944334723521389, 0.3457701942507499]                0.424526               0.041603 [0.34546152581658596, 0.4640629930671194, 0.42360328485565946, 0.44151551214910356, 0.4479849663253328]                       0.791637

Dataset 2/5 (seed=123)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5/Hour_Manifold_Comprehensive_seed123_2026-02-13_141714_2958a5_visualized.png
Cache cleared

Top 5 shapes for this dataset:
          shape  mean_scale_normalized_stress  std_scale_normalized_stress                                                                           fold_scale_normalized_stress  mean_non_metric_stress  std_non_metric_stress                                                                               fold_non_metric_stress  mean_shepard_goodness_score  std_shepard_goodness_score                                                                              fold_shepard_goodness_score  mean_normalized_stress  std_normalized_stress                                                                                 fold_normalized_stress  mean_normalized_kl_divergence
Trefoil_Knot_3D                      0.620543                     0.037787    [0.6083870015393387, 0.592194326843061, 0.5820680571482378, 0.6318474882121028, 0.6882168578233802]                0.895446               0.016531   [0.8963455474658334, 0.8844490966883708, 0.875144422458971, 0.897035858550036, 0.9242555881473346]                     0.518628                    0.095178   [0.48124411856320537, 0.47195112189304944, 0.39994361313889437, 0.560509889548512, 0.6794927856286493]                0.609716               0.037323   [0.6011817908960082, 0.5655479261745491, 0.5814166275434407, 0.6294813818117069, 0.6709529531957245]                       0.936807
    SpiralShape                      0.596873                     0.025656   [0.6267282368737255, 0.5528726024777431, 0.5872068790642272, 0.6021960038540459, 0.6153637587861711]                0.875782               0.010954 [0.8881799046998203, 0.8566051459713785, 0.8716341890809102, 0.8803535966492853, 0.8821375881145194]                     0.558595                    0.050939     [0.6303298153734521, 0.5028087182155097, 0.5053022688573777, 0.5529946355100179, 0.6015407042095717]                0.584495               0.043076   [0.6265342750812344, 0.5027245566645271, 0.5872008138740646, 0.5949019535179301, 0.6111143664198403]                       0.771689
Trefoil_Knot_2D                      0.592883                     0.044457   [0.5717251743509406, 0.5638198745070908, 0.5453035247397138, 0.6140058464579378, 0.6695619498985034]                     NaN                    NaN                  [nan, 0.864396960520317, 0.8461192982376546, 0.885389936044614, 0.9113095002115129]                     0.534245                    0.095723  [0.49543266499785904, 0.47918029445758153, 0.41969428651047513, 0.5817429616437599, 0.6951748167658639]                0.581405               0.042231   [0.5676064646571882, 0.5355608621218412, 0.5451121108721735, 0.6105796557071232, 0.6481634376967444]                       0.940710
        Twisted                      0.497035                     0.028454  [0.44278427506005913, 0.5062073693928435, 0.4985725631211665, 0.5247457463349563, 0.5128659550559005]                0.814038               0.021509 [0.7733992192450169, 0.8275014283688383, 0.8152226757092238, 0.8354377733277459, 0.8186293057638585]                     0.369370                    0.041001   [0.30312603999586496, 0.3716623224552408, 0.3519552563205787, 0.39593712936189746, 0.4241682837273716]                0.439304               0.044567   [0.37819604176333466, 0.411201815354734, 0.4352637268218654, 0.5087648586984672, 0.4630940951300143]                       0.793497
         Gerono                      0.481609                     0.021091 [0.4598658548263409, 0.47134447781359057, 0.46956114477709143, 0.4873505204966453, 0.5199234476842114]                0.820454               0.014187  [0.7981472860411327, 0.8133799254216147, 0.819645908500464, 0.8366240052684231, 0.8344749856943612]                     0.244224                    0.065382 [0.19123245076269157, 0.23553151287501034, 0.18876031311997288, 0.2374067956431743, 0.36818969173370264]                0.447031               0.019963 [0.43962068367099716, 0.4367277141219361, 0.4332841499928761, 0.48671710205712915, 0.4388074849066087]                       0.836502

Dataset 3/5 (seed=456)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d/Hour_Manifold_Comprehensive_seed456_2026-02-13_141740_11492d_visualized.png
Cache cleared

Top 5 shapes for this dataset:
          shape  mean_scale_normalized_stress  std_scale_normalized_stress                                                                          fold_scale_normalized_stress  mean_non_metric_stress  std_non_metric_stress                                                                               fold_non_metric_stress  mean_shepard_goodness_score  std_shepard_goodness_score                                                                                fold_shepard_goodness_score  mean_normalized_stress  std_normalized_stress                                                                                  fold_normalized_stress  mean_normalized_kl_divergence
    SpiralShape                      0.602468                     0.078251   [0.6726244516330431, 0.6494043567328367, 0.630989851326366, 0.4519571343560074, 0.6073649229644604]                0.867244               0.059732 [0.9100530714604332, 0.9025992104754288, 0.8922936881795644, 0.7492934986223632, 0.8819797851837281]                     0.613439                    0.082205       [0.7123680679643595, 0.6560906361123175, 0.6328210415426564, 0.4664304578078993, 0.5994826546811717]                0.575926               0.109891   [0.6693060477725639, 0.6493918459471062, 0.6299311601571699, 0.36813842270062447, 0.5628609884468092]                       0.741697
Trefoil_Knot_3D                      0.586313                     0.067427   [0.6449681463575391, 0.623991877097924, 0.5880047092286118, 0.4564786036764662, 0.6181227390395869]                0.866135               0.056304  [0.9036088290117344, 0.8984855254585097, 0.877059158008932, 0.7549778826990567, 0.8965432882340056]                     0.504998                    0.067701       [0.5999299189554197, 0.5460360971028034, 0.41963414298449814, 0.4369533432911934, 0.522437461011109]                0.529307               0.165267    [0.6313010314249521, 0.6168850502821708, 0.5844618466618365, 0.2001758522476872, 0.6137106273880868]                       0.899866
Trefoil_Knot_2D                      0.566652                     0.057947 [0.6250608554861063, 0.6101033657238544, 0.5435514091115521, 0.46460308436811015, 0.5899392143162187]                     NaN                    NaN                                [0.8833971612832898, 0.8851326210441623, nan, 0.767503032242789, nan]                     0.523189                    0.082935      [0.6373786317992622, 0.5814949202072742, 0.4121685755928486, 0.44922424237716396, 0.5356798498603034]                0.520714               0.134164      [0.6157781444968954, 0.60502874897345, 0.5403822755715495, 0.2573973468130011, 0.5849845660097395]                       0.918106
        Twisted                      0.489316                     0.043557 [0.5220020977861441, 0.5097660824218724, 0.5322614684514344, 0.41353459543992743, 0.4690135000035244]                0.813319               0.024107 [0.8253304020320068, 0.8336422601182398, 0.8311797062660288, 0.7686032979685933, 0.8078400566608288]                     0.354195                    0.071784      [0.4249916525273554, 0.3911750745517101, 0.41927750787314444, 0.2575443729817177, 0.2779849150207873]                0.417488               0.107166 [0.4668293630162159, 0.4851583891512843, 0.48103712040673097, 0.20459169082161133, 0.44982512826578835]                       0.804013
   ClusterShape                      0.471764                     0.031982 [0.47706381811302967, 0.4988363478603506, 0.5080691131305134, 0.4190287776688547, 0.4558215288213916]                0.743834               0.035728 [0.7462042031315237, 0.7789395454731816, 0.7805509005986866, 0.6838915784142278, 0.7295838430123307]                     0.044358                    0.027309 [0.03512430098504914, 0.0021655161927244663, 0.05434883694567549, 0.08634970026426306, 0.0438035775479592]                0.256932               0.030741 [0.29430175172069506, 0.27033592547683316, 0.20095937043296386, 0.2584381269898862, 0.2606234542078699]                       0.992842

Dataset 4/5 (seed=789)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304/Hour_Manifold_Comprehensive_seed789_2026-02-13_141806_533304_visualized.png
Cache cleared

Top 5 shapes for this dataset:
          shape  mean_scale_normalized_stress  std_scale_normalized_stress                                                                          fold_scale_normalized_stress  mean_non_metric_stress  std_non_metric_stress                                                                               fold_non_metric_stress  mean_shepard_goodness_score  std_shepard_goodness_score                                                                             fold_shepard_goodness_score  mean_normalized_stress  std_normalized_stress                                                                                 fold_normalized_stress  mean_normalized_kl_divergence
Trefoil_Knot_3D                      0.602163                     0.020283   [0.629419775781632, 0.6183028308719856, 0.6041557270338161, 0.5763377787300086, 0.5825995111025162]                0.881170               0.013014 [0.8975187737100048, 0.8929135832967331, 0.8813843515362217, 0.8715442933756004, 0.8624908801537776]                     0.493759                    0.042423      [0.541400746098268, 0.522616567485095, 0.4740354716141925, 0.4212380278977278, 0.5095028577519014]                0.581830               0.034815   [0.6235664498065632, 0.6133246190493449, 0.5862571084156196, 0.5550802560255783, 0.5309228292186565]                       0.929704
    SpiralShape                      0.584383                     0.025790  [0.6248839357625022, 0.5760972391447572, 0.5927225762244661, 0.5831397499925859, 0.5450697288258511]                0.866846               0.016471 [0.8890922027472794, 0.8679258686454143, 0.8710803823356157, 0.8682773904577746, 0.8378561280674026]                     0.528306                    0.043966   [0.6063343739218352, 0.47838053972968053, 0.5334300858611427, 0.5275951557328048, 0.4957909465214621]                0.568030               0.035634   [0.6239541756221197, 0.5546119372267215, 0.5880970272100853, 0.5551960020283074, 0.5182907195057966]                       0.746470
Trefoil_Knot_2D                      0.572568                     0.020002  [0.5969838761081758, 0.5896627632756524, 0.5769880293207006, 0.5452557860243636, 0.5539502008538526]                     NaN                    NaN                 [nan, 0.8711891217112202, 0.8569779428217936, 0.848397125744314, 0.8381838843468549]                     0.507705                    0.038920   [0.5408535711185724, 0.5340368617521628, 0.5059890084217673, 0.43348857782102485, 0.5241568140042795]                0.551260               0.034503   [0.5877808577136119, 0.5846716263548226, 0.5610133479709652, 0.5221870236850494, 0.5006455045379562]                       0.932833
        Twisted                      0.497777                     0.032059 [0.5103504285381464, 0.4974695808869243, 0.5305568771700888, 0.5132759818039825, 0.43723062070607555]                0.811804               0.032521  [0.8318927745782561, 0.817118891951259, 0.8356149263871047, 0.8264290539432367, 0.7479653039980909]                     0.374274                    0.032068     [0.3332199686203896, 0.339785153999245, 0.403325688632913, 0.4105984446576336, 0.38443831219957597]                0.439844               0.075414  [0.4656720112671524, 0.47950734201182843, 0.5005534613045091, 0.4620310724045129, 0.2914579904708977]                       0.812911
         Gerono                      0.492541                     0.013426 [0.4970351791277908, 0.4688723530402452, 0.48745401911893305, 0.5040872874286143, 0.5052581169987025]                0.818576               0.022875 [0.8348227238822946, 0.7746382908976612, 0.8193443131466445, 0.8266157041157203, 0.8374592282960671]                     0.310416                    0.034717 [0.28580983296971296, 0.33680691113507966, 0.25408732862622035, 0.34334576873534606, 0.332030219899426]                0.454451               0.032396 [0.46896710653769746, 0.4029977826568393, 0.43482847668443214, 0.46850523681426703, 0.496956273070456]                       0.852087

Dataset 5/5 (seed=1024)
--------------------------------------------------------------------------------
Saving to: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039.csv
Computed and cached CircularShape
Computed and cached CircularShape
Computed and cached EuclideanShape
Computed and cached LogLinearShape
Computed and cached SpiralShape
Computed and cached SpiralShape
Computed and cached SemicircularShape
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Warning: Distance matrix is incomplete. Using optimization to fit W.
Computed and cached ChainShape
Computed and cached ClusterShape
Computed and cached DiscreteCircularShape
Computed and cached Gerono
Computed and cached Bernoulli
Computed and cached Twisted
Computed and cached Torus_Path
Computed and cached Trefoil_Knot_3D
Computed and cached Trefoil_Knot_2D
Visual result saved under: /Users/arwinsg/code/supervised-multidimensional-scaling/smds/pipeline/saved_results/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039/Hour_Manifold_Comprehensive_seed1024_2026-02-13_141833_252039_visualized.png
Cache cleared

Top 5 shapes for this dataset:
          shape  mean_scale_normalized_stress  std_scale_normalized_stress                                                                           fold_scale_normalized_stress  mean_non_metric_stress  std_non_metric_stress                                                                               fold_non_metric_stress  mean_shepard_goodness_score  std_shepard_goodness_score                                                                                  fold_shepard_goodness_score  mean_normalized_stress  std_normalized_stress                                                                                  fold_normalized_stress  mean_normalized_kl_divergence
Trefoil_Knot_3D                      0.569014                     0.014884     [0.594368566364656, 0.5648204964203702, 0.573951580717086, 0.5495494778416599, 0.5623818350100633]                0.867210               0.014588 [0.8891110912690009, 0.8622457403066924, 0.8790400851500124, 0.8511003185355736, 0.8545517417241358]                     0.407518                    0.021025          [0.4422029559869142, 0.4165153965107484, 0.381980331307386, 0.3910470876081606, 0.4058444698196147]                0.542759               0.038707   [0.5829662797756366, 0.47678869615845954, 0.5735303572141823, 0.5234780318010244, 0.5570313079656186]                       0.919707
    SpiralShape                      0.560052                     0.035410   [0.5549822368534079, 0.6245115941920113, 0.5645100975999005, 0.5242179039672497, 0.5320379849702935]                0.854632               0.020933  [0.8508929111848205, 0.8943214979274784, 0.8511732487863896, 0.832823177223993, 0.8439473078197561]                     0.463745                    0.074303       [0.4323783953384882, 0.5981215128830666, 0.4822691979943208, 0.42342199756210197, 0.38253602283133076]                0.546001               0.040893     [0.5532669693489086, 0.6083711511574623, 0.5625309531250939, 0.488208639642853, 0.5176276743670523]                       0.743346
Trefoil_Knot_2D                      0.538806                     0.010729   [0.5548638157515717, 0.5404128358061906, 0.5445586176829499, 0.5251125689943235, 0.5290818987978639]                     NaN                    NaN                               [nan, 0.8448241852189537, nan, 0.8266318927159506, 0.8293768446022831]                     0.427716                    0.019342        [0.4473062802659223, 0.45250050902390954, 0.415280058012473, 0.4219412672059971, 0.40155216004237027]                0.514379               0.035571     [0.5497490358892585, 0.4513972714472042, 0.5444992354723578, 0.503164646209446, 0.5230871712196901]                       0.925645
   ClusterShape                      0.473459                     0.025164 [0.5173235825668769, 0.45694651348641013, 0.44880405014218594, 0.4854884234184348, 0.4587317884442873]                0.747073               0.029832  [0.795311857798801, 0.7243753442017837, 0.7224016528593198, 0.7689834214994261, 0.7242943719529127]                     0.030516                    0.019448 [0.013897907120885004, 0.035763927536758326, 0.03519134577971763, 0.006086595949726599, 0.06164083612632504]                0.239555               0.016900 [0.2638119346581511, 0.23424098993064246, 0.21273185865237954, 0.23798290732972516, 0.2490095830233372]                       0.993312
        Twisted                      0.450078                     0.008446    [0.462750126132612, 0.4425541889187665, 0.4461556142571177, 0.4573180702239876, 0.4416112131436356]                0.788328               0.011444 [0.8095316135407774, 0.7809757348963715, 0.7779705957059548, 0.7822358099633919, 0.7909281708359516]                     0.240379                    0.039368       [0.21552142747022734, 0.238359196240515, 0.2612911038047962, 0.30112138357335944, 0.18560262754334897]                0.349450               0.071884    [0.4146305693499439, 0.368918717932715, 0.32330508917864953, 0.2232078342645426, 0.4171886136679299]                       0.742436

================================================================================
Discovery complete: 80 total results
Visualization plots collected: 5

Open Streamlit Dashboard¶

Run the cell below to open the Streamlit dashboard with the last discovery run. The dashboard shows results and interactive plots; it opens in your browser (or a new tab).

In [13]:

Copied!

open_dashboard.main(last_csv_path)
open_dashboard.main(last_csv_path)

Launching Dashboard...

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.2.135:8501

  Stopping...

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[13], line 1
----> 1 open_dashboard.main(last_csv_path)

File ~/code/supervised-multidimensional-scaling/smds/pipeline/open_dashboard.py:12, in main(saved_file_path)
      9 visualizer_path = os.path.join(os.path.dirname(__file__), "dashboard.py")
     11 if saved_file_path:
---> 12     subprocess.run(["streamlit", "run", visualizer_path, "--", saved_file_path])
     13 else:
     14     subprocess.run(["streamlit", "run", visualizer_path])

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:556, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    554 with Popen(*popenargs, **kwargs) as process:
    555     try:
--> 556         stdout, stderr = process.communicate(input, timeout=timeout)
    557     except TimeoutExpired as exc:
    558         process.kill()

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:1214, in Popen.communicate(self, input, timeout)
   1212         stderr = self.stderr.read()
   1213         self.stderr.close()
-> 1214     self.wait()
   1215 else:
   1216     if timeout is not None:

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:1280, in Popen.wait(self, timeout)
   1278     endtime = _time() + timeout
   1279 try:
-> 1280     return self._wait(timeout=timeout)
   1281 except KeyboardInterrupt:
   1282     # https://bugs.python.org/issue25942
   1283     # The first keyboard interrupt waits briefly for the child to
   1284     # exit under the common assumption that it also received the ^C
   1285     # generated SIGINT and will exit rapidly.
   1286     if timeout is not None:

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:2066, in Popen._wait(self, timeout)
   2064 if self.returncode is not None:
   2065     break  # Another thread waited.
-> 2066 (pid, sts) = self._try_wait(0)
   2067 # Check the pid and loop as waitpid has been known to
   2068 # return 0 even without WNOHANG in odd situations.
   2069 # http://bugs.python.org/issue14396.
   2070 if pid == self.pid:

File ~/.local/share/uv/python/cpython-3.13.5-macos-aarch64-none/lib/python3.13/subprocess.py:2024, in Popen._try_wait(self, wait_flags)
   2022 """All callers to this function MUST hold self._waitpid_lock."""
   2023 try:
-> 2024     (pid, sts) = os.waitpid(self.pid, wait_flags)
   2025 except ChildProcessError:
   2026     # This happens if SIGCLD is set to be ignored or waiting
   2027     # for child processes has otherwise been disabled for our
   2028     # process.  This child is dead, we can't get the status.
   2029     pid = self.pid

KeyboardInterrupt:

10. Display Visualization Plots from Discovery Pipeline¶

These plots show the top-ranked manifolds for each dataset.

In [14]:

Copied!





if visualization_paths:
    n_plots = len(visualization_paths)
    n_cols = min(2, n_plots)
    n_rows = (n_plots + n_cols - 1) // n_cols

    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15 * n_cols, 10 * n_rows))
    if n_plots == 1:
        axes = np.array([axes])
    axes = axes.flatten()

    for idx, viz_path in enumerate(visualization_paths):
        img = Image.open(viz_path)
        axes[idx].imshow(img)
        axes[idx].axis("off")
        axes[idx].set_title(f"Dataset {idx + 1} (seed={datasets[idx]['seed']})", fontsize=16, fontweight="bold")

    for idx in range(len(visualization_paths), len(axes)):
        axes[idx].axis("off")

    plt.tight_layout()
    plt.show()
else:
    print("No visualization plots found.")
if visualization_paths:
    n_plots = len(visualization_paths)
    n_cols = min(2, n_plots)
    n_rows = (n_plots + n_cols - 1) // n_cols

    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15 * n_cols, 10 * n_rows))
    if n_plots == 1:
        axes = np.array([axes])
    axes = axes.flatten()

    for idx, viz_path in enumerate(visualization_paths):
        img = Image.open(viz_path)
        axes[idx].imshow(img)
        axes[idx].axis("off")
        axes[idx].set_title(f"Dataset {idx + 1} (seed={datasets[idx]['seed']})", fontsize=16, fontweight="bold")

    for idx in range(len(visualization_paths), len(axes)):
        axes[idx].axis("off")

    plt.tight_layout()
    plt.show()
else:
    print("No visualization plots found.")

11. Aggregate Results Across All Datasets¶

Statistical Aggregation¶

Given a stress metric, for each shape, we compute:

Mean score: Central tendency across datasets
Std: Variability
Min/Max: Range of performance
Count: Number of measurements (should be n_datasets)

Key metric: Mean ± Std provides confidence in shape ranking. Note: If there is no metric given, we aggreagte the result over the mean of the scores for different metrics

In [15]:

Copied!





stress = "mean_scale_normalized_stress"
shape_col = None
for col in combined_results.columns:
    if "shape" in col.lower() and shape_col is None:
        shape_col = col
        break
metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
if not metric_cols:
    metric_cols = [
        c for c in combined_results.columns if "mean" in c.lower() and ("stress" in c.lower() or "shepard" in c.lower())
    ]
combined_results = combined_results.copy()
if stress is not None and stress in combined_results.columns:
    combined_results["score"] = combined_results[stress]
else:
    combined_results["score"] = combined_results[metric_cols].mean(axis=1)

has_score = (stress is not None and stress in combined_results.columns) or len(metric_cols) > 0
if shape_col and has_score:
    aggregated = combined_results.groupby(shape_col)["score"].agg(["mean", "std", "min", "max", "count"]).reset_index()
    aggregated.columns = ["Shape", "Mean_Score", "Std_Score", "Min_Score", "Max_Score", "N_Runs"]
    aggregated = aggregated.sort_values("Mean_Score", ascending=False)
    aggregated["CV"] = (aggregated["Std_Score"] / aggregated["Mean_Score"]) * 100

    print("=" * 120)
    print("AGGREGATED RESULTS:")
    print("=" * 120)
    print(f"\nStatistics computed over {len(datasets)} independent datasets")
    print(f"Each shape tested with {N_FOLDS}-fold cross-validation\n")
    print(aggregated.to_string(index=False, float_format=lambda x: f"{x:.4f}"))
    print("\n" + "=" * 120)

    best_shape = aggregated.iloc[0]["Shape"]
    best_mean_score = aggregated.iloc[0]["Mean_Score"]
    best_std_score = aggregated.iloc[0]["Std_Score"]
    best_cv = aggregated.iloc[0]["CV"]

    print(f"\nBest Shape: {best_shape}")
    print(f"  Mean Score: {best_mean_score:.4f}")
    print(f"  Std: ±{best_std_score:.4f}")
    print(f"  Coefficient of Variation: {best_cv:.1f}%")
    print(
        f"  95% CI (approx): [{best_mean_score - 2 * best_std_score:.4f}, {best_mean_score + 2 * best_std_score:.4f}]"
    )
else:
    print("Could not identify shape and score columns for aggregation.")
    print(f"Available columns: {list(combined_results.columns)}")
stress = "mean_scale_normalized_stress"
shape_col = None
for col in combined_results.columns:
    if "shape" in col.lower() and shape_col is None:
        shape_col = col
        break
metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
if not metric_cols:
    metric_cols = [
        c for c in combined_results.columns if "mean" in c.lower() and ("stress" in c.lower() or "shepard" in c.lower())
    ]
combined_results = combined_results.copy()
if stress is not None and stress in combined_results.columns:
    combined_results["score"] = combined_results[stress]
else:
    combined_results["score"] = combined_results[metric_cols].mean(axis=1)

has_score = (stress is not None and stress in combined_results.columns) or len(metric_cols) > 0
if shape_col and has_score:
    aggregated = combined_results.groupby(shape_col)["score"].agg(["mean", "std", "min", "max", "count"]).reset_index()
    aggregated.columns = ["Shape", "Mean_Score", "Std_Score", "Min_Score", "Max_Score", "N_Runs"]
    aggregated = aggregated.sort_values("Mean_Score", ascending=False)
    aggregated["CV"] = (aggregated["Std_Score"] / aggregated["Mean_Score"]) * 100

    print("=" * 120)
    print("AGGREGATED RESULTS:")
    print("=" * 120)
    print(f"\nStatistics computed over {len(datasets)} independent datasets")
    print(f"Each shape tested with {N_FOLDS}-fold cross-validation\n")
    print(aggregated.to_string(index=False, float_format=lambda x: f"{x:.4f}"))
    print("\n" + "=" * 120)

    best_shape = aggregated.iloc[0]["Shape"]
    best_mean_score = aggregated.iloc[0]["Mean_Score"]
    best_std_score = aggregated.iloc[0]["Std_Score"]
    best_cv = aggregated.iloc[0]["CV"]

    print(f"\nBest Shape: {best_shape}")
    print(f"  Mean Score: {best_mean_score:.4f}")
    print(f"  Std: ±{best_std_score:.4f}")
    print(f"  Coefficient of Variation: {best_cv:.1f}%")
    print(
        f"  95% CI (approx): [{best_mean_score - 2 * best_std_score:.4f}, {best_mean_score + 2 * best_std_score:.4f}]"
    )
else:
    print("Could not identify shape and score columns for aggregation.")
    print(f"Available columns: {list(combined_results.columns)}")

========================================================================================================================
AGGREGATED RESULTS:
========================================================================================================================

Statistics computed over 5 independent datasets
Each shape tested with 5-fold cross-validation

                Shape  Mean_Score  Std_Score  Min_Score  Max_Score  N_Runs      CV
      Trefoil_Knot_3D      0.5980     0.0206     0.5690     0.6205       5  3.4391
      Trefoil_Knot_2D      0.5724     0.0219     0.5388     0.5929       5  3.8326
          SpiralShape      0.4951     0.0988     0.3870     0.6025      10 19.9485
              Twisted      0.4840     0.0196     0.4501     0.4978       5  4.0561
         ClusterShape      0.4722     0.0122     0.4569     0.4907       5  2.5872
               Gerono      0.4592     0.0321     0.4131     0.4925       5  6.9897
           Torus_Path      0.4589     0.0185     0.4312     0.4770       5  4.0270
        CircularShape      0.4524     0.0211     0.4180     0.4774      10  4.6680
    SemicircularShape      0.4255     0.0326     0.3741     0.4573       5  7.6549
DiscreteCircularShape      0.4184     0.0220     0.3843     0.4433       5  5.2637
            Bernoulli      0.4013     0.0261     0.3646     0.4278       5  6.5054
       EuclideanShape      0.3654     0.0454     0.2958     0.4121       5 12.4345
       LogLinearShape      0.3633     0.0452     0.2902     0.4066       5 12.4485
           ChainShape      0.3163     0.0223     0.2943     0.3473       5  7.0522

========================================================================================================================

Best Shape: Trefoil_Knot_3D
  Mean Score: 0.5980
  Std: ±0.0206
  Coefficient of Variation: 3.4%
  95% CI (approx): [0.5568, 0.6391]

12. Statistical Comparison: Hypotheses vs. Baselines¶

Statistical Significance Testing¶

Objective: Rigorously determine if the proposed topological models (Figure-8, Torus, Trefoil) provide a significantly better fit than the best-performing standard baseline.

Formal Hypothesis:

Null Hypothesis ($H_0$): The proposed topological hypothesis yields a stress score equal to or worse (higher) than the best baseline model.
Alternative Hypothesis ($H_1$): The proposed topological hypothesis yields a statistically significantly lower stress score (better fit).

Method: Independent two-sample t-test (one-tailed) comparing the distribution of stress scores across all cross-validation folds and seeds.

In [25]:

Copied!





if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found or empty. Please run previous cells.")
else:
    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if not shape_col:
        print("Error: Could not find shape column in combined_results")
    else:
        topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

        is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

        standard_df = combined_results[~is_topo].copy()
        topo_df = combined_results[is_topo].copy()

        if standard_df.empty or topo_df.empty:
            print("Error: Could not split results into Baseline and Topological sets.")
            print(f"Standard count: {len(standard_df)}, Topological count: {len(topo_df)}")
        else:

            def aggregate_scores(df):
                """Aggregate score by shape/variant (mean, std, count)."""
                agg = df.groupby(shape_col)["score"].agg(["mean", "std", "count"]).reset_index()
                agg.columns = ["Variant", "Mean_Score", "Std_Score", "N_Runs"]
                return agg.sort_values("Mean_Score", ascending=False)

            agg_standard = aggregate_scores(standard_df)
            agg_topo = aggregate_scores(topo_df)

            best_standard_shape = agg_standard.iloc[0]["Variant"]
            best_hypothesis = agg_topo.iloc[0]["Variant"]

            standard_scores = standard_df[standard_df[shape_col] == best_standard_shape]["score"].values
            hypothesis_scores = topo_df[topo_df[shape_col] == best_hypothesis]["score"].values

            t_stat, p_value = stats.ttest_ind(hypothesis_scores, standard_scores, alternative="greater")

            if "Torus" in best_hypothesis or "Trefoil" in best_hypothesis:
                hyp_type = "Toroidal/Knot"
            else:
                hyp_type = "Figure-8"

            pooled_std = np.sqrt((standard_scores.std(ddof=1) ** 2 + hypothesis_scores.std(ddof=1) ** 2) / 2)
            if pooled_std == 0:
                cohens_d = 0.0
            else:
                cohens_d = (hypothesis_scores.mean() - standard_scores.mean()) / pooled_std

            sig_level = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "n.s."
            effect_size_label = "small" if abs(cohens_d) < 0.5 else "medium" if abs(cohens_d) < 0.8 else "large"

            print("=" * 100)
            print("STATISTICAL COMPARISON: BASELINE vs TOPOLOGICAL HYPOTHESES")
            print("=" * 100)
            print(f"\nBest Standard Shape: {best_standard_shape}")
            print(f"  Mean Score: {standard_scores.mean():.4f} ± {standard_scores.std():.4f}")
            print(f"  n = {len(standard_scores)}")

            print(f"\nBest Topological Hypothesis: {best_hypothesis} ({hyp_type})")
            print(f"  Mean Score: {hypothesis_scores.mean():.4f} ± {hypothesis_scores.std():.4f}")
            print(f"  n = {len(hypothesis_scores)}")

            print("\nTwo-sample t-test (one-tailed, H1: hypothesis > baseline):")
            print(f"  t-statistic: {t_stat:.4f}")
            print(f"  p-value: {p_value:.6f}")
            print(f"  Significance: {sig_level}")

            if p_value < 0.05:
                print(f"\n {hyp_type} hypothesis is SUPPORTED (p < 0.05)")
            else:
                print(f"\n{hyp_type} hypothesis is NOT supported (p >= 0.05)")

            print(f"\nCohen's d (effect size): {cohens_d:.3f}")
            print(f"Interpretation: {effect_size_label} effect")
            print("\n" + "=" * 100)
if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found or empty. Please run previous cells.")
else:
    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if not shape_col:
        print("Error: Could not find shape column in combined_results")
    else:
        topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

        is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

        standard_df = combined_results[~is_topo].copy()
        topo_df = combined_results[is_topo].copy()

        if standard_df.empty or topo_df.empty:
            print("Error: Could not split results into Baseline and Topological sets.")
            print(f"Standard count: {len(standard_df)}, Topological count: {len(topo_df)}")
        else:

            def aggregate_scores(df):
                """Aggregate score by shape/variant (mean, std, count)."""
                agg = df.groupby(shape_col)["score"].agg(["mean", "std", "count"]).reset_index()
                agg.columns = ["Variant", "Mean_Score", "Std_Score", "N_Runs"]
                return agg.sort_values("Mean_Score", ascending=False)

            agg_standard = aggregate_scores(standard_df)
            agg_topo = aggregate_scores(topo_df)

            best_standard_shape = agg_standard.iloc[0]["Variant"]
            best_hypothesis = agg_topo.iloc[0]["Variant"]

            standard_scores = standard_df[standard_df[shape_col] == best_standard_shape]["score"].values
            hypothesis_scores = topo_df[topo_df[shape_col] == best_hypothesis]["score"].values

            t_stat, p_value = stats.ttest_ind(hypothesis_scores, standard_scores, alternative="greater")

            if "Torus" in best_hypothesis or "Trefoil" in best_hypothesis:
                hyp_type = "Toroidal/Knot"
            else:
                hyp_type = "Figure-8"

            pooled_std = np.sqrt((standard_scores.std(ddof=1) ** 2 + hypothesis_scores.std(ddof=1) ** 2) / 2)
            if pooled_std == 0:
                cohens_d = 0.0
            else:
                cohens_d = (hypothesis_scores.mean() - standard_scores.mean()) / pooled_std

            sig_level = "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "n.s."
            effect_size_label = "small" if abs(cohens_d) < 0.5 else "medium" if abs(cohens_d) < 0.8 else "large"

            print("=" * 100)
            print("STATISTICAL COMPARISON: BASELINE vs TOPOLOGICAL HYPOTHESES")
            print("=" * 100)
            print(f"\nBest Standard Shape: {best_standard_shape}")
            print(f"  Mean Score: {standard_scores.mean():.4f} ± {standard_scores.std():.4f}")
            print(f"  n = {len(standard_scores)}")

            print(f"\nBest Topological Hypothesis: {best_hypothesis} ({hyp_type})")
            print(f"  Mean Score: {hypothesis_scores.mean():.4f} ± {hypothesis_scores.std():.4f}")
            print(f"  n = {len(hypothesis_scores)}")

            print("\nTwo-sample t-test (one-tailed, H1: hypothesis > baseline):")
            print(f"  t-statistic: {t_stat:.4f}")
            print(f"  p-value: {p_value:.6f}")
            print(f"  Significance: {sig_level}")

            if p_value < 0.05:
                print(f"\n {hyp_type} hypothesis is SUPPORTED (p < 0.05)")
            else:
                print(f"\n{hyp_type} hypothesis is NOT supported (p >= 0.05)")

            print(f"\nCohen's d (effect size): {cohens_d:.3f}")
            print(f"Interpretation: {effect_size_label} effect")
            print("\n" + "=" * 100)

====================================================================================================
STATISTICAL COMPARISON: BASELINE vs TOPOLOGICAL HYPOTHESES
====================================================================================================

Best Standard Shape: SpiralShape
  Mean Score: 0.4951 ± 0.0937
  n = 10

Best Topological Hypothesis: Trefoil_Knot_3D (Toroidal/Knot)
  Mean Score: 0.5980 ± 0.0184
  n = 5

Two-sample t-test (one-tailed, H1: hypothesis > baseline):
  t-statistic: 2.2627
  p-value: 0.020712
  Significance: *

 Toroidal/Knot hypothesis is SUPPORTED (p < 0.05)

Cohen's d (effect size): 1.441
Interpretation: large effect

====================================================================================================

13. Comprehensive Visualizations¶

13.1 Consistency Analysis: Coefficient of Variation¶

In [ ]:

Copied!





if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    agg_all = combined_results.groupby(shape_col)["score"].agg(["mean", "std"]).reset_index()
    agg_all.columns = ["Config", "Mean_Score", "Std_Score"]
    agg_all["CV"] = (agg_all["Std_Score"] / agg_all["Mean_Score"]) * 100

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

    def get_type(name):
        """Return 'Standard', 'Figure-8', or 'Torus' from variant name."""
        is_topo = any(k in name for k in topo_keywords)

        if not is_topo:
            return "Standard"
        elif "Torus" in name or "Trefoil" in name:
            return "Torus"
        else:
            return "Figure-8"

    agg_all["Type"] = agg_all["Config"].apply(get_type)

    # --- PLOTTING ---
    fig, ax = plt.subplots(figsize=(20, 12))

    color_map = {"Standard": "steelblue", "Figure-8": "coral", "Torus": "mediumseagreen"}
    colors = [color_map[t] for t in agg_all["Type"]]
    sizes = 350

    scatter = ax.scatter(
        agg_all["Mean_Score"], agg_all["CV"], s=sizes, c=colors, alpha=0.7, edgecolors="black", linewidths=1.0
    )

    for idx, row in agg_all.iterrows():
        ax.annotate(
            row["Config"].replace("Shape", "").replace("UserProvided", ""),
            (row["Mean_Score"], row["CV"]),
            xytext=(5, 5),
            textcoords="offset points",
            fontsize=14,
            alpha=0.9,
        )

    median_score = agg_all["Mean_Score"].median()
    median_cv = agg_all["CV"].median()

    ax.axvline(x=median_score, color="gray", linestyle=":", linewidth=2, alpha=0.8)
    ax.axhline(y=median_cv, color="gray", linestyle=":", linewidth=2, alpha=0.8)

    ax.set_xlabel("Mean Score (Higher is Better)", fontsize=20, fontweight="bold")
    ax.set_ylabel("CV of Score across Folds [%] (lower = more stable)", fontsize=20, fontweight="bold")
    ax.set_title(
        "Stability Analysis: Mean Score vs. Coefficient of Variation across CV Folds",
        fontsize=18,
        fontweight="bold",
        pad=15,
    )
    ax.text(
        0.5,
        1,
        f"Metric used: {stress}",
        transform=ax.transAxes,
        ha="center",
        va="bottom",
        fontsize=12,
        fontweight="bold",
        color="gray",
    )
    ax.text(
        0.98,
        0.02,
        "IDEAL REGION\n(High Score, Stable)",
        transform=ax.transAxes,
        ha="right",
        va="bottom",
        color="green",
        alpha=0.3,
        fontsize=12,
        fontweight="bold",
        bbox=dict(facecolor="white", alpha=0.5, edgecolor="none"),
    )

    ax.tick_params(axis="both", labelsize=20)
    ax.grid(True, alpha=0.3, linestyle="--")

    legend_elements = [
        Line2D([0], [0], marker="o", color="w", markerfacecolor="steelblue", markersize=14, label="Standard Shapes"),
        Line2D([0], [0], marker="o", color="w", markerfacecolor="coral", markersize=14, label="Figure-8 Variants"),
        Line2D(
            [0],
            [0],
            marker="o",
            color="w",
            markerfacecolor="mediumseagreen",
            markersize=14,
            label="Torus/Knot Variants",
        ),
        Line2D(
            [0],
            [0],
            color="gray",
            linestyle=":",
            linewidth=2,
            label=f"Median (Score={median_score:.2f}, CV={median_cv:.1f}%)",
        ),
    ]
    ax.legend(handles=legend_elements, fontsize=16, loc="best", frameon=True, framealpha=0.9)

    plt.tight_layout()
    plt.show()
if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    agg_all = combined_results.groupby(shape_col)["score"].agg(["mean", "std"]).reset_index()
    agg_all.columns = ["Config", "Mean_Score", "Std_Score"]
    agg_all["CV"] = (agg_all["Std_Score"] / agg_all["Mean_Score"]) * 100

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

    def get_type(name):
        """Return 'Standard', 'Figure-8', or 'Torus' from variant name."""
        is_topo = any(k in name for k in topo_keywords)

        if not is_topo:
            return "Standard"
        elif "Torus" in name or "Trefoil" in name:
            return "Torus"
        else:
            return "Figure-8"

    agg_all["Type"] = agg_all["Config"].apply(get_type)

    # --- PLOTTING ---
    fig, ax = plt.subplots(figsize=(20, 12))

    color_map = {"Standard": "steelblue", "Figure-8": "coral", "Torus": "mediumseagreen"}
    colors = [color_map[t] for t in agg_all["Type"]]
    sizes = 350

    scatter = ax.scatter(
        agg_all["Mean_Score"], agg_all["CV"], s=sizes, c=colors, alpha=0.7, edgecolors="black", linewidths=1.0
    )

    for idx, row in agg_all.iterrows():
        ax.annotate(
            row["Config"].replace("Shape", "").replace("UserProvided", ""),
            (row["Mean_Score"], row["CV"]),
            xytext=(5, 5),
            textcoords="offset points",
            fontsize=14,
            alpha=0.9,
        )

    median_score = agg_all["Mean_Score"].median()
    median_cv = agg_all["CV"].median()

    ax.axvline(x=median_score, color="gray", linestyle=":", linewidth=2, alpha=0.8)
    ax.axhline(y=median_cv, color="gray", linestyle=":", linewidth=2, alpha=0.8)

    ax.set_xlabel("Mean Score (Higher is Better)", fontsize=20, fontweight="bold")
    ax.set_ylabel("CV of Score across Folds [%] (lower = more stable)", fontsize=20, fontweight="bold")
    ax.set_title(
        "Stability Analysis: Mean Score vs. Coefficient of Variation across CV Folds",
        fontsize=18,
        fontweight="bold",
        pad=15,
    )
    ax.text(
        0.5,
        1,
        f"Metric used: {stress}",
        transform=ax.transAxes,
        ha="center",
        va="bottom",
        fontsize=12,
        fontweight="bold",
        color="gray",
    )
    ax.text(
        0.98,
        0.02,
        "IDEAL REGION\n(High Score, Stable)",
        transform=ax.transAxes,
        ha="right",
        va="bottom",
        color="green",
        alpha=0.3,
        fontsize=12,
        fontweight="bold",
        bbox=dict(facecolor="white", alpha=0.5, edgecolor="none"),
    )

    ax.tick_params(axis="both", labelsize=20)
    ax.grid(True, alpha=0.3, linestyle="--")

    legend_elements = [
        Line2D([0], [0], marker="o", color="w", markerfacecolor="steelblue", markersize=14, label="Standard Shapes"),
        Line2D([0], [0], marker="o", color="w", markerfacecolor="coral", markersize=14, label="Figure-8 Variants"),
        Line2D(
            [0],
            [0],
            marker="o",
            color="w",
            markerfacecolor="mediumseagreen",
            markersize=14,
            label="Torus/Knot Variants",
        ),
        Line2D(
            [0],
            [0],
            color="gray",
            linestyle=":",
            linewidth=2,
            label=f"Median (Score={median_score:.2f}, CV={median_cv:.1f}%)",
        ),
    ]
    ax.legend(handles=legend_elements, fontsize=16, loc="best", frameon=True, framealpha=0.9)

    plt.tight_layout()
    plt.show()

13.2 Detailed Distribution: Violin Plots¶

In [ ]:

Copied!





if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

    is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

    agg_standard = combined_results[~is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
    top_5_standard = agg_standard.head(5).index.tolist()

    agg_topo = combined_results[is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
    topo_variants = agg_topo.index.tolist()

    fig, ax = plt.subplots(figsize=(16, 8))

    positions = []
    labels = []
    pos = 0

    for shape in top_5_standard:
        data = combined_results[combined_results[shape_col] == shape]["score"].values
        parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)
        for pc in parts["bodies"]:
            pc.set_facecolor("steelblue")
            pc.set_alpha(0.6)
        for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
            if partname in parts:
                parts[partname].set_edgecolor("black")
                parts[partname].set_linewidth(1)

        positions.append(pos)
        labels.append(shape.replace("Shape", ""))
        pos += 1

    if top_5_standard and topo_variants:
        pos += 0.5

    for variant in topo_variants:
        data = combined_results[combined_results[shape_col] == variant]["score"].values
        parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)

        is_torus = "Torus" in variant or "Trefoil" in variant
        variant_color = "mediumseagreen" if is_torus else "coral"

        for pc in parts["bodies"]:
            pc.set_facecolor(variant_color)
            pc.set_alpha(0.6)
        for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
            if partname in parts:
                parts[partname].set_edgecolor("black")
                parts[partname].set_linewidth(1)

        positions.append(pos)
        labels.append(variant.replace("UserProvided", "").replace("Shape", ""))
        pos += 1

    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right", fontsize=10)
    ax.set_ylabel("Score (Higher is Better)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"Score Distributions: Top 5 Standard vs. Topological Hypotheses for {stress}", fontsize=14, fontweight="bold"
    )
    ax.grid(axis="y", alpha=0.3, linestyle="--")

    legend_elements = [
        Patch(facecolor="steelblue", alpha=0.6, label="Standard Shapes"),
        Patch(facecolor="coral", alpha=0.6, label="Figure-8 Variants"),
        Patch(facecolor="mediumseagreen", alpha=0.6, label="Torus/Knot Variants"),
    ]
    ax.legend(handles=legend_elements, fontsize=11, loc="best")

    plt.tight_layout()
    plt.show()

    print("\nViolin plot interpretation:")
    print("  - Width: Distribution density (frequency of scores)")
    print("  - Horizontal lines: Min, Max, Mean, Median")
    print("  - Height: Range of scores across cross-validation folds")
if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]

    is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

    agg_standard = combined_results[~is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
    top_5_standard = agg_standard.head(5).index.tolist()

    agg_topo = combined_results[is_topo].groupby(shape_col)["score"].mean().sort_values(ascending=False)
    topo_variants = agg_topo.index.tolist()

    fig, ax = plt.subplots(figsize=(16, 8))

    positions = []
    labels = []
    pos = 0

    for shape in top_5_standard:
        data = combined_results[combined_results[shape_col] == shape]["score"].values
        parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)
        for pc in parts["bodies"]:
            pc.set_facecolor("steelblue")
            pc.set_alpha(0.6)
        for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
            if partname in parts:
                parts[partname].set_edgecolor("black")
                parts[partname].set_linewidth(1)

        positions.append(pos)
        labels.append(shape.replace("Shape", ""))
        pos += 1

    if top_5_standard and topo_variants:
        pos += 0.5

    for variant in topo_variants:
        data = combined_results[combined_results[shape_col] == variant]["score"].values
        parts = ax.violinplot([data], positions=[pos], widths=0.7, showmeans=True, showmedians=True)

        is_torus = "Torus" in variant or "Trefoil" in variant
        variant_color = "mediumseagreen" if is_torus else "coral"

        for pc in parts["bodies"]:
            pc.set_facecolor(variant_color)
            pc.set_alpha(0.6)
        for partname in ("cbars", "cmins", "cmaxes", "cmeans", "cmedians"):
            if partname in parts:
                parts[partname].set_edgecolor("black")
                parts[partname].set_linewidth(1)

        positions.append(pos)
        labels.append(variant.replace("UserProvided", "").replace("Shape", ""))
        pos += 1

    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right", fontsize=10)
    ax.set_ylabel("Score (Higher is Better)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"Score Distributions: Top 5 Standard vs. Topological Hypotheses for {stress}", fontsize=14, fontweight="bold"
    )
    ax.grid(axis="y", alpha=0.3, linestyle="--")

    legend_elements = [
        Patch(facecolor="steelblue", alpha=0.6, label="Standard Shapes"),
        Patch(facecolor="coral", alpha=0.6, label="Figure-8 Variants"),
        Patch(facecolor="mediumseagreen", alpha=0.6, label="Torus/Knot Variants"),
    ]
    ax.legend(handles=legend_elements, fontsize=11, loc="best")

    plt.tight_layout()
    plt.show()

    print("\nViolin plot interpretation:")
    print("  - Width: Distribution density (frequency of scores)")
    print("  - Horizontal lines: Min, Max, Mean, Median")
    print("  - Height: Range of scores across cross-validation folds")

Violin plot interpretation:
  - Width: Distribution density (frequency of scores)
  - Horizontal lines: Min, Max, Mean, Median
  - Height: Range of scores across cross-validation folds

14. Final Summary and Conclusions¶

Experimental Summary¶

In [28]:

Copied!





print("=" * 100)
print("EXPERIMENTAL SUMMARY")
print("=" * 100)

if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]
    is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

    agg_standard = (
        combined_results[~is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
    )

    agg_topo = (
        combined_results[is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
    )

    # --- PRINT OUTPUT ---

    print("\n1. BASELINE SHAPES (Top 5):")
    print("-" * 100)
    if not agg_standard.empty:
        for idx, (name, row) in enumerate(agg_standard.head(5).iterrows()):
            clean_name = name.replace("Shape", "")
            print(f"   {idx + 1}. {clean_name:25s} Score: {row['mean']:.4f} ± {row['std']:.4f}")
    else:
        print("   (No baseline shapes found)")

    print("\n2. TOPOLOGICAL HYPOTHESES:")
    print("-" * 100)
    if not agg_topo.empty:
        for idx, (name, row) in enumerate(agg_topo.iterrows()):
            clean_name = name.replace("UserProvided", "").replace("Shape", "")
            hyp_type = "Torus" if ("Torus" in name or "Trefoil" in name) else "Figure-8"
            print(f"   {idx + 1}. {clean_name:25s} [{hyp_type:10s}] Score: {row['mean']:.4f} ± {row['std']:.4f}")
    else:
        print("   (No topological hypotheses found)")

    if not agg_standard.empty and not agg_topo.empty:
        best_base_name = agg_standard.index[0]
        best_base_score = agg_standard.iloc[0]["mean"]
        best_base_std = agg_standard.iloc[0]["std"]

        best_topo_name = agg_topo.index[0]
        best_topo_score = agg_topo.iloc[0]["mean"]
        best_topo_std = agg_topo.iloc[0]["std"]

        hyp_type = "Torus" if ("Torus" in best_topo_name or "Trefoil" in best_topo_name) else "Figure-8"

        score_diff = best_topo_score - best_base_score
        improvement_pct = (score_diff / best_base_score) * 100 if best_base_score != 0 else 0

        print("\n3. BEST COMPARISON:")
        print("-" * 100)
        print(f"   Baseline:     {best_base_name.replace('Shape', ''):25s} {best_base_score:.4f} ± {best_base_std:.4f}")
        topo_label = best_topo_name.replace("UserProvided", "").replace("Shape", "")
        print(f"   Hypothesis:   {topo_label:25s} [{hyp_type}] {best_topo_score:.4f} ± {best_topo_std:.4f}")
        print(f"   Difference:   {score_diff:+.4f} ({improvement_pct:+.1f}%)")

        if "p_value" in locals():
            print(f"   t-test:       p = {p_value:.6f}")
            if p_value < 0.05:
                print(f"   Result:       ✓ {hyp_type} hypothesis SUPPORTED (p < 0.05)")
            else:
                print(f"   Result:       ✗ {hyp_type} hypothesis NOT supported (p >= 0.05)")
        else:
            print("   (t-test statistics not available in local scope)")

    print("\n4. METHODOLOGY:")
    print("-" * 100)
    layer_info = GPT2_LAYER if "GPT2_LAYER" in locals() else "?"
    n_seeds = len(datasets) if "datasets" in locals() else "?"
    n_folds_info = N_FOLDS if "N_FOLDS" in locals() else "?"

    print(f"   Model:           GPT-2 Small, Layer {layer_info}/12")
    print(f"   Datasets:        {n_seeds} seeds")
    print(f"   Cross-validation: {n_folds_info}-fold")
    if "agg_standard" in locals():
        print(f"   Baseline shapes: {len(agg_standard)} variants")
    if "agg_topo" in locals():
        print(f"   Hypotheses:      {len(agg_topo)} variants")

print("\n" + "=" * 100)
print("=" * 100)
print("EXPERIMENTAL SUMMARY")
print("=" * 100)

if "combined_results" not in locals() or combined_results.empty:
    print("Error: 'combined_results' dataframe not found. Please run previous cells.")
else:
    shape_col = next((col for col in combined_results.columns if "shape" in col.lower()), None)

    if "score" not in combined_results.columns:
        valid_cols = [c for c in combined_results.columns if "scale_normalized" in c and "mean" in c]
        if valid_cols:
            combined_results["score"] = combined_results[valid_cols[0]]
        else:
            metric_cols = [c for c in combined_results.columns if c.startswith("mean_")]
            combined_results["score"] = combined_results[metric_cols].mean(axis=1)

    topo_keywords = ["Gerono", "Bernoulli", "Twisted", "Torus", "Trefoil", "UserProvided"]
    is_topo = combined_results[shape_col].apply(lambda x: any(k in x for k in topo_keywords))

    agg_standard = (
        combined_results[~is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
    )

    agg_topo = (
        combined_results[is_topo].groupby(shape_col)["score"].agg(["mean", "std"]).sort_values("mean", ascending=False)
    )

    # --- PRINT OUTPUT ---

    print("\n1. BASELINE SHAPES (Top 5):")
    print("-" * 100)
    if not agg_standard.empty:
        for idx, (name, row) in enumerate(agg_standard.head(5).iterrows()):
            clean_name = name.replace("Shape", "")
            print(f"   {idx + 1}. {clean_name:25s} Score: {row['mean']:.4f} ± {row['std']:.4f}")
    else:
        print("   (No baseline shapes found)")

    print("\n2. TOPOLOGICAL HYPOTHESES:")
    print("-" * 100)
    if not agg_topo.empty:
        for idx, (name, row) in enumerate(agg_topo.iterrows()):
            clean_name = name.replace("UserProvided", "").replace("Shape", "")
            hyp_type = "Torus" if ("Torus" in name or "Trefoil" in name) else "Figure-8"
            print(f"   {idx + 1}. {clean_name:25s} [{hyp_type:10s}] Score: {row['mean']:.4f} ± {row['std']:.4f}")
    else:
        print("   (No topological hypotheses found)")

    if not agg_standard.empty and not agg_topo.empty:
        best_base_name = agg_standard.index[0]
        best_base_score = agg_standard.iloc[0]["mean"]
        best_base_std = agg_standard.iloc[0]["std"]

        best_topo_name = agg_topo.index[0]
        best_topo_score = agg_topo.iloc[0]["mean"]
        best_topo_std = agg_topo.iloc[0]["std"]

        hyp_type = "Torus" if ("Torus" in best_topo_name or "Trefoil" in best_topo_name) else "Figure-8"

        score_diff = best_topo_score - best_base_score
        improvement_pct = (score_diff / best_base_score) * 100 if best_base_score != 0 else 0

        print("\n3. BEST COMPARISON:")
        print("-" * 100)
        print(f"   Baseline:     {best_base_name.replace('Shape', ''):25s} {best_base_score:.4f} ± {best_base_std:.4f}")
        topo_label = best_topo_name.replace("UserProvided", "").replace("Shape", "")
        print(f"   Hypothesis:   {topo_label:25s} [{hyp_type}] {best_topo_score:.4f} ± {best_topo_std:.4f}")
        print(f"   Difference:   {score_diff:+.4f} ({improvement_pct:+.1f}%)")

        if "p_value" in locals():
            print(f"   t-test:       p = {p_value:.6f}")
            if p_value < 0.05:
                print(f"   Result:       ✓ {hyp_type} hypothesis SUPPORTED (p < 0.05)")
            else:
                print(f"   Result:       ✗ {hyp_type} hypothesis NOT supported (p >= 0.05)")
        else:
            print("   (t-test statistics not available in local scope)")

    print("\n4. METHODOLOGY:")
    print("-" * 100)
    layer_info = GPT2_LAYER if "GPT2_LAYER" in locals() else "?"
    n_seeds = len(datasets) if "datasets" in locals() else "?"
    n_folds_info = N_FOLDS if "N_FOLDS" in locals() else "?"

    print(f"   Model:           GPT-2 Small, Layer {layer_info}/12")
    print(f"   Datasets:        {n_seeds} seeds")
    print(f"   Cross-validation: {n_folds_info}-fold")
    if "agg_standard" in locals():
        print(f"   Baseline shapes: {len(agg_standard)} variants")
    if "agg_topo" in locals():
        print(f"   Hypotheses:      {len(agg_topo)} variants")

print("\n" + "=" * 100)

====================================================================================================
EXPERIMENTAL SUMMARY
====================================================================================================

1. BASELINE SHAPES (Top 5):
----------------------------------------------------------------------------------------------------
   1. Spiral                    Score: 0.4951 ± 0.0988
   2. Cluster                   Score: 0.4722 ± 0.0122
   3. Circular                  Score: 0.4524 ± 0.0211
   4. Semicircular              Score: 0.4255 ± 0.0326
   5. DiscreteCircular          Score: 0.4184 ± 0.0220

2. TOPOLOGICAL HYPOTHESES:
----------------------------------------------------------------------------------------------------
   1. Trefoil_Knot_3D           [Torus     ] Score: 0.5980 ± 0.0206
   2. Trefoil_Knot_2D           [Torus     ] Score: 0.5724 ± 0.0219
   3. Twisted                   [Figure-8  ] Score: 0.4840 ± 0.0196
   4. Gerono                    [Figure-8  ] Score: 0.4592 ± 0.0321
   5. Torus_Path                [Torus     ] Score: 0.4589 ± 0.0185
   6. Bernoulli                 [Figure-8  ] Score: 0.4013 ± 0.0261

3. BEST COMPARISON:
----------------------------------------------------------------------------------------------------
   Baseline:     Spiral                    0.4951 ± 0.0988
   Hypothesis:   Trefoil_Knot_3D           [Torus] 0.5980 ± 0.0206
   Difference:   +0.1028 (+20.8%)
   t-test:       p = 0.020712
   Result:       ✓ Torus hypothesis SUPPORTED (p < 0.05)

4. METHODOLOGY:
----------------------------------------------------------------------------------------------------
   Model:           GPT-2 Small, Layer 6/12
   Datasets:        5 seeds
   Cross-validation: 5-fold
   Baseline shapes: 8 variants
   Hypotheses:      6 variants

====================================================================================================