
Dataset Description

Puzzle-STAMPS — A Multimodal Motion–Physiology–Speech Dataset


Data Description

Participant Demographics & Statistics

The released dataset includes 143 participants (including two non-binary individuals) organized into 35 teams of four and one team of three. We balanced the teams by gender (with no more than two members of the same gender per team) and ensured that members did not know each other. The figure below depicts the distribution of participants’ age (M = 26.64 years, SD = 4.56 years), educational background (53.15% reported a bachelor’s degree as their highest qualification, 33.57% a master’s degree, and 4.20% a doctorate), and ethnicity (53.14% identified as Asian, 31.47% as White, 6.29% as Hispanic, 5.60% as Middle Eastern, 2.80% as Mixed, and 0.70% as Black). Team assignments were mainly dictated by scheduling and demographic constraints.

Participant demographics, per-team puzzle progression, and leadership-related scores.
(Left) Distribution of participants’ age, educational background, and ethnicity, with gender indicated by color and ethnicity by shape. (Center) Per-team puzzle progression as recorded by the timer system; each colored block corresponds to one of the eight timed segments (S1–S8), and bar length reflects elapsed time. (Right) Each participant’s self-reported motivation-to-lead score versus peer-rated leadership identity and CATME-BARS (Comprehensive Assessment of Team Member Effectiveness, Behaviorally Anchored Rating Scale) scores.

Released Files & Temporal Alignment

All recordings are temporally aligned to a common reference point corresponding to the onset of the first puzzle (time = 0). The released dataset contains 75 h and 47 min of puzzle-solving activity and 408 h and 7 min of cumulative physiological data, since physiological recording started before and ended after the main task.
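Because the physiological streams extend beyond the task itself, a common first step is to crop them to the collaborative window. A minimal sketch, assuming each CSV row carries a `timestamp` field in seconds relative to the common reference (the column name is an assumption; see the released files and README for the actual schema):

```python
def crop_to_task(rows, t_start=0.0, t_end=None):
    """Keep only samples inside the collaborative phase.

    Assumes each row is a dict with a 'timestamp' field in seconds relative
    to the common reference (time = 0 at first-puzzle onset); negative
    timestamps precede the task. The field name is an assumption.
    """
    kept = []
    for row in rows:
        t = float(row["timestamp"])
        if t >= t_start and (t_end is None or t <= t_end):
            kept.append(row)
    return kept

# Physiological recording starts before the task, so early samples are negative:
rows = [{"timestamp": "-5.0"}, {"timestamp": "0.0"}, {"timestamp": "10.2"}]
print(len(crop_to_task(rows)))  # -> 2 (samples at or after time = 0)
```

Passing a `t_end` bound (e.g. the last-puzzle offset from the timer log) crops the trailing post-task recording the same way.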

The following files are released per session:

  • Audio recordings: one audio file per participant, cropped to the collaborative phase (first to last puzzle).
  • Speech transcripts: automatic participant-specific SRT transcripts (plus manually corrected transcripts for some teams). Audio was segmented using Silero VAD (0.4 threshold, 800 ms silence) and transcribed using NVIDIA’s Canary-Qwen-2.5B to preserve natural speech disfluencies.
  • Video recordings: five exocentric camera streams, cropped to the collaborative phase.
  • Physiological garment (ECG, respiration, SpO2, temperature): CSV files with timestamps in seconds; one file per participant per modality.
  • IMU signals: one CSV file per participant, with timestamps in seconds.
  • Pozyx / RTLS: one CSV file per session with timestamps in seconds and 2D positions, along with additional per-tag inertial data: orientation (pitch, roll, yaw), acceleration (x, y, z), magnetic field (x, y, z), angular velocity (x, y, z), quaternion (x, y, z, w), linear acceleration (x, y, z), and gravity (x, y, z).
  • OptiTrack motion capture:
    - one CSV file per session with head pose and orientation, time-origin adjusted to align with the experimental reference (time = 0);
    - CSV files containing OptiTrack coordinates of the puzzle box, toolbox, and screen, derived from markers temporarily attached to these objects before each session.
  • Timer log (SRT): per-session SRT timer file recording when each puzzle segment was started, solved, and/or failed, and when hints were issued by the timer system. Timestamps are in seconds relative to time = 0.
  • Participant questionnaire file: one global Excel file consolidating each participant’s responses across all administered instruments, from demographic information and pre-task self-reports to post-task ratings. All participant data are pseudonymized.

Together, these synchronized files enable analyses at multiple temporal resolutions, from fine-grained multimodal dynamics to puzzle-level performance and phase transitions.
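As one illustration of working with the timer log, the sketch below parses a per-session SRT file into (start, end, label) events in seconds. It assumes only the standard SRT layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time line, then text); the exact wording of event labels in the released files may differ.

```python
import re

# Standard SRT timecode line, e.g. "00:00:05,000 --> 00:00:07,500".
TIME_RE = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)")

def parse_srt_events(text):
    """Parse SRT cues into (start_s, end_s, label) tuples in seconds."""
    events = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        m = TIME_RE.match(lines[1])
        if not m:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        events.append((start, end, " ".join(lines[2:])))
    return events

# Hypothetical cue; labels in the released files may be worded differently.
sample = "1\n00:00:05,000 --> 00:00:07,500\nS1 started"
print(parse_srt_events(sample))  # -> [(5.0, 7.5, 'S1 started')]
```

The same parser applies to the speech-transcript SRT files, since they share the format.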

Please read the README file on Edmond for more details about the structure of the dataset and the released files.

Release Scope & Train/Held-Out Split

To avoid interaction leakage across splits, all data partitioning was performed at the team level. The dataset is split as follows:

  • Public release (training set): 36 teams.
  • Held-out set: 10 teams, reserved for future benchmark evaluations and community challenges.

The composition of the held-out set was guided by both practical and scientific constraints:

  • One of the two three-person teams was reserved to preserve representation of this rarer team configuration.
  • Two teams whose members did not consent to the public sharing of raw audio and video were excluded from the public release.
  • Remaining held-out teams were selected to maintain a subpopulation distribution that closely mirrors the released set with respect to ethnicity, age, and education level.
  • Known data quality issues were deliberately distributed across both the released and held-out portions, so that the public training set is not artificially cleaner than the reserved evaluation data.
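The team-level partitioning principle above can be sketched as follows. The function is purely illustrative: the released split is fixed, and the participant/team IDs are hypothetical, but it shows how splitting by team rather than by participant keeps all members of an interaction on the same side of the split.

```python
import random

def team_level_split(participant_team, held_out_fraction=0.2, seed=0):
    """Illustrative team-level split: no team spans both partitions.

    participant_team: dict mapping participant ID -> team ID (hypothetical
    IDs; the released dataset ships with its split already fixed).
    """
    teams = sorted(set(participant_team.values()))
    rng = random.Random(seed)
    rng.shuffle(teams)
    n_held = max(1, round(len(teams) * held_out_fraction))
    held_teams = set(teams[:n_held])
    train = [p for p, t in participant_team.items() if t not in held_teams]
    held_out = [p for p, t in participant_team.items() if t in held_teams]
    return train, held_out

# Example: six participants in three teams; one whole team is held out.
mapping = {"p1": "T1", "p2": "T1", "p3": "T2", "p4": "T2", "p5": "T3", "p6": "T3"}
train, held_out = team_level_split(mapping)
```

A purely random draw like this would not satisfy the additional constraints listed above (team-size representation, consent, demographic matching, distribution of quality issues); the actual held-out selection applied those on top of the team-level rule.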


© 2026 Max-Planck-Gesellschaft - Imprint - Privacy Policy - License