
research-lab/multimodal-evals-v1

A richer repo page backed by a normalized hub store, so the frontend can swap from seeded data to indexer output without changing page components.

9.7K downloads · 188 likes · Updated 4 days ago · License: cc-by-4.0 · public
dataset · evals · agentic · tools · multimodal

multimodal-evals-v1

Agentic benchmark shards with trajectory transcripts, tool traces, and evaluator labels.

Highlights
- hot storage with 3 replicas
- manifest CID bafydatasetmanifest
- funded runway: 42 days
- moderation flags: 1

Intended use

Local MVP demo repo, rendered from the normalized indexer store shape.
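The page describes a normalized store that lets the same components render either seeded data or indexer output. A minimal sketch of that seam follows, assuming components depend on a single lookup interface. All names here (`RepoRecord`, `RepoSource`, `SeededSource`, `IndexerSource`, `renderRepoHeader`) and the indexer row shape are hypothetical illustrations, not this app's actual API.

```typescript
// Hypothetical names throughout: these types and classes illustrate the
// pattern and are not taken from this app's code.

interface RepoRecord {
  namespace: string;
  name: string;
  license: string;
  downloads: number;
}

// The seam: page components depend only on this interface.
interface RepoSource {
  getRepo(namespace: string, name: string): RepoRecord | undefined;
}

const repoKey = (namespace: string, name: string): string => `${namespace}/${name}`;

// Seeded data: records already in the normalized shape, indexed by "namespace/name".
class SeededSource implements RepoSource {
  private readonly byKey = new Map<string, RepoRecord>();

  constructor(records: RepoRecord[]) {
    for (const r of records) this.byKey.set(repoKey(r.namespace, r.name), r);
  }

  getRepo(namespace: string, name: string): RepoRecord | undefined {
    return this.byKey.get(repoKey(namespace, name));
  }
}

// Indexer output: assumed to arrive as flat rows with a "namespace/name" id,
// normalized into RepoRecord before any component sees it.
interface IndexerRow {
  id: string;
  license: string;
  downloads: number;
}

class IndexerSource implements RepoSource {
  private readonly byKey = new Map<string, RepoRecord>();

  constructor(rows: IndexerRow[]) {
    for (const row of rows) {
      const slash = row.id.indexOf("/");
      if (slash < 0) continue; // skip malformed ids
      const namespace = row.id.slice(0, slash);
      const name = row.id.slice(slash + 1);
      this.byKey.set(repoKey(namespace, name), {
        namespace,
        name,
        license: row.license,
        downloads: row.downloads,
      });
    }
  }

  getRepo(namespace: string, name: string): RepoRecord | undefined {
    return this.byKey.get(repoKey(namespace, name));
  }
}

// A "page component" reduced to a pure function; it never knows which source it got.
function renderRepoHeader(source: RepoSource, namespace: string, name: string): string {
  const repo = source.getRepo(namespace, name);
  if (!repo) return "Repo not found";
  return `${repo.namespace}/${repo.name} · License: ${repo.license}`;
}
```

Swapping data sources then means passing a different `RepoSource`; `renderRepoHeader` itself does not change.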

Runway and storage

Funded through: May 1, 2026
Monthly burn: $38.60/mo
Storage class: hot
Replicas: 3
Manifest: 1.3.0
Runway confidence: 47%
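The page reports a monthly burn, a runway in days, and a funded-through date, but not how they relate. A minimal sketch follows, assuming runway is the funded balance divided by the daily burn with a 30-day month. The balance input, the 30-day constant, and the names `runwayDays` and `fundedThrough` are all hypothetical.

```typescript
// Assumed formula (the page does not state one): runway in whole days is the
// funded balance divided by the daily burn, using a 30-day month.
const DAYS_PER_MONTH = 30; // assumption, not from the page

function runwayDays(balanceUsd: number, monthlyBurnUsd: number): number {
  if (monthlyBurnUsd <= 0) return Infinity; // no spend, so the balance never runs out
  // Multiply before dividing to limit floating-point rounding ahead of the floor.
  return Math.floor((balanceUsd * DAYS_PER_MONTH) / monthlyBurnUsd);
}

// "Funded through" date: start date plus the runway, counted in UTC calendar days.
// Expects a finite day count; Infinity would produce an invalid Date.
function fundedThrough(start: Date, days: number): Date {
  const end = new Date(start.getTime());
  end.setUTCDate(end.getUTCDate() + days);
  return end;
}
```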

Example snippets

Transformers, bash, and local integration examples
Fetch via API (bash)
curl "http://localhost:3000/api/repos?namespace=research-lab"
Read from adapter layer (TypeScript)
import { getRepo } from "@/app/data";

// getRepo may return undefined if the repo is not in the store, hence the optional chaining.
const repo = getRepo("research-lab", "multimodal-evals-v1");
console.log(repo?.manifest.artifacts);
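The curl request can also be issued from TypeScript. This is a minimal sketch assuming a runtime with global `fetch` and `URL` (modern browsers, Node 18+). `buildReposUrl` and `fetchRepos` are hypothetical helpers, and because this page does not document the `/api/repos` response shape, the parsed body is returned as `unknown`.

```typescript
// Hypothetical helpers; the /api/repos response shape is not documented here.

function buildReposUrl(baseUrl: string, namespace: string): string {
  // A leading slash resolves against the origin, replacing any path in baseUrl.
  const url = new URL("/api/repos", baseUrl);
  // searchParams percent-encodes the value, so unusual namespaces stay safe.
  url.searchParams.set("namespace", namespace);
  return url.toString();
}

async function fetchRepos(baseUrl: string, namespace: string): Promise<unknown> {
  const res = await fetch(buildReposUrl(baseUrl, namespace));
  if (!res.ok) {
    throw new Error(`GET /api/repos failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Usage: `await fetchRepos("http://localhost:3000", "research-lab")` requests the same URL as the curl example.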

Repository files

3 files in current manifest
README.md
public | model_card | CID bafydatasetreadme | 9 KB | 4 days ago

data/train-00001.parquet
public | dataset_shard | CID bafytrainparquet | 594 MB | 4 days ago

data/validation-00001.parquet
public | dataset_shard | CID bafyvalidationparquet | 126 MB | 4 days ago
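Each file entry carries the same fields: path, visibility, artifact kind, CID, and size. A minimal sketch of how such manifest entries could be modeled and aggregated follows. The `ManifestArtifact` shape and both helper names are hypothetical, and the byte counts are the rounded sizes shown above, converted assuming decimal units (1 KB = 1,000 bytes).

```typescript
// Hypothetical shape for one manifest artifact, mirroring the fields in the
// file list above. Field names are assumptions, not the store's real schema.
interface ManifestArtifact {
  path: string;
  visibility: string; // e.g. "public"
  kind: string; // e.g. "model_card", "dataset_shard"
  cid: string;
  sizeBytes: number; // rounded display size, decimal units assumed
}

const artifacts: ManifestArtifact[] = [
  { path: "README.md", visibility: "public", kind: "model_card", cid: "bafydatasetreadme", sizeBytes: 9_000 },
  { path: "data/train-00001.parquet", visibility: "public", kind: "dataset_shard", cid: "bafytrainparquet", sizeBytes: 594_000_000 },
  { path: "data/validation-00001.parquet", visibility: "public", kind: "dataset_shard", cid: "bafyvalidationparquet", sizeBytes: 126_000_000 },
];

// Sum sizes per artifact kind, e.g. to show how much of the repo is data shards.
function bytesByKind(items: ManifestArtifact[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const a of items) {
    totals.set(a.kind, (totals.get(a.kind) ?? 0) + a.sizeBytes);
  }
  return totals;
}

// Format a byte count with decimal units and one decimal place.
function formatBytes(n: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = n;
  let i = 0;
  while (value >= 1000 && i < units.length - 1) {
    value /= 1000;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}
```

Under these assumptions the two parquet shards account for about 720 MB and the README for 9 KB.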

Dataset splits

Public sample only
| Split | Samples | Format | Notes |
|---|---|---|---|
| train | 842,000 | parquet | tool traces + labels |
| validation | 55,000 | parquet | balanced benchmark slice |
| preview | 250 | jsonl | safe public sample |
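The split table lends itself to a quick sanity check on the held-out fraction. A minimal sketch follows, using the sample counts from the table. The `SplitInfo` type and `validationShare` helper are hypothetical, and treating train and validation as disjoint while excluding the 250-row preview (which may be drawn from them) is an assumption the page does not confirm.

```typescript
// Split sizes copied from the table above. Disjoint train/validation and an
// excluded preview split are assumptions, not stated on the page.
interface SplitInfo {
  name: string;
  samples: number;
  format: string;
}

const splits: SplitInfo[] = [
  { name: "train", samples: 842_000, format: "parquet" },
  { name: "validation", samples: 55_000, format: "parquet" },
  { name: "preview", samples: 250, format: "jsonl" },
];

// Fraction of the train + validation pool held out for validation.
function validationShare(all: SplitInfo[]): number {
  const count = (name: string): number =>
    all.find((s) => s.name === name)?.samples ?? 0;
  const pool = count("train") + count("validation");
  return pool === 0 ? 0 : count("validation") / pool;
}
```

With the counts above, validation is roughly 6.1% of the 897,000 train + validation samples.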

Discussions

Manifest sync and mirror health
research-lab · 4 replies · Updated 4 days ago

Tracking whether the latest manifest CID, billing runway, and moderation state stay aligned in local dev.

Status: watching
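The thread above tracks whether the manifest CID, billing runway, and moderation state stay aligned. One way to check that in local dev is sketched below, assuming the repo record and the latest indexer snapshot can be compared side by side. `RepoState`, `IndexerSnapshot`, `findDrift`, and their fields are hypothetical.

```typescript
// Hypothetical record shapes for illustration; not the hub store's schema.
interface RepoState {
  manifestCid: string;
  runwayDays: number;
  moderationFlagCount: number;
}

interface IndexerSnapshot {
  latestManifestCid: string;
  flagIds: string[];
}

// Returns one human-readable message per mismatch; an empty array means aligned.
function findDrift(repo: RepoState, snapshot: IndexerSnapshot): string[] {
  const issues: string[] = [];
  if (repo.manifestCid !== snapshot.latestManifestCid) {
    issues.push(`manifest CID ${repo.manifestCid} != latest ${snapshot.latestManifestCid}`);
  }
  if (repo.runwayDays <= 0) {
    issues.push("billing runway exhausted");
  }
  if (repo.moderationFlagCount !== snapshot.flagIds.length) {
    issues.push(`flag count ${repo.moderationFlagCount} != ${snapshot.flagIds.length} flag records`);
  }
  return issues;
}
```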
Should this repo stay visible under current filters?
policy-review · 7 replies · Updated 1 day ago

Frontend filters are driven by normalized flag records rather than hard protocol takedowns.

Status: open
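Driving filters from flag records means visibility is a view-time decision rather than a deletion. A minimal sketch follows, assuming each flag names a repo and carries a severity that the current filter settings can choose to hide. `FlagRecord`, `FilterSettings`, the severity values, and `visibleRepoIds` are hypothetical.

```typescript
// Hypothetical flag model: flagged repos are filtered out at view time, but
// nothing is removed from the underlying store.
type Severity = "info" | "warn" | "hide";

interface FlagRecord {
  repoId: string;
  reason: string;
  severity: Severity;
}

interface FilterSettings {
  hideSeverities: Set<Severity>;
}

function visibleRepoIds(repoIds: string[], flags: FlagRecord[], settings: FilterSettings): string[] {
  // Collect repos with at least one flag whose severity the settings hide.
  const hidden = new Set(
    flags.filter((f) => settings.hideSeverities.has(f.severity)).map((f) => f.repoId),
  );
  // The input list is not mutated; a looser setting brings hidden repos back.
  return repoIds.filter((id) => !hidden.has(id));
}
```

Under this model, answering "should this repo stay visible?" is a matter of choosing the filter settings, not rewriting the store.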