dataset

research-lab/multimodal-evals-v1

A richer repo page backed by a normalized hub store, so the frontend can swap from seeded data to indexer output without changing page components.

9.7K downloads | 188 likes | Updated 4 days ago | License cc-by-4.0 | public

dataset | evals | agentic | tools | multimodal

Research Lab | Verified publisher

1 active flag

Model card

multimodal-evals-v1

Agentic benchmark shards with trajectory transcripts, tool traces, and evaluator labels.

Highlights
- hot storage with 3 replicas
- manifest CID bafydatasetmanifest
- funded runway 42 days
- moderation flags 1

Intended use

Local MVP demo repo rendered from the normalized indexer store shape.

Funding

Runway and storage

Funded through: May 1, 2026

Monthly burn: $38.60 / mo

Storage class: hot

Replicas: 3

Manifest: 1.3.0

Runway confidence: 47%
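
The page does not say how the runway figure is derived. As a rough illustration only, the sketch below counts whole days until the funded-through date and estimates spend over that period from the monthly burn. The function names, the even-spread burn assumption, and the reference date of March 20, 2026 (chosen so the result matches the 42-day runway shown in the highlights) are all hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `now` until the funded-through date, never negative.
function runwayDays(fundedThrough: Date, now: Date): number {
  const diffMs = fundedThrough.getTime() - now.getTime();
  return Math.max(0, Math.ceil(diffMs / MS_PER_DAY));
}

// Approximate spend over `days`, assuming the monthly burn is spread
// evenly across a 365-day year (an assumption, not the page's formula).
function projectedSpendUsd(monthlyBurnUsd: number, days: number): number {
  const dailyBurn = (monthlyBurnUsd * 12) / 365;
  return Math.round(dailyBurn * days * 100) / 100;
}

const fundedThrough = new Date("2026-05-01T00:00:00Z");
const assumedToday = new Date("2026-03-20T00:00:00Z"); // hypothetical reference date

const days = runwayDays(fundedThrough, assumedToday);
console.log(days); // 42
console.log(projectedSpendUsd(38.6, days));
```

Using UTC timestamps keeps the day count independent of local daylight-saving changes.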

Use in code

Example snippets

Transformers, bash, and local integration examples

Fetch via API (bash)

curl "http://localhost:3000/api/repos?namespace=research-lab"
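
The same request can be made from TypeScript. This is a minimal sketch, assuming a runtime with a global `fetch` (for example Node 18 or later) and a JSON response. The helper names `buildReposUrl` and `fetchRepos` are hypothetical, and because the response shape is not documented on this page it is typed as `unknown`.

```typescript
// Base URL taken from the curl example above.
const BASE_URL = "http://localhost:3000";

// Hypothetical helper: builds the repo list URL for a namespace,
// letting URLSearchParams handle query-string encoding.
function buildReposUrl(namespace: string, baseUrl: string = BASE_URL): string {
  const url = new URL("/api/repos", baseUrl);
  url.searchParams.set("namespace", namespace);
  return url.toString();
}

// Hypothetical helper: fetches the repo list and fails loudly on non-2xx.
async function fetchRepos(namespace: string): Promise<unknown> {
  const res = await fetch(buildReposUrl(namespace));
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}
```

Calling `fetchRepos("research-lab")` issues the same GET as the curl command, provided the local dev server is running.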

Read from adapter layer (typescript)

import { getRepo } from "@/app/data";

const repo = getRepo("research-lab", "multimodal-evals-v1");
console.log(repo?.manifest.artifacts);

Files and versions

Repository files

3 files in current manifest

README.md

public | model_card | CID bafydatasetreadme

9 KB | 4 days ago

data/train-00001.parquet

public | dataset_shard | CID bafytrainparquet

594 MB | 4 days ago

data/validation-00001.parquet

public | dataset_shard | CID bafyvalidationparquet

126 MB | 4 days ago
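
The listing above hints at what `repo.manifest.artifacts` might hold. The sketch below models it with a hypothetical `Artifact` type (the real type exported from `@/app/data` is not shown on this page) and sums sizes by kind. Sizes are taken from the rounded values displayed above and assume decimal units (1 MB = 1,000,000 bytes).

```typescript
// Hypothetical artifact shape; field names are assumptions, not the store's schema.
type ArtifactKind = "model_card" | "dataset_shard";

interface Artifact {
  path: string;
  kind: ArtifactKind;
  cid: string;
  visibility: "public" | "private";
  sizeBytes: number;
}

const KB = 1e3;
const MB = 1e6;

// Entries copied from the file listing; sizes are the rounded display values.
const artifacts: Artifact[] = [
  { path: "README.md", kind: "model_card", cid: "bafydatasetreadme", visibility: "public", sizeBytes: 9 * KB },
  { path: "data/train-00001.parquet", kind: "dataset_shard", cid: "bafytrainparquet", visibility: "public", sizeBytes: 594 * MB },
  { path: "data/validation-00001.parquet", kind: "dataset_shard", cid: "bafyvalidationparquet", visibility: "public", sizeBytes: 126 * MB },
];

// Sums artifact sizes, optionally restricted to one kind.
function totalBytes(items: Artifact[], kind?: ArtifactKind): number {
  return items
    .filter((a) => kind === undefined || a.kind === kind)
    .reduce((sum, a) => sum + a.sizeBytes, 0);
}

function formatMB(bytes: number): string {
  return `${(bytes / MB).toFixed(1)} MB`;
}

console.log(formatMB(totalBytes(artifacts, "dataset_shard"))); // 720.0 MB
```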

Preview

Dataset splits

Public sample only

Split	Samples	Format	Notes
train	842,000	parquet	tool traces + labels
validation	55,000	parquet	balanced benchmark slice
preview	250	jsonl	safe public sample

Community

Discussions

Manifest sync and mirror health

research-lab · 4 replies · Updated 4 days ago

Tracking whether the latest manifest CID, billing runway, and moderation state stay aligned in local dev.

watching

Should this repo stay visible under current filters?

policy-review · 7 replies · Updated 1 day ago

Frontend filters are driven by normalized flag records rather than hard protocol takedowns.

open
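
The idea in this thread, that visibility is decided by frontend filters reading normalized flag records rather than by removing the repo itself, can be sketched as follows. Every name here (`FlagRecord`, `FilterSettings`, `activeFlagCount`, `isVisible`) is a hypothetical illustration and not the store's actual schema.

```typescript
// Hypothetical normalized flag record, stored separately from the repo.
interface FlagRecord {
  repoId: string; // e.g. "research-lab/multimodal-evals-v1"
  reason: string;
  status: "active" | "resolved";
}

// Hypothetical viewer-side filter settings.
interface FilterSettings {
  hideFlagged: boolean;
}

function activeFlagCount(repoId: string, flags: FlagRecord[]): number {
  return flags.filter((f) => f.repoId === repoId && f.status === "active").length;
}

// The repo data is never deleted; the filter only decides whether to render it.
function isVisible(repoId: string, flags: FlagRecord[], filters: FilterSettings): boolean {
  if (!filters.hideFlagged) return true;
  return activeFlagCount(repoId, flags) === 0;
}

// Example data: one active flag, matching the count shown on this page.
// The reason text is invented for illustration.
const flags: FlagRecord[] = [
  { repoId: "research-lab/multimodal-evals-v1", reason: "example reason", status: "active" },
];

console.log(isVisible("research-lab/multimodal-evals-v1", flags, { hideFlagged: false })); // true
console.log(isVisible("research-lab/multimodal-evals-v1", flags, { hideFlagged: true })); // false
```

Because the decision lives in the filter, resolving the flag record or changing the filter setting restores the repo without touching its manifest.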