Toolkit | NAACL 2025 Demo | TrustEval is a modular and extensible toolkit for comprehensive trust evaluation of generative foundation models (GenFMs). It evaluates models across dimensions including safety, fairness, robustness, and privacy.
Toolkit | NeurIPS 2025 | Supports chemical research through intelligent task orchestration, automated workflows, and AI-powered insights. Transforms a desired chemical task into high-quality instruction-response pairs.
Toolkit | SDE-Harness (Scientific Discovery Evaluation) is a comprehensive, extensible framework designed to accelerate AI-powered scientific discovery.
Dashboard | ValueLence is the first unified platform for dynamic, fine-grained value probing of LLMs. It offers a seamless, end-to-end workflow for value curation, diverse probe generation, scalable response collection, and rigorous multi-dimensional evaluation and visualization.