Open-source evaluation toolkit for large vision-language models (LVLMs), ...
A Simple Math and Pseudo C# Expression Evaluator in One C# File. Can als...
TCExam is a CBA (Computer-Based Assessment) system (e-exam, CBT - Comput...
A collection of datasets that pair questions with SQL queries.
Recommender system library for the CLR (.NET)
Benchmarking long-form factuality in large language models. Original cod...
Case Recommender: A Flexible and Extensible Python Framework for Recomme...
LightEval is a lightweight LLM evaluation suite that Hugging Face has be...
Behavioral "black-box" testing for recommender systems
Resource, Evaluation and Detection Papers for ChatGPT
C# Eval Expression | Evaluate, Compile, and Execute C# code and expressi...
Simple Safe Sandboxed Extensible Expression Evaluator for Python
Research on evaluating and aligning the values of Chinese large language models
ERRor ANnotation Toolkit: Automatically extract and classify grammatical...
Python Single Object Tracking Evaluation