A unified evaluation framework for large language models
Corruption and Perturbation Robustness (ICLR 2019)
Benchmarking Generalized Out-of-Distribution Detection
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
A Harder ImageNet Test Set (CVPR 2021)
Code and information for face image quality assessment with SER-FIQ
Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training
Benchmark your model on out-of-distribution datasets with carefully collected human comparison data (NeurIPS 2021 Oral)
TensorFlow implementation of "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network"
auto_LiRPA: An Automatic Linear Relaxation based Perturbation Analysis Library for Neural Networks and General Computational Graphs
alpha-beta-CROWN: An Efficient, Scalable and GPU Accelerated Neural Network Verifier (winner of VNN-COMP 2021, 2022, 2023, and 2024)
[NeurIPS 2023] RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions
Self-Supervised Learning for OOD Detection (NeurIPS 2019)
ImageNet-R(endition) and DeepAugment (ICCV 2021)
Extend Python list operations using .NET's LINQ syntax for clean and fast coding.
A preliminary evaluation of ChatGPT/GPT-4 for machine translation.
Code & data accompanying the NeurIPS 2020 paper "Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings".
A repository and benchmark for online test-time adaptation.
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296
Fiddler Auditor is a tool to evaluate language models.