Tools
OmniParser V2
October 2024
OmniParser is an advanced vision-based screen parsing module that converts user interface (UI) screenshots into structured elements, allowing agents to execute actions across various applications using visual data. By harnessing large vision-language model capabilities, OmniParser improves both efficiency and…
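To make "structured elements" concrete, here is a minimal sketch of the kind of output a screen parser like this might produce: each detected UI element carries a type, a caption, and a bounding box an agent can target. The field names and values are invented for illustration, not OmniParser's actual schema.

```python
# Hypothetical structured-element representation for a parsed UI screenshot.
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str                                 # e.g. "button", "text", "icon"
    caption: str                              # short functional description
    bbox: tuple[float, float, float, float]   # normalized (x1, y1, x2, y2)

    def center(self) -> tuple[float, float]:
        """A natural click target for an agent: the element's center point."""
        x1, y1, x2, y2 = self.bbox
        return ((x1 + x2) / 2, (y1 + y2) / 2)

submit = UIElement("button", "Submit form", (0.40, 0.80, 0.60, 0.88))
print(submit.center())
```

An agent consuming such elements can then choose an action (e.g. "click the element captioned 'Submit form'") and resolve it to screen coordinates via `center()`.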
Eureka ML Insights
September 2024
This repository contains the code for Eureka ML Insights, a framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. The framework is designed to help researchers and practitioners run reproducible evaluations of generative models using a…
Trace
July 2024
Trace is a new AutoDiff-like tool for training AI systems end-to-end with general feedback (such as numerical rewards or losses, natural language text, or compiler errors). Trace generalizes the back-propagation algorithm by capturing and propagating an AI system's execution trace. Trace…
Phi-3
April 2024
The Phi-3-Mini-128K-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open model trained with the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to…
KITAB Dataset
February 2024
🕮 KITAB is a challenging dataset and a dynamic data-collection approach for testing the abilities of Large Language Models (LLMs) to answer information-retrieval queries with constraint filters. A filtering query with constraints can be of the form “List all books…
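The idea of a constraint-filtered retrieval query can be sketched in a few lines: given an author's book list, answer "List all books by X whose title satisfies some condition" by applying a predicate. The data and helper below are invented for illustration and are not part of the KITAB dataset itself.

```python
# Toy constraint-filtered query, KITAB-style: author -> books, filtered
# by a constraint predicate on the title. All names here are hypothetical.
books_by_author = {
    "Jane Doe": ["Autumn Light", "Borrowed Time", "A Quiet Harbor"],
}

def filter_books(author: str, constraint) -> list[str]:
    """Return the author's books whose titles satisfy the constraint."""
    return [b for b in books_by_author.get(author, []) if constraint(b)]

# Constraint: titles starting with the letter "A".
starts_with_a = lambda title: title.startswith("A")
print(filter_books("Jane Doe", starts_with_a))
```

The benchmark's difficulty comes from asking an LLM to produce exactly this filtered set from its own knowledge, where both hallucinated titles and missed titles count against it.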
HoloAssist
January 2024
A large-scale egocentric human-interaction dataset in which two people collaboratively complete physical manipulation tasks.
Orca-2-13B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
Orca-2-7B
January 2024
Orca 2 is a fine-tuned version of LLAMA-2. It is built for research purposes only and provides a single-turn response in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization. The model…
LLF-Bench
January 2024
LLF-Bench is a benchmark for evaluating learning agents that provides a diverse collection of interactive learning problems in which the agent receives language feedback instead of rewards (as in RL) or action feedback (as in imitation learning).
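The distinction from reward-based RL can be illustrated with a toy interaction loop in which the environment's response is a sentence rather than a scalar. This is a hypothetical sketch of the idea, not the actual LLF-Bench API.

```python
# Toy language-feedback environment: a number-guessing task whose step
# function returns (done, feedback_text) instead of a numeric reward.
def step(guess: int, target: int = 7) -> tuple[bool, str]:
    if guess == target:
        return True, "Correct!"
    if guess < target:
        return False, "Too low; try a larger number."
    return False, "Too high; try a smaller number."

def run(guess: int = 0, max_steps: int = 20) -> int:
    """A simple agent that acts on the feedback text, not a reward signal."""
    for _ in range(max_steps):
        done, feedback = step(guess)
        if done:
            return guess
        guess += 1 if "larger" in feedback else -1
    return guess
```

The point of the benchmark is that the agent must interpret and exploit such textual feedback, which is richer but less directly optimizable than a scalar reward.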