BahaaGalal's Collections: LLM for Coding
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Paper
• 2405.07990
• Published
• 20
Large Language Models as Planning Domain Generators
Paper
• 2405.06650
• Published
• 13
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
• 2404.12753
• Published
• 43
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper
• 2404.07972
• Published
• 51
LLoCO: Learning Long Contexts Offline
Paper
• 2404.07979
• Published
• 22
CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper
• 2404.05875
• Published
• 18
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper
• 2404.06209
• Published
• 5
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper
• 2404.05719
• Published
• 83
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper
• 2404.03820
• Published
• 25
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper
• 2404.03543
• Published
• 18
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper
• 2404.02575
• Published
• 50
RAFT: Adapting Language Model to Domain Specific RAG
Paper
• 2403.10131
• Published
• 72
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper
• 2403.09629
• Published
• 79
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper
• 2403.03163
• Published
• 98
StarCoder 2 and The Stack v2: The Next Generation
Paper
• 2402.19173
• Published
• 152
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper
• 2402.16671
• Published
• 27
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Paper
• 2402.15491
• Published
• 15
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper
• 2402.14658
• Published
• 83
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper
• 2402.14261
• Published
• 10
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper
• 2402.13249
• Published
• 15
Chain-of-Thought Reasoning Without Prompting
Paper
• 2402.10200
• Published
• 109
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
• 2402.09727
• Published
• 38
MPIrigen: MPI Code Generation through Domain-Specific Language Models
Paper
• 2402.09126
• Published
• 14
Multi-line AI-assisted Code Authoring
Paper
• 2402.04141
• Published
• 10
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper
• 2402.01391
• Published
• 43
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper
• 2401.16467
• Published
• 10
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper
• 2401.03065
• Published
• 11