Introducing Benchmate

At Scientist, the pharma industry’s leading research platform, we have a deep interest in data science and a long history of applying machine learning to scientific research — from predictive models that aid in supplier selection, which we’ve been building since 2019, to, more recently, models that estimate the turnaround time of individual research projects.
Given our history and keen interest in the intersection of machine learning and life sciences research, we knew that we were going to leverage Large Language Models to their fullest. To that end, almost a year ago to the day, I set out with a team of developers to build an LLM component for the Scientist.com platform. We knew we wanted it to have access to online models such as GPT-4 and Claude, but we also wanted to build and host our own models. We wanted to make it available to users on Scientist.com, but we also wanted to use it internally from wherever we were, including directly via Slack. We knew we wanted it to generate text, provide answers and offer insights, but we also wanted to use it to extract information from unstructured text. Finally, and most importantly, we wanted to integrate it directly into the company’s main product, the Scientist.com marketplace.
The result of our work is Benchmate.ai — our platform for LLM apps. By abstracting different aspects of running a successful LLM, we’ve built a tool that allows our Language Models Team to iterate quickly, ground our models in reality and deploy tailored solutions for incredibly varied use cases. Benchmate’s core abstractions include:
- Language Models that abstract the underlying models into a standard interface, allowing us to swap OpenAI, Anthropic and locally running models with ease.
- Data Sources that allow for dynamically adding custom data to a model, helping ground it in reality and reduce hallucinations.
- Tool Sets that allow the model to access real-time information and provide more dynamic responses.
- Chat Bots that bring the Language Model, Data Sources and Tool Sets together with a prompt and some other niceties to give the user or API something to interact with.
- Testing Suites that allow us to quantitatively determine how effective and how safe our bots are and track those metrics over time, including end user feedback.
- Extraction Pipelines that present a new way of using LLMs, letting us pose questions to documents and extract structured data from unstructured text.
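To make these abstractions a little more concrete, here’s a minimal sketch in Python of how a language model, a data source and a chat bot might compose behind a standard interface. This is purely illustrative: the class and method names (`LanguageModel`, `DataSource`, `ChatBot`, `complete`, `retrieve`, `ask`) are hypothetical, not Benchmate’s actual API, and the stub model stands in for a real OpenAI, Anthropic or locally hosted backend.

```python
from abc import ABC, abstractmethod

# NOTE: all names here are illustrative assumptions, not Benchmate's real API.

class LanguageModel(ABC):
    """Standard interface so hosted and local models are interchangeable."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class StubModel(LanguageModel):
    """Stand-in for a real provider; echoes the prompt it receives."""
    def complete(self, prompt: str) -> str:
        return f"[model] {prompt}"

class DataSource:
    """Supplies grounding documents relevant to a query."""
    def __init__(self, documents: list[str]):
        self.documents = documents

    def retrieve(self, query: str) -> list[str]:
        # Naive keyword match; a real implementation might use embeddings.
        return [d for d in self.documents if query.lower() in d.lower()]

class ChatBot:
    """Ties a model, a data source and a system prompt together."""
    def __init__(self, model: LanguageModel, data_source: DataSource, system_prompt: str):
        self.model = model
        self.data_source = data_source
        self.system_prompt = system_prompt

    def ask(self, question: str) -> str:
        context = "\n".join(self.data_source.retrieve(question))
        prompt = f"{self.system_prompt}\nContext:\n{context}\nQuestion: {question}"
        return self.model.complete(prompt)

bot = ChatBot(
    model=StubModel(),
    data_source=DataSource(["Supplier turnaround averages 14 days."]),
    system_prompt="Answer using only the provided context.",
)
print(bot.ask("turnaround"))
```

Because the bot only depends on the `LanguageModel` interface, swapping `StubModel` for a different backend requires no changes to the bot itself — which is the property that lets a team iterate quickly across providers.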
I’ve been dedicating a great deal of my time lately, both personally and professionally, to Benchmate, and my plan is to share my findings and expand on many of the abstractions above in a series of posts on the Scientist.com blog.
One of the first features we built using this platform was our Auto Negotiate feature. We’re excited about Auto Negotiate and it’s already resulted in meaningful savings, but it’s really just a proof of concept for our Benchmate platform and we’re even more excited about the next set of features this foundation will allow us to build.