Understand and debug your LLM chains
Our new LLM debugging tool, with Trace Timeline and Trace Table, is a natural extension of our scalable experiment tracking, designed to support ML practitioners working on prompt engineering for LLMs. Easily review past results, identify and debug errors, gain insight into model behavior, and share learnings with colleagues. Visualize and drill down into every component and activity throughout the trace of your program.
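Conceptually, a chain trace is a tree of timed spans, one per LLM call, tool call, or agent step. The sketch below is plain Python, not the W&B API; the `Span` class and its field names are illustrative, showing the kind of information a trace timeline records for each step (inputs, outputs, status, timing) and how an error is pinpointed to one step in the chain:

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Span:
    """One step in a chain trace. Illustrative only -- not the W&B Trace API."""
    name: str
    inputs: dict
    outputs: dict = field(default_factory=dict)
    status: str = "running"  # becomes "success" or "error" when finished
    error: Optional[str] = None
    start_ms: float = field(default_factory=lambda: time.time() * 1000)
    end_ms: Optional[float] = None
    children: List["Span"] = field(default_factory=list)

    def finish(self, outputs=None, error=None):
        """Close the span, recording either outputs or an error."""
        self.end_ms = time.time() * 1000
        if error is not None:
            self.status, self.error = "error", error
        else:
            self.status, self.outputs = "success", outputs or {}

# A two-step agent chain: an LLM call succeeds, then a tool call fails.
root = Span("agent", inputs={"query": "What is 2+2?"})
llm = Span("llm_call", inputs={"prompt": "What is 2+2?"})
llm.finish(outputs={"completion": "I should use the calculator tool."})
tool = Span("calculator", inputs={"expression": "2+2"})
tool.finish(error="ToolNotFoundError: calculator")
root.children += [llm, tool]
root.finish(error="chain failed at step 'calculator'")

# A timeline view walks the tree to show which step broke and why.
broken = [s.name for s in root.children if s.status == "error"]
print(broken)  # -> ['calculator']
```

This is the question the Trace Timeline answers visually: not just that the chain errored, but which span failed and with what inputs.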
Drill into your model architecture
As they iterate and analyze, practitioners doing prompt engineering need to understand how their chain components are configured. Model Architecture provides a detailed description of all the settings, tools, agents, and prompt details within the topology of a particular chain.
Run OpenAI evaluations with W&B Launch
Use W&B Launch to run any evaluation from OpenAI Evals - a fast-growing repository of dozens of LLM evaluation suites - with just one click. Launch packages up everything you need to run the evaluation, logs the results in W&B Tables for easy visualization and analysis, and generates a Report for seamless collaboration. Use the one-line OpenAI integration to log OpenAI model inputs and outputs.
Visualize and analyze text data with W&B Tables
To better support prompt engineering practitioners working with text data, we’ve made several improvements to how we display text in Tables. Users can now visualize Markdown, as well as display the diff between two strings, to better understand the impact of changes to their LLM prompts. Long-text fields also now include tooltips and string previews.
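The string-diff view is handy for comparing two versions of a prompt side by side. Outside the UI, you can compute the same kind of word-level diff with Python's standard `difflib`; this is a generic sketch, not the implementation W&B Tables uses:

```python
import difflib

# Two versions of the same prompt; the strings are illustrative.
prompt_v1 = "Answer the question concisely."
prompt_v2 = "Answer the question concisely, citing sources."

# Word-level diff, similar in spirit to the diff view that
# W&B Tables renders for a pair of string cells.
diff = list(difflib.ndiff(prompt_v1.split(), prompt_v2.split()))

# Keep only the added ("+ ") and removed ("- ") words.
changes = [d for d in diff if d.startswith(("+", "-"))]
print(changes)
```

Seeing exactly which tokens changed between prompt versions makes it much easier to attribute a shift in model behavior to a specific edit.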
W&B is trusted by the teams building state-of-the-art LLMs
"The challenge with GCP is you're trying to parse terminal output. What I really like about Prompts is that when I get an error, I can see which step in the chain broke and why. Trying to get this out [otherwise] is such a pain."
"We use W&B for pretty much all of our model training."