- ⚙️ Small LLMs like Phi-1.5 (1.3B parameters) can outperform ChatGPT-3.5 on data science notebook tasks when set up correctly.
- 📊 Jupyter Agents handle full-stack analytics, from importing CSVs to producing charts and summaries, with minimal user input.
- 🤖 Synthetic datasets like DABStep enable consistent, low-error training, which is key for automating structured tasks.
- 🧠 Structured environments like Jupyter let small LLMs reason step by step better than they do in open-ended conversation.
- 🔁 Domain-adapted agents improve over time, learning from user feedback and from recurring business tasks.
Jupyter Agents + Small LLMs
Jupyter Agents are a new type of AI tool built to work directly inside Jupyter Notebooks. They automate common data science tasks, from loading data to producing charts and reports. What's new is that small LLMs (a couple of billion parameters or fewer), not huge, power-hungry models, run these agents. What this means: strong, cheap AI analytics without heavy hardware. This is great for business owners, marketers, and small teams who use tools like Bot-Engine to automate reports and analyze data.
Why Data Science Tasks Are a New Area for LLMs
Typical LLM uses, like chatbots, text summaries, or drafting emails, are usually single-step. Data science tasks are much harder: they often involve many steps and complex pipelines. One project might mean importing data, spotting problems, transforming the data, making charts, building predictive models, and then writing a summary, all within one notebook.
Jupyter Notebooks are a great place for this kind of organized thinking. They show code, notes, data, and visuals all at once. This makes them easy for both people and AI agents to understand, helping agents learn to work like humans. Because of this mix of text, code, and output, small LLMs can follow steps in order and see what's happening. The notebook becomes more than a tool; it's a place where they learn.
In business, this means you don't need a whole data science team to make weekly performance reports or check A/B tests anymore. Jupyter Agents with Bot-Engine workflows can do these tasks automatically. You just give them a CSV file, and they can send a notification when done. This isn't just a technical change; it's very practical.
Key Parts of a Jupyter Agent
Jupyter Agents work because they combine a few key abilities that let them reason clearly inside notebooks and automate data tasks. These are the main parts that make these agents strong:
Notebook Interfacing
To start, Jupyter Agents must "get" how a notebook is set up. This means they need to:
- Read notes and understand what the tasks ask for
- Find and understand the code in each part
- Read the results, both text and pictures
- See how data moves from what goes in, to how it's changed, to what comes out
An agent trained like this can act like a human analyst. It understands each step's context and figures out what to do next.
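Since a `.ipynb` file is just JSON, an agent can read a notebook's structure with nothing but the standard library. A minimal sketch of that kind of interfacing (the notebook contents here are invented for illustration):

```python
import json

# A .ipynb file is JSON: a list of cells, each with a type and source.
# This notebook content is a made-up example for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Task: plot revenue by region"]},
        {"cell_type": "code",
         "source": ["df = pd.read_csv('sales.csv')"],
         "outputs": [{"output_type": "stream", "text": ["loaded 500 rows"]}]},
    ]
}

def describe_cells(nb):
    """Walk the notebook the way an agent would: note each cell's type and text."""
    summary = []
    for cell in nb["cells"]:
        text = "".join(cell["source"])
        summary.append((cell["cell_type"], text))
    return summary

for kind, text in describe_cells(notebook):
    print(f"{kind}: {text}")
```

A real agent would load the file with `json.load` and also inspect each cell's outputs, but the cell-walking logic is the same.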
Code Generation + Summarization
Small but capable LLMs power these agents, writing code automatically with:
- `pandas` for data preparation
- `matplotlib` and `seaborn` for visualizations
- `scikit-learn` for machine learning models

They also write summaries in markdown, turning results into easy-to-read insights.
For example, an agent might write: “Here are your top five Facebook campaigns shown in a bar chart.” It also puts together the code needed to figure out and show these results.
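The generated code behind that summary might look like the following sketch (the DataFrame contents and column names here are invented for illustration):

```python
import pandas as pd

# Hypothetical campaign data; in practice the agent reads the user's CSV.
df = pd.DataFrame({
    "campaign": ["A", "B", "C", "D", "E", "F"],
    "revenue": [1200, 3400, 800, 2900, 4100, 1500],
})

# Top five campaigns by revenue, ready to feed into a bar chart.
top5 = df.sort_values("revenue", ascending=False).head(5)
print(top5.to_string(index=False))

# A matplotlib bar chart would follow, e.g.:
# top5.plot.bar(x="campaign", y="revenue")
```

The agent then pairs this code with a markdown cell explaining the result in plain language.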
Scaffolding Strategies
To handle hard tasks, Jupyter Agents split the work into smaller, easier-to-manage parts:
- They plan the whole process based on what the user asks.
- Then they give out smaller jobs, like cleaning data or showing trends.
- And they do each small job using a team of mini-agents or rules that work together.
This planning approach, called scaffolding, is a must for multi-step tasks, such as returning clean data, producing a forecast, and presenting easy-to-use results.
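A toy version of that scaffolding, with a planner sequencing small subtask handlers (the task names and handlers are invented for illustration):

```python
# Each subtask is a small function; the planner sequences them.
def clean(data):
    """Drop missing values, the 'cleaning' subtask."""
    return [row for row in data if row is not None]

def summarize(data):
    """Reduce the cleaned data to a small report, the 'summary' subtask."""
    return {"count": len(data), "total": sum(data)}

# The plan maps a user request to an ordered list of subtasks.
PLAN = {
    "summarize my sales": [clean, summarize],
}

def run(request, data):
    """Execute the plan: each step's output feeds the next step."""
    for step in PLAN[request]:
        data = step(data)
    return data

result = run("summarize my sales", [10, None, 25, 15])
print(result)
```

Real agents generate the plan with the LLM rather than a fixed dictionary, but the decompose-then-execute shape is the same.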
Built-In Toolkits
Small LLMs work much better when agents have special tools added right into the notebook. Some common tools already built into Jupyter Agents are:
- `pandas_profiling`: automatically profiles data, producing summaries and visuals.
- `ydata-profiling`: the successor to `pandas_profiling`; better maintained and more configurable.
- `matplotlib` / `seaborn`: libraries for making clear, customizable charts.
- `sklearn`: a machine learning library for building data models and evaluating how good they are.
With these tools, agents don't have to start from scratch. Instead, they can quickly make good, useful results over and over again.
The DABStep Benchmark: How Synthetic Data Helps Train Agents
To train LLMs to work well in Jupyter, you need good, structured data. Primer built the DABStep benchmark for exactly this: a platform for training and evaluating data science agents.
What Is DABStep?
DABStep (Data Analysis Benchmark Stepwise) is a synthetic benchmark built specifically for training LLM agents on data science tasks. Here are its key facts:
- 🚀 Over 200,000 labeled notebook examples
- 📄 It comes from 10,000 different prompts. These prompts cover things like exploring data, making visuals, and building models.
- 🧼 The input, output, and notes are all carefully cleaned and labeled.
Every notebook in DABStep contains synthetic data designed to look realistic. The data is clean, low-error, and consistently labeled, making it ideal for training and testing how well small LLMs handle tasks like those in real data analysis work.
Citation: Primer. (2024). The DABStep Benchmark: Training LLMs to reason in notebooks
How Agents Plan and Do Notebook Tasks
Unlike traditional LLM outputs, which arrive as one block of text, Jupyter Agents express their reasoning as a series of logical steps.
Stepwise Execution Example
Imagine a small business owner puts in campaign data. Their goal: “Show the regions that made the most money.” A Jupyter Agent set up correctly could do this:
- First, it looks at column names to find ‘regions’ and ‘revenue’.
- Then, it groups data by region and figures out the average or total money made.
- Next, it makes a bar chart. This chart shows regions from most money to least.
- Finally, it writes a note: “The Northeast region made the most money, $45,000. The Southwest was next.”
This mirrors how people reason: understanding context and applying knowledge one step at a time. With mini-agents and a task planner working together, the LLM keeps its thinking organized, with checks at every step.
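The stepwise flow above, with a check after each step, can be sketched in plain Python (the campaign rows and dollar figures are invented to match the example):

```python
from collections import defaultdict

# Invented campaign rows: (region, revenue).
rows = [("Northeast", 20000), ("Southwest", 15000),
        ("Northeast", 25000), ("West", 12000)]

# Step 1: locate the needed fields (here, positions in each row).
REGION, REVENUE = 0, 1

# Step 2: group by region and total the revenue.
totals = defaultdict(int)
for row in rows:
    totals[row[REGION]] += row[REVENUE]
# Per-step check: nothing was lost in the grouping.
assert sum(totals.values()) == sum(r[REVENUE] for r in rows)

# Step 3: rank regions from most revenue to least (the bar-chart order).
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Step 4: write the note.
top_region, top_value = ranked[0]
print(f"The {top_region} region made the most money, ${top_value:,}.")
```

An agent would generate pandas code for steps 2 and 3, but the logic and the per-step checks are the same.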
Performance: Surprising Strength of Small LLMs 🚀
Many people think smaller models cannot do as well as big ones like GPT-3.5 or GPT-4. However, in organized places like notebooks, small LLMs can do very well if trained right.
Benchmark Results
In checks using the DABStep set up:
- 🔹 Phi-1.5 (1.3B parameters) consistently outperformed ChatGPT-3.5 on task accuracy and error rate.
- 🔹 Small LLMs produced code whose output matched the expected results more often.
- 🔹 Their reasoning was easier to follow when laid out as alternating notes and code blocks.
These smaller models do well because they stay tightly focused: with less context to juggle, they track the relevant details more accurately, and they are less verbose. That makes them a good fit for structured task flows.
Stat: Small LLMs like Phi-1.5 (1.3B) did better than ChatGPT-3.5 on specific DABStep tasks when set up correctly (Primer, 2024)
Why Synthetic Data Helps AI Automation
AI developers often debate whether to train models on messy real-world data or on clean, synthetic data. DABStep shows that synthetic data can help a lot.
Advantages of Synthetic Training:
- 🚫 Less noise: no spelling mistakes, mismatched columns, or stray notes.
- ✔️ Easier to label: notes, code, and output are cleanly separated.
- 🔁 Reproducible: easy to train, test, and compare over and over.
Synthetic datasets give AI agents a clean starting point, ideal for teaching basic skills before they face messy real-world data. Developers can debug faster and pinpoint why things fail, which means quicker iteration and deployment.
When Real Data and Synthetic Data Come Together: Training with Kaggle
Synthetic-data training is good for getting started, but Jupyter Agents truly show what they can do on messy, real-world data.
Challenges in Real-World Applications
- 📉 Missing values or poorly labeled columns
- 📎 Confusing context from vague requests like “make it better” or “perform better”
- 🌍 Small, unique details in fields like healthcare or finance
Adaptive Solutions
To close this gap, some systems use special ways to adapt to different areas:
- Learning by doing, with users correcting mistakes.
- Learning actively from tasks done many times.
- Adjusting with words and measurements specific to a field.
With time, your Jupyter Agent will react better to the odd parts of your data. It will act less like a general tool and more like a junior analyst trained just for your business.
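One simple form of that "learning by doing" is remembering user corrections and reusing them on later runs. A minimal sketch (the correction store and column names are invented for illustration):

```python
# A tiny correction store: once a user fixes a column mapping,
# the agent reuses it on future files from the same source.
corrections = {}

def resolve_column(source, guessed, user_fix=None):
    """Return the column name to use, preferring remembered fixes."""
    if user_fix is not None:
        # Learn from the correction for this (source, guess) pair.
        corrections[(source, guessed)] = user_fix
    return corrections.get((source, guessed), guessed)

# First run: the agent guesses "rev", the user corrects it to "revenue".
resolve_column("shop.csv", "rev", user_fix="revenue")

# Later runs: the fix is applied automatically.
print(resolve_column("shop.csv", "rev"))
```

Production systems would persist corrections and generalize across similar files, but the feedback loop is the same shape.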
From Jupyter to Business: What This Means for Business Owners
This isn't just a tool for AI researchers; it's a practical tool that changes things for startup teams, solo founders, and marketers, giving them fast, consistent insights.
Example Use Cases
- Quickly sum up sales trends from last month.
- Find social media posts that did very well.
- Check A/B tests for landing pages.
- Track customer churn each quarter without needing a BI team.
🔧 Example Workflow for Bot-Engine Users:
- Upload a transaction log to Google Drive.
- Start a Make.com task that turns on a Jupyter Agent.
- The agent checks data and evaluates models.
- A summary is written in markdown, with graphs.
- The results go straight to Slack, Notion, or your CRM.
This gives you a fully automated way to get insights. Small LLMs run it, and it fits into a no-code process.
What's Next: No-Code Jupyter Agents in Bot-Engine Workflows
As these agents get better, tools like Bot-Engine want to make their power available to everyone through easy-to-use, no-code platforms. Imagine:
- 🧾 Drag-and-drop CSV input forms on your website
- 👁️ Click a button to make visual summaries
- 📬 Your team gets automatic suggestions for strategy by email
- 📅 Reports set to run every week or month
Things that might be added later include:
- ✅ Many agents working together on bigger data sets
- 💬 Making insights in many languages
- 🧮 Connecting with dashboards that show live data
- 🔐 Better control over who can see private data sets
Jupyter Agents are the core, and Bot-Engine is the front-end. With this, small teams can get insights like big companies, but with a startup's budget.
Tradeoffs and Limitations
Jupyter Agents are powerful, but they have limits that businesses should understand:
- 🧠 They can only remember so much at once, so long tasks might get cut short.
- 🚧 They might have trouble when used in new areas if not adjusted first.
- 📚 LLMs might make up chart labels, axes, or summaries if they don't have enough support.
- 🧮 They can't yet easily check many data tables against each other.
- 🔐 Rules for privacy and data safety must be followed strictly.
They are good for everyday analysis and reports. But for very important choices, people should still check things.
A New Kind of Agent: Light, Connected, and Ready for Business
We are seeing a new type of AI agent. These are special, small models that can fit smoothly into daily business tasks. These models won't take the place of big LLMs. Instead, they will add to them in very specific workflows, such as:
- 📈 Bots that predict things for online stores
- 📧 Tools that make marketing campaigns work better
- 📊 Automated tools for weekly business analysis
- 👩‍💼 Tools that update CRM for sales teams
With platforms like Bot-Engine, a regular team can make and use complex analyses without any coding. The real-world benefits are huge. They save time and help people understand data better.
If you are a founder, a product manager, or run your own business, now is a good time to look at Jupyter Agents. Even small LLMs can deliver great insights.
Citations
Primer. (2024). The DABStep Benchmark: Training LLMs to reason in notebooks. Retrieved from: https://huggingface.co/blog/jupyter-agent-2


