Building the product I've always wanted
After spending the last 8+ years immersed in the world of data annotation and crowdsourcing, I had to admit to myself that I wasn’t going to be happy until I closed the frustrating gap I saw in the tools available for building microwork solutions. There just aren’t any tools that truly serve the needs of people trying to build annotation pipelines or set up other types of tasks with humans in the loop.
So, for much of this year I’ve been taking what started as a small proof of concept I used to support solutions for my clients and building it into a new service I call Task Assembly.
Task Assembly will make it easy to take any repetitive data annotation task (yes, this is about building AI solutions, but it can be used for a lot more) and quickly distribute it to teams of workers. We give you all the tools you need to define a task, set up a team of workers, and manage quality.
A consistent source of pain
I’ve spent the better part of the past 10 years helping clients and customers figure out how to gather annotated data, whether using Amazon MTurk, vendors, or members of their own team. Along the way I’ve consistently run into the same set of problems on every project: it’s too hard to build a task interface to collect the data; I spend so much time monitoring quality I could just do it myself; building the data pipeline is so complicated. And many more.
More than that, I became frustrated with how limiting many platforms were when it came time to build anything but the simplest task flow.
A few years ago, I started building a solution for my clients that would allow us to more easily use MTurk to gather the data they needed. This service handled many of the data management headaches associated with MTurk and provided built-in tools to manage worker quality and ensure the data met their needs. Over the past few years, I’ve been able to deliver some incredibly interesting solutions for my clients and have iterated on the service’s features along the way.
While it was incredibly useful, and even saw regular use by a team at Amazon, it had some limitations I needed to address. Over the course of this year, I’ve taken the learnings and ideas and built them into Task Assembly.
An IDE for microtasks
The biggest pain point in developing a new task is that the tools just aren’t there. Because task development draws on data science, web development, and data engineering skills, no existing tool quite met my needs. I often found myself toggling back and forth between an IDE like WebStorm to build the task interface, a Jupyter Notebook, and a browser window. Even then, I typically had to write bespoke code to visualize the data coming back from the workers completing tasks.
Task Assembly simplifies all of this by bringing your data, task interface, and code together in an easy-to-use tool that lets you test and validate each step of your task’s data flow. Upload a sample of your input data to get started, then build a task interface and see how it will appear to workers. Submit it yourself and validate the data that comes back. Select data processing Bots to format your data, or write your own, and see how the response changes. Add a Bot that resolves answers from multiple workers into a single result and review the output. At each step, you can see how the changes you make affect the data you get back.
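To give a sense of the kind of logic a resolution Bot encapsulates, here’s a minimal sketch of consolidating multiple worker answers by majority vote for a simple categorization task. The function name and response format are hypothetical illustrations, not the actual Task Assembly API.

```python
# A hypothetical sketch of the consolidation logic a resolution Bot handles:
# reducing several worker responses to a single result. The response format
# and function name are illustrative only, not the Task Assembly API.
from collections import Counter

def resolve_by_majority(responses, min_agreement=0.5):
    """Return the answer most workers agree on, or flag the task for review."""
    votes = Counter(r["answer"] for r in responses)
    answer, count = votes.most_common(1)[0]
    agreement = count / len(responses)
    if agreement >= min_agreement:
        return {"answer": answer, "agreement": agreement}
    return {"answer": None, "needs_review": True}

# Three workers categorized the same item; two of the three agree.
print(resolve_by_majority([
    {"worker": "w1", "answer": "cat"},
    {"worker": "w2", "answer": "cat"},
    {"worker": "w3", "answer": "dog"},
]))
```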
To make sure you get the results you need, you start a recording session to capture Training and Test tasks that verify your workers understand the requirements of the task and are providing quality results. When responses start coming in, you can review them and drill into what each worker has provided.
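For a sense of what Test tasks do behind the scenes, scoring a worker against known answers is conceptually just a comparison with gold data. The structures below are a hypothetical illustration, not Task Assembly internals.

```python
# A hypothetical illustration of known-answer (gold) scoring, the idea behind
# Test tasks: compare a worker's responses to answers captured in advance.
def score_worker(worker_responses, gold_answers):
    """Return the fraction of gold tasks the worker answered correctly."""
    scored = [
        worker_responses[task_id] == expected
        for task_id, expected in gold_answers.items()
        if task_id in worker_responses
    ]
    return sum(scored) / len(scored) if scored else None

gold = {"task-1": "cat", "task-2": "dog"}
responses = {"task-1": "cat", "task-2": "cat", "task-3": "bird"}
print(score_worker(responses, gold))  # 0.5
```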
At every step of the process, we’ve tried to find ways to smooth out the rough edges we so often face in building and deploying tasks.
Treat workers like humans and not an API
When Amazon Mechanical Turk first launched, it was described as “artificial artificial intelligence,” and that very mechanistic approach pervaded how the service (and a lot of the services that followed) was designed. The service wasn’t really designed to treat workers as humans: people who make mistakes, need to take breaks, get distracted, and need an experience that allows them to be as effective as possible.
We’ve spent a lot of time rethinking the worker experience to ensure workers can be as effective and accurate as possible. When workers first start on a task, we begin by asking them to review the instructions. We then let them work through training tasks that build their confidence that they understand the instructions correctly. And we keep the instructions close at hand, so they can always refer back to them.
As they move on to the real work, annotators will immediately notice that submitting and moving to the next task takes a fraction of the time it typically does. On MTurk it can take up to 10 seconds from submitting one task to being able to work on the next. With Task Assembly, this happens in milliseconds.
As annotators work through tasks, they’ll also notice that they have options they’ve never had before. They can go back and correct mistakes if they hit submit too early. If a task looks a bit more complicated, they can shelve it and come back to it later. If it’s a task they simply don’t feel they can answer confidently, they can skip it and move on to the next one. By giving workers more flexibility, we reduce their anxiety and help them build the confidence to give you the data you need.
Results, not data
When developing annotation tasks, one of the most common challenges is reviewing the data to spot-check results, identify the causes of errors, and refine how you consolidate multiple responses into a single result for a task. If you’re working with simple text responses, this isn’t terribly difficult to do in Excel. But the moment the data becomes more complex, you need a way to visualize the results, not stare at a JSON object.
Let’s look at the common bounding-box image annotation task. The results are typically arrays of coordinates describing the boxes workers drew around items in an image. When reviewing results, you want to see where on the image each worker drew boxes, along with the resolved boxes for the task. No one I know can look at an array of x/y coordinates and instinctively know what they’re looking at.
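To make that concrete, here’s a minimal sketch of the kind of bespoke visualization script you end up writing without a built-in result viewer. The result format is a hypothetical example, where each worker’s response is a list of [x, y, width, height] boxes.

```python
# A sketch of the one-off review script you end up writing without a result
# viewer. The result format is hypothetical: each worker's response is a list
# of [x, y, width, height] boxes.
from PIL import Image, ImageDraw

COLORS = ["red", "blue", "green", "orange", "purple"]

def draw_worker_boxes(image_path, worker_results, output_path):
    """Overlay each worker's boxes on the image, one color per worker."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for i, (worker_id, boxes) in enumerate(worker_results.items()):
        color = COLORS[i % len(COLORS)]
        for x, y, w, h in boxes:
            draw.rectangle([x, y, x + w, y + h], outline=color, width=3)
        if boxes:
            # Label the worker's first box so contributions are easy to tell apart
            draw.text((boxes[0][0], max(boxes[0][1] - 12, 0)), worker_id, fill=color)
    image.save(output_path)

# Example: two workers annotated the same image with slightly different boxes.
draw_worker_boxes(
    "street_scene.jpg",
    {
        "worker_a": [[34, 50, 120, 80], [210, 95, 60, 45]],
        "worker_b": [[30, 48, 125, 84]],
    },
    "street_scene_review.jpg",
)
```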
Task Assembly solves this by giving you a result viewer interface where you can visually review the results of each task, including each worker’s contributions. This simplifies review and lets you catch issues early in the development of your annotation project.
More to come
If you can’t tell, I’m extremely excited that I’m finally able to start talking about this and sharing all of the great features we’ve been able to build into Task Assembly. I’ll be sharing more details about how we got here and where we’re headed with the service over the coming weeks.