Unlocking AppStream for data annotation
One of the most technically complex MTurk tasks I've ever worked on
A couple of years ago, one of my clients faced a unique challenge. Yield was declining on a process they had built to gather data about their customers from those customers’ websites. In digging into the data, we noticed that some websites could no longer be automatically navigated because they had been migrated to JavaScript-heavy web applications. These webapps obscured much of the relevant information unless the website was navigated by a user.
I suggested we find a way to record the interaction steps needed to get to the relevant data and then play those steps back using software such as Puppeteer or Playwright. Those tools are designed for testing websites but can be used to simulate user interaction to get the right data. Of course, for this to work we needed a way to gather the scripts for tens of thousands of websites. Tools already existed for recording interaction steps for use in testing, but figuring out how to put those tools in the hands of MTurk workers or other annotators would be a challenge.
Whether you’re working with MTurk workers or a team of annotators, relying on your annotators to install specific software on their devices is fraught with potential issues. MTurk makes this particularly problematic because it explicitly forbids requiring workers to install software to complete a task. In thinking through options, I was reminded of an idea I had a number of years ago when I was with MTurk.
Enter AppStream
Years ago, when I was working on the MTurk team at Amazon, a customer reached out with a unique challenge. They were working with drone data and needed a way for workers to use 3D modeling software to annotate that data. I can’t go into details on how they planned to use the data, but the challenge was very real. To successfully complete the task, workers would need to install expensive 3D modeling software on their computers and upload a file containing the completed result when they were done. Lots of problems here…
As I noodled on the challenge and approaches for solving it, I was intrigued by the potential of using AWS AppStream to address this type of annotation. AppStream lets you install applications on a fleet of Windows or Linux computers and then stream those applications to any browser. Since MTurk tasks are simply HTML, it would theoretically be possible to install the modeling software on a fleet of AppStream instances and then stream it to workers via the browser.
It was a fascinating idea. The biggest challenges I could see at the time were the handoff between MTurk and the AppStream session and building some type of agent that could reside on the instance to manage submitting the results when the task was complete. The trick would be maintaining a connection between two completely independent computers: the worker’s computer viewing the task on the MTurk website and a remote computer in the cloud running the software.
In the end the idea was too complex for the customer to implement in the time they had available, but the idea stuck with me.
Chrome in AppStream
When we started discussing solutions for my client’s problem, I was reminded of this idea and realized it might be a good fit as a first foray into using AppStream. We’d have a lot more flexibility because a web browser is already an internet-enabled application that could assist in coordinating the connection between the session and the MTurk task.
The best way to gather the scripts we needed for this project was to use a Chrome extension that would record the actions a user took while navigating a website. While we couldn’t ask MTurk workers to install an extension, we could certainly install it in Chrome on an AppStream image.
In addition, the Chrome extension could act as the agent responsible for submitting the resulting script back for our task. Since MTurk doesn’t support submitting task data through a secondary channel, I added a feature to my Task Assembly service that allows a task to securely submit additional data.
We also added a feature to Task Assembly that allows a task interface to make secure calls back to the service and invoke processing Bots. This callback feature made it possible to generate an AppStream StreamingURL on demand: when a worker clicked the Start button in the task, the callback would retrieve a new StreamingURL associated with their WorkerId and load a session showing Chrome within their browser.
When generating the StreamingURL we included a SessionContext that contained details on that task. This included the URL the worker would need to visit, along with the IDs associated with the task. When the session was loaded, this information was passed to Chrome and the extension would use it to automatically load the URL in the browser. It would use the IDs to submit the script back to Task Assembly when the worker completed the steps involved in navigating the website.
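To make the handoff concrete, here is a minimal sketch of what the server side of that callback might look like. The stack and fleet names, region, and task payload shape are assumptions on my part; CreateStreamingURL with UserId, SessionContext, and Validity is the actual AppStream API surface.

```typescript
// Minimal sketch: mint a streaming URL tied to a worker and task.
// Stack/fleet names and the TaskContext shape are illustrative only.
import {
  AppStreamClient,
  CreateStreamingURLCommand,
} from "@aws-sdk/client-appstream";

const appstream = new AppStreamClient({ region: "us-east-1" });

interface TaskContext {
  taskId: string;      // hypothetical Task Assembly identifiers
  assignmentId: string;
  targetUrl: string;   // the website the worker must navigate
}

export async function createSessionUrl(
  workerId: string,
  task: TaskContext
): Promise<string> {
  const response = await appstream.send(
    new CreateStreamingURLCommand({
      StackName: "recording-stack",   // assumed names
      FleetName: "recording-fleet",
      UserId: workerId,               // ties the session to the WorkerId
      // SessionContext is limited to 1,000 characters, so keep it small.
      SessionContext: JSON.stringify(task),
      Validity: 300,                  // URL expires after five minutes
    })
  );
  return response.StreamingURL!;
}
```

Because the UserId is the WorkerId, repeated calls route the worker back to the same session, which matters later for both the single-session constraint and reconnect times.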
As a final step, we added a WebSocket connection between Task Assembly and the task interface. This allowed us to automatically submit the HIT within MTurk when a worker clicked Submit within the AppStream session.
As the diagram above shows, the solution was a bit more complex than your typical annotation task. But because we were working with a set of well-understood web technologies, it was all manageable and could be built to be stable enough to support annotation within MTurk.
Building it
There is way too much complexity here to share the minute details of how we built this, but I’ll cover some of the key aspects.
Chrome extension
This was the first challenge. We built a lightweight Chrome extension that handled several responsibilities (a rough sketch of the background wiring follows the list):
Loading the target URL in a new tab when the session opened and closing any tabs left over from prior tasks.
Providing an overlay interface that let workers snapshot content when they navigated to the target data, showed the number of steps and snapshots recorded so far, let them submit when they were done, and offered an Invalid option for indicating the requested data wasn’t present.
Recording the steps a worker took within their browser session.
Submitting the script back to Task Assembly.
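A minimal sketch of the extension’s background service worker is below. The message names and the Task Assembly submit endpoint are hypothetical placeholders; the chrome.tabs and chrome.runtime calls are the standard extension APIs.

```typescript
// Sketch of the extension's background service worker. Message names and
// the submit endpoint are hypothetical; chrome.* calls are the real APIs.

let recordedSteps: object[] = [];

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === "LAUNCH") {
    // Close tabs left over from a prior task (keeping the launcher tab
    // itself), then open the target URL for the new task.
    chrome.tabs.query({}, (tabs) => {
      const stale = tabs
        .filter((t) => t.id !== undefined && t.id !== sender.tab?.id)
        .map((t) => t.id!);
      if (stale.length > 0) chrome.tabs.remove(stale);
      recordedSteps = [];
      chrome.tabs.create({ url: message.targetUrl });
    });
  } else if (message.type === "STEP") {
    // The content script reports each click/navigation it observes.
    recordedSteps.push(message.step);
  } else if (message.type === "SUBMIT") {
    // Hypothetical Task Assembly endpoint for secondary-channel results.
    fetch("https://tasks.example.com/result", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ taskId: message.taskId, steps: recordedSteps }),
    }).then(() => sendResponse({ ok: true }));
    return true; // keep the channel open for the async response
  }
});
```

The overlay and step recording would live in a content script; the key design point is that the extension, not the worker’s machine, owns recording and submission.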
AppStream image
We used the AppStream Image Builder to create a Linux image with Chrome and our extension installed. To kick off the application, we created a session script that retrieves the URL and IDs from the SessionContext and opens Chrome with a launcher URL containing those attributes; from there, the extension routes the worker to the target URL.
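The core of that session script might look something like the sketch below, written here as a small Node program. On Linux fleets, AppStream exposes the session context through the APPSTREAM_SESSION_CONTEXT environment variable; the launcher URL format is our own convention, so treat it as illustrative.

```typescript
// Sketch of the session script's core logic. The launcher URL and its
// parameters are an assumed convention shared with the extension.
import { spawn } from "node:child_process";

// AppStream places the SessionContext in this environment variable.
const raw = process.env.APPSTREAM_SESSION_CONTEXT ?? "{}";
const ctx = JSON.parse(raw) as { taskId?: string; targetUrl?: string };

// Hand the task details to the extension via a launcher URL it watches for.
const launcherUrl =
  "https://tasks.example.com/launch" +
  `?taskId=${encodeURIComponent(ctx.taskId ?? "")}` +
  `&targetUrl=${encodeURIComponent(ctx.targetUrl ?? "")}`;

// Start Chrome pointed at the launcher URL; the extension takes it from here.
spawn("google-chrome", ["--start-maximized", launcherUrl], {
  detached: true,
  stdio: "ignore",
});
```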
Task interface
The task interface needed to do double duty, handling both opening the AppStream session and providing a way for workers to submit the task if something went wrong with the stream. We also needed to ensure that workers didn’t try to open multiple streaming sessions at the same time. Because the AppStream session is tied to their WorkerId, starting streams from multiple browser windows would connect them to the same computer.
To keep things simple, we initially only show workers a Start button and text admonishing them to only open one session at a time. When they click the button, the callback method retrieves the StreamingURL and opens a session window within the page. In addition, we included a set of buttons at the bottom of the page that a user could click to submit the task if the streaming session failed.
To support submitting the task when Submit is clicked within the AppStream session, we connect to Task Assembly via a WebSocket. When a submit message is received, we trigger a form submit to MTurk so the worker can advance to the next task.
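Here is a rough sketch of that client-side wiring, under a few assumptions: the callback route, WebSocket endpoint, and message shape are placeholders, and a production setup would use AppStream’s embedded streaming support rather than a bare iframe. The externalSubmit form target is the standard mechanism external HITs use to return results to MTurk.

```typescript
// Sketch of the task interface's client logic. Endpoints and message
// shapes are hypothetical placeholders.

const params = new URLSearchParams(location.search);
const workerId = params.get("workerId") ?? "";

// Start button: fetch a fresh StreamingURL for this worker and show the
// session in the page. Workers are warned to open only one session, since
// concurrent sessions for one WorkerId land on the same instance.
document.getElementById("start")!.addEventListener("click", async () => {
  const res = await fetch(`/callback/streaming-url?workerId=${workerId}`);
  const { streamingUrl } = await res.json();
  const frame = document.getElementById("session") as HTMLIFrameElement;
  frame.src = streamingUrl; // production: AppStream embedded streaming
});

// WebSocket back to Task Assembly: when the worker clicks Submit inside
// the AppStream session, the service tells this page to submit the HIT.
const socket = new WebSocket("wss://tasks.example.com/ws");
socket.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data as string);
  if (msg.type === "submitted") {
    // The form's action points at MTurk's externalSubmit endpoint.
    (document.getElementById("mturk-form") as HTMLFormElement).submit();
  }
});
```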
Fleet management
This is a unique challenge when dealing with a service like MTurk. By design, MTurk allows as many workers as possible to start working on your task to get results as quickly as possible. However, AppStream doesn’t scale fast enough to keep up with this type of demand. A new instance can take up to two minutes to become available, which isn’t ideal for MTurk workers, who prioritize quick task load times.
To get around this problem, we limit our audience to roughly fifty workers who have done a good job on other tasks in the past. Before we post a batch of work, we spin up enough instances to serve the expected number of workers (typically about 75% of the audience) and set the scaling rules to ensure there are always a few idle instances to pick up any workers who join after the initial group starts.
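Pre-warming the fleet before a batch can be done with the real UpdateFleet API; the fleet name and the 75% rule of thumb below simply mirror the assumptions described above.

```typescript
// Sketch of pre-warming the fleet before posting a batch of work.
import {
  AppStreamClient,
  UpdateFleetCommand,
} from "@aws-sdk/client-appstream";

const appstream = new AppStreamClient({ region: "us-east-1" });

export async function prewarmFleet(audienceSize: number): Promise<void> {
  // Expect roughly 75% of the audience to show up for a fresh batch.
  const expectedWorkers = Math.ceil(audienceSize * 0.75);
  await appstream.send(
    new UpdateFleetCommand({
      Name: "recording-fleet", // assumed fleet name
      ComputeCapacity: { DesiredInstances: expectedWorkers },
    })
  );
  // Scaling policies (not shown) keep a few idle instances available for
  // workers who join after the initial group starts.
}
```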
One unexpected concern was how to handle instances that simply got stuck and wouldn’t accept new connections. In initial testing, some workers would repeatedly experience broken sessions on all tasks. We solved this by adding a step to terminate an instance automatically if a worker clicks the button indicating that the session didn’t load successfully.
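We handled this by terminating the instance outright; as a simplified sketch, the snippet below instead expires the worker’s streaming session using the real DescribeSessions and ExpireSession APIs, which releases the broken session so the worker can get a fresh one. Treat the expire-based recycling as an illustration, not the exact production mechanism.

```typescript
// Sketch: recycle a worker's broken session when they report a failed load.
import {
  AppStreamClient,
  DescribeSessionsCommand,
  ExpireSessionCommand,
} from "@aws-sdk/client-appstream";

const appstream = new AppStreamClient({ region: "us-east-1" });

export async function recycleWorkerSession(workerId: string): Promise<void> {
  // Find any active sessions tied to this WorkerId.
  const { Sessions } = await appstream.send(
    new DescribeSessionsCommand({
      StackName: "recording-stack", // assumed names, as above
      FleetName: "recording-fleet",
      UserId: workerId,
    })
  );
  // End them so the next Start click gets a healthy session.
  for (const session of Sessions ?? []) {
    await appstream.send(new ExpireSessionCommand({ SessionId: session.Id }));
  }
}
```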
Next steps
The biggest annoyance with this task for workers is the amount of time they must spend waiting for each task to connect to AppStream. Establishing the connection and stream typically takes anywhere from 10 to 30 seconds, even though workers are generally reconnecting to the same instance they used in the previous task. For longer-running tasks this isn’t a big deal, but when the task itself typically takes only one to two minutes, it’s a large drain on time.
A better solution would be to open a single streaming session and let workers complete as many recordings as they wish within it. With the latest version of Task Assembly, this is now possible because we can update the Chrome extension to operate in much the same way as the Task Assembly Task Portal. Like the Task Portal, workers will be able to accept a single HIT and complete as many tasks as they wish within a single Assignment. When they decide to stop working, they will receive the base HIT reward as well as a bonus for all the additional tasks they complete.
Additional ideas
There are a range of ways I could envision applying this approach to future projects. Beyond several types of web scraping, it could be used to gather interaction patterns to support agentic workflows that integrate with existing websites rather than going through APIs. I hinted at this last year when I shared a post on the launch of the rabbit r1. It’s probably not a good idea, and you would need to build a lot of protections to maintain security, but you could also use this approach to grant annotators access to resources that would otherwise be behind a firewall.
Of course, one of the more interesting challenges would be supporting software applications other than Chrome. Giving workers access to powerful software for use in annotation would open a range of options for annotating data with tools that aren’t yet supported via web clients. One could envision giving expert annotators access to medical imaging software, supporting annotation with 3D modeling software, or distributing tasks requiring Windows software to a team of annotators using thin clients.