Label Studio

What is Label Studio?

Label Studio is an open source data labeling tool for labeling and exploring multiple types of data. It allows you to do the following:

  • Perform different types of labeling with many data formats.

Label Studio is also available in Enterprise and Cloud editions with additional features. For more information, see the Label Studio features page.

Quick start

  1. Install Label Studio:
    pip install label-studio
  2. Start Label Studio:
    label-studio start
  3. Open the Label Studio UI at http://localhost:8080.
  4. Sign up with an email address and password that you create.
  5. Click Create to create a project and start labeling data.
  6. Name the project, and if you want, type a description and select a color.
  7. Click Data Import and upload the data files that you want to use. If you want to use data from a local directory, cloud storage bucket, or database, skip this step for now.
  8. Click Labeling Setup, choose a template, and customize the label names for your use case.
  9. Click Save to save your project.

You’re ready to start labeling and annotating your data!
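
The label names that you customize in Labeling Setup are stored in a labeling configuration written with XML-style tags. The following is a minimal sketch, assuming a text labeling project; the tag names `name="label"` and `name="text"` and the label values are hypothetical. It parses a configuration with the Python standard library and lists the labels that annotators would see.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling configuration for labeling spans of text.
# The control tag ("label") points at the object tag ("text"),
# and "$text" refers to the "text" field of each imported task.
LABEL_CONFIG = """
<View>
  <Labels name="label" toName="text">
    <Label value="Person"/>
    <Label value="Organization"/>
  </Labels>
  <Text name="text" value="$text"/>
</View>
"""


def label_names(config_xml):
    """Return the value of every <Label> tag in a labeling configuration."""
    root = ET.fromstring(config_xml.strip())
    return [label.get("value") for label in root.iter("Label")]


print(label_names(LABEL_CONFIG))
```

Checking the configuration this way before you paste it into the UI can catch malformed XML or misspelled label names early.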

Terminology

When you upload data to Label Studio, each item in the dataset becomes a labeling task. The following table describes some terms you might encounter as you use Label Studio.


Term | Description
Dataset | What you import into Label Studio, composed of individual items, or labeling tasks.
Task | A distinct item from a dataset that is ready to be labeled, is pre-annotated, or has already been annotated. For example: a sentence of text, an image, or a video clip.
Region | The portion of the task identified for labeling. For images, an example region is a bounding box. For text, an example region is a span of text. Often has a label assigned to it.
Labels | What you add to each region while labeling a task in Label Studio.
Relation | A defined relationship between two labeled regions.
Result | A label applied to a specific region as stored in an annotation or prediction. See Label Studio JSON format of annotated tasks.
Annotations | The output of a labeling task. Previously called “completions”.
Predictions, Pre-annotations | Annotations in Label Studio format that machine learning models create for an unlabeled dataset. See import pre-annotations.
Templates | Example labeling configurations that you can use to specify the type of labeling that you’re performing with your dataset. See all available templates.
Tags | Configuration options to customize the labeling interface. See more about tags.
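
To show how these terms fit together, here is a minimal sketch of one annotated task written as a Python dictionary. The keys follow the Label Studio JSON format for annotated tasks; the text, region IDs, and the tag names `label` and `text` are made up for illustration. The task holds one annotation whose results are two labeled regions and one relation between them.

```python
# Hypothetical annotated task. The values are illustrative only.
task = {
    "id": 1,
    "data": {"text": "Ada Lovelace worked with Charles Babbage."},
    "annotations": [
        {
            "result": [
                {   # Region: a span of text with a label assigned to it.
                    "id": "r1",
                    "from_name": "label",
                    "to_name": "text",
                    "type": "labels",
                    "value": {"start": 0, "end": 12,
                              "text": "Ada Lovelace", "labels": ["Person"]},
                },
                {
                    "id": "r2",
                    "from_name": "label",
                    "to_name": "text",
                    "type": "labels",
                    "value": {"start": 25, "end": 40,
                              "text": "Charles Babbage", "labels": ["Person"]},
                },
                {   # Relation: links two labeled regions by their IDs.
                    "from_id": "r1",
                    "to_id": "r2",
                    "type": "relation",
                    "direction": "right",
                },
            ]
        }
    ],
}


def labeled_regions(task):
    """Collect (text, labels) pairs from every annotation, skipping relations."""
    regions = []
    for annotation in task.get("annotations", []):
        for result in annotation.get("result", []):
            if result.get("type") == "relation":
                continue
            value = result["value"]
            regions.append((value["text"], value["labels"]))
    return regions


print(labeled_regions(task))
```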

Features

Label Studio is available as an open source Community edition. It is also available in paid versions with extended functionality and support: smaller organizations might want to consider the SaaS option, and larger teams with robust data labeling needs can get the Enterprise edition. To get started with Label Studio Enterprise edition, contact the Heartex team.

Functionality | Community | Enterprise
User Management
User accounts to associate labeling activities to a specific user. | ✔️ | ✔️
Role-based access control for each user account. | | ✔️
Organizations and workspaces to manage users and projects. | | ✔️
Project Management
Projects to manage data labeling activities. | ✔️ | ✔️
Templates to get started with specific data labeling tasks faster. | ✔️ | ✔️
Data Management
Manage your data in a user interface. | ✔️ | ✔️
Import data from many sources. | ✔️ | ✔️
Export data into many formats. | ✔️ | ✔️
Synchronize data from and to remote data storage. | ✔️ | ✔️
Data Labeling Workflows
Assign specific annotators to specific tasks. | | ✔️
Automatic queue management. | | ✔️
Label text, images, audio data, HTML, and time series data. | ✔️ | ✔️
Label mixed types of data. | ✔️ | ✔️
Annotator-specific view. | | ✔️
Annotator Performance
Control label quality by monitoring annotator agreement. | | ✔️
Manage and review annotator performance. | | ✔️
Verify model and annotator accuracy against ground truth annotations. | | ✔️
Verify annotation results. | | ✔️
Assign reviewers to review annotation results. | | ✔️
Machine Learning
Connect machine learning models to Label Studio with an SDK. | ✔️ | ✔️
Accelerate labeling with active learning. | ✔️ | ✔️
Automatically label dataset items with ML models. | ✔️ | ✔️
Analytics and Reporting
Reporting and analytics on labeling and annotation activity. | | ✔️
Activity log to use to audit annotator activity. | | ✔️
Advanced Functionality
API access to manage Label Studio. | ✔️ | ✔️
On-premises deployment of Label Studio. | ✔️ | ✔️
Support for single sign-on using LDAP or SAML. | | ✔️
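
As a rough illustration of the API access listed above, the following sketch builds (but does not send) a request to list projects on a local instance. It assumes the `/api/projects` endpoint and an `Authorization: Token ...` header carrying an access token copied from your account settings; the token value here is a placeholder, and the authentication scheme can differ between Label Studio versions, so check the API reference for your installation.

```python
import urllib.request


def build_projects_request(base_url, token):
    """Build a GET request for the project list.

    Assumption: the server accepts token authentication in the
    "Authorization: Token <token>" form.
    """
    url = base_url.rstrip("/") + "/api/projects"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


# "YOUR_TOKEN" is a placeholder, not a real credential.
request = build_projects_request("http://localhost:8080", "YOUR_TOKEN")
print(request.full_url)

# To send the request against a running server, you could call:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```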

Labeling workflow

Start and finish a labeling project with Label Studio by following these steps:

  1. Install Label Studio.
  2. Start Label Studio.
  3. Create accounts for Label Studio. Create an account to manage and set up labeling projects.
  4. Set up the labeling project. Define the type of labeling to perform on the dataset and configure project settings.
  5. Set up the labeling interface. Add the labels that you want annotators to apply and customize the labeling interface.
  6. Import data as labeling tasks.
  7. Label and annotate the data.
  8. Export the labeled data or the annotations.
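
For step 6, one way to prepare data is to write a JSON file in which each item becomes a labeling task. This is a minimal sketch, assuming plain-text data and a labeling interface that reads a field named `text`; the sample sentences and the output file name are made up.

```python
import json
import tempfile
from pathlib import Path


def build_tasks(texts):
    """Wrap each text in a task dictionary with a "data" object.

    Assumption: the labeling interface refers to this field as "$text".
    """
    return [{"data": {"text": text}} for text in texts]


def write_tasks(tasks, path):
    """Write the tasks as a JSON list that can be uploaded during import."""
    path.write_text(json.dumps(tasks, indent=2), encoding="utf-8")


texts = [
    "Label Studio is an open source data labeling tool.",
    "Each item in the dataset becomes a labeling task.",
]
tasks = build_tasks(texts)

# Hypothetical output location in the system temporary directory.
out_path = Path(tempfile.gettempdir()) / "label_studio_tasks.json"
write_tasks(tasks, out_path)
print(f"Wrote {len(tasks)} tasks to {out_path}")
```

You can then upload the resulting file on the Data Import screen.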

Architecture

You can use any of the Label Studio components in your own tools, or customize them to suit your needs. Before customizing Label Studio extensively, you might want to review Label Studio Enterprise Edition to see if it already contains the relevant functionality you want to build. See Label Studio Features for more.

The component parts of Label Studio are available as modular extensible packages that you can integrate into your existing machine learning processes and tools.

Module | Technology | Description
Label Studio Backend | Python and Django | Use to perform data labeling.
Label Studio Frontend | JavaScript web app using React and MST | Perform data labeling in a user interface.
Data Manager | JavaScript web app using React | Manage data and tasks for labeling.
Machine Learning Backends | Python | Predict data labels at various parts of the labeling process.
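
To give a sense of what a machine learning backend returns, here is a minimal sketch of a stand-in "model" that produces predictions in the Label Studio prediction format (`model_version`, `score`, `result`). It does not use the machine learning backend SDK; the keyword matching rule, the `keyword-v0` version string, and the tag names `label` and `text` are assumptions chosen for illustration.

```python
def predict(tasks, keyword="Label Studio", label="Product"):
    """Return one prediction per task by matching a fixed keyword.

    A real backend would call a trained model here; the keyword rule is
    only a placeholder that keeps the example self-contained.
    """
    predictions = []
    for task in tasks:
        text = task["data"]["text"]
        results = []
        start = text.find(keyword)
        if start != -1:
            results.append({
                "from_name": "label",   # hypothetical control tag name
                "to_name": "text",      # hypothetical object tag name
                "type": "labels",
                "value": {
                    "start": start,
                    "end": start + len(keyword),
                    "text": keyword,
                    "labels": [label],
                },
            })
        predictions.append({
            "model_version": "keyword-v0",
            "score": 1.0 if results else 0.0,
            "result": results,
        })
    return predictions


sample_tasks = [
    {"data": {"text": "Label Studio is open source."}},
    {"data": {"text": "Nothing to match here."}},
]
for prediction in predict(sample_tasks):
    print(prediction)
```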

Information collected by Label Studio

Label Studio collects anonymous usage statistics about the number of page visits and data types being used in labeling configurations that you set up. No sensitive information is included in the information we collect. The information we collect helps us improve the experience of labeling data in Label Studio and helps us plan future data types and labeling configurations to support.