Codomain

Despite having built ML-powered products for 6 years, I was surprised to realise that I had never had a side project powered by AI models.

At the start of the year I was accepted into a builder-in-residence program run by Ben's Bites, an AI publication with over 120k subscribers. The publication was co-organising a hackathon for AI projects.

Having worked with and trained supervised learning models since 2017, I had a particular pain point to address. Supervised models need lots of human-labelled training examples to learn from, and collecting these was always costly in time and money.

Some 6-8 months after LLMs such as GPT-3/4 became publicly available, it had become clear what these models were and weren't good at. It turns out generating text was their forte.

I wanted to use the text generation ability of these LLMs to generate synthetic examples for training supervised models.

Hence, I chose this as my hackathon project.

The aim was to build supervised training datasets more cheaply and quickly using LLMs.

The app had the following abilities:

  • Label unlabelled examples
    • There is often an abundance of unlabelled data for most tasks
  • Generate examples for existing classes
    • This can help with imbalanced or low-volume classes that are not learnt well during training
  • Generate labels for completely new classes, along with training examples for them
    • This lets a model learn classes which are anticipated but don't yet exist in the data, e.g. a new product launch
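
To make the abilities above more concrete, here is a minimal Python sketch of how labelling and example generation could be wired up. It is an illustration under assumptions, not the app's actual code: the function names, the prompts and the `fake_llm` stand-in are all hypothetical, and the `llm` argument is any function that takes a prompt string and returns the model's text, so a real model client would be plugged in there.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that maps a prompt to the model's text reply.
LLM = Callable[[str], str]


def label_example(llm: LLM, text: str, classes: List[str]) -> str:
    """Ask the model to pick exactly one class for an unlabelled example."""
    prompt = (
        "Classify the text into exactly one of these classes: "
        + ", ".join(classes)
        + f"\nText: {text}\nClass:"
    )
    answer = llm(prompt).strip()
    # Fall back to a sentinel when the reply is not one of the known classes.
    return answer if answer in classes else "unknown"


def generate_examples(llm: LLM, class_name: str, n: int, description: str = "") -> List[str]:
    """Ask the model for up to n synthetic examples of a class, one per line.

    A description lets this work for a brand new class with no data yet.
    """
    prompt = f"Write {n} short, varied examples of the class '{class_name}'."
    if description:
        prompt += f" Class description: {description}."
    prompt += " Put each example on its own line."
    lines = [line.strip() for line in llm(prompt).splitlines()]
    # Drop blank lines and never return more than was asked for.
    return [line for line in lines if line][:n]


def top_up_classes(llm: LLM, dataset: Dict[str, List[str]], target: int) -> Dict[str, List[str]]:
    """Generate examples for every class that has fewer than `target` examples."""
    topped_up = {}
    for name, examples in dataset.items():
        missing = max(0, target - len(examples))
        extra = generate_examples(llm, name, missing) if missing else []
        topped_up[name] = examples + extra
    return topped_up


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline (assumption)."""
    if prompt.startswith("Classify"):
        return "refund"
    return "Where is my parcel?\nMy order has not arrived yet.\nTracking shows no update."


if __name__ == "__main__":
    print(label_example(fake_llm, "I want my money back", ["refund", "delivery"]))
    print(top_up_classes(fake_llm, {"refund": ["a", "b", "c"], "delivery": ["x"]}, target=3))
```

In practice the generated examples would still need a human spot check before being used for training, since the model's replies are not guaranteed to match the requested class.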

Here is a product demo/walkthrough

Hassan Tahir