Despite building ML-powered products for six years at this point, I was surprised to realise I had never built a side project powered by AI models.
At the start of the year I was accepted into a builder-in-residence program run by an AI publication called Ben's Bites, which has over 120k subscribers and was co-organising a hackathon for AI projects.
Having worked with and trained supervised learning models since 2017, I had a particular pain point to address. Supervised models require lots of human-labelled training examples to learn from, which is costly in both time and money.
Some 6-8 months into the public LLM offerings in the form of GPT-3/4 amongst others, it had become clear what these models were and weren't good at. It turns out generating text was their forte.
I wanted to use the text generation ability of these LLMs to generate synthetic examples for training supervised models.
Hence, I chose this as my hackathon project.
The aim was to build supervised training datasets cheaper and faster using LLMs.
The app had the following abilities:
- Label unlabelled examples
  - There is often an abundance of unlabelled data for most tasks
- Generate examples for existing classes
  - This can help improve imbalanced or low-volume classes that aren't learnt well during training
- Generate labels for completely new classes, including training examples for them
  - This can help the model immediately learn classes that are anticipated but don't yet exist in the data, e.g. a new product launch
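The core of the generation flow above can be sketched in a few lines: build a prompt that asks an LLM for new examples of a class, then parse its reply into labelled training pairs. This is a minimal, hypothetical sketch, not the app's actual code; the function names, the prompt wording, and the stubbed LLM reply are all my assumptions here, and a real version would swap the stub for an actual API call.

```python
import json

def build_generation_prompt(label, n, seed_examples):
    """Build a prompt asking an LLM for n new synthetic examples of a class."""
    examples = "\n".join(f"- {e}" for e in seed_examples)
    return (
        f"Generate {n} new short messages belonging to the class '{label}'. "
        f"Existing examples:\n{examples}\n"
        "Return only a JSON list of strings."
    )

def parse_generated_examples(llm_response, label):
    """Parse the LLM's JSON reply into (text, label) training pairs."""
    texts = json.loads(llm_response)
    return [(text, label) for text in texts]

# Stubbed LLM reply standing in for a real model call.
stub_reply = '["Where is my order?", "My package never arrived."]'
pairs = parse_generated_examples(stub_reply, "shipping_issue")
# pairs is now a list of (text, "shipping_issue") tuples ready for training.
```

Asking for a strict JSON list keeps the parsing trivial; the same prompt-then-parse shape also covers the labelling case, with the LLM returning a class name per example instead of new texts.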
Here is a product demo/walkthrough.