LightTag, a newly launched startup from a former NLP researcher at Citi, has built a text annotation platform designed to help data scientists quickly create training data for their AI systems. It's a classic picks 'n' shovels move: the bootstrapped, Berlin-based company is hoping to capitalize on the current boom in AI development.
Specifically, LightTag aims to solve one of the main bottlenecks of deep learning-based AI development: what you get out is only as good as the labeled data you put in. The problem, however, is that labelling data is laborious, and because it's a job carried out by teams of humans it's prone to inaccuracy and inconsistency. LightTag's team-based workflow, clever UI, and built-in quality controls are an attempt to mitigate this.
"What I've taken from [my previous positions] to LightTag is an understanding that labeled data is more important to success in machine learning than clever algorithms," says founder Tal Perry. "The difference in a successful machine learning project often boiled down to how well the collection and use of labeled data was executed and managed. There's a big gap in the tooling to consistently do that well; that's why I built LightTag."
Perry says LightTag's annotation interface is designed to keep labellers "effective and engaged". It also employs its own "AI" to learn from previous labelling and make annotation suggestions. The platform additionally automates the work of managing a project, assigning tasks to labellers and making sure there is enough overlap and duplication to keep accuracy and consistency high.
"We've made it dead-simple to annotate with a team (sounds obvious, but nothing else makes it easy)," he says. "To make sure the data is good, LightTag automatically assigns work to team members so that there is overlap between them. This allows project managers to measure agreement and recognise problems in their project early on. For example, if a particular annotator is performing worse than others."
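The overlap-and-agreement idea Perry describes can be sketched in a few lines. This is an illustrative example only, not LightTag's actual implementation: it computes simple pairwise percent agreement over the items two annotators both labelled, then flags annotators whose average agreement with their peers falls below a threshold. All names, data structures, and the threshold are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Percent agreement between each pair of annotators.

    `annotations` maps annotator name -> {item_id: label}; only items
    that both annotators labelled (their overlap) are compared.
    """
    scores = {}
    for a, b in combinations(annotations, 2):
        shared = annotations[a].keys() & annotations[b].keys()
        if shared:
            matches = sum(annotations[a][i] == annotations[b][i] for i in shared)
            scores[(a, b)] = matches / len(shared)
    return scores

def flag_outliers(annotations, threshold=0.5):
    """Flag annotators whose mean agreement with peers is below threshold."""
    scores = pairwise_agreement(annotations)
    totals = {name: [] for name in annotations}
    for (a, b), s in scores.items():
        totals[a].append(s)
        totals[b].append(s)
    return [name for name, vals in totals.items()
            if vals and sum(vals) / len(vals) < threshold]
```

A production system would use a chance-corrected statistic such as Cohen's kappa or Krippendorff's alpha rather than raw percent agreement, but the workflow is the same: route overlapping work, score agreement, surface the outliers to the project manager.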
Meanwhile, Perry says acquiring labeled data is one of the silent growth sectors of the recent AI boom, but for many sector-specific industries, such as medical, legal or financial, outsourcing the job is not an option. That's because the data is often too sensitive, or too specialist for non-subject experts to process. To address this, LightTag offers an on-premise version in addition to SaaS.
"Every company has big text datasets that are unstructured (CRM records, call transcripts, emails and so on). Deep learning has made it algorithmically feasible to tap that data, but to use deep learning we need to train the model with labeled datasets. Most companies can't outsource labelling on text because the data is too complicated (biology, finance), regulated (CRM records) or both (medical records)," explains the LightTag founder.
Running various pilots and in private beta since December 2018, and publicly launched this month, LightTag has already been used by the data science team at a large Silicon Valley tech company that wants its AI to understand free-form text in profiles, as well as by an energy company to analyse logs from oil rigs to predict problems drilling at certain depths. The startup has also completed a pilot with a medical imaging company labelling reports associated with MRI scans.