Who trains the trainers?
Our ability to influence LLMs is seriously circumscribed. If you own the LLM and its associated tooling, you can, of course, exert outsized influence on its output. AWS, for example, should be able to train Amazon Q to answer questions about AWS services. There’s an open question as to whether Q would be “biased” toward AWS services, but that’s almost a secondary concern. Maybe it steers a developer toward Amazon ElastiCache and away from Redis simply by virtue of having more and better documentation and information to offer. The primary concern is ensuring these tools have enough good training data that they don’t lead developers astray.
For example, in my role running developer relations for MongoDB, we’ve worked with AWS and others to train their LLMs with code samples, documentation, and more. What we haven’t done (and can’t do) is ensure that the LLMs generate correct responses. If a Stack Overflow Q&A has 10 bad examples and three good examples of how to shard in MongoDB, how can we be certain that a developer asking GitHub Copilot or another tool for guidance is informed by the three good ones? The LLMs have trained on all sorts of good and bad data from the public Internet, so it’s a bit of a crapshoot whether a developer will get good advice from a given tool.
Microsoft’s Victor Dibia delves into this, suggesting, “As developers rely more on codegen models, we need to also consider how well does a codegen model assist with a specific library/framework/tool.” At MongoDB, we regularly evaluate how well the different LLMs address a range of topics so that we can gauge their relative efficacy and work with the different LLM vendors to try to improve performance. But it’s still an opaque exercise without clarity on how to ensure the different LLMs give developers correct guidance. There’s no shortage of advice on how to train LLMs, but it’s all for LLMs that you own. If you’re the development team behind Apache Iceberg, for example, how do you ensure that OpenAI is trained on the best possible data so that developers using Iceberg have a great experience? As of today, you can’t, which is a problem. There’s no way to ensure developers asking questions (or expecting code completion) from third-party LLMs will get good answers.
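To make the evaluation idea concrete, here is a minimal sketch of one way such a check could work: scoring a model’s answer about a specific library against known-good and known-bad API idioms. This is not MongoDB’s actual evaluation harness; the model responses are stubbed strings (in practice they would come from each vendor’s API), and the pattern lists and model names are hypothetical.

```python
# Hypothetical harness for scoring LLM answers about a specific library.
# Responses are stubbed; real usage would call each vendor's API.

def score_response(response: str, good_patterns: list[str], bad_patterns: list[str]) -> float:
    """Return a score in [0, 1]: the fraction of known-good idioms present,
    penalized by any known-bad (deprecated or incorrect) idioms found."""
    if not good_patterns:
        return 0.0
    hits = sum(1 for p in good_patterns if p in response)
    penalties = sum(1 for p in bad_patterns if p in response)
    return max(0.0, (hits - penalties) / len(good_patterns))

# Hypothetical "correct" idioms for a MongoDB sharding question,
# and a placeholder string standing in for a known-bad answer pattern.
GOOD = ["sh.shardCollection", "hashed"]
BAD = ["handles the rest"]

# Stubbed answers standing in for real model output.
responses = {
    "model_a": 'Use sh.shardCollection("db.coll", {userId: "hashed"}) ...',
    "model_b": "Just enable sharding and MongoDB handles the rest.",
}

scores = {name: score_response(text, GOOD, BAD) for name, text in responses.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.2f}")
```

Even a crude harness like this makes relative efficacy visible: run the same library-specific prompts against each model, score the answers, and track the scores over time. What it cannot do, as the paragraph above notes, is push a third-party LLM toward the good patterns in the first place.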