An excessive amount of duplication
Some stage of competitors and parallel improvement is wholesome for innovation, however the present state of affairs seems more and more wasteful. A number of organizations are constructing comparable capabilities, with every contributing a large carbon footprint. This redundancy turns into significantly questionable when many fashions carry out equally on commonplace benchmarks and real-world duties.
The variations in capabilities between LLMs are sometimes delicate; most excel at comparable duties resembling language technology, summarization, and coding. Though some fashions, like GPT-4 or Claude, might barely outperform others in benchmarks, the hole is often incremental moderately than revolutionary.
Most LLMs are educated on overlapping information units, together with publicly accessible web content material (Wikipedia, Frequent Crawl, books, boards, information, and so forth.). This shared basis results in similarities in information and capabilities as fashions take up the identical factual information, linguistic patterns, and biases. Variations come up from fine-tuning proprietary information units or slight architectural changes, however the core normal information stays extremely redundant throughout fashions.