The continuous improvement of language models has been a major focus of artificial intelligence research and development for years, with model sizes growing by roughly a factor of 10 per year over the last five years. The goal is to enable natural, human-like interaction with language technology that can be applied across a wide range of fields, such as question answering and document processing.
The GPT (Generative Pre-trained Transformer) models have been around since the summer of 2018. The latest iteration, the GPT-3.5 series, is also the basis for ChatGPT, which has become enormously popular over recent months. With this hype, the question of what comes next has become just as popular.
What can version 4 do?
The answer to that question has now arrived in the form of GPT-4. Rumors had been circulating for quite some time – for example, that the number of parameters would again be orders of magnitude larger (100 trillion compared to 175 billion for GPT-3) – along with speculation about what this would mean in terms of performance.
With its release, OpenAI announced that while the model is still “less capable than humans in many real-world scenarios”, it shows astonishing results on professional benchmarks. It is said to score among the top 10% of participants on the well-known bar exam that lawyers must pass to be admitted to practice in courtrooms. For reference, the earlier GPT-3.5 model scored among the bottom 10% on the same exam.
The developers state that the model was not specifically trained to perform well on this exam or the others used for evaluation. So even though some of the exam questions may have appeared in the training data, these results are a strong indicator of GPT-4's capability.
In addition to the factual accuracy of generated outputs, GPT-4 is reported to have better steering capabilities. This means it can adapt to different language styles or types of text – ranging from interview or mentoring dialogs to well-structured articles.
Also, thanks to its ability to accept much longer inputs, it can summarize texts in various ways – from highlighting several key takeaways down to one-liners with forced alliteration and similar constraints.
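As a rough sketch of how such steering looks in practice, the example below uses the same chat completions API already offered for the GPT-3.5 models, with a system message that requests a particular style. The API key placeholder, the input file name, and the exact prompt wording are illustrative assumptions, not taken from OpenAI's documentation.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; assumes a key with GPT-4 access

# Any long input text to be summarized; here read from a local file (assumed name).
long_article_text = open("article.txt", encoding="utf-8").read()

# The system message steers style and format; the user message carries the text.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are an editor. Summarize the user's text as a "
                       "single alliterative one-liner.",
        },
        {"role": "user", "content": long_article_text},
    ],
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
```

The same pattern covers the style adaptation described above: swapping the system message for, say, a request to answer like a Socratic mentor changes the tone without touching the rest of the call.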
Visual Input
Another obvious update is the ability to feed visual input into the network. The model can detect objects, describe them verbally, and even do things like pointing out odd aspects of a picture. Multi-modal embedding spaces have been around for some time and are used, for example, in search engines to bridge the gap between the image and text domains, but a first look suggests significantly improved capabilities here.
Limited capabilities
As with the previous versions, GPT-4 is a language model: it produces outputs that sound plausible based on statistics over large corpora of training text, rather than by drawing conclusions from a database of known facts and rules. This leads to the effect of “hallucination”, i.e. producing statements that sound correct to a layperson but are simply not factually true. Since this is a major issue that blocks many use cases, the developers have paid special attention to this limitation. Indeed, GPT-4 outperforms its predecessor, scoring significantly higher on OpenAI's internal factual-accuracy evaluations.
Another major limitation is that the majority of its training data dates from September 2021 or earlier, so more recent topics are either not represented at all or only to a limited degree.
GPT-4 for Developers
As with the prior versions, an API is offered for GPT-4. As of now, it is only accessible with the commercial pro version; for free usage, there is a waiting list to be granted access.
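As a minimal sketch (assuming the existing Python openai package that already serves the GPT-3.5 endpoints), one way to check whether a given API key has been granted GPT-4 access is to list the models visible to that key:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# List the models available to this key; "gpt-4" appears only once access
# has been granted, e.g. after coming off the waiting list.
model_ids = [m["id"] for m in openai.Model.list()["data"]]
print("gpt-4" in model_ids)
```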
Future use
Given the limitations mentioned above and the overall performance, there is a large number of interesting use cases in various contexts. Nevertheless, it must be clear that GPT-4 and LLMs (large language models) in general are not a one-size-fits-all solution to every challenge. As with all AI systems, users need to consider which technique actually solves a task efficiently, data usage and privacy, adaptability to a specific (enterprise) context, and much more.
At Cloudflight, we have been applying various machine-learning techniques across different industries. We are glad to support you in your AI initiatives, from strategy definition and design through to implementing, operating, and maintaining AI solutions.