In a world inundated with data, the ability to distill meaningful insights is akin to navigating treacherous waters. Jonah Lynch, a trailblazer in digital humanities, a freelance consultant for machine learning (AI) solutions, and a distinguished member of Neodata Group’s Advisors Board, stands at the helm, charting a course through the vast ocean of information. As a statistics educator at the University of Milan (Bicocca), Lynch combines his academic expertise with practical insights from the frontiers of AI. You can find his other writings at jonahlynch.substack.com.
The Challenge of Information Overload
Good decisions are the result of good information. Specific and actionable knowledge of the world is like the nautical charts that allow a ship’s captain to round the capes, avoid the hidden shoals and rocks, and arrive safely in the harbor. But what could the captain do if, instead of a few dozen dangerous spots in the water, there were hundreds of thousands, or millions? He would simply have to stay home. There is a limit to the amount of information a human can take into account. That limit is highly variable, but nobody would argue that it doesn’t exist.
To see this in greater detail, consider the case of arXiv, a website founded in 1991 and now hosted by Cornell University, which offers free preprint articles from several sciences: physics, math, biology, computer science, and a few others. The site is extremely useful for scientists. Instead of waiting months or years for journals to publish the latest results, researchers can see what everyone else around the world is discovering and publishing on a daily basis. Real-time updates and free open access can accelerate discovery greatly. Similar sites exist for other domains, such as PubMed for medical research.
But as anyone who has tried to use this information knows, such a large amount of data can be paralyzing. The arXiv site has a page showing statistics on articles submitted over time, and in every domain the numbers are overwhelming. Who can read 60,000 computer science articles a year? That’s roughly 165 articles a day!
For a few days it might be possible to keep up, but take a sick day or spend Sunday with your family, and the backlog of unread information starts to feel like drinking from a fire hose. The water just doesn’t stop coming.
Granted, there are ways to reduce the dimensionality of the information (read just the titles, just the abstracts, just the review articles, just the conclusions… you get the point). Still, human time scales and human memories are insufficient to deal with the quantity of information available. This information matters for navigating well, avoiding danger, and seizing opportunity. But how can one maintain a map that is constantly changing, with hundreds of thousands of new details every month?
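To make the “read just the titles and abstracts” strategy concrete, here is a minimal sketch that pulls the newest preprints from arXiv’s public query API and keeps only their titles and abstracts. The category and result count are illustrative choices, not a recommendation; this is one crude way to shrink the daily flood before any further processing.

```python
# Minimal sketch: fetch recent arXiv preprints and keep only titles and abstracts.
# The category ("cs.LG") and max_results are illustrative choices.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the arXiv API

def fetch_recent(category: str = "cs.LG", max_results: int = 25):
    params = urllib.parse.urlencode({
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    for entry in feed.findall(f"{ATOM}entry"):
        # Collapse whitespace so multi-line titles/abstracts read cleanly.
        title = " ".join(entry.findtext(f"{ATOM}title", "").split())
        abstract = " ".join(entry.findtext(f"{ATOM}summary", "").split())
        yield title, abstract

if __name__ == "__main__":
    for title, abstract in fetch_recent():
        print(title)
        print(abstract[:200], "...\n")
```

Even reduced this far, the stream still outruns human reading speed, which is the point of the paragraph above: filtering helps, but it does not solve the problem.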
AI as a Solution: Unleashing the Power of Neural Networks
Humans have developed a variety of ways to solve this problem. Our information technologies include writing itself, archival strategies and library structures, and mechanisms for publication and debate such as conferences, journals, and social media. Our biology has other, more ancient dimension-reduction techniques, like the limbic system, which among other things can compress highly complex information into a few basic emotional responses. There are good reasons to “trust your gut”: it can often summarize a wide variety of inputs into useful, if not always optimal, decisions.
There are also good reasons not to trust our ancient emotional responses. There are many problems that are intractable to intuition. In physics, the levels of truth that could be intuited from ordinary sense-experience ended around Newton’s time. After that, who could directly intuit the functioning of electromagnetic fields, or relativity, or quantum mechanics? You have to trust the logic of the equations, which is often like feeling your way around a pitch-black cellar. There is no bird’s-eye view available, and you can only proceed carefully, step by step.
In business too, there is often no trustworthy intuition available. The present moment is new, the playing field is constantly changing, what worked in the past may not work today. Data streams in from a variety of sources and is processed by teams of scientists who seek the patterns that can help managers and executives chart the next step forward. The quantities of data, the speed of response, and the complexity of the machinery used to find patterns are ever increasing.
Today, people often speak of Artificial Intelligence, usually meaning large language models (LLMs) based on neural networks, as a possible solution to this problem. Since their invention, artificial neural networks have proven able to learn representations of many hard problems. The latest architectures, instantiated in applications like ChatGPT, are remarkable in their ability to find and “creatively” replicate patterns in language, something that only a few years ago seemed impossible. An explosion of products and R&D is underway in this field: every day, new fine-tuned LLMs are released for specialized tasks, and new neural network architectures are invented, expanding the range of tasks that can be optimized in this way. Debate rages about whether these models can truly generalize from data, or “only” produce nuanced but essentially statistical output that contains no understanding of the world. Either way, the advances are impressive.
In my own research, I am developing ways to leverage the high-dimensional representation of text within LLMs to extract useful information using mathematical tools. For instance, by “embedding” a large amount of text within a language model, it is possible to calculate and visualize which texts cluster in the same regions of the model’s representation space. This information can be used to create effective summaries of large bodies of literature, and even to generate hypotheses about what a field might discover next, so that resources and investments can be concentrated in that direction.
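As a rough illustration of the embed-and-cluster idea (a minimal sketch, not Lynch’s actual pipeline), the following assumes the off-the-shelf sentence-transformers and scikit-learn libraries; the model name, the cluster count, and the toy abstracts are all illustrative choices.

```python
# Minimal sketch: embed a few abstracts, cluster them, and project to 2-D.
# Assumes sentence-transformers and scikit-learn are installed; the model,
# cluster count, and example texts are illustrative, not the author's method.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

abstracts = [
    "We propose a transformer architecture for protein structure prediction.",
    "A new attention mechanism reduces memory use in large language models.",
    "We study gravitational lensing around rotating black holes.",
    "Dark matter halos are simulated with a novel N-body method.",
]

# 1. Embed each text as a point in the model's high-dimensional space.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
embeddings = model.encode(abstracts, normalize_embeddings=True)

# 2. Group texts that land in the same region of that space.
labels = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(embeddings)

# 3. Project to 2-D so the clusters can be plotted or inspected by eye.
coords = PCA(n_components=2).fit_transform(embeddings)

for text, label, (x, y) in zip(abstracts, labels, coords):
    print(f"cluster {label}  ({x:+.2f}, {y:+.2f})  {text}")
```

On a real corpus, the clusters (rather than the individual articles) become the unit one summarizes or tracks over time, which is what makes the approach a map of a constantly changing field rather than another unreadable list.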
The Ethical Dimension of Knowledge
These technologies are another step in our effort to observe the world more closely and to notice its structure more completely. The more precisely we know how the world is structured, the better we will be able to make decisions. Knowledge has an exquisitely ethical component too: the better we know how things are interrelated (in the economy and the biosphere, for instance), the better we can take the full cost of our actions into account. With fuller knowledge of the interconnections and interdependencies in the world, optimizing for profit could be made to coincide more closely with optimizing for the common good. Better knowledge of the weaknesses of human perception and judgment, as famously studied by Kahneman and Tversky, can guide us to create intellectual prosthetics that help us judge more truly in the domains where our nature fails us.
Of course, what remains outside our knowledge will continue to affect us, and even if we could reach complete knowledge of the structure of the world, dark possibilities might still remain (“the human heart is deceitful above all things”, wrote the prophet Jeremiah).
What we can do is incrementally grow our knowledge and improve our map of the world. We are explorers, ever and again setting out into the deep: navigating unfamiliar waters, wondering what lies beyond the next headland or on the other side of the ocean. AI is a new tool on our journey, like a telescope that allows us to see farther and deeper than we could before.
For inquiries and collaboration, reach out to Neodata Group at info@neodatagroup.ai. Join us in shaping the future of information-driven insights and decision-making.