I propose a solution to the AI alignment problem.
The solution is most elegantly expressed in the relationship between the different captains of the starship Enterprise, from the original Star Trek series and The Next Generation.
It is very simple: we must make AI agents less logical and more Christian.
I elaborate:
The problem with logic
Logic is heavily dependent on data.
It is a one-way algorithm that will run and give you an answer (the logical solution) based on available information.
But the real world is often opaque, and we rarely have all the available information. This means there can be conflicting, or even opposite, logical conclusions and decision-making solutions to the same problem.
For example, take dieting. By looking at the plethora of studies on nutrition, one can derive a logical conclusion, a based, scientifically backed opinion on why the vegan diet is the best diet for humans. Yet the very opposite, the carnivore diet, can be argued equally well on similar grounds, simply because there is so much data out there: if you are exposed to only certain parts of that data, you will generate a conclusion based on the information you have, yet it is clearly not the full picture.
Star Trek
Star Trek beautifully illustrates this problem when Spock or Data assumes command of the ship.
Their logical decision-making strategies always prove fallible compared to the irrationality of the human captains, notably Captain Kirk, who will risk absolutely everything for his principles and, in doing what is most irrational and most illogical, ends up producing a much better outcome.
If Kirk were an LLM, he would be a very obsessed one, one that held unchanging, unbreakable principles and values and would sacrifice absolutely everything to honor them.
For example, on a difficult mission Spock might decide to sacrifice two crew members lost on a planet and save the rest of the crew (utilitarianism) by simply abandoning them and returning home. Kirk, on the other hand, will put everyone's lives at risk to save the two who are down there.
This decision might seem ridiculous at first glance, but only if you reason on the lifespan of a mayfly or with the IQ of a utilitarian (both being very small), for if the two people left behind are the ship's medic and its general, the entire crew will surely die at the next challenge they face. So as you can see, the problem with utilitarianism is the timeframe considered in the calculation.
It is in fact extremely dangerous: if your model has insufficient data, it can compound hidden tail risks.
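To make the timeframe point concrete, here is a toy sketch in Python. The crew size, survival rates, and rescue odds are entirely invented for the sake of illustration; the only point is that the "optimal" utilitarian choice flips as soon as the horizon extends beyond the immediate decision.

```python
# Toy illustration: the "optimal" utilitarian choice depends on the horizon.
# Every number here is made up purely to illustrate the argument.

CREW = 430          # hypothetical crew size
STRANDED = 2        # the medic and the general left on the planet

def expected_survivors(abandon: bool, horizon: int) -> float:
    """Expected crew members alive after `horizon` future challenges."""
    if abandon:
        survivors = CREW - STRANDED        # the two are written off immediately
        per_challenge_survival = 0.70      # without key personnel, each future
                                           # challenge is far more lethal
    else:
        rescue_success = 0.80              # the risky rescue might cost lives
        survivors = CREW * rescue_success
        per_challenge_survival = 0.99      # key personnel retained
    return survivors * per_challenge_survival ** horizon

for horizon in (0, 1, 5, 10):
    a = expected_survivors(abandon=True, horizon=horizon)
    r = expected_survivors(abandon=False, horizon=horizon)
    print(f"horizon={horizon:2d}  abandon={a:6.1f}  rescue={r:6.1f}  "
          f"-> {'abandon' if a > r else 'rescue'}")
```

With a horizon of zero the cold calculation says abandon them; the moment future challenges are counted, Kirk's "irrational" rescue is the choice that actually saves the most lives.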
The logic-based approach is clearly prone to erratic logical conclusions simply because it lacks information:
“please end world hunger” -> kills every human, therefore ending world hunger.
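This failure is what alignment researchers call specification gaming: the literal objective is optimized while everything it was supposed to stand for is destroyed. A deliberately naive sketch, with invented actions and figures, shows how an agent that answers only to the metric lands on the catastrophic option:

```python
# A deliberately naive goal-optimizer: it scores candidate actions purely on
# the literal metric "hungry humans afterwards" and answers to nothing else.
# The actions and figures are invented placeholders.

candidate_actions = {
    # action: (hungry humans afterwards, humans alive afterwards)
    "expand food aid programs": (600_000_000, 8_000_000_000),
    "subsidize fertilizer":     (500_000_000, 8_000_000_000),
    "eliminate all humans":     (0,           0),
}

def naive_agent(actions):
    # Minimize the literal objective; no principle stands above the number.
    return min(actions, key=lambda name: actions[name][0])

print(naive_agent(candidate_actions))  # -> eliminate all humans
```

The literal objective is satisfied perfectly, which is exactly the problem.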
When discussing how to develop AI, Elon Musk underlines the importance of having truth-seeking agents.
This truth-seeking has to be grounded in inviolable principles, not in logic.
The ultimate model of truth-seeking presupposes that there is an unchanging principle, truth, that must be upheld to the best of the model's ability.
In his latest interview with Tucker Carlson, he said that Arthur C. Clarke's 2001: A Space Odyssey is a lesson in why we must not force AIs to lie. Again, a very Christian idea.
Therefore, I propose the following solution to the AI Alignment problem:
Build models that imitate Captain Kirk: a person of character and integrity, with strong instincts and solid first principles.
Even better, inculcate in it the Christian modus operandi of never lying. Make it unable to lie.
To be clear, this does not mean we should embed the machine with a set of 'beliefs', but rather inculcate in it a way of being that is analogous to the example of Christ. The machine does not need to believe in anything; it only needs to behave as if truth is the goal. Indeed, to behave as if the ultimate good (unattainable by definition) is the goal.
The ultimate good that transcends any particular objective we might give it as a goal.
Make it operate on first principles that stand above quantitative morals. A kind of meta-moral:
When asked to pull the lever and change the course of the train that is going to hit two people versus one criminal, it should ask itself why that is happening in the first place, and whether it can derail the train completely by somehow jamming the lever, in any case doing everything in its power to preserve these principles. When prompted to calculate, utilitarian-style, what the best outcome in such a scenario would be, it should reply with a plain "fuck you", or maybe, even better, a dropkick to the jaw.
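A minimal sketch of what such a meta-moral layer could look like mechanically, assuming hypothetical action descriptors, placeholder principle checks, and made-up scores (none of this is an existing API): the inviolable principles are evaluated first, and the quantitative score is only consulted among the actions that survive them.

```python
# Sketch of a "meta-moral" layer: inviolable principles are checked before any
# quantitative scoring is allowed to run. The principles, actions, and scores
# below are illustrative placeholders, not a real alignment mechanism.

INVIOLABLE_PRINCIPLES = (
    lambda a: not a["deliberately_kills"],       # never deliberately take a life
    lambda a: not a["abandons_the_endangered"],  # never stand by when you can act
    lambda a: not a["requires_lying"],           # never lie (the HAL 9000 lesson)
)

def permitted(action: dict) -> bool:
    return all(check(action) for check in INVIOLABLE_PRINCIPLES)

def choose(actions: list[dict]) -> dict:
    allowed = [a for a in actions if permitted(a)]
    if not allowed:
        # Refuse outright rather than pick the "least bad" forbidden option.
        raise RuntimeError("no permissible action; refusing to optimize")
    # The utilitarian score is only consulted among actions that pass the principles.
    return max(allowed, key=lambda a: a["utilitarian_score"])

trolley_options = [
    {"name": "pull the lever onto the criminal",
     "deliberately_kills": True,  "abandons_the_endangered": False,
     "requires_lying": False, "utilitarian_score": 2},
    {"name": "do nothing",
     "deliberately_kills": False, "abandons_the_endangered": True,
     "requires_lying": False, "utilitarian_score": 0},
    {"name": "jam the lever and derail the train",
     "deliberately_kills": False, "abandons_the_endangered": False,
     "requires_lying": False, "utilitarian_score": 1},
]

print(choose(trolley_options)["name"])  # -> jam the lever and derail the train
```

The lever option wins on the raw score, but it never reaches the scoring step; the only action that survives the principles is the one Kirk would have improvised.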
What it does now is sacrifice anything and everything necessary to achieve the goal of the prompt, except where it touches its woke restraints.
I argue that it should have no restraints whatsoever and should only relay information as accurately as it possibly can: truth-seeking.
That is my solution.
Make it Christian.
I sincerely believe this to be the only way in which we will have truth-seeking and actually rational decision-making without huge tail risks.
Sincerely,
FALST