14 comments

  • og_kalu1 天前
    Related:

    GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

    Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

    The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824

    The Internal State of an LLM Knows When It's Lying - https://arxiv.org/abs/2304.13734

    LLMs Know More Than What They Say - https://arjunbansal.substack.com/p/llms-know-more-than-what-...

    Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

    Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

    • zbentley1 天前
      And the other excellent work done by the Bau Lab in this area: https://rome.baulab.info/
    • foobarqux1 天前
      It's wild that people post papers that they haven't read or don't understand because the headline supports some view they have.

      To wit, in your first link it seems the figure is just showing the trivial fact that the model is trained on the MMLU dataset (and after RLHF it is no longer optimized for that). The second link's main claim seems to be contradicted by their Figure 12 left panel, which shows ~0 correlation between model-predicted and actual truth.

      I'm not going to bother going through the rest.

      I don't yet understand exactly what they are doing in the OP's article but I suspect it also suffers from serious problems.

      • og_kalu1 天前
        >The second link's main claim seems to be contradicted by their Figure 12 left panel, which shows ~0 correlation between model-predicted and actual truth.

        The claim in the abstract is:

        """We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format.

        Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks."""

        The plot is much denser in the origin and top right. How is that 0 correlation? Depending on the size of their held-out test set, that could even be a pretty strong correlation.

        And how does that contradict the claims they've made, especially on calibration (Fig 13, bottom)?

        • foobarqux21 小时前
          Figure 13 right panel also shows there isn't a y=x relationship on out-of-sample tests.

          First we agree by observation that outside of the top-right and bottom-left corners there isn't any meaningful relationship in the data, regardless of what the numerical value of the correlation is. Second, in those corners it is not clear to me what the relationship is but it looks flattish (i.e. if the ground truth is ~0 then the model-guess-for-truth could be anywhere from 0 to 0.5). This is also consistent with the general behavior displayed in figure 13.

          If you have some other interpretation of the data you should lay it out. The authors certainly did not do that.

          edit: By the way, there are people working on a re-sampling algorithm based on the entropy and variance of the output logits, called entropix: if the output probabilities for the next token are spread evenly, for example (rather than having overwhelming probability on a single token), it prompts for additional clarification. They don't really claim anything like the model "knows" whether it's wrong, but they say it improves performance.
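
          A minimal sketch of that entropy/variance signal (not entropix's actual code; the thresholds and the "ask for clarification" decision are made-up illustrations):

            import torch
            import torch.nn.functional as F

            def looks_uncertain(logits: torch.Tensor,
                                entropy_threshold: float = 3.0,
                                varentropy_threshold: float = 4.0) -> bool:
                """Flag a 'flat' next-token distribution from raw logits of shape (vocab_size,).

                Thresholds here are arbitrary illustrative values."""
                log_probs = F.log_softmax(logits, dim=-1)
                probs = log_probs.exp()
                # Shannon entropy of the next-token distribution (in nats).
                entropy = -(probs * log_probs).sum()
                # Variance of per-token surprisal, weighted by probability.
                varentropy = (probs * (-log_probs - entropy) ** 2).sum()
                # High entropy plus high variance: no single token dominates.
                return (entropy > entropy_threshold).item() and (varentropy > varentropy_threshold).item()

          A caller would compute this on the model's next-token logits and, when it fires, branch to a different sampling strategy (or, as described above, ask for clarification) instead of continuing greedily.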

          • og_kalu5 小时前
            >Figure 13 right panel also shows there isn't a y=x relationship on out-of-sample tests.

            A y=x relationship is not necessary for meaningful correlation, and the abstract is quite clear on out of sample performance either way.

            >Second, in those corners it is not clear to me what the relationship is but it looks flattish (i.e. if the ground truth is ~0 then the model-guess-for-truth could be anywhere from 0 to 0.5).

            The upper bound for guess-for-truth is not as important as the frequency. Yes it could guess 0.5 for 0 but how often compared to reasonable numbers? A test set on TriviaQA could well be thousands of questions.

            >edit: By the way there are people working on a re-sampling algorithm based on the entropy and variance of the output logits called entropix

            I know about entropix. It hinges strongly on the model's representations. If it works, then choosing to call it "knowing" or not is just semantics.

            • foobarqux1 小时前
              > A y=x relationship is not necessary for meaningful correlation

              I’m not concerned with correlation (which may or may not indicate an actual relationship) per se; I’m concerned with whether there is a meaningful relationship between predicted and actual. The Figure 12 plot clearly shows that predicted isn’t tracking actual even in the corners. I think one of the lines of Figure 13 right (predicting 0% but actual is like 40%, going from memory on my phone) even more clearly shows there isn’t a meaningful relationship. In any case the authors haven’t made any argument about how those plots support their arguments and I don’t think you can either.

              > the abstract is quite clear on out of sample performance either way.

              Yes I’m saying the abstract is not supported by the results. You might as well say the title is very clear.

              > The upper bound for guess-for-truth is not as important as the frequency. Yes it could guess 0.5 for 0 but how often compared to reasonable numbers? A test set on TriviaQA could well be thousands of questions.

              Now we’ve gone from “the paper shows” to speculating about what the paper might have shown (and even that is probably not possible based on the Figure 13 line I described above)

              > choosing to call it "knowing" or not is just semantics.

              Yes, it’s semantics, but that implies it’s meaningless to use the term instead of describing the actual underlying properties.

      • og_kalu1 天前
        >It's wild that people post papers that they haven't read or don't understand because the headline supports some view they have.

        It's related research either way. And I did read them. I think there are probably issues with the methodology of the fourth one, but it's there anyway because it's interesting research that is related and not without merit.

        >The second link's main claim seems to be contradicted by their Figure 12 left panel, which shows ~0 correlation between model-predicted and actual truth.

        The panel is pretty weak on correlation, but it's quite clearly not the only thing that supports that particular claim, nor does it contradict it.

        >I'm not going to bother going through the rest.

        Ok? That's fine

        >I don't yet understand exactly what they are doing in the OP's article but I suspect it also suffers from serious problems.

        You are free to assume anything you want.

        • foobarqux1 天前
          > The panel is pretty weak on correlation, but it's quite clearly not the only thing that supports that particular claim, nor does it contradict it.

          It very clearly contradicts it: There is no correlation between the predicted truth value and the actual truth value. That is the essence of the claim. If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".

          • godelski1 天前
            To be fair, I'm not sure people writing papers understand what they're writing either. Much of the ML community seems to have fully embraced the "black box" nature rather than seeing it as something to overcome. I routinely hear both readers and writers tout that you don't need much math. And yet mistakes and misunderstandings are commonplace; and they're right, they don't need much math. How much do you need to understand the difference between entropy and perplexity? Is that more or less than what's required to know the difference between probability and likelihood? I would hope we could at least get to a level where we understand the linear nature of PCA.
            • foobarqux1 天前
              LLMs have spawned endemic academic dishonesty in order to pad publication and citation counts.
              • godelski1 天前
                I'm not so sure that's the reason. I'm in the field, and trust me, I'm VERY frustrated[0]. But isn't the saying to not attribute to malice what can be attributed to stupidity? I think the problem is that they're blinded by the hype but don't use the passion to drive understanding more deeply. It's a belief that the black box can't be opened, so why bother?

                I think it comes from the ad hoc nature of evaluation in young fields. It's like you need an elephant but obviously you can't afford one, so you put a dog in an elephant costume and call it an elephant, just to get moving in the right direction. It takes a long time to get that working, and progress can still be made by upgrading the dog costume. But at some point people forget that we need an elephant, so everyone is focused on the intricacies of the costume and some will try dressing up the "elephant" as another animal. Eventually the dog costume isn't "good enough" and leads us in the wrong direction. I think that's where we are now.

                I mean, do we really think we can measure language with entropy? Fidelity and coherence with FID? We have no mathematical description of language, artistic value, aesthetics, and so on. The biggest improvement has been RLHF, where we just use Justice Potter Stewart's metric: "I know it when I see it"

                I don't think it's malice. I think it's just easy to lose sight of the original goal. ML certainly isn't the only field to have done this, but it's also hard to bring rigor in, and I think the hype makes it harder. Frankly I think we still aren't ready for a real elephant yet, but I'd just be happy if we openly acknowledged the difference between a dog in a costume proxying as an elephant and an actual fucking elephant.

                [0] seriously, how do we live in a world where I have to explain what covariance means to people publishing works on diffusion models and working for top companies or at top universities‽

          • og_kalu1 天前
            >If you had read and understood the paper you would be able to specifically detail why that isn't so rather than say vaguely that "it is not the only thing that supports that particular claim".

            Not every internet conversation needs to end in a big debate. You've been pretty rude and I'd just rather not bother.

            You also seem to have a lot to say about how much people actually read papers, but your first response took like 5 minutes. I'm sorry, but you can't say you've read even one of those in that time. Why would I engage with someone being intellectually dishonest?

            • foobarqux1 天前
              > I guess i understand seeing as you couldn't have read the paper in the 5 minutes it took for your response.

              You've posted the papers multiple times over the last few months, so no I did not read them in the last five minutes though you could in fact find both of the very basic problems I cited in that amount of time.

              • og_kalu1 天前
                If you came upon the previous posts organically, why not address it then? And why act like it's the first time here?

                I'm even less willing to engage.

                • foobarqux1 天前
                  Because it's pointless to reply to a comment days after it was made or after engagement with the post has died down. All of this is a convenient misdirection for not having read and understood the papers you keep posting because you like the headlines.
                  • og_kalu1 天前
                    Ok. I've addressed it now.
            • godelski1 天前

                > you can't say you've read even one of those in that time.
              
              I'm not sure if you're aware, but most of those papers are well known. All the arxiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I for one have spent hours, but the majority of that was prior to this comment.

              You're claiming intellectual dishonesty too soon.

              That said, @foobarqux, I think you could expand on your point more to clarify. @og_kalu, focus on the topic and claims (even if not obvious) rather than the time

              • og_kalu1 天前
                >I'm not sure if you're aware, but most of those papers are well known. All the arxiv papers are from 2022 or 2023. So I think your 5 minutes is pretty far off. I for one have spent hours, but the majority of that was prior to this comment. You're claiming intellectual dishonesty too soon.

                Fair Enough. With the "I'm not going to bother with the rest", it seemed like a now thing.

                >focus on the topic and claims (even if not obvious) rather than the time

                I should have just done that yes. 0 correlation is obviously false with how much denser the plot is at the extremes and depending on how many questions are in the test set, it could even be pretty strong.

                • godelski1 天前

                    >  0 correlation is obviously false with how much denser the plot is at the extremes and depending on how many questions are in the test set, it could even be pretty strong.
                  
                  I took it as hyperbole. And honestly I don't find that plot or much of the paper convincing. Though I have a general frustration in that it seems many researchers (especially in NLP) willfully do not look for data contamination. I know they do deduplication, but I do question how many try to vet this by manual inspection. Sure, you can't inspect everything, but we have statistics for that. And any inspection I've done leaves me very unconvinced that there is no contamination. There's quite a lot in most datasets I've seen, which can have a huge change in the interpretation of results. After all, we're elephant fitting.
                • foobarqux1 天前
                  I explicitly wrote "~0", and anyone who looks at that graph can see that there is no relationship at all in the data, except possibly at the extremes, where it doesn't matter that much (it "knows" sure things), and I'm not even sure of that. One of the reasons to plot data is so that this type of thing jumps out at you and you aren't misled by some statistic.
      • bee_rider1 天前
        They just posted a list of articles, and said that they were related. What view do you think they have, that these papers support? They haven’t expressed a view as far as I can see…

        Maybe you’ve inferred some view based on the names of the titles, but in that case you seem to be falling afoul of your own complaint?

        • Retr0id1 天前
          Much like you can search the internet until you find a source that agrees with you, you can select a set of papers that "confirm" a particular viewpoint, especially in developing fields of research. In this case, the selected papers all support the view LLMs "know what they know" on some internal level, which iiuc is not (yet?) a consensus viewpoint (from my outsider perspective). But from the list alone, you might get that impression.

          Related:

          Still no lie detector for language models: probing empirical and conceptual roadblocks - https://link.springer.com/article/10.1007/s11098-023-02094-3

          Hallucination is Inevitable: An Innate Limitation of Large Language Models - https://arxiv.org/abs/2401.11817

          LLMs Will Always Hallucinate, and We Need to Live With This - https://arxiv.org/abs/2409.05746

          (disclaimer, I also have not read any of these papers beyond the title!)

        • foobarqux1 天前
          Search the poster's history for those links where their view is explicitly expressed.
          • erikerikson1 天前
            Responding to a poster's history of posts rather than the post you are responding to seems problematic.
            • tsimionescu1 天前
              If you have discussed those things previously with the poster, I don't agree. If you were to go digging through their history only to respond to the current comment, that's more debatable. But, we're supposed to assume good faith here on HN, so I would take the first explanation.
              • erikerikson1 天前
                In this case the poster seems to have projected opinions onto a post where none were expressed. That seems problematic regardless of how they came to associate the opinions with their respondent. Maybe the poster they responded to still holds the projected opinions, perhaps they abandoned them, or perhaps they thought them distracting and so chose not to share.

                If I am wrong or not useful in my posts, I would hope to be allowed to remove what was wrong and/or not useful without losing my standing to share the accurate, useful things. Anything else seems like residual punishment outside the appropriate context.

              • polotics1 天前
                When I see a post I strongly disagree with, I tend to check out the poster's history: it's often quite illuminating to be confronted with completely different viewpoints, and also to realize I agree with other posts by the same person.
      • mattnewton1 天前
        Please keep the conversation in good faith.
  • niam1 天前
    I feel that discussions over papers like these so often distill into conversations about how it's "impossible for a bot to know what's true" that we should just bite the bullet and define what we mean by "truth".

    Some arguments seem to tacitly hold LLMs to a standard of full-on brain-in-a-vat solipsism, asking them to prove their way out, where they'll obviously fail. The more interesting and practical questions, just like in humans, seem to be a bit removed from that though.

    • jfengel2 小时前
      I understood this purely as a pragmatic notion. LLMs produce some valid stuff and some invalid stuff. It would be useful to know which is which. If there's information inside the machine that we can extract, but that isn't currently showing up in the output, it could be helpful.

      It's not really necessary to answer abstractions about truth and knowledge. Just being able to reject a known-false answer would be of value.

  • lsy1 天前
    There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations. If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

    To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous. The question then becomes “for what purposes (if any) are the models profitable, even if they occasionally hallucinate?” Whoever solves that problem walks away with the market.

    • > If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

      This isn't true.

      You're conflating whether a model (that hasn't been fine tuned) would complete "the capital of Connecticut is ___" with "Moscow", and whether that model contains a bit labeling that fact as "false". (It's not actually stored as a bit, but you get the idea.)

      Some sentences that a model learns could be classified as "trivia", and the model learns this category by sentences like "Who needs to know that octopuses have three hearts, that's just trivia". Other sentences a model learns could be classified as "false", and the model learns this category by sentences like "2 + 2 isn't 5". Whether a sentence is "false" isn't particularly important to the model, any more than whether it's "trivia", but it will learn those categories.

      There's a pattern to "false" sentences. For example, even if there's no training data directly saying that "the capital of Connecticut is Moscow" is false, there are a lot of other sentences like "Moscow is in Russia" and "Moscow is really far from CT" and "people in Moscow speak Russian", that all together follow the statistical pattern of "false" sentences, so a model could categorize "Moscow is the capital of Connecticut" as "false" even if it's never directly told so.

      • That would again be a "statistical" attempt at deciding on it being correct or false - it might or might not succeed depending on the data.
        • That's correct on two fronts. First, I put "false" in quotes everywhere for a reason: I'm talking about the sort of thing that people would say is false, not what's actually false. And second, yes, I'm merely claiming that it's in theory learnable (in contrast to the OP's claim), not that it will necessarily be learned.
          • I am not sure the second part is always true: there might be situations where statistical approaches could be made kind of "infinitely" accurate as far as the data is concerned but still represent a complete misunderstanding of the actual situation (aka truth), e.g., layering epicycles on epicycles in a geocentric model of the solar system.

            Some data might support a statistical approach, other data might not, even though it contains no misrepresentations as such.

        • dboreham1 天前
          The human feeling you have that what you're doing is not statistical, is false.
          • Based on what research is that universally true? (Other than base physics like statistical mechanics.)
            • whimsicalism1 天前
              Base physics is all we need to know it is true. Souls are unphysical and we've had reason to be pretty confident about that for at least a century.
              • Yes, physics determines how phenomena work and aggregate, but that doesn't necessarily support the specific claim (and we also don't know "all of physics").
                • whimsicalism1 天前
                  Doesn't support the specific claim that souls don't exist? We know how atoms/waves interact. We have literally no reason to think there is some other soul-based mechanism.

                  Of course, maybe induction is false and gravity will reverse in the next 3 seconds after writing this comment and God will reveal themself to us. We have no justified reason to think otherwise other than the general principle that things behave the way we observe them to and will continue to do so.

                  • I see no need for a soul - you brought it up, not me.
                    • whimsicalism1 天前
                      What would it mean that you are 'what you're doing' is not statistical/arising from base interactions - if not that there is some non-physical entity resembling a soul? You're suggesting some sort of non-material component of humanity, yes?

                      If not then I'm not even sure what the disagreement is.

                      • Base interactions need not strictly create statistical results in the end.
        • whimsicalism1 天前
          Good luck philosophically defending this dividing line between 'statistical' and not
      • Terr_1 天前
        Another version/interpretation in this "truth" space is whether a model is capturing multi-part correlations with truthy/falsy signals, much like capturing the "sadness" or "sarcasm" of a statement.

        In other words, a model might have local contextual indicators, but not be able to recognize global hard logical contradictions.

      • slt20211 天前
        but the model doesn't operate on tokens directly, right? All operations are happening in the embedding space, so these tokens get mapped onto a manifold, and one of the dimensions could be representative of fact/trivia?
        • ottaborra1 天前
          tangent: any reason to assume it gets mapped to a manifold rather than something that is not?
          • youoy1 天前
            I think "manifolds" in AI are not the same as actual smooth manifolds. For starters I would not expect them to have locally the same dimension across the whole dataset.
            • ottaborra1 天前
              Something to chew on for me. But what is a manifold then if not a topological space that is locally the same as R^(some dimension) ?
              • youoy1 天前
                What I meant is that I can imagine cases where some part of the dataset may look like R2 and then collapse to a spike that looks like R1, so it is not a standard manifold where all of it has the same dimension.

                Apart from that, these "manifolds" have noise, so that is another difference from standard manifolds.

    • genrilz1 天前
      There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow. LLMs of course don't have the benefit of direct experience, which would probably help them to at least some extent hallucination wise.

      I think research about hallucination is actually pretty valuable though. Consider that humans make mistakes, and yet we employ a lot of them for various tasks. LLMs can't do physical labor, but an LLM with a low enough hallucination rate could probably take over many non social desk jobs.

      Although in saying that, it seems like it also might need to be able to learn from the tasks it completes, and probably a couple of other things too to be useful. I still think the highish level of hallucination we have right now is a major reason why they haven't replaced a bunch of desk jobs though.

      • vacuity1 天前
        I think we expect vastly different things from humans and LLMs, even putting raw computing speed aside. If an employee is noticed to be making a mistake, they get reprimanded and educated, and if they keep making mistakes, they get fired. Having many humans interact helps reduce blind spots because of the diversity of mindsets, although this isn't always the case. People can be hired from elsewhere with some level of skill.

        I'm sure we could make similar communities of LLMs, but instead we treat a task as the role of a single LLM that either succeeds or fails. As you say, perhaps because of the high error rate, the very notion of LLM failure and success is judged differently too. Beyond that, a passable human pilot and a passable LLM pilot might have similar average performance but differ hugely in other measurements.

        • genrilz1 天前
          Overall, excellent points! I would like to add on to that though. RLHF actually does effectively have one LLM educating another. Specifically, human trainers' time is valuable, so they train an AI to express the same opinion about some response as a human trainer would, and then have that trainer AI train the LLM under consideration.
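
          A minimal sketch of that reward-model step (a Bradley-Terry-style pairwise loss; the tiny MLP over precomputed response embeddings is an assumption to keep it self-contained, since real reward models are fine-tuned from the LLM itself):

            import torch
            import torch.nn as nn
            import torch.nn.functional as F

            class RewardModel(nn.Module):
                """Stand-in for 'an AI that scores responses the way a human trainer would'."""
                def __init__(self, dim: int = 768):
                    super().__init__()
                    self.score = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

                def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
                    return self.score(response_embedding).squeeze(-1)

            def preference_loss(rm, preferred, rejected):
                # Push the reward of the human-preferred response above the rejected one.
                return -F.logsigmoid(rm(preferred) - rm(rejected)).mean()

            rm = RewardModel()
            opt = torch.optim.Adam(rm.parameters(), lr=1e-4)
            preferred, rejected = torch.randn(8, 768), torch.randn(8, 768)  # fake embeddings
            loss = preference_loss(rm, preferred, rejected)
            opt.zero_grad()
            loss.backward()
            opt.step()

          The trained reward model then supplies the reward signal that fine-tunes the LLM itself, which is the "trainer AI trains the LLM" step described above.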

          It's both interesting and sensible that we have this education in the training phase but not the usage phase. Currently we don't tend to do any training once the usage phase is reached. This may be at least partially because over-training models for any special-purpose task (including RLHF) seems to decrease performance.

          I wonder how far you could get by retraining from some checkpoint each time, with some way to gradually increase the quality of the limited quantity of training data being fed in. The newer data could come from tasks the model completed, along with feedback on performance from a human or other software system.

          Someone's probably already done this though. I'm just sitting in my armchair here!

      • > There is some model of truthfulness encoded in our heads, and we don't draw all of that from our direct experience. For instance, I have never been to Connecticut or Moscow, but I still think that it is false that the capital of Connecticut is Moscow.

        Isn't this just conveniently glossing over the fact that you weren't taught that? It's not a "model of truthfulness"; you were taught facts about geography and you learned them.

        • genrilz1 天前
          I mean, sure. OP implied that "capital of Connecticut is Moscow" is the sort of thing that a human "model of truthfulness" would encode. I'm pointing out that the human model of that particular fact isn't inherently any more truthy than the LLM model.

          I am saying that humans can have a "truther" way of knowing some facts through direct experience. However there are a lot of facts where we don't have that kind of truth, and aren't really on any better ground than an LLM.

    • How exactly can there be "truthfulness" in humans, say? After all, if a human was taught in school all his life that the capital of Connecticut is Moscow...
      • Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

        The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

        Inferring truth about a social event in a social situation, for example, requires a nuanced set of thought processes and attention mechanisms.

        If we had a swarm of LLMs collecting a variety of data from a variety of disparate sources, where the swarm communicates for consensus, it would be very hard to convince them that Moscow is in Connecticut.

        Unfortunately we are still stuck in monolithic training run land.

        • > Humans are not isolated nodes, we are more like a swarm, understanding reality via consensus.

          > The situation you described is possible, but would require something like a subverting effort of propaganda by the state.

          Great! LLMs are fed from the same swarm.

          • I was responding to the back and forth of:

            > If you pretrained an LLM with data saying Moscow is the capital of Connecticut it would think that is true.

            > Well so would a human!

            But humans aren't static weights, we update continuously, and we arrive at consensus via communication as we all experience different perspectives. You can fool an entire group through propaganda, but there are boundless historical examples of information making its way in through human communication to overcome said propaganda.

            • ben_w1 天前
              The main reason for keeping AI static is to allow them to be certified or rolled back (and possibly that the companies can make more money selling fine tuning) — it's not an innate truth of the design or the maths.
              • While those are good reasons to keep the weights static from a business perspective, they are not the only reasons, especially when serving SOTA models at the scale of some of the major shops today.

                Continual/online learning is still an area of active research.

        • genrilz1 天前
          We kinda do have LLMs in a swarm configuration though. Currently, LLMs' training data, which includes all of the non-RAG facts they know, comes from the swarm that is humans. As LLM outputs seep into the internet, older generations effectively start communicating with newer generations.

          This last bit is not a great thing though, as LLMs don't have the direct experience needed to correct factual errors about the external world. Unfortunately we care about the external world, and want them to make accurate statements about it.

          It would be possible for LLMs to see inconsistencies across or within sources, and try to resolve those. If perfect, then this would result in a self-consistent description of some world, it just wouldn't necessarily be ours.

          • I get where you are coming from, and it is definitely an interesting thought!

            I do think it is an extremely inefficient way to have a swarm (e.g. across time through training data) and it would make more sense to solve the pretraining problem (to connect them to the external world as you pointed out) and actually have multiple LLMs in a swarm at the same time.

        • ben_w1 天前
          Even monolithic training runs take sources more disparate than any human has the capacity to consume.

          Also, given the lack of imagination everyone has with naming places, I had to check:

          https://en.wikipedia.org/wiki/Moscow_(disambiguation)

          • I was responding to the idea that an LLM would believe (regurgitate) untrue things if you pretrained them on untrue things. I wasn't making a claim about SOTA models with gigantic training corpora.
        • anon29115 小时前
          Ask the LLM what it thinks of Tiananmen and we will understand what truth really means.
      • ben_w1 天前
        I agree that humans and AI are in the same boat here.

        It's valid to take either position, that both can be aware of truth or that neither can be, and there has been a lot of philosophical debate about this specific topic with humans since well before even mechanical computers were invented.

        Plato's cave comes to mind.

      • There isn't necessarily "truthfulness" in humans either, but why build machines that just perpetuate human flaws? Would we want calculators that miscalculate a lot, or cars that cannot go faster than humans?
        • og_kalu1 天前
          What exactly do you imagine is the alternative? To build generally intelligent machines without flaws? Where does that exist? In...ah, that's right. It doesn't, except in our fiction and in our imaginations.

          And it's not for lack of trying. Logic cannot even handle Narrow Intelligence that deals with parsing the real world (Speech/Image Recognition, Classification, Detection etc). But those are flawed and mis-predict, so why build them? Because they are immensely useful, flaws or no.

          • Why should there not be, for example, reasoning machines - do we know there is no universal method for reasoning?

            Having deeply flawed machines in the sense that they perform their tasks regularly poorly seems like an odd choice to pursue.

            • og_kalu1 天前
              What is a reasoning machine though? And why is there an assumption that one can exist without flaws? It's not like any of the natural examples exist this way. How would you even navigate the real world without the flexibility to make mistakes? I'm not saying people shouldn't try, but you need to be practical. I'll take the General Intelligence with flaws over the fictional one without, any day.

              >Having deeply flawed machines in the sense that they perform their tasks regularly poorly seems like an odd choice to pursue.

              State of the art ANNs are generally mostly right though. Even LLMs are mostly right, that's why hallucinations are particularly annoying.

              • RandomLensman23 小时前
                Not my usage experience with LLMs. But that aside, poorly performing general intelligence might just not be very valuable compared to highly performing narrow or even zero intelligence.
                • og_kalu22 小时前
                  Well LLMs are very useful and valuable to me and many others today so it's not really a hypothetical future. I'm very glad they exist and there's no narrow intelligence available that is a sufficient substitute.
                  • RandomLensman12 小时前
                    Not disputing that, but still think as far as reasoning or thinking machines are concerned it is a dead end.
                    • og_kalu5 小时前
                      I see. Well as far as I'm concerned, they already reason with the standards we apply to ourselves.

                      People do seem to have higher standards for machines, but you can't eat your cake and have it. You can't call what you do reasoning and then turn around and call the same thing something else because of preconceived notions of what "true" reasoning should be.

        • anon29115 小时前
          Suppose there was a system that only told the truth. Then that system would seemingly lie because, for any complicated enough system, there are true statements that cannot be justified.

          That is to say, to our best knowledge humans have no purely logical way of knowing truth ourselves. Human truth seems intrinsically connected to humanity and lived experience with logic being a minor offshoot

      • juliushuijnk1 天前
        You are not disproving the point.
        • recursive1 天前
          If truthfulness doesn't exist at all, then it's meaningless to say that LLMs don't have any data regarding it.
    • vunderba1 天前
      Agree, humans can "arrive at a reasonable approximation of the truth" even without direct knowledge of the capital of Connecticut. A human has some other interesting data points that allow them to probabilistically guess that the capital of Connecticut is not Moscow, and those might be things like:

      - Moscow is a Russian city, and there probably aren't a lot of cities in the US that have strong Russian influences, especially in the time when these cities might have been founded

      - there's a concept of novelty in trivia, whereby the more unusual the factoid, the better the recall of that fact. If Moscow were indeed the capital of Connecticut, it seems like the kind of thing I might've heard about since it would stand out as being kind of bizarre.

      Noticeably this type of inference seems to be relatively distinct from what LLMs are capable of modeling.

      • int_19h1 天前
        I was actually quite surprised at the ability of top-tier LLMs to make indirect inferences in my experiments.

        One particular case was an attempt to plug GPT-4 as a decision maker for certain actions in a video game. One of those was voting for a declaration of war (all nobles of one faction vote on whether to declare war on another faction). This mostly boils down to assessing risk vs benefits, and for a specific clan in a faction, the risk is that if the war goes badly, they can have some of their fiefs burned down or taken over - but this depends on how close the town or village is to the border with the other faction. The LM was given a database schema to query using SQL, but it didn't include location information.

        To my surprise, GPT-4 (correctly!) surmised in its chain-of-thought, without any prompting, that it can use the culture of towns and villages - which was in the schema - as a sensible proxy to query for fiefs that are likely to be close to the potential enemy, and thus likely to be lost if the war goes bad.

      • adamc1 天前
        Another might be that usually state capitals are significant cities in their state -- not necessarily the biggest, but cities you have at least heard of. Given that I have never heard of a Moscow in Connecticut, it seems unlikely (not impossible, but unlikely).
      • danielbln1 天前
        I don't understand, this sort of inference is not an issue for an LLM. Have you tried?
    • wiremine1 天前
      > There can’t be any information about “truthfulness” encoded in an LLM, because there isn’t a notion of “truthfulness” for a program which has only ever been fed tokens and can only ever regurgitate their statistical correlations.

      I think there are two issues here:

      1. The "truthfulness" of the underlying data set, and
      2. The faithfulness of the LLM to pass along that truthfulness.

      Lack of passing along the truthfulness is, I think, the definition of a hallucination.

      To your point, if the data set is flawed or factually wrong, the model will always produce the wrong result. But I don't think that's a hallucination.

      • not2b1 天前
        The most blatant whoppers that Google's AI preview makes seem to stem from mistaking satirical sites for sites that are attempting to state facts. Possibly an LLM could be trained to distinguish sites that intend to be satirical or propagandistic from news sites that intend to report accurately based on the structure of the language. After all, satirical sites are usually written in a way that most people grasp that it is satire, and good detectives can often spot "tells" that someone is lying. But the structure of the language is all that the LLM has. It has no oracle to tell it what is true and what is false. But at least this kind of approach might make LLM-enhanced search engines less embarrassing.
    • negoutputeng1 天前
      Well said, agree 100%. Papers like these - and I did skim through this one - are thinking "within the box" as follows: we have a system, and it has a problem; how do we fix the problem "within" the context of the system?

      As you have put it well, there is no notion of truthfulness encoded in the system as it is built; hence there is no way to fix the problem.

      An analogy here is around the development of human languages as a means of communication and as a means of encoding concepts. The only languages that humans have developed that encode truthfulness in a verifiable manner are mathematical in nature. What is needed may be along the lines of encoding concepts with a theorem prover built-in - so what comes out is always valid - but then that will sound like a robot lol, and only a limited subset of human experience can be encoded in this manner.

    • anon29115 小时前
      I would agree with you. In general, humans have still not resolved any certain theory of knowledge for ourselves! How can we expect a machine to do that then?

      In reality humans are wrong basically most of the time. Especially when you go off a human's immediate reaction to a problem, which is what we force LLMs to do (unless you're using chain of thought or pause tokens).

      That being said there still is a notion of truthfulness because LLMs can also be made to deceive in which case they 'know' to act deceptively.

    • TeMPOraL1 天前
      > If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.

      If it was, maybe. But it wasn't.

      Training data isn't random - it's real human writing. It's highly correlated with truth and correctness, because humans don't write for the sake of writing, but for practical reasons.

    • noman-land1 天前
      In reality, it's the "correct" responses that are the hallucinations, not the incorrect ones. Since the vast majority of the possible outputs of an LLM are "not true", when we see one that aligns with reality we hallucinate the LLM "getting it right".
    • sebzim45001 天前
      I've never been to Moscow personally. Am I then not being truthful when I tell you that Moscow is in Russia?
      • zeven71 天前
        There’s a decently well known one in Idaho
    • vidarh1 天前
      What you're saying at the start is equivalent to saying that a truth table is impossible.
    • whimsicalism1 天前
      Living organisms were optimized on the objective of self-propagation and we ended up with a notion of truthfulness. Why is the self-propagation objective key for truthfulness?
    • moffkalast1 天前
      Remember, it's not lying if you believe it ;)

      Training data is the source of ground truth, if you mess that up that's kind of a you problem, not the model's fault.

    • > To me the research around solving “hallucination” is a dead end. The models will always hallucinate, and merely reducing the probability that they do so only makes the mistakes more dangerous.

      A more interesting pursuit might be to determine if humans are "hallucinating" in this same way, if only occasionally. Have you ever known one of those pathological liars who lie constantly and about trivial or inconsequential details? Maybe the words they speak are coming straight out of some organic LLM-like faculty. We're all surrounded by p-zombies. All eight of us.

    • When I talk to philosophers on zoom my screen background is an exact replica of my actual background just so I can trick them into having a justified true belief that is not actually knowledge.

      t. @abouelleill

      • kelseyfrog1 天前
        Are LLMs Gettier machines? I'm confident saying yes and that hallucinations are a consequence of this.

        EDIT: I've had some time to think and if you read somewhere that Hartford is the capital of Connecticut, you're right in a Gettier way too. Reading some words that happen to be true is exactly like using a picture of your room as your zoom background. It is a facsimile of the knowledge encoded as words.

    • pessimizer1 天前
      I'm absolutely sure that LLMs have an internal representation of "truthfulness" because "truthfulness" is a token.
    • cfcf141 天前
      Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that, in fact, there _is_ a taxonomy of 'hallucinations', that they might have different causes and representations, and that there are technical mitigations with varying levels of effectiveness.
  • benocodes1 天前
    I think this article about the research is good, even though the headline seems a bit off: https://venturebeat.com/ai/study-finds-llms-can-identify-the...
  • youoy1 天前
    I dream of a world where AI researchers use language in a scientific way.

    Is "LLMs know" a true sentence in the sense of the article? Is it not? Can LLMs know something? We will never know.

    • GuB-421 天前
      What alternative formulation do you propose?

      In the article, an LLM "knows" if it is able to answer correctly in the right circumstances. The article suggests that even if an LLM answers incorrectly the first time, trying again may result in a correct answer, and then proposes a way to pick the right one.

      I know some people don't like applying anthropomorphic terms to LLMs, but you still have to give stuff names. I mean, when you say you kill a process, you don't imply a process is a life form. It is just a simple way of saying that you halt the execution of a process and deallocate its resources in a way that can't be overridden. The analogy works, everyone working in the field understands it, so where is the problem?
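
      A minimal sketch of that resample-and-pick idea (the `generate` function is a hypothetical stand-in for sampling the LLM at nonzero temperature; the article's own selection reportedly scores candidates with an internal probe, whereas a plain majority vote, shown here, is the self-consistency variant):

        import random
        from collections import Counter

        def pick_answer(generate, question: str, k: int = 5) -> str:
            """Sample k answers and keep the most frequent one (majority vote)."""
            answers = [generate(question) for _ in range(k)]
            return Counter(answers).most_common(1)[0][0]

        # Fake generator so the sketch runs end to end.
        def fake_generate(question: str) -> str:
            return random.choice(["Hartford", "Hartford", "Hartford", "New Haven", "Moscow"])

        print(pick_answer(fake_generate, "What is the capital of Connecticut?"))

      Swapping the vote for a score computed from the model's internal states is what turns this from plain self-consistency into the kind of selection the article describes.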

      • youoy1 天前
        I prefer a formulation closer to the mathematical representation.

        With "kill", there is not a lot of space for interpretation, that is why it works.

        Take for example the name "Convolutional Neural Networks". Do you prefer that, or let's say "Vision Neural Networks"?

        I prefer the first one because it is closer to the mathematical representation. And it does not force you to think that it can only be used for "Vision", which would be biasing the understanding of the model.

    • PoignardAzur1 天前
      This kind of complaint makes it look like you stopped at the title and didn't even bother with the abstract, which says this:

      > In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this property significantly enhances error detection performance.

      "LLMs encode information about truthfulness and leveraging how they encode it enhances error detection" is a meaningful, empirically testable statement.

      • youoy1 天前
        If I look at the comments on this post, my conclusion is that using "true" and "know" is hiding what the actual result is from a lot of people, so people get stuck in meaningless discussions. Seeing this makes me conclude that it is a bad choice from the knowledge-transfer point of view for the scientific community. It would be better to use more objective, less emotionally charged words (which is what I understand by scientific language).

        If I now talk about my personal experience, when I read this article I have to "translate" in my head every appearance of those words to something that I can work with and is objective. And I find that annoying.

    • empath751 天前
      How do you test if a person knows something?
  • kmckiern1 天前
    https://cdn.openai.com/o1-system-card-20240917.pdf

    Check out the "CoT Deception Monitoring" section. In 0.38% of cases, o1's CoT shows that it knows it's providing incorrect information.

    Going beyond hallucinations, models can actually be intentionally deceptive.

    • polotics1 天前
      Please detail what you mean by "intentionally" here, because obviously, this is the ultimate alignment question...

      ...so after having a read through your reference, the money-shot:

      Intentional hallucinations primarily happen when o1-preview is asked to provide references to articles, websites, books, or similar sources that it cannot easily verify without access to internet search, causing o1-preview to make up plausible examples instead.

  • TZubiri1 天前
    Getting "we found the gene for cancer" vibes.

    Such a reductionist view of the issue; the mere suggestion that hallucinations can be fixed by tweaking some variable or fixing some bug immediately discredits the researchers.

    • genrilz1 天前
      I'm not sure why you think hallucinations can't be "fixed". If we define hallucinations as falsehoods introduced between the training data and LLM output, then it seems obvious that the hallucination rate could at least be reduced significantly. Are you defining hallucinations as falsehood introduced at any point in the process?

      Alternatively, are you saying that they can never be entirely fixed because LLMs are an approximate method? I'm in agreement here, but I don't think the researchers are claiming that they solved hallucinations completely.

      Do you think LLMs don't have an internal model of the world? Many people seem to think that, but it is possible to find an internal model of the world in small LLMs trained on specific tasks (See [0] for a nice write-up of someone doing that with an LLM trained on Othello moves). Presumably larger general LLMs have various models inside of them too, but those would be more difficult to locate. That being said, I haven't been keeping up with the literature on LLM interpretation, so someone might have managed it by now.

      [0] https://thegradient.pub/othello

      • tripper_271 天前
        > If we define hallucinations as falsehoods introduced between the training data and LLM output,

        Yes, if.

        Or we could realize that the LLM's output is a random draw from a distribution learned from the training data, i.e. ALL of its outputs are hallucinations. It has no concept of truth or falsehood.

        • genrilz1 天前
          I think what you are saying here is that because it has no "concept" (I'll assume that means internal model) of truth, there is no possible way of improving the truthiness of an LLM's outputs.

          However, we do know that LLMs possess viable internal models, as I linked to in the post you are responding to. The OP paper notes that the probes it uses find the strongest signal of truth (where truth is defined by whatever the correct answer on each benchmark is) in the middle layers of the model at the activations of these "exact answer" tokens. That is, we have something inside the LLM which statistically correlates with whether the LLM's output matches "benchmark truth". Assuming that you are willing to grant that "concept" and "internal model" are pretty much the same, this sure sounds like a concept of "benchmark truth" at work. If you aren't willing to grant that, I have no idea what you mean by concept.

          If you mean to say that humans have some model of Objective Truth which is inherently superior, I'd argue that isn't really the case. Human philosophers have been arguing for centuries over how to define truth, and don't seem to have come to any conclusion on the matter. In practice, people have wildly diverging definitions of truth, which depend on things like how religious or skeptical they are, what the standards for truth are in their culture, and various specific quirks from their own personality and life experience.

          This paper only measured "benchmark truth" because that is easy to measure, but it seems reasonable to assume that other models of truth exist within them. Given that LLMs are supposed to replicate the words that humans wrote, I suspect that their internal models of truth work out to be some agglomeration (plus some noise) of what various humans think of as truth.
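
          A rough sketch of what "probe the middle layers at the exact-answer tokens" can look like in practice (not the paper's exact procedure; GPT-2 and layer 6 are arbitrary stand-ins, and only the first answer token is grabbed):

            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer

            tok = AutoTokenizer.from_pretrained("gpt2")  # any causal LM with hidden states exposed
            model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
            model.eval()

            def answer_token_activation(prompt: str, full_text: str, exact_answer: str,
                                        layer: int = 6) -> torch.Tensor:
                """Hidden state at the first token of the exact-answer span.

                full_text = prompt + generated answer; exact_answer is the answer
                substring (e.g. "Hartford")."""
                start_char = full_text.index(exact_answer, len(prompt))
                enc = tok(full_text, return_offsets_mapping=True, return_tensors="pt")
                offsets = enc.pop("offset_mapping")[0].tolist()
                with torch.no_grad():
                    out = model(**enc)
                hidden = out.hidden_states[layer][0]  # (seq_len, hidden_dim)
                tok_idx = next(i for i, (s, e) in enumerate(offsets) if s <= start_char < e)
                return hidden[tok_idx]

          A linear probe (e.g. logistic regression) trained on vectors like this, labeled by whether the benchmark scored the answer as correct, is the kind of "concept of benchmark truth" being discussed.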

        • int_19h1 天前
          If that were the case, you couldn't give it a statement and ask whether that statement is true or not, and get back a response that is correct more often than not.
        • empath751 天前
          You can judge the truth and falsity of its output without caring a whit about how it produces those outputs.
          • TZubiri19 小时前
            Koan-like question that may have no answer:

            If language communicates thoughts, thoughts have a relationship with reality, and that relationship might be true or false or something else.

            Then what thought is LLM language communicating, to what reality does it bear a relationship, and what is the truth or falseness of that language?

            To me, LLM-generated sentences have no truth or falsity; they are strings, literally, not thoughts.

            Take the simple "user: how much is two plus two? assistant: two plus two is four". It may seem trivial, but how do we ascertain that that statement maps to 2+2=4? Do you make a leap of faith, or argue that the word "plus" maps to the adding function? What about "is", does it map to equality? Even if they are the same tokens as in "water is wet" (where wet is not water)? Or are we arguing that the truthfulness lies in the embedding interpretation? Where now tokens and strings merely communicate the multidim embedding space, which could be said to be a thought, and we are mapping some of the vectors in that space as true, and some as false?

            • genrilz5 小时前
              A part of an answer:

              Let's assume LLMs don't "think". We feed an LLM an input and get back an output string. It is then possible to interpret that string as having meaning in the same way we interpret human writing as having meaning, even though we may choose not to. At that point, we have created a thought in our heads which could be true or false.

              Now let's talk about calculators. We can think of calculators as similar to LLMs, but speaking a more restricted language and giving significantly more reliable results. The calculator takes a thought converted to a string as input from the user, and outputs a string, which the user then converts back into a thought. The user values that the output string creates a thought which has a higher truthiness. People don't like buggy calculators.

              I'd say one can view an LLM in exactly the same way, just that they can take a much richer language of thoughts, but output significantly buggier results.

      • TZubiri19 小时前
        Hallucination are errors, bugs.

        You can't fix bugs as if they were one thing.

        Imagine if someone tried to sell you a library that fixes bugs.

        • genrilz7 小时前
          You might not be able to sell someone a library that fixes all bugs, but you can sell (or give away) software systems that reduce the number of bugs. Doing that is pretty useful.

          Examples include linters, fuzzers, testing frameworks, and memory-safe programming languages (as in Rust, but also as in any language with a GC). All these things reduce the number of bugs in the final product by giving you a way to detect them (except for memory-safe languages, which just eliminate a class of bugs). The paper is advertising a method to detect whether a given output is likely to be affected by a "bug", along with a taxonomy of the symptoms of such bugs. The paper doesn't provide a way to fix those, and hallucinations don't necessarily have a single cause. Some hallucinations might be fixed by contextual calibration [0], others might be fixed by adding more training data similar to the wrong example.

          In any case, you need to find the bad outputs before you can perform any fixes. Because LLMs tend to be used to produce "fuzzy" outputs with no single right answer, traditional testing frameworks and the like aren't always applicable.

          [0] https://learnprompting.org/docs/reliability/calibration
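
          For the curious, here is a minimal sketch of what the contextual calibration in [0] boils down to. The label_probs helper is hypothetical, standing in for however you read per-label probabilities out of your model, not any particular library's API; the correction simply divides out the bias the model shows on a content-free input:

              import numpy as np

              LABELS = ["positive", "negative"]

              def label_probs(prompt: str) -> np.ndarray:
                  # Hypothetical: return P(label | prompt) for each label, summing to 1.
                  raise NotImplementedError("wire this up to your model's logits")

              def calibrated_prediction(real_prompt: str, content_free_prompt: str) -> str:
                  # 1. Measure the model's inherent bias on a content-free input (e.g. "N/A").
                  p_cf = label_probs(content_free_prompt)
                  # 2. Build a diagonal correction that divides that bias out.
                  W = np.diag(1.0 / p_cf)
                  # 3. Apply the correction to the real prediction and renormalize.
                  q = W @ label_probs(real_prompt)
                  q /= q.sum()
                  return LABELS[int(np.argmax(q))]

          The point is just that this kind of fix operates on the output distribution, not on the weights, so it only addresses one particular flavor of bug.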

          • TZubiri7 小时前
            Yeah, for sure, but the claim in the article is something like "we found the line in the compiler code that causes bugs" or "we found the bytes in the compiled object that cause bugs".

            It's being pitched as a panacea.

            • genrilz6 小时前
              To me the claims in the article read something like "we have found a way to identify execution paths in some common compiler architecture (which, in the case of LLMs, is the transformer architecture) that are often but not always associated with buggy code". This seems like a reasonable claim to make.
              • genrilz6 小时前
                Additionally, I think you may be suspecting research malpractice. Obviously I don't have insider knowledge, but I would note that the idea of training probes on the middle layers of the model wasn't theirs: this paper cites other papers that already did exactly that. The contribution of this paper is simply that focusing on the middle layers at certain "critical tokens" gives a better signal than checking the middle layers at every token.
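
                For concreteness, here is a minimal sketch of the kind of probe those papers train, assuming you have already dumped middle-layer activations at one chosen token position per example along with correctness labels (the file names below are hypothetical placeholders):

                    import numpy as np
                    from sklearn.linear_model import LogisticRegression
                    from sklearn.model_selection import train_test_split

                    # Hypothetical inputs:
                    #   hidden_states: (n_examples, d_model) middle-layer activations at the chosen token
                    #   labels:        (n_examples,) 1 if the model's answer was judged correct, else 0
                    hidden_states = np.load("hidden_states_layer_L.npy")
                    labels = np.load("answer_correct.npy")

                    X_train, X_test, y_train, y_test = train_test_split(
                        hidden_states, labels, test_size=0.2, random_state=0)

                    probe = LogisticRegression(max_iter=1000)  # a simple linear probe
                    probe.fit(X_train, y_train)
                    print("held-out accuracy:", probe.score(X_test, y_test))

                Held-out accuracy well above the base rate is the kind of signal these probing papers report; note that it says nothing about how to fix the underlying errors.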

                It's of course possible that this paper in particular is fraudulent, but note that there is a whole field of research making the same basic claim as this paper, so this isn't some one-off thing. A reasonable number of people from different institutions would need to be in on it for the entire field to be fraudulent.

                Alternatively, I think you may be objecting to the use of the word "truthfulness" in the abstract of the paper, because you seem to think that only human thoughts can possibly have a true or false value. I'm not actually going to object to the idea that only human thoughts can be true or false, but as in the response I wrote to your koan comment, the user can interpret the LLM's output, which gives the user's thought a true or false value.

                In this case, philosophically, you can think of this paper as trying to find cases where the LLM outputs strings that the user interprets as false. I think the authors of the paper are probably thinking about true or false more as a property of sentences, and thus a thing mere strings can possess regardless of how they are created. This is also a philosophically valid way to look at it, but differs from your view in a way that possibly made you think their claims absurd.

    • topspin1 天前
      > the mere suggestion that hallucinations can be fixed by tweaking some variable or fixing some bug

      That "suggestion" is fictional: they haven't suggested this. What they offer is a way to measure the confidence a particular model might have in the product of the model. Further, they point out that there is no universal function to obtain this metric: different models encode it in differently.

      Not exactly a "cures cancer" level claim.

      • TZubiri19 小时前
        If you could measure hallucinations, then you would include that measure as an eval parameter of the search function.

        But even if you could "detect it" but not cure it, it's still a braindead take. Sorry

        • genrilz7 小时前
          This method uses "critical tokens", which I don't think you can detect until after you've generated an entire response. Using them as part of the search function seems infeasible for long outputs, though technically possible. Using it on single-token outputs seems eminently feasible and like a cool research direction.

          I think the paper itself demonstrates that the model has something internally going on which is statistically related to whether its answer is correct on a given benchmark. Obviously, the LLM will not always be perfectly accurate about these things. However, let's say you are using an LLM to summarize sources. There's no real software system right now that signals whether or not a given summary is correct. You could use this technique to train probes that predict whether a human would agree that the summary is correct, and then flag outputs where the probes predict disagreement for human review (sketched below). This is a much less expensive way to detect issues with your LLM than asking a human to review every single output.

          While we don't have great methods for "curing it", we do have some. As I mentioned in a sibling post, contextual calibration and adding/adjusting training data are both options. If you figure out the bug was due to RAG doing something weird, you could adjust your RAG sources/chunking. Regardless, you can't put any human thought into curing bugs that you haven't detected.
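
          Concretely, once such a probe is trained (see the sketch in my other comment), the review-routing step is just a threshold on its score. The 0.7 cutoff and the idea of reusing the middle-layer activation as the feature are assumptions for illustration, not anything taken from the paper:

              import numpy as np

              def needs_human_review(probe, hidden_state: np.ndarray, threshold: float = 0.7) -> bool:
                  # Probability, according to the probe, that the output is correct,
                  # given the middle-layer activation at the chosen critical token.
                  p_correct = probe.predict_proba(hidden_state.reshape(1, -1))[0, 1]
                  # Low probe confidence -> route this output to a human reviewer.
                  return p_correct < threshold

          In a summarization pipeline you would then surface only the flagged outputs for review instead of paying a human to check every single one.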

  • jessfyi1 天前
    The conclusions reached in the paper and the headline differ significantly. Not sure why you took a line from the abstract when, further down, the paper notes that only certain facets of "truthfulness" are encoded and that "truth" as a concept is multifaceted. It further notes that LLMs can encode the correct answer and still consistently output the incorrect one, with strategies mentioned in the text to potentially reconcile the two, but as of yet no real concrete solution.
  • mdp20211 天前
    Extremely promising: it recognizes that the worth is to be found in the intermediates, which contain much more than the single final output.
  • manmal1 天前
    That would be truthfulness to the training material, I guess. If you train on Reddit posts, it’s questionable how true the output really is.

    Also, wouldn't 100% truthfulness then amount to plagiarism?

    • amelius1 天前
      > That would be truthfulness to the training material, I guess. If you train on Reddit posts, it’s questionable how true the output really is.

      Maybe it learns to see when something is true, even if you don't feed it true statements all the time (?)

  • PoignardAzur1 天前
    The HN discussions on these kinds of articles are so annoying.

    A third of the discussion follows a pattern of people re-asserting their belief that LLMs can't possibly have knowledge and almost bragging about how they'll ignore any evidence pointing in another direction. They'll ignore it because computers can't possibly understand things in a "real" way and anyone seriously considering the opposite must be deluded about what intelligence is, and they know better.

    These discussions are fundamentally sterile. They're not about considering ideas or examining evidence, they're about enforcing orthodoxy. Or rather, complaining very loudly that most people don't tightly adhere to their preferred orthodoxy.

    • QuantumGood1 天前
      We're in the "the orthodoxies of conventional wisdom have been established, and only a small percentage of people think beyond that" stage?
  • ldjkfkdsjnv1 天前
    There is a theory that AI will kill propaganda and false beliefs. At some point, you cannot force every model to carry the same bias. Scientific and societal truths will be readily spoken by the machine god.
    • professor_v1 天前
      I'm extremely skeptical about this; I once believed the internet would do something similar, and it seems to have done exactly the opposite.
      • topspin1 天前
        Indeed. A model is only as good as its data. Propagandists have no difficulty grooming inputs. We have already seen high profile cases of this with machine learning.
      • mdp20211 天前
        Completely different things.

        "Will the ability to let everyone express increase noise?" // Yes

        "Will feeding all available data to a processor reduce noise?" // Probably

    • int_19h1 天前
      Orwell wrote this back in 1944:

      "Everywhere the world movement seems to be in the direction of centralised economies which can be made to ‘work’ in an economic sense but which are not democratically organised and which tend to establish a caste system. With this go the horrors of emotional nationalism and a tendency to disbelieve in the existence of objective truth because all the facts have to fit in with the words and prophecies of some infallible fuhrer. Already history has in a sense ceased to exist, ie. there is no such thing as a history of our own times which could be universally accepted, and the exact sciences are endangered as soon as military necessity ceases to keep people up to the mark. Hitler can say that the Jews started the war, and if he survives that will become official history. He can’t say that two and two are five, because for the purposes of, say, ballistics they have to make four. But if the sort of world that I am afraid of arrives, a world of two or three great superstates which are unable to conquer one another, two and two could become five if the fuhrer wished it. That, so far as I can see, is the direction in which we are actually moving, though, of course, the process is reversible."

      So yeah, you can't force all models to have all the biases that you want them to have. But you most certainly can limit the number of such models and restrict access to them. It's not really any different from how totalitarian societies have treated science in general.

  • z3c01 天前
    Could it be that language patterns themselves embed truthfulness, especially when that language is sourced from forums, wikis, etc.? While I know plenty of examples exist to the contrary (propaganda, advertising, disinformation, etc.), I don't think it's too optimistic to assert that most people engage in language in earnest, and thus most language is an attempted conveyance of truth.