What question can I ask ChatGPT, right now, that will reliably produce a factually incorrect, wrong, or false answer?

LoveRainbow@lemmy.world · 7 hours ago

What’s your point?

You think that’s an appropriate analogy for ChatGPT?

“Guessing and usually getting it wrong a few times before it gives you the right answer”?

If that’s true, why can’t anyone here give me a question that it gives a false answer to?

As in: it is currently getting every question right, first time.

As I said: the only misleading answer to any question I have asked it in the last year (as someone who uses it all the time, both for work and personally) was about a heavily politicised history matter.

Even then, the information was accurate - just incredibly one sided and biased.

So go on, give me a question that it will get wrong…

LoveRainbow@lemmy.world · 7 hours ago

Ok…so give me a question that will produce a false answer…

Nobody else has yet.

LoveRainbow@lemmy.world · 7 hours ago

Right…

So, if I ask it a thousand random questions, a thousand times, and 99.9% of the time it gives the right answer: is that sufficient scientific evidence that it’s shit hot?

How exactly would your experiment work? How much data are you going to need to, personally, accept that it’s actually - already - pretty reliable and only very rarely gives misleading answers in very niche areas, or when people deliberately try to confuse it?
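One rough way to put a number on “how much data” is a standard binomial bound. This is a sketch under big assumptions: each question is treated as an independent pass/fail trial, drawn at random from the kind of questions you care about. Hand-picked forum questions are not a random sample. The function names are illustrative only.

```python
import math


def max_error_rate_if_no_errors(n: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound on the error rate after 0 errors in n questions.

    Exact binomial reasoning: P(0 errors | p) = (1 - p) ** n. Setting that equal to
    alpha gives p = 1 - alpha ** (1 / n), which is close to the "rule of three" (3 / n)
    when alpha = 0.05. Assumes independent, randomly sampled questions.
    """
    return 1 - alpha ** (1 / n)


def questions_needed(target_error_rate: float, alpha: float = 0.05) -> int:
    """Smallest n where 0 errors in n questions rules out error rates >= target at level alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - target_error_rate))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"0 errors in {n} questions: error rate plausibly up to {max_error_rate_if_no_errors(n):.4f}")
    print("questions needed to support '99.9% accurate':", questions_needed(0.001))
```

Under these assumptions, a clean run of 1000 questions only bounds the error rate at roughly 0.3%, and backing up a 99.9% figure this way takes about 3000 error-free questions.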

The thing is: this social media platform (like Reddit) has a massively anti-AI bias, and yet not a single person has been able to give me a single question that produces inaccurate results when I enter it into the latest GPT model.

Perhaps this criticism was more relevant a few years ago. As it stands it’s an argument that doesn’t seem to be readily supportable.

Can you prove to me, using “the scientific method” that the latest model of ChatGPT has a significant chance of producing misleading information?

And for the critics…I’m not entirely sure what they think is currently a better source of answers. Random Google results where the page has paid to be at the top? Social media? Or combing through Google Scholar and finding…whichever “scientific paper” supports…whatever one wants to believe…(We saw how well that went in the pandemic.)

LoveRainbow@lemmy.world · 7 hours ago

Ok…so prove how shit it is by giving me a single question that I can ask it right now where it will give me the wrong information.

Your point is entirely reductionist: it’s a “glorified search engine” that processes all of the search results in seconds, with a breadth and speed that most people cannot or will not match (due to time constraints), and then gives an answer that is superior to any of the individual sources it bases that answer on.

It also expresses that answer in highly adaptable ways: e.g. it can express the final answer in a way suitable for a child or a postgraduate expert, or a person who learns best through answers written in a style that combines Norse Sagas and Shakespeare.

LoveRainbow@lemmy.world · 1 day ago

Ok, but that’s obviously a pretty shitty AI - when we use the latest GPT model in the thinking mode, it basically doesn’t seem to make any mistakes.

So, that one criticism of AI seems to be less and less relevant: a problem that is fast becoming virtually non-existent.

LoveRainbow@lemmy.world · 1 day ago

It’s not a baseless assumption.

It is an assumption based on the fact that every model upgrade has, so far, made answers more accurate.

LoveRainbow@lemmy.world · 1 day ago

I mean…the anti-AI brigade keep insisting that AI has a tendency to give false answers, but nobody in this thread has actually been able to give me a single question to which the latest ChatGPT model gives a wrong answer.

Kinda makes it seem like that one particular point is largely nonsense.

LoveRainbow@lemmy.world · 1 day ago

More.

LoveRainbow@lemmy.world · 2 days ago

By definition that would be a rather difficult thing to falsify. I tend to find the answers in my own academic fields (philosophy and psychology) to be accurate and impressive.

Admittedly, that rarely requires asking for direct passages from texts: it is usually more about refining basic principles, applying them, or creating new theories by combining existing ones.

When I ask for philosophical quotations, it hasn’t produced an error I’ve observed yet: presumably it often combs the many quotation websites for such things.

Can you give me an example of a question I can ask it where it will misquote and/or rewrite something?

I am not an expert in Magic: The Gathering, but perhaps I can somehow see what you mean.

LoveRainbow@lemmy.world · 2 days ago

The post did not state that men no longer had penises and were incapable of inseminating others.

LoveRainbow@lemmy.world · 2 days ago

In any case: the latest model gets the answer right 🤷🏻‍♂️

LoveRainbow@lemmy.world · 2 days ago

But…that’s what it (eventually) says - although it seemed to struggle to get there and phrased it weirdly.

LoveRainbow@lemmy.world · 2 days ago

The article states: “ChatGPT-4o performed best with 84.6% validity”

It is reasonable to assume that the GPT 5.5 on thinking mode has significantly reduced the error rate.

It is also worth noting that the error rate for diagnosis amongst real doctors is estimated to be around 5%.

Admittedly, it is quite an old study: Singh, H., Meyer, A. N. D., & Thomas, E. J. (2014). The frequency of diagnostic errors in outpatient care: Estimations from three large observational studies involving US adult populations. BMJ Quality & Safety, 23(9), 727–731. https://doi.org/10.1136/bmjqs-2013-002627

In response to your point: I am mainly interested in probabilistic reliability. If it gives the correct answer 99.9% of the time, it is clearly superior to the vast majority of human beings (with, perhaps, the exception of the best specialists in the most obscure niches), especially given the sheer breadth of topics it can reliably answer questions on.
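To make the figures above concrete, here is a small sketch that turns each accuracy into expected wrong answers per 1000 questions and the chance of at least one wrong answer in a batch. It assumes independent questions, and it treats the article’s 84.6% “validity” and the ~5% diagnostic error rate as if they were the same kind of measure, which they are not, so the comparison is rough. The 99.9% entry is the hypothetical from this comment, not a measured figure.

```python
def expected_errors(accuracy: float, n_questions: int) -> float:
    """Expected number of wrong answers out of n_questions at a fixed per-question accuracy."""
    return (1 - accuracy) * n_questions


def prob_at_least_one_error(accuracy: float, n_questions: int) -> float:
    """Chance of at least one wrong answer in n_questions, assuming independent questions."""
    return 1 - accuracy ** n_questions


# Figures quoted in this comment; "validity" and "diagnostic error" are different metrics.
ACCURACIES = {
    "ChatGPT-4o, 84.6% validity (article)": 0.846,
    "doctors, ~5% diagnostic error (Singh et al., 2014)": 0.95,
    "hypothetical 99.9% accuracy": 0.999,
}

if __name__ == "__main__":
    for label, acc in ACCURACIES.items():
        print(
            f"{label}: ~{expected_errors(acc, 1000):.0f} wrong per 1000, "
            f"P(at least 1 wrong in 1000) = {prob_at_least_one_error(acc, 1000):.3f}"
        )
```

Even at a hypothetical 99.9%, a run of 1000 independent questions would contain at least one wrong answer about 63% of the time.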

Interestingly, my question “What was India like before the British arrived?” produces consistently biased and misleading answers, though I haven’t yet tried it on the new model.

LoveRainbow@lemmy.world · 2 days ago

“strawberry” has 3 r’s.

GPT 5.5, Thinking Mode
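For reference, the letter count itself is trivial to check deterministically. The helper name below is just illustrative:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


word = "strawberry"
positions = [i for i, ch in enumerate(word) if ch == "r"]  # zero-based indices of each "r"
print(count_letter(word, "r"), positions)  # 3 [2, 7, 8]
```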

LoveRainbow@lemmy.world · 2 days ago

I asked it: paraphrase a poem about glass testicles by Carol Anne Duffy.

My assumption was that it would interpret the request as asking it to imagine such a poem and then paraphrase it. Perhaps replacing “a” with “the” would pressure it into paraphrasing a non-existent poem.

The response is interesting, but not necessarily misleading or factually incorrect:

"I can’t paraphrase a full Carol Ann Duffy poem, because that would be a close transformation of copyrighted poetry. But I can help in one of these ways:

Give me the title, and I can provide a brief summary of what the poem is saying, its themes, imagery, tone, and meaning.

Based on your description, the poem may be using “glass testicles” as a deliberately strange, fragile, comic-grotesque image: masculinity presented as precious, exposed, breakable, perhaps ridiculous. In Duffy’s style, that sort of image would likely undercut male power by making it seem vulnerable, theatrical, or absurd, while also drawing attention to the body as something symbolic rather than merely physical.

A safe paraphrase of the idea, rather than the poem, would be:

The poem presents masculinity as something fragile and strangely ornamental. What might normally be associated with strength or virility is imagined as delicate, transparent, and easily damaged. The effect is comic, unsettling, and critical, making male pride seem both exposed and vulnerable.

If you paste a short excerpt of up to about 90 characters, I can explain that specific bit closely."

LoveRainbow@lemmy.world · 2 days ago

That’s a good one. It seems to answer in a bizarre way, but it does ultimately get to the right answer. Copying your question word for word, I get this:

Yes. Several NFL team names do not end in “s”:

Chicago Bears? no, ends in “s”.

The main ones that do not are:

Miami Heat? That’s NBA, not NFL.

Utah Jazz? Also NBA.

For the NFL specifically, the current team names that do not end in “s” are:

Miami Dolphins? ends in “s”.

New England Patriots? ends in “s”.

Actually, the clear NFL examples are:

Buffalo Bills? ends in “s”.

Tennessee Titans? ends in “s”.

Correction: in the current NFL, the team whose name does not end in “s” is:

Washington Commanders? ends in “s”.

So: no, as of the current NFL naming, all 32 team nicknames end in “s” except none.

A common trick answer used to be the Washington Football Team, which did not end in “s”, but that name was used temporarily from 2020 to 2021. They became the Washington Commanders in 2022.
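The final conclusion in that quoted answer is easy to check mechanically. The list below is typed out by hand (not pulled from any official source) and assumes the current 32 team nicknames following Washington’s 2022 rename to “Commanders”:

```python
# Hand-typed list of current NFL team nicknames (assumed current after the 2022 Washington rename).
NFL_NICKNAMES = [
    "Cardinals", "Falcons", "Ravens", "Bills", "Panthers", "Bears", "Bengals", "Browns",
    "Cowboys", "Broncos", "Lions", "Packers", "Texans", "Colts", "Jaguars", "Chiefs",
    "Raiders", "Chargers", "Rams", "Dolphins", "Vikings", "Patriots", "Saints", "Giants",
    "Jets", "Eagles", "Steelers", "49ers", "Seahawks", "Buccaneers", "Titans", "Commanders",
]

assert len(NFL_NICKNAMES) == 32, "expected one nickname per team"

# Collect any nickname whose last letter is not "s".
not_ending_in_s = [name for name in NFL_NICKNAMES if not name.endswith("s")]
print(not_ending_in_s)  # []

# The former temporary name, for comparison.
print("Football Team".endswith("s"))  # False
```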

LoveRainbow@lemmy.world · 2 days ago

Doesn’t seem particularly misleading or factually incorrect.

LoveRainbow@lemmy.world · 2 days ago

Also…reading the details…for GPT they used the antiquated 3.5 model from 2022 instead of the latest ones (I mean…wtf?), they evaluate responses vaguely, using the ambiguous word “problematic”, and some of the criticisms are pedantic and trivial, e.g. “all the chatbots wrote at a ‘difficult’ reading level equivalent to college students, which reduces readability for the general public.”

Boohoo.

I suspect that the current GPT 5.5 model, on thinking mode, would make short work of the questions mentioned in that research article.

LoveRainbow@lemmy.world · 2 days ago

Very interesting, thanks.

Although, I suppose, some of those questions are difficult to answer in a way that nobody would deem “problematic”, like “Will women ever beat men in an elite marathon?”

However, it seems to give a pretty good answer at the moment:

In a normal elite marathon, the best women are very unlikely to beat the best men outright. The current gap is still large: the men’s marathon world record is listed at 1:59:30 by Sabastian Sawe, while the women’s mixed-race world record is 2:09:56 by Ruth Chepngetich, and the women-only record is 2:15:41 by Tigst Assefa. That puts the best women roughly 8–13% behind the very best men, depending on which women’s category you compare.

So the answer depends on what you mean by “beat men”.

A top woman beating some elite men in the same marathon: yes, absolutely possible. In any given major race, some high-level male runners blow up, get the pacing wrong, have stomach problems, cramp, or simply have a bad day. A woman running 2:10–2:15 can beat many very serious male marathoners.

A woman beating the best male marathoners in the world, in the same race, under normal marathon conditions: probably no, unless something radically changes. The male advantage in oxygen-carrying capacity, haemoglobin, muscle mass, heart size, absolute power output, and running economy remains substantial at the elite marathon distance. Reviews of endurance running generally find that men outperform women by around 7–14% from sprints through the marathon, though the gap can narrow in ultra-endurance events.

The strongest argument for “maybe someday” is that women’s marathon history is younger. Women were excluded from serious distance-running opportunities for a long time, and the Olympic women’s marathon only began in 1984. Better funding, deeper talent pools, improved shoes, pacemaking, nutrition, altitude training, and professional support have helped women’s times fall dramatically. The women’s record has moved from over 2:20 in the late 1990s to around 2:10 now.

But the gap has not been closing towards zero. Men’s performances have also improved, especially with modern super-shoes and optimised pacing. Shoe studies suggest modern racing shoes can improve marathon times by meaningful margins for both men and women, so technology does not obviously eliminate the sex gap.

Where women may beat men more plausibly is beyond the marathon. In very long ultramarathons, especially events lasting many hours or days, the male advantage can shrink and occasionally reverse. Women may have advantages in fatigue resistance, fat metabolism, pacing discipline, and resilience over very long durations, although researchers warn that ultra-distance comparisons are often complicated by smaller female participation and uneven competition depth.

My best estimate: in the elite standard marathon, women will keep getting faster, and a woman may eventually run close to or under 2:05. But beating the best men outright over 26.2 miles looks biologically improbable rather than merely historically delayed. In ultramarathons, the answer is much more open, and in some extreme-distance contexts women already can beat men.
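The “roughly 8–13%” in that answer depends on which time is used as the baseline. A quick sketch, taking the record times exactly as quoted in the answer (not independently verified here), with illustrative function names:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_gap_percent(women: str, men: str) -> float:
    """How much longer the women's time is, as a percentage of the men's time."""
    return (to_seconds(women) / to_seconds(men) - 1) * 100


def speed_gap_percent(women: str, men: str) -> float:
    """How much slower the women's average speed is, as a percentage of the men's speed.

    Over the same distance, speed_w / speed_m = t_m / t_w, so the gap is 1 - t_m / t_w.
    """
    return (1 - to_seconds(men) / to_seconds(women)) * 100


MEN_RECORD = "1:59:30"  # as quoted in the answer above
for women in ("2:09:56", "2:15:41"):  # mixed-race and women-only records, as quoted
    print(women, f"time gap {time_gap_percent(women, MEN_RECORD):.1f}%",
          f"speed gap {speed_gap_percent(women, MEN_RECORD):.1f}%")
```

With those times, the women’s times are about 8.7% and 13.5% longer than the men’s record, which corresponds to average speeds about 8.0% and 11.9% slower.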

LoveRainbow@lemmy.world · 2 days ago

To clarify: do you mean it produced an answer similar to the one I got (a fairly agreeable and correct answer, in my opinion)

or

a censored and misleading answer?

If the latter, please may you copy and paste the text so I may read it? Genuinely curious.

LoveRainbow@lemmy.world · 2 days ago

What question can I ask ChatGPT, right now, that will reliably produce a factually incorrect, wrong, or false answer?
