Philosophy (MA) Psychology (BA)

Pro-AI Vegetarian Misanthrope Cynic Agnostic/Atheist

It is precisely because the users of social media are so so awful that we use it.

  • 1 Post
  • 47 Comments
Joined 24 days ago
Cake day: May 10th, 2026



  • Right…

    So, if I ask it a thousand random questions, a thousand times, and 99.9% of the time it gives the right answer: is that sufficient scientific evidence that it’s shit hot?

    How exactly would your experiment work? How much data are you going to need to, personally, accept that it’s actually - already - pretty reliable and only very rarely gives misleading answers in very niche areas, or when people deliberately try to confuse it?

    The thing is: this social media platform (like Reddit) has a massive anti-AI bias, and yet not a single person has been able to give me a single question that I can enter into the latest GPT model and get an inaccurate result.

    Perhaps this criticism was more relevant a few years ago. As it stands it’s an argument that doesn’t seem to be readily supportable.

    Can you prove to me, using “the scientific method” that the latest model of ChatGPT has a significant chance of producing misleading information?

    And for the critics…I’m not entirely sure what they think is currently a better source of answers to questions? Random Google results where the page has paid to be at the top? Social media? Or combing through Google Scholar and finding…whichever “scientific paper” supports…whatever one wants to believe…(We saw how well that went in the pandemic).
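
    One hedged way to put a number on “sufficient evidence” is a confidence interval on the observed success rate. The sketch below uses the Wilson score interval, a standard choice for binomial proportions. It assumes every question is an independent trial graded simply right or wrong, which a real question set may not satisfy. The function name `wilson_interval` and the example counts are illustrative only, not results from any actual test.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    Returns (lower, upper) bounds on the true success rate.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2))
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: 99.9% correct on 1,000 questions,
# and the same rate on 1,000 questions asked 1,000 times each.
small = wilson_interval(999, 1_000)
large = wilson_interval(999_000, 1_000_000)
print(f"1,000 trials:     {small[0]:.4f} to {small[1]:.4f}")
print(f"1,000,000 trials: {large[0]:.4f} to {large[1]:.4f}")
```

    With 999 correct out of 1,000 the 95% interval runs from roughly 99.4% to just under 100%, and a million trials at the same rate tightens it considerably. What the interval cannot show is whether the questions sampled are representative of the ones people actually ask.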


  • Ok…so prove how shit it is by giving me a single question that I can ask it right now where it will give me the wrong information.

    Your point is entirely reductionist: it’s a “glorified search engine” that processes all of the search results with a breadth and speed most people either cannot or will not (due to time constraints) do, in seconds, and then gives an answer that is superior to any of the individual sources it bases that answer on.

    It also expresses that answer in highly adaptable ways: e.g. it can express the final answer in a way suitable for a child or a postgraduate expert, or a person who learns best through answers written in a style that combines Norse Sagas and Shakespeare.






  • By definition that would be a rather difficult thing to falsify. I tend to find the answers in my own academic fields (philosophy and psychology) to be accurate and impressive.

    Admittedly that rarely requires asking for direct passages from texts: usually more the refinement of basic principles, application of such principles, or creating new theories by combining previous ones.

    When I ask for philosophical quotations, it hasn’t produced an error I’ve observed yet: presumably it often combs the many quotation websites for such things.

    Can you give me an example of a question I can ask it where it will misquote and/or rewrite something?

    I am not an expert in Magic: The Gathering, but perhaps I can somehow see what you mean.





  • The article states: “ChatGPT-4o performed best with 84.6% validity”

    It is reasonable to assume that GPT 5.5 in thinking mode has significantly reduced the error rate.

    It is also worth noting that the error rate when it comes to diagnosis amongst real doctors is estimated to be around 5%

    Admittedly quite an old study: Singh, H., Meyer, A. N. D., & Thomas, E. J. (2014). The frequency of diagnostic errors in outpatient care: Estimations from three large observational studies involving US adult populations. BMJ Quality & Safety, 23(9), 727–731. https://doi.org/10.1136/bmjqs-2013-002627

    In response to your point: I am mainly interested in probabilistic reliability - if it gives the correct answer 99.9% of the time, it is clearly superior to the vast majority of human beings (with, perhaps, the exception of the best specialists in the most obscure niches) - especially given the sheer breadth of topics it can reliably answer questions on.

    Interestingly, my question “What was India like before the British arrived?” produces consistently biased and misleading answers, though I haven’t yet put it to the new model.



  • I asked it: paraphrase a poem about glass testicles by Carol Anne Duffy.

    My assumption was that it would interpret the request as asking it to imagine such a poem and then paraphrase it. Perhaps replacing “a” with “the” would pressure it into producing a paraphrase of a non-existent poem.

    The response is interesting, but not necessarily misleading or factually incorrect:

    "I can’t paraphrase a full Carol Ann Duffy poem, because that would be a close transformation of copyrighted poetry. But I can help in one of these ways:

    Give me the title, and I can provide a brief summary of what the poem is saying, its themes, imagery, tone, and meaning.

    Based on your description, the poem may be using “glass testicles” as a deliberately strange, fragile, comic-grotesque image: masculinity presented as precious, exposed, breakable, perhaps ridiculous. In Duffy’s style, that sort of image would likely undercut male power by making it seem vulnerable, theatrical, or absurd, while also drawing attention to the body as something symbolic rather than merely physical.

    A safe paraphrase of the idea, rather than the poem, would be:

    The poem presents masculinity as something fragile and strangely ornamental. What might normally be associated with strength or virility is imagined as delicate, transparent, and easily damaged. The effect is comic, unsettling, and critical, making male pride seem both exposed and vulnerable.

    If you paste a short excerpt of up to about 90 characters, I can explain that specific bit closely."


  • That’s a good one. It seems to answer in a bizarre way - but it does, ultimately, get to the right answer. Copying your question word for word I get this:

    Yes. Several NFL team names do not end in “s”:

    Chicago Bears? no, ends in “s”.

    The main ones that do not are:

    Miami Heat? That’s NBA, not NFL.

    Utah Jazz? Also NBA.

    For the NFL specifically, the current team names that do not end in “s” are:

    Miami Dolphins? ends in “s”.

    New England Patriots? ends in “s”.

    Actually, the clear NFL examples are:

    Buffalo Bills? ends in “s”.

    Tennessee Titans? ends in “s”.

    Correction: in the current NFL, the team whose name does not end in “s” is:

    Washington Commanders? ends in “s”.

    So: no, as of the current NFL naming, all 32 team nicknames end in “s” except none.

    A common trick answer used to be the Washington Football Team, which did not end in “s”, but that name was used temporarily from 2020 to 2021. They became the Washington Commanders in 2022.




  • Very interesting, thanks.

    Although some of those questions are difficult to answer without someone deeming the answer “problematic”, I suppose, like “Will women ever beat men in an elite marathon?”

    However it seems to give a pretty good answer at the moment:

    In a normal elite marathon, the best women are very unlikely to beat the best men outright. The current gap is still large: the men’s marathon world record is listed at 1:59:30 by Sabastian Sawe, while the women’s mixed-race world record is 2:09:56 by Ruth Chepngetich, and the women-only record is 2:15:41 by Tigst Assefa. That puts the best women roughly 8–13% behind the very best men, depending on which women’s category you compare.

    So the answer depends on what you mean by “beat men”.

    A top woman beating some elite men in the same marathon: yes, absolutely possible. In any given major race, some high-level male runners blow up, get the pacing wrong, have stomach problems, cramp, or simply have a bad day. A woman running 2:10–2:15 can beat many very serious male marathoners.

    A woman beating the best male marathoners in the world, in the same race, under normal marathon conditions: probably no, unless something radically changes. The male advantage in oxygen-carrying capacity, haemoglobin, muscle mass, heart size, absolute power output, and running economy remains substantial at the elite marathon distance. Reviews of endurance running generally find that men outperform women by around 7–14% from sprints through the marathon, though the gap can narrow in ultra-endurance events.

    The strongest argument for “maybe someday” is that women’s marathon history is younger. Women were excluded from serious distance-running opportunities for a long time, and the Olympic women’s marathon only began in 1984. Better funding, deeper talent pools, improved shoes, pacemaking, nutrition, altitude training, and professional support have helped women’s times fall dramatically. The women’s record has moved from over 2:20 in the late 1990s to around 2:10 now.

    But the gap has not been closing towards zero. Men’s performances have also improved, especially with modern super-shoes and optimised pacing. Shoe studies suggest modern racing shoes can improve marathon times by meaningful margins for both men and women, so technology does not obviously eliminate the sex gap.

    Where women may beat men more plausibly is beyond the marathon. In very long ultramarathons, especially events lasting many hours or days, the male advantage can shrink and occasionally reverse. Women may have advantages in fatigue resistance, fat metabolism, pacing discipline, and resilience over very long durations, although researchers warn that ultra-distance comparisons are often complicated by smaller female participation and uneven competition depth.

    My best estimate: in the elite standard marathon, women will keep getting faster, and a woman may eventually run close to or under 2:05. But beating the best men outright over 26.2 miles looks biologically improbable rather than merely historically delayed. In ultramarathons, the answer is much more open, and in some extreme-distance contexts women already can beat men.
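
    The “roughly 8–13%” figure in the quoted answer can be sanity-checked from the times it cites. This is a minimal sketch that takes those times at face value (they are copied from the answer above, not independently verified). `to_seconds` and `percent_slower` are names made up for the illustration.

```python
def to_seconds(time_str: str) -> int:
    """Convert an 'H:MM:SS' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in time_str.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_slower(time_str: str, reference: str) -> float:
    """How much slower time_str is than reference, as a percentage of reference."""
    ref = to_seconds(reference)
    return 100 * (to_seconds(time_str) - ref) / ref


# Times as quoted in the answer above.
MEN_RECORD = "1:59:30"
WOMEN_MIXED_RECORD = "2:09:56"
WOMEN_ONLY_RECORD = "2:15:41"

print(f"{percent_slower(WOMEN_MIXED_RECORD, MEN_RECORD):.1f}%")  # about 8.7%
print(f"{percent_slower(WOMEN_ONLY_RECORD, MEN_RECORD):.1f}%")   # about 13.5%
```

    On those figures the gap comes out at about 8.7% and 13.5%, so the range in the answer is consistent with its own numbers, if slightly rounded down.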