I verified ChatGPT against Bard, Claude, and Copilot – this AI provided the most confidently incorrect answers

Last updated: 2023/12/12 at 11:54 PM

Generative artificial intelligence (AI) is notoriously prone to factual errors. So, what do you do when you’ve asked ChatGPT to generate 150 presumed facts and you don’t want to spend an entire weekend confirming each by hand?

Contents
  • Anthropic Claude
  • Copilot… or nopilot?
  • Bard
  • ChatGPT
  • Conclusions and cautions

Well, in my case, I turned to other AIs. In this article, I’ll explain the project, consider how each AI performed in a fact-checking showdown, and provide some final thoughts and cautions if you also want to venture down this maze of twisty, little passages that are all alike.

So here’s the thing. If GPT-4, the OpenAI large language model (LLM) used by ChatGPT Plus, generated the fact statements, I wasn’t entirely convinced it should be checking them. That’s like asking high school students to write a history paper without using any references, and then self-correct their work. They’re already starting with suspect information — and then you’re letting them correct themselves? No, that doesn’t sound right to me.

But what if we fed those facts to the LLMs inside other AIs? Google’s Bard and Anthropic’s Claude each have their own LLM. Bing uses GPT-4, but I figured I’d test its responses just to be a completionist. As you’ll see, I got the best feedback from Bard, so I fed its responses back into ChatGPT in a round-robin perversion of the natural order of the universe.

Anthropic Claude

Claude uses the Claude 2 LLM, which also powers Notion’s AI implementation. I provided it with a PDF containing the full set of facts (without pictures). Overall, Claude found the fact list to be mostly accurate, but it offered clarifications for three items. Because the ChatGPT facts were limited in length, the fact descriptions lost some nuance, and Claude took issue with that lack of nuance. In general, it was an encouraging response.

Copilot… or nopilot?

Next up was Microsoft’s Copilot, the renamed Bing Chat AI. Copilot doesn’t allow PDFs to be uploaded, so I attempted to paste in the text of the facts for all 50 states. This approach failed immediately because Copilot only accepts prompts of up to 2,000 characters.
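One way to work within a prompt limit like this is to split the fact list into several smaller prompts. The article does not say how the text was actually divided, so the following is only a minimal sketch: it assumes the facts are available as one text entry per state, and it reuses the 2,000-character figure and the instruction text from this section. The names `build_prompts`, `INSTRUCTION`, and `MAX_CHARS` are hypothetical, and the entries in the usage example are placeholders rather than the real fact list.

```python
# Instruction text taken from the prompt quoted in this section.
INSTRUCTION = (
    "The following text contains state names followed by three facts for "
    "each state. Please examine the facts and identify any that are in "
    "error for that state."
)
MAX_CHARS = 2000  # prompt limit reported in the article


def build_prompts(entries, instruction=INSTRUCTION, max_chars=MAX_CHARS):
    """Group entries into prompts that each stay within max_chars."""

    def render(batch):
        # Instruction first, then one entry per paragraph.
        return "\n\n".join([instruction] + batch)

    prompts, batch = [], []
    for entry in entries:
        # An entry that cannot fit even on its own is an error.
        if len(render([entry])) > max_chars:
            raise ValueError("a single entry does not fit within the limit")
        # Close the current batch if adding this entry would overflow it.
        if batch and len(render(batch + [entry])) > max_chars:
            prompts.append(render(batch))
            batch = []
        batch.append(entry)
    if batch:
        prompts.append(render(batch))
    return prompts


if __name__ == "__main__":
    # Placeholder entries, not the real fact list from the article.
    entries = [f"Entry {i}: fact one; fact two; fact three" for i in range(1, 51)]
    for number, prompt in enumerate(build_prompts(entries), start=1):
        print(number, len(prompt))
```

Each generated prompt repeats the instruction so that it can be pasted on its own. Note that `len` counts Unicode code points, which may not match how a given chat interface counts characters, so leaving some headroom below the limit is sensible.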

I asked Copilot the following: “The following text contains state names followed by three facts for each state. Please examine the facts and identify any that are in error for that state.”

It essentially repeated the fact data I had asked it to check. When I tried a more forceful prompt, it again returned the same data I had asked it to verify. This output seemed odd, considering Copilot uses the same LLM as ChatGPT. Clearly, Microsoft has tuned it differently than ChatGPT. I gave up and moved on to Bard.

Bard

Google recently announced its new Gemini LLM. As I don’t have access to Gemini, I ran these tests on Google’s PaLM 2 model. Compared to Claude and Copilot, Bard excelled, or, in a more Shakespearian manner, it “doth bestride the narrow world like a Colossus.”

However, Bard’s results were not flawless. I fed Bard’s list back to ChatGPT, and it found two discrepancies, in the Alaska and Ohio answers. Bard’s fact-checking appears impressive, but it often misses the point and gets things just as wrong as any other AI.

Let’s consider Nevada and Area 51 as an example. ChatGPT mentioned, “Top-secret military base, rumoured UFO sightings.” Bard attempted to clarify, asserting, “Area 51 isn’t merely rumoured to have UFO sightings. It’s a genuine top-secret military facility, and its purpose is unknown.” Essentially, the two convey the same information. Bard simply did not account for the loss of nuance that comes with a tight word limit.

Another instance where Bard criticized ChatGPT without grasping the context concerned Minnesota. Yes, Wisconsin boasts numerous lakes too, but ChatGPT didn’t assert that Minnesota has the most lakes. It simply labelled Minnesota as the “Land of 10,000 lakes,” a common slogan for the state.

Kansas became another point of contention for Bard. ChatGPT stated, “Home to the geographic centre of the contiguous US.” Bard argued it was South Dakota, which is valid when Alaska and Hawaii are included. However, ChatGPT specified “contiguous,” and in that context, the honour goes to a location near Lebanon, Kansas.

ChatGPT

Right away, I could tell Bard got one of its facts wrong: Alaska is far bigger than Texas. So, I wanted to see if ChatGPT could fact-check Bard’s claims.

ChatGPT took issue with Bard’s erroneous claim that Texas is the biggest state. It also had a bit of a tizzy over Ohio vs. Kansas as the birthplace of aviation, a topic more controversial than most schools teach. It’s commonly accepted that Wilbur and Orville Wright flew the first aircraft near Kitty Hawk, North Carolina, although they built their Wright Flyer in Dayton, Ohio.

Conclusions and cautions

Let’s address something upfront: if you’re submitting a paper or a document where factual accuracy is crucial, it’s imperative to conduct your own fact-checking. Otherwise, your aspirations, akin to Texas, might find themselves overshadowed by an Alaska-sized problem.

As the tests showed, results like Bard’s can look impressive yet be entirely or partially inaccurate. On the whole, having various AIs cross-check each other was intriguing, and it’s a process I intend to explore further. However, the only thing the results firmly established was how inconclusive they were.

Copilot threw in the towel entirely, expressing a desire to return to its nap. Claude raised concerns about the nuance in a few responses. Bard came down hard on a multitude of answers, proving that, apparently, to err is not confined to human nature but extends to AI as well.

In conclusion, let me borrow the words of the venerable Bard himself and proclaim, “Confusion now hath made his masterpiece!”
