1. 0
  1.  

  2. 2

    On top of this, professional exams, especially the bar exam, notoriously overemphasize subject-matter knowledge and underemphasize real-world skills, which are far harder to measure in a standardized, computer-administered way. In other words, not only do these exams emphasize the wrong thing, they overemphasize precisely the thing that language models are good at.

    This is a bit of an understatement. The bar exam is designed to test not just adherence to the law but fealty to it; lawyers are required to swear an oath to the law itself. This isn't only about liability, but also about producing legally-minded people.

    I suspect that similar alignment issues will arise any time a “general-purpose transformer” (if such a thing exists) is used for a professional task. Professions have standards and ethics, and their admission processes already amount to fine-tuning humans who are already educated; we may have to fine-tune transformers to get sufficiently ethical lawyerbots.
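
    For concreteness, here is a minimal sketch of what such a fine-tuning step might look like, using the Hugging Face transformers library; the model checkpoint, the example data, and the hyperparameters below are placeholders, not a claim about how an actual lawyerbot would be trained.

    ```python
    # Hedged sketch: supervised fine-tuning of a small causal LM on curated
    # professional-ethics Q&A text. Everything below is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for whatever base model one would actually use
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical examples a bar association might curate.
    examples = [
        "Q: May a lawyer reveal client confidences for personal gain? A: No.",
        "Q: Must a lawyer disclose controlling adverse authority? A: Yes.",
    ]

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    for epoch in range(3):
        for text in examples:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            # For causal LM fine-tuning, the labels are the input ids themselves.
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    ```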

    1. 2

      Some people have even been anecdotally testing whether these tools can do peer review.

      Considering that ChatGPT has deep misunderstandings about things like molecular biology embedded in it, this is a complete non-starter. I spent half an hour trying to get it to correct those misunderstandings, and it flat-out kept telling me I was wrong.

      1. 2

        Are these likely to be common misunderstandings that it is amplifying?

        1. 2

          Oh, very much so. It's roughly the relationship between pop science and the understanding of an actual researcher.

      2. 1

        This article talks a bit about how some of the striking results of GPT passing difficult exams may actually be due to it memorizing the answers.
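
        One crude way to probe for that kind of memorization is to check how much verbatim n-gram overlap an exam question has with text the model plausibly trained on. The sketch below is illustrative only; the corpus and questions are placeholders, and real contamination studies use much more careful methods.

        ```python
        # Hedged sketch: flag exam questions whose word n-grams also appear
        # verbatim in a reference corpus, a crude proxy for memorization.
        def ngrams(text: str, n: int = 8) -> set:
            words = text.lower().split()
            return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

        def overlap_ratio(question: str, corpus: str, n: int = 8) -> float:
            q = ngrams(question, n)
            return len(q & ngrams(corpus, n)) / len(q) if q else 0.0

        corpus = "..."       # placeholder: text the model may have seen in training
        questions = ["..."]  # placeholder: exam questions to check
        for q in questions:
            if overlap_ratio(q, corpus) > 0.5:
                print("possible memorization:", q[:60])
        ```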