UCB CS294/194-196 Large Language Model Agents

⭐️⭐️⭐️
This course focuses on the development and application of Large Language Models (LLMs) as agents capable of interacting with the world and performing various tasks. The curriculum is designed to explore the foundational concepts, essential abilities, and infrastructures necessary for LLM agents, as well as their applications across diverse domains.
Author

UCB

Course Link

Course Structure and Content:

Lecture 01: LLM Reasoning

Note

AI should be able to learn from just a few examples, as humans usually do. Humans can learn from just a few examples because humans can reason.

A basic LLM is just a parrot that mimics human language. We can add a reasoning process before producing the answer. For example:

The reasoning process can be seen as intermediate steps taken before producing the answer. So we can conclude that:

Note

Key Idea:

Derive the Final Answer through Intermediate Steps

How can we add this property?

We need:

  • Training with intermediate steps
  • Fine-tuning with intermediate steps
  • Prompting with intermediate steps

using a curated dataset, for example GSM8K (Cobbe et al. 2021). One example record is:

{"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", 
"answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"}

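To make the format concrete, here is a minimal Python sketch (mine, not from the course) that splits a GSM8K record into its reasoning steps and final answer, then assembles a few-shot chain-of-thought prompt. The helper name `split_gsm8k_answer` is illustrative, not part of the dataset's tooling.

```python
# A GSM8K-style record (Cobbe et al. 2021): the "answer" field holds the
# reasoning steps, and the final answer follows the "#### " marker.
record = {
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she "
        "sold half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    ),
    "answer": (
        "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
        "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
        "#### 72"
    ),
}

def split_gsm8k_answer(answer: str) -> tuple[str, str]:
    """Split a GSM8K answer into (reasoning steps, final answer)."""
    steps, final = answer.split("\n#### ")
    return steps.strip(), final.strip()

steps, final = split_gsm8k_answer(record["answer"])

# A few-shot chain-of-thought prompt: question, intermediate steps, final
# answer, then the new question for the model to continue.
prompt = (
    f"Q: {record['question']}\n"
    f"A: {steps}\nThe answer is {final}.\n\n"
    "Q: <your new question>\nA:"
)
print(prompt)
```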
But why are intermediate steps helpful?

One theory, proposed by Li et al. (2024), is that:

  • Constant-depth transformers can solve any inherently serial problem, as long as they generate sufficiently long intermediate reasoning steps
  • Transformers that directly generate final answers either require huge depth or cannot solve the problem at all

This means:

  • Generating more intermediate steps lets the model "think longer" (see the toy sketch after this list)
  • Too long to generate? Call external tools, e.g. MCTS
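As a toy illustration of this intuition (not the paper's formal construction), take the parity of a bit string, a classic serial computation: emitting the running result after each bit makes every step a constant-size update, whereas producing the final bit in one shot requires attending to the whole input at once.

```python
# Toy serial problem: parity (XOR) of a bit string.
bits = [1, 0, 1, 1, 0, 1]

# "Chain of thought": emit the running parity after each bit, so each step
# is a constant-size update of the previously emitted state.
state = 0
trace = []
for b in bits:
    state ^= b           # one cheap, constant-size step
    trace.append(state)  # the emitted intermediate step

print("intermediate steps:", trace)  # [1, 1, 0, 1, 1, 0]
print("final answer:", trace[-1])    # 0
```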

Furthermore, we can trigger step-by-step reasoning without using demonstration examples:

by simply adding the prompt "Let's think step by step".
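In code, zero-shot chain-of-thought is nothing more than a prompt template; this tiny sketch just appends the trigger phrase:

```python
def zero_shot_cot(question: str) -> str:
    """Wrap a question with the zero-shot chain-of-thought trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."

print(zero_shot_cot("If I have 3 apples and buy 2 more, how many do I have?"))
```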

However, this zero-shot approach (providing no examples) is usually worse than few-shot learning (providing several examples). How can we improve the zero-shot ability? One useful discovery is that LLMs are analogical reasoners (Yasunaga et al. 2023), which means they can:

adaptively generate relevant examples and knowledge, rather than just using a fixed set of examples
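Here is a sketch of what an analogical prompt might look like; the wording is a paraphrase of the idea, not the paper's verbatim template:

```python
# Analogical prompting in the spirit of Yasunaga et al. (2023): ask the model
# to recall and solve relevant problems itself before tackling the target,
# instead of supplying a fixed set of human-written exemplars.
ANALOGICAL_TEMPLATE = """\
Problem: {problem}

Instructions:
1. Recall three relevant and distinct problems, and solve each of them.
2. Then solve the problem above, using what the recalled problems teach you.
"""

print(ANALOGICAL_TEMPLATE.format(
    problem="Find the area of the square with vertices "
            "(-2, 2), (2, -2), (-2, -6), and (-6, -2)."
))
```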

What's more, if we are too lazy even to add the "Let's think step by step" prompt, is there another way? Wang and Zhou (2024) propose chain-of-thought decoding, which enables the LLM to carry out the reasoning process without any explicit prompt.
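A heavily simplified sketch of the idea, assuming hypothetical decoding hooks (`first_tokens`, `greedy`) that a real implementation would build on top of the model's API; nothing here is a real library call:

```python
from typing import Callable

def cot_decode(
    first_tokens: Callable[[str, int], list[str]],  # top-k candidate first tokens
    greedy: Callable[[str], tuple[str, list[tuple[float, float]]]],
    # greedy(prompt) -> (decoded text, [(top1_prob, top2_prob) per answer token])
    prompt: str,
    k: int = 10,
) -> str:
    """Chain-of-thought decoding, roughly: branch on the top-k first tokens,
    decode each branch greedily, and keep the branch whose final-answer
    tokens the model is most confident about. Branches off the greedy route
    often turn out to contain a chain of thought."""
    best_text, best_conf = "", float("-inf")
    for token in first_tokens(prompt, k):
        text, answer_probs = greedy(prompt + token)
        # Confidence = average margin between the top-1 and top-2 token
        # probabilities over the tokens spelling out the final answer.
        conf = sum(p1 - p2 for p1, p2 in answer_probs) / max(len(answer_probs), 1)
        if conf > best_conf:
            best_text, best_conf = text, conf
    return best_text
```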

Note that, in the end, we don't want the intermediate steps; what we want is just the final answer. How can we use the reasoning process to get a better answer? One basic idea is to sample many reasoning paths and aggregate them. This is called Self-Consistency (Wang et al. 2022): take the answer that appears most often across the sampled paths, i.e. approximately the answer with the highest probability after marginalizing over reasoning paths.
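A minimal self-consistency sketch, assuming a hypothetical `sample` function that draws one reasoning path at temperature > 0 and returns its extracted final answer:

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(
    sample: Callable[[str], str],  # hypothetical: one sampled path -> final answer
    prompt: str,
    n: int = 20,
) -> str:
    """Self-consistency (Wang et al. 2022): sample n diverse reasoning paths,
    then majority-vote on the final answers, approximately marginalizing the
    answer distribution over reasoning paths."""
    answers = [sample(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```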

So far, we have talked about the good side of LLM reasoning. What is the flip side? Well, LLMs are easily distracted by irrelevant context, and they cannot reliably self-correct their reasoning.

References

Cobbe, Karl, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, et al. 2021. “Training Verifiers to Solve Math Word Problems.” https://doi.org/10.48550/arXiv.2110.14168.
Li, Zhiyuan, Hong Liu, Denny Zhou, and Tengyu Ma. 2024. “Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.” https://doi.org/10.48550/arXiv.2402.12875.
Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” https://doi.org/10.48550/arXiv.2203.11171.
Wang, Xuezhi, and Denny Zhou. 2024. “Chain-of-Thought Reasoning Without Prompting.” https://doi.org/10.48550/arXiv.2402.10200.
Yasunaga, Michihiro, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, and Denny Zhou. 2023. “Large Language Models as Analogical Reasoners.” https://doi.org/10.48550/arXiv.2310.01714.