Posts tagged 'LLM'


List: LLM (2)

huginn muninn

[Paper Review] DeepSeek-V3 Technical Report (2024) - 1

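https://arxiv.org/pdf/2412.19437 DeepSeek built a state-of-the-art AI model using only H800s, without high-end US chips like the H100. Training an LLM normally takes an enormous amount of compute and memory, but they reportedly cut the compute cost sharply by using MLA (Multi-Head Latent Attention) and an MoE (Mixture of Experts) architecture. Thanks to that, a large-scale model can now be run at low cost, and they even released it as open source, wow! Thinking of the other companies that keep their code locked away, it feels a bit satisfying lol. On top of that, it seems like one step closer to solving the LLM black-box problem, so I'm pretty excited. Right now the US is in an uproar over DeepSeek and so is the stock market, and on X and Threads the..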

Natural Language Processing 2025. 1. 30. 20:56

How Is ChatGPT’s Behavior Changing over Time? (Subtitle: ChatGPT's performance is getting worse?)

A piping-hot report released on July 18, 2023. https://arxiv.org/abs/2307.09009 How is ChatGPT's behavior changing over time? GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) s arxiv.org Recently, GPT-3.5, GPT-4, and the like..

Paper Review 2023. 7. 27. 22:45

