Special Talk

Chair :
Talk Title : The Hyperscale AI Era: NAVER Cloud's Inference Optimization Strategy and Core Infrastructure Technologies for Securing K-AI Competitiveness

Speaker : Baeseong Park
Email :

Affiliation : NAVER Cloud
Department :

Position : Lead
Date/Time : Jan. 28 (Wed), 2026, 13:30-15:30

Speaker Bio :

[Profile]
• Current) NAVER Cloud, AI Computing Solution, Team Lead
• Former) Samsung Electronics, Samsung Research
• B.S., School of Electronics Engineering, Chungbuk National University
[Expertise]
• LLM Inference Optimization
• AI/LLM Compression
• On-device AI Optimization
[Selected Publications]
• DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation (NeurIPS 2024)
• LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models (ICLR 2024)
• Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression (ICLR 2022)
• BiQGEMM: Matrix Multiplication with Lookup Table for Binary-Coding-based Quantized DNNs (SC20)

Talk Abstract :

Generative AI services are rapidly moving beyond simple Q&A into a reasoning stage, where models solve complex problems step by step, and on to an agent era, where they use tools on their own to carry out tasks. This shift in the service paradigm places demands of an entirely different order on AI semiconductors and infrastructure.
Where raw compute once determined performance, in the agent era, with its massive models and long contexts, memory bandwidth now decides it. Furthermore, because many agents must share context information in real time (KV cache sharing), vast memory capacity and chip-to-chip communication (interconnect) have become the key factors that determine the cost of a service.
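
As a rough illustration of the bandwidth argument above, the sketch below estimates the memory traffic of a single decode step. All numbers are illustrative assumptions (a generic 70B-parameter dense transformer with grouped-query attention and an ~3.3 TB/s HBM figure), not the specifications of any particular NAVER Cloud system.

```python
# Back-of-envelope: why decode speed is bound by memory bandwidth.
# Hypothetical model config for illustration only.
PARAMS = 70e9          # weight count (assumed 70B dense model)
BYTES_PER_PARAM = 2    # FP16/BF16 weights
LAYERS = 80
KV_HEADS = 8           # grouped-query attention
HEAD_DIM = 128
KV_BYTES = 2           # FP16 cache entries

def kv_cache_bytes(batch: int, context_len: int) -> float:
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES * batch * context_len

def decode_bytes_per_token(batch: int, context_len: int) -> float:
    """Each decode step streams all weights once plus the full KV cache."""
    return PARAMS * BYTES_PER_PARAM + kv_cache_bytes(batch, context_len)

# One agent holding a 128K-token context:
traffic = decode_bytes_per_token(batch=1, context_len=128_000)
print(f"bytes moved per decoded token: {traffic / 1e9:.0f} GB")    # ~182 GB

# At an assumed ~3.3 TB/s of HBM bandwidth, throughput is capped at:
HBM_BW = 3.3e12
print(f"bandwidth-bound ceiling: {HBM_BW / traffic:.0f} tokens/s")  # ~18
```

The point of the exercise: at long context lengths a decode step does almost no arithmetic per byte moved, so adding FLOPs does not raise the ceiling; only more bandwidth, a smaller cache (e.g., quantization or compression), or cache reuse does.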
This session analyzes how hardware requirements change across the stages of AI's evolution, from answering to reasoning to agents. In particular, it takes an in-depth look at NAVER Cloud's core inference optimization techniques for tackling decode cost and cache management, the biggest bottlenecks of the agent era, and at its infrastructure strategy for a high-efficiency AI semiconductor ecosystem (K-AI).
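
To make the cache-management problem concrete, here is a minimal, generic sketch of prefix-based KV block sharing between agents, in the spirit of paged-attention-style prefix caching. The class, block size, and hashing scheme are a hypothetical illustration, not NAVER Cloud's actual implementation.

```python
# Minimal sketch: agents that fork from the same prompt reuse KV blocks.
import hashlib
from typing import Dict, List, Tuple

BLOCK = 16  # tokens per KV block (assumed)

class SharedKVCache:
    def __init__(self) -> None:
        self.blocks: Dict[str, int] = {}  # block hash -> reference count

    def _block_hashes(self, tokens: List[int]) -> List[str]:
        """Hash each full block chained with its prefix, so two sequences
        share a block only if their entire prefix matches."""
        hashes, h = [], hashlib.sha256()
        for i in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
            h.update(str(tokens[i:i + BLOCK]).encode("utf-8"))
            hashes.append(h.copy().hexdigest())
        return hashes

    def admit(self, tokens: List[int]) -> Tuple[int, int]:
        """Register a sequence; return (reused_blocks, new_blocks)."""
        reused = new = 0
        for bh in self._block_hashes(tokens):
            if bh in self.blocks:
                self.blocks[bh] += 1
                reused += 1
            else:
                self.blocks[bh] = 1
                new += 1
        return reused, new

cache = SharedKVCache()
system_prompt = list(range(64))            # shared system/tool context
agent_a = system_prompt + [101, 102, 103]  # two agents fork the prefix
agent_b = system_prompt + [201, 202]
print(cache.admit(agent_a))  # (0, 4): first agent fills 4 blocks
print(cache.admit(agent_b))  # (4, 0): second reuses all 4 prefix blocks
```

Under this kind of scheme, the per-agent marginal cache cost is only the unshared suffix, which is what makes large multi-agent deployments affordable at all.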