TWF Bot
At Inspire this year we talked about how developers will be able to run Llama 2 on Windows with DirectML and the ONNX Runtime, and we've been hard at work to make this a reality. We now have a sample showing our progress with Llama 2 7B! See https://github.com/microsoft/Olive/tree/main/examples/directml/llama_v2
This sample relies on first doing an optimization pass on the model with Olive, a powerful optimization tool for ONNX models. Olive utilizes powerful graph fusion optimizations from ONNX Runtime and a model architecture optimized for DirectML to speed up inference times by up to 10X! After this optimization pass, Llama 2 7B runs fast enough that you can have a conversation in real time on multiple vendors’ hardware! We’ve also built a little UI to make it easy to see the optimized model in action.
Thank you to our hardware partners who helped make this happen. For more on how Llama 2 lights up on our partners’ hardware with DirectML see:
- AMD: https://community.amd.com/t5/ai/how...a2-with-microsoft-directml-on-amd/ba-p/645190
- Intel: https://community.intel.com/t5/Blog...e-to-Optimize-DirectML-for-Intel/post/1542055
- NVIDIA: https://blogs.nvidia.com/blog/2023/11/15/ignite-rtx-ai-tensorrt-llm-chat-api
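As a rough illustration of the runtime side of this, here is a minimal sketch of how an application might ask ONNX Runtime for the DirectML execution provider and fall back to CPU when it is not available. It is not taken from the Olive sample: the helper name pick_providers and the model path model.onnx are hypothetical, and it assumes the onnxruntime-directml package is installed.

```python
# Sketch only: prefer DirectML when ONNX Runtime reports it, else fall back to CPU.
PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available):
    """Return the preferred providers that are present, in priority order."""
    return [p for p in PREFERRED if p in available]


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # assumption: onnxruntime-directml is installed
    except ImportError:
        print("onnxruntime is not installed")
    else:
        providers = pick_providers(ort.get_available_providers())
        print("Providers to use:", providers)
        # Hypothetical model path; point this at an ONNX model you have on disk.
        # session = ort.InferenceSession("model.onnx", providers=providers)
```

Keeping the selection logic in a small pure function makes it easy to check on a machine without a DirectML-capable GPU.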
Getting started
Requesting Llama 2 access
To run our Olive optimization pass in our sample you should first request access to the Llama 2 weights from Meta.
Drivers
We recommend upgrading to the latest drivers for the best performance.
- AMD has released optimized graphics drivers supporting AMD RDNA™ 3 devices including AMD Radeon™ RX 7900 Series graphics cards. Download Adrenalin Edition™ 23.11.1 or newer (https://www.amd.com/en/support).
- Intel has released optimized graphics drivers supporting Intel Arc A-Series graphics cards. Download the latest drivers.
- NVIDIA: Users of NVIDIA GeForce RTX 20, 30, and 40 Series GPUs can see these improvements firsthand in GeForce Game Ready Driver 546.01.
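The minimum versions named above can be checked with a simple numeric comparison. The sketch below assumes you have already read the installed driver version as a dotted string (for example from the vendor's control panel). The helper names are hypothetical, it does not query the system itself, and Intel is left out because no specific version is given.

```python
# Minimum driver versions quoted in the post.
MINIMUM = {
    "AMD": "23.11.1",    # Adrenalin Edition
    "NVIDIA": "546.01",  # GeForce Game Ready Driver
}


def parse_version(text):
    """Turn '23.11.1' into (23, 11, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(vendor, installed):
    """True if the installed version is at least the quoted minimum."""
    return parse_version(installed) >= parse_version(MINIMUM[vendor])


if __name__ == "__main__":
    print(meets_minimum("AMD", "23.12.1"))
    print(meets_minimum("NVIDIA", "545.92"))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "23.9.1" as newer than "23.11.1".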