Windows Server Posted January 24 Posted January 24 Greetings,We are trying to determine the correct VM sizing requirement for our AI workload, which is used for NLP processing. This workload does not require any training, but will only be used for inference.We have the following software configuration:a C# application that is heavily multithreaded using a lot of socket I/O. The application has concentrated bursts where 10-20 threads are fired concurrently to perform tasks (mostly socket I/O).This app communicates via dedicated sockets to:a Python application which performs various NLP tasks. This app is also multithreaded to handle multiple incoming requests from the .NET app. This app sends queries to a local LLM (model size will vary based on query type). We estimate we will need to support sub-second performance (at the very least) on a 7B parameter model. Ultimately, we may need to go to larger model sizes if accuracy is insufficient. The amount of text passed to the LLM will range from 300-3000 tokens.In short, we need:a) a CPU with sufficient cores to handle multiple concurrent threads on the .NET side. The app will have 5 or 6 background threads running continuously, and sudden bursts of activity which will require a minimum of 10-20 threads to run shorter-lived tasks.b) a GPU with sufficient VRAM to handle at the very least, a 7B parameter model. Ultimately, we may need to support larger models to perform the same task due to insufficient accuracy.We need the ideal configuration of GPU/VRAM and CPU/RAM to handle these tasks, and also, potentially, larger LLM sizes of up to 14B or 70B parameters.We are looking at the NC-series VMs, with a budget of about $1,000/month (see https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/#pricing). Any feedback on the optimal configuration in terms of CPU/GPU would be greatly appreciated.Thank you in advance.View the full article Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.