What are the cost risks of using serverless functions for high-volume inference?
The major risk is that costs scale linearly with request volume: every invocation is billed for its full duration and allocated memory, with no amortization across concurrent requests. A model serving 100 requests per second continuously can generate monthly bills ten times higher than running a dedicated inference endpoint handling the same load. The result is unpredictable, runaway cloud bills that are difficult to forecast, especially when teams don't model costs at production-scale traffic.
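To make the scaling concrete, here is a back-of-envelope comparison. Every number (per-GB-second rate, per-invocation fee, instance hourly price, latency, memory) is an illustrative assumption, not a quote from any specific provider; plug in your own provider's pricing before drawing conclusions.

```python
# Rough monthly cost model: serverless vs. dedicated inference at sustained load.
# ALL figures below are illustrative assumptions, not real provider prices.

REQS_PER_SEC = 100
SECONDS_PER_MONTH = 30 * 24 * 3600

# Serverless assumptions (hypothetical rates)
DURATION_S = 1.0                 # assumed per-invocation inference latency
MEMORY_GB = 2.0                  # assumed memory allocated per invocation
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative compute rate
PRICE_PER_REQUEST = 0.0000002       # illustrative per-invocation fee

invocations = REQS_PER_SEC * SECONDS_PER_MONTH
serverless_cost = (
    invocations * DURATION_S * MEMORY_GB * PRICE_PER_GB_SECOND
    + invocations * PRICE_PER_REQUEST
)

# Dedicated endpoint assumption: one always-on instance billed hourly
INSTANCE_PRICE_PER_HOUR = 1.50   # illustrative
dedicated_cost = INSTANCE_PRICE_PER_HOUR * 30 * 24

print(f"serverless: ${serverless_cost:,.0f}/month")
print(f"dedicated:  ${dedicated_cost:,.0f}/month")
print(f"ratio:      {serverless_cost / dedicated_cost:.1f}x")
```

The exact ratio depends entirely on the assumed latency, memory, and rates, but the shape of the problem is the point: serverless cost grows with every request, while the dedicated instance's cost is flat until you need a second one.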