The demand for deploying image-based AI models on IoT nodes continues to rise, despite the significant computational, energy, and communication constraints inherent to these devices. This article presents a comprehensive overview of state-of-the-art approaches to addressing these limitations, with a particular focus on intelligence partitioning, a heuristic-based strategy for splitting AI workloads between nodes and servers. AI model optimization techniques, including quantization and pruning, are examined in conjunction with node-server partitioning strategies, from both computational and communication workload perspectives. By exploring these techniques, this study identifies critical research gaps and highlights key challenges that must be addressed to improve the efficiency and scalability of AI-based IoT deployments.
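To make the two model-optimization techniques named above concrete, the sketch below applies 8-bit uniform affine quantization and magnitude-based pruning to a small weight vector. This is a minimal illustration, not a method from the article; all function names, parameters, and the example weights are assumptions chosen for clarity.

```python
# Illustrative sketch (not from the article): uniform affine quantization
# and magnitude-based pruning applied to a flat list of model weights.

def quantize(weights, bits=8):
    """Map floats to integers in [0, 2**bits - 1] via a scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid zero scale
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from quantized integers."""
    return [v * scale + lo for v in q]

def prune(weights, sparsity=0.5):
    """Zero out the given fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.02, -0.7, 0.31, 0.05, -0.12, 0.9]
q, scale, lo = quantize(weights)          # 8-bit integer codes
restored = dequantize(q, scale, lo)       # lossy reconstruction
sparse = prune(weights, sparsity=0.5)     # half the weights zeroed
```

Quantization shrinks storage and transmission cost (here, 8 bits per weight instead of 32) at the price of a bounded reconstruction error, while pruning reduces the compute and communication workload by dropping low-magnitude weights; both trade accuracy for the resource savings that constrained IoT nodes require.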