Wireless use cases in industrial internet-of-things (IIoT) networks often require guaranteed data rates ranging from a few kilobits per second to a few gigabits per second. Supporting such a wide range of requirements with a single radio access technique is difficult, especially when bandwidth is limited. Although non-orthogonal multiple access (NOMA) can improve system capacity by serving multiple devices simultaneously, its performance suffers from strong inter-user interference. In this paper, we propose a Q-learning-based algorithm that handles many-to-many matching problems such as bandwidth partitioning, device assignment to sub-bands, interference-aware access mode selection (orthogonal multiple access (OMA) or NOMA), and power allocation to each device. The learning technique maximizes system throughput and spectral efficiency (SE) while maintaining quality-of-service (QoS) for as many devices as possible. Simulation results show that the proposed technique can significantly increase overall system throughput and SE while meeting heterogeneous QoS criteria.
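As a rough illustration of interference-aware access mode selection, the sketch below applies a generic tabular Q-learning update to a toy choice between OMA and NOMA for one pair of devices. It is not the algorithm proposed in this paper. The state set `PAIRS`, the constants `P_TX`, `ALPHA_W` and `QOS_MIN`, the textbook two-device rate expressions, the QoS-gated reward, and the random draw of the next device pair are all assumptions made for illustration.

```python
import math
import random

P_TX = 10.0      # transmit power relative to unit noise power (assumed)
ALPHA_W = 0.8    # NOMA power share given to the weaker device (assumed)
QOS_MIN = 1.65   # minimum rate per device in bit/s/Hz (assumed)
MODES = ("OMA", "NOMA")

# Hypothetical device pairs: (gain of stronger device, gain of weaker device).
PAIRS = {"large_gap": (10.0, 1.0), "equal_gain": (1.0, 1.0)}


def rates(mode, g_strong, g_weak):
    """Return (strong, weak) spectral efficiencies in bit/s/Hz."""
    if mode == "OMA":
        # Each device gets half of the sub-band, free of interference.
        return (0.5 * math.log2(1 + P_TX * g_strong),
                0.5 * math.log2(1 + P_TX * g_weak))
    # NOMA: the weak device treats the strong device's signal as noise;
    # the strong device removes the weak device's signal by SIC.
    p_w, p_s = ALPHA_W * P_TX, (1 - ALPHA_W) * P_TX
    r_weak = math.log2(1 + p_w * g_weak / (p_s * g_weak + 1))
    r_strong = math.log2(1 + p_s * g_strong)
    return (r_strong, r_weak)


def reward(mode, pair):
    """Sum rate if both devices meet the QoS floor, otherwise zero."""
    r = rates(mode, *PAIRS[pair])
    return sum(r) if min(r) >= QOS_MIN else 0.0


def train(steps=5000, lr=0.1, gamma=0.2, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over (pair, mode) entries."""
    rng = random.Random(seed)
    states = list(PAIRS)
    q = {(s, a): 0.0 for s in states for a in MODES}
    s = rng.choice(states)
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(MODES)                      # explore
        else:
            a = max(MODES, key=lambda m: q[(s, m)])    # exploit
        s_next = rng.choice(states)  # next pair arrives at random (assumed)
        target = reward(a, s) + gamma * max(q[(s_next, m)] for m in MODES)
        q[(s, a)] += lr * (target - q[(s, a)])         # Q-learning update
        s = s_next
    # Greedy access mode per device pair after training.
    return {s: max(MODES, key=lambda m: q[(s, m)]) for s in states}


if __name__ == "__main__":
    print(train())
```

Under these assumed numbers, NOMA yields the higher sum rate for the pair with a large gain gap. For the equal-gain pair, the stronger device's NOMA rate falls below the assumed QoS floor, so the learned policy should select OMA there. The full problem described in the abstract also covers bandwidth partitioning, sub-band assignment, and power allocation, which this toy omits.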