They do, but even if they didn't, AI companies are going to take them anyway. Bots make up roughly 50% of internet traffic. AI companies have ignored robots.txt entries. Anything publicly available, even if it's behind a password, is accessible, since companies like Reddit sell that information.
The source of the article is the Imperva 2024 Bad Bot Report, but I cannot download the report, so I do not know how they measured traffic. In this age of social media, I am going to guess it is by data volume and site visits.