This is a reminder to myself as I have this problem every now and then. I thought I share this with you.
Ok the problem is to random(ish)ly select a record from a MySQL database in an efficient way without any scripting or programming. There are several more or less simple ways described everywhere on the web, one being
SELECT * FROM table_name ORDER BY RAND() DESC LIMIT 1;
It does the job. Very slowly though, it can take MySQL several seconds or even minutes to complete the query. After fiddling around for some time, I came up with this:
set @randomId=FLOOR(RAND()*(SELECT MAX(pk_field) FROM table)); PREPARE randomStmt FROM "SELECT * FROM table WHERE pk_field > ? LIMIT 1"; EXECUTE randomStmt USING @randomId;
…where `pk_field` is a numeric primary key. Ok the query goes like this:
- Get the highest value for pk_field from table_name, multiply with a random number between 0…1, cut off decimals and store in variable randomId.
- Prepare a statement where we can insert randomId and retrieve the desired record with.
- Take the query, insert randomId and execute.
This runs very fast even with a large dataset. I hear your objections already… :)
“This is stupid. Why didn’t you use `COUNT(pk_field)` to find the end of the number range? Even then this approach would still be stupid BTW.” Because it is slow. I currently have a table with >1.8 million records and `COUNT(pk_field)` takes >1 seconds to complete. Not good. `MAX(pk_field)` is good enough in most cases. YMMV though.
“The query is not truly random and prefers records at the end of a large pk_field range gap!” True. That’s what I meant with “YMMV” :) My solution is not truly random and it is biased, especially when there are large gaps in `pk_field`. It works for me though and is random enough when there are no large pk_field gaps.
Leave a Reply
You must be logged in to post a comment.