Efficient Deletion of Random Numbers from a Large List in Python
Scenario: You have a list of around 400,000 numbers and you need to remove 16,000 of them at random. How can you do this efficiently?
The order of the numbers in the list may or may not matter. If it does not matter, you can shuffle the list into random order, delete 16,000 elements from the end, then optionally sort it again if required.
However, if the order of the elements matters and the list is not sorted, choose 16,000 unique indices to delete, then copy the list with those indices omitted. This is almost always faster than deleting items one by one, because each individual deletion from a Python list has to shift every element after it, which adds up quickly on a large list.
Here is sample Python code demonstrating both approaches:
Shuffle-list Method
If the order of the numbers does not matter, you can shuffle the list, delete the 16,000 elements from the end, then sort it again if necessary.
    import random

    data = [...400k items...]

    # Shuffle the list into random order
    random.shuffle(data)

    # Choose the last 16k indices
    unwanted = set(range(len(data) - 16_000, len(data)))

    # Copy the list, skipping the unwanted elements
    data[:] = [value for index, value in enumerate(data) if index not in unwanted]

    # If the order is important, sort the list again
    data.sort()
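Because the unwanted elements all sit at the end of the shuffled list, the same result can be reached with a single slice deletion instead of an index set. The following is a minimal sketch under that assumption; the function name remove_random_unordered and its seed parameter are hypothetical, and list(range(400_000)) merely stands in for the real data.

```python
import random


def remove_random_unordered(data, k, seed=None):
    """Remove k randomly chosen elements in place; order is NOT preserved."""
    if not 0 <= k <= len(data):
        raise ValueError("k must be between 0 and len(data)")
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatable runs
    rng.shuffle(data)          # put the list into random order
    if k:                      # guard: data[-0:] would select the whole list
        del data[-k:]          # drop the last k elements of the shuffled list
    return data


numbers = list(range(400_000))  # stand-in for the real 400k-item list
remove_random_unordered(numbers, 16_000, seed=0)
print(len(numbers))  # 384000
```

If the original order was sorted order, calling numbers.sort() afterwards restores it, as in the listing above.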
Delete by Indices Method
If the order of the elements is important and they are not sorted, you should choose 16,000 unique indices to delete, then copy the list with those indices omitted.
    import random

    data = [...400k items...]

    # Choose 16k unique indices; a set keeps the membership test below fast
    unwanted = set(random.sample(range(len(data)), 16_000))

    # Copy the list, skipping the unwanted elements
    data[:] = [value for index, value in enumerate(data) if index not in unwanted]
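To make the contrast with one-by-one deletion concrete, the sketch below implements both strategies and checks that they produce the same list. The function names are hypothetical and the data is a smaller stand-in list. Each del shifts all later elements, so k deletions from a list of n items can cost on the order of k × n element moves, whereas the copy makes a single pass over the list.

```python
import random


def delete_one_by_one(data, indices):
    """Delete each index with del; every deletion shifts the elements after it."""
    result = list(data)
    # Delete the highest indices first so the remaining indices stay valid.
    for index in sorted(indices, reverse=True):
        del result[index]
    return result


def delete_by_copy(data, indices):
    """Build a new list in one pass, skipping the unwanted indices."""
    unwanted = set(indices)  # set membership tests are O(1) on average
    return [value for index, value in enumerate(data) if index not in unwanted]


data = list(range(40_000))  # smaller stand-in for the 400k-item list
unwanted = random.sample(range(len(data)), 1_600)

# Both strategies keep the same elements in the same order.
assert delete_one_by_one(data, unwanted) == delete_by_copy(data, unwanted)
print(len(delete_by_copy(data, unwanted)))  # 38400
```

Timing the two functions with the timeit module on the full-size list is a straightforward way to see the difference on a given machine.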
Flexible Deletion Method
Another, more flexible approach is to delete approximately 16,000 numbers rather than exactly 16,000. This is useful when the count only needs to be roughly right, so that removing, say, 15,984 or 16,116 numbers is acceptable.
If you need to remove exactly 16,000 numbers:
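    import random

    l = [...400k items...]

    # Shuffle so the tail is a random selection; skip this if the list is already in random order
    random.shuffle(l)

    # Remove the last 16k elements
    l = l[:len(l) - 16_000]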
If you need to remove approximately 16,000 numbers, you can delete a certain percentage of the list:
    import random

    l = [...400k items...]

    # Delete about 4% of the list, which is about 16k elements
    l = [el for el in l if random.random() > 0.04]

In the second approach, the expression `random.random() > 0.04` draws a new random number between 0 and 1 for each element. If the generated number is greater than 0.04, the element is kept; otherwise, it is removed. Each element therefore has a 4% chance of being deleted, and the original order of the remaining elements is preserved.
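How close to 16,000 the result lands can be estimated: since each element is removed independently with probability 0.04, the number of removals follows a binomial distribution with mean n × p and standard deviation sqrt(n × p × (1 − p)). The sketch below works this out for 400,000 elements; the fixed seed and the stand-in data are assumptions made only so the example runs repeatably.

```python
import math
import random

FRACTION = 0.04              # remove about 4% of the elements
data = list(range(400_000))  # stand-in for the real list

rng = random.Random(0)       # fixed seed, only to make the run repeatable
kept = [el for el in data if rng.random() > FRACTION]
removed = len(data) - len(kept)

# Each element is removed independently, so the count is binomial.
expected = len(data) * FRACTION
std_dev = math.sqrt(len(data) * FRACTION * (1 - FRACTION))

print(removed)                          # close to 16,000; exact value depends on the seed
print(round(expected), round(std_dev))  # 16000 124
```

With a standard deviation of about 124, counts such as 15,984 or 16,116 are entirely typical outcomes for this method.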
Conclusion: Depending on the importance of the order of the numbers in your list, you can choose the appropriate method to delete 16,000 random numbers efficiently.