The synthetic data field guide. A guide to the various species of fake… | by Cassie Kozyrkov | Jun, 2023


(Note: the links in this post take you to explainers by the same author.)

Duplicated data

Maybe you measured 10,000 real human heights but you want 20,000 datapoints. One approach you could take is to suppose your existing dataset already represents your population fairly well. (Assumptions are always dangerous, proceed with caution.) Then you could simply duplicate the dataset or duplicate some portion of it using ye olde copy-paste. Ta-da! More data! But is it good and useful data? That always depends on what you need it for. For most situations, the answer would be no. But hey, there are reasons you were born with a head, and those reasons are to chew and to use your best judgment.
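For the code-minded, here is a minimal sketch of the copy-paste approach in Python. The heights are simulated stand-ins (the `gauss(170, 10)` numbers and every variable name are assumptions made for illustration, not real measurements), and the point is simply that duplication adds rows without adding information.

```python
import random

# Hypothetical stand-in for 10,000 measured heights (cm); simulated, not real data.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(10_000)]

# Duplicate the whole dataset with the programmatic version of copy-paste.
duplicated = heights + heights

# Or duplicate only some portion of it (here, arbitrarily, the first 5,000 rows).
partially_duplicated = heights + heights[:5_000]

print(len(duplicated))  # 20000

# More rows, but no new values: the set of distinct heights is unchanged.
print(set(duplicated) == set(heights))  # True
```

Whether those extra rows are useful still depends on what you need them for.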

Resampled data

Speaking of duplicating only a portion of your data, there’s a way to inject a spot of randomness to help you decide which portion to pick. You can use a random number generator to help you choose which height to draw from your existing list of heights. You could do this “without replacement”, meaning that you make at most one copy of each existing height, but…

Bootstrapped data

You’ll more often see people doing this “with replacement”, meaning that every time you randomly pick a height to copy, you immediately forget you did this so that the same height could make its way into your dataset as a second, third, fourth, etc. copy. Perhaps if there’s enough interest in the comments, I’ll explain why this is a powerful and effective technique (yes, it sounds like witchcraft at first, I thought so…
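To make the difference between the two sampling styles concrete, here is a small sketch using Python's standard `random` module. The heights are simulated placeholders and the variable names are assumptions for illustration only. `random.sample` picks without replacement, so each existing height is copied at most once, while `random.choices` picks with replacement, so the same height can be copied again and again.

```python
import random
from collections import Counter

# Hypothetical stand-in for 10,000 measured heights (cm); simulated, not real data.
rng = random.Random(42)
heights = [rng.gauss(170, 10) for _ in range(10_000)]
positions = range(len(heights))

# Without replacement: each position in the list can be picked at most once.
picked_once = rng.sample(positions, k=5_000)
resampled = heights + [heights[i] for i in picked_once]

# With replacement: every draw "forgets" the earlier ones,
# so the same position can be picked a second, third, fourth... time.
picked_with_repeats = rng.choices(positions, k=len(heights))
bootstrapped = [heights[i] for i in picked_with_repeats]

copies = Counter(picked_with_repeats)
print(len(resampled))        # 15000
print(len(bootstrapped))     # 10000
print(max(copies.values()))  # how many times the most-copied height was drawn
print(len(copies))           # how many distinct original heights made it in
```

In the bootstrapped list, some original heights show up several times and others not at all, which is exactly the behavior that “with replacement” describes.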
