Recent advances in flash media have made it an attractive alternative for data storage in a wide spectrum of computing devices, such as embedded sensors, mobile phones, PDAs, laptops, and even servers. However, flash media has many unique characteristics that make existing data management and analytics algorithms designed for magnetic disks perform poorly with flash storage. For example, while random (page) reads are as fast as sequential reads, random (page) writes and in-place data updates are orders of magnitude slower than sequential writes. In this paper, we consider an important fundamental problem that would seem to be particularly challenging for flash storage: efficiently maintaining a very large (100 MB or more) random sample of a data stream (e.g., of sensor readings). First, we show that previous algorithms such as reservoir sampling and geometric file are not readily adapted to flash. Second, we propose B-FILE, an energy-efficient abstraction for flash media to store self-expiring items, and show how a B-FILE can be used to efficiently maintain a large sample in flash. Our solution is simple, has a small (RAM) memory footprint, and is designed to cope with flash constraints in order to reduce latency and energy consumption. Third, we provide techniques to maintain biased samples with a B-FILE and to query the large sample stored in a B-FILE for a subsample of an arbitrary size. Finally, we present an evaluation with flash media that shows our techniques are several orders of magnitude faster and more energy-efficient than (flash-friendly versions of) reservoir sampling and geometric file. A key finding of our study, of potential use to many flash algorithms beyond sampling, is that “semi-random” writes (as defined in the paper) on flash cards are over two orders of magnitude faster and more energy-efficient than random writes.
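As background for the first contribution, the following is a minimal sketch of classic reservoir sampling (Algorithm R) in Python. It is the textbook in-memory algorithm, not the flash-friendly variant evaluated in the paper, and the names `reservoir_sample` and `overwrites` are illustrative assumptions. Once the reservoir is full, every accepted item overwrites a uniformly random slot; if the reservoir lived on flash, these would presumably become the random in-place writes that the abstract describes as slow.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Textbook Algorithm R: uniform random sample of size k from a stream.

    Returns the sample and a log of the slot indices that were overwritten
    in place (the log is an illustrative addition, not part of Algorithm R).
    """
    rng = rng or random.Random()
    reservoir = []
    overwrites = []
    for i, item in enumerate(stream):
        if i < k:
            # Initial fill: items are appended sequentially.
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # uniform integer in [0, i], inclusive
            if j < k:
                # In-place update of a uniformly random slot.
                reservoir[j] = item
                overwrites.append(j)
    return reservoir, overwrites


if __name__ == "__main__":
    sample, slots = reservoir_sample(range(100_000), k=8, rng=random.Random(1))
    print("sample:", sample)
    print("overwritten slots, in order:", slots)
```

Printing the overwritten slot indices shows that they follow no sequential order, which is the access pattern the paper's approach is designed to avoid.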