I'd be worried about just switching RPi's off. We recently got a Pi for the office to run as a dashboard - and after a couple of power cuts it corrupted the SD card.
Now I'm going to have to set up the system again, and I don't know whether this is going to happen again. The SD card that got corrupted was a Class 4 Kingston.
Maybe I'll look into a Sandisk (possibly Class 10?) next time. But I am worried that it's not the SD card's fault, but rather a combination of a journaling filesystem, an SD card and a sudden power outage.
Edited: Apologies, I realized now that the red button cuts power to the network switch, not to each individual Pi. But my concerns about the Pi and power cuts still remain though.
SD cards are much like SSDs - they are a combination of NAND flash and an embedded controller. Upon power-up, the controller has to initialise by reading the block mapping tables (BMT) from the NAND. This wouldn't be a concern, and it wasn't, in the days of large-geometry/high-endurance flash where the controller would power up and just sit idle waiting for commands after initialisation. Any power interruption would just reset the controller and it'd try again.
However, newer higher-capacity (and smaller process size) flash is far less reliable - endurance and retention are orders of magnitude lower, while raw bit error rates are correspondingly higher. MLC makes this even worse, but manufacturers have been masking the problems by using stronger error correction. This strategy mostly works, but combined with another characteristic of dense NAND flash, read disturb, it makes for memory devices that are far more fragile and sensitive to power interruptions than before. Read disturb means that repetitive reading of the same blocks in flash has a writing effect on bits adjacent to the ones being read, so even read operations are somewhat destructive. What this means is that the block management controller may have to perform a copy (i.e. a write) and erase after a certain number of reads. Furthermore, blocks which have been idle for a long time also need to have their contents periodically refreshed, since the data slowly fades away as the electrons leak out.
All these characteristics together mean that at any one time, even if the SD card is "idle" or only being read, block erase/program operations may be occurring internally. If a power interruption happens, then depending on what was being written at the time, anything from silent data corruption (if the block was storing user data) to a complete failure of the card (if the block was part of the BMT or other management data) can occur.
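To make the read-disturb point concrete, here is a toy model in Python. It is only a sketch: the `Block` class, the `read_block` function and the `READ_DISTURB_LIMIT` of 1000 reads are made up for illustration, and real controllers don't publish their thresholds or algorithms. All it shows is that a read-only workload still ends up triggering internal copy-and-erase operations.

```python
from dataclasses import dataclass

# Hypothetical threshold, chosen only for illustration.
READ_DISTURB_LIMIT = 1000


@dataclass
class Block:
    reads_since_refresh: int = 0
    internal_writes: int = 0


def read_block(block: Block, limit: int = READ_DISTURB_LIMIT) -> bool:
    """Read from a block.

    Returns True when this read pushed the block over the limit and forced
    the (imaginary) controller to copy and erase it, i.e. a moment at which
    a power cut could hit an in-flight write.
    """
    block.reads_since_refresh += 1
    if block.reads_since_refresh >= limit:
        block.internal_writes += 1      # copy to a fresh block, erase the old one
        block.reads_since_refresh = 0
        return True
    return False


block = Block()
risky_moments = sum(read_block(block) for _ in range(10_000))
print(risky_moments)  # 10 internal writes caused by 10,000 pure reads
```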
Most applications of SD cards don't often cut power abruptly, which is why this problem doesn't occur. The RPi is an exception. If you want to reduce this problem as much as possible, my recommendation is to use older, low-capacity SD cards, which may contain large-geometry SLC flash. This is not going to be cheap (per capacity), but will be cheaper than new "industrial grade" cards (which may actually be worse). I've had good luck with cards from relatively unknown Chinese/Taiwanese OEMs - many of them explicitly specify "100K program/erase cycles", something that the "consumer" brands don't even mention.
Is there such thing as a modern high-reliability SD card, resistant to power cuts and tough environmental conditions? It seems like there would be a market for that in embedded industrial or military devices.
I'd say that they certainly try to, but the block-erase and page-program granularity of NAND flash makes it pretty much impossible to guarantee any atomicity of operations without ridiculously wasting the capacity. Furthermore, pages within a block can only be programmed sequentially to avoid corrupting previously programmed pages (this is known as "program disturb"), and each page can only be programmed once, also to avoid program disturb effects.
Interesting. I'm surprised that NAND flash chips don't have some kind of hardware assist for this (e.g., really small pages for handling the much smaller checkpoint writes). But I guess at the level of utterly fungible consumer hardware it's about cost rather than reliability, since the latter doesn't have any absolute metrics.
Read-only filesystem on the SD card for boot, and a read-write USB thumb drive for everything else. It's been a while since I looked, but from memory this avoids corruption on power-off.
There are some changes you can make to reduce the chances of SD card corruption. That being said, I've run plenty of Pis in production over the past year, and I've only had 1 SD card corruption issue. That issue was caused by a customer unplugging the Pi over and over trying to fix what turned out to be a network problem.
The easiest change, if you're not really worried about reading the logs in case of power failure, is to move /var/log (plus a few other directories that are normally written to, like /var/tmp, etc.) to memory instead of on the SD. Also disable swap.
That way it's less likely there is a write going on when the power is pulled.
Another thing to look at is making the entire card read-only, and setting up a temporary directory in memory that's periodically backed up somewhere remote.
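If you want to check whether a given Pi is actually set up this way, something like the following Python sketch could help. It parses text in the format of /proc/mounts and /proc/swaps and reports what is still on the card. The function names, the default directory list and the `SAMPLE` mount table are my own assumptions, not taken from any particular distro image.

```python
from typing import Dict, List, Sequence


def parse_mounts(text: str) -> Dict[str, dict]:
    """Parse /proc/mounts-style text into {mount_point: {fstype, options}}."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fstype, options = fields[:4]
        # A later entry for the same mount point shadows an earlier one.
        mounts[mount_point] = {"fstype": fstype, "options": options.split(",")}
    return mounts


def audit(mounts_text: str,
          ram_dirs: Sequence[str] = ("/var/log", "/var/tmp", "/tmp"),
          require_ro_root: bool = False) -> List[str]:
    """Return a list of findings; an empty list means nothing to complain about."""
    mounts = parse_mounts(mounts_text)
    problems = []
    for directory in ram_dirs:
        if mounts.get(directory, {}).get("fstype") != "tmpfs":
            problems.append(f"{directory} is not on tmpfs")
    root = mounts.get("/")
    if require_ro_root and root is not None and "ro" not in root["options"]:
        problems.append("/ is mounted read-write")
    return problems


def swap_active(swaps_text: str) -> bool:
    """/proc/swaps has one header line; any further line is an active swap area."""
    return len([line for line in swaps_text.splitlines() if line.strip()]) > 1


# Made-up mount table for illustration.
SAMPLE = """\
/dev/root / ext4 rw,noatime 0 0
tmpfs /var/log tmpfs rw,nosuid,nodev,size=16384k 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

print(audit(SAMPLE))  # ['/var/tmp is not on tmpfs']
```

On a real Pi you would pass in `open("/proc/mounts").read()` and `open("/proc/swaps").read()` instead of the sample text.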
Make sure to cleanly shut the Pi down with "sudo shutdown -h now" every time. I'm yet to corrupt a Pi SD card and I have 4 of them which I've abused in a lot of different ways.
The real trouble area is the SD card reader in my experience. The pins are very easy to break off accidentally. The new B+ model uses a microSD card so it's not nearly as troublesome.
The pin issue can be somewhat mitigated with a case. If there are corridors for the SD card to slide down before entering the reader, it makes it much more difficult to contact the pins at bad angles.
This definitely worked for me anyway. Plus the case keeps the SD card from moving almost at all while in position (which means you cannot suffer corruption from shaking it out of the reader, only power loss or similar).
The "fix" for inexpensive hardware is to create a card image once the system is set up. Leave a freshly imaged card taped to the RPi, and when it corrupts the SD card, swap cards, take the corrupted card and re-image it and tape it to the RPi.
An alternative is to boot the RPi diskless. This works, but since everything goes through the USB bus it gets even slower than it normally is, which can make it unsuitable for an application.
You actually can't PXE boot the RPi. At best, someone might get a port of U-Boot working with the RPi and that would be what's on the SD card. I was hoping to test out bare-metal provisioning systems using RPi's but no love there.
Correct, you need to craft an NFS u-boot which can boot read-only from the SD card and bring the system up. (http://billforums.station51.net/viewtopic.php?f=1&t=17 is one such example, I found one on Robert Nelon's pages but can't find that one at the moment)
>Leave a freshly imaged card taped to the Rpi, and when it corrupts the SD card, swap cards, take the corrupted card and re-image it and tape it to the RPi.
That's exactly what I've done.
I periodically backup a sqlite db file to S3, and I wrote a script that will retrieve the latest backup on boot. Just plug in the new card and everything is back to the way it was minus at most 3 hours of data.
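For anyone wanting to do something similar, here is a rough Python sketch of the snapshot and restore-on-boot halves. It is not my actual script: the function names and the `db-<timestamp>.sqlite` naming are invented for the example, and a local `backup_dir` stands in for the S3 bucket, so the upload and download steps are left out entirely.

```python
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot(db_path: Path, backup_dir: Path,
             now: Optional[datetime] = None) -> Path:
    """Write a consistent copy of the live database into backup_dir."""
    now = now or datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped names sort chronologically when sorted as plain strings.
    target = backup_dir / ("db-" + now.strftime("%Y%m%dT%H%M%SZ") + ".sqlite")
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(target))
    try:
        # sqlite3's online backup API; safe while the source is in use.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target


def latest_snapshot(backup_dir: Path) -> Optional[Path]:
    """Return the newest snapshot by name, or None if there are none."""
    candidates = sorted(backup_dir.glob("db-*.sqlite"))
    return candidates[-1] if candidates else None


def restore_latest(backup_dir: Path, db_path: Path) -> bool:
    """Copy the newest snapshot over db_path. Returns False if none exists."""
    newest = latest_snapshot(backup_dir)
    if newest is None:
        return False
    shutil.copyfile(newest, db_path)
    return True
```

Running `snapshot` from a timer every 3 hours and `restore_latest` once at boot gives the behaviour described above: a fresh card comes back with at most one interval of data missing.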
I wonder if there's a hardware fix. A familiar way to make a system more tolerant of blackout is to detect and use the grace period between when incoming power goes down and the internal power supply loses regulation. When this condition is detected, it triggers a process to gracefully square things away. Noncritical peripherals, such as graphics display, are allowed to simply capsize.
Does the Pi already monitor its supply voltages? If not, something could be hung on the GPIO to monitor +5 V. More elaborate circuitry can be used to extend the duration of the grace period if needed. (Diodes and capacitors, nothing exotic).
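On the software side, the monitoring loop could look roughly like this Python sketch. Everything hardware-specific is an assumption: `read_power_ok` is a placeholder for whatever reads the GPIO pin wired to the +5 V sense circuit, and the three-sample debounce and 10 ms poll interval are arbitrary numbers. Whether the grace period is long enough for more than a sync (for example the full `sudo shutdown -h now`) depends entirely on the capacitors.

```python
import os
import time
from typing import Callable


def wait_for_power_loss(read_power_ok: Callable[[], bool],
                        samples_required: int = 3,
                        poll_interval: float = 0.01,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until the supply reads as failed for several samples in a row.

    Requiring consecutive bad samples keeps a single glitch on the sense
    line from triggering the handler. Returns the number of samples taken.
    """
    consecutive_bad = 0
    samples_taken = 0
    while consecutive_bad < samples_required:
        samples_taken += 1
        if read_power_ok():
            consecutive_bad = 0
        else:
            consecutive_bad += 1
        if consecutive_bad < samples_required:
            sleep(poll_interval)
    return samples_taken


def square_things_away() -> None:
    """Minimal handler: flush dirty buffers to the card while power lasts."""
    os.sync()


# Example with canned readings instead of a real GPIO pin:
readings = iter([True, False, True, False, False, False])
taken = wait_for_power_loss(lambda: next(readings), sleep=lambda _seconds: None)
print(taken)  # 6: the lone False at sample 2 was ignored as a glitch
```

In a real deployment the call to `wait_for_power_loss` would be followed by `square_things_away()` or something more thorough.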
You could buy a deep cycle battery and a charge controller and run several Raspberry Pis off of the battery for several days if there was a power outage, at a pretty reasonable price (~$120).
Lets you use a battery pack with a handful of AA batteries as a UPS. You can even detect when one power source disappears and then safely shutdown. Amongst various other useful power related things.
Yup, we don't need fast floating point - this is just acting as an RMQ provider for some analog sensors.
What we do need are boards with much better manufacturing QA/QC than Raspberry Pi. After the nth time the USB 5V falls out, or the SD reader loses contact, you rapidly realize they're not targeted towards a production environment. As inexpensive and powerful as possible is a great goal, but you invariably lose some reliability ("pick two").
The only SD cards I've had that have suddenly and catastrophically become completely unreadable are Sandisk class 10 cards, so don't assume they'll do any better (I've owned too few SD cards to tell if this was just bad luck or a problem with the model).
I noticed the mention of FIRST and at the same time, noticed the red/blue color choice. I'm sure it's just a coincidence, but still entertaining. Project looks awesome.
They say it was difficult to get a high performance DB running on a 700 MHz chip with 512 MB of RAM. Perhaps it's just the wording but that sounds like the opposite of high performance to me.