Posted on 03/26/2015 8:27:11 PM PDT by Utilizer
I barely know how to turn my computer on, but might a better title to this post be: Can you structure a problem that can be finished faster on disk than in-memory? It pays to be specific.
Given that nearly all operating systems use virtual memory, all bets are off anyway.
Makes sense. Instead of putting the string together in memory and writing it to disk when it’s completed, you are writing to the disk as it’s assembled. So you’re skipping a step.
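For anyone who wants to try the two approaches side by side, here is a minimal sketch in Python. The function names and the test data are made up for illustration and are not taken from the article's code. It only shows that both routes end with the same bytes on disk; it does not time anything.

```python
import os
import tempfile


def build_then_write(path, pieces):
    """Assemble the whole string in memory, then write it in one call."""
    text = "".join(pieces)
    with open(path, "w", encoding="ascii") as f:
        f.write(text)


def write_as_assembled(path, pieces):
    """Hand each piece to the file object as soon as it is available."""
    with open(path, "w", encoding="ascii") as f:
        for piece in pieces:
            f.write(piece)


def same_result(n=1000):
    """Run both approaches on hypothetical data and compare the files."""
    pieces = ["1"] * n
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        build_then_write(a, pieces)
        write_as_assembled(b, pieces)
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read() == fb.read()
```

Since the end product is identical, any speed difference comes from how the pieces get assembled, not from what lands on disk.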
Interesting comments at article. Many saying that the code was poorly written.
Cheating. If the objective is to build something on the disk, building it on the disk is going to be faster, duh.
Not only that, but when writing to disk it’s actually going to a RAM buffer. So it’s almost as fast anyway.
I can’t believe how retarded this is.
It makes no sense to concatenate the string in memory and then write it to disk, since in either case you will be writing the string sequentially to disk, anyway. Java, Python, C#, and other "managed" languages will do naive concatenation more slowly because their strings are immutable, so an append can end up copying the whole string, which any decent coder knows.
Best approach: find out the allocation block size of a file on disk, pre-allocate one buffer of that size, and write into that buffer in memory, flushing the whole block to disk when it's full. This avoids the penalty of zillions of memory allocations and garbage collections and writes a block of optimal size.
In most cases, just pre-allocating a moderately sized block of memory without knowing the best block size is good enough and may even be preferable, because the underlying OS is going to optimally block IO, and probably also cache that at a secondary level.
The key point is to avoid over-allocating managed objects, and again, most good coders know to do this, even if people writing stupid research papers don't...
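A rough sketch of that single pre-allocated buffer idea, in Python. The class name `BlockWriter` and the 4096-byte default are assumptions for illustration; the real allocation block size depends on the file system and would have to be looked up. The file is opened unbuffered so that the only buffer involved is the one allocated here.

```python
import os
import tempfile


class BlockWriter:
    """Fill one pre-allocated buffer and write it out whenever it is full."""

    def __init__(self, raw_file, block_size=4096):  # 4096 is an assumed size
        self._file = raw_file
        self._buf = bytearray(block_size)  # allocated once, then reused
        self._view = memoryview(self._buf)
        self._used = 0
        self.flush_count = 0

    def write(self, data):
        data = memoryview(data)
        while len(data):
            room = len(self._buf) - self._used
            n = min(room, len(data))
            self._view[self._used:self._used + n] = data[:n]
            self._used += n
            data = data[n:]
            if self._used == len(self._buf):
                self.flush()

    def flush(self):
        if not self._used:
            return
        start = 0
        while start < self._used:  # an unbuffered write may be partial
            start += self._file.write(self._view[start:self._used])
        self._used = 0
        self.flush_count += 1


def block_writer_demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.bin")
        with open(path, "wb", buffering=0) as raw:
            w = BlockWriter(raw, block_size=4096)
            for _ in range(10000):
                w.write(b"1")  # one byte at a time, no new buffer per call
            w.flush()  # write out the final partial block
        return os.path.getsize(path), w.flush_count
```

With these numbers, 10000 one-byte writes turn into three writes to the file: two full 4096-byte blocks and one partial block.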
The real issue isn't memory vs. disk, it's what the language you are using does to perform the string concatenation operation.
The fastest technique will be one that does string concatenation in memory while the disk write of the previous string section is completing, so that the disk latencies are used for string building. Oh, and of course the string concatenation code should be designed to run in cache and avoid any virtual memory paging or extra memory copy operations.
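Here is one way the overlap could be sketched in Python, assuming a background thread and a small bounded queue. The names `write_overlapped` and `build_chunks` and the chunk sizes are hypothetical. It does nothing about cache residency or paging, which Python gives you no control over, and how much real overlap you get depends on the interpreter and the OS.

```python
import os
import queue
import tempfile
import threading


def write_overlapped(path, chunks):
    """Write each chunk on a background thread while the caller builds the next."""
    pending = queue.Queue(maxsize=2)  # bounded, so at most two chunks wait
    errors = []
    with open(path, "wb") as f:

        def writer():
            for chunk in iter(pending.get, None):  # None marks the end
                if errors:
                    continue  # after a failure, just drain the queue
                try:
                    f.write(chunk)
                except OSError as exc:
                    errors.append(exc)

        t = threading.Thread(target=writer)
        t.start()
        try:
            for chunk in chunks:  # the next chunk is built while the last is written
                pending.put(chunk)
        finally:
            pending.put(None)
            t.join()
    if errors:
        raise errors[0]


def build_chunks(count, size):
    """Stand-in for the string-building work (hypothetical data)."""
    for _ in range(count):
        yield b"1" * size


def overlapped_demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.bin")
        write_overlapped(path, build_chunks(8, 64 * 1024))
        return os.path.getsize(path)
```

The bounded queue is the design choice that matters: it keeps the builder from running far ahead of the disk and piling up memory.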
The key to performance is understanding how the system works, and writing code at a low enough level to be able to control how it interacts with the system. That's why C and C++ still get used.
The technique of "writing 1 byte at a time" to the disk is really just a way of utilizing the buffering present in the I/O system to queue up disk writes. All the interesting stuff is actually happening in memory; however, it's being done by clever system code written by people who understand how to get high performance.
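You can watch the buffering happen with a few lines of Python. This is a small sketch that assumes an 8192-byte user-space buffer and only shows the buffer inside the language runtime; the OS page cache sits below that and is not visible here.

```python
import os
import tempfile


def one_byte_writes_demo(n=1000):
    """Write n single bytes and report the file size before and after flush."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.bin")
        with open(path, "wb", buffering=8192) as f:  # assumed buffer size
            for _ in range(n):
                f.write(b"1")  # lands in the in-memory buffer, not the file
            size_before_flush = os.path.getsize(path)
            f.flush()  # hand the buffered bytes to the OS in one write
            size_after_flush = os.path.getsize(path)
        return size_before_flush, size_after_flush
```

With 1000 bytes and an 8192-byte buffer, the file is still empty until the flush, so the "1 byte at a time" loop never touched the disk on its own.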
A well written version of the string concatenation test should be able to write data to the disk as fast as the disk can write data.
I smell a bug.
That’s the way it was written, the test. It saw the flaw of some nature and then wrote a perfectly good set of conflicting code. We called them bugs and the people who exploit them hackers.
I’ve spent too many nights watching and analyzing processor bus activity on a logic analyzer, along with a profiler running in the OS, to believe that they didn’t just find a bug to exploit.
See Fred's example above...
Assembly is still most efficient, especially if the action has to occur frequently in a system of modest capabilities.
The Story of Mel is still the best.
Thanks for the hint to history. Back in the day knowledge of hardware opportunities was widely used for system optimization, especially in real time systems. Hacking in HEX ruled...
How do you think we got to the Moon?
Yep. The VM paging code in Windows is highly optimized, going all the way back to David Cutler’s Windows NT in 1993 (and DEC VAX/VMS before that).
Generally the fastest way to write a file in Windows is to just call ::CreateFileMapping() and ::MapViewOfFile() and scribble away. You avoid the double buffering of ::WriteFile(), and the VM subsystem is smart enough to do readaheads and stride I/O too.
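For a portable taste of writing through a memory-mapped file, here is a minimal sketch using Python's standard `mmap` module rather than the Win32 calls directly. The function names are hypothetical, the payload is assumed to be non-empty, and the sketch makes no claim about speed.

```python
import mmap
import os
import tempfile


def write_via_mmap(path, payload):
    """Size the file first, map it, then copy the bytes into the mapping."""
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # the mapping needs the final size up front
        with mmap.mmap(f.fileno(), len(payload), access=mmap.ACCESS_WRITE) as m:
            m[:] = payload  # "scribble away": plain memory writes
            m.flush()  # ask the OS to write the dirty pages back


def mmap_demo():
    payload = b"1" * 100000  # hypothetical data
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.bin")
        write_via_mmap(path, payload)
        with open(path, "rb") as f:
            return f.read() == payload
```

The one catch compared with ordinary writes is that you have to know, or guess and later trim, the file size before mapping it.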
Or octal...
I am required to post the article with the complete headline as originally printed, so as to not cause any problems with re-posts or searches.
Once you replace your mechanical hard drive with an SSD, you will then know what fast really is.
I bought a Samsung 850EVO and wow!
I’m never going back.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.