Posted on 03/26/2015 8:27:11 PM PDT by Utilizer
Right. In the disk method, the disk driver is performing at least a partial concatenation before actually writing to disk. The disk driver has more efficient code for concatenation than the code generated by Java (no surprise there) or Python.
Yeah, right. Like that's gonna happen. The monkeys churning out code today probably think Big Endian and Little Endian is a children's book about Native Americans.
Amen to that! Just a bunch of monkeys.
I'd go with 10%.
They leave HDDs in their dust:
http://techreport.com/r.x/samsung-850evo/crystal-read.gif
http://techreport.com/r.x/samsung-850evo/crystal-write.gif
http://techreport.com/r.x/samsung-850evo/db2-read.gif
http://techreport.com/r.x/samsung-850evo/db2-write.gif
Doing the operation in memory and then doing a single 1 MB write to disk is still FAR faster than one million 1-byte writes followed by 100K 10-byte writes, etc.
Even if the in-memory version were written a single byte at a time, it would only equate to the one million 1-byte disk writes; the other disk writes would be slower.
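A quick, scaled-down sketch of this in Python (the 100K byte count here is just an illustrative choice so it runs fast, not the 1M from the post; file handling details are my own):

```python
import os
import tempfile
import time

SIZE = 100_000  # total bytes to write (scaled down for illustration)

def many_small_writes(path):
    """Write one byte at a time, unbuffered, forcing a syscall per byte."""
    with open(path, "wb", buffering=0) as f:
        for _ in range(SIZE):
            f.write(b"x")

def one_big_write(path):
    """Build the whole payload in memory first, then issue a single write."""
    data = b"x" * SIZE
    with open(path, "wb") as f:
        f.write(data)

def timed(fn):
    """Time fn against a fresh temp file, cleaning up afterward."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        start = time.perf_counter()
        fn(path)
        return time.perf_counter() - start
    finally:
        os.remove(path)

print(f"{SIZE} single-byte writes: {timed(many_small_writes):.4f}s")
print(f"one {SIZE}-byte write:    {timed(one_big_write):.4f}s")
```

On any ordinary machine the single large write should win by a wide margin, since the per-byte version pays a syscall per write.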
Generally, you should set a buffer size of 4-16K and format your app's output directly into the buffer, if possible. You may wish to use multiple 4-16K buffers, so that you are writing into the current buffer while one of your past buffers is being transferred to disk asynchronously. When you fill the current buffer, it should be queued for output, and you should switch your output-formatting activity to scribble on a previous buffer which has already been written. When you are done, you should remember to queue your final buffer for output and wait until all buffers have been written. Then please close the file.
The optimal buffer size and number of buffers should be determined by experiment.
Thanks for the tip, mate. :)
Disk drivers don't know squat about concatenation. They just know about "write this block of memory to this chunk of disk blocks". Of course, what happens next will depend on whether the HD is buffered or whether it's not an HD but an SSD, etc.
Are you talking 1970 or 2015?
If the problem fits in memory, then it can be solved in memory far faster than on disk. If not, then you need a strategy that takes the disparity of access times into account.
E.g., if the problem is sorting the donor file, then you need some sort of algorithm in which sorted subsets are written to disk, then read in and merged, written out again, until you end up with sorted output. Of course, if it's 2015, you just read in the damned file and sort it! Done!
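The sorted-subsets-then-merge approach is classic external merge sort; a toy sketch in Python (the tiny chunk_size is purely for illustration; a real run size would be whatever fits comfortably in memory):

```python
import heapq
import os
import tempfile

def external_sort(lines, chunk_size=1000):
    """Sort an iterable of newline-terminated lines using sorted runs
    on disk followed by a k-way merge, as in external merge sort."""
    run_paths = []
    chunk = []

    def flush():
        """Write the current chunk to disk as one sorted run."""
        if not chunk:
            return
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as f:
            f.writelines(sorted(chunk))
        run_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    flush()

    # Read the sorted runs back and merge them into one sorted stream.
    files = [open(p) for p in run_paths]
    try:
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

`heapq.merge` does the k-way merge lazily, so only one line per run needs to be resident at a time; and of course, if the whole file fits in RAM, `sorted()` alone is the 2015 answer.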
In 2015, your laptop or your smartphone likely has way more RAM than a major glassed-in, raised-floor computer installation of the 1970s or 1980s had in RAM plus disk combined.
Sorry, but my comment was not directed at you personally, but at the author of the original article. My point being that the premise is false except for contrived tests that use memory at its most inefficient while optimizing for the hard drive.
One million single writes to disk can be a much different proposition if the test has the disk all to itself than it is on a busy system where every write operation can potentially have to get queued and wait for some other process to release the disk channel.
IMHO
“25 years ... half a dozen”
“half at best, 1 out of 10 at worst”
“10%”
Thanks for the replies. It’s nice to know I’m not the only one in this camp. My answer is 10%.
If I were a bit more of an entrepreneur, these markers are the ones I would look for when hiring programmers, because within the 10%, the answer tends to be ... 10% - which tells me there is an ability block that “thinks this way”.
Alas, I work for a living ~
I'm sure it is possible to construct very narrowly tailored circumstances where what they are describing makes sense, but it's such an artificial construct that it's not really useful. It's simply a reminder to never use the word 'never'.
It only proves that you can design a test to do stupid things that don't really apply in the real world.
First and foremost, is the fact that memory is everything. In order for a process to write to disk, it must first put that data in a buffer, which is (gasp) MEMORY. In most modern, enterprise level systems, there is a ton of cache (more memory) sitting in the disk subsystem to receive the data from the operating system prior to it being written to disk.
Let's see them run an application or database doing real-world work and see how their theory holds up. I've got $100 that says "not very well."
In simple terms, the test was to get a string of bits written to the disk in a given order.
So a one-step operation (write to the disk) is faster than a two-step operation (organize the bits in memory, then write to the disk)....
Gee, that's a shock... (/sarcasm off)
ANYTHING can be done faster in memory than on disk if the problem is properly stated and the program is properly constructed. I can easily imagine situations where disk could win, given (essentially) unlimited space on disk but not in memory (a problem with “sparse matrices”, say).
When performance means money (server time, server sizing, etc...), it can actually make sense to do these types of tests.