Message boards :
Science (non-SETI) :
What is this mathematical concept called?
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I generally think I'm pretty adept at math and logic, but I've recently stumbled across a concept whose name I can't figure out--so that I can look at some code examples to figure out how to implement it.

Here's the situation I'm dealing with. I back my data up on single-layer Blu-rays, and I'm trying to find a relatively simple way to figure out the best combination of files from a specified list to fill a blank disc the most, obviously without going over. A blank disc has a maximum capacity of 25,025,314,816 bytes, but because the UDF filesystem has some overhead, scaling the upper limit back to basically 25,020,000,000 is pretty reasonable.

The next part is to figure out which combination of these file sizes will leave the least amount of free space on a disc:

6,602,954,144
8,746,750,940
5,926,849,456
8,381,808,456
8,171,643,333
7,799,476,661
10,534,550,072
5,790,425,802
6,459,963,147
8,309,658,389
10,702,133,615
7,833,683,871
13,086,035,172
8,914,976,690
16,583,079,767
8,653,721,306
9,991,193,406
8,525,913,473
10,818,082,934
8,742,417,308
7,337,584,245
8,324,430,552
8,517,137,881
7,576,837,851
7,398,540,827
7,036,220,580
6,386,086,694
11,736,780,052
8,533,571,898
8,812,475,629

Currently, I've just been guessing and estimating. I'll pick 2-4 of those, put them in the compilation, and see how much space is left. Sometimes there will be ~300 MiB free, so I'll look at what's still available and try to find something that is ~300 MiB larger than one of the files tentatively in the compilation to be burnt. This is somewhat tedious and doesn't always work out very well, and sometimes I end up having to just go ahead and burn, leaving 1-4 GiB of the available 23.4 unused, which is not very efficient at all.

So is there a name for this function that can take that list of file sizes and figure out the best combination that gets as close to the target without going over? I can whip something up with C/C++ (or maybe even PHP... I'd have to get my book out for that one) once I know the name of the concept so that I can look up some code examples, and I would not be against a function in Excel or Access that can do it, either. I'm just tired of manually shuffling files around by trial and error, only to end up leaving a bunch of wasted space that likely could have been used more efficiently.

I just don't know what this concept is or would be called. It's kind of like... if you know the term 'bubble sort', you can go look up examples of how to code a bubble sort, and then you have a function that can do that. But it's really difficult to search for how to do something when you only know the description of it, not its name--mostly because it requires using the right wording that matches the wording somebody else used to describe it. I've got some fuzzy ideas for the logic I could use to do this, and it's probably not the cleanest or best way to approach the coding, but it would really help to see some examples that do something similar.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3805 Credit: 1,114,826,392 RAC: 3,319 |
Bin packing problem.
Dimly Lit Lightbulb 😀 Send message Joined: 30 Aug 08 Posts: 15399 Credit: 7,423,413 RAC: 1 |
Back up to an external hard drive mirrored with another for failure protection; problem solved.

Member of the People Encouraging Niceness In Society club.
Graham Middleton Send message Joined: 1 Sep 00 Posts: 1520 Credit: 86,815,638 RAC: 0 |
Is there a reason why you cannot use a backup-type tool (there are many - 7-Zip, WinRAR, I believe Nero, etc.) to create a contiguous compressed file that can then span multiple Blu-ray discs? This would use virtually all the space on every disc other than the last one. Most of these tools can also compress the input data to further optimise storage.

With most packages of this type, individual files can still be extracted from the storage set if needed. In addition, most allow you to control the amount of redundancy and checking data included within the storage set to allow for errors at a later time, e.g. for archive storage. I have used WinRAR and others to do this and still been able to restore data after one complete volume of the storage set became unusable. Several of the tools will also allow data transfer from one platform to another with a different OS.

If you want any more info, please ask.

Happy Crunching, Graham
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Bin packing problem.

That looks like it's what I'm looking for. I've got some fuzzy mental images of some of the C++ logic needed, but it looks like there is a solver for Excel that can do it, too, so I may look into that.

Back up to an external hard drive mirrored with another for failure protection; problem solved.

That's the problem I'm trying to get away from, actually. I already have 5×1 TB drives in an external enclosure, but they are not RAIDed. I'm trying to back this stuff up in the most efficient manner that I can, to be a cheaper and more reliable version of RAID-1. Individual mechanical drives (most of the ones in the external are 5+ years old now) can die on a whim, and you lose what's on them if it isn't backed up. Not to mention... EMP can wipe the HDDs, or a power surge can fry them. But burnt Blu-ray discs in jewel cases on a bookshelf are practically immune to everything except fire.

Aside from the time it takes to burn (and later read), BD-Rs are cheaper in the long run. I get a 50-pack for ~$35 (from Amazon, when they're on sale), and 50 × 25 GB = 1,250 GB, which makes them cheaper than mechanical HDDs, and this is data that doesn't need to be re-written, since it is a backup. I burn them at 4x, since the slower you burn, the more legible the burn is, and the less wobble there is (think of writing with a ball-point pen: write slowly and the text is bold and clean, write in a hurry and the text is faded and harder to read--optical media works in much the same way). I've seen some research showing that these Verbatim discs, when put through artificial aging tests, had a 95% reliability rate at a simulated age of 10 years. So for archive purposes, these blanks and this method are perfectly fine.

If I could afford an LTO-5 tape deck... I'd go with that, get some tapes, and then store the tapes in a safe deposit box at my local bank.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Is there a reason why you cannot use a backup-type tool (there are many - 7-Zip, WinRAR, I believe Nero, etc.) to create a contiguous compressed file that can then span multiple Blu-ray discs?

I know of that option, and it is technically the best way to fill every available byte. But the problem is that when I put these burnt archive discs in my Blu-ray player, for example, I can play media files straight from the disc without any issues. If they were just spanned/split archives, the contents couldn't be read or even identified without at least some of the other pieces--especially with LZMA. To extract one file from a solid LZMA archive, you have to start decompressing from the beginning of the archive to reach the file you're interested in.

I'm going for using these blank BD-Rs basically like a flash drive, except the data can only be written once. So portability and ease of use are the crux of it, and I'd like to figure out how to get the most efficient packing that I can on them.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
In computational complexity theory, it is a combinatorial NP-hard problem.

Here is the source code of a C++ program that implements a bin packing algorithm:
http://www.sanfoundry.com/cpp-program-implement-bin-packing-algorithm/

Some theory:
https://en.wikipedia.org/wiki/Knapsack_problem

Pseudocode for the 0/1 knapsack:

// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)

for j from 0 to W do:
    m[0, j] := 0

for i from 1 to n do:
    for j from 0 to W do:
        if w[i-1] > j then:
            m[i, j] := m[i-1, j]
        else:
            m[i, j] := max(m[i-1, j], m[i-1, j-w[i-1]] + v[i-1])
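The pseudocode above is the standard 0/1 knapsack table. For this disc-filling case the value and the weight of a file are the same number (its size), and a table indexed by every byte up to ~25 GB would be far too large, so a minimal C++ sketch like the following scales sizes down to whole MiB first (sizes rounded up, capacity rounded down, so a chosen set can never overshoot the disc). The four sample sizes and the rolling 1-D form of the table are illustrative assumptions, not the sanfoundry code:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const std::uint64_t capacityBytes = 25020000000ULL;    // usable space per disc
    std::vector<std::uint64_t> fileBytes = {                // a few of the example sizes
        6602954144ULL, 8746750940ULL, 5926849456ULL, 8381808456ULL};

    // Scale bytes to MiB so the DP table stays small: sizes rounded up,
    // capacity rounded down, so the chosen set can never exceed the disc.
    const std::uint64_t MiB = 1024ULL * 1024ULL;
    const std::size_t W = static_cast<std::size_t>(capacityBytes / MiB);
    std::vector<std::size_t> w;
    for (std::uint64_t b : fileBytes)
        w.push_back(static_cast<std::size_t>((b + MiB - 1) / MiB));

    // m[j] = largest total size (in MiB) achievable within capacity j.
    // This is the 1-D "rolling" form of the m[i, j] table in the pseudocode,
    // with value == weight because we only care about packed bytes.
    std::vector<std::size_t> m(W + 1, 0);
    for (std::size_t i = 0; i < w.size(); ++i)
        for (std::size_t j = W; j >= w[i]; --j)
            m[j] = std::max(m[j], m[j - w[i]] + w[i]);

    std::cout << "Best fill: " << m[W] << " MiB of " << W << " MiB usable\n";
}

To recover which files produce that best total, you would also record which updates to m[j] took an item and walk the table backwards; that bookkeeping is left out of this sketch.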
cRunchy Send message Joined: 3 Apr 99 Posts: 3555 Credit: 1,920,030 RAC: 3 |
If it is the mathematics of this more than the practice or practicality that interests you, Cosmic, then perhaps number theory, infinite number series, and a little slicing and folding might be up your street... :) Ramanujan comes to mind.
.
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
If it is the mathematics of this more than the practice or practicality that interests you, Cosmic, then perhaps number theory, infinite number series, and a little slicing and folding might be up your street... :)

LOL
https://www.youtube.com/watch?v=CCFck3laziI
But as a reminder, math is a very powerful tool to make our lives somewhat easier.
cRunchy Send message Joined: 3 Apr 99 Posts: 3555 Credit: 1,920,030 RAC: 3 |
Ramanujan's quite an interesting creature. The YouTube video doesn't do any justice to his story or his value as a profound mathematician. He wasn't so much discovered as promoted by the local elite (Brahmins) because he was a prodigy. He left his wife behind to come to England, and he returned to India in failing health, dying soon afterwards--probably from TB. What he gained here in the UK was our formulaic way of notating mathematics. It is an interesting story to hear.

Decades after his death, more of his writings were rediscovered... Some of that mathematics is now being used to map cancer within populations. Ramanujan was generally self-effacing, as far as I can tell. Einstein, I believe, found some of his brightest (and probably darkest) applications within Vedic traditions.

Anyway, sometimes it is worth looking at a formula to see if it can help you get close to what you need...
.
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Einstein, I believe, found some of his brightest (and probably darkest) applications within Vedic traditions.

I think you mean Oppenheimer in the Manhattan Project, rather than Einstein.
https://www.youtube.com/watch?v=n8H7Jibx-c0
Oops, Oppenheimer and Einstein did play chess together :)
https://www.youtube.com/watch?v=1XM9gHWAvX0&ebc=ANyPxKotS0-bYx5PYsGYmn3V5TXsGzNuERz-p0eWcCi7tKw4uesSEpRpD-SgGF9eVZRRPDxUDgbDLtlm3PWoE0LL2ux38tO42Q
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
In computational complexity theory, it is a combinatorial NP-hard problem.

Definitely a good start. Though something like that sanfoundry sample really only tells you how many bins (blank BD-Rs, in this case) would be needed--it doesn't specify which items to put in the first bin to best-fill it.

In simpler terms, from what I've seen of the suggestions so far, most of these examples simply stop looking once a bin has been determined to be filled. For example: you have one bin of capacity 20, and items with the values 3, 7, 11, 5, 16, 19. Whether or not the values are sorted in ascending order makes no difference here, because the basic logic behind most of the code I've seen will just take the first value, 3, and ask "is that less than or equal to 20?" If yes, add the next value, 7. 3+7 = 10. "Is 10 less than or equal to 20?" If yes, add the next value. 10+11 = 21. "Is 21 less than or equal to 20?" No, it is not; you need another bin for that.

Now, I haven't flow-charted the sanfoundry example to see whether it will go back and check if 5 fits in the first bin, but from the few examples I've seen so far, I suspect that if the list were sorted in ascending order, it would add 3, 5, and 7 to get 15, none of the other values would fit in that one bin, and it would call the bin full. When in reality, given the available items, the single item of 19 fills the bin most efficiently. After that bin is sealed off and removed, 16+3 is the next best. After that bin is sealed and removed, 7+11 is the next best.

I do thank those who have chimed in with what the concept is called, though. I'm sure I can glean something from it, but I've got some ideas in my head for how to approach this. I'll figure it out when I start typing the code. A lot of people say you should do flowcharts to help figure out the logic, but over the years of coding projects in school, I never could do flowcharts first--my brain doesn't work that way. I get an idea and just code it out, then make the flowchart after (if it is required; otherwise, I don't bother).

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
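For the capacity-20 example above, a minimal exhaustive search over one bin (every include/exclude choice, rather than a single greedy pass) does settle on 19 instead of 15. This is only a sketch of the idea, with the example values hard-coded, and it reports just the best total rather than which items produced it:

#include <iostream>
#include <vector>

// Try every include/exclude choice, keeping the best total that stays <= capacity.
void search(const std::vector<int>& items, std::size_t i, int sum, int capacity, int& best) {
    if (sum > capacity) return;              // prune: this branch already overflows the bin
    if (sum > best) best = sum;              // best fill seen so far
    if (i == items.size()) return;
    search(items, i + 1, sum + items[i], capacity, best);  // take item i
    search(items, i + 1, sum, capacity, best);             // skip item i
}

int main() {
    const std::vector<int> items = {3, 7, 11, 5, 16, 19};   // the example from the post
    int best = 0;
    search(items, 0, 0, 20, best);
    std::cout << "Best single-bin fill: " << best << " of 20\n";  // prints 19, not 15
}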
cRunchy Send message Joined: 3 Apr 99 Posts: 3555 Credit: 1,920,030 RAC: 3 |
No. I meant Einstein, but it wouldn't surprise me if Oppenheimer also found inspiration in ideas from enquiry that goes back 5,000+ years. We stand on the shoulders of those who came before. Hopefully we do a little better each time... Maybe...
William Rothamel Send message Joined: 25 Oct 06 Posts: 3756 Credit: 1,999,735 RAC: 4 |
Your query: This is called the "Knapsack Problem" in operations research. |
Dena Wiltsie Send message Joined: 19 Apr 01 Posts: 1628 Credit: 24,230,968 RAC: 26 |
I solved this problem, or one very much like it, almost 40 years ago. I needed to store a magnetic tape on a 20 MB disk drive. The drive appeared as twenty 1 MB platters. The tape was constructed of multiple files of different lengths, but none were larger than the platter size.

The solution: on the first pass over the tape I constructed a table with the location and length of each file. I then sorted the table from largest file to smallest. I placed the largest file on the first platter, determined how much room was left, and then placed the next largest file that would fit in the remaining space. I continued doing this until the first platter was full. On the next platter I placed the remaining largest file and then looked for a large file to fill the remaining space. This was all done in a very small table in memory, so I only needed to make one more pass over the tape, placing each file in its correct location on the disk pack.

Naturally, all the big files ended up on the first disk with a few small files filling in the gaps, and all the smaller files ended up on the last disk in the stack. It surprised me how little space was wasted on the disk when I took this approach.
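What is described here is essentially the first-fit decreasing heuristic for bin packing: sort largest-first, then drop each file onto the first disc that still has room for it. A rough sketch of that idea, using an assumed 25,020,000,000-byte usable capacity and a few of the sizes from the opening post; it tracks only the free space per disc, not which file landed where:

#include <algorithm>
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    const std::uint64_t capacity = 25020000000ULL;          // usable bytes per disc (assumed)
    std::vector<std::uint64_t> sizes = {6602954144ULL, 8746750940ULL,
                                        10534550072ULL, 16583079767ULL,
                                        5790425802ULL,  13086035172ULL};

    // Largest first, as in the tape-to-disk-pack approach described above.
    std::sort(sizes.begin(), sizes.end(), std::greater<std::uint64_t>());

    std::vector<std::uint64_t> freeSpace;                    // free bytes left on each disc
    for (std::uint64_t s : sizes) {
        bool placed = false;
        for (std::uint64_t& f : freeSpace)
            if (s <= f) { f -= s; placed = true; break; }    // first disc it fits on
        if (!placed) freeSpace.push_back(capacity - s);      // otherwise start a new disc
    }

    for (std::size_t i = 0; i < freeSpace.size(); ++i)
        std::cout << "Disc " << i + 1 << ": " << freeSpace[i] << " bytes free\n";
}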
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I solved this problem, or one very much like it, almost 40 years ago. I needed to store a magnetic tape on a 20 MB disk drive. The drive appeared as twenty 1 MB platters. The tape was constructed of multiple files of different lengths, but none were larger than the platter size.

I have done a bit of research and reading on this subject, and the efficiency varies depending on how the data is sorted. Sorting ascending (smallest to largest) tends to be the least efficient, while descending tends to be the most efficient (but still about 22% inefficiency on average). That's with the bin packing problem. Knapsack is more along the lines of what I need, because rather than a series of bins, I only have one bin. I'm not looking to see how many bins I need to store the data; instead, I'm trying to fill the one bin I have as well as it can be filled.

I'm actually drawing the logic out on paper before attempting to start coding it, and I think I have a broad, macro-level picture of what needs to happen, but the smaller logic details inside the objects in the flowchart are still going to need some figuring out. Once I get this figured out, my plan is to make a hybrid of knapsack and bin packing--effectively a "knapsack-packing problem," if you want to call it that. Start with the first knapsack and find the absolute best way to fill it. Once the best solution has been found, seal it up, remove those items from the available pool, move on to the next knapsack, and so on.

The complexity is that it isn't a simple matter of going through the input list sequentially once or twice; it's going to involve trying every possible combination, then going back through the results to find which one is closest to the target capacity without exceeding it. I'm not an expert C++ coder, but I probably know enough to figure this out. I'm going to need a multi-dimensional array or three, and I admittedly have not done a whole lot with those in the past, so I'll be referencing some examples via Google, I'm sure. I may need to enlist some help at some point, probably later on, to provide some kind of GUI rather than being CLI/terminal-use only.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
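As a very rough sketch of that "knapsack-packing" loop: exhaustively best-fill one disc, seal it, remove those files from the pool, and repeat. The FileEntry struct, the placeholder file names, and the capacity constant are assumptions for illustration, and the brute force over subsets is only practical for a couple of dozen candidate files at a time:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct FileEntry { std::string name; std::uint64_t bytes; };

// Indices of the subset of `pool` whose total size is largest without exceeding `capacity`.
std::vector<std::size_t> bestFill(const std::vector<FileEntry>& pool, std::uint64_t capacity) {
    std::uint64_t bestSum = 0, bestMask = 0;
    for (std::uint64_t mask = 1; mask < (1ULL << pool.size()); ++mask) {  // every non-empty subset
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < pool.size(); ++i)
            if (mask & (1ULL << i)) sum += pool[i].bytes;
        if (sum <= capacity && sum > bestSum) { bestSum = sum; bestMask = mask; }
    }
    std::vector<std::size_t> chosen;
    for (std::size_t i = 0; i < pool.size(); ++i)
        if (bestMask & (1ULL << i)) chosen.push_back(i);
    return chosen;
}

int main() {
    const std::uint64_t capacity = 25020000000ULL;           // usable bytes per disc (assumed)
    std::vector<FileEntry> pool = {{"a.mkv", 6602954144ULL}, {"b.mkv", 8746750940ULL},
                                   {"c.mkv", 5926849456ULL}, {"d.mkv", 8381808456ULL},
                                   {"e.mkv", 10534550072ULL}};
    for (int disc = 1; !pool.empty(); ++disc) {
        std::vector<std::size_t> pick = bestFill(pool, capacity);
        if (pick.empty()) break;                              // a leftover file bigger than a disc
        std::uint64_t used = 0;
        std::cout << "Disc " << disc << ":";
        for (std::size_t i : pick) { std::cout << ' ' << pool[i].name; used += pool[i].bytes; }
        std::cout << "  (" << used << " of " << capacity << " bytes)\n";
        for (auto it = pick.rbegin(); it != pick.rend(); ++it) // erase back-to-front to keep indices valid
            pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(*it));
    }
}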
Dena Wiltsie Send message Joined: 19 Apr 01 Posts: 1628 Credit: 24,230,968 RAC: 26 |
For years I have played around with the ruler problem, and the best solution to this problem is exhaustive. The problem with that is if you are working with very many files, the computation time goes off the scale. It's past my bedtime, so I haven't thought of a good "nearly good enough" solution yet, but maybe I will.

An exhaustive search can be done with a few single-dimension arrays, but it's a loop-within-a-loop concept. It will make a really messy C program, because the solution requires unstructured code and C is a structured language. I coded the ruler problem in 68000 assembler and the code was very compact and clean. When I attempted it in C it was really messy and required the overuse of branches out of loops. This problem will be the same, because there will be dead ends that the code will uncover, so rather than wasting time, the code will need to branch out and make another attempt.
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
For years I have played around with the ruler problem, and the best solution to this problem is exhaustive. The problem with that is if you are working with very many files, the computation time goes off the scale. [...]

Yeah, it can be pretty exhaustive, but on modern machines, it shouldn't take more than a few seconds to do anywhere from a few thousand to a million or two simple additions and comparisons. As long as it doesn't drag out to the order of minutes, I don't mind if it takes 10-15 seconds.

In my research, I did see that brute-force bin packing has a complexity of O(n^(2k)), where n is the number of items and k is the number of different item sizes. So effectively, if you have 100 different input file sizes, you're looking at 100^(2*100) computations. That's why there are algorithms that cut corners and basically just use heuristics to save time. I'm still at the brain-ing stage of this, though, so maybe I'll figure out a good way to do it.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
(Thanks for unlocking, mods.)

So I have this habit of trying to make something absolutely perfect the first time through, and I realized that I'm never going to make any sort of progress at all if I try to do that. So I decided to try a simpler approach: get the list of file sizes, and then just calculate and display the sum of every possible combination. The logic for finding which one is closest to the target (25,020,000,000 bytes) without exceeding it can come after getting at least this part done.

This is where I have reached a point that is beyond my knowledge of C++ or PHP: how do I code the loops for every possible combination of an array of n elements, and also sum the values of each combination? As a quick non-code example, I have an array comprised of: 1 2 3 4 5. How would you get the output to be something like:

1 = 1
2 = 2
3 = 3
4 = 4
5 = 5
1 + 2 = 3
1 + 3 = 4
1 + 4 = 5
1 + 5 = 6
2 + 3 = 5
2 + 4 = 6
2 + 5 = 7
3 + 4 = 7
3 + 5 = 8
4 + 5 = 9
1 + 2 + 3 = 6
1 + 2 + 4 = 7
1 + 2 + 5 = 8
1 + 3 + 4 = 8
1 + 3 + 5 = 9
1 + 4 + 5 = 10
2 + 3 + 4 = 9
2 + 3 + 5 = 10
2 + 4 + 5 = 11
3 + 4 + 5 = 12
1 + 2 + 3 + 4 = 10
1 + 2 + 3 + 5 = 11
1 + 2 + 4 + 5 = 12
1 + 3 + 4 + 5 = 13
2 + 3 + 4 + 5 = 14
1 + 2 + 3 + 4 + 5 = 15

I've seen a few very primitive examples of how to show every combination possible from the values 1-5, but only using three values at a time: 123, 124, 125, 134, 135, 234, 235, 345. That doesn't help in this case, since I don't just want to display the values--I want to add them together--and it also limits the number of items to a static value. I need this to handle a variable number of items, which I know is going to be several nested loops. So I guess the question is: how would you code something that could take anywhere from 1-500 input values and basically produce the example output above? I understand the basic logic in my head, but I have no idea how to put it into code.

Secondly... I'm not absolutely restricted to C++ here. I don't know anything about other languages, but I can learn, and Google is a pretty good help, so if any of you know of a simpler, easier way in another language, I'm open to suggestions.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
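One common way to get exactly that output is to let an integer bitmask from 1 to 2^n - 1 stand for each non-empty combination: bit i set means element i is included. A small sketch under that assumption (the output order differs from the hand-written list above, which is grouped by combination size, and a plain bitmask is only feasible up to roughly 25-30 items; with hundreds of files you would prune any partial sum that already exceeds the disc capacity, or fall back to the knapsack DP and heuristics mentioned earlier in the thread):

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const std::vector<int> values = {1, 2, 3, 4, 5};          // the example from the post
    const std::size_t n = values.size();

    for (std::uint32_t mask = 1; mask < (1u << n); ++mask) {  // every non-empty combination
        int sum = 0;
        bool first = true;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (1u << i)) {                            // element i is in this combination
                std::cout << (first ? "" : " + ") << values[i];
                sum += values[i];
                first = false;
            }
        }
        std::cout << " = " << sum << '\n';                     // e.g. "1 + 3 = 4"
    }
}

Swapping the int values for 64-bit byte counts read from a file list is straightforward from there; the printing and summing loop stays the same.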
William Rothamel Send message Joined: 25 Oct 06 Posts: 3756 Credit: 1,999,735 RAC: 4 |
My advice: draw a flow chart. Use Excel and create random numbers from the input range. Then eliminate duplicates until the total number of unique combinations meets your calculated value, and then sum up the sums you have been generating all along.