Picard / PICARD-970

New guess format functionality should use explicit buffer size

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: Lookup & Match
    • Labels:
      None

      Description

      In picard/formats/__init__.py#48 we open the file for reading and read its first 128 bytes to try to determine the internal format.

          with file(filename, "rb") as fileobj:
              header = fileobj.read(128)
      

      However, the call to file() does not specify a buffer size, so the default buffer size is used. The default buffer size can be determined with:

          import io
          print(io.DEFAULT_BUFFER_SIZE)
      

      and on my Win64 system it is 8192.
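
      To illustrate the read-ahead this causes, here is a minimal sketch in Python 3 (where file() no longer exists, so the io module is used directly). CountingRaw and bytes_requested are hypothetical names made up for this example, and the in-memory stream only stands in for a real file; this is not Picard code.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that records the size of each low-level read."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self.requests = []  # number of bytes asked for by each readinto() call

    def readable(self):
        return True

    def readinto(self, b):
        # len(b) is how many bytes the buffered layer is asking the "disk" for.
        self.requests.append(len(b))
        return self._inner.readinto(b)


def bytes_requested(buffer_size, header_size=128):
    """Read header_size bytes through a buffer of buffer_size bytes.

    Returns (bytes handed to the caller, bytes requested from the raw stream).
    """
    raw = CountingRaw(b"\0" * 65536)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    header = reader.read(header_size)
    return len(header), sum(raw.requests)


default_result = bytes_requested(io.DEFAULT_BUFFER_SIZE)
explicit_result = bytes_requested(128)
```

      Comparing default_result with explicit_result should show that the caller gets 128 bytes in both cases, while the amount requested from the underlying stream follows the buffer size rather than the size of the read.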

      It would be better to use the following code, which passes an explicit 128-byte buffer size. This avoids reading more disk sectors than the minimum needed (depending on OS, disk format and cluster size) and minimises the amount of data transferred across the network:

          with file(filename, "rb", 128) as fileobj:
              header = fileobj.read(128)
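
      For comparison, a rough Python 3 equivalent of the snippet above might look like the sketch below. It assumes the built-in open() in place of file(); read_header, HEADER_SIZE and demo are hypothetical names used only for illustration and are not part of Picard.

```python
import os
import tempfile

HEADER_SIZE = 128  # bytes inspected when guessing the format, as in the issue


def read_header(filename, size=HEADER_SIZE):
    """Return the first `size` bytes of a file, using a buffer no larger than `size`."""
    # In Python 3, open() replaces file(); for binary files a buffering
    # value greater than 1 sets the buffer size in bytes.
    with open(filename, "rb", buffering=size) as fileobj:
        return fileobj.read(size)


def demo():
    """Write a throwaway file and read its header back (illustration only)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"fLaC" + b"\0" * 1000)
        return read_header(path)
    finally:
        os.remove(path)
```

      Keeping the buffer size and the read size tied to the same constant means the two cannot drift apart if the header length is changed later.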
      

            People

            • Assignee: Sophist
            • Reporter: Sophist