Picard / PICARD-970

New guess format functionality should use explicit buffer size

    • Improvement
    • Resolution: Fixed
    • Normal
    • 1.4
    • 1.4
    • Lookup & Match
    • None

      In picard/formats/__init__.py (line 48) we open the file for reading and read the first 128 bytes to try to determine the internal format.

          with file(filename, "rb") as fileobj:
              header = fileobj.read(128)
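
As an illustration of why a short header read is enough for this kind of guess, here is a minimal sketch in Python 3 that matches the first bytes of a file against a few well-known magic numbers. The function name `guess_format_from_header` and the `MAGIC_SIGNATURES` table are hypothetical and are not taken from the Picard source; Picard's own guessing logic may work differently.

```python
# Hypothetical signature table for illustration only; not Picard's actual list.
MAGIC_SIGNATURES = [
    (b"fLaC", "flac"),
    (b"OggS", "ogg"),
    (b"ID3", "mp3 (ID3v2 tag)"),
]


def guess_format_from_header(filename, header_size=128):
    """Return a format label based on the first bytes of the file, or None."""
    # Python 3 has no file() builtin, so this sketch uses open() instead.
    with open(filename, "rb") as fileobj:
        header = fileobj.read(header_size)
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return None
```

All of the signatures in this sketch sit at offset 0, so only the start of the file is ever inspected.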
      

      However, the call to file does not specify a buffer size, so the default buffer size is used. The default buffer size can be determined with:

      import io
      print (io.DEFAULT_BUFFER_SIZE)
      

      and on my Win64 system it is 8192 bytes.

      It would be better to use the following code, which avoids reading more disk sectors than the minimum needed (depending on OS, disk format and cluster size) and minimises the amount of data transferred across the network:

          with file(filename, "rb", 128) as fileobj:
              header = fileobj.read(128)
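
One way to check the effect of the buffer size is to compare how far the underlying raw file advances after a 128-byte read. The following is a minimal sketch in Python 3 (so it uses `open` rather than `file`); the helper names `read_header` and `compare_buffer_sizes` are made up for this example, and the exact default position depends on the platform and Python version.

```python
import os
import tempfile


def read_header(filename, buffering=-1, header_size=128):
    """Read header_size bytes; also return the raw file position afterwards.

    buffering must not be 0 here, because an unbuffered file has no .raw layer.
    """
    with open(filename, "rb", buffering) as fileobj:
        header = fileobj.read(header_size)
        # fileobj.raw is the unbuffered file beneath the buffered reader; its
        # position shows how many bytes were actually requested from the OS.
        raw_position = fileobj.raw.tell()
    return header, raw_position


def compare_buffer_sizes():
    """Return (default_position, explicit_position) for a 64 KiB scratch file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024))
    try:
        _, default_position = read_header(tmp.name)
        _, explicit_position = read_header(tmp.name, buffering=128)
        return default_position, explicit_position
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    default_position, explicit_position = compare_buffer_sizes()
    print("raw position with default buffer:", default_position)
    print("raw position with 128-byte buffer:", explicit_position)
```

If the explicit buffer behaves as described above, the second position should be no larger than the first. Note that the operating system may still read ahead in its own cache regardless of what Python requests.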
      

            sophist Sophist
            sophist Sophist
            Votes: 0
            Watchers: 2


                Version Package
                1.4