
LB-473: ListenBrainz-Labs: SQL queries or DataFrame functions?


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Spark dataframes are immutable. We initially decided, for the sake of consistency, to use SQL queries everywhere in our codebase for database (dataframe) operations. Like here.
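
      To make the distinction concrete, here is a minimal sketch of the SQL-query style, assuming a PySpark session. It is a hypothetical example, not the code linked above: the view name listen, the column names and the sample rows are made up for illustration.

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.appName("lb-sql-style").getOrCreate()

          # Hypothetical listens dataframe; column names and rows are assumptions.
          listens_df = spark.createDataFrame(
              [("user_a", "track_1"), ("user_b", "track_2")],
              ["user_name", "track_name"],
          )

          # SQL-query style: register the dataframe as a temporary view,
          # then express the (read-only) operation as a SQL query.
          listens_df.createOrReplaceTempView("listen")
          track_counts_df = spark.sql("""
              SELECT track_name, count(*) AS listen_count
                FROM listen
            GROUP BY track_name
          """)
          track_counts_df.show()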

      But since dataframes are immutable, we cannot run SQL queries that update or alter an existing dataframe in place. In such situations we must use dataframe functions such as union, subtract, etc., which create a new dataframe: the contents of the previous dataframe are carried over into the new one together with our changes. Every such operation produces a new dataframe, much like how tuples behave in Python (an oversimplified analogy).
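
      A minimal sketch of the dataframe-function style, again with hypothetical data: every call returns a brand-new dataframe and leaves its inputs untouched.

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.appName("lb-df-style").getOrCreate()

          columns = ["user_name", "track_name"]
          old_df = spark.createDataFrame([("user_a", "track_1")], columns)
          new_df = spark.createDataFrame([("user_a", "track_2")], columns)

          # union() does not modify old_df; it returns a new dataframe holding
          # the rows of both inputs (the schemas must match).
          combined_df = old_df.union(new_df)

          # subtract() likewise returns a new dataframe with new_df's rows
          # removed; combined_df itself is left unchanged.
          reverted_df = combined_df.subtract(new_df)
          reverted_df.show()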

      Should we, from now on, perform every database operation with dataframe functions? Should we use SQL wherever possible and functions elsewhere? Should we rewrite the existing SQL queries to use functions? Or should we leave the existing queries as they are and simply adopt a consistent scheme going forward?

      This issue is mainly intended for discussion; it is not urgent.

            People

            • Assignee: Vansika Pareek (vansika)
            • Reporter: Vansika Pareek (vansika)
            • Votes: 0
            • Watchers: 1
