New-ish to Haskell. Can’t figure out the best way to get Cassava (Data.Csv) to do what I want. Can’t tell if I’m missing some Haskell type idioms, some common knowledge, or what.

Task: I need to read in a CSV, but I don’t know what the headers/columns are going to be ahead of time. The user will provide input to say which headers from the CSV they want processed, but I won’t know where (index-wise) those columns will be in the CSV, nor how many total columns there will be (either specified by the user or total). Say I have a [String] which lists the headers they want.

Cassava is able to read CSVs with and without headers.

Without headers Cassava can read in entire rows, even if it doesn’t know how many columns are in that row. But then I wouldn’t have the header data to filter for the values that I need.

With headers Cassava requires(?) you to define a record type instantiating its FromNamedRecord typeclass, which is how you access parts of the column by name (using the record fields). But in order for this to be well defined you need to know ahead of time everything about the headers: their names, their quantity, and their order. You then emulate that in your record type.

Hopefully I’m missing something obvious, but it feels a lot like I have my hands tied behind my back dealing with the types provided by Cassava.

Help greatly appreciated :)

  • @Pipoca@lemmy.world
    28 months ago

    I realize that this was posted a couple of weeks ago, so hopefully this is still helpful. One thing to point out is that there’s an instance (FromField a, FromField b, Ord a) => FromNamedRecord (Map a b).

    That will just parse each row of the CSV into a map keyed by field name. You might want a Map String String?

    • mrhOP
      38 months ago

      Yep thanks that’s exactly what I ended up doing! In my case it was easiest to work with Map Text Text.
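
For readers landing here later, a minimal sketch of the Map approach described above, assuming the cassava, containers, text, vector and bytestring packages. The names `decodeRows` and `selectColumns` are hypothetical helpers made up for illustration; only `decodeByName` and the `FromNamedRecord (Map a b)` instance come from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (Header, decodeByName)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Vector as V

-- | Decode a CSV that has a header row into one Map per data row,
-- keyed by column name. No record type is needed: this relies on the
-- FromNamedRecord (Map a b) instance that cassava provides.
decodeRows :: BL.ByteString -> Either String (Header, V.Vector (Map.Map Text Text))
decodeRows = decodeByName

-- | Hypothetical helper: keep only the columns the user asked for.
-- Requested names that are not present in the row are simply absent
-- from the result.
selectColumns :: [Text] -> Map.Map Text Text -> Map.Map Text Text
selectColumns wanted row = Map.restrictKeys row (Set.fromList wanted)
```

Mapping `selectColumns` over the decoded rows gives one small map per row, regardless of where the wanted columns sit in the file or how many columns there are in total.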

  • @bss03@infosec.pub
    18 months ago

    Types are erased; they don’t exist at runtime. So, if the user is going to provide input at runtime to determine the processed data, you don’t select that data via types (or type class instances).

    You probably just want to have Cassava give you a [[Text]] value (FromRecord [Text] is provided by the library, you don’t have to write it), then present the header line to the user, then process the data based on what they select. Just do the simple thing.
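
A rough sketch of that "simple thing", assuming the whole file fits in memory and that its first row is the header. `decodeAll` and `projectColumns` are hypothetical names made up for this example; `decode` and `NoHeader` are from Data.Csv.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (NoHeader), decode)
import Data.List (elemIndex)
import Data.Maybe (listToMaybe, mapMaybe)
import Data.Text (Text)
import qualified Data.Vector as V

-- | Decode every row, header row included, as a plain list of cells.
-- NoHeader tells cassava not to skip the first line.
decodeAll :: BL.ByteString -> Either String [[Text]]
decodeAll bytes = V.toList <$> decode NoHeader bytes

-- | Hypothetical helper: treat the first row as the header, look up the
-- position of each wanted column, and project every data row onto those
-- positions, in the order the user listed them.
-- Wanted names missing from the header are skipped; a row too short to
-- contain a wanted position just loses that cell.
projectColumns :: [Text] -> [[Text]] -> [[Text]]
projectColumns _ [] = []
projectColumns wanted (header : rows) = map pick rows
  where
    indices = mapMaybe (`elemIndex` header) wanted
    pick row = mapMaybe (\i -> listToMaybe (drop i row)) indices
```

The header row returned by `decodeAll` is also what you would show the user when asking which columns they want.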

    If the CSV file is too big to fit into memory, you might use the Data.Csv.Incremental module to read just the first record as a [Text], present that to the user, and then use their input to guide the rest of the processing. But I wouldn’t mess with that until you have the small case worked out fairly well.
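
For the incremental case, a hedged sketch of pulling out only the first record with Data.Csv.Incremental. The function name `firstRecord` is made up, and the input is modelled as a list of strict chunks for simplicity; a real program would more likely read chunks from a file handle as it goes.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import Data.Csv (HasHeader (NoHeader))
import qualified Data.Csv.Incremental as I
import Data.Text (Text)

-- | Hypothetical helper: feed strict chunks to the incremental parser
-- until it yields its first record, then stop. Later chunks are never
-- looked at.
firstRecord :: [B.ByteString] -> Either String [Text]
firstRecord = go (I.decode NoHeader)
  where
    go (I.Fail _ err) _ = Left err
    -- The parser finished; take the first record if there is one.
    go (I.Done (r : _)) _ = r
    go (I.Done []) _ = Left "no records in input"
    -- At least one record is available already.
    go (I.Many (r : _) _) _ = r
    -- Nothing parsed yet: feed the next chunk, or an empty chunk to
    -- signal end of input once the chunks run out.
    go (I.Many [] k) (c : cs) = go (k c) cs
    go (I.Many [] k) [] = go (k B.empty) []
```

The record comes back as a [Text], so it can be shown to the user directly as the list of available headers before the rest of the file is processed.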