Closed
Labels: Bug; IO JSON (read_json, to_json, json_normalize); Strings (String extension data type and string data)
Description
(Noticed because of some doctest failures, cf. #61886.)
Currently, for strings stored as object dtype, we seem to assume that the object dtype actually holds strings, and we encode it as such in the schema part of the JSON Table Schema output:
>>> pd.Series(["a", "b", None], dtype=object).to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
But for the now-default string dtype, this is still seen as some custom extension dtype:
>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
(note the "type":"string" vs "type":"any","extDtype":"str")
Given that the Table Schema spec has a "string" type, let's also use that when exporting our string dtype.
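The difference between the two outputs can be checked mechanically. The sketch below uses only the standard library to parse the two JSON strings quoted above and compare their schema fields. The variable names are illustrative, and the strings are copied from the outputs shown in this issue rather than regenerated with pandas.

```python
import json

# Outputs copied verbatim from the two examples in this issue.
object_out = '{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
str_out = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

object_doc = json.loads(object_out)
str_doc = json.loads(str_out)

# Each schema has a single field describing the "values" column.
object_field = object_doc["schema"]["fields"][0]
str_field = str_doc["schema"]["fields"][0]

print(object_field)
print(str_field)

# The data part is the same in both outputs; only the schema differs.
print(object_doc["data"] == str_doc["data"])
```

The proposal amounts to making the second schema field match the first.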
Activity
khemkaran10 commented on Jul 17, 2025
Changing the order of the checks in the as_json_table_type function (moving the is_string_dtype check before the ExtensionDtype check) seems to fix the issue, but I am not sure this is the best fix.
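Why the order matters can be illustrated with a toy model. This is not the pandas source: `ToyDtype`, `type_ext_first`, and `type_string_first` are hypothetical names, and the two boolean flags stand in for the real ExtensionDtype and is_string_dtype checks. The assumption, based on the issue description, is that the default string dtype satisfies both checks, so whichever check runs first decides the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; not a pandas class."""

    name: str
    is_extension: bool  # stand-in for the ExtensionDtype check
    is_string: bool     # stand-in for the is_string_dtype check


def type_ext_first(dtype: ToyDtype) -> str:
    # Extension check first: a string extension dtype is caught here
    # and never reaches the string branch.
    if dtype.is_extension:
        return "any"
    if dtype.is_string:
        return "string"
    return "any"


def type_string_first(dtype: ToyDtype) -> str:
    # String check first: string dtypes map to "string" even when
    # they are also extension dtypes.
    if dtype.is_string:
        return "string"
    if dtype.is_extension:
        return "any"
    return "any"


str_dtype = ToyDtype("str", is_extension=True, is_string=True)
other_ext = ToyDtype("other", is_extension=True, is_string=False)

print(type_ext_first(str_dtype))     # any
print(type_string_first(str_dtype))  # string
print(type_string_first(other_ext))  # any
```

In this model, extension dtypes that are not strings still fall through to "any" after the reordering, so only the string case changes.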
jorisvandenbossche commented on Jul 17, 2025
@khemkaran10 that looks like a good fix! Feel free to open a PR for this
khemkaran10 commented on Jul 18, 2025
take
khemkaran10 commented on Jul 21, 2025
@jorisvandenbossche can you please review the PR and let me know if any changes are needed.