BUG: make to_json with JSON Table Schema work correctly with string dtype #61889

@jorisvandenbossche

Description

@jorisvandenbossche
Member

(noticed because of some doctest failures, cf. #61886)

Currently, for strings stored as object dtype, we seem to assume that an object-dtype column actually holds strings, and we encode it as such in the schema part of the JSON Table Schema output:

>>> pd.Series(["a", "b", None], dtype=object).to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

But for the now-default string dtype, this is still seen as some custom extension dtype:

>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

(note the "type":"string" vs "type":"any","extDtype":"str")

Given that the Table Schema spec has a "string" type, let's also use that when exporting our string dtype.
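To make the difference concrete, here is a minimal sketch that parses the two outputs quoted above with the standard-library `json` module and compares their field descriptors. The variable names are illustrative only, and the strings are copied from this issue rather than regenerated, since the actual output depends on the pandas version in use.

```python
import json

# The two outputs quoted above, copied verbatim from this issue.
object_output = '{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
str_output = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

object_table = json.loads(object_output)
str_table = json.loads(str_output)

# Each schema has a single field descriptor for the "values" column.
object_field = object_table["schema"]["fields"][0]
str_field = str_table["schema"]["fields"][0]

# The data payload is the same in both cases; only the schema differs.
same_data = object_table["data"] == str_table["data"]

print(object_field)
print(str_field)
print(same_data)
```

So a consumer that relies on the schema alone would treat the second column as untyped, even though the serialized values are identical.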

Activity

added this to the 3.0 milestone on Jul 17, 2025
added the IO JSON (read_json, to_json, json_normalize) and Strings (String extension data type and string data) labels on Jul 17, 2025
khemkaran10 commented on Jul 17, 2025

@khemkaran10
Contributor

Changing the order in the as_json_table_type function (by moving the is_string_dtype check before the ExtensionDtype check):

elif is_string_dtype(x):
    return "string"
elif isinstance(x, ExtensionDtype):
    return "any"
else:
    return "any"

seems to fix the issue, but I am not sure this is the best fix.
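Why the branch order matters can be sketched with a toy model. Everything below is a hypothetical stand-in written for illustration: the classes and `is_string_dtype` here are not the pandas objects of the same name, and the two functions only mimic the tail of the `elif` chain, not the real `as_json_table_type`. The assumption is simply that the string dtype is itself an extension dtype, so whichever check comes first wins.

```python
# Hypothetical stand-ins, NOT the pandas classes or functions.
class ExtensionDtype:
    """Stand-in for a generic extension dtype."""


class StringDtype(ExtensionDtype):
    """Stand-in for a string dtype that is also an extension dtype."""


def is_string_dtype(x):
    # Stand-in check: true only for the toy string dtype.
    return isinstance(x, StringDtype)


def type_extension_first(x):
    # Extension check first: the string dtype never reaches its own branch.
    if isinstance(x, ExtensionDtype):
        return "any"
    elif is_string_dtype(x):
        return "string"
    else:
        return "any"


def type_string_first(x):
    # String check first, as proposed in the comment above.
    if is_string_dtype(x):
        return "string"
    elif isinstance(x, ExtensionDtype):
        return "any"
    else:
        return "any"


before = type_extension_first(StringDtype())
after = type_string_first(StringDtype())
other = type_string_first(ExtensionDtype())
print(before, after, other)
```

In this toy model the reordering changes the result only for the string dtype; other extension dtypes still map to "any".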

jorisvandenbossche commented on Jul 17, 2025

@jorisvandenbossche
Member, Author

@khemkaran10 that looks like a good fix! Feel free to open a PR for this

khemkaran10 commented on Jul 18, 2025

@khemkaran10
Contributor

take

khemkaran10 commented on Jul 21, 2025

@khemkaran10
Contributor

@jorisvandenbossche can you please review the PR and let me know if any changes are needed?

modified the milestones: 3.0, 2.3.2 on Jul 26, 2025
Metadata

Labels: Bug; IO JSON (read_json, to_json, json_normalize); Strings (String extension data type and string data)

Participants: @jorisvandenbossche, @khemkaran10